Recent comments in /f/deeplearning
amado88 t1_iynennc wrote
Bonus points for rhyming 'precise' with 'suffice' - that's some Eminem-type s**t right there.
trajo123 t1_iymuivu wrote
Reply to Doubt regarding activation functions by Santhosh999
To answer your question concretely: in classification you want your model output to reflect a probability distribution over the classes. If you have only 2 classes, this can be achieved with 1 output unit producing values ranging from 0 to 1. If you have more than 2 classes, then you need 1 unit per class, so that each one produces a value in the (0, 1) range and the sum over all units adds up to 1, in order to qualify as a probability distribution. In the case of 1 output unit, the sigmoid function ensures that the output lies in (0, 1); in the case of multiple output units, softmax ensures the conditions mentioned above. Now, in practice, classification models don't use an explicit activation function after the last layer; instead, the loss incorporates the appropriate activation function for efficiency and numerical stability reasons. So in the case of binary classification you have two equivalent options:
- use 1 output unit with torch.nn.BCEWithLogitsLoss
>This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
- use 2 output units with torch.nn.CrossEntropyLoss
>This criterion computes the cross entropy loss between input logits and target
Both of these approaches are mathematically equivalent and should produce the same results up to numerical considerations. If you get wildly different predictions, it means you did something wrong.
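A minimal sketch of the two options (using dummy logits and labels purely for illustration):

```python
import torch
import torch.nn as nn

logits_1 = torch.randn(8, 1)          # 1 output unit per sample (raw logits, no sigmoid)
logits_2 = torch.randn(8, 2)          # 2 output units per sample (raw logits, no softmax)
targets = torch.randint(0, 2, (8,))   # binary labels in {0, 1}

# Option 1: 1 output unit + BCEWithLogitsLoss (sigmoid applied inside the loss)
loss_1 = nn.BCEWithLogitsLoss()(logits_1.squeeze(1), targets.float())

# Option 2: 2 output units + CrossEntropyLoss (softmax applied inside the loss)
loss_2 = nn.CrossEntropyLoss()(logits_2, targets)

# At inference time, apply the activation explicitly to get probabilities
probs_1 = torch.sigmoid(logits_1)            # P(class 1)
probs_2 = torch.softmax(logits_2, dim=1)     # [P(class 0), P(class 1)]
```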
On another note, using accuracy as a metric for credit card fraud detection is not a good idea, because the dataset is most likely highly unbalanced. Probably more than 99% of the data samples are labelled as "not fraud". In this case, having a stupid model that always predicts "not fraud" regardless of input will already give you 99% accuracy. You may want to look into metrics for unbalanced datasets, e.g. F1 score, false positive rate, false negative rate, etc.
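For example, something along these lines with scikit-learn (a toy sketch, the variable names are just placeholders):

```python
from sklearn.metrics import confusion_matrix, f1_score

# toy data: 99% "not fraud" (0), and a model that always predicts "not fraud"
y_true = [0] * 99 + [1]
y_pred = [0] * 100

print(f1_score(y_true, y_pred, zero_division=0))   # 0.0, despite 99% accuracy
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fp / (fp + tn))                              # false positive rate: 0.0
print(fn / (fn + tp))                              # false negative rate: 1.0
```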
Have fun on your (deep) learning journey!
normie1990 t1_iyma2hh wrote
Reply to comment by suflaj in Will I ever need more than 24GB VRAM to train models like Detectron2 and YOLOv5? by [deleted]
>Be that as it may, using PyTorch itself, NVLink gets you less than 5% gains. Obviously not worth it compared to the 30-90% gains from a 4090.
Thanks, I think I have my answer.
Obviously I'm new to ML and didn't understand everything that you tried to explain (which I appreciate). I know that much - I will be freezing layers when fine-tuning, so from your earlier comment I guess I won't need more than 24GB.
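(For context, freezing layers in PyTorch usually comes down to turning off gradients for the backbone and training only the head; a minimal sketch with a generic torchvision classifier, purely for illustration since the models discussed here are Detectron2/YOLOv5:)

```python
import torch
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2")

# freeze every parameter of the pretrained backbone
for param in model.parameters():
    param.requires_grad = False

# replace the head; its new parameters are trainable by default
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# the optimizer only sees the unfrozen parameters, keeping memory use low
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```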
suflaj t1_iym94jr wrote
Reply to comment by normie1990 in Will I ever need more than 24GB VRAM to train models like Detectron2 and YOLOv5? by [deleted]
> I probably should have specified that I'll do fine tuning, not training from scratch, if that makes any difference.
Unless you're freezing layers, it doesn't.
> I know it's a software feature, AFAIK pytorch supports it, right?
No. PyTorch supports Data Parallelism. To get pooling in its full meaning, you need Model Parallelism, for which you'd have to write your own multi-GPU layers and a load balancing heuristic.
Be that as it may, using PyTorch itself, NVLink gets you less than 5% gains. Obviously not worth it compared to the 30-90% gains from a 4090. You need stuff like Apex to see visible improvements, but they do not compare to generational leaps, nor do they parallelize the model (you still have to do it yourself). Apex's data parallelism is similar to PyTorch's anyway.
Once you parallelize your model, however, you're bound to be bottlenecked by bandwidth. This is the reason it's not done more often: it makes sense only if the model itself is very large, yet its gradients fit in pooled memory. NVLink provides only 300 GB/s of bandwidth in the best-case scenario, which amounts to roughly 30% performance gains in bandwidth-bottlenecked tasks at best.
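(For illustration, "writing your own multi-GPU layers" in its simplest form is just manual device placement; a naive model-parallel sketch, assuming two visible GPUs:)

```python
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    """Naive model parallelism: half of the layers live on each GPU."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # every forward/backward pass ships activations across the link,
        # which is why PCIe/NVLink bandwidth becomes the bottleneck
        return self.part2(x.to("cuda:1"))

model = TwoGPUNet()
out = model(torch.randn(8, 1024))   # output lives on cuda:1
```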
normie1990 t1_iym8yq4 wrote
Reply to comment by suflaj in Will I ever need more than 24GB VRAM to train models like Detectron2 and YOLOv5? by [deleted]
I probably should have specified that I'll do fine tuning, not training from scratch, if that makes any difference.
>Memory pools are a software feature.
I know it's a software feature, AFAIK pytorch supports it, right?
democracyab OP t1_iym8uul wrote
Reply to comment by incrediblediy in RTX 2060 or RTX 3050 by democracyab
Thanks for your advice. I bought a brand new 3060 for $380.
suflaj t1_iym8sa5 wrote
Reply to comment by normie1990 in Will I ever need more than 24GB VRAM to train models like Detectron2 and YOLOv5? by [deleted]
NVLink itself does not pool memory. It just increases bandwidth. Memory pools are a software feature, partially made easier by NVLink.
> Could you elaborate?
Those models are trained with batch sizes that are too large to fit on any commercial GPU, meaning you will have to accumulate gradients either way.
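(Gradient accumulation itself is only a few extra lines; a runnable toy sketch, with placeholder model/data just to show the pattern:)

```python
import torch
import torch.nn as nn

# toy stand-ins so the sketch runs as-is
model = nn.Linear(128, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = [(torch.randn(4, 128), torch.randint(0, 2, (4,))) for _ in range(32)]

accum_steps = 8  # effective batch size = 4 * 8 = 32, with only 4 samples in memory at once

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    loss = criterion(model(inputs), targets)
    (loss / accum_steps).backward()   # gradients add up in .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```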
normie1990 t1_iym8leb wrote
Reply to comment by suflaj in Will I ever need more than 24GB VRAM to train models like Detectron2 and YOLOv5? by [deleted]
I thought memory pooling was the whole point of NVLink?
>Those models already require more than 24GB RAM if you do not accumulate your gradients
Could you elaborate?
suflaj t1_iym8e85 wrote
Reply to Will I ever need more than 24GB VRAM to train models like Detectron2 and YOLOv5? by [deleted]
NVLink will not pool your memory. You were already told this in your previous post.
Those models already require more than 24GB RAM if you do not accumulate your gradients, and it's unlikely they'll need more than 24GB per batch even for their successors. 4090s will be faster, obviously.
CauseSigns t1_iym7va9 wrote
Reply to comment by jiminiminimini in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
Robo tunes that I produce, I might post on youtube
Make the networks look like noobs, I’m laying down the ground truth
Never make the right moves, they should play a game or two
I’m reinforcing their mistakes, laughin at em with my crew
sdmat t1_iym5spa wrote
Reply to comment by trajo123 in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
Schmidhuber wrote verse first but came off worst
trajo123 t1_iylxuu0 wrote
Come on, I can't believe that Schmidhuber wasn't picked as one of the "combatants"!
Santhosh999 OP t1_iylsz52 wrote
Reply to comment by suflaj in Doubt regarding activation functions by Santhosh999
Thanks for clearing my doubt. It is working now.
ApplicationBoth1829 t1_iylqr22 wrote
Reply to comment by mr_birrd in RTX 2060 or RTX 3050 by democracyab
fact
mr_birrd t1_iyln0js wrote
Reply to comment by ApplicationBoth1829 in RTX 2060 or RTX 3050 by democracyab
Just realised there's no mixed precision on a P40, so there you go...
jiminiminimini t1_iylmdfi wrote
Reply to comment by CauseSigns in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
You should do a rap battle with AI and produce it.
ApplicationBoth1829 t1_iylijsp wrote
Reply to comment by mr_birrd in RTX 2060 or RTX 3050 by democracyab
In China, electronic components are much cheaper, so crazy things happen all the time.
I just modified my 2080 Ti to 22GB and it worked just fine.
ApplicationBoth1829 t1_iylhuux wrote
Reply to comment by mr_birrd in RTX 2060 or RTX 3050 by democracyab
As far as I know, yes. At least in China. You can get a P40 for about $130.
Don't worry, it's not fake, it just comes from old servers retired by companies.
Nerveregenerator t1_iyldcaz wrote
Reply to RTX 2060 or RTX 3050 by democracyab
1080ti
CauseSigns t1_iyl356u wrote
Reply to comment by hayAbhay in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
Passed the turing test, I’m just tryna do my best
Squeezin out my brain to give the bars a little zest
So apologetic if you say you aren’t impressed
Traversing thru my landscape, I’m not even at my crest
hayAbhay OP t1_iyl2bo9 wrote
Reply to comment by Superschlenz in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
Guessing he'll have to bend some syllables!
hayAbhay OP t1_iyl2a3n wrote
Reply to comment by CauseSigns in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
I can't tell if GPT3 wrote this or a human :/
CauseSigns t1_iyl1tw9 wrote
Intelligent poets, computational beings
Spittin fresh rhymes, are they being or just seeming?
Human flows can come and go, but training set is all it knows
Yann’s bars got me yawning, so stale in silico
computing_professor t1_iynllw4 wrote
Reply to GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
I'm far from an expert, but remember that the 4090s are powerful but won't pool memory. I'm actually looking into a lighter setup than yours, with either an A6000 or, more likely, 2x 3090s with NVLink so I can get access to 48GB of VRAM. While the 4090 is much faster, you won't have access to as much VRAM. But if you can make do with 24GB and/or can parallelize your model, 2x 4090s would be awesome.
edit: Just re-read your post and I see I missed that you already mention parallelizing. Still, if you can manage it, 2x 4090s seem incredibly fast. I would do that if it were me, but I don't care much about computer vision.