Recent comments in /f/deeplearning
Appropriate_Ant_4629 t1_j8h5l44 wrote
Reply to comment by artsybashev in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
> Keeping your rig at 100% utilization for 3 years might be hard if you plan to have holidays.
Given what he's asking for, he probably has jobs big enough that they'll run through the holidays.
Long_Two_6176 t1_j8h28vw wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
So today I found out that the MPS backend on PyTorch 1.13.1 (stable) has bugs that prevent learning. FashionMNIST accuracies bounced around below 10%. Switched to CPU and it worked fine (>80%).
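For anyone hitting the same issue, here's a minimal sketch of the kind of MPS-or-CPU fallback I mean (the tiny model and the force_cpu flag are just illustrative):

```python
import torch

# Prefer MPS when available, but keep an easy switch back to CPU while the
# backend is still buggy (e.g. when training silently fails to learn).
force_cpu = True  # flip to False once MPS behaves on your PyTorch version

if not force_cpu and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Placeholder FashionMNIST-sized classifier head, just to show the device move.
model = torch.nn.Linear(28 * 28, 10).to(device)
x = torch.randn(32, 28 * 28, device=device)
print(device, model(x).shape)
```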
FastestLearner t1_j8gmxcw wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
Read this:
My recommendations: (1) Abandon macOS and get a laptop with an Nvidia GPU. Or (2) if you don’t like working on Linux/Windows and prefer Macs, get a cheap MacBook Air plus a laptop with an Nvidia GPU; use the Mac for coding but run the code over SSH on the Nvidia laptop. The combined price would not exceed that of a specced-out MacBook Pro, while the performance benefit would be more than 10x. Or (3) if you want to both code and run your code on a Mac and don’t want to carry two laptops, get the highest-specced MacBook Pro possible. Neural network training is computationally very expensive. Normally we run our neural networks on lab servers that contain anywhere between 4 and 64 GPUs. Even the highest-end M2 Max is nothing compared to an RTX 4090.
BellyDancerUrgot t1_j8gmune wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
PyTorch MPS is buggy, even with the stable build. Anything with CUDA is far better imo. Personally I use a 14" MBP with the base M1 Pro for literally everything, and then I have a desktop (had one because I play games; just upgraded the GPU to a cheap 3090 I found online) that works like a charm for 99% of my workloads when it comes to training something.
For the 1% where I don't have enough compute, I use my university's cluster or Compute Canada for distributed training.
lambda_matt t1_j8facir wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
Short answer is, it’s complicated. Some workloads can handle being distributed across slower memory buses.
Frameworks have also implemented strategies for single-node distributed training: https://pytorch.org/tutorials/beginner/dist_overview.html
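As a rough illustration of that tutorial's DDP path, here is a minimal single-node sketch (the tiny model, random data, and script name are placeholders; launch with torchrun):

```python
# Minimal single-node DistributedDataParallel sketch; run with e.g.:
#   torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")               # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                           # placeholder training loop
        x = torch.randn(64, 128, device=local_rank)
        y = torch.randint(0, 10, (64,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```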
lambda_matt t1_j8estga wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
That’s a server. The DGX Station was a downclocked V100/A100-based workstation:
https://images.nvidia.com/aem-dam/Solutions/Data-Center/nvidia-dgx-station-a100-infographic.pdf
BellyDancerUrgot t1_j8e6rq6 wrote
Top reply presents it well. Also, I think Jeff Heaton might make a video on the RTX 6000 since he just posted an unboxing recently. Might want to check that out in case he talks about it in detail.
artsybashev t1_j8e2dmj wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
I understand that you have given up hope for the cloud. Just so you understand the options: $50k gives you about 1,000 days of 4x A100 from vast.ai at today's pricing. Since there will be at least one new generation within 3 years, you will probably get more like 6 years of 4x A100, or one year of 4x A100 plus one year of 4x H100. Keeping your rig at 100% utilization for 3 years might be hard if you plan to have holidays.
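Back-of-the-envelope version of that math (the per-GPU hourly rate here is just the figure implied by the numbers above, not an actual quote):

```python
# Days of rented 4x A100 time that a fixed hardware budget would buy instead.
budget_usd = 50_000
gpus = 4
usd_per_gpu_hour = 0.52   # assumed vast.ai-style rate implied by the comment

hours = budget_usd / (gpus * usd_per_gpu_hour)
print(f"~{hours / 24:.0f} days of {gpus}x A100")   # ~1000 days
```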
CKtalon t1_j8e0j48 wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
You’ll have to use something like DeepSpeed to split the layers across multiple GPUs. Of course, if the model fits on one GPU, then you can go crazier with bigger batch sizes.
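For reference, a minimal DeepSpeed sketch of that kind of splitting, using ZeRO stage 3 to partition parameters, gradients, and optimizer state across GPUs (the config values and placeholder model are illustrative, not a tuned setup):

```python
# Minimal DeepSpeed ZeRO stage-3 sketch; launch with e.g.:
#   deepspeed --num_gpus=4 train_ds.py
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 3},  # shard params/grads/optimizer state
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 1024, device=engine.device)
loss = engine(x).pow(2).mean()   # dummy loss on random data
engine.backward(loss)
engine.step()
```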
N3urAlgorithm OP t1_j8drfq9 wrote
Reply to comment by lambda_matt in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
You said that Nvidia has killed off the DGX workstation, but from what I can see here there's still something for the H100?
N3urAlgorithm OP t1_j8dqtv4 wrote
Reply to comment by CKtalon in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
Thank you. The TMA is actually a big deal for speeding things up, but as far as I've found, even though the 4x RTX setup has more total VRAM, it can't be used for memory pooling. Basically, if I'm not wrong, even with this limitation I can still distribute training across the 4 GPUs, but still with a maximum of 48GB per card.
N3urAlgorithm OP t1_j8dokrh wrote
Reply to comment by lambda_matt in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
So basically the RTX 6000 does not support shared memory, so a stack of Ada RTX cards will only be useful to accelerate things, right?
For the H100, is it possible to do something like that instead?
Is the price difference, roughly $7.5k for the 6000 versus $30k for the H100, justified?
CKtalon t1_j8dbtpk wrote
RTX 6000 Ada has no NVLink. Speed-wise, 2x RTX 6000 Ada should be roughly equivalent to 1x H100, judging from last gen's A6000 vs A100. 4x RTX 6000 Ada should be faster, and has more total VRAM than a single H100.
One thing to take note of is the likely lack of a Tensor Memory Accelerator on the RTX 6000 Ada, which is present on the H100, if you plan on training FP8 models.
lambda_matt t1_j8db78v wrote
No more NVLink on the-cards-formerly-known-as-quadro, so if your models are VRAM-hungry you may be constrained by the Ada 6000s. PCIe 5.0 and Genoa/Sapphire Rapids might even this out, but I am not on the product development side of things, am not fully up to speed on next-gen, and there have been lots of delays on the CPUs/motherboards.
Also, the TDPs of pretty much all the Ada cards are massive and will make multi-GPU configurations difficult, likely limited to 2x.
NVIDIA has killed off the DGX workstation, so they are pretty committed to keeping the H100s a server platform.
There still isn’t much real world info, as there are very few of any of these cards in the wild.
Here are some benchmarks for the H100 at least: https://lambdalabs.com/gpu-benchmarks They are useful for comparing against the Ampere generation.
Disclaimer: I work for Lambda
N3urAlgorithm OP t1_j8cwwcr wrote
Reply to comment by Zeratas in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
Yes, since I'm going to use it for work, it'll be OK to go with a server build.
Zeratas t1_j8cwg7l wrote
You're not going to be putting an H100 in a workstation. That's a server card.
With the GPUs you were mentioning, are you prepared to spend 30 to 50 thousand dollars just on the GPUs?
IIRC, the A6000s are the top of the line desktop cards.
IMHO, take a look at the specs and at performance on your own workload. You'd get better value doing something like one or two A6000s and maybe investing in a longer-term server-based solution.
riversilence t1_j86w4v4 wrote
Reply to M1 MAX vs M2 MAX by markupdev
If you know your way around things, set up an on-demand AWS instance, for example a g4dn.xlarge with a Jupyter notebook configured with SSL. It has an NVIDIA Tesla T4 GPU, roughly equivalent to a 1080 Ti, and there are much more powerful options. Just don’t forget to turn instances off when you're not training.
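A minimal boto3 sketch of that start/stop habit (the region and instance ID are placeholders, and this assumes your AWS credentials are already configured):

```python
import boto3

# Placeholder region and instance ID for an existing g4dn.xlarge instance.
ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"

# Start the instance before a training session...
ec2.start_instances(InstanceIds=[instance_id])

# ...and stop it when you're done so you aren't billed for idle GPU hours.
ec2.stop_instances(InstanceIds=[instance_id])
```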
I_will_delete_myself t1_j85xxy6 wrote
Reply to comment by YoghurtDull1466 in M1 MAX vs M2 MAX by markupdev
It’s terrible at the moment.
I_will_delete_myself t1_j85uz4i wrote
Reply to M1 MAX vs M2 MAX by markupdev
If you are doing PyTorch, you are signing up for a nightmare with the MPS backend.
perrohunter t1_j84x4mf wrote
Reply to M1 MAX vs M2 MAX by markupdev
It’s absolutely worth it. I upgraded from a Core i9 with 12 cores and 32GB to an M1 Max with 64GB and it was insane, almost 4 times faster. Seeing that the new M2 Max beats the M1 Max by more than 20% would motivate me to pay the extra $1,000. These machines will last a while; my M1 Max is still more than enough machine over a year after I got it, and PyTorch has amazing performance on it.
clueless1245 t1_j83y36r wrote
Reply to M1 MAX vs M2 MAX by markupdev
Much better to rent lol.
YoghurtDull1466 t1_j83vn46 wrote
Reply to M1 MAX vs M2 MAX by markupdev
Is it true the M chips aren’t optimized for most ML applications?
suflaj t1_j83p27m wrote
Reply to M1 MAX vs M2 MAX by markupdev
What does "worth it" mean? The M1 GPU is roughly equivalent to a 2060, and the M2 GPU is roughly equivalent to a 3060. Whether it's worth it for you depends on whether you want to pay that kind of money and endure a shortened device lifespan due to heat.
For me personally it wouldn't be worth it, since it's cheaper to buy a rig to SSH into plus a trusty Lenovo laptop, both of which will last longer and perform better.
personnealienee t1_j82n3k6 wrote
Reply to My Machine Learning: Learning Program by Learning_DL
I'd say ditch everything but the math and start implementing your models in PyTorch while reading papers and blogs. Python can be learned by doing for someone with a CS background. The field moves too fast: the relevant stuff starts around 2014 and is all on arXiv and GitHub (both reference implementations and state-of-the-art code), and there are no up-to-date textbooks. This course:
https://uvadlc-notebooks.readthedocs.io/en/latest/index.html
is about the only one I've encountered that teaches recent model architectures (it is helpful to read their implementations too). A lot of its models are mostly relevant for vision, but transformers and autoencoders are really useful in NLP. For stuff more specific to NLP, the HuggingFace tutorials are a good starting point for digging.
rockefeller22 t1_j8h8tnp wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
I’m just confused as to who the MacBook is for