Recent comments in /f/deeplearning

agentfuzzy999 t1_j3pbk38 wrote

I have trained locally and in the cloud on a variety of cards and server architectures. Depending on what model you are training, it could be down to a huge variety of reasons, but if the model fits on a 3080 you really aren't taking advantage of the A100's huge memory, and the higher clock speed of the 3080 might simply suit this model and parameter set better.
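If you want to sanity-check that, timing raw step time on each card is usually enough. A rough PyTorch sketch (the toy model, batch size, and step counts below are placeholders, not your actual setup):

    import time
    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(256, 1024, device=device)
    y = torch.randint(0, 10, (256,), device=device)

    # warm-up so lazy init / autotuning doesn't skew the numbers
    for _ in range(10):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    torch.cuda.synchronize()
    print(f"avg step: {(time.time() - start) / 100 * 1000:.2f} ms")

Run the same script on the 3080 and the A100 and compare the per-step time.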

4

hjups22 t1_j3nqeim wrote

Then I agree. If you are doing ResNet inference on 8K images, it will probably be quite slow. However, 8K segmentation will probably be even slower (that was the point of comparison I had in mind).
Also, once you get to large images, I suspect PCIe will become a bottleneck (sending data to the GPUs), which will not be helped by the setup the OP described.
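A quick way to check whether the copy or the compute dominates is to time them separately. A rough sketch, assuming PyTorch and using a torchvision ResNet-50 purely as a stand-in for the real model (an 8K input needs a large-memory card; shrink the image if it doesn't fit):

    import torch
    import torchvision

    model = torchvision.models.resnet50().cuda().eval()
    # one "8K" image on the host, pinned so the copy can be async
    x_cpu = torch.randn(1, 3, 4320, 7680, pin_memory=True)

    copy_evt = [torch.cuda.Event(enable_timing=True) for _ in range(2)]
    comp_evt = [torch.cuda.Event(enable_timing=True) for _ in range(2)]

    with torch.no_grad():
        copy_evt[0].record()
        x_gpu = x_cpu.cuda(non_blocking=True)   # host -> device over PCIe
        copy_evt[1].record()

        comp_evt[0].record()
        _ = model(x_gpu)                        # actual compute
        comp_evt[1].record()

    torch.cuda.synchronize()
    print(f"PCIe copy: {copy_evt[0].elapsed_time(copy_evt[1]):.1f} ms, "
          f"compute: {comp_evt[0].elapsed_time(comp_evt[1]):.1f} ms")

If the copy time is anywhere near the compute time, the interconnect is the thing to worry about, not the GPU itself.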

1

BellyDancerUrgot t1_j3nq5pn wrote

There would be a 5-8% overhead for the same GPU in a bare VM vs. bare-metal comparison, and the A100 is significantly faster for ML workloads than a 3090 iirc. So it's probably something related to how it's set up in your case. Also, if you are doing distributed training, try a single GPU instead; MPI might be adding overhead in your compute node.
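A minimal way to rule that out (assuming PyTorch/CUDA; the env var has to be set before anything touches CUDA):

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # expose only one card

    import torch
    print(torch.cuda.device_count())  # should print 1; run your plain (non-DDP, non-MPI) loop here

If a single-GPU run gets the step time you expect, the slowdown is coming from the distributed setup rather than the hardware.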

2

susoulup t1_j3npxch wrote

I'm not far enough into deep learning to be comparing cloud GPUs and local ones. I have benchmarked one locally, but I'm not sure I did it properly, so I don't really have any advice. My question is whether ECC memory plays a factor in how the data is processed and stored? I thought that was one of the advantages of using a workstation GPU, but I could be way off.

2

VinnyVeritas t1_j3nh3g4 wrote

I suppose one PSU will take care of the motherboard + CPU + some of the GPUs, and the other one will take care of the remaining GPUs.

So if you get 4x 3090, that's 350W x 4 = 1400W just for the GPUs, plus ~300W for the CPU, plus power for the rest of the components (drives, etc.). Let's round that up to 2000W, then add at least a 10% margin: about 2200W total.

So maybe a 1600W PSU for the motherboard and some of the GPUs, and another 1000W or more for the remaining GPUs. Note that if you go with the 3090 Ti, it's more like 450-500W per card, so you have to redo the maths.

Or if you want to go future-proof, just put in two 1600W PSUs; then you can swap your 3090s for 4090s later and not worry about upgrading PSUs.
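Back-of-envelope version of the same maths (all wattages are typical board-power assumptions, not measurements):

    gpu_watts = 350          # per 3090; use ~450-500 for a 3090 Ti or 4090
    n_gpus = 4
    cpu_watts = 300
    rest_watts = 300         # motherboard, drives, fans, etc.
    margin = 1.10            # at least 10% headroom

    total = (gpu_watts * n_gpus + cpu_watts + rest_watts) * margin
    print(f"budget ~{total:.0f} W")   # ~2200 W for 4x 3090

Swap in your own card count and TDPs to size the PSUs.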

1

waffles2go2 t1_j3m8xga wrote

The punchline is, if you need solid results, you generally can't risk the newer frameworks, since you end up debugging them and/or not trusting the results (so you validate on other stacks...).

It's sort of close to a meritocracy, plus most people aren't using the full capabilities of the existing stuff...

2