Recent comments in /f/deeplearning
agentfuzzy999 t1_j3pbk38 wrote
I have trained locally and in the cloud on a variety of cards and server architectures. Depending on what model you are training, it could be for a huge variety of reasons, but if you can fit the model on a 3080, you really aren't going to be taking advantage of the A100's huge memory; the higher clock speed of the 3080 might simply suit this model and parameter set better.
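If you want to verify that, time raw training steps on both machines; a minimal sketch along these lines (assuming PyTorch with CUDA — the MLP and batch shape are stand-ins, swap in your real model):

```python
import time
import torch

# Rough per-step training benchmark (a sketch: assumes PyTorch with CUDA;
# the stand-in MLP and batch should be replaced with your real workload).
device = torch.device("cuda")
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 10)
).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(256, 1024, device=device)
y = torch.randint(0, 10, (256,), device=device)

def step():
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

for _ in range(10):  # warm-up so clocks and kernel caches settle
    step()
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(100):
    step()
torch.cuda.synchronize()  # flush queued kernels before reading the clock
print(f"{(time.perf_counter() - t0) / 100 * 1000:.2f} ms/step")
```

If the 3080 wins on a model this small, it's clocks, not memory, doing the work.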
tholladay3 t1_j3opny9 wrote
What kind of condition? Could it be a post-processing step that happens at inference?
soupstock123 OP t1_j3of5oq wrote
Reply to comment by qiltb in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
What do you mean by "least problems", and why is one PSU the worst case?
qiltb t1_j3o7ull wrote
Reply to comment by soupstock123 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
I actually assumed you would be running 2 PSUs. For the fewest problems, buy 2x AX1600i; for a cheaper option, buy 2x AX1200i. One PSU is actually the worst case, but yeah, you can try with a single SFL 2000.
soupstock123 OP t1_j3o4t0l wrote
Reply to comment by Final-Rush759 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
It's a budgeting issue; if I could do 4x 4090s, I would.
soupstock123 OP t1_j3o42j4 wrote
Reply to comment by Final-Rush759 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
For sure, that's the upgrade path in the future, but right now my electricity is free, so it's not too much of an issue.
Final-Rush759 t1_j3o39ds wrote
Reply to comment by soupstock123 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
The 3090 is not a good card: it runs hot and loud, with excessively high VRAM temperatures.
Final-Rush759 t1_j3o2zd3 wrote
Reply to comment by soupstock123 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
You save a lot on electricity cost. Much better value if you plan to use it a lot. It's much easier to manage 2 cards than 4 cards.
soupstock123 OP t1_j3o2jk3 wrote
Reply to comment by Final-Rush759 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Performance is only about 1.9x the 3090 for deep learning, and the price is more than double; it's just bad value right now.
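Back-of-the-envelope (the prices below are placeholders; plug in whatever you'd actually pay):

```python
# Back-of-the-envelope perf-per-dollar check; prices are placeholders.
perf_ratio = 1.9      # 4090 vs 3090 deep learning throughput
price_3090 = 1000.0   # hypothetical street price
price_4090 = 2100.0   # "more than double"

rel_value = (perf_ratio / price_4090) / (1.0 / price_3090)
print(f"4090 perf per dollar vs 3090: {rel_value:.2f}x")  # <1.0 favors the 3090
```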
Final-Rush759 t1_j3o261b wrote
Reply to Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Buy 2× 4090
Infamous_Age_7731 OP t1_j3nr9wg wrote
Reply to comment by susoulup in Cloud VM GPU is much slower than my local GPU by Infamous_Age_7731
I haven't looked into that. I would guess it wouldn't matter in my case, but I might be wrong.
Infamous_Age_7731 OP t1_j3nqwgy wrote
Reply to comment by BellyDancerUrgot in Cloud VM GPU is much slower than my local GPU by Infamous_Age_7731
I see, thanks! In that case, I might be asking the vendor more questions.
hjups22 t1_j3nqeim wrote
Reply to comment by qiltb in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Then I agree. If you are doing ResNet inference on 8K images, then it will probably be quite slow. However, 8K segmentation will probably be even slower (the point of comparison I was thinking of).
Also, when you get to large images, I suspect PCIe will become a bottleneck (sending data to the GPUs), which the setup described by the OP will not help.
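A quick way to check whether the host-to-device copy dominates (a sketch, assuming PyTorch; the 8K batch shape and fp16 dtype are just examples):

```python
import time
import torch

# Measure host-to-device copy time for one 8K RGB batch (batch size and
# dtype are assumptions). Pinned memory enables faster async DMA transfers.
batch = torch.empty(4, 3, 4320, 7680, dtype=torch.float16).pin_memory()
torch.cuda.synchronize()
t0 = time.perf_counter()
gpu_batch = batch.to("cuda", non_blocking=True)
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0
gb = batch.numel() * batch.element_size() / 1e9
print(f"{gb:.2f} GB in {elapsed * 1000:.1f} ms -> {gb / elapsed:.1f} GB/s")
```

Compare that against your per-step compute time; if they're close, the bus is the bottleneck.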
BellyDancerUrgot t1_j3nq5pn wrote
There would be a 5-8% overhead for the same GPU in a bare-VM vs. bare-metal comparison. An A100 is significantly faster for ML workloads than a 3090, iirc, so it's probably something related to how it's set up in your case. Also, try using a single GPU instead of distributed training, if that's what you're doing; MPI might be adding overhead in your compute node.
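The usual way to rule that out is to pin the process to one GPU before CUDA initializes (a sketch; your launcher or vendor setup may differ):

```python
import os

# Pin this process to one GPU before anything touches CUDA, so you can
# rule out distributed/MPI overhead when comparing against your local card.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

assert torch.cuda.device_count() == 1  # training now sees a single GPU
```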
susoulup t1_j3npxch wrote
I'm not far enough into deep learning to compare cloud GPUs with local ones. I have benchmarked one locally, and I'm not sure I did it properly, so I really don't have any advice. My question is whether ECC buffering plays a factor in how the data is processed and stored? I thought that was one of the advantages of using a workstation GPU, but I could be way off.
soupstock123 OP t1_j3nmsho wrote
Reply to comment by VinnyVeritas in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
I'm seeing the argument for 2x 1600W PSUs. It's fine for the mining-rig case frame, but it's basically confirming to me that this is never going to fit in a case lol.
VinnyVeritas t1_j3nh3g4 wrote
Reply to comment by soupstock123 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
I suppose one PSU will take care of the motherboard + CPU + some GPUs, and the other one will take care of the remaining GPUs.
So if you get 4x 3090, that's 350W x 4 = 1400W just for the GPUs, +300W for the CPU, plus powering the rest of the components, drives, etc. So let's say we round that up to 2000W, then add at least 10% margin; that's 2200W total.
So maybe a 1600W PSU for the mobo and some GPUs, and another 1000W or more for the remaining GPUs. Note that if you go with the 3090 Ti, it's more like 450-500W per card, so you have to do the maths (see the sketch below).
Or if you want to be future-proof, just put in two 1600W PSUs, and then you can swap your 3090s for 4090s in the future and not worry about upgrading PSUs.
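The same math as a quick sketch (the wattages are the estimates above; adjust for your actual cards):

```python
# Quick PSU sizing sketch using the estimates above; adjust per card.
gpu_w = 350    # per 3090; use ~450-500 for a 3090 Ti
n_gpus = 4
cpu_w = 300
rest_w = 300   # drives, fans, mobo; rounds the subtotal up to ~2000W
margin = 0.10  # at least 10% headroom

total = (gpu_w * n_gpus + cpu_w + rest_w) * (1 + margin)
print(f"plan for ~{total:.0f}W of PSU capacity")  # ~2200W
```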
VinnyVeritas t1_j3ng2u9 wrote
Reply to comment by qiltb in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Do you have some numbers or a link? All the benchmarks I've seen point to the contrary. I'm happy to update my opinion if things have changed and there's data to support it.
soupstock123 OP t1_j3mteu8 wrote
Reply to comment by qiltb in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Hmm, the AX1600i might not be enough for me. This is my new build: https://ca.pcpartpicker.com/user/sixartsdragon/saved/DCh6Q7 and I'm looking at 1821W, so realistically I'm looking for a ~2000W PSU. I've chosen a Super Flower Leadex 2000 for now. What do you think?
Dankmemexplorer t1_j3mpmt0 wrote
Reply to comment by rockpooperscissors in Building an NBA game prediction model - failing to improve between epochs by vagartha
this is likely the problem
qiltb t1_j3mmcca wrote
Reply to comment by hjups22 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Sorry, I was referring explicitly to the last paragraph of yours (that it's quick for small models).
waffles2go2 t1_j3m8xga wrote
Reply to comment by Nater5000 in Does anyone here use newer or custom frameworks aside from TensorFlow, Keras and PyTorch? by ConsciousInsects
The punchline is: if you need solid results, you generally can't risk the newer frameworks, as you end up debugging and/or not trusting the results (so you validate on other stacks...).
It's sort of close to a meritocracy, plus most people aren't using the full capabilities of the existing stuff...
Garbage-Shoddy t1_j3lzwxm wrote
Reply to comment by junetwentyfirst2020 in Building an NBA game prediction model - failing to improve between epochs by vagartha
I also do this. Always verify that the model is actually able to learn “something”; after that, you can go ahead and solve the task at hand. That way, you can be (more) sure your model is working correctly.
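The standard version of this check is to overfit a single fixed batch; a minimal PyTorch sketch (the model and data here are stand-ins for your own):

```python
import torch

# Sanity check: a healthy model + training loop should drive the loss on
# one fixed batch toward ~0. If it can't, debug before training for real.
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

for step in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
print(f"final loss on the memorized batch: {loss.item():.4f}")  # should be near 0
```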
agentfuzzy999 t1_j3pc4pe wrote
Reply to comment by tholladay3 in where can i find yolov7 source explanation ? by yesterdaymee
Yeah, I'm not following either. What do you mean by "yolov7 algorithm"? YOLOv7 is a model architecture; maybe they are talking about NMS and IoU?
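For reference, IoU (intersection over union) is the overlap score that NMS thresholds on when suppressing duplicate boxes; a minimal sketch for (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```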