Recent comments in /f/deeplearning
vampiire OP t1_iysfi1y wrote
Reply to comment by sEi_ in Can you explain what this means? [wiki on deep learning] by vampiire
Haha amazing. Like asking the brain to explain itself.
I’ll look into making an account. Is this like a wiki chat bot you can ask anything to?
vampiire OP t1_iys7phn wrote
Reply to comment by CrwdsrcEntrepreneur in Can you explain what this means? [wiki on deep learning] by vampiire
Right. I misunderstood symbolic to mean a third form of signal passing, in addition to analog and digital.
CrwdsrcEntrepreneur t1_iys7hwi wrote
Reply to comment by vampiire in Can you explain what this means? [wiki on deep learning] by vampiire
The electrical signals in our nervous system are analog, not digital.
vampiire OP t1_iys7bi1 wrote
Reply to comment by CrwdsrcEntrepreneur in Can you explain what this means? [wiki on deep learning] by vampiire
Ah I see. I had always heard analog vs digital. Thought there was some semantic meaning I was missing. Thanks for the answers.
CrwdsrcEntrepreneur t1_iys6ybl wrote
Reply to comment by vampiire in Can you explain what this means? [wiki on deep learning] by vampiire
Analog means it operates with some form of continuity. Your nervous system (which your brain, and thus your neurons, is part of) operates through the passing of continuous electrical signals.
Symbolic just means that it serves as a representation. I.e. the "neuron" in an ANN is not a real neuron. It just symbolizes one.
LetMeGuessYourAlts t1_iyruft9 wrote
Reply to comment by Dexamph in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Do you know: are there any Nvidia GPUs at a decent price/performance point that can pool memory? Every avenue I've looked down suggests that nothing a hobbyist could afford gets a large amount of memory without resorting to old workstation GPUs with relatively slow processors. Is the best bet a single 3090 if memory is the priority?
sEi_ t1_iyqpzny wrote
Ask the question here. It gives good results, and can do follow ups if you ask it.
I got this answer by plugging in your question:
>Artificial neural networks (ANNs) are computational models that are inspired by the structure and function of the biological brain. While they share some similarities with the brain, there are also some important differences between the two.
>
>One of the key differences between ANNs and biological brains is that ANNs are typically static and symbolic, while the brain is dynamic and analog. This means that ANNs are typically designed to perform a specific set of tasks, and their structure and function remain the same throughout their use. In contrast, the brain is constantly changing and adapting to new information and experiences. It is able to learn and adapt to new situations, and it uses continuous, analog signals to process information.
>
>Another key difference between ANNs and the brain is that ANNs are designed to process information in a highly parallel and distributed manner, while the brain uses a more hierarchical and sequential processing approach. In ANNs, many simple processing units (neurons) are connected together to form a network, and information is processed by the network as a whole. In the brain, on the other hand, information is processed by distinct brain regions, and there is a hierarchical organization of brain regions, with higher-level regions receiving input from lower-level regions.
>
>Despite these differences, ANNs have been able to achieve impressive performance on a wide range of tasks, such as image and speech recognition, language translation, and even playing complex games like chess and Go. While ANNs may not be perfect models of the brain, they continue to be an important tool for researchers and developers who are working on developing intelligent systems.
computing_professor t1_iyqaku8 wrote
Reply to comment by DingWrong in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Thanks. So it really isn't the same as how the Quadro cards share vram. That's really confusing.
DingWrong t1_iyq0nr0 wrote
Reply to comment by computing_professor in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Big models get sharded and chunks get loaded onto each GPU. There are a lot of frameworks ready for this, as the big NLP models can't fit on a single GPU. Alpa even shards the model across different machines.
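As a rough sketch of what that looks like with the Hugging Face stack (just one of those frameworks; the checkpoint name below is only a placeholder), accelerate's `device_map="auto"` places chunks of the weights on whatever GPUs are visible:

```python
# Rough sketch: load a model too big for one GPU by sharding its weights
# across all visible GPUs. Assumes `transformers` and `accelerate` are
# installed; the checkpoint name below is just a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-2.7B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" lets accelerate spread the layers over the available
# GPUs (and spill to CPU/disk if they still don't fit).
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Inputs go to the first device, where the embedding layer usually lands.
inputs = tokenizer("Sharded models still run as one forward pass:", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```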
BrotherAmazing t1_iypzqk6 wrote
To be fair, there is research into ANNs that adapt their architectures over time or dynamically adapt the plasticity of certain weights while engaged in “lifelong learning”, and groups have built such networks. But these are the exceptions: almost always the architecture gets fixed and the weights are just updated with some standard backprop, which can lead to the so-called “catastrophic forgetting” when a dataset’s PDF shifts, if you don’t do anything more advanced than the “vanilla” NN setup.
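For what it's worth, here is a toy sketch of that “vanilla” regime: a fixed architecture trained with plain backprop on two synthetic tasks in sequence, with nothing protecting the old weights. The tasks and numbers are made up purely for illustration.

```python
# Toy sketch of the "vanilla" setup: a fixed architecture trained with plain
# backprop, first on task A, then on task B, with nothing protecting the
# task-A weights. This is the regime where catastrophic forgetting typically
# shows up. Synthetic data, purely illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # architecture fixed up front
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def make_task(shift):
    # Two "tasks" whose input distributions differ by a mean shift (a PDF shift).
    x = torch.randn(512, 20) + shift
    y = (x.sum(dim=1) > shift * 20).long()
    return x, y

def train(x, y, steps=200):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

def accuracy(x, y):
    return (net(x).argmax(dim=1) == y).float().mean().item()

xa, ya = make_task(0.0)
xb, yb = make_task(3.0)

train(xa, ya)
print("task A accuracy after training on A:", accuracy(xa, ya))
train(xb, yb)  # keep training on the shifted task: no replay, no regularisation
print("task A accuracy after training on B:", accuracy(xa, ya))
```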
vampiire OP t1_iyptzq3 wrote
Reply to comment by CrwdsrcEntrepreneur in Can you explain what this means? [wiki on deep learning] by vampiire
Thank you. This explains static vs dynamic. But what does symbolic vs analog mean in this context?
CrwdsrcEntrepreneur t1_iypseni wrote
Neural networks are basically groups of containers for a series of mathematical operations, both within each container and across layers of containers. "Symbolic" is referring to the fact that early researchers decided to call these containers "neurons", to symbolize the way biological neurons in our brains share "information" with each other (i.e. a network of neurons). ANNs are static in the sense that once you define the architecture (# of layers, neurons per layer, layer-to-layer connections) this architecture does not change. Your brain, however, does transform itself as you age and learn new things, hence it is dynamic.
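To make that concrete, here is a minimal PyTorch sketch (purely illustrative) of the idea: the architecture is declared once and stays fixed, and training only ever changes the weights inside those "containers".

```python
# Minimal sketch of the point above: the "neurons" are just containers for
# math ops, and the architecture (layers, units, connections) is fixed once
# you define it. Training only ever changes the weights inside it.
import torch
import torch.nn as nn

model = nn.Sequential(      # static architecture, chosen up front
    nn.Linear(10, 32),      # layer of 32 "neurons", each a weighted sum of its inputs
    nn.ReLU(),              # nonlinearity applied to each container's output
    nn.Linear(32, 1),       # output layer
)

x = torch.randn(4, 10)
print(model(x).shape)       # forward pass through the fixed graph -> torch.Size([4, 1])
```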
Superschlenz t1_iyp5elu wrote
Reply to comment by amado88 in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
Bill from the Dronebot Workshop channel on Youtube pronounces 'suffice' that way too. He's Canadian. I'd be surprised if he turned out to be a fan of Eminem.
lanternish t1_iyon7o8 wrote
LOL, I can see future Kpop idol rappers (they rap what their company pushes them to, not what they create) using this to polish their lyrics and claiming they're creative.
computing_professor t1_iyokex9 wrote
Reply to comment by Dexamph in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Huh. If it requires parallelization then why is the 3090 singled out as the one consumer GeForce card that is capable of memory pooling? It just seems weird. What exactly is memory pooling then, that the 3090 is capable of? I'm clearly confused.
edit: I did find this from Puget that says
> For example, a system with 2x GeForce RTX 3090 GPUs would have 48GB of total VRAM
So it's possible to pool memory with a pair of 3090s. But I'm not sure how it's done in practice.
Dexamph t1_iyoebd1 wrote
Reply to comment by computing_professor in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
You certainly can if you put the time and effort into model parallelisation, just not in the seamless way that I and many others were expecting, where you get a single big memory pool and need no code changes or debugging to run larger models that wouldn’t fit on one GPU. Notice how most published benchmarks with NVLink have only tested data-parallel model training, because it’s really straightforward?
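For a sense of what "put the time and effort into model parallelisation" means, here is a bare-bones sketch (assuming two visible GPUs; the layer sizes are made up): you split the model by hand across the two devices and move activations between them, because even with NVLink the cards remain separate devices rather than one pooled address space.

```python
# Bare-bones manual model parallelism: split the model by hand across two
# GPUs and shuttle activations between them. The two cards remain separate
# devices; NVLink speeds up the hop, it does not merge the memory into one
# pool. Illustrative only, assumes two visible GPUs.
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        x = self.part2(x.to("cuda:1"))  # explicit hop between devices
        return x

net = TwoGPUNet()
out = net(torch.randn(8, 1024))
print(out.device)  # cuda:1 -- placement is the programmer's job
```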
suflaj t1_iyodcdi wrote
2x 4090 is the most money efficient if you have model parallelism for CV. For other tasks or vision transformers, it's probably bad because of low bandwidth.
The RTX A6000 will be better for deployment. If you're only planning on training your stuff, this is a non-factor. Note that it has similar, or even lower, bandwidth than a 4090, so there are few benefits besides power consumption, non-FP32 performance and a bigger chunk of RAM.
So honestly it's between whether you want a local or cloud setup. Personally, I'd go for 1x4090 and spend the rest on compute. If there is something you can't run on 1x4090, the A100 compute will be both more money- and time-efficient.
Dexamph t1_iyocn1i wrote
Reply to comment by TheButteryNoodle in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
I doubt the Torrent’s fans will do much if the blower isn’t enough because they were designed around a front to back air flow pathway with much, much higher static pressure to force air through the heatsink. We run V100s in Dell R740s on the local cluster and here’s how they sound to get the GPUs their needed airflow. So you might want to factor in the cost of custom loop water cooling into the A100 cost figure if things go south. And the spare rig as well so the true cost difference vs RTX 6000 Ada isn’t so close anymore.
I don’t know how the RTX 6000 Ada will really perform vs the A100 either because I haven’t seen the FP8 Transformer engine in action. Maybe it’ll skirt the halved memory bandwidth and land close to the A100, but the A100 delivers its performance today using today’s code.
computing_professor t1_iyo97p0 wrote
Reply to comment by Dexamph in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
So this means you cannot access 48GB of vRAM with a pair of 3090s and nvlink, with TF and PyTorch? I could have sworn I've seen that it's possible. Not a deal breaker for me, but a bummer to be sure. I will likely end up with an a6000 instead, then, which isn't as fast but has that sweet vram.
sancho_tranza t1_iyo88nh wrote
Reply to comment by Superschlenz in GPT-3 Generated Rap Battle between Yann LeCun & Gary Marcus by hayAbhay
League with primitive.
TheButteryNoodle OP t1_iyo7url wrote
Reply to comment by Dexamph in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Right. Model parallelization was one of my concerns with any type of dual GPU setup as it can be a hassle at times and isn't always suitable for all models/use cases.
As for the A100, the general plan was to purchase a card that still has Nvidia's Manufacturer Warranty active (albeit that may be a bit tough at that price point). If there is any type of extended warranty that I could purchase, whether it's from Nvidia or a reputable third party, I would definitely be looking into those. In general, if the A100 was the route I would be going, there would be some level of protection purchased, even if it costs a little bit more.
As for the cooling, you're right... that is another pain point to consider. The case that I currently have is a Fractal Design Torrent. In this case I have two 180mm fans in the front, three 140mm fans at the bottom, and then a 120mm exhaust fan at the back. I would hope that these fans alongside an initial blower fan setup would provide sufficient airflow. However, if they don't, I would likely move again to custom water cooling.
What I'm not sure about, though, is how close the performance of the RTX 6000 Ada comes to an A100. If the performance difference isn't ridiculous for FP16 and FP32, then it would likely make sense to lean toward the 6000. Also, there is the FP8 performance for the 6000, with CUDA 12 being right around the corner.
Dexamph t1_iyo1ryt wrote
Reply to comment by TheButteryNoodle in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Looked into this last night and yeah, NVLink works the way you described because of misleading marketing: no contiguous memory pool, just a faster interconnect, so maybe model parallelisation scales a bit better, but you still have to implement it. Also saw an example where some PyTorch GPT2 models scaled horrifically in training with multiple PCIe V100s and 3090s that didn’t have NVLink, so that’s a caveat with dual 4090s not having NVLink.
The RTX 6000 Ada lets you skip model sharding so that’s factored into the price. You lose the extra GPU so you have less throughput though.
You might be able to get away with leaving the 4090s at the stock 450W power limit since it seems the 3090/3090Ti transient spikes have been fixed.
I’m a bit skeptical about the refurb A100, like how would the warranty work if it died one day? Did you consider how you’d cool it, since it seems you have a standard desktop case while these cards were designed for rack-mount servers with screaming loud fans (hence the passive heatsink)? Put thoughts and prayers into the little blower fan kits on eBay for e-wasted Teslas being up to the task of cooling it?
computing_professor t1_iynwyu2 wrote
Reply to comment by TheButteryNoodle in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
I think 2x 3090 will pool memory with nvlink, but not treat them as a single card. I think it depends on the software you're using. I'm pretty sure pytorch and tensorflow are able to take advantage of memory pooling. But the 3090 is the last GeForce card that will allow it. I hope somebody else comes into the thread with some examples of how to use it, because I can't seem to find any online.
TheButteryNoodle OP t1_iynr0g4 wrote
Reply to comment by computing_professor in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Hey there! Thanks for the response! I'm a little bit of a novice when it comes to how nvlink works, but wouldn't you still need to use model parallelization to fit a model over 24GB with 2x 3090s connected via nvlink? I thought they would effectively still show as two different devices, similar to 2 4090s; of course the benefit here being that the nvlink bridge directly connects the two gpus instead of going over pcie. Not too knowledgeable about this, so please feel free to correct me if I'm wrong!
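(For reference, a tiny sketch of how to check this, assuming a two-GPU machine: regardless of NVLink, the framework reports two separate devices, each with its own memory.)

```python
# Tiny check: with or without NVLink, PyTorch exposes the cards as separate
# devices, each with its own memory. Assumes a machine with two GPUs.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} -> {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```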
cadoi t1_iysj94d wrote
Reply to I have an idea which can be solved with machine learning, but no idea where to start by excooo
I thought software engineers were "professional googlers"?
Your description of the problem makes it sounds very similar to autocorrect. Maybe learn how that is done and adapt it?
No one is actually going to be of any help unless you more precisely phrase your problem.
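If the problem really is autocorrect-shaped, a toy baseline is just fuzzy-matching the noisy input against a known vocabulary (a sketch only; the vocabulary and example inputs below are made up):

```python
# Toy sketch of the autocorrect-style baseline suggested above: map a noisy
# input string to the closest entry in a known vocabulary. The vocabulary
# and example inputs are made up for illustration.
import difflib

vocabulary = ["machine learning", "deep learning", "reinforcement learning"]

def autocorrect(query, vocab, cutoff=0.6):
    matches = difflib.get_close_matches(query, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else query  # fall back to the raw input

print(autocorrect("deap lerning", vocabulary))  # -> "deep learning"
```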