Recent comments in /f/deeplearning

CrwdsrcEntrepreneur t1_iys6ybl wrote

Analog means it operates with some form of continuity. Your nervous system (which your brain, and thus your neurons, are part of) operates through the passing of continuous electrical signals.

Symbolic just means that it serves as a representation. I.e. the "neuron" in an ANN is not a real neuron. It just symbolizes one.

1

LetMeGuessYourAlts t1_iyruft9 wrote

Do you know: are there any Nvidia GPUs at a decent price/performance point that can pool memory? Every avenue I've looked down seems to point to the same conclusion: nothing a hobbyist could afford gets you a large amount of memory without resorting to old workstation GPUs with relatively slow processors. Is the best bet a single 3090 if memory is the priority?

1

sEi_ t1_iyqpzny wrote

Ask the question here. It gives good results, and it can do follow-ups if you ask it.

https://chat.openai.com/chat

I got this answer plugging in your question:

>Artificial neural networks (ANNs) are computational models that are inspired by the structure and function of the biological brain. While they share some similarities with the brain, there are also some important differences between the two.
>
>One of the key differences between ANNs and biological brains is that ANNs are typically static and symbolic, while the brain is dynamic and analog. This means that ANNs are typically designed to perform a specific set of tasks, and their structure and function remain the same throughout their use. In contrast, the brain is constantly changing and adapting to new information and experiences. It is able to learn and adapt to new situations, and it uses continuous, analog signals to process information.
>
>Another key difference between ANNs and the brain is that ANNs are designed to process information in a highly parallel and distributed manner, while the brain uses a more hierarchical and sequential processing approach. In ANNs, many simple processing units (neurons) are connected together to form a network, and information is processed by the network as a whole. In the brain, on the other hand, information is processed by distinct brain regions, and there is a hierarchical organization of brain regions, with higher-level regions receiving input from lower-level regions.
>
>Despite these differences, ANNs have been able to achieve impressive performance on a wide range of tasks, such as image and speech recognition, language translation, and even playing complex games like chess and Go. While ANNs may not be perfect models of the brain, they continue to be an important tool for researchers and developers who are working on developing intelligent systems.

1

BrotherAmazing t1_iypzqk6 wrote

To be fair, there is research into ANNs that adapt their architectures over time, or that dynamically adapt the plasticity of certain weights while engaged in “lifelong learning”, and groups have built such networks. But these are the exceptions: almost always the architecture gets fixed and the weights are just updated with some standard backprop, which can lead to the so-called “catastrophic forgetting” when a dataset's PDF shifts, if you don't do anything more advanced than the “vanilla” NN setup.
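
A toy sketch of what that forgetting looks like (PyTorch; the two "tasks" and the tiny network here are made-up stand-ins for a PDF shift, not from any particular paper):

```python
# Minimal catastrophic-forgetting demo: train on task A, then on task B
# with plain backprop, and watch accuracy on task A degrade.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def make_task(shift):
    # Two "datasets" whose input PDFs differ by a mean shift
    x = torch.randn(512, 10) + shift
    y = (x.sum(dim=1) > shift * 10).long()
    return x, y

def accuracy(x, y):
    with torch.no_grad():
        return (net(x).argmax(dim=1) == y).float().mean().item()

task_a, task_b = make_task(0.0), make_task(3.0)

for _ in range(200):  # train on task A only
    x, y = task_a
    opt.zero_grad(); loss_fn(net(x), y).backward(); opt.step()
print("task A acc after A:", accuracy(*task_a))

for _ in range(200):  # then vanilla backprop on task B
    x, y = task_b
    opt.zero_grad(); loss_fn(net(x), y).backward(); opt.step()
print("task A acc after B:", accuracy(*task_a))  # typically collapses
```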

2

CrwdsrcEntrepreneur t1_iypseni wrote

Neural networks are basically groups of containers for a series of mathematical operations, both within each container and across layers of containers. "Symbolic" refers to the fact that early researchers decided to call these containers "neurons", to symbolize the way biological neurons in our brains share "information" with each other (i.e. a network of neurons). ANNs are static in the sense that once you define the architecture (# of layers, neurons per layer, layer-to-layer connections), that architecture does not change. Your brain, however, transforms itself as you age and learn new things, hence it is dynamic.
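
If it helps, here's what those fixed containers look like in code (a minimal PyTorch sketch; the layer sizes are arbitrary):

```python
import torch.nn as nn

# The architecture is fixed at definition time: 784 -> 128 -> 10, with
# every layer fully connected to the next. Training only updates the
# numbers inside the containers (the weights); the wiring never changes.
model = nn.Sequential(
    nn.Linear(784, 128),  # each "neuron" is just a row in a weight matrix
    nn.ReLU(),
    nn.Linear(128, 10),
)
```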

6

computing_professor t1_iyokex9 wrote

Huh. If it requires parallelization then why is the 3090 singled out as the one consumer GeForce card that is capable of memory pooling? It just seems weird. What exactly is memory pooling then, that the 3090 is capable of? I'm clearly confused.

edit: I did find this from Puget that says

> For example, a system with 2x GeForce RTX 3090 GPUs would have 48GB of total VRAM

So it's possible to pool memory with a pair of 3090s. But I'm not sure how it's done in practice.

0

Dexamph t1_iyoebd1 wrote

You certainly can if you put the time and effort into model parallelisation; it just doesn't happen in the seamless way that I and many others were expecting, with a single big memory pool and no code changes or debugging needed to run larger models that wouldn't fit on one GPU. Notice how most published benchmarks with NVLink have only tested data-parallel model training, because it's really straightforward?
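
For anyone wondering what "put the time and effort in" means, here's a bare-bones model-parallel sketch (PyTorch, assuming two visible GPUs; layer sizes made up). You decide the split yourself, and every activation hop between cards is code you own:

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    # Hand-rolled model parallelism: half the layers live on each card,
    # so the *weights* are spread across both GPUs' memory.
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # This device-to-device copy is where NVLink (or plain PCIe) matters
        return self.part2(x.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(8, 4096))  # works, but nothing about it is seamless
```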

3

suflaj t1_iyodcdi wrote

2x 4090 is the most cost-efficient option if you have model parallelism for CV. For other tasks, or for vision transformers, it's probably a bad pick because of the low bandwidth.

The RTX A6000 will be better for deployment. If you're only planning on training your stuff, this is a non-factor. Note that it has similar, even lower, bandwidth than a 4090, so there is little benefit besides power consumption, non-FP32 performance, and a bigger chunk of RAM.

So honestly it comes down to whether you want a local or a cloud setup. Personally, I'd go for 1x 4090 and put the rest toward cloud compute. If there is something you can't run on 1x 4090, the A100's compute will be both more money- and time-efficient.

3

Dexamph t1_iyocn1i wrote

I doubt the Torrent's fans will do much if the blower isn't enough, because they were designed around a front-to-back airflow pathway with much, much higher static pressure to force air through the heatsink. We run V100s in Dell R740s on the local cluster, and here's how they sound getting the GPUs the airflow they need. So you might want to factor the cost of custom-loop water cooling into the A100 cost figure in case things go south. And the spare rig as well, so the true cost difference vs the RTX 6000 Ada isn't so close anymore.

I don’t know how the RTX 6000 Ada will really perform vs the A100 either, because I haven’t seen the FP8 Transformer Engine in action. Maybe it’ll skirt the halved memory bandwidth and land close to the A100, but the A100 delivers its performance today, using today’s code.

3

TheButteryNoodle OP t1_iyo7url wrote

Right. Model parallelization was one of my concerns with any type of dual GPU setup as it can be a hassle at times and isn't always suitable for all models/use cases.

As for the A100, the general plan was to purchase a card that still has Nvidia's Manufacturer Warranty active (albeit that may be a bit tough at that price point). If there is any type of extended warranty that I could purchase, whether it's from Nvidia or a reputable third party, I would definitely be looking into those. In general, if the A100 was the route I would be going, there would be some level of protection purchased, even if it costs a little bit more.

As for the cooling, you're right... that is another pain point to consider. The case I currently have is a Fractal Design Torrent, with 2x 180mm fans in the front, 3x 140mm fans at the bottom, and a 120mm exhaust fan at the back. I would hope that these fans, alongside an initial blower-fan setup, would provide sufficient airflow. However, if they don't, I would likely then move to custom water cooling.

What I'm not sure about, though, is how close the performance of the RTX 6000 Ada comes to an A100. If the performance difference isn't ridiculous for FP16 and FP32, then it would likely make sense to lean toward the 6000. There is also the 6000's FP8 performance, with CUDA 12 being right around the corner.

2

Dexamph t1_iyo1ryt wrote

Looked into this last night and yeah, NVLink works the way you described because of misleading marketing: no contiguous memory pool, just a faster interconnect, so maybe model parallelisation scales a bit better, but you still have to implement it. I also saw an example where some PyTorch GPT-2 models scaled horrifically in training with multiple PCIe V100s and 3090s that didn't have NVLink, so that's a caveat with dual 4090s not having NVLink.
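
For contrast, this is why the data-parallel case is so straightforward to benchmark: the whole model is replicated on every card and only the batch is split, so no memory pooling is involved at all (rough PyTorch sketch, assuming two GPUs):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
# DataParallel copies the *entire* model to each GPU and splits the batch,
# so per-GPU memory use is the same as on a single card.
model = nn.DataParallel(model, device_ids=[0, 1]).to("cuda:0")
out = model(torch.randn(64, 1024, device="cuda:0"))  # batch split 32/32 across cards
```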

The RTX 6000 Ada lets you skip model sharding, so that's factored into the price. You lose the extra GPU, though, so you have less throughput.

You might be able to get away with leaving the 4090s at the stock 450W power limit since it seems the 3090/3090Ti transient spikes have been fixed.

I’m a bit skeptical about the refurb A100, like how would the warranty work if it died one day? Did you consider how you’d cool it, given that you seem to have a standard desktop case while these were designed for rack-mount servers with screaming loud fans, hence the passive heatsink? Are you putting thoughts and prayers into the little blower-fan kits on eBay for ewasted Teslas being up to the task of cooling it?

3

computing_professor t1_iynwyu2 wrote

I think 2x 3090 will pool memory with NVLink, but not treat them as a single card. I think it depends on the software you're using. I'm pretty sure PyTorch and TensorFlow are able to take advantage of memory pooling. But the 3090 is the last GeForce card that will allow it. I hope somebody else comes into the thread with some examples of how to use it, because I can't seem to find any online.

1

TheButteryNoodle OP t1_iynr0g4 wrote

Hey there! Thanks for the response! I'm a bit of a novice when it comes to how NVLink works, but wouldn't you still need to use model parallelization to fit a model over 24GB with 2x 3090s connected via NVLink? I thought they would effectively still show up as two different devices, similar to 2x 4090s; of course, the benefit here being that the NVLink bridge directly connects the two GPUs instead of going over PCIe. Not too knowledgeable about this, so please feel free to correct me if I'm wrong!
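
(For what it's worth, this is the kind of sanity check I had in mind; a PyTorch sketch I haven't actually run on an NVLinked pair:)

```python
import torch

# Even with an NVLink bridge installed, the cards still enumerate as
# separate devices with separate memory, not one 48GB pool.
print(torch.cuda.device_count())  # expect 2 on a 2x 3090 box
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, props.total_memory // 2**20, "MiB")  # per-device VRAM
```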

1