Recent comments in /f/deeplearning

I_will_delete_myself t1_jb2zavm wrote

I suggest using Colab's free tier. The resources are more than most people need, and you can use the cloud when you have a serious workload like a business or research.

If you want to do gaming with it as well, then try the RTX 3060 instead. Ironically, its extra VRAM lets you do more than the RTX 3070.
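For what it's worth, a quick way to see which card and how much VRAM you actually have (e.g. inside a Colab notebook or on your own box) is a minimal sketch like this, assuming PyTorch with CUDA is installed:

```python
import torch

# Report the first visible GPU's name and total memory.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible")
```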

Either path will eventually lead you to the cloud if you want to be remotely competitive on serious workloads.

6

ChristmasInOct OP t1_jb2cwwf wrote

Thanks for the response. Do you recall where you read the "only 200 people" bit? I'll take a look around for it as well; sounds like the surrounding conversation could be interesting.

P2P is not so much of a limitation as long as you can fit the entire model / pipeline into a single card's VRAM though, correct?

So for example, if you have a 7B-parameter model at FP16 and it's around 14GB, presumably you should be safe with 24GB of VRAM?
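A rough back-of-the-envelope check of that arithmetic (weights only, ignoring activations, KV cache, and optimizer state; the function name is just illustrative):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in GB (decimal)."""
    return n_params * bytes_per_param / 1e9

# 7B parameters at FP16 (2 bytes each) -> ~14 GB, matching the figure above.
print(f"{weight_memory_gb(7e9):.1f} GB")
```

The remaining ~10GB of headroom on a 24GB card would then go to activations / KV cache at inference time; training needs far more on top of the weights (gradients plus optimizer state).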

Thanks again for your time.

1

Appropriate_Ant_4629 t1_jb1rhkh wrote

Take a step back:

  • Start on a cloud -- renting GPUs or TPUs -- with nonsensitive data.

I know you said "but bottom line the data running through our platform is all back-office, highly sensitive business information, and many have agreements explicitly restricting the movement of data to or from any cloud services".

You shouldn't be touching such information during development anyway.

Make or find a non-sensitive dataset of similar scale for development.

Don't buy hardware up front until you have almost the entire data pipeline working well on rented servers. Rent them hourly on any of the big cloud platforms, and you'll quickly be able to quantify most of your hardware requirements: how much RAM you need on GPUs/TPUs, how much RAM you need on CPUs, and how fast a storage layer you'll need.
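As a sketch of what that quantification step can look like in practice (a toy PyTorch model standing in for your real pipeline; swap in your own model and data), you can record peak GPU memory over a few representative steps on the rented box:

```python
import torch
import torch.nn as nn

device = "cuda"

# Toy stand-in for a real model -- replace with your actual pipeline.
model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

torch.cuda.reset_peak_memory_stats()
for _ in range(10):
    x = torch.randn(32, 4096, device=device)  # stand-in batch
    loss = model(x).pow(2).mean()              # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)

peak = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory over these steps: {peak:.2f} GiB")
```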

Only after you have an at-scale dev/QA environment working in the cloud will you have any idea what physical hardware you'd want to buy.

3

karyo t1_jb03jq0 wrote

The first question is kinda difficult. DeepSpeed, ZeRO, and Megatron all play into it. There's a reason somebody recently said there are only ~200 people in the world right now who can pull it off.

For the second question:

4090s just won't cut it. Nvidia fused off P2P this generation, so unless you have an embarrassingly parallel pipeline (which current LLMs aren't), they are not useful. Problem is, the Ada A6000 was also severely restricted P2P-wise.
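If you want to verify the P2P situation on whatever boxes you're testing, a minimal check with PyTorch (assumes 2+ visible CUDA devices) looks something like:

```python
import torch

# Check peer-to-peer access between every pair of visible GPUs.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        ok = torch.cuda.can_device_access_peer(i, j)
        print(f"GPU {i} -> GPU {j}: P2P {'available' if ok else 'unavailable'}")
```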

If you're doing LLMs at billion-parameter scale, you gotta get V100s, A100s, or H100s.

2