Recent comments in /f/deeplearning

I_will_delete_myself t1_jb2zavm wrote

I suggest using Colab's free tier. The resources are more than most people need, and you can use the cloud when you have a serious workload like a business or research.

If you want to do gaming with it as well, then try the RTX 3060 instead. Ironically, its extra VRAM lets you do more than the RTX 3070.
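For what it's worth, a quick way to see which card and how much VRAM you actually have (e.g. inside a Colab notebook or on your own box) is a minimal sketch like this, assuming PyTorch with CUDA is installed:

```python
import torch

# Report the first visible GPU's name and total memory.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible")
```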

Either path will eventually lead you to the cloud if you want to be remotely competitive on serious workloads.

6

ChristmasInOct OP t1_jb2cwwf wrote

Thanks for the response. Do you recall where you read the "only 200 people" bit? I'll take a look around for it as well; sounds like the surrounding conversation could be interesting.

P2P is not so much of a limitation as long as you can fit the entire model / pipeline into a single card's VRAM though, correct?

So for example, if you have a 7B-parameter model at FP16 and it's around 14GB, presumably you should be safe with 24GB of VRAM?
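A rough back-of-the-envelope check of that arithmetic (weights only, ignoring activations, KV cache, and optimizer state; the function name is just illustrative):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in GB (decimal)."""
    return n_params * bytes_per_param / 1e9

# 7B parameters at FP16 (2 bytes each) -> ~14 GB, matching the figure above.
print(f"{weight_memory_gb(7e9):.1f} GB")
```

The remaining ~10GB of headroom on a 24GB card would then go to activations / KV cache at inference time; training needs far more on top of the weights (gradients plus optimizer state).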

Thanks again for your time.

1

Appropriate_Ant_4629 t1_jb1rhkh wrote

Take a step back:

  • Start on a cloud -- renting GPUs or TPUs -- with nonsensitive data.

I know you said "but bottom line the data running through our platform is all back-office, highly sensitive business information, and many have agreements explicitly restricting the movement of data to or from any cloud services".

You shouldn't be touching such information during development anyway.

Make or find a non-sensitive dataset of similar scale for development.

Don't buy hardware up front until you have almost the entire data pipeline working well on rented servers. Rent them hourly on any of the big cloud platforms, and you'll quickly be able to quantify most of your hardware requirements: how much RAM you need on GPUs/TPUs, how much RAM you need on CPUs, and how fast a storage layer you'll need.
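As a sketch of what that quantification step can look like in practice (a toy PyTorch model standing in for your real pipeline; swap in your own model and data), you can record peak GPU memory over a few representative steps on the rented box:

```python
import torch
import torch.nn as nn

device = "cuda"

# Toy stand-in for a real model -- replace with your actual pipeline.
model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

torch.cuda.reset_peak_memory_stats()
for _ in range(10):
    x = torch.randn(32, 4096, device=device)  # stand-in batch
    loss = model(x).pow(2).mean()              # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)

peak = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory over these steps: {peak:.2f} GiB")
```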

Only after you have an at-scale dev/QA environment working in the cloud will you have any idea what physical hardware you'd want to buy.

3

karyo t1_jb03jq0 wrote

The first question is kinda difficult. DeepSpeed, ZeRO, and Megatron all play into it. There's a reason somebody recently said there are only ~200 people in the world right now who can pull it off.

For the second question:

4090s just won't cut it. Nvidia fused off P2P this generation, so unless you have an embarrassingly parallel pipeline (which current LLMs aren't), they are not useful. Problem is, the Ada A6000 was also severely restricted P2P-wise.
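If you want to verify the P2P situation on whatever boxes you're testing, a minimal check with PyTorch (assumes 2+ visible CUDA devices) looks something like:

```python
import torch

# Check peer-to-peer access between every pair of visible GPUs.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        ok = torch.cuda.can_device_access_peer(i, j)
        print(f"GPU {i} -> GPU {j}: P2P {'available' if ok else 'unavailable'}")
```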

If you're doing LLMs at billion-parameter scale, you gotta get V100s, A100s, or H100s.

2