Recent comments in /f/deeplearning
GPUaccelerated OP t1_iu4umuw wrote
Reply to comment by ShadowStormDrift in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Yeah, see, in your use case speed makes so much sense. Thank you for sharing.
Mind sharing that site with us here?
I'm always interested in taking a look at cool projects.
Also what kind of hardware is currently tasked with your project's inference?
suflaj t1_iu4ue5y wrote
Reply to comment by GPUaccelerated in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Well, that is your clients' choice. It's not cost-effective to buy Quadros when you could just rent them as you go, especially given their low resale value. There aren't many places where you can't rent a nearby server with sub-10 ms, or at worst sub-100 ms, latency.
GPUaccelerated OP t1_iu4u69c wrote
Reply to comment by hp2304 in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Wow, your perspective is really something to take note of. I appreciate your comment!
What I'm understanding is that speed matters more for inference than for training.
GPUaccelerated OP t1_iu4tflp wrote
Reply to comment by suflaj in Do companies actually care about their model's training/inference speed? by GPUaccelerated
This makes sense. Scaling horizontally is usually the case. Thank you for commenting!
But I would argue that hardware for inference is actually bought more often than one would assume. I have many clients who purchase mini-workstations for settings where data processing and inference jobs run on the same premises, to limit latency and data travel.
GPUaccelerated OP t1_iu4smhu wrote
Reply to comment by THE_REAL_ODB in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Definitely. But important enough to spend $ simply for increasing speed? That's what I'm trying to figure out.
Melodic-Scallion-416 t1_iu4s1r8 wrote
I have been reading articles about OpenAI Triton and how that helps to optimize memory usage during GPU processing. I have not used it personally but was planning to try it out, to address this same concern.
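For anyone curious what Triton code actually looks like, here's a minimal vector-add kernel sketch, closely following the official Triton tutorials. It assumes `pip install triton` and a CUDA GPU, and names like `add_kernel` and `BLOCK_SIZE` are just illustrative.

```python
# Minimal sketch of an OpenAI Triton kernel (based on the official tutorials).
# Assumes a CUDA GPU; the kernel and variable names here are illustrative only.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                            # which block this program instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)  # element indices for this block
    mask = offsets < n_elements                            # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```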
dafoshiznit t1_iu4lg1t wrote
Reply to comment by sabeansauce in Question about using more than one gpu for deeplearning tasks. by sabeansauce
Thank you sir. I'm going to embark on an adventure to learn everything I can about deep learning to answer your question.
allanmeter t1_iu49y6n wrote
Reply to comment by wingedrasengan927 in Do companies actually care about their model's training/inference speed? by GPUaccelerated
This.
FuB4R32 t1_iu45yrl wrote
Reply to comment by sabeansauce in Question about using more than one gpu for deeplearning tasks. by sabeansauce
Yeah I think I understand, e.g. Google Cloud has a great deal on K80s, especially if you commit to the costs up front. If you have even a handful of mid-range GPUs, training should be faster anyway since you can use a larger batch size, but it depends on the details ofc
sabeansauce OP t1_iu45vee wrote
Reply to comment by nutpeabutter in Question about using more than one gpu for deeplearning tasks. by sabeansauce
That is a good intro to the topic; I bookmarked the paper they referenced. Good to know I have this in the toolbox, thank you.
sabeansauce OP t1_iu45f4w wrote
Reply to comment by FuB4R32 in Question about using more than one gpu for deeplearning tasks. by sabeansauce
For training. Essentially I have to choose between one powerful GPU or multiple average ones. But I know that the average ones on their own don't have enough memory (because I have one) for the task at hand. I prefer the single GPU, but the company is asking if a multi-GPU setup of lesser individual capability will also work if used together.
FuB4R32 t1_iu41a65 wrote
Is this for training or inference? The easiest thing to do is to split up the batch size between multiple GPUs. If you can't even fit batch=1 on a single GPU, though, then model parallelism is generally a harder problem.
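As a rough sketch of that batch-splitting idea, this is what plain data parallelism looks like in PyTorch with `nn.DataParallel`; the model and shapes below are made up purely for illustration.

```python
# Hedged sketch of data parallelism in PyTorch: nn.DataParallel replicates the model
# on each visible GPU, splits each batch along dim 0, and gathers the outputs.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # batch gets divided among the GPUs
model = model.to("cuda")

x = torch.randn(256, 512, device="cuda")  # batch of 256 split across the devices
logits = model(x)                         # outputs gathered back on the default GPU
```

For serious multi-GPU training, `torch.nn.parallel.DistributedDataParallel` is generally preferred over `DataParallel`, but this shows the batch being divided across devices in a few lines.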
waa007 t1_iu3x6bn wrote
Of course, it’s depends on applying situation
nutpeabutter t1_iu3v2bd wrote
There is currently no easy way of pooling vram. If the model can't fit onto vram I suggest you check out https://huggingface.co/transformers/v4.9.2/parallelism.html#tensor-parallelism.
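To make the linked tensor-parallelism idea concrete, here is a hand-rolled, illustrative sketch that splits one linear layer's weight matrix across two GPUs so neither card holds the full layer. It assumes two CUDA devices; shapes and device ids are invented, and real setups would use a library (e.g. Megatron-LM or DeepSpeed) rather than doing this by hand.

```python
# Hedged illustration of tensor parallelism: one weight matrix is split by output
# features across two GPUs, so the full layer never sits on a single device.
import torch

in_features, out_features = 1024, 4096
w0 = torch.randn(out_features // 2, in_features, device="cuda:0")  # first half of output rows
w1 = torch.randn(out_features // 2, in_features, device="cuda:1")  # second half

def parallel_linear(x):
    # Each GPU computes its shard of the output; shards are concatenated on cuda:0.
    y0 = x.to("cuda:0") @ w0.T
    y1 = x.to("cuda:1") @ w1.T
    return torch.cat([y0, y1.to("cuda:0")], dim=-1)

x = torch.randn(8, in_features)
y = parallel_linear(x)   # shape (8, 4096); weights were never co-resident on one GPU
```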
sabeansauce OP t1_iu3s2p9 wrote
Reply to comment by dafoshiznit in Question about using more than one gpu for deeplearning tasks. by sabeansauce
glad to have you still
dafoshiznit t1_iu3oznc wrote
I have no idea how I got here
hp2304 t1_iu3ixav wrote
Inference: if real time is a requirement, then it's necessary to buy high-end GPUs to reduce latency; other than that, it's not worth it.
Training: this loosely depends on how often a model is retrained in production. Suppose that period is one year (which seems reasonable to me), meaning a new model is trained on the new data gathered over that duration plus the old data. Doing this fast won't make a difference. I would rather use a slow GPU even if it takes days or a few weeks. It's not worth it.
A problem with DL models in general is that they keep growing in number of parameters, requiring more VRAM to fit them on a single GPU. Huge thanks to model parallelism techniques and ZeRO, which handle this issue; otherwise one would have to buy new hardware to train large models. I don't like where AI research is headed. Increasing parameters is not an efficient solution; we need a new direction to effectively and practically solve general intelligence. On top of that, models failing to detect or misdetecting objects in self-driving cars despite huge training datasets is a serious red flag showing we are still far from solving AGI.
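For reference, ZeRO (as implemented in DeepSpeed) partitions optimizer states, gradients, and optionally parameters across GPUs so large models fit without buying bigger cards. Below is a minimal, assumed-typical sketch of enabling it; the model and config values are illustrative, not a recommendation.

```python
# Hedged sketch of enabling ZeRO via DeepSpeed; model and config values are illustrative.
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # stage 2: partition optimizer states and gradients
}

# Typically launched with the `deepspeed` launcher so each rank shards its share of state.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```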
ShadowStormDrift t1_iu3fkqs wrote
I coded up a semantic search engine. I was able to get it down to 3 seconds for one search.
That's blazingly fast by my standards (it used to take 45 minutes, which still haunts my dreams). But if 10 people use the site simultaneously, that's 30 seconds before number 10 gets their results back, which is unacceptable.
So yes. I do care if I can get that done quicker.
mayiSLYTHERINyourbed t1_iu3dafc wrote
On a regular basis. We care down to the ms how fast inference or training is. In my last organisation we had to process around 200k images during inference. At that point, even a delay of 2 ms per image would cost about 6.7 minutes (200,000 × 2 ms ≈ 400 s) just for getting the feature vectors. Which really matters.
VonPosen t1_iu395me wrote
Yes, I spend a lot of time making sure our models train and infer as fast as possible. Faster training/inference means cheaper training/inference. That also means we can afford more training.
konze t1_iu37t3g wrote
I'm coming from academia with a lot of industry connections. Yes, there are a lot of companies that need fast DNN inference, to the point where they build custom ASICs just to fulfill their latency demands.
THE_REAL_ODB t1_iu37080 wrote
I can't imagine it not being very important in any setting.
wingedrasengan927 t1_iu2ue5z wrote
Yes. Down to ms
sckuzzle t1_iu2aa7o wrote
We use models to control things in real-time. We need to be able to predict what is going to happen in 5 or 15 minutes and proactively take actions NOW. If it takes 5 minutes to predict what is going to happen 5 minutes in the future, the model is useless.
So yes. We care about speed. The faster it runs the more we can include in the model (making it more accurate).
GPUaccelerated OP t1_iu4uxld wrote
Reply to comment by konze in Do companies actually care about their model's training/inference speed? by GPUaccelerated
That makes a lot of sense. And also really cool. Also, people resorting to ASICs for inference are definitely playing in the big boy leagues.
Thanks for sharing!