Recent comments in /f/deeplearning
GPUaccelerated OP t1_iu4umuw wrote
Reply to comment by ShadowStormDrift in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Yeah, see, in your use case speed makes so much sense. Thank you for sharing.
Mind sharing that site with us here?
I'm always interested in taking a look at cool projects.
Also what kind of hardware is currently tasked with your project's inference?
suflaj t1_iu4ue5y wrote
Reply to comment by GPUaccelerated in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Well, that is your clients' choice. It's not cost-effective to buy Quadros when you could just rent them as you go, especially given their low resale value. There aren't many places where you can't rent a nearby server with sub-10 ms, or at worst sub-100 ms, latency.
GPUaccelerated OP t1_iu4u69c wrote
Reply to comment by hp2304 in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Wow, your perspective is really something to take note of. I appreciate your comment!
What I'm understanding is that speed matters more for inference than for training.
GPUaccelerated OP t1_iu4tflp wrote
Reply to comment by suflaj in Do companies actually care about their model's training/inference speed? by GPUaccelerated
This makes sense. Scaling horizontally is usually the case. Thank you for commenting!
But I would argue that hardware for inference is actually bought more often than one would assume. I have many clients who purchase mini-workstations for settings where data processing and inference jobs run on the same premises, to limit latency and data travel.
GPUaccelerated OP t1_iu4smhu wrote
Reply to comment by THE_REAL_ODB in Do companies actually care about their model's training/inference speed? by GPUaccelerated
Definitely. But important enough to spend $ simply for increasing speed? That's what I'm trying to figure out.
Melodic-Scallion-416 t1_iu4s1r8 wrote
I have been reading articles about OpenAI Triton and how that helps to optimize memory usage during GPU processing. I have not used it personally but was planning to try it out, to address this same concern.
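For anyone curious what Triton code actually looks like, here's a minimal vector-add kernel sketch, closely following the official Triton tutorials. It assumes `pip install triton` and a CUDA GPU, and names like `add_kernel` and `BLOCK_SIZE` are just illustrative.

```python
# Minimal sketch of an OpenAI Triton kernel (based on the official tutorials).
# Assumes a CUDA GPU; the kernel and variable names here are illustrative only.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                            # which block this program instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)  # element indices for this block
    mask = offsets < n_elements                            # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```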
dafoshiznit t1_iu4lg1t wrote
Reply to comment by sabeansauce in Question about using more than one gpu for deeplearning tasks. by sabeansauce
Thank you sir. I'm going to embark on an adventure to learn everything I can about deep learning to answer your question.
allanmeter t1_iu49y6n wrote
Reply to comment by wingedrasengan927 in Do companies actually care about their model's training/inference speed? by GPUaccelerated
This.
FuB4R32 t1_iu45yrl wrote
Reply to comment by sabeansauce in Question about using more than one gpu for deeplearning tasks. by sabeansauce
Yeah I think I understand, e.g. Google Cloud has a great deal on K80s, especially if you commit to the costs up front. If you have even a handful of mid-range GPUs, training should be faster anyway since you can use a larger batch size, but it depends on the details ofc
sabeansauce OP t1_iu45vee wrote
Reply to comment by nutpeabutter in Question about using more than one gpu for deeplearning tasks. by sabeansauce
That is a good intro to the topic; I bookmarked the paper they referenced. Good to know I have this in the toolbox, thank you.
sabeansauce OP t1_iu45f4w wrote
Reply to comment by FuB4R32 in Question about using more than one gpu for deeplearning tasks. by sabeansauce
For training. Essentially I have to choose between one powerful GPU or multiple average ones. But I know that the average ones on their own don't have enough memory (because I have one) for the task at hand. I prefer the single GPU, but the company is asking if a multi-GPU setup of lesser individual capability will also work if used together.
FuB4R32 t1_iu41a65 wrote
Is this for training or inference? The easiest thing to do is to split up the batch size between multiple GPUs. If you can't even fit batch=1 on a single GPU, though, then model parallelism is generally a harder problem.
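As a rough sketch of that batch-splitting idea, this is what plain data parallelism looks like in PyTorch with `nn.DataParallel`; the model and shapes below are made up purely for illustration.

```python
# Hedged sketch of data parallelism in PyTorch: nn.DataParallel replicates the model
# on each visible GPU, splits each batch along dim 0, and gathers the outputs.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # batch gets divided among the GPUs
model = model.to("cuda")

x = torch.randn(256, 512, device="cuda")  # batch of 256 split across the devices
logits = model(x)                         # outputs gathered back on the default GPU
```

For serious multi-GPU training, `torch.nn.parallel.DistributedDataParallel` is generally preferred over `DataParallel`, but this shows the batch being divided across devices in a few lines.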
waa007 t1_iu3x6bn wrote
Of course, it’s depends on applying situation
nutpeabutter t1_iu3v2bd wrote
There is currently no easy way of pooling vram. If the model can't fit onto vram I suggest you check out https://huggingface.co/transformers/v4.9.2/parallelism.html#tensor-parallelism.
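To make the linked tensor-parallelism idea concrete, here is a hand-rolled, illustrative sketch that splits one linear layer's weight matrix across two GPUs so neither card holds the full layer. It assumes two CUDA devices; shapes and device ids are invented, and real setups would use a library (e.g. Megatron-LM or DeepSpeed) rather than doing this by hand.

```python
# Hedged illustration of tensor parallelism: one weight matrix is split by output
# features across two GPUs, so the full layer never sits on a single device.
import torch

in_features, out_features = 1024, 4096
w0 = torch.randn(out_features // 2, in_features, device="cuda:0")  # first half of output rows
w1 = torch.randn(out_features // 2, in_features, device="cuda:1")  # second half

def parallel_linear(x):
    # Each GPU computes its shard of the output; shards are concatenated on cuda:0.
    y0 = x.to("cuda:0") @ w0.T
    y1 = x.to("cuda:1") @ w1.T
    return torch.cat([y0, y1.to("cuda:0")], dim=-1)

x = torch.randn(8, in_features)
y = parallel_linear(x)   # shape (8, 4096); weights were never co-resident on one GPU
```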
sabeansauce OP t1_iu3s2p9 wrote
Reply to comment by dafoshiznit in Question about using more than one gpu for deeplearning tasks. by sabeansauce
glad to have you still
dafoshiznit t1_iu3oznc wrote
I have no idea how I got here
hp2304 t1_iu3ixav wrote
Inference: if real time is a requirement, then it's necessary to buy high-end GPUs to reduce latency; other than that, it's not worth it.
Training: this loosely depends on how often a model is retrained in production. Suppose that period is one year (which seems reasonable to me), meaning a new model is trained on the new data gathered over that duration plus the old data. Doing this fast won't make a difference. I would rather use a slow GPU even if it takes days or a few weeks. It's not worth it.
A problem with DL models in general is that they keep growing in number of parameters, requiring more VRAM to fit them on a single GPU. Huge thanks to model parallelism techniques and ZeRO, which handle this issue; otherwise one would have to buy new hardware to train large models. I don't like where AI research is headed. Increasing parameters is not an efficient solution; we need a new direction to effectively and practically solve general intelligence. On top of that, models failing to detect or misdetecting objects in self-driving cars despite huge training datasets is a serious red flag showing we are still far from solving AGI.
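For reference, ZeRO (as implemented in DeepSpeed) partitions optimizer states, gradients, and optionally parameters across GPUs so large models fit without buying bigger cards. Below is a minimal, assumed-typical sketch of enabling it; the model and config values are illustrative, not a recommendation.

```python
# Hedged sketch of enabling ZeRO via DeepSpeed; model and config values are illustrative.
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # stage 2: partition optimizer states and gradients
}

# Typically launched with the `deepspeed` launcher so each rank shards its share of state.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```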
ShadowStormDrift t1_iu3fkqs wrote
I coded up a semantic search engine. I was able to get it down to 3 seconds for one search.
That's blazingly fast by my standards (it used to take 45 minutes, which still haunts my dreams). But if 10 people use the site simultaneously, that's 30 seconds before number 10 gets their results back, which is unacceptable.
So yes. I do care if I can get that done quicker.
mayiSLYTHERINyourbed t1_iu3dafc wrote
On a regular basis. We care down to the ms how fast inference or training is. In my last organisation we had to process around 200k images during inference. At that point, even a delay of 2 ms per image would cost about 6.7 minutes (200,000 × 2 ms ≈ 400 s) just for getting the feature vectors. Which really matters.
VonPosen t1_iu395me wrote
Yes, I spend a lot of time making sure our models train and infer as fast as possible. Faster training/inference means cheaper training/inference. That also means we can afford more training.
konze t1_iu37t3g wrote
I'm coming from academia with a lot of industry connections. Yes, there are a lot of companies that need fast DNN inference, to the point where they build custom ASICs just to fulfill their latency demands.
THE_REAL_ODB t1_iu37080 wrote
I can't imagine it not being very important in any setting.
wingedrasengan927 t1_iu2ue5z wrote
Yes. Down to ms
sckuzzle t1_iu2aa7o wrote
We use models to control things in real-time. We need to be able to predict what is going to happen in 5 or 15 minutes and proactively take actions NOW. If it takes 5 minutes to predict what is going to happen 5 minutes in the future, the model is useless.
So yes. We care about speed. The faster it runs the more we can include in the model (making it more accurate).
GPUaccelerated OP t1_iu4uxld wrote
Reply to comment by konze in Do companies actually care about their model's training/inference speed? by GPUaccelerated
That makes a lot of sense. And also really cool. Also, people resorting to ASICs for inference are definitely playing in the big boy leagues.
Thanks for sharing!