Recent comments in /f/deeplearning

qiltb t1_j3rtr6a wrote

Be sure to check the logs (e.g. dmesg for starters). Many A100s on AWS, for example, suffer from memory corruption, which leads to severe performance degradation. Also check temps.

A single A100 (even the least capable one, the 400W 40GB variant) should be more on the level of a 3090 Ti.

You also need to check memory usage (if it's at the limit, like 78.9/80 GB, there's a problem somewhere). And don't rule out driver issues either.

Those are some common headaches when setting up remote GPU instances for DL...
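
If it helps, here's a minimal sketch of that kind of check from Python, assuming the `pynvml` bindings are installed (the same numbers are visible in `nvidia-smi`):

```python
import pynvml

# Query memory usage and temperature of the first GPU to spot the
# symptoms described above (memory pinned at the limit, overheating).
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

print(f"memory: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
print(f"temperature: {temp} C")

pynvml.nvmlShutdown()
```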

1

VinnyVeritas t1_j3rrzvr wrote

Actually, I've been sort of looking at ML computers (kind of browsing and dreaming that one day I'd have one, but it will always be beyond my means and needs anyway). They can put two PSUs in a box; since these are built by companies, the total cost is two to three times the cost of the parts alone (i.e. building it yourself would be 2-3x cheaper), but it could inspire you when picking your parts.

https://bizon-tech.com/amd-ryzen-threadripper-up-to-64-cores-workstation-pc

https://shop.lambdalabs.com/gpu-workstations/vector/customize

1

ASalvail t1_j3r5kxy wrote

A statistical model. I'm personally partial to the ETS model (error, trend, seasonality), but SARIMAX is another good one. The 'issue' with a stats model is that you need to do some hand tuning and thus need to understand how the model works (and ETS is a fairly simple one to comprehend).
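
For reference, a rough sketch of both with statsmodels - the series below and the seasonal period/orders are placeholders you'd replace and hand-tune for your own data:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX

# placeholder history; replace with your own observed values
y = np.array([112., 118., 132., 129., 121., 135., 148., 148.,
              136., 119., 104., 118., 115.])

# ETS-style (Holt-Winters) model with additive trend and seasonality
ets_fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                               seasonal_periods=4).fit()
print(ets_fit.forecast(13))

# SARIMAX; the (p, d, q) and seasonal orders are hand-tuned choices
sarimax_fit = SARIMAX(y, order=(1, 1, 1),
                      seasonal_order=(1, 0, 0, 4)).fit(disp=False)
print(sarimax_fit.forecast(13))
```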

2

Infamous_Age_7731 OP t1_j3qy6qv wrote

> multithreaded sysbench to compare CPU and RAM

Thanks a lot for your input! I checked the CPU %steal and it seems fine, ranging from 0.0 to 0.1 st. I don't think it's a shard either, since nvidia-smi shows the full 80 GB of memory at my disposal (unless they do some trickery). I ran a series of `sysbench` tests and found that the VM's CPU is slightly worse in single-thread performance, but what is more striking is the RAM speed: for 1 or 8 threads, writes are 0.8x slower and reads are 1.5x slower. The RAM speed drop seems to mirror the drop in iterations per second when I train the model. I guess this might be the culprit.

2

trajo123 t1_j3qwftm wrote

13 periods of history to forecast another 13? This seems like a very atypical/extreme TS forecasting problem. Do these services actually handle so little data?

First, it's unlikely that this little data is enough for anything but the simplest models. Probably the best you could do in terms of a domain-independent model is linear regression. Even so, calculating performance metrics - knowing how good the model is - is going to be challenging, as that would require you to further reduce the amount of training data in order to have a validation/"out of sample" set.

Getting useful predictions with so little data is probably going to require you to make a model with strong assumptions - e.g. come up with a set of domain-specific parametrized equations that govern the time-series and then fit those parameters to the data.

In any case, deep learning is far from the first approach that comes to mind for this problem. A solution is probably just a few lines of code using R or scipy.stats + sklearn - likely fewer lines than the cloud API calls. The trick is to use the right mathematical model.
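
To make that concrete, a hedged sketch of the "few lines of code" version with sklearn, fitting a plain linear trend on the time index (the 13 values in `y` are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# placeholder history: 13 observed periods
y = np.array([10., 12., 11., 14., 15., 15., 17., 18., 17., 20., 21., 22., 23.])
t = np.arange(len(y)).reshape(-1, 1)        # time index as the only feature

model = LinearRegression().fit(t, y)        # fit a linear trend
t_future = np.arange(len(y), len(y) + 13).reshape(-1, 1)
print(model.predict(t_future).round(2))     # forecast the next 13 periods
```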

2

ASalvail t1_j3pvlls wrote

You don't have enough data to use AI; you're likely just going to overfit the series. In fact, time series are usually fairly short, which led the whole forecasting community to erroneously think ML could never be used for forecasting (see the M4 competition). Statistical models are the way to go in your case.

If you absolutely want ML, use a simple random forest library.
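
As an illustration only (with 13 points this is more toy than model), a minimal random-forest-on-lagged-features sketch with sklearn; the series and lag count are placeholder assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# placeholder series; replace with the observed history
y = np.array([10., 12., 11., 14., 15., 15., 17., 18., 17., 20., 21., 22., 23.])

lags = 3  # use the previous 3 values as features
X = np.array([y[i:i + lags] for i in range(len(y) - lags)])
targets = y[lags:]

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, targets)

# recursive one-step-ahead forecast for the next 13 periods
history = list(y)
for _ in range(13):
    next_val = rf.predict(np.array(history[-lags:]).reshape(1, -1))[0]
    history.append(next_val)
print(np.round(history[-13:], 2))
```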

3

ivan_kudryavtsev t1_j3pssyp wrote

Why so? GPUs are passed to the VM in pass-through mode, so no significant performance pitfalls should occur. I recommend that OP look at CPU %steal and nvidia-smi (maybe it is a 1/7 A100 shard, not a full GPU). Run single-threaded and multithreaded sysbench to compare CPU and RAM. Also, your own hardware may win on PCI-E generation or dedicated bandwidth if the cloud provider uses a poorly balanced custom build.
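
If sysbench isn't available on the image, a crude numpy timing run on both machines gives a comparable relative number for memory throughput (an illustrative stand-in, not exactly what sysbench measures):

```python
import time
import numpy as np

# Allocate ~1 GB and time a full copy; run the same snippet on the local
# box and on the cloud VM, then compare the resulting GB/s figures.
a = np.ones(125_000_000, dtype=np.float64)   # 125M doubles ≈ 1 GB
t0 = time.perf_counter()
b = a.copy()
elapsed = time.perf_counter() - t0
print(f"copied {a.nbytes / 1e9:.2f} GB in {elapsed:.3f} s "
      f"(~{2 * a.nbytes / 1e9 / elapsed:.1f} GB/s read+write)")
```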

2

DustinEwan t1_j3pnwfz wrote

Time series forecasting is an extremely complex problem unless there's clear periodicity in the data.

In this case linear regression may be a better approach than deep learning.

3

shrshk7 t1_j3pf6di wrote

I've been wanting to do the same. I did some research over the past few days, and it seems this can be achieved with TensorFlow.js; for the amount of data you mentioned, I think you can train the model in the browser.

1

yesterdaymee OP t1_j3pdlfo wrote

Yes, I only need to detect persons with a tracking ID attached to them, and if there are no persons in the frame, "no persons in the room" must be printed. Also, if the person wears a mask, print "with mask", else "without mask". I know that if no classes were detected then I need to make some change; I just couldn't understand where the class is being finalised. Could you help me? (For a video input.)
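
In case a sketch of the decision step helps, here's a hypothetical version of the per-frame logic, independent of whichever detector/tracker is used; the class names and the `(class_name, track_id)` output format are assumptions, not your actual model's API:

```python
# Hypothetical per-frame logic; `detections` stands in for whatever your
# detector/tracker returns, assumed here to be (class_name, track_id) pairs.
def report_frame(detections):
    persons = [d for d in detections
               if d[0] in ("person", "with_mask", "without_mask")]
    if not persons:
        print("no persons in the room")
        return
    for class_name, track_id in persons:
        if class_name == "with_mask":
            print(f"person {track_id}: with mask")
        else:
            print(f"person {track_id}: without mask")

report_frame([("with_mask", 1), ("without_mask", 2)])  # two tracked persons
report_frame([])                                       # no detections
```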

1