VirtualHat

VirtualHat t1_izfu724 wrote

I use three scripts.

train.py (which trains my model)

worker.py (which picks up the next job and runs it using train.py)

runner.py (which is basically a list of jobs and code to display what's happening).

I then have multiple machines running multiple instances of worker.py. When a new job is created, the workers see it and start processing it. Work is broken into 5-epoch blocks, and at the end of each block, a new job from the priority queue is selected.

This way I can simply add a new job and within 30 minutes or so one of the workers will finish its current block and pick it up. Also, because of the chunking, I get early results on all the jobs rather than having to wait for them to finish. This is important, as I often know early on whether a job is worth finishing or not.
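For a concrete picture, here is a minimal sketch of what a worker loop like this could look like. It assumes jobs are JSON files in a shared folder and that train.py accepts --job/--start-epoch/--epochs flags; none of those names come from the actual scripts, they're just illustrative.

```python
# worker.py -- hypothetical sketch of the polling loop described above.
# Assumes each job is a JSON file in a shared "jobs/" folder with
# "name", "priority", "epochs_done", and "epochs_total" fields.
# NOTE: a real setup would need file locking so two workers don't grab
# the same job at once; that is omitted here for brevity.
import json
import subprocess
import time
from pathlib import Path

JOBS_DIR = Path("jobs")
BLOCK_EPOCHS = 5  # work is chunked into 5-epoch blocks


def next_job():
    """Return the path of the highest-priority unfinished job, or None."""
    candidates = []
    for path in JOBS_DIR.glob("*.json"):
        job = json.loads(path.read_text())
        if job["epochs_done"] < job["epochs_total"]:
            candidates.append((job["priority"], path))
    return min(candidates)[1] if candidates else None


while True:
    path = next_job()
    if path is None:
        time.sleep(60)  # no pending work; poll again later
        continue
    job = json.loads(path.read_text())
    # Run one 5-epoch block with train.py, resuming from where the job left off.
    subprocess.run([
        "python", "train.py",
        "--job", job["name"],
        "--start-epoch", str(job["epochs_done"]),
        "--epochs", str(BLOCK_EPOCHS),
    ], check=True)
    job["epochs_done"] += BLOCK_EPOCHS
    path.write_text(json.dumps(job))  # record progress and release the job
```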

I evaluate the results in a Jupyter notebook using the logs that each job creates.
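For example, if each job wrote a CSV log with epoch and val_accuracy columns (the paths and column names here are assumptions, not the actual log format), a notebook cell might look like:

```python
# Hypothetical notebook cell: plot validation accuracy for every job's log.
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

for log in Path("logs").glob("*.csv"):
    df = pd.read_csv(log)
    plt.plot(df["epoch"], df["val_accuracy"], label=log.stem)

plt.xlabel("epoch")
plt.ylabel("validation accuracy")
plt.legend()
plt.show()
```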

edit: fixed links.

5

VirtualHat t1_iz2qj72 wrote

Yes, a massive number of epochs with an overparameterized model. As mentioned, I wouldn't recommend it, though. It's just interesting that some of the intuition about how long to train for is changing from "too much is bad" to "too much is good".

If you are interested in this subject, I'd highly recommend https://openai.com/blog/deep-double-descent/ (which is about overparameterization), as well as the paper mentioned above (which is about over-training). Again - I wouldn't recommend this for your problem. It's just interesting.

It's also worth remembering that there will be a natural error rate for your problem (i.e., does X actually tell us what y is?). So it is possible that 70-75% test accuracy is the best you can do on your problem.

1

VirtualHat t1_iyz96uh wrote

If you make your model large enough, you will get to 100%. In fact, not only can you get to 100% accuracy, you can also drive the training loss effectively to 0. The paper I linked above discusses how this was previously considered a very bad idea but, if done carefully, can actually improve generalization.

Probably the best bet, though, is to just stick to the "stop when validation loss goes up" rule.
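A minimal sketch of that rule, using a toy PyTorch model and random data purely for illustration (nothing here comes from the paper or the thread):

```python
# Early stopping: keep the best checkpoint, stop once validation loss
# has not improved for `patience` epochs.
import torch
import torch.nn as nn

torch.manual_seed(0)
X_train, y_train = torch.randn(512, 20), torch.randint(0, 2, (512,))
X_val, y_val = torch.randn(128, 20), torch.randint(0, 2, (128,))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(1000):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # validation has been going up: stop
            print(f"early stop at epoch {epoch}, best val loss {best_val:.3f}")
            break
```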

1

VirtualHat t1_itkajqr wrote

I use PyTorch every day and haven't gone back to TF in years. That being said, there are lots of old projects still on TF, many of them on the older 1.x versions from before most of the rough edges were fixed.

I'm glad they're working on XLA and JAX though.

3

VirtualHat t1_isullw4 wrote

Sometimes I get the feeling that the reasons people give for rejecting a candidate don't align with the real reason. It could be as simple as "we're already hiring a friend of one of our co-workers", but rather than tell you that, they make up a reason that is (legally) defensible but obviously not correct.

This happens a bit in certain companies where an internal promotion has already been decided on, but for 'fairness' they need to interview external applicants just to reject them.

1

VirtualHat t1_ir3tiey wrote

Here are some options

  1. Tune a smaller network, then apply the hyperparameters to the larger one and 'hope for the best'.
  2. As others have said, train for less time, for example 10 epochs rather than 100. I typically find this produces the wrong result, though (the best performer is often poor early on).
  3. For low-dimensional searches (e.g. 2D), perform a very coarse grid search (space samples an order of magnitude apart, maybe two), then just use the best model. This is often the best method, as you don't want to over-tune the hyperparameters.
  4. For high-dimensional searches, just use random search, then marginalize over all but one parameter using the mean of the best 5 runs (see the sketch after this list). This works really well.
  5. The goal is often to compare two methods rather than to maximize the score, in which case you can use other people's hyperparameters.
  6. Bayesian optimization is usually not worth the time. In low dimensions do a grid search; in high dimensions do a random search.
  7. If you have the resources, train your models in parallel. This is a really easy way to make use of multiple GPUs if you have them.
  8. In some cases you can early-stop runs that are clearly not working. I try not to do this, though.
  9. When I do a hyperparameter search, I do it on a different dataset from my main one, which helps make things quicker. I'm doing RL though, so it's a bit different I guess.
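To make option 4 concrete, here is a rough sketch of random search followed by marginalization over the best 5 runs. The search space, parameter names, and evaluate() function are all made up for illustration; evaluate() stands in for an actual training run.

```python
# Random search, then marginalize over all but one hyperparameter
# using the mean score of the best 5 runs for each value.
import random
from statistics import mean

SPACE = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128, 256],
    "weight_decay": [0.0, 1e-5, 1e-4],
}


def evaluate(config):
    # Placeholder objective; replace with a real training run that
    # returns, e.g., validation accuracy.
    return -abs(config["lr"] - 1e-3) - abs(config["weight_decay"] - 1e-5)


def sample():
    return {k: random.choice(v) for k, v in SPACE.items()}


runs = []
for _ in range(50):
    cfg = sample()
    runs.append((cfg, evaluate(cfg)))

# Marginalize over everything except "lr": for each lr value, average
# the scores of the best 5 runs that used it.
for lr in SPACE["lr"]:
    scores = sorted((s for cfg, s in runs if cfg["lr"] == lr), reverse=True)
    if scores:
        print(f"lr={lr:g}: mean of best 5 = {mean(scores[:5]):.4f}")
```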

1

VirtualHat t1_iql9ewk wrote

I haven't heard this talked about much, but I think reading groups are by far the best way to dip your toes into a research field. It's a chance to read some papers you might not normally read, as well as get to know some interesting people in the field.

In my experience, reading groups are typically very open to outsiders, especially if you have an interest in the field.

4

VirtualHat t1_iql9639 wrote

I'm heading to NeurIPS this year (with a paper), and I see it as an opportunity to network as well as promote my research. This is my first time, so I'm also asking myself the same question of whether it's worth it or not.

If you end up going, I'd be really interested to hear your thoughts in two months' time about your experience and whether it ended up being worthwhile or not.

2