Recent comments in /f/MachineLearning

antonb90 t1_jczajd1 wrote

Things are improving fast.

>COLT5 is better at any speed. For 16k input length, COLT5 matches or exceeds LONGT5 quality for Large and XL with 35-75% training speedup and 50-100% inference speedup on top of the order-of-magnitude inference speedup from MQA. Encoder speedups are even greater (Appendix D). COLT5-XL also achieves SOTA performance on the SCROLLS benchmark

>COLT5 achieves both stronger performance and faster inference speed at all input lengths and is able to effectively make use of extremely long inputs. We note that COLT5 achieves large quality gains by going from 32k to 64k tokens even while keeping the number of routed tokens constant, providing more evidence for our hypothesis.

Google's new COLT5, with inputs up to 64k tokens:

https://arxiv.org/abs/2303.09752
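
Roughly, the "routed tokens" idea from the quote is: every token goes through a cheap branch, and a learned router picks a small subset of tokens to also get the expensive branch. A toy sketch of that (my simplification, not the paper's actual code; `light_ffn`, `heavy_ffn`, and `router` are placeholder modules):

```python
import torch

def conditional_layer(x, light_ffn, heavy_ffn, router, k):
    """Sketch of CoLT5-style conditional computation: all tokens get the
    cheap branch, only the top-k routed tokens also get the heavy branch."""
    out = light_ffn(x)                                    # (batch, seq, dim) cheap branch for every token
    scores = router(x).squeeze(-1)                        # (batch, seq) routing score per token
    topk = scores.topk(k, dim=-1).indices                 # indices of the k routed tokens
    idx = topk.unsqueeze(-1).expand(-1, -1, x.size(-1))   # (batch, k, dim)
    routed = torch.gather(x, 1, idx)                      # pull out the routed tokens
    gate = torch.sigmoid(torch.gather(scores, 1, topk)).unsqueeze(-1)
    # Add the gated heavy-branch output back at the routed positions only.
    return out.scatter_add(1, idx, gate * heavy_ffn(routed))
```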

1

alfredr OP t1_jcz3keg wrote

I understand that it’s about LLMs and that it is not comprehensive — also that the site author has (perhaps questionably) embedded some of their own work in the list. That said, it does otherwise appear to be a list of influential papers representing a current major thrust.

I did not downvote you, btw

2

lmericle t1_jcyxiex wrote

Well, no, it isn't. You are looking for machine learning research. That list is only about LLMs, a very specific and over-hyped sub-sub-application of ML techniques.

If all you want is to attach yourself to the hype cycle, then that link still won't be enough, but at least it's a start.

0

Papaya_lawrence t1_jcyqbyw wrote

I will be teaching a class of about 18 students. Each student will need to train their own StyleGAN2 model toward the end of the semester, and I'm trying to figure out which platform I want them to use.

These students come from different disciplines, so ideally we'd use something like Google Colab: they could easily work off of my code and avoid learning how to ssh into a virtual machine, use bash commands, etc. For context, this is not a technical course, so I'm more concerned with ease of use than with giving a detailed introduction to using a virtual/remote machine. The other parts of the course involve more reading and discussion on the history of generative art, so I see training their own model as a chance to bring a hands-on approach to thinking with and about machine learning in a creative context.

I can propose a budget to my institution, so a paid platform is possible (although logistically it may be harder to figure out how to allocate funds to different accounts). I've looked at Paperspace's Gradient tool as well. I know apps like RunwayML would let students train a model code-free, but my concern is that Runway uses transfer learning, and I'd rather they train the model only on data they've collected themselves.

I'm curious if any of you have suggestions or anecdotes from your own experience using different platforms. Thanks in advance!
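
For reference, training from scratch in Colab with NVIDIA's stylegan2-ada-pytorch repo could look roughly like the cell below (the paths are placeholders and the flags are my assumption from that repo; check its README). Leaving out `--resume` is what avoids pretrained weights, i.e. no transfer learning:

```python
# Rough Colab sketch (placeholder paths; check the repo README for exact flags
# and a compatible PyTorch version).
!git clone https://github.com/NVlabs/stylegan2-ada-pytorch.git
%cd stylegan2-ada-pytorch

# Pack each student's own images into the dataset format the repo expects.
!python dataset_tool.py --source=/content/my_images --dest=/content/my_dataset.zip

# Train from scratch: no --resume flag means no pretrained weights (no transfer learning).
!python train.py --outdir=/content/results --data=/content/my_dataset.zip --gpus=1 --kimg=1000 --snap=10
```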

1

trnka t1_jcyped6 wrote

Some systems always output the most probable next token in each context (greedy decoding), so those will be consistent given a prompt. Traditionally that could lead to very generic responses.

So it's common to add a bit of randomness. The simplest approach is to sample tokens according to their predicted probabilities; there are many variations on this (temperature, top-k, nucleus sampling) that allow more control over how "creative" the generator can be.
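
A minimal sketch of greedy vs. temperature sampling, assuming a 1-D tensor of next-token logits (the function and parameter names are just illustrative):

```python
import torch

def sample_next_token(logits, temperature=1.0):
    """Greedy when temperature == 0; otherwise sample from the softmax
    distribution, sharpened or flattened by the temperature."""
    if temperature == 0:
        return int(torch.argmax(logits))                      # deterministic: always the most probable token
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))       # random draw weighted by probability
```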

1

save_the_panda_bears t1_jcya2fm wrote

I’m going to be honest, this is a truly terrible idea. Do you have any idea of the potential legal exposure you have with this product? If you’re serious about pursuing this, take the site offline and call a lawyer right now.

7