Recent comments in /f/MachineLearning
currentscurrents t1_jcziz0q wrote
Reply to [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
I'm gonna end up buying a bunch of 24GB 3090s at this rate.
farmingvillein t1_jczf7z8 wrote
Reply to Smarty-GPT: wrapper of prompts/contexts [P] by usc-ur
Maybe I'm reading too quickly, but I can't figure out from the README what this actually does.
G_fucking_G t1_jczd46d wrote
Reply to [D]: Vanishing Gradients and Resnets by Blutorangensaft
https://old.reddit.com/r/MachineLearning/comments/px3hzd/d_has_the_resnet_hypothesis_been_debunked/
The advantage of ResNets is most probably not the erasure of vanishing gradients but a smoothing of the loss landscape.
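For reference, the skip connection in question is just an identity path added around a block; a minimal PyTorch sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Toy residual block: output = x + F(x). The identity path gives gradients
    # a direct route backward regardless of what the inner block learns.
    def __init__(self, dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection
```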
kross00 t1_jczd3i2 wrote
Reply to comment by Civil_Collection7267 in [D] Best ChatBot that can be run locally? by rustymonster2000
I'm having a hard time understanding what LoRA is and why it makes the 7B model better. I thought it only improved hardware requirements, but does it also improve model coherency? This is all new to me.
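For what it's worth, the core LoRA trick is to freeze the pretrained weight and learn a small low-rank update beside it; a minimal sketch, with illustrative names and dimensions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Sketch of a LoRA-adapted linear layer: y = x W^T + (alpha/r) * x A^T B^T.
    # Only lora_A and lora_B are trained; the pretrained weight stays frozen.
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)            # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))   # zero-init: starts as a no-op

        self.scaling = alpha / r

    def forward(self, x):
        return x @ self.weight.T + self.scaling * ((x @ self.lora_A.T) @ self.lora_B.T)
```

So LoRA itself mainly cuts the memory/compute cost of fine-tuning; any gain in coherency comes from the fine-tuning it makes affordable, not from LoRA as such.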
lucidraisin t1_jczarq8 wrote
Reply to comment by antonb90 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
That isn't for decoders; it's encoder-only, and it still needs to be verified. The majority of research papers never work out on closer examination. Just trust me: stick with flash attention for now until further notice and save yourself a lot of headache.
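For anyone following along, the PyTorch 2.0 entry point is `torch.nn.functional.scaled_dot_product_attention`, which dispatches to the fused flash kernel when device, dtype, and shapes allow. A minimal sketch, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim); fp16 on CUDA is eligible for the flash kernel
q = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)

# One fused call instead of materializing the full 32k x 32k attention matrix
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```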
antonb90 t1_jczajd1 wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Things are improving fast. From Google's new COLT5 (64k context) paper:

> COLT5 is better at any speed. For 16k input length, COLT5 matches or exceeds LONGT5 quality for Large and XL with 35-75% training speedup and 50-100% inference speedup on top of the order-of-magnitude inference speedup from MQA. Encoder speedups are even greater (Appendix D). COLT5-XL also achieves SOTA performance on the SCROLLS benchmark.

> COLT5 achieves both stronger performance and faster inference speed at all input lengths and is able to effectively make use of extremely long inputs. We note that COLT5 achieves large quality gains by going from 32k to 64k tokens even while keeping the number of routed tokens constant, providing more evidence for our hypothesis.
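The "routed tokens" in that quote refer to conditional computation: every token takes a cheap path, and only a fixed budget of routed tokens also takes an expensive one. A toy sketch of the idea (not the paper's code; names, scoring, and sizes are illustrative):

```python
import torch
import torch.nn as nn

class ConditionalFFN(nn.Module):
    # Toy COLT5-style layer: a light FFN for all tokens, plus a heavy FFN
    # applied only to the top-k tokens picked by a learned router.
    def __init__(self, d_model=512, k=64):
        super().__init__()
        self.light = nn.Linear(d_model, d_model)
        self.heavy = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                   nn.Linear(4 * d_model, d_model))
        self.router = nn.Linear(d_model, 1)
        self.k = k  # routing budget; stays fixed even as seq_len grows

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        out = self.light(x)                     # cheap path for every token
        scores = self.router(x).squeeze(-1)     # (batch, seq_len)
        idx = scores.topk(self.k, dim=-1).indices
        idx = idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        routed = torch.gather(x, 1, idx)        # gather the k routed tokens
        return out.scatter_add(1, idx, self.heavy(routed))  # add heavy output back
```

Keeping `k` constant while doubling the input length is why the cost grows so slowly from 32k to 64k tokens.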
lmericle t1_jcz5z92 wrote
Reply to comment by alfredr in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
People get mad when you call LLMs what they are. It will pass, as with all things.
alfredr OP t1_jcz3keg wrote
Reply to comment by lmericle in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
I understand that it’s about LLMs and that it is not comprehensive — also that the site author has (perhaps questionably) embedded some of their own work in the list. That said, it does otherwise appear to be a list of influential papers representing a current major thrust.
I did not downvote you, btw
edjez t1_jcyz2nu wrote
Reply to comment by SmackMyPitchHup in [P] TherapistGPT by SmackMyPitchHup
Curious: what is it using, the OpenAI APIs or Azure OpenAI?
lmericle t1_jcyxiex wrote
Reply to comment by alfredr in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
Well, no, it isn't. You are looking for machine learning research. That list is only about LLMs, a very specific and over-hyped sub-sub-application of ML techniques.
If all you want is to attach yourself to the hype cycle, then that link still won't be enough, but at least it's a start.
Blutorangensaft OP t1_jcywm46 wrote
Reply to comment by IntelArtiGen in [D]: Vanishing Gradients and Resnets by Blutorangensaft
I don't. I have heard of using layer norm for RNNs, but I am unfamiliar with instance norm. Will look into it, thank you.
IntelArtiGen t1_jcyqmdu wrote
Reply to [D]: Vanishing Gradients and Resnets by Blutorangensaft
Do you use another kind of normalization? You can try InstanceNorm / LayerNorm if you can't use batchnorm.
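A minimal sketch of the two drop-in alternatives, with illustrative shapes:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16, 128)             # (batch, channels, length); batch too small for BatchNorm

layer_norm = nn.LayerNorm(128)          # normalizes over the feature dim, per position
instance_norm = nn.InstanceNorm1d(16)   # normalizes each channel within each sample

y1 = layer_norm(x)
y2 = instance_norm(x)
```

Neither uses batch statistics, so they behave the same at any batch size.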
Papaya_lawrence t1_jcyqbyw wrote
Reply to [D] Simple Questions Thread by AutoModerator
I will be teaching a class of about 18 students. Each student will need to train their own StyleGAN2 model toward the end of the semester, and I'm trying to figure out which platform I want them to use. These students will be coming from different disciplines, so ideally we'd use something like Google Colab: they could easily work off of my code and avoid learning how to SSH into a virtual machine, use bash commands, etc.

For context, this is not a technical course, so I'm more concerned with ease of use than with a detailed introduction to using a virtual/remote machine. The other parts of the course involve more reading and discussion on the history of Generative Art, so I see training their own model as a chance to bring in a hands-on approach to thinking with and about Machine Learning in a creative context.

I can propose a budget to my institution, so it is possible that I use a paid platform (although logistically it may be more difficult to figure out how to allocate funds to different accounts). I've looked at Paperspace's Gradient tool as well. I know apps like RunwayML would allow students to train a model code-free, but my concern is that Runway uses transfer learning, and I want them to train the model only on data they've collected themselves. I'm curious if any of you have suggestions or anecdotes from your own personal experience using different platforms. Thanks in advance!
trnka t1_jcyped6 wrote
Reply to comment by disastorm in [D] Simple Questions Thread by AutoModerator
Some systems output the most probable token in each context, so those will be consistent given a prompt. Traditionally that could lead to very generic responses.
So it's common to add a bit of randomness into it. The simplest approach is to generate tokens according to their probability. There are many other variations on this to allow more control over how "creative" the generator can be.
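A minimal sketch of both strategies, using a hypothetical helper over raw logits:

```python
import torch

def sample_next_token(logits, temperature=1.0):
    # temperature == 0 stands in for greedy decoding: always the most
    # probable token, hence consistent (and often generic) output.
    if temperature == 0.0:
        return logits.argmax(dim=-1)
    # Otherwise sample in proportion to the (temperature-scaled) probabilities.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```

Lower temperatures concentrate probability on the top tokens; higher ones make the generator more "creative."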
michaelthwan_ai OP t1_jcyo94y wrote
Reply to [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
Added an "examples of prompts" section at the top as a showcase!
blevlabs t1_jcyj9dc wrote
I think Cosmo-XL has to be one of the best lightweight dialogue-focused models available.
derek_ml t1_jcyin91 wrote
Reply to comment by mrcet007 in [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
Pros:
- The code and the app are closer, so it's easier for users to duplicate, explore, open issues/PRs, etc.
- It's easier to discover, given the large community there
- Deployment is easier
Cons:
- GitHub is a bit more advanced for PRs/issues, etc.
- Heroku is probably more configurable
save_the_panda_bears t1_jcya2fm wrote
Reply to [P] TherapistGPT by SmackMyPitchHup
I'm going to be honest, this is a truly terrible idea. Do you have any idea of the potential legal exposure you have with this product? If you're serious about pursuing this, take the site offline and call a lawyer right now.
pkuba208 t1_jcy83gf wrote
Reply to comment by Art10001 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
I use swap too. For now, it can only run on flagships, though. You need at least 8 GB of RAM: running it on, say, 3 GB of free RAM (with another 3 GB used by the system) plus 3-5 GB of swap may not even be possible, and if it is, it will be very slow and prone to crashing.
Art10001 t1_jcy7rqs wrote
Reply to comment by pkuba208 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Yes, that's why it was tried on a Pixel 7, which has 8 GB of RAM and maybe even swap.
pkuba208 t1_jcy7nxg wrote
Reply to comment by 1stuserhere in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
It should be faster than 1 word per second. Judging by the fact that modern PCs run it at 5 words per second and a Raspberry Pi 4B runs it at 1 word per second, it should land somewhere near the 2.5 words per second mark.
2muchnet42day t1_jczj8da wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
>I'm gonna end up buying a bunch of 24GB 3090s at this rate.
Better hurry up...