Recent comments in /f/MachineLearning
currentscurrents t1_jcziz0q wrote
Reply to [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
I'm gonna end up buying a bunch of 24GB 3090s at this rate.
farmingvillein t1_jczf7z8 wrote
Reply to Smarty-GPT: wrapper of prompts/contexts [P] by usc-ur
Maybe I'm reading too quickly, but I can't figure out from the README what this actually does.
G_fucking_G t1_jczd46d wrote
Reply to [D]: Vanishing Gradients and Resnets by Blutorangensaft
https://old.reddit.com/r/MachineLearning/comments/px3hzd/d_has_the_resnet_hypothesis_been_debunked/
The advantage of ResNets is most probably not the erasure of vanishing gradients but a smoothing of the loss landscape.
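For reference, the skip connection in question is just an identity path added around a block; a minimal PyTorch sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Toy residual block: output = x + F(x). The identity path gives gradients
    # a direct route backward regardless of what the inner block learns.
    def __init__(self, dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection
```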
kross00 t1_jczd3i2 wrote
Reply to comment by Civil_Collection7267 in [D] Best ChatBot that can be run locally? by rustymonster2000
I'm having a hard time understanding what LoRA is and why it makes the 7B model better. I thought it only improved hardware requirements, but does it also improve model coherency? This is all new to me.
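For what it's worth, the core LoRA trick is to freeze the pretrained weight and learn a small low-rank update beside it; a minimal sketch, with illustrative names and dimensions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Sketch of a LoRA-adapted linear layer: y = x W^T + (alpha/r) * x A^T B^T.
    # Only lora_A and lora_B are trained; the pretrained weight stays frozen.
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)            # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))   # zero-init: starts as a no-op

        self.scaling = alpha / r

    def forward(self, x):
        return x @ self.weight.T + self.scaling * ((x @ self.lora_A.T) @ self.lora_B.T)
```

So LoRA itself mainly cuts the memory/compute cost of fine-tuning; any gain in coherency comes from the fine-tuning it makes affordable, not from LoRA as such.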
lucidraisin t1_jczarq8 wrote
Reply to comment by antonb90 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
That isn't for decoders; it's encoder-only, and it still needs to be verified. The majority of research papers never work out on closer examination. Just trust me: stick with flash attention for now until further notice and save yourself a lot of headache.
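For anyone following along, the PyTorch 2.0 entry point is `torch.nn.functional.scaled_dot_product_attention`, which dispatches to the fused flash kernel when device, dtype, and shapes allow. A minimal sketch, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim); fp16 on CUDA is eligible for the flash kernel
q = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)

# One fused call instead of materializing the full 32k x 32k attention matrix
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```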
antonb90 t1_jczajd1 wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Things are improving fast. From Google's new COLT5 (64k context) paper:

> COLT5 is better at any speed. For 16k input length, COLT5 matches or exceeds LONGT5 quality for Large and XL with 35-75% training speedup and 50-100% inference speedup on top of the order-of-magnitude inference speedup from MQA. Encoder speedups are even greater (Appendix D). COLT5-XL also achieves SOTA performance on the SCROLLS benchmark.

> COLT5 achieves both stronger performance and faster inference speed at all input lengths and is able to effectively make use of extremely long inputs. We note that COLT5 achieves large quality gains by going from 32k to 64k tokens even while keeping the number of routed tokens constant, providing more evidence for our hypothesis.
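The "routed tokens" in that quote refer to conditional computation: every token takes a cheap path, and only a fixed budget of routed tokens also takes an expensive one. A toy sketch of the idea (not the paper's code; names, scoring, and sizes are illustrative):

```python
import torch
import torch.nn as nn

class ConditionalFFN(nn.Module):
    # Toy COLT5-style layer: a light FFN for all tokens, plus a heavy FFN
    # applied only to the top-k tokens picked by a learned router.
    def __init__(self, d_model=512, k=64):
        super().__init__()
        self.light = nn.Linear(d_model, d_model)
        self.heavy = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                   nn.Linear(4 * d_model, d_model))
        self.router = nn.Linear(d_model, 1)
        self.k = k  # routing budget; stays fixed even as seq_len grows

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        out = self.light(x)                     # cheap path for every token
        scores = self.router(x).squeeze(-1)     # (batch, seq_len)
        idx = scores.topk(self.k, dim=-1).indices
        idx = idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        routed = torch.gather(x, 1, idx)        # gather the k routed tokens
        return out.scatter_add(1, idx, self.heavy(routed))  # add heavy output back
```

Keeping `k` constant while doubling the input length is why the cost grows so slowly from 32k to 64k tokens.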
lmericle t1_jcz5z92 wrote
Reply to comment by alfredr in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
People get mad when you call LLMs what they are. It will pass, as with all things.
alfredr OP t1_jcz3keg wrote
Reply to comment by lmericle in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
I understand that it’s about LLMs and that it is not comprehensive — also that the site author has (perhaps questionably) embedded some of their own work in the list. That said, it does otherwise appear to be a list of influential papers representing a current major thrust.
I did not downvote you, btw
edjez t1_jcyz2nu wrote
Reply to comment by SmackMyPitchHup in [P] TherapistGPT by SmackMyPitchHup
Curious: what is it using, the OpenAI APIs or Azure OpenAI?
lmericle t1_jcyxiex wrote
Reply to comment by alfredr in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
Well, no, it isn't. You are looking for machine learning research. That list is only about LLMs, a very specific and over-hyped sub-sub-application of ML techniques.
If all you want is to attach yourself to the hype cycle, then that link still won't be enough, but at least it's a start.
Blutorangensaft OP t1_jcywm46 wrote
Reply to comment by IntelArtiGen in [D]: Vanishing Gradients and Resnets by Blutorangensaft
I don't. I have heard of using layer norm for RNNs, but I am unfamiliar with instance norm. Will look into it, thank you.
IntelArtiGen t1_jcyqmdu wrote
Reply to [D]: Vanishing Gradients and Resnets by Blutorangensaft
Do you use another kind of normalization? You can try InstanceNorm / LayerNorm if you can't use batchnorm.
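A minimal sketch of the two drop-in alternatives, with illustrative shapes:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16, 128)             # (batch, channels, length); batch too small for BatchNorm

layer_norm = nn.LayerNorm(128)          # normalizes over the feature dim, per position
instance_norm = nn.InstanceNorm1d(16)   # normalizes each channel within each sample

y1 = layer_norm(x)
y2 = instance_norm(x)
```

Neither uses batch statistics, so they behave the same at any batch size.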
Papaya_lawrence t1_jcyqbyw wrote
Reply to [D] Simple Questions Thread by AutoModerator
I will be teaching a class of about 18 students. Each student will need to train their own StyleGAN2 model toward the end of the semester, and I'm trying to figure out which platform I want them to use. These students will be coming from different disciplines, so ideally we'd use something like Google Colab: they could easily work off of my code and avoid learning how to SSH into a virtual machine, use bash commands, etc.

For context, this is not a technical course, so I'm more concerned with ease of use than with a detailed introduction to using a virtual/remote machine. The other parts of the course involve more reading and discussion on the history of Generative Art, so I see training their own model as a chance to bring in a hands-on approach to thinking with and about Machine Learning in a creative context.

I can propose a budget to my institution, so it is possible that I use a paid platform (although logistically it may be more difficult to figure out how to allocate funds to different accounts). I've looked at Paperspace's Gradient tool as well. I know apps like RunwayML would allow students to train a model code-free, but my concern is that Runway uses transfer learning, and I want them to train the model only on data they've collected themselves. I'm curious if any of you have suggestions or anecdotes from your own personal experience using different platforms. Thanks in advance!
trnka t1_jcyped6 wrote
Reply to comment by disastorm in [D] Simple Questions Thread by AutoModerator
Some systems output the most probable token in each context, so those will be consistent given a prompt. Traditionally that could lead to very generic responses.
So it's common to add a bit of randomness into it. The simplest approach is to generate tokens according to their probability. There are many other variations on this to allow more control over how "creative" the generator can be.
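A minimal sketch of both strategies, using a hypothetical helper over raw logits:

```python
import torch

def sample_next_token(logits, temperature=1.0):
    # temperature == 0 stands in for greedy decoding: always the most
    # probable token, hence consistent (and often generic) output.
    if temperature == 0.0:
        return logits.argmax(dim=-1)
    # Otherwise sample in proportion to the (temperature-scaled) probabilities.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```

Lower temperatures concentrate probability on the top tokens; higher ones make the generator more "creative."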
michaelthwan_ai OP t1_jcyo94y wrote
Reply to [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
Added an "examples of prompts" section at the top as a showcase!
blevlabs t1_jcyj9dc wrote
I think Cosmo-XL has to be one of the best lightweight dialogue-focused models available.
derek_ml t1_jcyin91 wrote
Reply to comment by mrcet007 in [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
Pros:
- The code and the app are closer, so it's easier for users to duplicate, explore, open issues/PRs, etc.
- It's easier to discover, given the large community there
- Deployment is easier
Cons:
- GitHub is a bit more advanced for PRs/issues, etc.
- Heroku is probably more configurable
save_the_panda_bears t1_jcya2fm wrote
Reply to [P] TherapistGPT by SmackMyPitchHup
I'm going to be honest, this is a truly terrible idea. Do you have any idea of the potential legal exposure you have with this product? If you're serious about pursuing this, take the site offline and call a lawyer right now.
pkuba208 t1_jcy83gf wrote
Reply to comment by Art10001 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
I use swap too. For now, it can only run on flagships, though. You need at least 8 GB of RAM: running it on, say, 3 GB of free RAM (with another 3 GB used by the system) plus 3-5 GB of swap may not even be possible, and if it is, it will be very slow and prone to crashing.
Art10001 t1_jcy7rqs wrote
Reply to comment by pkuba208 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Yes, that's why it was tried on a Pixel 7, which has 8 GB of RAM and maybe even swap.
pkuba208 t1_jcy7nxg wrote
Reply to comment by 1stuserhere in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
It should be faster than 1 word per second. Judging by the fact that modern PCs run it at 5 words per second and a Raspberry Pi 4B runs it at 1 word per second, it should land somewhere near the 2.5 words per second mark.
2muchnet42day t1_jczj8da wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
>I'm gonna end up buying a bunch of 24GB 3090s at this rate.
Better hurry up...