light24bulbs

light24bulbs t1_jdrm9kh wrote

That's the part I wasn't getting. I assumed fine-tuning involved a different process. I see now that it is in fact just more training data, often templated into a document in such a way that it's framed clearly for the LLM.

The confusing thing is that most of the LLM-as-a-service companies, OpenAI included, will ONLY take data in the question-answer format, as if that's the only data you'd want to use for fine-tuning.

What if I want to feed a book in so we can talk about the book? A set of legal documents? Documentation of my project? Transcriptions of TV shows?

There are so many use cases for training on top of an already pre-trained LLM that aren't just question answering.

I'm into training LLaMA now. I simply took some training code I found, removed the JSON-parsing question-answer templating stuff, and that was it.
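Something like this, as a rough sketch with the Hugging Face Trainer (the model path, corpus file, and hyperparameters are all placeholders):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "path/to/llama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA ships without one
model = AutoModelForCausalLM.from_pretrained(model_name)

# Raw text in, no Q&A template: a book, docs, transcripts, whatever.
data = load_dataset("text", data_files={"train": "my_corpus.txt"})
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# mlm=False gives plain next-token prediction, same as pretraining.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=data["train"],
    data_collator=collator,
).train()
```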

1

light24bulbs t1_jdmad5n wrote

Hey, I've been looking at this more and it's very cool. One thing I REALLY like is that I see self-training using dataset generation on your roadmap. This is essentially the technique that Facebook used to train Toolformer, if I'm reading their paper correctly.

I'd really love to use your library to try to reimplement Toolformer's approach someday.
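The core loop, as I read the paper (a rough sketch only; candidate_positions, generate_api_call, execute, and lm_loss are all hypothetical helpers, not any real library's API):

```python
def self_annotate(corpus, model):
    """Toolformer-style dataset generation: the model inserts candidate
    tool calls into its own training text, and a call is kept only if
    it lowers the LM loss on the text that follows it."""
    augmented = []
    for text in corpus:
        for pos in candidate_positions(model, text):
            call = generate_api_call(model, text, pos)   # e.g. [Calc(2+2)]
            result = execute(call)
            candidate = text[:pos] + f"{call} -> {result} " + text[pos:]
            if lm_loss(model, candidate, pos) < lm_loss(model, text, pos):
                augmented.append(candidate)
    return augmented  # then fine-tune on corpus + augmented as plain text
```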

2

light24bulbs t1_jdks13d wrote

Question: I notice there's a focus here on fine-tuning for instruction following, which is clearly different from the main pretraining, where the LLM just reads text and tries to predict the next word.

Is there any easy way to continue that bulk part of the training with some additional data? Everyone seems to be trying to get there by injecting embedded chunks of text into prompts (my team included), but that approach just stinks for a lot of uses.
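To be concrete about the approach I mean (a rough sketch; embed() is a stand-in for whatever embedding call your stack actually uses):

```python
import numpy as np

def top_k_chunks(question, chunks, k=3):
    """Pick the k chunks most similar to the question.
    embed() is a hypothetical embedding call."""
    q = embed(question)
    scores = [float(np.dot(q, embed(c))) for c in chunks]
    return [chunks[i] for i in np.argsort(scores)[-k:]]

def build_prompt(question, chunks):
    # Whatever survives retrieval is all the model ever sees.
    context = "\n\n".join(top_k_chunks(question, chunks))
    return f"Given the following excerpts:\n\n{context}\n\nAnswer: {question}"
```

The model only ever sees the few retrieved chunks, never the corpus as a whole.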

8

light24bulbs t1_jdijmr3 wrote

Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501

My strategy was to have the outer LLM make a JSON object where one of the args is an instruction or question, and then pass that to the inner LLM wrapped in a template like "given the following document, <instruction>"

It works for a fair few general cases, and it can get the context that ends up in the outer LLM down to a few sentences, i.e. a few tokens, meaning there's plenty of room left for reasoning, plus cost savings.
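Roughly like this (a sketch; complete() is a hypothetical wrapper around whatever chat-completion API you're calling, and the model names are placeholders):

```python
import json

def answer_from_document(user_request, document):
    # Outer LLM decides what it needs and emits it as a JSON object.
    plan = complete("outer-model",
        "Reply with JSON like {\"instruction\": \"...\"} describing what "
        f"to extract from a document to satisfy: {user_request}")
    instruction = json.loads(plan)["instruction"]  # assumes valid JSON

    # Inner LLM sees the big document plus the templated instruction.
    finding = complete("inner-model",
        f"Given the following document, {instruction}\n\n{document}")

    # Only the short finding re-enters the outer LLM's context.
    return complete("outer-model",
        f"Request: {user_request}\nRelevant finding: {finding}\nAnswer:")
```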

1

light24bulbs t1_jdi5qau wrote

Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501

That's what I'm doing: using GPT-3.5 to take big documents and search them for answers, and then GPT-4 to do the overall reasoning.

It's very possible. You can have GPT-4 writing prompts to GPT-3.5, telling it what to do.

3

light24bulbs t1_jdfd2yz wrote

Reply to comment by sebzim4500 in [N] ChatGPT plugins by Singularian2501

Oh, yeah, understanding what the tools do isn't the problem.

The issue is the model changing its mind about how to fill out the prompt, forgetting the prompt format altogether, etc. Then you need smarter and smarter regexes to parse the output, and... yeah, it's rough.

It's POSSIBLE to get it to work but it's a pain. And it introduces lots of round trips to their slow API and multiplies the token costs.

4

light24bulbs t1_jdecutq wrote

I've been using langchain, but it screws up a lot no matter how good a prompt you write. For those familiar, it's the same concept as this, but in a loop, so more expensive. You can run multiple tools, though (or rather, let the model run multiple tools).
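The loop is roughly this shape (a bare-bones sketch, not langchain's actual internals; the prompt format, regex, and llm()/tools are all made up):

```python
import re

def run_agent(llm, tools, task, max_steps=5):
    # llm is a hypothetical callable: prompt string in, completion out.
    # tools is a dict of name -> callable.
    transcript = (f"Task: {task}\n"
                  "Respond with 'Action: tool(input)' or 'Final: answer'.\n")
    for _ in range(max_steps):
        out = llm(transcript)
        if out.strip().startswith("Final:"):
            return out.split("Final:", 1)[1].strip()
        # The fragile part: the model drifts from the format, the regex
        # misses, and you keep having to make the parsing smarter.
        m = re.search(r"Action:\s*(\w+)\((.*)\)", out)
        if not m or m.group(1) not in tools:
            transcript += "Observation: could not parse that action.\n"
            continue
        result = tools[m.group(1)](m.group(2))
        transcript += f"{out}\nObservation: {result}\n"
    return None
```

Every iteration is another round trip and another full prompt's worth of tokens, which is where the cost multiplies.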

Having all that pretraining about how to use "tools" built into the model (I'm 99% sure that's what they've done) will fix that problem really nicely.

15

light24bulbs t1_isi9wop wrote

Truly, that's just a count of the number of differing base pairs, which makes complete sense. This isn't that complicated. I'm sure you could argue it isn't the most RELEVANT figure a geneticist would be concerned with, but I think it's fair to say that's how they would take it. I'd love to know if I'm wrong about that.

It's binary data: run a diff and give me the count. Since we're talking about the number 24, if 24 base pairs out of the total differ, the variance ratio is just 24 / total, i.e. one difference per total / 24 base pairs.

Likewise, the average is simple: take any two people, count the number of base pairs differing between them or present in one and not the other. Do that across many different pairs of people, and that's the average.
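Concretely, assuming two aligned sequences of equal length:

```python
from itertools import combinations

def diff_count(a, b):
    """Count positions where two aligned, equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def average_diff(genomes):
    """Average pairwise difference across many individuals."""
    pairs = list(combinations(genomes, 2))
    return sum(diff_count(a, b) for a, b in pairs) / len(pairs)
```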

0