light24bulbs t1_jdoxig4 wrote
Your brain has 86 billion neurons. They're very expensive for your body to run.
You need them to be as smart as you are.
Edit: never mind, I made a false equivalence. This blog post is a good explanation of how many "parameters" our brain uses for language: https://www.beren.io/2022-08-06-The-scale-of-the-brain-vs-machine-learning/
light24bulbs t1_jdntdbb wrote
Reply to comment by baffo32 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
I'm not hoping to do instruction tuning; I want to do additional pre-training.
light24bulbs t1_jdmad5n wrote
Reply to comment by machineko in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Hey, I've been looking at this more and it's very cool. One thing I REALLY like is that I see self-training using dataset generation on your roadmap. This is essentially the technique that Facebook used to train Toolformer, if I'm reading their paper correctly.
I'd really love to use your library to try to reimplement Toolformer's approach someday.
light24bulbs t1_jdm413r wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
This is an insane way to communicate knowledge.
light24bulbs t1_jdm04sx wrote
Are those it? Surely there are a bunch more notable open-source ones?
light24bulbs t1_jdlrnll wrote
Reply to comment by elbiot in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
That's cool, that's exactly what I want to do. I'm hunting around for a ready-made pipeline to do that on top of a good open source model.
light24bulbs t1_jdks13d wrote
Reply to comment by machineko in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Question: I notice there's a focus here on fine-tuning for instruction following, which is clearly different from the main pre-training, where the LLM just reads text and tries to predict the next word.
Is there any easy way to continue that bulk part of the training with some additional data? Everyone seems to be trying to get there by injecting embedded text chunks into prompts (my team included), but that approach just stinks for a lot of uses.
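For context, the embedding approach I mean looks roughly like this (a minimal sketch using the pre-1.0 `openai` client; the chunking, model name, and prompt wording are placeholders, not any particular library's API):

```python
# Rough sketch: embed the chunks, pick the ones closest to the question,
# and stuff them into the prompt.
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return np.array(resp["data"][0]["embedding"])

def build_prompt(question: str, chunks: list[str], k: int = 3) -> str:
    q = embed(question)
    def score(chunk: str) -> float:
        c = embed(chunk)
        return float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c)))  # cosine similarity
    best = sorted(chunks, key=score, reverse=True)[:k]
    return "Answer using only this context:\n\n" + "\n\n".join(best) + f"\n\nQuestion: {question}"
```

It works for straightforward Q&A, but the model only ever sees whichever chunks happen to score well, which is part of why it stinks for a lot of uses.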
light24bulbs t1_jditg4b wrote
Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501
Nope, I'm struggling along with you on that I'm afraid. That's why these new plugins will be nice.
Maybe we can make some money selling premium feature access to ours once we get it
light24bulbs t1_jdijmr3 wrote
Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501
My strategy was to have the outer LLM make a JSON object where one of the args is an instruction or question, and then pass that to the inner LLM wrapped in a template like "given the following document, <instruction>"
It works for a fair few general cases, and it can get the context that ends up in the outer LLM down to a few sentences, i.e. a handful of tokens, which leaves plenty of room for more reasoning and saves on cost.
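Roughly what that looks like, as a sketch (pre-1.0 `openai` client; the exact JSON shape and wrapper template here are illustrative, not what I literally run):

```python
# Outer LLM (GPT-4) plans and reasons; inner LLM (GPT-3.5) digests the big document.
import json
import openai

def chat(model: str, prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp["choices"][0]["message"]["content"]

def answer_about_document(question: str, document: str) -> str:
    # 1. Outer LLM says what it needs from the document, as a JSON object.
    plan = chat(
        "gpt-4",
        "You can query a long document. Reply ONLY with JSON like "
        '{"instruction": "..."} describing what to extract.\n'
        f"User question: {question}",
    )
    instruction = json.loads(plan)["instruction"]  # guard against bad JSON in real code

    # 2. Inner LLM runs that instruction against the full document.
    extract = chat("gpt-3.5-turbo", f"Given the following document, {instruction}\n\n{document}")

    # 3. Outer LLM reasons over the short extract instead of the whole document.
    return chat("gpt-4", f"Notes from the document: {extract}\n\nNow answer: {question}")
```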
light24bulbs t1_jdi5qau wrote
Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501
That's what I'm doing. Using 3.5 to take big documents and search them for answers, and then 4 to do the overall reasoning.
It's very possible. You can have GPT-4 write prompts for GPT-3.5, telling it what to do.
light24bulbs t1_jdfd2yz wrote
Reply to comment by sebzim4500 in [N] ChatGPT plugins by Singularian2501
Oh, yeah, understanding what the tools do isn't the problem.
The issue is the thing changing its mind about how to fill out the prompt, forgetting the prompt format altogether, etc. Then you end up writing smarter and smarter regexes to parse its output, and... yeah, it's rough.
It's POSSIBLE to get it to work but it's a pain. And it introduces lots of round trips to their slow API and multiplies the token costs.
light24bulbs t1_jdecutq wrote
Reply to [N] ChatGPT plugins by Singularian2501
I've been using LangChain, but it screws up a lot no matter how good a prompt you write. For those familiar, it's the same concept as this, just run in a loop, so more expensive. You can run multiple tools, though (or rather, let the model run multiple tools).
Having all that pretraining about how to use "tools" built into the model (I'm 99% sure that's what they've done) will fix that problem really nicely.
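The loop itself is conceptually simple. Something like this bare-bones sketch (not LangChain's actual API; the tool, prompt format, and regexes are made up for illustration), which also shows exactly where the regex pain comes from:

```python
# Bare-bones tool loop: the model either calls a tool or gives a final answer;
# we run the tool, append the result, and go around again.
import re
import openai

def call_llm(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp["choices"][0]["message"]["content"]

TOOLS = {"calculator": lambda expr: str(eval(expr))}  # toy only; never eval untrusted input

PROMPT = """Answer the question. To use a tool, write a line like:
TOOL: calculator: 2 + 2
When you know the answer, write: FINAL: <answer>
{history}
Question: {question}"""

def run_agent(question: str, max_steps: int = 5) -> str:
    history = ""
    for _ in range(max_steps):
        out = call_llm(PROMPT.format(history=history, question=question))
        final = re.search(r"FINAL:\s*(.*)", out)
        if final:
            return final.group(1)
        tool = re.search(r"TOOL:\s*(\w+):\s*(.*)", out)
        if tool:
            name, arg = tool.groups()
            history += f"\nTOOL: {name}: {arg}\nRESULT: {TOOLS[name](arg)}"
        # if it emits neither line, you're back to regex-wrangling its output
    return "gave up"
```

Each pass around the loop is another round trip and another full prompt's worth of tokens, which is where the cost multiplier comes from.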
light24bulbs t1_jcqco8i wrote
Reply to comment by Prymu in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Old reddit does too
light24bulbs t1_jc5e0zk wrote
Reply to comment by Lajamerr_Mittesdine in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Yeah, there's definitely a threshold in there where it's fast enough for human interaction. It's only an order of magnitude off; that's not too bad.
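(Rough numbers, just as a sanity check: people read something like 200-300 words a minute, call it 4-6 tokens a second, so sub-1 token/s is within roughly an order of magnitude of a comfortable chat pace.)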
light24bulbs t1_jc2s2oc wrote
Reply to comment by Kinexity in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Oh, definitely, it's an amazing optimization.
But less than a token a second is going to be too slow for a lot of real time applications like human chat.
Still, very cool though
light24bulbs t1_jc0s4wr wrote
Reply to comment by Kinexity in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
That is slowwwww
light24bulbs t1_j94yae8 wrote
Reply to comment by [deleted] in Why are fevers cyclical? by Key-Marionberry-9854
Mmm... no, it's probably not a caloric constraint; being hot is pretty damaging to the host as well.
Maybe you should source the calorie claim?
light24bulbs t1_isi9wop wrote
Reply to comment by reginald_burke in When it's said 99.9% of human DNA is the same in all humans, is this referring to only coding DNA or both coding and non-coding DNA combined? by PeanutSalsa
Truly, that's just a count of the number of differing base pairs, which makes complete sense. This isn't that complicated. I'm sure you could argue it isn't the most RELEVANT figure for a geneticist to be concerned with, but I think it's fair to say that's what they would take it to mean. I'd love to know if I'm wrong about that.
It's binary data: run a diff and give me the count. Since we're talking about the number 24, if 24 base pairs out of the total differ, the differing fraction is just 24 / total.
Likewise, the average is simple: take any two people, count the number of base pairs that differ or are present in one and not the other. Do that many times between different pairs of people; that's the average.
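In code terms, the comparison I'm describing is just this (a toy sketch that assumes two already-aligned, equal-length sequences):

```python
# Toy sketch: count differing positions between two aligned sequences
# and turn that into a similarity fraction.
def similarity(seq_a: str, seq_b: str) -> float:
    assert len(seq_a) == len(seq_b), "assumes aligned, equal-length sequences"
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 1 - diffs / len(seq_a)

# e.g. 24 differing base pairs out of 24,000 compared -> 0.999 (99.9% identical)
print(similarity("ACGT" * 6000, "ACGT" * 5994 + "TGCA" * 6))
```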
light24bulbs t1_jdrm9kh wrote
Reply to comment by baffo32 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
That's the part I wasn't getting. I assumed fine-tuning involved a different process. I see now that it is in fact just more training data, often templated into a document in such a way that it's framed clearly for the LLM.
The confusing thing is that most of the LLM-as-a-service companies, OpenAI included, will ONLY take data in question-answer format, as if that's the only data you'd want to fine-tune on.
What if i want to feed a book in so we can talk about the book? A set of legal documents? Documentation of my project? Transcriptions of TV shows?
There are so many use cases for training on top of an already pre-trained LLM that aren't just question answering.
I'm on to training LLaMA now. I simply took some training code I found, removed the JSON-parsing / question-answer templating stuff, and that was it.
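For anyone poking at the same thing, the core of it ends up looking roughly like this (a sketch with Hugging Face `transformers`/`datasets`; the checkpoint name, sequence length, and hyperparameters are placeholders):

```python
# Continued (causal LM) pre-training on raw text -- no instruction templating at all.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "huggyllama/llama-7b"  # placeholder: whatever checkpoint you have locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # LLaMA has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Plain text files -- a book, docs, transcripts -- no question/answer format.
raw = load_dataset("text", data_files={"train": "my_corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1),
    train_dataset=tokenized["train"],
    # mlm=False -> plain next-token prediction, i.e. the same objective as pre-training
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The instruction-tuning pipelines are this exact thing plus a templating step that turns each question/answer pair into one of those documents first. (You'll still need LoRA, 8-bit, or a big GPU to actually fit a 7B model, but that's orthogonal.)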