light24bulbs

light24bulbs t1_jdrm9kh wrote

That's the part I wasn't getting. I assumed fine-tuning involved a different process. I see now that it is in fact just more training data, often templated into a document in such a way that it's framed clearly for the LLM.

The confusing thing is that most of the LLM-as-a-service companies, OpenAI included, will ONLY take data in the question-answer format, as if that's the only data you'd want to use for fine-tuning.

What if I want to feed a book in so we can talk about the book? A set of legal documents? Documentation of my project? Transcriptions of TV shows?

There are so many use cases for training on top of an already pre-trained LLM that aren't just question answering.

I'm into training LLaMA now. I simply took some training code I found, removed the JSON-parsing question-answer templating stuff, and that was it.
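Something like this, as a rough sketch with the Hugging Face Trainer (the model path, corpus file, and hyperparameters are all placeholders):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "path/to/llama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA ships without one
model = AutoModelForCausalLM.from_pretrained(model_name)

# Raw text in, no Q&A template: a book, docs, transcripts, whatever.
data = load_dataset("text", data_files={"train": "my_corpus.txt"})
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# mlm=False gives plain next-token prediction, same as pretraining.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=data["train"],
    data_collator=collator,
).train()
```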

1

light24bulbs t1_jdmad5n wrote

Hey, I've been looking at this more and it's very cool. One thing I REALLY like is that I see self-training using dataset generation on your roadmap. This is essentially the technique that Facebook used to train Toolformer, if I'm reading their paper correctly.

I'd really love to use your library to try to reimplement Toolformer's approach someday.
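The core loop, as I read the paper (a rough sketch only; candidate_positions, generate_api_call, execute, and lm_loss are all hypothetical helpers, not any real library's API):

```python
def self_annotate(corpus, model):
    """Toolformer-style dataset generation: the model inserts candidate
    tool calls into its own training text, and a call is kept only if
    it lowers the LM loss on the text that follows it."""
    augmented = []
    for text in corpus:
        for pos in candidate_positions(model, text):
            call = generate_api_call(model, text, pos)   # e.g. [Calc(2+2)]
            result = execute(call)
            candidate = text[:pos] + f"{call} -> {result} " + text[pos:]
            if lm_loss(model, candidate, pos) < lm_loss(model, text, pos):
                augmented.append(candidate)
    return augmented  # then fine-tune on corpus + augmented as plain text
```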

2

light24bulbs t1_jdks13d wrote

Question: I notice there's a focus here on fine-tuning for instruction following, which is clearly different from the main pretraining, where the LLM just reads text and tries to predict the next word.

Is there any easy way to continue that bulk part of the training with some additional data? Everyone seems to be trying to get there by injecting embedded chunks of text into prompts (my team included), but that approach just stinks for a lot of uses.
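To be concrete about the approach I mean (a rough sketch; embed() is a stand-in for whatever embedding call your stack actually uses):

```python
import numpy as np

def top_k_chunks(question, chunks, k=3):
    """Pick the k chunks most similar to the question.
    embed() is a hypothetical embedding call."""
    q = embed(question)
    scores = [float(np.dot(q, embed(c))) for c in chunks]
    return [chunks[i] for i in np.argsort(scores)[-k:]]

def build_prompt(question, chunks):
    # Whatever survives retrieval is all the model ever sees.
    context = "\n\n".join(top_k_chunks(question, chunks))
    return f"Given the following excerpts:\n\n{context}\n\nAnswer: {question}"
```

The model only ever sees the few retrieved chunks, never the corpus as a whole.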

8

light24bulbs t1_jdijmr3 wrote

Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501

My strategy was to have the outer LLM make a JSON object where one of the args is an instruction or question, and then pass that to the inner LLM wrapped in a template like "given the following document, <instruction>"

It works for a fair few general cases, and it can get the context that ends up in the outer LLM down to a few sentences, i.e. a few tokens, meaning there's plenty of room left for reasoning, plus cost savings.
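Roughly like this (a sketch; complete() is a hypothetical wrapper around whatever chat-completion API you're calling, and the model names are placeholders):

```python
import json

def answer_from_document(user_request, document):
    # Outer LLM decides what it needs and emits it as a JSON object.
    plan = complete("outer-model",
        "Reply with JSON like {\"instruction\": \"...\"} describing what "
        f"to extract from a document to satisfy: {user_request}")
    instruction = json.loads(plan)["instruction"]  # assumes valid JSON

    # Inner LLM sees the big document plus the templated instruction.
    finding = complete("inner-model",
        f"Given the following document, {instruction}\n\n{document}")

    # Only the short finding re-enters the outer LLM's context.
    return complete("outer-model",
        f"Request: {user_request}\nRelevant finding: {finding}\nAnswer:")
```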

1

light24bulbs t1_jdi5qau wrote

Reply to comment by TFenrir in [N] ChatGPT plugins by Singularian2501

That's what I'm doing: using GPT-3.5 to take big documents and search them for answers, and then GPT-4 to do the overall reasoning.

It's very possible. You can have GPT-4 writing prompts to GPT-3.5, telling it what to do.

3

light24bulbs t1_jdfd2yz wrote

Reply to comment by sebzim4500 in [N] ChatGPT plugins by Singularian2501

Oh, yeah, understanding what the tools do isn't the problem.

The issue is the model changing its mind about how to fill out the prompt, forgetting the prompt format altogether, etc. Then you need smarter and smarter regexes to parse the output, and... yeah, it's rough.

It's POSSIBLE to get it to work but it's a pain. And it introduces lots of round trips to their slow API and multiplies the token costs.

4

light24bulbs t1_jdecutq wrote

I've been using langchain, but it screws up a lot no matter how good a prompt you write. For those familiar, it's the same concept as this, but in a loop, so more expensive. You can run multiple tools, though (or rather, let the model run multiple tools).
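The loop is roughly this shape (a bare-bones sketch, not langchain's actual internals; the prompt format, regex, and llm()/tools are all made up):

```python
import re

def run_agent(llm, tools, task, max_steps=5):
    # llm is a hypothetical callable: prompt string in, completion out.
    # tools is a dict of name -> callable.
    transcript = (f"Task: {task}\n"
                  "Respond with 'Action: tool(input)' or 'Final: answer'.\n")
    for _ in range(max_steps):
        out = llm(transcript)
        if out.strip().startswith("Final:"):
            return out.split("Final:", 1)[1].strip()
        # The fragile part: the model drifts from the format, the regex
        # misses, and you keep having to make the parsing smarter.
        m = re.search(r"Action:\s*(\w+)\((.*)\)", out)
        if not m or m.group(1) not in tools:
            transcript += "Observation: could not parse that action.\n"
            continue
        result = tools[m.group(1)](m.group(2))
        transcript += f"{out}\nObservation: {result}\n"
    return None
```

Every iteration is another round trip and another full prompt's worth of tokens, which is where the cost multiplies.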

Having all that pretraining about how to use "tools" built into the model (I'm 99% sure that's what they've done) will fix that problem really nicely.

15

light24bulbs t1_isi9wop wrote

Truly, that's just a count of the number of differing base pairs, which makes complete sense. This isn't that complicated. I'm sure you could argue it isn't the most RELEVANT figure a geneticist would be concerned with, but I think it's fair to say that's how they would take it. I'd love to know if I'm wrong about that.

It's binary data: run a diff and give me the count. Since we're talking about the number 24, if 24 base pairs out of the total differ, the variance ratio is just 24 / total, i.e. one difference per total / 24 base pairs.

Likewise, the average is simple: take any two people, count the number of base pairs differing between them or present in one and not the other. Do that across many different pairs of people, and that's the average.
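Concretely, assuming two aligned sequences of equal length:

```python
from itertools import combinations

def diff_count(a, b):
    """Count positions where two aligned, equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def average_diff(genomes):
    """Average pairwise difference across many individuals."""
    pairs = list(combinations(genomes, 2))
    return sum(diff_count(a, b) for a, b in pairs) / len(pairs)
```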

0