Recent comments in /f/MachineLearning

light24bulbs t1_je7mr9p wrote

False, they all do. The process of fine-tuning is identical to the initial pre-training, though perhaps with different settings. Most of the scripts are set up to take Q&A data so that LLaMA follows instructions better, but under the hood that's just text wrapped in some context and passed in straight up.

I was very confused by this as well, but no, you can train new stuff into it.
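
Roughly what those fine-tuning scripts do under the hood, as a toy sketch (the template wording here is made up; every script uses its own variant):

    # Instruction data is just text wrapped in a template, then trained on
    # with the same next-token-prediction objective as pre-training.
    def to_training_text(instruction: str, response: str) -> str:
        # Hypothetical alpaca-style template; real scripts vary the wording.
        return (
            "Below is an instruction. Write a response.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Response:\n{response}"
        )

    example = to_training_text("Summarize this paragraph.", "It says ...")
    # `example` is plain text -- the model consumes it exactly like any
    # other document from pre-training.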

−1

light24bulbs t1_je7ilvq wrote

I disagree that it's not viable to train a model. The problem is that the best public model (LLaMA) is non-commercial.

That said, it's entirely possible to train things into it. There are a ton of new scripts floating around online, and the LoRA training in particular is very good.

The trouble with vectors is that they're so limited. They're fine if you need to look up one distinct thing (and the vector search gets the match right), but they're utterly useless if you'd like the model to learn about something in general.
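
To make the limitation concrete, here's a toy sketch of what a vector lookup does (random vectors standing in for real embeddings):

    import numpy as np

    # Vector lookup: fine for fetching the one best-matching chunk,
    # useless for teaching the model a topic in general.
    docs = ["invoice #123 was paid in March", "the API rate limit is 60 req/min"]
    doc_vecs = np.random.rand(len(docs), 384)  # stand-ins for real embeddings

    def lookup(query_vec: np.ndarray) -> str:
        # Cosine similarity against every stored chunk...
        sims = doc_vecs @ query_vec / (
            np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
        )
        # ...and only the single best match ever comes back.
        return docs[int(np.argmax(sims))]

    print(lookup(np.random.rand(384)))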

9

bgighjigftuik t1_je7if31 wrote

I told GPT-4 to write the code for a small 4-wheeled robot to act as a Roomba-like device. It wrote the MicroPython code for doing so (I didn't know that project existed). I bought the board (I had been using an Arduino), re-hooked everything together, and got it to work as expected on the second try. It even created what I believe is a kind of memory module for long-term storage of my dorm's shape, so the robot has memorized and optimized its cleaning routes on its own.

Not bad for 3 mins of prompting

2

gmork_13 t1_je7h3rc wrote

Having started with TF and moved to torch myself, I found torch just easier to work with when doing something a bit out of the ordinary. Since then it has gained in popularity, and with popularity come lots of walkthroughs, documentation, video guides, and research papers with GitHub repos.

1

t_minus_1 t1_je7gsc7 wrote

Please look at sketch and the langchain pandas/SQL plugins. I have seen excellent results with both of these approaches. Note that both of them require you to send metadata to OpenAI.
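
For the langchain pandas route, something like this (a sketch against the langchain API as of this writing; it changes fast, so check the docs):

    import pandas as pd
    from langchain.llms import OpenAI
    from langchain.agents import create_pandas_dataframe_agent

    df = pd.read_csv("sales.csv")  # hypothetical data file

    # The agent sends the dataframe's schema and a few sample rows
    # (the metadata mentioned above) to OpenAI, which writes and runs
    # pandas code to answer the question.
    agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
    agent.run("Which region had the highest total sales?")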

If you would like to do it yourself, maybe start with GPT-J / LoRA and use the same instruction fine-tuning approach that Databricks did.
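
If you go that route, the HF PEFT library is the usual entry point; a rough sketch (the target module names assume GPT-J's attention layout):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    # LoRA: freeze the base weights and train small low-rank adapters instead.
    config = LoraConfig(
        r=8, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # GPT-J attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% trainable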

9

gmork_13 t1_je7fmm8 wrote

I'm assuming you don't mean missing values in your dataset.

  1. You can create 'missing' data, but if you create it out of the data you already give to the model, you're sort of doing the work for it. For compute-efficiency reasons you might want to avoid giving it 'unnecessary' data; what counts as unnecessary can be hard to define. Think about what you want the model to grasp in the first place.

  2. I'm not sure what you mean by performing a test. If you were to train a language model, the context of a word would define its meaning. You can always take the output probabilities of a model and do something with them if you'd like (for instance, flag a prediction when the probability mass is spread over lots of low-probability alternatives); see the sketch below.
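
As a toy example of acting on those probabilities (the threshold and shapes here are made up):

    import torch

    def low_confidence(logits: torch.Tensor, threshold: float = 0.5) -> bool:
        # logits: (vocab_size,) scores for one predicted token
        probs = torch.softmax(logits, dim=-1)
        # A flat distribution (lots of low-probability alternatives)
        # means no single token dominates -- flag it.
        return probs.max().item() < threshold

    logits = torch.randn(50_000)      # stand-in for real model output
    print(low_confidence(logits))     # True: random logits are near-uniform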

1

currentscurrents OP t1_je7faup wrote

This seems to be a delay of the publishing process: it went up on arXiv in October but is getting attention now because it was finally published on March 21st.

I think the most interesting change since October is that GPT-4 is much better at many of the tricky sentences that linguists used to probe GPT-3. But it's still hard to prove the difference between "understanding" and "memorization" if you don't know what was in the training data, and we don't.

17

athos45678 t1_je7ercw wrote

Train a LLaMA LoRA model. The 30B model isn't too expensive to tune (40 bucks or so), and it's ridiculously capable.

You just need to format the data in one long text doc, with each example separated by two line breaks. I found it worked best in the alpaca style, where you have a single line break after the prompt, like “write a function that sorts this table in python” followed by the written-out code (“def sort(): ...”), and then the double line break to signal the start of the next input.
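
A sketch of that formatting (the example pairs are invented; the important part is the single vs. double line breaks):

    # One newline between prompt and completion, a blank line
    # (two newlines) between consecutive examples.
    examples = [
        ("write a function that sorts this table in python",
         "def sort(table):\n    return sorted(table)"),
        ("explain LoRA in one sentence",
         "LoRA fine-tunes a model by training small low-rank adapters."),
    ]

    with open("train.txt", "w") as f:
        f.write("\n\n".join(f"{p}\n{c}" for p, c in examples))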

Then use the simple-llama trainer app to make it all easy.

3

planetofthemapes15 t1_je7efmz wrote

This should basically disqualify it IMO; thanks for bringing up that point.

Edit: There are other suggestions that GPT-4 has abstract understanding. This paper is based on data collected before the release of GPT-4 or even GPT-3.5 (October 2022). To those drive-by downvoting my comment: explain why this paper is valuable in the face of contrary evidence such as https://arxiv.org/abs/2303.12712, which is actually based on the bleeding-edge technology that has generated all the recent interest in LLMs.

−11