Recent comments in /f/MachineLearning

ephemeralentity t1_jdm6wkc wrote

Playing around with this. Running `BaseModel.create("llama_lora")` seems to just print "Killed" and exit. I'm running it on WSL2 from Windows 11, so I'm not sure if that could be the issue. It's on my RTX 3070 with only 8GB VRAM, so maybe that's the issue...

EDIT: Side note, I first tried running directly on Windows 11, but it seems the deepspeed dependency is not fully supported there: https://github.com/microsoft/DeepSpeed/issues/1769
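For what it's worth, a bare "Killed" with no traceback is usually the Linux OOM killer terminating the process because system RAM (not VRAM) ran out while the weights were loading, and WSL2 only gets a fraction of the host's memory by default. A minimal `.wslconfig` sketch, assuming your Windows host actually has the RAM to spare (the exact numbers here are illustrative, not a recommendation):

```
# %UserProfile%\.wslconfig  (apply with `wsl --shutdown`, then reopen the distro)
[wsl2]
# raise the RAM cap for the WSL2 VM (the default is only a fraction of host RAM)
memory=24GB
# extra swap can absorb the peak while the fp16 weights are loaded
swap=32GB
```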

2

WonderFactory t1_jdm4pk1 wrote

How long, though, before LLMs perform at the same level as experts in most fields? A year, two, three? When you get to that point you can generate synthetic data that's the same quality as human-produced data. The Reflexion paper mentioned in another thread claims that giving GPT-4 the ability to test the output of its code produces expert-level coding performance. This output could be used to train an open-source model.
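For intuition, here is a minimal sketch of the kind of generate-test-reflect loop Reflexion describes, assuming a hypothetical `llm()` helper that wraps whatever model API you use (this is not the paper's actual code):

```python
import subprocess
import tempfile

def llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to a code-generating LLM."""
    raise NotImplementedError

def reflexion_style_loop(task: str, tests: str, max_rounds: int = 3) -> str:
    """Generate code, run its tests, and feed failures back to the model."""
    code, feedback = "", ""
    for _ in range(max_rounds):
        code = llm(f"Task:\n{task}\n\nPrevious feedback:\n{feedback}\n\nWrite Python code.")
        # Write the candidate solution plus its tests to a temp file and execute it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n\n" + tests)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            break  # tests passed, keep this solution
        feedback = result.stderr  # let the model "reflect" on the failure next round
    return code
```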

6

MjrK t1_jdm4ola wrote

For many (perhaps, these days, most) use cases, absolutely! The advantage of vision in some other cases might be interacting more directly with the browser itself, as well as with other applications, and multi-tasking... perhaps similar to the way we use PCs and mobile devices to accomplish more complex tasks.

2

Disastrous_Elk_6375 t1_jdm4h39 wrote

> I have seen many inaccurate claims, e.g. LLaMa-7B with Alpaca being as capable as ChatGPT

I believe you might have misunderstood the claims in Alpaca. They never stated it is as capable as ChatGPT; they found (and you can confirm this yourself) that it accurately replicates the instruction tuning. That is, for most of the areas in the fine-tuning set, a smaller model will produce output in the same style as davinci. And that's amazing progress compared to the raw outputs of the base models.

20

harharveryfunny t1_jdm3bm4 wrote

It seems most current models don't need the number of parameters that they have. DeepMind did a study on model size vs. number of training tokens and concluded that for each doubling of the number of parameters, the number of training tokens also needs to double, and that a model like GPT-3, trained on 300B tokens, would really need to be trained on roughly 3.7T tokens (more than a 10x increase) to take advantage of its size.

To validate their scaling law, DeepMind built the 70B-parameter Chinchilla model, trained it on the predicted-optimal 1.4T (!) tokens, and found it to outperform GPT-3.

https://arxiv.org/abs/2203.15556
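As a back-of-the-envelope check, the paper's fitted law works out to roughly 20 training tokens per parameter. A tiny sketch of that rule of thumb (the multiplier is an approximation, not the paper's full formula):

```python
# Chinchilla rule of thumb: compute-optimal training uses ~20 tokens per parameter.
TOKENS_PER_PARAM = 20

for name, params in [("Chinchilla (70B)", 70e9), ("GPT-3 (175B)", 175e9)]:
    optimal_tokens = params * TOKENS_PER_PARAM
    print(f"{name}: ~{optimal_tokens / 1e12:.1f}T tokens")

# Chinchilla (70B): ~1.4T tokens  -> matches the training run above
# GPT-3 (175B):     ~3.5T tokens  -> same ballpark as the ~3.7T estimate, vs. the 300B actually used
```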

2

WonderFactory t1_jdm1slk wrote

We don't understand how it works. We understand how it's trained, but we don't really understand the result of the training or exactly how it arrives at a particular output. The trained model is an incredibly complex system.

4

badabummbadabing t1_jdm1poy wrote

Well, if you apply all of those tricks that these smaller models use (to get decent performance) AND increase the parameter count, can you get an even better model? Who knows; "Open"AI might already be applying these.

The question is not: "Do fewer than 100B parameters suffice to get a model that performs 'reasonably' for a March 2023 observer?"

Chinchilla scaling rules give us an upper bound on the number of parameters that we can expect to still yield an improvement given the amount of available training data (PaLM is too big, for instance), but even that only tells half of the story: how good can our models get if we make do with sub-optimal training efficiency (see LLaMA)? What is the influence of data quality/type? What if we train (gasp) multiple epochs on the same training set?

5

jabowery t1_jdm16ig wrote

Algorithmic information theory: the smallest model that memorizes all the data is optimal. "Large" is only there because of the need to expand in order to compress. Think of decompressing gz in order to recompress with bz2. Countering over-fitting with over-informing (bigger data) yields interpolation, sacrificing extrapolation.
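To make the gz/bz2 analogy concrete, a toy sketch (purely an illustration of "expand first, then recompress with a stronger compressor"):

```python
import bz2
import gzip

def recompress(gz_path: str) -> bytes:
    """Expand a .gz file, then recompress the raw bytes with bz2."""
    with gzip.open(gz_path, "rb") as f:
        raw = f.read()           # decompress first: the data has to get "large" again...
    return bz2.compress(raw)     # ...before a stronger compressor can shrink it further
```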

If you understand all of the above, you'll be light years beyond the current ML industry, including the political/religious bias of "algorithmic bias experts".

0

Nyanraltotlapun t1_jdm0r15 wrote

>This is not an alien intelligence yet. We understand how it works how it thinks.

It's alien not because we don't understand It, but because It is not a protein life form. It has nothing in common with humans: It does not feel hunger, does not need sex, does not feel love or pain. It is metal, plastic, and silicon. It is something completely nonhuman that can think and reason. That is the true horror, don't you see?

>We understand how it works how it thinks

Sort of, partially. And it is a false assumption in general anyway. Long story short, the main property of complex systems is the ability to pretend and mimic. You cannot properly study something that can pretend and mimic.

0

sdmat t1_jdm0pmi wrote

> It's like a new starting line and we don't know what human skills will be valuable in the future.

With each passing day, the creature stirs, growing hungrier and more restless. The ground trembles beneath our feet, but we dismiss the warning signs.

The text above was, naturally, written by GPT-4.

Maybe we should start flipping the assumption: why would you want a human if inexpensive and dependable AI competence is the default?

5