Recent comments in /f/MachineLearning
ginger_beer_m t1_jdm6xfe wrote
Reply to comment by visarga in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
This will kill so many smaller startups that do bespoke fine-tuned models as their core business.
ephemeralentity t1_jdm6wkc wrote
Reply to comment by machineko in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Playing around with this. Running BaseModel.create("llama_lora") seems to return "Killed". I'm running it on WSL2 from Windows 11, so that could be the issue, or it could be that my RTX 3070 only has 8GB VRAM ...
EDIT - Side note, I first tried directly on Windows 11 but it seems deepspeed dependency is not fully supported: https://github.com/microsoft/DeepSpeed/issues/1769
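For anyone else hitting this: "Killed" on Linux/WSL2 almost always means the kernel OOM killer ended the process for running out of system RAM while loading the weights, not VRAM. A minimal lower-memory sketch, assuming the Hugging Face transformers + bitsandbytes stack rather than the library above (the checkpoint path is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/llama-7b-hf"  # placeholder: any local LLaMA-7B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 weights via bitsandbytes: ~7GB instead of ~14GB fp16
    device_map="auto",   # spill layers that don't fit on the GPU into CPU RAM
)
```

If it still dies, `dmesg | grep -i kill` should confirm the OOM killer, and raising the WSL2 memory cap in .wslconfig may help.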
WonderFactory t1_jdm4pk1 wrote
Reply to comment by Blacky372 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
How long though before LLMs perform at the same level as experts in most fields? A year, two, three? Once you get to that point you can generate synthetic data that's the same quality as human-produced data. The Reflexion paper mentioned in another thread claims that giving GPT-4 the ability to test the output of its own code produces expert-level coding performance. That output could be used to train an open-source model.
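The loop itself is simple to sketch. This is a toy paraphrase, not the paper's actual code; `llm` and `run_tests` are hypothetical stand-ins for a model call and a unit-test harness:

```python
def reflexion_codegen(task: str, llm, run_tests, max_iters: int = 4) -> str:
    """Generate code, execute tests, and feed failures back for self-reflection."""
    code = llm(f"Write a Python function for this task:\n{task}")
    for _ in range(max_iters):
        passed, feedback = run_tests(code)  # run the candidate against unit tests
        if passed:
            break
        # The model reflects on the concrete failure before retrying
        code = llm(
            f"Task:\n{task}\n\nPrevious attempt:\n{code}\n\n"
            f"Test output:\n{feedback}\n\nRevise the function to fix the failure."
        )
    return code
```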
MjrK t1_jdm4ola wrote
Reply to comment by modcowboy in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
For many (perhaps these days, most) use cases, absolutely! The advantage of vision in some others might be interacting more directly with the browser itself, as well as other applications, and multi-tasking... perhaps similar to the way we use PCs and mobile devices to accomplish more complex tasks.
Disastrous_Elk_6375 t1_jdm4h39 wrote
Reply to comment by wojtek15 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> I have seen many inaccurate claims, e.g. LLaMa-7B with Alpaca being as capable as ChatGPT
I believe you might have misunderstood the claims in Alpaca. They never stated it is as capable as ChatGPT; they found (and you can confirm this yourself) that it accurately replicates the instruction tuning. That is, for most of the areas covered in the fine-tuning set, a smaller model will output in the same style as davinci. And that's amazing progress compared to the raw outputs of the base models.
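For context, the smaller model is fine-tuned to complete the Alpaca prompt template (this is the no-input variant published in the Stanford Alpaca repo), which is why its answers come out in davinci's instruction-following style:

```python
# Alpaca prompt template (no-input variant); {instruction} is filled per example.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

print(ALPACA_TEMPLATE.format(instruction="Explain LoRA in one sentence."))
```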
heuboi t1_jdm4h0t wrote
Reply to comment by DarkTarantino in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Powerpoint engineer
light24bulbs t1_jdm413r wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
This is an insane way to communicate knowledge.
fv42622 t1_jdm3vtm wrote
michaelthwan_ai OP t1_jdm3v2y wrote
Reply to comment by light24bulbs in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Please do, suggestions are welcome.
michaelthwan_ai OP t1_jdm3sgb wrote
Reply to comment by wywywywy in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Agreed
_Repeats_ t1_jdm3h7a wrote
For enterprise use cases, you might need only a small model in the 1-3B parameter range that answers domain-specific queries. For general knowledge, it remains to be seen how far you can shrink them.
harharveryfunny t1_jdm3bm4 wrote
It seems most current models don't need the number of parameters they have. DeepMind did a study on model size vs. number of training tokens and concluded that for each doubling of parameter count, the number of training tokens also needs to double, and that a model like GPT-3, trained on 300B tokens, would really need about 3.7T tokens (more than a 12x increase) to take full advantage of its size.
To test their scaling law, DeepMind built the 70B-parameter Chinchilla model, trained it on the predicted optimal 1.4T (!) tokens, and found that it outperforms the much larger GPT-3.
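That operating point works out to roughly 20 training tokens per parameter, which is easy to sanity-check (a rule of thumb only; the paper fits the scaling exponents properly):

```python
# Chinchilla rule of thumb: compute-optimal training uses ~20 tokens per parameter.
def chinchilla_optimal_tokens(n_params: float) -> float:
    return 20.0 * n_params

for name, params in [("Chinchilla-70B", 70e9), ("GPT-3-175B", 175e9)]:
    print(f"{name}: ~{chinchilla_optimal_tokens(params) / 1e12:.1f}T tokens")
# Chinchilla-70B: ~1.4T tokens
# GPT-3-175B:     ~3.5T tokens
```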
[deleted] t1_jdm2vq5 wrote
Reply to comment by kromem in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
[removed]
WonderFactory t1_jdm1slk wrote
Reply to comment by brucebay in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
We don't understand how it works. We understand how it's trained but we don't really understand the result of the training and exactly how it arrives at a particular output. The trained model is an incredibly complex system.
badabummbadabing t1_jdm1poy wrote
Well, if you apply all of those tricks that these smaller models use (to get decent performance) AND increase the parameter count, can you get an even better model? Who knows, "Open"AI might already apply these.
The question is not: "Do fewer than 100B parameters suffice to get a model that performs 'reasonably' for a March 2023 observer?"
Chinchilla scaling rules give us upper bounds on the number of parameters we can expect to still yield an improvement given the amount of available training data (PaLM is too big, for instance), but even that only tells us half of the story: How good can our models get if we make do with sub-optimal training efficiency (see LLaMA)? What is the influence of data quality/type? What if we train (gasp) multiple epochs on the same training set?
addandsubtract t1_jdm1d9h wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Ok, no worries. I'm just glad there's a map to guide us through the madness going on atm. Adding legacy models would be good for people who come across them now, so they know that they're legacy.
[deleted] t1_jdm19ct wrote
Reply to [D] Simple Questions Thread by AutoModerator
[deleted]
wywywywy t1_jdm16va wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
In my opinion, it'd be better to include only the currently relevant ones rather than everything under the sun.
Too much noise makes the chart less useful.
jabowery t1_jdm16ig wrote
Algorithmic information theory: the smallest model that memorizes all the data is optimal. "Large" is only there because you need to expand in order to compress; think decompressing gz in order to recompress with bz2. Countering over-fitting with over-informing (bigger data) yields interpolation at the price of extrapolation.
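A concrete toy version of the gz/bz2 analogy (my own sketch; `corpus.txt` is a placeholder for any sizable text file):

```python
import bz2
import gzip

raw = open("corpus.txt", "rb").read()
gz = gzip.compress(raw)

direct = bz2.compress(gz)    # compressing already-compressed bytes: near-zero gain
via_raw = bz2.compress(raw)  # expand first, then recompress: much smaller
print(len(gz), len(direct), len(via_raw))
```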
If you understand all of the above you'll be light years beyond the current ML industry including the political/religious bias of "algorithmic bias experts".
sneakpeekbot t1_jdm0yoj wrote
Reply to comment by wywywywy in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Here's a sneak peek of /r/OpenAssistant using the top posts of all time!
#1: the default UI on the pinned Google Colab is buggy so I made my own frontend - YAFFOA. | 27 comments
#2: Progress Update | 4 comments
#3: Paper reduces resource requirement of a 175B model down to 16GB GPU | 19 comments
wywywywy t1_jdm0xwo wrote
Reply to comment by __Maximum__ in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
/r/OpenAssistant
Nyanraltotlapun t1_jdm0r15 wrote
Reply to comment by brucebay in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
>This is not an alien intelligence yet. We understand how it works how it thinks.
It's alien not because we don't understand it, but because it is not a protein life form. It has nothing in common with humans: it does not feel hunger, does not need sex, does not feel love or pain. It is metal, plastic, and silicon. It is something completely nonhuman that can think and reason. That is the true horror, don't you see?
>We understand how it works how it thinks
Sort of, partially. And it is a false assumption in general. Long story short, the main property of complex systems is the ability to pretend and mimic. You cannot properly study something that can pretend and mimic.
sdmat t1_jdm0pmi wrote
Reply to comment by visarga in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
> It's like a new starting line and we don't know what human skills will be valuable in the future.
With each passing day, the creature stirs, growing hungrier and more restless. The ground trembles beneath our feet, but we dismiss the warning signs.
The text above was, naturally, written by GPT-4.
Maybe we should start flipping the assumption - why would you want a human if inexpensive and dependable AI competence is the default?
atheist-projector t1_jdm7hw1 wrote
Reply to comment by soggy_mattress in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Especially when you consider that SGD only finds a local minimum; we can probably do a whole lot better if we find a nicer optimizer.
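A toy illustration of the point (my own sketch, not from the comment): plain gradient descent settles into whichever basin it starts in.

```python
import torch

def f(x):
    # Non-convex objective: local minimum near x=+1.13, global minimum near x=-1.30
    return x**4 - 3 * x**2 + x

for x0 in (2.0, -2.0):
    x = torch.tensor([x0], requires_grad=True)
    opt = torch.optim.SGD([x], lr=0.01)
    for _ in range(500):
        opt.zero_grad()
        f(x).backward()
        opt.step()
    print(f"start {x0:+.1f} -> x = {x.item():+.3f}, f(x) = {f(x).item():+.3f}")
# Starting at +2.0 converges to the shallow local minimum, not the global one.
```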