Recent comments in /f/MachineLearning
biggieshiba t1_jdojnn6 wrote
Reply to comment by hangtime79 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
I don't understand why anyone would care; in a few years half the internet will be AI-generated. If someone uses GPT-4 to generate a sentence that gets posted on Wikipedia, how will you know before using it? Don't you think many models will train on that sentence?
Plus, how would they even know? Training data is not easy to extract from a model. Unless you are a direct OpenAI competitor, they won't ever care or even look at you (well, maybe their superAI will).
Lastly, the dataset is full of errors; it would be better to generate it again, and even paying people would be quite cheap for 50k examples. It's quite a bad dataset when you really look at it: empty inputs or outputs, unclear instructions, instructions not suited to the model... The fact that it is bad and small is actually very encouraging, BTW, since the model still performs pretty well.
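To make that concrete, a quick filter pass shows how cheap such a check is. A minimal sketch, assuming the standard Alpaca-style JSON layout (`instruction`/`input`/`output` fields); the sample records here are made up for illustration:

```python
def broken(example):
    """True if an Alpaca-style record has an empty instruction or output."""
    return (not example.get("instruction", "").strip()
            or not example.get("output", "").strip())

# Tiny inline sample standing in for the real 50k-example file.
sample = [
    {"instruction": "Name a color.", "input": "", "output": "Blue."},
    {"instruction": "", "input": "", "output": "Orphan answer."},
    {"instruction": "Summarize.", "input": "Some text.", "output": ""},
]

bad = [ex for ex in sample if broken(ex)]
print(f"{len(bad)} of {len(sample)} examples have empty fields")
```

In a real cleanup you would load the full dataset with `json.load` and also screen for the fuzzier failure modes (unclear or model-unsuitable instructions), which need human or model-assisted review rather than a field check.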
blose1 t1_jdoj8kl wrote
Reply to comment by WonderFactory in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
GPT models struggle with out-of-distribution programming tasks, which means they can't create novel ideas. I tested this myself many times, and it's not a prompt-engineering issue. I think LLMs could act as great teachers, but not researchers: teachers just teach what we already know, while researchers create the novel knowledge that teachers use.
ILOVETOCONBANDITS t1_jdoj5e7 wrote
Reply to comment by nokpil in [D] ICML 2023 Reviewer-Author Discussion by zy415
I similarly had a reviewer raise their score from a 3 to a 6!
lanky_cowriter t1_jdoeyi4 wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
Sam talked about this on the Lex Fridman podcast; it's not true.
big_ol_tender t1_jdoe9k1 wrote
The Alpaca dataset is not open source, so alpaca-lora is not open source either.
[deleted] t1_jdoe3wi wrote
Reply to comment by Zealousideal_Low1287 in [D] Do you use a website or program to organise and annotate your papers? by who_here_condemns_me
[removed]
itshouldjustglide t1_jdoazux wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Don't bigger models need more data so that all of the neurons can be trained, reducing unnecessary noise and randomness?
frequenttimetraveler t1_jdo9gw5 wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
Altman did not say anything about that on the Lex Fridman show. He said the 100T rumor was just a meme.
How would runtime scale with parameter count? Can we infer whether the 1T figure is true from the latency of the responses?
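For a rough back-of-envelope answer (my own sketch, not something from the thread): a dense transformer does on the order of 2 FLOPs per parameter per generated token, so per-token latency scales roughly linearly with parameter count; but batching, hardware, quantization, and sparsity (e.g. mixture-of-experts) make it hard to infer the true size from observed latency alone:

```python
# Rough per-token inference cost for a dense transformer:
# ~2 FLOPs per parameter per generated token (forward pass only).
def flops_per_token(n_params):
    return 2 * n_params

# Compare a known 175B model against the rumored 1T figure.
for n in (175e9, 1e12):
    print(f"{n:.0e} params -> {flops_per_token(n):.1e} FLOPs/token")
```

Under this naive model a dense 1T network would be roughly 5-6x slower per token than a 175B one on the same hardware, which is why observed ChatGPT-like latencies are at best weak evidence either way.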
KarlKani44 t1_jdo96om wrote
Reply to comment by snylekkie in [D]: Different data normalisation schemes for GANs by Blutorangensaft
Thanks! :) I've been working as a machine learning engineer for around 2 years at quite a big company (not FAANG). There's a good chance that you have used models I trained! I also did my master's degree mostly in the field of image generation.
LahmacunBear t1_jdo7k0w wrote
Here's a thought: the original 175B GPT-3, with the best stuff thrown at it, performed as it did. Add ChatGPT's training tricks, and suddenly the same size performs magnitudes better. I doubt that current LLMs are fully efficient; just as with GPT-3 to 3.5, we can keep getting much better results at the same size, and therefore today's results from much smaller models.
kitmiauham t1_jdo5qst wrote
Reply to comment by Maleficent_Refuse_11 in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII
I think the authors of the paper have more than just a basic understanding of how transformers work.
snylekkie t1_jdo5afc wrote
Reply to comment by Jean-Porte in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Absolutely mental
snylekkie t1_jdo56fy wrote
Reply to comment by KarlKani44 in [D]: Different data normalisation schemes for GANs by Blutorangensaft
You seem knowledgeable. Do you work in ML ?
londons_explorer t1_jdo4kj3 wrote
Reply to comment by farmingvillein in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Think how many hard drives there are in the world...
All of that data is potential training material.
I think a lot of companies/individuals might give up 'private' data in bulk for ML training if they get a viable benefit from it (for example, a version of ChatGPT with perfect knowledge of all my friends and neighbours, what they like and do, etc., would be handy).
baffo32 t1_jdo24su wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
It is the same thing. The alpaca data is just further pretraining data consisting of instructions and responses. Doing this is called finetuning.
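To illustrate why it's "the same thing": the objective is still next-token prediction; the instruction/response pairs are simply serialized into plain text before training. A hedged sketch using an Alpaca-style prompt template (the template wording here is illustrative, not the exact one from the repo):

```python
# Instruction finetuning reuses the pretraining objective (next-token
# prediction); each record is just flattened into one training string.
def to_training_text(example):
    """Serialize an Alpaca-style record into a single prompt+response text."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

text = to_training_text(
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
)
print(text)
```

The model is then trained on these strings exactly as it would be on any other corpus, which is why the line between "further pretraining" and "finetuning" here is mostly about the data, not the mechanism.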
farmingvillein t1_jdo16sz wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
> This 17 page could be a few sentences.
> Tl;DR the authors wrote prompts to tell GPT-4 to fix code given some unit tests and the output of the broken code. It performs better than GPT-4 that doesn't have access to the output of the code execution.
I agree with your overall sentiment--the paper IMO could, at the very least, be substantially reorganized for clarity--but your summary isn't actually accurate, since the paper itself has nothing to do with coding(!).
The coding work is all in their blog post...
...which also suffers from the same issue: a long preamble to scroll down and find the core nugget.
_underlines_ t1_jdo0o85 wrote
Nice one.
I would add the natively trained Alpaca models, which exist besides alpaca-lora. See my model card for this:
and here's an overview of almost every LLM under the sun:
Blutorangensaft OP t1_jdnzzx1 wrote
Reply to comment by KarlKani44 in [D]: Different data normalisation schemes for GANs by Blutorangensaft
Love the visualisation, I will definitely do that. Thanks so much for answering all my questions.
KarlKani44 t1_jdnzo65 wrote
Reply to comment by Blutorangensaft in [D]: Different data normalisation schemes for GANs by Blutorangensaft
>Plotting the scores the critic assigns for real and fake samples separately? Or do you mean taking mean and standard deviation of the logits for real and fake data and comparing those?
Both of those work. I like to plot the critic's output for real samples as a histogram, and then do the same for generated samples. This shows you how well your critic separates real from fake samples. You can do this every few epochs during training. You should see that in early epochs the two histograms barely overlap, and as training progresses they get closer to each other.
It might look like this: https://imgur.com/a/OknV5l0
The left plot is from early training; the right is after some epochs, when the critic has partially converged. By the end they will overlap almost completely.
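A minimal sketch of that diagnostic, using synthetic Gaussian scores as stand-ins for real critic outputs (in practice you'd feed batches through the critic and draw the two histograms with `plt.hist(..., alpha=0.5)` to compare by eye):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic critic scores: early in training, real and fake are far apart;
# later, as the generator improves, the two distributions move together.
early_real = rng.normal(loc=2.0, scale=0.5, size=1000)
early_fake = rng.normal(loc=-2.0, scale=0.5, size=1000)
late_real = rng.normal(loc=0.3, scale=0.5, size=1000)
late_fake = rng.normal(loc=-0.3, scale=0.5, size=1000)

bins = np.linspace(-4, 4, 81)

def overlap(a, b):
    """Shared probability mass of two histograms (0 = separable, 1 = identical)."""
    ha, _ = np.histogram(a, bins=bins, density=True)
    hb, _ = np.histogram(b, bins=bins, density=True)
    return float(np.minimum(ha, hb).sum() * np.diff(bins)[0])

print(f"early overlap: {overlap(early_real, early_fake):.2f}")
print(f"late overlap:  {overlap(late_real, late_fake):.2f}")
```

The overlap number is just a crude scalar summary of what the two histograms show; watching it rise over epochs tells the same story as the plots.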
Single_Blueberry t1_jdnyc2d wrote
Reply to comment by MassiveIndependence8 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Hmm I don't know. It's pretty bad at getting dead-on accurate results, but in many cases the relative error of the result is pretty low.
Blutorangensaft OP t1_jdnxm94 wrote
Reply to comment by KarlKani44 in [D]: Different data normalisation schemes for GANs by Blutorangensaft
I see, I will improve my critic then (maybe give it more depth) and abstain from tricks like TTUR for now.
What do you mean by "easily separable distribution of output logits", btw? Plotting the scores the critic assigns for real and fake samples separately? Or do you mean taking mean and standard deviation of the logits for real and fake data and comparing those?
sampdoria_supporter t1_jdnwwdd wrote
Reply to [D] Simple Questions Thread by AutoModerator
Does anybody else feel overwhelmed and frozen in the face of all these concurrent developments and releases? I can't seem to jump on much of what is going on, because it seems like the next day will just flip the table.
farmingvillein t1_jdnwda6 wrote
Reply to comment by londons_explorer in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> But apply those same tricks to a big model, and it works even better.
In general, yes, although there are many techniques that help small models that do not help large ones.
That said, agree with your overall point. I think the only reason we won't see model sizes continue to inflate is if 1) there are substantial underlying architecture discoveries (possible!) or 2) we really hit problems with data availability. But synthetic + multi-modal probably gives us a ways to go there.
[deleted] t1_jdnw01y wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
[deleted]
SWESWESWEh t1_jdolo78 wrote
Reply to comment by machineko in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
M1 macbook pro