Submitted by Vegetable-Skill-9700 t3_121agx4 in deeplearning
BellyDancerUrgot t1_jdno8w6 wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That paper is laughable and a meme. My Twitter feed has been spammed by people tweeting about this paper, and as someone in academia it’s sad to see the quality of research publications be this low. I can’t believe I’m saying this as a student of deep learning, but Gary Marcus is actually right in his latest blog post.
StrippedSilicon t1_jdnukc7 wrote
People who point to this paper to claim sentience or AGI or whatever are obviously wrong; it's nothing of the sort. Still, saying that it's just memorizing is also very silly, given that it can answer questions that aren't in the training data, or even particularly close to anything in the training data.
BellyDancerUrgot t1_jdpa0mz wrote
Tbf, I think I went a bit too far when I said it has everything memorized. But it also has access to an internet's worth of contextual information on basically everything that has ever existed. So even though it’s wrong to say it’s 100% memorized, it’s still just intelligently regurgitating information it has learnt, with new context. Being able to re-contextualize information isn’t a small feat, mind you. I think GPT is amazing, just like I found the original diffusion paper and WGANs to be. It’s just really overhyped to be something it isn’t, and it fails quite spectacularly on logical and factual queries. It cites things that don’t exist, and makes mistakes on simple problems while solving more complex ones. That's a telltale sign of the model lacking a fundamental understanding of the subject.
StrippedSilicon t1_jdrldvz wrote
"Recontextualizing information" is not an unfair description, but I'm not sure that it really explains things like the example in 4.4, where it answers a math Olympiad question that there's no way was in the training set (assuming that they're being honest about the training set). I don't know how a model can arrive at the answer it does without some kind of deeper understanding than just putting existing information together in a different order. Maybe the most correct thing is simply to admit we don't really know what's going on, since 100 billion parameters, or however big GPT-4 is, is beyond a simple interpretation.
"Open"AI's recent turn to secrecy isn't helping things either.
BellyDancerUrgot t1_jds7iva wrote
The reason I say it’s a recontextualization and lacks deeper understanding is that it doesn’t hallucinate sometimes; it hallucinates all the time, and sometimes the hallucinations align with reality, that’s all. Take this thread, for example:
- https://twitter.com/ylecun/status/1639685628722806786?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
- https://twitter.com/stanislavfort/status/1639731204307005443?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
- https://twitter.com/phillipharr1s/status/1640029380670881793?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
A system that fully understood the underlying structure of the question would not give you varying answers with the same prompt.
"Inconclusive" is the third likeliest answer. Despite the prompt having a big bias toward the correct answer (keywords like "dubious", for example), it still makes mistakes on a rather simple question. Sometimes it does get it right with the bias, and sometimes even without the bias.
Language, imo, lacks causality for intelligence, since it’s a mere byproduct of intelligence. Which is why these models, imo, hallucinate all the time; sometimes the hallucinations line up with reality and sometimes they don’t. The likelihood of the prior is just increased because of the huge training set size.
StrippedSilicon t1_jdt7h5o wrote
So... how exactly does it solve a complicated math problem it hasn't seen before by only regurgitating information?
BellyDancerUrgot t1_jdtci38 wrote
Well, let me ask you: how does it fail simple problems if it can solve more complex ones? If you solve these problems analytically, then it stands to reason that you wouldn’t make an error, ever, on a question as simple as that.
StrippedSilicon t1_jdte8lj wrote
That's why I'm appealing to the "we don't actually understand what it's doing" case. Certainly the AGI-like intelligence explanation falls apart in a lot of cases, but the explanation of only spitting out the training data in a different order or context doesn't work either.