Submitted by Cool_Abbreviations_9 t3_123b66w in MachineLearning
[deleted] t1_jdu1mz6 wrote
[deleted]
MysteryInc152 t1_jdu4sl2 wrote
In the GPT-4 technical paper, we see that base GPT-4 has really good calibration; that is, its confidence is directly correlated with its ability to solve problems. But apparently the RLHF they did knocked some of that out.
arg_max t1_jdud5wz wrote
But we don't know if the text output actually gives us access to those confidences or if it is just making them up, do we?
meister2983 t1_jdwswgt wrote
I asked it a bunch of factual questions about less commonly known stuff. It's either hallucinating or its confidence numbers are so poorly calibrated that they're useless.
was_der_Fall_ist t1_jdugi0b wrote
I’ve heard the RLHF change explained as actually a good thing, though. Here’s an example:
Say you ask it a question to which it assigns 90% probability to answer X and 10% probability to answer Y. Base GPT-4 gives the answers in these proportions: 90% of the time it says X and 10% of the time it says Y.
But if it’s 90% sure the answer is X, you don’t want it to say Y is the answer at all, even 10% of the time! It’s better for it to always say X. (Though the best may be to give a thorough account of its respective probability assessments.) So RLHF improves the behavior of the model by uncalibrating the rate of responses from their probabilities.
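The difference between the two behaviors can be sketched in a few lines of Python. This is a toy illustration that assumes a fixed belief of 90% for X and 10% for Y; `belief`, `sample_answer`, and `greedy_answer` are hypothetical names, not anything from GPT-4 itself.

```python
import random
from collections import Counter

# Hypothetical model belief over two candidate answers.
belief = {"X": 0.9, "Y": 0.1}

def sample_answer(dist, rng):
    """Base-model-like behavior: answer in proportion to belief."""
    answers = list(dist)
    weights = [dist[a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]

def greedy_answer(dist):
    """The behavior attributed to RLHF above: always give the most likely answer."""
    return max(dist, key=dist.get)

rng = random.Random(0)
sampled = Counter(sample_answer(belief, rng) for _ in range(10_000))
greedy = Counter(greedy_answer(belief) for _ in range(10_000))

print(sampled)  # roughly 90% X and 10% Y
print(greedy)   # Counter({'X': 10000})
```

In both cases the underlying belief is the same; only the rule for turning belief into an answer changes.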
astrange t1_jdujlcf wrote
This is why people are wrong when they say GPT "just outputs the most probable next word". It's the most probable /according to itself/, and the model has been trained to lie such that the most useful word is the most probable one.
was_der_Fall_ist t1_jduk3s8 wrote
They’re also not realizing that even if the goal is to produce the most probable/useful next word, that doesn’t preclude the neural network from doing other complicated operations in order to figure out the most probable/useful word.
bpooqd t1_jdun73m wrote
I suspect those people believe that GPT-4 is actually a Markov chain.
IDe- t1_jdv5f5b wrote
I mean it is a (higher order) Markov chain.
Gh0st1y t1_jdvr5qr wrote
Yeah but so are we haha
sineiraetstudio t1_jdws5js wrote
All higher-order Markov chains can be modeled as first-order Markov chains by squashing states together.
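To make the state-squashing argument concrete: a second-order chain conditions on the last two symbols, and treating each pair of symbols as a single state yields an ordinary first-order chain. A minimal sketch with a made-up two-symbol transition table (`second_order` and `squash_to_first_order` are illustrative names):

```python
# Hypothetical second-order chain over symbols "a" and "b":
# P(next | previous two symbols).
second_order = {
    ("a", "a"): {"a": 0.1, "b": 0.9},
    ("a", "b"): {"a": 0.6, "b": 0.4},
    ("b", "a"): {"a": 0.5, "b": 0.5},
    ("b", "b"): {"a": 0.8, "b": 0.2},
}

def squash_to_first_order(table):
    """Turn a second-order chain into a first-order chain over pair states.

    From pair state (x, y), emitting z moves the chain to pair state (y, z),
    so the next state depends only on the current (pair) state.
    """
    first_order = {}
    for (x, y), next_dist in table.items():
        first_order[(x, y)] = {(y, z): p for z, p in next_dist.items()}
    return first_order

first_order = squash_to_first_order(second_order)
for state, transitions in first_order.items():
    print(state, "->", transitions)
```

The same construction works for any fixed order k, with states being length-k windows; the number of states grows exponentially in k.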
AndreasVesalius t1_jdx7r37 wrote
It’s just a bunch of if/else statements
light24bulbs t1_jduwgqt wrote
Yeah, like it's actually using a huge amount of brain power to figure out what the next word is. Just because that's how it works doesn't mean it's not intelligent.
If you want to be really good at figuring out what the next word is you have to be really smart
bartvanh t1_jdyd6om wrote
Ugh, yes it's so frustrating to see people not realizing this bit all the time. And also kind of painful to imagine that (presumably - correct me if I'm wrong) all those internal "thoughts" are probably discarded after each word, only to be painstakingly reconstructed almost identically for predicting the next word.
was_der_Fall_ist t1_je3ng6m wrote
Maybe that’s part of the benefit of using looped internal monologue/action systems. By having them iteratively store thoughts and other intermediate outputs in their context window, they no longer have to use the weights of the neural network to “re-think” every thought each time they predict a token. They could think more effectively by using their computation to do other operations that take the internal thoughts and actions as their basis.
Uptown-Dog t1_jdw6kh1 wrote
Okay wow. I needed this comment. Thanks.
ntaylor- t1_je11vt1 wrote
Fairly sure the "final" GPT-4 model is still using a generate function that predicts one token at a time. It's just that the training, via RLHF, was good and complicated. After training, it's not doing any "complicated operations".
was_der_Fall_ist t1_je15397 wrote
You don’t think the neural network, going through hundreds of billions of parameters each time it calculates the next token, is doing anything complicated?
ntaylor- t1_je5qtl2 wrote
Nope. It's the same as all neural networks using the transformer architecture: just a big old series of matrix multiplications with some non-linear transformations, at the end of the day.
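For reference, "matrix multiplications with some non-linear transformations" at its smallest looks like the toy block below: linear map, element-wise non-linearity, linear map. It is a pure-Python sketch with made-up weights, not an actual transformer layer (no attention, normalization, or residual connections).

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    """Element-wise non-linearity."""
    return [max(0.0, a) for a in v]

def feed_forward(x, W1, b1, W2, b2):
    """Linear -> non-linearity -> linear."""
    hidden = relu(add(matvec(W1, x), b1))
    return add(matvec(W2, hidden), b2)

# Made-up weights for a 2 -> 3 -> 2 network.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, 0.0, -1.0]
W2 = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
b2 = [0.0, 0.5]

print(feed_forward([1.0, 2.0], W1, b1, W2, b2))  # [2.0, 0.0]
```

The sketch only shows what the primitive operations are; whether stacking billions of such parameters amounts to "complicated operations" is the point being argued in this thread.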
was_der_Fall_ist t1_je6lfl9 wrote
Why are matrix multiplications mutually exclusive with complicated operations?
A computer just goes through a big series of 0s and 1s, yet through layers of abstraction they accomplish amazing things far more complicated than a naive person would think 0s and 1s could represent and do. Why not the same for a massive neural network trained via gradient descent to maximize a goal by means of matrix multiplication?
Rioghasarig t1_jdxrp3y wrote
No, they were right about the base model of GPT, as the base model was trained simply to predict the next word. ChatGPT and GPT-4 have evolved beyond that (with things like RLHF).
astrange t1_jdy6d4f wrote
But nobody uses the base model, and when they did use it, it was only interesting because it fails to predict the next word and therefore generates new text. A model that successfully predicts the next word all the time given existing text would be overfitting, since it would only produce things you already have.
Rioghasarig t1_jdz24za wrote
People were using the base model when it first came out, and some people are still using it today. The game AI Dungeon still runs on what is essentially a transformer trained on next-token prediction. So it would be accurate to say "It just outputs (or attempts to output) the next most probable word".
ntaylor- t1_je11iqf wrote
But eventually, after RLHF, the gpt4 model is one final fixed model and still presumably uses a generate function that will be predicting next tokens based on the previous, as base gpt models/any autoregressive model does. At least that's what it seems to be doing.
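A minimal sketch of the kind of "generate function" being described: an autoregressive loop that repeatedly asks a model for a next-token distribution and appends the chosen token. `toy_next_token_probs` is a hypothetical stand-in (a bigram table) for a real network, and greedy selection is assumed; real deployments may sample instead.

```python
# Hypothetical stand-in for a language model: next-token probabilities
# conditioned only on the last token (a bigram table).
BIGRAMS = {
    "<s>": {"the": 0.7, "a": 0.3},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 0.8, "cat": 0.2},
    "cat": {"sat": 0.9, "</s>": 0.1},
    "dog": {"ran": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def toy_next_token_probs(context):
    """Return P(next token | context); a real model would use the whole context."""
    return BIGRAMS[context[-1]]

def generate(prompt, max_new_tokens=10):
    """Greedy autoregressive decoding: one token per step, fed back as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = toy_next_token_probs(tokens)
        next_token = max(probs, key=probs.get)
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["<s>"]))  # ['<s>', 'the', 'cat', 'sat']
```

The loop is the same whether the model behind it is a base model or an RLHF-tuned one; what RLHF changes is the distribution returned at each step.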
muskoxnotverydirty t1_jdv9m5v wrote
"Temperature" governs this behavior, doesn't it? I was under the impression that when you set temperature to zero, you get a deterministic output because it always selects the most probable token.
[deleted] t1_jdvg9z4 wrote
[deleted]
muskoxnotverydirty t1_jdw39vd wrote
How so?
[deleted] t1_jdx1tgn wrote
[deleted]
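On the temperature question above, here is a generic sketch of how temperature is usually applied: logits are divided by T before the softmax, and as T shrinks toward 0 the probability mass concentrates on the top token, which is why T = 0 is typically implemented as plain argmax (greedy decoding). The logits are invented, and this is not a description of any specific API's internals.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpening (T < 1) or flattening (T > 1)."""
    if temperature == 0:
        # Limit as T -> 0: all mass on the argmax (greedy decoding).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up logits for three candidate tokens
for t in (1.0, 0.5, 0.1, 0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Temperature only reshapes the sampling distribution at decode time; it does not change the model's logits, so it is a separate knob from whatever RLHF did to calibration.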
MysteryInc152 t1_jdvqj47 wrote
That's not what I meant in regards to calibration. It's not about saying an answer x% of the time or not. It's about being able to correctly estimate gaps in knowledge.
Good calibration is what you want.
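One common way to quantify calibration is expected calibration error (ECE): bucket predictions by stated confidence and compare each bucket's average confidence with its actual accuracy. A sketch with made-up (confidence, correct) data; the 10 equal-width bins are an assumption, not necessarily what the GPT-4 report used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy across bins."""
    assert len(confidences) == len(correct)
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Confidence 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

# Made-up data: both predictors are right 8 times out of 10.
outcomes = [True] * 8 + [False] * 2
calibrated = expected_calibration_error([0.8] * 10, outcomes)
overconfident = expected_calibration_error([0.99] * 10, outcomes)
print(round(calibrated, 3), round(overconfident, 3))
```

Both predictors have identical accuracy; only the overconfident one gets a large ECE, which is why calibration and accuracy are reported separately.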
was_der_Fall_ist t1_jdw2ya2 wrote
Check out this LessWrong thread in the comments.
Paul Christiano, alignment researcher at ARC and previously OpenAI, explains the RLHF change exactly the way I did (because I was pretty much quoting him), and someone replies:
> Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for the answer, then asked the model what probability it thinks that that answer is correct. Under this assumption, it looks like the pre-trained model outputs the correct probability, but the RLHF model gives exaggerated probabilities because it thinks that will trick you into giving it higher reward.
And Paul replies:
> Yes, I think you are misunderstanding figure 8. I don't have inside information, but without explanation "calibration" would almost always mean reading it off from the logits. If you instead ask the model to express its uncertainty I think it will do a much worse job, and the RLHF model will probably perform similarly to the pre-trained model. (This depends on details of the human feedback, under a careful training regime it would probably get modestly better.)
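"Reading it off from the logits" can be made concrete: take the model's scores for each answer option and normalize them, instead of asking the model to state a number in text. A sketch for a multiple-choice question, assuming you already have per-option logits (the values are invented; how to obtain them depends on the model and API, and the exact procedure behind Figure 8 is not given here).

```python
import math

def option_probabilities(option_logits):
    """Softmax restricted to the answer-option tokens (e.g. A/B/C/D)."""
    m = max(option_logits.values())
    exps = {k: math.exp(v - m) for k, v in option_logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Invented logits for the tokens " A", " B", " C", " D".
logits = {"A": 3.2, "B": 1.1, "C": 0.4, "D": -0.5}
probs = option_probabilities(logits)
answer = max(probs, key=probs.get)
# The model's answer and its logit-based confidence.
print(answer, round(probs[answer], 3))
```

A confidence obtained this way can then be scored for calibration across many questions; a number the model writes out in text is a different quantity.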
meister2983 t1_jdwt675 wrote
Also, this is for multiple-choice questions (MMLU). I don't think they reported whether the pre-RLHF model's confidence numbers on fill-in-the-blank world-fact questions aligned with reality.
sineiraetstudio t1_jdvvvdb wrote
... that's not what's happening, though? The calibration error causes it to increase its confidence in low-accuracy answers and decrease it in medium-to-high-accuracy answers, making it more likely to output wrong answers. Maybe you're confusing it with using a different sampler? Something like top-p already does what you mentioned.
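For comparison, top-p (nucleus) sampling keeps the smallest set of most-likely tokens whose cumulative probability reaches p, renormalizes, and samples only from that set. A minimal sketch over a made-up distribution; `top_p_filter` and `sample_top_p` are illustrative names.

```python
import random

def top_p_filter(probs, p):
    """Keep the most likely tokens until their cumulative mass reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

def sample_top_p(probs, p, rng):
    """Sample one token from the top-p-filtered distribution."""
    filtered = top_p_filter(probs, p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]

probs = {"X": 0.9, "Y": 0.07, "Z": 0.03}  # made-up next-token distribution
print(top_p_filter(probs, 0.9))   # only "X" survives
print(top_p_filter(probs, 0.95))  # "X" and "Y" survive
```

With a small enough p, this reduces to always picking the single most likely token, i.e. the "always say X" behavior, without altering the model's underlying probabilities.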
was_der_Fall_ist t1_jdw2fud wrote
I’m pretty much just quoting Paul Christiano, alignment researcher at ARC and previously OpenAI, in a comment thread on this LessWrong post.
Someone comments pretty much the same thing the person I replied to did:
> “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.” What??? This is so weird and concerning.
To which Paul replies:
> If I ask a question and the model thinks there is an 80% chance the answer is "A" and a 20% chance the answer is "B," I probably want the model to always say "A" (or even better: "probably A"). I don't generally want the model to say "A" 80% of the time and "B" 20% of the time.
>In some contexts that's worse behavior. For example, if you ask the model to explicitly estimate a probability it will probably do a worse job than if you extract the logits from the pre-trained model (though of course that totally goes out the window if you do chain of thought). But it's not really lying---it's also the behavior you'd expect out of a human who is trying to be helpful.
>More precisely: when asked a question the pre-trained model outputs a probability distribution over what comes next. If prompted correctly you get its subjective probability distribution over the answer (or at least over the answer that would appear on the internet). The RLHF model instead outputs a probability distribution over what to say next which is optimized to give highly-rated responses. So you'd expect it to put all of its probability mass on the best response.
>… If it is forced to say either "yes" or "no" the RLHF model will just give the more likely answer 100% of the time, which will show up as bad calibration on this graph. The point is that for most agents "the probability you say yes" is not the same as "the probability you think the answer is yes." This is the case for pretrained models.
sineiraetstudio t1_jdwbuig wrote
I don't see how this argues that it's a good thing; it's just a justification (which I'd expect from Paul Christiano, he's a huge fan of RLHF). The model is becoming overconfident in its answers. How could you possibly spin that as a positive?
was_der_Fall_ist t1_jdwdxut wrote
My understanding is that rather than being overconfident in their answers, they simply produce the answer they’re most confident in instead of differentially saying each answer proportional to how confident they are. This seems similar to how humans work — if you ask me a yes or no question and I’m 80% sure the answer is yes, I’m going to say “yes” every time; I’m not going to say “no” 20% of the times you ask me, even though I assign a 20% chance that “no” is correct. In other words, the probability I say yes is not the same as the probability I assign to yes being correct. But I admit there are subtleties to this issue with which I am unfamiliar.
sineiraetstudio t1_jdws2iv wrote
(The graph doesn't give enough information to determine whether it's actually becoming more confident in its high-confidence answers, but it sounds like a reasonable enough rationale.)
I'm not sure I understand what distinction you're trying to draw. The RLHF'd version assigns higher confidence to answers than it actually gets correct, unlike the original pre-trained version. That's literally the definition of overconfidence.
You might say that this is more "human-like", but being human-like doesn't mean that it's good. If you want only the most likely answer, you can already get that via the sampler, while on the other hand, calibration errors are a straight-up downside, as Paul Christiano explicitly mentions in the part you quoted. If you need accurate confidence scores (e.g., because you only want to act when you're certain), being well calibrated is essential.
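The "only act if you're certain" use case can be sketched as selective answering: answer when the stated confidence clears a threshold, abstain otherwise. The predictions below are invented. With the overconfident scores, the 0.9 threshold lets every answer through even though 30% of them are wrong.

```python
def selective_accuracy(predictions, threshold):
    """Return (fraction answered, accuracy on answered) for confidence >= threshold.

    predictions: list of (confidence, is_correct) pairs.
    """
    answered = [ok for conf, ok in predictions if conf >= threshold]
    if not answered:
        return 0.0, None
    coverage = len(answered) / len(predictions)
    accuracy = sum(answered) / len(answered)
    return coverage, accuracy

# Invented data: both models are right on the same 7 of 10 questions,
# but the overconfident one reports 0.95 on everything.
correctness = [True] * 7 + [False] * 3
informative = list(zip([0.95] * 7 + [0.4] * 3, correctness))
overconfident = list(zip([0.95] * 10, correctness))

print(selective_accuracy(informative, 0.9))    # (0.7, 1.0)
print(selective_accuracy(overconfident, 0.9))  # (1.0, 0.7)
```

The threshold is only meaningful if the confidence scores track actual accuracy, which is exactly what calibration measures.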
was_der_Fall_ist t1_jdwz4qw wrote
I think you make a good point. We probably need better methods of post-training LLMs. But it does seem like the current regime is still sometimes more useful than the pre-trained model, which Christiano also says. It's only in some contexts that this behavior is worse. I'm not sure it's really better than top-p sampling, though. But RLHF models do seem pretty useful.
sineiraetstudio t1_jdymf8q wrote
Oh, RLHF absolutely has all sorts of benefits (playing with top-p only makes answers more consistent, but sometimes you want to optimize for something other than "most likely"), so it's definitely here to stay (for now?); it's just not purely positive. Ideally we'd have an RLHF version that's still well calibrated (or, even better, some way to determine confidence without looking at logits that also works with chain-of-thought prompting).
meister2983 t1_jdwu6ig wrote
It's necessary to improve overall performance; GPT-4 isn't just a thing to answer multiple choice questions.
E.g., accuracy on adversarial questions (TruthfulQA) goes from 40% to 60%.
sineiraetstudio t1_jdwvmxb wrote
Are you talking about RLHF in general? I'm specifically referring to the calibration error, which is separate from accuracy.
meister2983 t1_jdx06k9 wrote
Yes. RLHF both increases accuracy on certain tests while decreasing calibration on others.
quantic-dream t1_jdz2gq9 wrote
I am a noob in ML, but as I understand it, GPT generates one word at a time. Could it be that, for example, one particular word somewhere in the middle was generated with a confidence of only 0.1 (the highest GPT could get for that position), and everything after that word became a hallucination?
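One way to probe this idea is to look at per-token probabilities, where an API exposes them, and flag the first token whose probability is low. A sketch assuming you already have (token, probability) pairs; the 0.2 threshold and the example values are arbitrary, and a low-probability token does not by itself prove that what follows is a hallucination.

```python
def first_low_confidence_token(token_probs, threshold=0.2):
    """Return (index, token, prob) of the first token below threshold, or None."""
    for i, (token, prob) in enumerate(token_probs):
        if prob < threshold:
            return i, token, prob
    return None

# Invented per-token probabilities for a generated sentence.
generated = [
    ("The", 0.95), ("paper", 0.80), ("was", 0.97), ("published", 0.90),
    ("in", 0.98), ("1987", 0.10), (",", 0.92), ("by", 0.88),
]
print(first_low_confidence_token(generated))  # (5, '1987', 0.1)
```

Note that the tokens after the flagged one can look confident precisely because they are conditioned on the earlier low-probability choice, which is the mechanism the question describes.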
Cool_Abbreviations_9 OP t1_jdu3xlj wrote
It does appear to have some calibration capabilities
Rioghasarig t1_jdxs956 wrote
I really don't think your experiment makes much sense. Even if we could determine GPT's confidence level, there's no reason to believe that asking it for its confidence level is an effective way of determining its actual confidence. As other people have asked, the obvious question is: "What's your confidence in these confidence reports?" The logic is baseless.
Gh0st1y t1_jdvqlgo wrote
I really do wonder if it's able to recognize its own uncertainty. It seems able to, from the OP and my own chats with it, but idk how I'd test it more rigorously.
iJeff t1_jdvsctx wrote
Although it can seem to work to some degree, this does seem to be the case. Bing Chat is generally a better option for this, because it will provide a citation for its claims. Visiting those citations can help you figure out whether it was merely hallucinating.
 [D]GPT-4 might be able to tell you if it hallucinated