Recent comments in /f/MachineLearning
sineiraetstudio t1_jdwvmxb wrote
Reply to comment by meister2983 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Are you talking about RLHF in general? I'm specifically referring to the calibration error, which is separate from accuracy.
meister2983 t1_jdwu6ig wrote
Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
It's necessary to improve overall performance; GPT-4 isn't just a tool for answering multiple-choice questions.
E.g., accuracy on adversarial questions (TruthfulQA) goes from 40% to 60%.
elkhornslew t1_jdwu61u wrote
What’s its confidence in its confidence scores?
meister2983 t1_jdwt675 wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Also, this is for multiple-choice questions (MMLU). I don't think they reported whether the pre-RLHF model's confidence numbers on fill-in-the-blank world facts aligned with reality.
Smallpaul t1_jdwt4ao wrote
So I guess LlamaIndex has nothing to do with Meta's LLaMA except that they both have "llama" in their names? They switched from one confusing name to another!
meister2983 t1_jdwswgt wrote
Reply to comment by arg_max in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
I asked a bunch of factual questions about less commonly known topics. It's either hallucinating or its confidence numbers are so poorly calibrated that they're useless.
turfptax OP t1_jdws798 wrote
Reply to comment by Badbabyboyo in [N] Predicting Finger Movement and Pressure with Machine Learning and Open Hardware Bracelet by turfptax
Thank you!
I have friends with other sensor systems, but my goal is also to provide a platform where all types of biometric sensors can be used with labels in a similar vein.
I'm working on the next prototype, which can be tested in the field with higher-bit ADCs.
The label system was the most important part: it simplifies the problem so that different sensors and configurations can be tested.
sineiraetstudio t1_jdws5js wrote
Reply to comment by IDe- in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Any higher-order Markov chain can be modeled as a first-order Markov chain by squashing states together.
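For concreteness, here's a minimal Python sketch of that state-squashing trick (toy transition table, made-up symbols): a second-order chain over symbols becomes a first-order chain whose states are pairs of symbols.

```python
from collections import defaultdict

# Hypothetical second-order transition table: P(next | prev2, prev1).
second_order = {
    ("a", "b"): {"a": 0.3, "b": 0.7},
    ("b", "a"): {"a": 0.6, "b": 0.4},
    ("a", "a"): {"a": 0.5, "b": 0.5},
    ("b", "b"): {"a": 0.2, "b": 0.8},
}

# Squash (prev2, prev1) into a single composite state; the result is a
# first-order chain over pairs: P((prev1, next) | (prev2, prev1)).
first_order = defaultdict(dict)
for (prev2, prev1), dist in second_order.items():
    for nxt, p in dist.items():
        first_order[(prev2, prev1)][(prev1, nxt)] = p

print(first_order[("a", "b")])  # {('b', 'a'): 0.3, ('b', 'b'): 0.7}
```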
sineiraetstudio t1_jdws2iv wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
(The graph doesn't give enough information to determine whether it's actually becoming more confident in its high-confidence answers, but it sounds like a reasonable enough rationale.)
I'm not sure I understand what distinction you're trying to draw. The RLHF'd version assigns higher confidence to its answers than its actual accuracy warrants, unlike the original pre-trained version. That's literally the definition of overconfidence.
You might say that this is more "human-like", but being human-like doesn't mean that it's good. If you only want the most likely answer, you can already get that via the sampler, while on the other hand calibration errors are a straight-up downside, as Paul Christiano explicitly mentions in the part you quoted. If you need accurate confidence scores (because, e.g., you only want to act when you're certain), being well-calibrated is essential.
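To make "calibration error" concrete, here's a minimal sketch of the usual binned measure (expected calibration error). The confidences and correctness labels below are made up; a well-calibrated model has a small gap between stated confidence and accuracy in each bin.

```python
import numpy as np

confidences = np.array([0.95, 0.9, 0.8, 0.6, 0.55, 0.99])  # model's stated confidence
correct = np.array([1, 1, 0, 1, 0, 1])                      # whether each answer was right

def expected_calibration_error(conf, correct, n_bins=5):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the fraction of samples in the bin
    return ece

print(expected_calibration_error(confidences, correct))
```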
Badbabyboyo t1_jdwreio wrote
Reply to [N] Predicting Finger Movement and Pressure with Machine Learning and Open Hardware Bracelet by turfptax
That's awesome! Keep up the good work. Most people don't realize we got voice-to-text technology about as far as it could go in the '90s, and it wasn't until it was combined with machine learning in the 2000s that it really improved to the point of being useful. A majority of future human-machine interfaces will probably have to be developed using machine learning, and this is a perfect example!
Alhoshka t1_jdwpjmt wrote
Reply to comment by brierrat in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Good catch! I didn't notice.
1bir t1_jdwodl0 wrote
Reply to [D] GPT4 and coding problems by enryu42
There's a version that interacts with Wolfram Alpha; does that do any better?
fmfbrestel t1_jdwmb7z wrote
Reply to comment by nonotan in [D] Can we train a decompiler? by vintergroena
Most of those problems are due to the input/memory limitations for general use. I can imagine locally hosted GPTs that have training access to an organization's source code, development standards, and database data structures. Such a system could be incredibly useful. Human developers would just provide the prompts, supervise, approve, and test new/updated code.
It would have to be locally hosted, because most orgs are NOT going to feed their source code to an outside agency, regardless of the promised efficiency gains.
ChezMere t1_jdwllmb wrote
Reply to comment by Colecoman1982 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
This, but unironically.
DirtyKinkyInLove t1_jdwlgmp wrote
Reply to comment by Hamoodzstyle in [D] GPT4 and coding problems by enryu42
It also reduces token usage. If the chatbot gives a wordy response, it takes up more space in the context window and the chatbot will forget its instructions sooner. If that sounds like gibberish, let me know and I'll break it down.
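As a rough illustration (assuming the tiktoken library is available; the model name and sample strings are just for show), you can see how much more of the context window a wordy reply consumes by counting its tokens:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

terse = "Yes."
wordy = ("Certainly! That is a great question, and I'd be happy to help. "
         "To summarize, the answer to your question is yes, and here is why...")

print(len(enc.encode(terse)))  # a couple of tokens
print(len(enc.encode(wordy)))  # dozens of tokens eating into the context window
```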
Taenk t1_jdwlejh wrote
Reply to comment by JohnyWalkerRed in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
The Open Assistant project is working on that as well.
JohnyWalkerRed OP t1_jdwjvxy wrote
Reply to comment by big_ol_tender in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Yeah, the Databricks Dolly post is funny to me because they're an enterprise software company and Dolly isn't really useful in the context they operate in. I guess they just wanted some publicity.
It looks like Open Assistant, when mature, could enable this. Although it seems the precursor to an Alpaca-like dataset is an RLHF model, which itself needs a human-labeled dataset, so that bottleneck needs to be solved too.
muskoxnotverydirty t1_jdwjc1w wrote
Reply to comment by tamilupk in [D] Will prompting the LLM to review it's own answer be any helpful to reduce chances of hallucinations? I tested couple of tricky questions and it seems it might work. by tamilupk
And this method doesn't have some of the drawbacks seen in OP's prompting. Giving an example of an incorrect response followed by self-correction within the prompt may make it more likely that the initial response is wrong, since that's the pattern you're showing it.
sad_dad_is_a_mad_lad t1_jdwhg8a wrote
OpenAI's commercial-use restrictions will not be easily enforced... they used copyrighted data to train their own models.
col-summers t1_jdwhbte wrote
Reply to [D] Can we train a decompiler? by vintergroena
Yes, that is obviously a hugely valuable application of machine intelligence. Want to work on it?
esquire900 t1_jdwh7v8 wrote
Reply to comment by quitenominal in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Yea I was afraid so, just hadn't found it. Thank you for pointing that out :)
was_der_Fall_ist t1_jdwdxut wrote
Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
My understanding is that rather than being overconfident in their answers, they simply produce the answer they're most confident in, instead of distributing their answers in proportion to how confident they are in each. This seems similar to how humans work: if you ask me a yes-or-no question and I'm 80% sure the answer is yes, I'm going to say "yes" every time; I'm not going to say "no" 20% of the times you ask me, even though I assign a 20% chance that "no" is correct. In other words, the probability that I say yes is not the same as the probability I assign to yes being correct. But I admit there are subtleties to this issue with which I am unfamiliar.
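A toy sketch of that distinction (made-up numbers): always answering with the most likely option versus sampling answers in proportion to confidence.

```python
import random

confidence = {"yes": 0.8, "no": 0.2}  # hypothetical calibrated beliefs

def answer_argmax(conf):
    # Always return the single most likely answer, like a person would.
    return max(conf, key=conf.get)

def answer_sampled(conf):
    # Return each answer with probability equal to the stated confidence.
    options, weights = zip(*conf.items())
    return random.choices(options, weights=weights, k=1)[0]

print(answer_argmax(confidence))                        # always 'yes'
print([answer_sampled(confidence) for _ in range(10)])  # 'yes' roughly 80% of the time
```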