Recent comments in /f/MachineLearning
was_der_Fall_ist t1_jdugi0b wrote
Reply to comment by MysteryInc152 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
I’ve heard the RLHF change explained as actually a good thing, though. Here’s an example:
Say you ask it a question to which it assigns 90% probability to answer X and 10% probability to answer Y. Base GPT-4 gives the answers in these proportions: 90% of the time it says X and 10% of the time it says Y.
But if it’s 90% sure the answer is X, you don’t want it to say Y is the answer at all, even 10% of the time! It’s better for it to always say X. (Though the best behavior might be to give a thorough account of its respective probability assessments.) So RLHF improves the model’s behavior by un-calibrating the rate of its responses from its internal probabilities.
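To make that concrete, here’s a toy Python sketch (a made-up two-answer distribution, not actual model internals) contrasting answering in proportion to the probabilities with always picking the most likely answer:

```python
import random

# Toy stand-in for the model's internal distribution over two candidate answers.
answer_probs = {"X": 0.9, "Y": 0.1}

def sampled_answer(probs):
    """Calibrated behavior: answer in proportion to the assigned probabilities."""
    answers, weights = zip(*probs.items())
    return random.choices(answers, weights=weights, k=1)[0]

def greedy_answer(probs):
    """Post-RLHF-style behavior: always give the single most likely answer."""
    return max(probs, key=probs.get)

# Over many questions, sampling says Y about 10% of the time even though the
# model "believes" X is far more likely; the greedy policy always says X.
samples = [sampled_answer(answer_probs) for _ in range(10_000)]
print("sampled Y rate:", samples.count("Y") / len(samples))  # ~0.10
print("greedy answer: ", greedy_answer(answer_probs))        # always X
```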
Colecoman1982 t1_jdug1m3 wrote
Yeah, but what's its confidence score for its confidence score calculation? /s
uristmcderp t1_jdueokz wrote
Reply to Have deepfakes become so realistic that they can fool people into thinking they are genuine? [D] by [deleted]
Sounds more like you're asking about digital make-up, which can range from instagram filters to virtual avatars. And yeah, we can't tell how much of their presented look is real without a reference.
But does it matter? These people create an identity that only exists in the digital world. Who cares what they look like in the real world if you're never going to see them in the real world?
oimrqs t1_jduemcp wrote
Reply to [D] Will prompting the LLM to review it's own answer be any helpful to reduce chances of hallucinations? I tested couple of tricky questions and it seems it might work. by tamilupk
In my mind this + plugins + modules (vision) is the next step. Am I crazy?
master3243 t1_jdue84p wrote
Reply to comment by BullockHouse in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
> The problem with your solution is that it probably biases the model towards making up some papers just to fit the prompt and have a mix.
That's a very important point: adding a conditional ('if p then q') to the prompt biases the model towards doing 'p' and then 'q' to satisfy the prompt, even though the condition would also be satisfied if it simply never did 'p'.
For a more concrete example, here's me asking ChatGPT to write two paragraphs:
1- Write a paragraph about zoos. [screenshot] (Notice how no elephants are mentioned.)
2- Write a paragraph about zoos with an 'if p then q' condition. [screenshot] (Notice how only this answer mentions elephants.)
SzilvasiPeter t1_jdudjj3 wrote
Reply to comment by brucebay in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Well, our own body is alien to us. The brain, the gut, the endocrine system, and so on. There are emergent complexities everywhere from giant black holes to a pile of dirt. It is the same with conceptual things like math or computer science. Simple axioms and logic gates lead to beautiful complex systems.
I guess we should get used to "not understanding" at this point.
master3243 t1_jdudizk wrote
Reply to comment by Borrowedshorts in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Who needs statistical tests with theoretical grounding and justified/repeatable results when you've got LLMs™
arg_max t1_jdud5wz wrote
Reply to comment by MysteryInc152 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
But we don't know if the text output actually gives us access to those confidences or if it is just making them up, do we?
nemesit t1_jdud09h wrote
Just have it give you the DOIs as a list, plus a script to verify that each one exists.
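Something along these lines would do it (a rough sketch: it just checks whether each DOI resolves at doi.org, so rate limits and publisher quirks may need extra handling):

```python
import requests

def doi_exists(doi: str) -> bool:
    """Check whether a DOI resolves at doi.org (a 3xx redirect means it exists)."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return 300 <= resp.status_code < 400  # unknown DOIs come back as 404

dois = [
    "10.48550/arXiv.1706.03762",  # "Attention Is All You Need" (real)
    "10.1234/definitely.not.a.real.doi",
]
for doi in dois:
    print(doi, "->", "found" if doi_exists(doi) else "NOT FOUND")
```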
timelyparadox t1_jduc2u7 wrote
Reply to [D] Will prompting the LLM to review it's own answer be any helpful to reduce chances of hallucinations? I tested couple of tricky questions and it seems it might work. by tamilupk
But isn't it still just simulating the text of a fact check instead of actually fact-checking?
WarAndGeese t1_jdubx7q wrote
Reply to comment by BullockHouse in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Also, if the second neural network runs as a separate internet-connected application, it can go out and verify the output of the first, send back its results, and tell the first to change or remove each paper it cannot find and verify. The second neural network can make errors as well, but chaining the systems together like this can reduce errors substantially.
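A rough sketch of that loop (the three helper functions are placeholders standing in for the two models and a citation lookup, not a real API):

```python
# Stubbed two-model verification loop: in practice each helper would be an API
# call to an LLM or to a citation/DOI search service.

def generate_answer(question: str) -> dict:
    # Placeholder for model 1: drafts an answer plus the papers it cites.
    return {"text": "...", "citations": ["Real Paper (2020)", "Made-Up Paper (2023)"]}

def citation_exists(citation: str) -> bool:
    # Placeholder for model 2 / an internet lookup verifying the citation.
    return "Made-Up" not in citation

def revise_answer(answer: dict, bad_citations: list) -> dict:
    # Placeholder for telling model 1 to drop or replace unverifiable papers.
    kept = [c for c in answer["citations"] if c not in bad_citations]
    return {"text": answer["text"], "citations": kept}

def answer_with_verification(question: str, max_rounds: int = 3) -> dict:
    answer = generate_answer(question)
    for _ in range(max_rounds):
        bad = [c for c in answer["citations"] if not citation_exists(c)]
        if not bad:
            break  # every remaining citation was verified
        answer = revise_answer(answer, bad)
    return answer

print(answer_with_verification("Which papers introduced the transformer?"))
```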
BullockHouse t1_jduba6v wrote
Keeping a second GPT-4 window open and asking it to verify information from the first seems to work pretty well. The models fail by guessing when uncertain, but they have no incentive to cooperate and back up one another's guesses. The problem with your solution is that it probably biases the model towards making up some papers just to fit the prompt and have a mix.
1stuserhere t1_jdub2yo wrote
Reply to [D] Definitive Test For AGI by jabowery
inb4 this post is part of the training set for the next generation of LLMs along with the comments, sarcasm and what not
vintergroena OP t1_jduazsh wrote
Reply to comment by bubudumbdumb in [D] Can we train a decompiler? by vintergroena
Exactly, a lot of the tech is already there; it's perhaps more about unobfuscating the code than decompiling it.
SoylentRox t1_jdu9ya6 wrote
Reply to comment by he_who_floats_amogus in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
So this is an open-domain hallucination:

> Closed domain hallucinations refer to instances in which the model is instructed to use only information provided in a given context, but then makes up extra information that was not in that context. For example, if you ask the model to summarize an article and its summary includes information that was not in the article, then that would be a closed-domain hallucination. Open domain hallucinations, in contrast, are when the model confidently provides false information about the world without reference to any particular input context.

They handled this via:

> For tackling open-domain hallucinations, we collect real-world ChatGPT data that has been flagged by users as being not factual, and collect additional labeled comparison data that we use to train our reward models.

Not very productive. The best way to check references would be a plugin plus instructions to the model to "check references". The model also needs RL training so that it will use the plugin, and use it correctly on the first try.
[deleted] t1_jdu9pvo wrote
Reply to comment by lambertb in [D] GPT4 and coding problems by enryu42
[removed]
s0n0fagun t1_jdu975r wrote
Reply to comment by currentscurrents in [D] Can we train a decompiler? by vintergroena
That depends on the language/compiler used. Java and C# have decompilers that turn out great code.
bubudumbdumb t1_jdu9382 wrote
Reply to [D] Can we train a decompiler? by vintergroena
I think that if you can get better variable names, that alone is already a big selling point.
bubudumbdumb t1_jdu90gu wrote
Reply to comment by Smallpaul in [D] Can we train a decompiler? by vintergroena
Friends working at rev.ng told me that it's very difficult to decompile back to the high-level structures actually used in the source code. C may have only a few ways to code a loop, but C++ has many, and figuring out the original source from assembly is very hard to achieve with rule-based systems.
masterofn1 t1_jdu8jug wrote
Reply to [D] Simple Questions Thread by AutoModerator
How does a Transformer architecture handle inputs of different lengths? Is the sequence length limit inherent to the model architecture or more because of resource issues like memory?
Ciber_Ninja t1_jdu856g wrote
Reply to comment by ngildea in [D] GPT4 and coding problems by enryu42
Try having it generate tests first. You gotta get it into the proper context.
he_who_floats_amogus t1_jdu8479 wrote
Reply to comment by Borrowedshorts in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
You could do that, but if it's just hallucinating the confidence intervals then it really isn't very neat. The language model can get very high reward for hallucinated responses on things like confidence intervals in particular, because hallucinated figures like these still produce very coherent responses.
Ciber_Ninja t1_jdu81zy wrote
Reply to comment by dimsumham in [D] GPT4 and coding problems by enryu42
It can in fact think in steps. All you have to do is ask it to. Multiple papers have shown that asking it to think in steps provides a significant increase in the accuracy of its answers.
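A minimal sketch of what that looks like (using the pre-1.0 `openai` ChatCompletion interface and a made-up coding task; the only important part is the added step-by-step instruction):

```python
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set in the environment

task = "Write a Python function that merges two sorted lists into one sorted list."

# Instead of sending the task directly, force intermediate steps: tests and
# reasoning first, code last.
stepwise_prompt = (
    "First write the test cases you would use to check a solution, then reason "
    "step by step about the algorithm, and only then write the final code.\n\n" + task
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": stepwise_prompt}],
)
print(response["choices"][0]["message"]["content"])
```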
ultraminxx t1_jdu7uz8 wrote
Reply to comment by currentscurrents in [D] Can we train a decompiler? by vintergroena
That said, it might also be a good approach to preprocess the input with a classical decompiler and then train a model to refactor the decompiled code so it becomes more readable.
passerby251 t1_jdugjjr wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
How will the AC handle the case where a reviewer shows up in the last two hours before the deadline and posts new questions? Even though we responded quickly, the reviewer never replied...