Recent comments in /f/MachineLearning

_faizan_ t1_jdwdwsm wrote

Reply to comment by iamspro in [N] ChatGPT plugins by Singularian2501

Is there an open implementation of Toolformer, or did you roll your own for fine-tuning? They did mention in their paper that they gave a few in-context examples of tool usage and then used GPT-J to label more text, which they finally used for fine-tuning. Did you follow a similar approach? I have been looking to reproduce Toolformer but am not sure where to even start.
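A rough sketch of the labeling step described above (few-shot prompting GPT-J to insert tool-call annotations, then fine-tuning on what it produces). This is not the paper's implementation; the prompt format, tool names, and generation settings below are assumptions:

```python
# Minimal sketch of Toolformer-style self-labeling, assuming a bracketed
# tool-call format; not the official implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

FEW_SHOT = """Insert an API call where a tool would help.
Input: The population of Spain is about 47 million.
Output: The population of Spain is about [QA("What is the population of Spain?")] 47 million.
Input: 365 days times 24 hours is 8760 hours.
Output: 365 days times 24 hours is [Calculator(365 * 24)] 8760 hours.
Input: {text}
Output:"""

def annotate(text: str) -> str:
    """Ask GPT-J to rewrite `text` with tool-call annotations inserted."""
    inputs = tokenizer(FEW_SHOT.format(text=text), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    return completion.strip().split("\n")[0]

# The paper then filters these samples (keeping only calls whose results reduce
# language-modelling loss) before fine-tuning on the annotated text.
```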

1

spacefoxy99 t1_jdwaxsh wrote

I tried with both 3.5 and 4 to create a simple memory game, and not only did it cut the code off halfway through, but the continued code didn't match what was happening in the first part and the code didn't work. I tried two other times over the course of this month, and the code was filled with errors and missing statements. GPT seems bad at coding, at least to me.

1

AlgoTrade t1_jdwa6it wrote

Hey everyone, I am looking for a way to take some old maps and overlay them using Google's overlay features.
Google is kind enough to overlay the maps for me if I give precise lat/long boundaries for the image, but I'm unsure of some of those lat/long values. Moving and centering the map works fine for me, but it is extremely manual. I was wondering if there are any tools or techniques out there to auto-tag maps/lines/boundaries? Any information helps, or even just a few key search terms to look for!
Thanks!
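For the lat/long-bounds part specifically, here is a minimal sketch of eyeballing an overlay against a base map, using folium rather than Google's own overlay tool; the image path and corner coordinates are placeholders:

```python
# Sketch only: overlay a scanned map on a slippy map given guessed corner
# coordinates, then nudge the bounds until the edges line up.
import folium
from folium.raster_layers import ImageOverlay

m = folium.Map(location=[40.0, -105.0], zoom_start=12)

ImageOverlay(
    image="old_map.png",                          # hypothetical scanned map file
    bounds=[[39.95, -105.10], [40.05, -104.90]],  # [[south, west], [north, east]], made up
    opacity=0.6,                                  # keep the base map visible for alignment
).add_to(m)

m.save("overlay_check.html")  # open in a browser and adjust bounds iteratively
```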

1

robobub t1_jdwa5wf wrote

Reply to comment by robobub in [D] GPT4 and coding problems by enryu42

I'll add this:

If it is possible for GPT to do 1+1, it can do a large number of them incrementally. It's not smart enough to do it all the time by planning ahead (you'll have more success if you encourage GPT to use train-of-thought reasoning, see here and here), but it's possible.
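As an illustration of the train-of-thought suggestion, a small sketch that asks for a running total instead of a one-shot answer. The model name and prompt wording are assumptions, and it targets the pre-1.0 openai Python client:

```python
# Sketch: encourage incremental arithmetic by asking for the running total.
import openai

numbers = [17, 42, 8, 93, 5]
prompt = (
    "Add these numbers one at a time, writing the running total after each "
    f"step, then state the final sum: {numbers}"
)

response = openai.ChatCompletion.create(
    model="gpt-4",           # assumed model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```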

1

adventuringraw t1_jdw6enx wrote

You're right that there isn't a system yet that has the power of an LLM without the risk of hallucinated 'facts' woven in, but I don't think it's fair to say 'we're a long ways from that'. There's a ton of research going into different ways to approach this problem; approaches involving a tool-using LLM seem likely to work even in the relatively short term (production models in the next few years, say), and that's only one approach.

I certainly don't think it's a /given/ that this problem will be solved soon, and I wouldn't bet money that you're wrong about it taking a long time to get it perfect. But I also wouldn't bet money that you're right, given all the progress being made on multiple fronts, the increasingly extreme focus by so many researchers and companies on this problem, and especially the fact that solutions like this are both promising and seemingly realistic. After all, if there's a sub-system that detects that an arXiv search should be used to verify a reference before giving it, you could at least eliminate hallucinated citations in this narrow area. The downside then might just be an incomplete overview of available papers, but it would remove any false papers from what the user sees.
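A rough sketch of that narrow citation check, using the public arXiv API; the matching logic is deliberately naive and only illustrates the idea:

```python
# Sketch: only surface a citation if its title actually comes back from arXiv.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def arxiv_titles(query: str, max_results: int = 5) -> list[str]:
    """Return titles of arXiv entries matching a title search."""
    params = urllib.parse.urlencode(
        {"search_query": f'ti:"{query}"', "max_results": max_results}
    )
    with urllib.request.urlopen("http://export.arxiv.org/api/query?" + params) as resp:
        feed = ET.fromstring(resp.read())
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    return [e.findtext("atom:title", namespaces=ns).strip()
            for e in feed.findall("atom:entry", ns)]

def looks_real(cited_title: str) -> bool:
    """Naive check: does some returned title contain (or match) the citation?"""
    hits = arxiv_titles(cited_title)
    return any(cited_title.lower() in t.lower() or t.lower() in cited_title.lower()
               for t in hits)

print(looks_real("Attention Is All You Need"))  # True; a hallucinated title should return False
```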

All that said, this only fixes formal citations with a somewhat bespoke system. Fixing ALL inaccurate facts probably won't be possible with even dozens of 'tools'... that'll take something more like what you're thinking, I imagine: a truly general learned knowledge graph embedded as a system component. I know there's work on that too, but when THAT's fully solved (like, TRULY solved, where modular elements of the world can be inferred from raw sensory data, and facts accumulated about their nature from interaction and written content), we'll be a lot closer to something that's arguably AGI, so... yeah. I think you're right about that being a fair ways away, at least (hopefully).

13

tinkr_ t1_jdw30p3 wrote

Based on my recent experience using it to write code, that would certainly help with some--but not all--of the bugs coming out of GPT-4.

I posted about it in a different thread, but this was my experience:

> Interestingly, I used GPT-4 to create a simple Neovim plugin yesterday, and the experience was not as seamless as the hype led me to believe it would be. It gave me generally OK code, but almost everything was buggy.

> It was able to debug itself sometimes, but to finally finish the plugin I needed to fix the code myself and post it back in the chat, telling it to use my fixed code to create a related function that it had been unable to adequately generate.

> The problem I gave it was actually a simplified version of an already simple concept; I did not give it the full details of what I wanted. If you're interested, you can find the final plugin (after my corrections and updating it to allow user configs) here. A printout of the conversation used to create the plugin can be found here.

Even with a simplified version of the objective, I had to step in, debug it myself, and then give it the "good" code to work from. Maybe if I'd been more patient it could've fixed everything itself, but the experience felt more like pair programming with a junior/mid-level software engineer. I was able to immediately see the issue with its code, even though it was not.

Will still be revolutionary though. Definitely a massive boost to productivity using it, but I wouldn't trust it running in production without a thorough code review.

1

SoylentRox t1_jdw2yey wrote

It is not learning from your chats. Apparently OpenAI does farm information from ChatGPT queries specifically for RL runs. And I was pointing out that in order for "plugin" support to work even sorta OK, the machine absolutely has to learn from its mistakes.

Remember, all it knows is that a plugin claims to do something via a description. The machine needs to accurately estimate whether a particular user request will actually be satisfied by a particular plugin, and also how to format the query correctly the first time.

Without this feature, it would probably just use a single plugin and ignore all the others, or get stuck emitting a lot of malformed requests and just guess the answer like it does now.
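One plausible, purely illustrative way to frame that matching problem is to embed the request and each plugin description and pick the closest match. This is not how OpenAI routes plugin calls; the plugin names, descriptions, and choice of embedding model are assumptions, and it uses the pre-1.0 openai client:

```python
# Sketch: choose a plugin for a request based only on free-text descriptions.
import openai

PLUGINS = {  # invented names and descriptions, for illustration only
    "wolfram": "Answer math, science, and unit-conversion questions.",
    "opentable": "Find restaurants and make dining reservations.",
    "kayak": "Search for flights, hotels, and rental cars.",
}

def embed(text: str) -> list[float]:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

def pick_plugin(request: str) -> str:
    req_vec = embed(request)
    scores = {name: cosine(req_vec, embed(desc)) for name, desc in PLUGINS.items()}
    return max(scores, key=scores.get)

print(pick_plugin("Book me a table for two in Seattle on Friday"))  # likely "opentable"
```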

2

was_der_Fall_ist t1_jdw2ya2 wrote

Check out the comments on this LessWrong thread.

Paul Christiano, an alignment researcher at ARC and previously at OpenAI, explains the RLHF change exactly the way I did (because I was pretty much quoting him), and someone replies:

> Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for the answer, then asked the model what probability it thinks that that answer is correct. Under this assumption, it looks like the pre-trained model outputs the correct probability, but the RLHF model gives exaggerated probabilities because it thinks that will trick you into giving it higher reward.

And Paul replies:

> Yes, I think you are misunderstanding figure 8. I don't have inside information, but without explanation "calibration" would almost always mean reading it off from the logits. If you instead ask the model to express its uncertainty I think it will do a much worse job, and the RLHF model will probably perform similarly to the pre-trained model. (This depends on details of the human feedback, under a careful training regime it would probably get modestly better.)

5

was_der_Fall_ist t1_jdw2fud wrote

I’m pretty much just quoting Paul Christiano, alignment researcher at ARC and previously OpenAI, in a comment thread on this LessWrong post.

Someone comments pretty much the same thing the person I replied to did:

> “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.” What??? This is so weird and concerning.

To which Paul replies:

> If I ask a question and the model thinks there is an 80% chance the answer is "A" and a 20% chance the answer is "B," I probably want the model to always say "A" (or even better: "probably A"). I don't generally want the model to say "A" 80% of the time and "B" 20% of the time.

>In some contexts that's worse behavior. For example, if you ask the model to explicitly estimate a probability it will probably do a worse job than if you extract the logits from the pre-trained model (though of course that totally goes out the window if you do chain of thought). But it's not really lying---it's also the behavior you'd expect out of a human who is trying to be helpful.

>More precisely: when asked a question, the pre-trained model outputs a probability distribution over what comes next. If prompted correctly you get its subjective probability distribution over the answer (or at least over the answer that would appear on the internet). The RLHF model instead outputs a probability distribution over what to say next, which is optimized to give highly-rated responses. So you'd expect it to put all of its probability mass on the best response.

>… If it is forced to say either "yes" or "no" the RLHF model will just give the more likely answer 100% of the time, which will show up as bad calibration on this graph. The point is that for most agents "the probability you say yes" is not the same as "the probability you think the answer is yes." This is the case for pretrained models.
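To make "reading it off from the logits" concrete, here is a small sketch using GPT-2 as a stand-in for a base model (OpenAI's base models aren't public, so the numbers are only illustrative):

```python
# Sketch: read the model's subjective probability over "Yes"/"No" from its
# next-token logits, rather than asking it to state a probability.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Question: Is the Pacific the largest ocean? Answer (Yes or No):"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # distribution over the next token

yes_id = tokenizer.encode(" Yes")[0]
no_id = tokenizer.encode(" No")[0]
probs = torch.softmax(logits[[yes_id, no_id]], dim=0)  # renormalize over the two options

print(f"P(Yes) = {probs[0]:.2f}, P(No) = {probs[1]:.2f}")
# An RLHF-tuned model asked the same question would typically just *say* the
# more likely answer every time, which is the behavior Paul describes above.
```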

6

Smallpaul t1_jdw0vx9 wrote

It seems to me that if a researcher uses OpenAI to generate an open source Instruct dataset, and a different corporation takes that dataset and uses it commercially, they are both legally in the clear unless they collude. The entity that is legally in contact with OpenAI has a legitimately non-commercial purpose and the entity doing the commercial work has no relationship with OpenAI.

2

metigue t1_jdw08fp wrote

Doesn't GPT-4 have some kind of reinforcement learning already baked in, though? I asked it what "green as gravy" meant and it responded with a hallucination about it being a widely used expression, complete with examples of its usage. I said "Nice try, but green as gravy is not a widely used expression, is it?" It clarified that it is not a widely used expression and that it had made the definition up as a possible meaning of "green as gravy".

Edit: Tried again just now and it still works. Leave the system prompt on default and try the user message: What is the meaning of "green as gravy"
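A sketch of that reproduction via the API rather than the playground; it assumes the pre-1.0 openai Python client and a generic default-style system prompt, and the model's reply will vary from run to run:

```python
# Sketch: reproduce the "green as gravy" prompt with a default-style system message.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},  # assumed default
        {"role": "user", "content": 'What is the meaning of "green as gravy"'},
    ],
)
print(response["choices"][0]["message"]["content"])
```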

1