Recent comments in /f/MachineLearning
sigmatrophic t1_jdzpk9m wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Honestly, I paid for GPT-4... It's a bit better, but it felt like GPT-3 before they dumbed it down.
[deleted] t1_jdzpeq4 wrote
Reply to comment by [deleted] in [D] FOMO on the rapid pace of LLMs by 00001746
[removed]
[deleted] t1_jdzp97e wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
[removed]
keepthepace t1_jdzp4ge wrote
Reply to comment by rfxap in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Could some parts of the dataset have been copied into the LeetCode problems, or is there a guarantee that these problems are 100% novel?
petrastales OP t1_jdzp1ag wrote
Reply to comment by AuspiciousApple in [D] Can DeepL learn from edits to the translations it produces immediately? by petrastales
Ahh ok. Thank you for the explanation!
bjj_starter t1_jdzoafq wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
This title is misleading. The only thing they found was that GPT-4 was trained on code questions it wasn't tested on.
Dartagnjan t1_jdzo44q wrote
Reply to [D] Simple Questions Thread by AutoModerator
Is anyone in need of a machine learning protégé? I am looking for a doctorate position in the German- and English-speaking worlds.
My experience is in deep learning, specifically GNNs applied to science problems. I would like to remain broadly in deep learning, but would not mind changing topic to some other application or to a more theoretical research project.
I am also interested in theoretical questions, e.g. given a well-defined problem (such as approximating the solution of a PDE), what can we say about its "training difficulty"? Is optimization possible at all (cf. neural tangent kernel analysis)? How do architectures help facilitate optimization? And what solid mathematical foundations can we give deep learning theory?
I have a strong mathematical background, with knowledge in functional analysis and differential geometry, and I also hold a BSc in Physics alongside my main mathematical education.
Last week I also started getting into quantum machine learning (QML) with PennyLane and find the area quite interesting.
Please get in touch if you think I could be a good fit for your research group or know an open position that might fit my profile.
bjj_starter t1_jdzo3zq wrote
Reply to comment by muskoxnotverydirty in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
But that's pure speculation. They showed that a problem existed with the training data, and OpenAI had already dealt with that problem and wasn't hiding it at all - GPT-4 wasn't tested on any of that data. Moreover, it's perfectly fine for problems like the ones it will be tested on, such as past problems of the same kind, to be in the training data. What's important is that what it's actually tested on is not in the training data. At this point, there is no evidence that it was tested on training data.
Moreover, the Microsoft Research team was able to repeat some impressive results in a similar domain on tests that didn't exist before the training data cut-off. There isn't any evidence that this is a problem with a widespread effect on performance. It's also worth noting that it seems pretty personal for the guy behind this paper, judging by the way he wrote his tweet.
hardmaru t1_jdznq2v wrote
Reply to comment by keepthepace in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Not sure if this article has been peer reviewed
But saw some “peer reviews” on Twitter :)
See: https://twitter.com/sleepinyourhat/status/1638988283018465300
AuspiciousApple t1_jdznf40 wrote
Reply to comment by petrastales in [D] Can DeepL learn from edits to the translations it produces immediately? by petrastales
Sorry, that was not very clearly explained on my part.
Do you understand that these models have weights/parameters - numbers that define their behaviour? The standard sense of "learning" in ML is to update these weights to fit some training data better.
And are you aware that large language models get a sequence of text (the "context") and predict the next bit of text from that? Now, these models can use examples in the text they are given to do things they otherwise wouldn't be able to do. This is called in-context learning. However, here the parameters of the model don't change, and if the examples aren't in the context, the model doesn't remember anything about them.
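If it helps, here's a minimal sketch of the distinction. The `generate` function is just a hypothetical placeholder for whatever LLM completion call you'd use, not DeepL's or any real library's API:

```python
# Minimal sketch of in-context learning vs. weight updates.
# `generate` is a hypothetical stand-in for an LLM completion call,
# not a real library API.

def generate(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("swap in whatever LLM API you actually use")

# In-context learning: the examples live only in the prompt.
# The model's weights are identical before and after this call,
# and nothing is remembered once the context is gone.
few_shot_prompt = (
    "Translate English to German.\n"
    "English: The cat sleeps. -> German: Die Katze schläft.\n"
    "English: I like coffee. -> German: Ich mag Kaffee.\n"
    "English: The book is old. -> German:"
)
# completion = generate(few_shot_prompt)

# "Learning" in the usual ML sense would instead update the parameters
# (e.g. fine-tuning on corrected translations), and that change would
# persist across all future prompts.
```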
noobgolang t1_jdzmqmf wrote
Reply to [D] With ML tools progressing so fast, what are some ways you've taken advantage of them personally? by RedditLovingSun
Searching for a library to do something in some programming language.
It gives you the exact one, the library you could never find on Google.
sebzim4500 t1_jdzmpee wrote
Reply to comment by TehDing in [P] two copies of gpt-3.5 (one playing as the oracle, and another as the guesser) performs poorly on the game of 20 Questions (68/1823). by evanthebouncy
Wordle is kind of unfair though, because the LLM takes input in the form of tokens rather than letters, so doing anything that requires reasoning at the level of letters is difficult. Incidentally, this might also be affecting its ability to do arithmetic; LLaMA, by comparison, uses one token for each digit to avoid that issue (but of course suffers from the same problems with breaking words into characters).
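As a rough illustration (this assumes the open-source `tiktoken` tokenizer; the exact splits vary by model):

```python
# Why letter-level games are hard: the model sees token IDs, not characters.
# Requires `pip install tiktoken`; exact splits depend on the tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["crane", "slate", "12345"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{word!r} -> {len(token_ids)} token(s): {pieces}")

# A whole Wordle guess can end up as a single token, so "which letters are
# in this word, and where?" is never directly visible to the model.
```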
Qpylon t1_jdzmiaq wrote
Reply to comment by antonivs in [D] FOMO on the rapid pace of LLMs by 00001746
I’m curious, is this for your company wiki or something? Was considering trying that with our documentation etc.
keepthepace t1_jdzm2ic wrote
Reply to comment by rfxap in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
This is why peer review of articles is not something that should be avoided, even by Microsoft AI, sorry, "Open"AI.
Deep-Station-1746 t1_jdzlqrw wrote
Reply to [D] With ML tools progressing so fast, what are some ways you've taken advantage of them personally? by RedditLovingSun
Oh, I've used it to <insert something I previously used to do with Google>. It's great.
[deleted] t1_jdzjhti wrote
AssHypnotized t1_jdzjacv wrote
Reply to comment by RedditLovingSun in [D] FOMO on the rapid pace of LLMs by 00001746
r/chatgptpro and maybe r/chatgptcoding
Suspicious-Box- t1_jdzj7wr wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
It just needs training for that. It's amazing, but what could it do with camera vision into the world and a robot body? Would it need specific training, or could it brute-force its way to moving a limb? The model would need to be able to improve itself in real time, though.
RedditLovingSun t1_jdziqh1 wrote
Reply to comment by deepneuralnetwork in [D] FOMO on the rapid pace of LLMs by 00001746
Me too. I've used it to help me read books, study for tests, complete some small side projects, etc. I wish there were a list or subreddit somewhere for people to share the ways they've gotten value out of it so far.
muskoxnotverydirty t1_jdzi41h wrote
Reply to comment by Simcurious in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
It's correct and it's not. The article mentions this, but then says it's likely that they weren't able to cleanly separate pre-2021 questions on the non-coding benchmarks.
kkimdev OP t1_jdzhwp1 wrote
Reply to comment by asdfzzz2 in [D] Small language model suitable for personal-scale pre-training research? by kkimdev
This paper covers exactly what I was looking for, thanks!
ppff01 t1_jdzhmhs wrote
Reply to comment by wazis in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
then*
Simcurious t1_jdzhbyu wrote
Reply to comment by sb1729 in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
The title implies that they evaluated on data from before 2021, while the source says they didn't.
plocco-tocco t1_jdzpyf8 wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
I don't see any evidence of this happening in the article. Also, OpenAI claims to have checked for contamination in every benchmark, so I don't see what the authors are trying to show here.