Recent comments in /f/MachineLearning

xander76 OP t1_je6uh96 wrote

All great points! I tend to think that it's best suited to problems that you can't solve with traditional programming. If the problem you have is "reverse this array of numbers", then writing the code or having Copilot write the code is a better answer.

But if the problem you want to solve is "come up with good titles for this blog post" or "summarize this user email" or "categorize these customer service complaints by anger level", there really isn't a JavaScript/TypeScript function you can write to do that. In cases like those, I think the functionality is worth the latency.
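To make that concrete, here's roughly what one of those declarations looks like (a simplified sketch; the function name and prompt wording are just examples, and the repo has the exact syntax):

```typescript
// A sketch of an imaginary function: you declare only the type signature and
// describe the behavior in the JSDoc comment; the @imaginary tag tells the
// compiler plugin to implement the body with a GPT call at runtime.

/**
 * Given the full text of a blog post, comes up with five short, engaging
 * titles for it.
 *
 * @param blogPostText - the text of the blog post
 * @returns an array of exactly five suggested titles
 * @imaginary
 */
declare function titlesForBlogPost(blogPostText: string): Promise<string[]>;
```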

As to the non-determinism, I think that's a real issue. Right now, the state of the art of testing GPT prompts feels very shaky; one person I talked to said that they "change the prompt and then bang the keyboard to try four or five inputs". This clearly isn't ok for serious projects. To help with this, we're currently building some tools to help developers generate test inputs and evaluate how their imaginary functions perform.
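As a sketch of the direction (a hypothetical harness, not the actual tooling we're shipping): run the imaginary function over a batch of inputs and assert structural invariants rather than exact strings, since the outputs vary from run to run:

```typescript
// Hypothetical test sketch. Because outputs are non-deterministic, we check
// structural invariants (count, length) instead of exact strings.
// titlesForBlogPost is the imaginary function declared above; the module
// path and sample inputs are made up for illustration.

import { titlesForBlogPost } from "./titles";

const samplePosts: string[] = [
  "Today we shipped a compiler plugin that calls GPT at runtime...",
  "A post-mortem of our first production prompt regression...",
];

async function evaluate(): Promise<void> {
  for (const post of samplePosts) {
    const titles = await titlesForBlogPost(post);
    console.assert(titles.length === 5, "expected exactly five titles");
    console.assert(
      titles.every((t) => t.length > 0 && t.length <= 120),
      "titles should be non-empty and reasonably short"
    );
  }
}

evaluate().catch(console.error);
```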

ETA: and thanks for the comment about the name! (I can't take credit for it, though; I believe it was first coined by Shane Milligan.)

8

TheEdes t1_je6tweq wrote

Sure, but what's being advertised isn't sentience per se, at least for the LeetCode part of their benchmarks. The issue here is that they claim it can solve X% of LeetCode problems, but it seems to do much worse on new data. Even if all it had learned was to find previous solutions and apply small changes, it should still perform well on new problems, given the repetitive nature of the problems.

1

xander76 OP t1_je6tpox wrote

You can check out the code at https://github.com/imaginary-dev/imaginary-dev, and there's code for demo projects as well:

Blog Writing demo from the screencast:
- Live site: blog-demo.imaginary.dev
- Code: https://github.com/imaginary-dev/imaginary-dev/tree/main/example-clients/nextjs-blog

Emojifier:
- Live site: emojifier.imaginary.dev
- Code: https://github.com/imaginary-dev/imaginary-dev/tree/main/example-clients/nextjs-api

I'm also happy to answer any questions you might have!

3

reditum t1_je6nzh0 wrote

Dude, this is pretty amazing.

My biggest concern with it not writing the code is that it might not perform as well (network latency or connectivity issues), won't be deterministic, and could hallucinate. But I can also imagine a few cases where GPT would generate a result faster than code would run, and some non-determinism will even be desirable sometimes.

The name of your product is super-catchy as well. I can definitely see imaginary programming becoming a trend!

3

was_der_Fall_ist t1_je6lfl9 wrote

Why are matrix multiplications mutually exclusive with complicated operations?

A computer just runs through a big series of 0s and 1s, yet through layers of abstraction it accomplishes amazing things, far more complicated than a naive observer would think 0s and 1s could represent and do. Why not the same for a massive neural network trained via gradient descent to maximize a goal by means of matrix multiplication?

1

azorsenpai t1_je6hjpu wrote

Is there any reason you're restricting yourself to a U-Net-based model? I'd recommend testing different architectures, such as DeepLabV3 or FPN, and seeing whether things improve. If they don't, I'd recommend looking at your data and the quality of the ground truth, since with only 100 data points you're very much limited by the information contained in your data.

If the data is clean, I'd recommend some kind of ensemble method. It might be overkill, especially with heavy models, but having multiple models with different random initializations run inference on the same input generally gives a few extra points of accuracy/Dice, so if you really need them, it's an option.

4

Haycart t1_je6grih wrote

>The Transformer is not a universal function approximator. This is simply shown by the fact that it cannot process arbitrary long input due to the finite context limitations.

We can be more specific, then: the transformer is a universal function approximator* on the space of sequences that fit within its context. I don't think this distinction is necessarily relevant to the point I'm making, though.

*again with caveats regarding continuity etc.
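(Concretely, the statement I'm leaning on is roughly the one from Yun et al., ICLR 2020: transformers with positional encodings can approximate any continuous sequence-to-sequence function on a compact domain arbitrarily well, i.e.)

```latex
% Rough statement (Yun et al., ICLR 2020): for any continuous
% sequence-to-sequence function f with compact support, any p >= 1, and any
% epsilon > 0, there exists a transformer g such that
d_p(f, g) = \left( \int \lVert f(X) - g(X) \rVert_p^p \, dX \right)^{1/p} < \epsilon
```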

>Your conclusion is not at all obvious or likely given your facts. They seem to be in hindsight given the strong performance of large models.

Guilty as charged, regarding hindsight. I won't claim to have predicted GPT-3's performance a priori. That said, my point was never that the strong performance we've observed from recent LLMs was obvious or likely, only that it shouldn't be surprising. In particular, it should not be surprising that a GPT model (not necessarily GPT-3 or 4) trained on a language modeling task would have the abilities we've seen. Everything we've seen falls well within the bounds of what transformers are theoretically capable of doing.

There are, of course, aspects of the current situation specifically that you can be surprised about. Maybe you're surprised that 100 billion-ish parameters is enough, or that the current volume of training data was sufficient. My argument is mostly aimed at claims along the lines of "GPT-n can't do X because transformers lack capability Y" or "GPT-n can't do X because it is only trained to model language".

1