Recent comments in /f/MachineLearning

MjrK t1_jdiflsw wrote

I'm confident that someone could fine-tune an end-to-end vision transformer to extract user interface elements from photos and enumerate the available interaction options.

Seems like such an obviously useful tool, and ViT-22B should be able to handle it, or any of many other computer vision models on Hugging Face... I would've assumed some grad student somewhere is already hacking away at that.

But then again, compute costs are a b****. Generating a training data set, at least, should be somewhat easy.
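On the training-data point, here's a minimal sketch of what I mean: render synthetic UIs so the bounding-box labels come for free. Everything here (widget types, file names) is made up for illustration; a real pipeline would render actual HTML/CSS screenshots instead of rectangles.

```python
import json
import random
from PIL import Image, ImageDraw

random.seed(0)

def make_sample(idx, size=(640, 480)):
    # Draw a handful of fake widgets and record their ground-truth boxes.
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    annotations = []
    for _ in range(random.randint(3, 8)):
        kind = random.choice(["button", "text_field", "checkbox"])
        w, h = (18, 18) if kind == "checkbox" else (120, 36)
        x, y = random.randint(0, size[0] - w), random.randint(0, size[1] - h)
        fill = "lightgray" if kind == "button" else "white"
        draw.rectangle([x, y, x + w, y + h], outline="black", fill=fill)
        annotations.append({"label": kind, "bbox": [x, y, w, h]})
    img.save(f"ui_{idx:05d}.png")
    return {"image": f"ui_{idx:05d}.png", "annotations": annotations}

# 100 labeled samples, essentially for free.
with open("ui_synthetic.json", "w") as f:
    json.dump([make_sample(i) for i in range(100)], f)
```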

Free research paper idea, I guess.

20

nixed9 t1_jdifhni wrote

This is quite literally what we hope for/deeply fear at /r/singularity. It's going to be able to interact with computer systems itself. Give it read/write memory, access to its own API, or the ability to simply process the screen output visually... and then... what?
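A toy sketch of the loop I mean, where `query_model` is just a stub standing in for any real LLM API call:

```python
def query_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call.
    return f"decide next action given {len(prompt)} chars of context"

memory: list[str] = []  # the read/write memory in question

observation = "screen: desktop with a terminal open"
for step in range(5):
    prompt = "\n".join(memory[-20:] + [observation])
    action = query_model(prompt)
    memory.append(f"step {step}: {action}")  # the model writes its own memory
    observation = f"result of: {action}"     # in reality, execute and observe
```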

Only a few years ago, as recently as 2017 or so, this seemed extremely far-fetched, and the "estimate" of a technological singularity around 2045 seemed wildly optimistic.

Right now it seems more likely than not to happen by 2030.

46

Username2upTo20chars t1_jdiesqt wrote

>But here they prompted GPT-4 to generate code that would generate a picture in a specific style.

5 seconds of googling "code which generates random images in the style of the painter Kandinsky":

http://www.cad.zju.edu.cn/home/jhyu/Papers/LeonardoKandinsky.pdf

https://github.com/henrywoody/kandinsky-bot
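And as a toy version of the same idea (my own sketch, not code from those links): random geometric primitives on a canvas already get you something Kandinsky-ish.

```python
import random
from PIL import Image, ImageDraw

random.seed(42)
img = Image.new("RGB", (512, 512), "ivory")
draw = ImageDraw.Draw(img)
palette = ["crimson", "gold", "navy", "black", "teal"]

# Circles and lines in random places: the core trick of such generators.
for _ in range(12):
    x, y, r = random.randint(0, 480), random.randint(0, 480), random.randint(10, 90)
    draw.ellipse([x - r, y - r, x + r, y + r],
                 outline=random.choice(palette), width=random.randint(2, 8))
for _ in range(8):
    draw.line([random.randint(0, 512) for _ in range(4)],
              fill=random.choice(palette), width=random.randint(1, 5))

img.save("kandinsky_ish.png")
```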

GPTs, trained on the bulk of sensible text on the WWW, are just sophisticated echo/recombination chambers. True, they work far better than most would have predicted, but that doesn't change how they work. I am also impressed, but GPT-3 became known for parroting content, so why should the next generation be fundamentally different? It just gets harder and harder to verify.

Nevertheless, I expect even such generative models to become good enough to be very general. Most human work isn't doing novel things either: just copying, up to smart recombination.

17

harharveryfunny t1_jdic1s3 wrote

>Obviously having the visual system provide data that the model can use directly is going to be far more effective, but nothing about dense object detection and description is going to be fundamentally incompatible with any level of detail you could extract into an embedding vector. I'm not saying it would be a smart or effective solution, but it could be done.

I can't see how that could work for something like my face example. You could individually detect facial features, subclassified into hundreds of different eye/mouth/hair/etc. variants, and still fail to capture the subtle differences that distinguish one individual from another.
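To put numbers on it, here's a toy illustration, with random vectors standing in for a real face model's embeddings: two faces can share every coarse attribute label yet remain separable in a continuous embedding space.

```python
import numpy as np

# Categorical route: both individuals get identical labels.
attrs_a = {"eyes": "brown", "hair": "short dark", "mouth": "thin"}
attrs_b = {"eyes": "brown", "hair": "short dark", "mouth": "thin"}
print(attrs_a == attrs_b)  # True -> the two identities collapse into one

# Embedding route: a subtle difference survives.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=512)
emb_b = emb_a + rng.normal(scale=0.3, size=512)  # similar face, different person
cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
print(cos)  # high similarity, but measurably below 1.0
```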

4

ReadSeparate t1_jdi9wic wrote

What if some of the latent patterns in the training data that it's recreating are those that underlie creativity, critique, and theory of mind? Why are people so afraid of the idea that both of these things can be true? It's just recreating patterns from its training data, and an emergent property of doing that at scale is a form of real intelligence, because that's the best way to do it, and because intelligence is where those patterns originated in the first place.

3

TikiTDO t1_jdi8ims wrote

The embeddings are still just a representation of information. They are extremely dense, effectively continuous representations, true, but in theory you could represent that information using other formats. It would just take far more space and require more processing.

Obviously having the visual system provide data that the model can use directly is going to be far more effective, but nothing about dense object detection and description is going to be fundamentally incompatible with any level of detail you could extract into an embedding vector. I'm not saying it would be a smart or effective solution, but it could be done.

In fact, going another level, LLMs aren't restricted to working with just words. You could train an LLM to receive a serialized embedding as text input, and then train it to interpret those. After all, it's effectively just a list of numbers. I'm not sure why you'd do that if you could just feed the embedding in directly, but maybe it's more convenient than training the model on different types of inputs or something.
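A rough sketch of what that costs (the sizes are illustrative, and tokenizer overhead would make the text route even worse):

```python
import numpy as np

emb = np.random.default_rng(0).normal(size=768).astype(np.float32)

# Direct route: the model consumes the raw vector.
print(emb.nbytes)    # 3072 bytes for 768 float32s

# Text route: the same vector serialized into the prompt.
as_text = " ".join(f"{x:.4f}" for x in emb)
print(len(as_text))  # several thousand characters before tokenization
```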

3

man_im_rarted t1_jdi7jyx wrote

I get that people are excited, but nobody with a basic understanding of how evolutionary biology works should give this any room. The problem is not just that it is IGF-optimizing/blind hill climbing; at best it can randomly stumble onto useful patterns. There is no element of critique and no element of creativity. There is no theory of mind, just stochastic reproduction-maximizing: when prompted, they respond with whatever maximizes their odds of reproducing. Still, I get the excitement. I'm excited, too. But hype hurts the industry.

12

Chris_The_Pekka t1_jdi7gqr wrote

Hello everyone, I have a dataset of news articles and real radio messages written by journalists. I want to generate radio messages that look like the real ones, so they no longer have to be written manually.

I was planning to use a GAN structure with a CNN as the discriminator and an LSTM as the generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT instead. Could I use GPT as both the discriminator and the generator, or only as the generator? (Using GPT as the generator seems promising, but I would need to do prompt optimization.)

Has anyone got an opinion or suggestion, or a paper/blog I might have missed? I am doing this for my thesis, and it would help me out greatly. Or maybe I am too fixated on using a GAN structure, and you would suggest looking into something else.
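For concreteness, the generator half I have in mind would look roughly like this; `gpt2` is only a placeholder, and a real run would first fine-tune on my article/radio-message pairs:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

article = "City council approves new tram line after months of debate ..."
prompt = f"Article: {article}\nRadio message:"

# Draft one candidate radio message from the article.
out = generator(prompt, max_new_tokens=60, num_return_sequences=1)
print(out[0]["generated_text"])
```

The discriminator, if I keep one at all, could then just be a fine-tuned sequence classifier over real vs. generated messages rather than a CNN.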

1