Art10001 t1_jdihrod wrote on March 24, 2023 at 5:04 PM

Reply to comment by BullockHouse in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

Probably. And if not, certainly someday.

yokingato t1_jdig5x0 wrote on March 24, 2023 at 4:54 PM

Reply to comment by WarmSignificance1 in [N] ChatGPT plugins by Singularian2501

Oh. Thanks for explaining. I have no idea tbh. I think most people are lazy and want the easiest option, but that could be wrong.

MjrK t1_jdiflsw wrote on March 24, 2023 at 4:50 PM

Reply to comment by ThirdMover in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

I'm confident that someone can fine-tune an end-to-end vision-tranformer that can extract user interface elements from photos and enumerate interaction options.

Seems like such an obviously-useful tool and Vit-22B should be able to handle it, or many other Computer Vision tools on Hugging Face... I would've assumed some grad student somewhere is already hacking away at that.

But then also, compute costs are a b**** but generating training data set should be somewhat easy.

Free research paper idea, I guess.

nixed9 t1_jdifhni wrote on March 24, 2023 at 4:49 PM

Reply to comment by ginger_beer_m in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

This is quite literally what we hope for/deeply fear at /r/singularity. It's going to be able to interact with computer systems itself. Give it read/write memory access and access to it's own API, or the ability to just simply visually process the screen output... and then.... what?

Several years ago, as recently as 2017 or so, this seemed extremely far-fetched and the "estimation" of a technological singularity of 2045 seemed wildly optimistic.

Right now it seems like it's more like than not to happen by 2030.

G_fucking_G t1_jdifa1c wrote on March 24, 2023 at 4:48 PM

Reply to [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT by liyanjia92

Very interesting.

Quick question. How long does training take? For:

SFT Model
Reward Model
RLHF

I saw you used one 3090Ti, so was it done in hours/days/weeks?

Username2upTo20chars t1_jdiesqt wrote on March 24, 2023 at 4:45 PM

Reply to comment by Necessary-Meringue-1 in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII

>But here they prompted GPT-4 to generate code that would generate a picture in a specific style.

5 seconds of googling "code which generates random images in the style of the painter Kandinsky":

http://www.cad.zju.edu.cn/home/jhyu/Papers/LeonardoKandinsky.pdf

https://github.com/henrywoody/kandinsky-bot

GPT's trained on the whole of the WWW sensible text are just sophisticated echo/recombination chambers. True, it works far better than most would have predicted, but that doesn't change the way they work. I am also impressed, but GPT-3 got known for parroting content, why should the next generation be fundamentally different? It just gets harder and harder to verify.

Nevertheless I even expect such generative models to be good enough to become very general. Most human work isn't doing novel things either. Just copying up to smart recombination.

RustaceanNation t1_jdiekiv wrote on March 24, 2023 at 4:44 PM

Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

Google's Spotlight paper is intended for this use case.

[deleted] t1_jdie32g wrote on March 24, 2023 at 4:41 PM

Reply to comment by passerby251 in [D] ICML 2023 Reviewer-Author Discussion by zy415

[deleted]

BullockHouse t1_jdidje0 wrote on March 24, 2023 at 4:37 PM

Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

I'm curious if it can be instructed to play minecraft in a keyboard only mode simply by connecting a sequence of images to key stroke outputs.

LABTUD t1_jdicqo8 wrote on March 24, 2023 at 4:32 PM

Reply to comment by man_im_rarted in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII

are you using evolutionary biology as an argument against the effectiveness of blind hill climbing? O.o

EDIT: i dont understand sarcasm, srry lol

stormelc t1_jdicbm7 wrote on March 24, 2023 at 4:30 PM

Reply to [R] Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity by blabboy

Okay let's create yet another useless acronym/phrase and toss it in the acronym soup of useless letters.

TFenrir t1_jdicafu wrote on March 24, 2023 at 4:29 PM

Reply to comment by light24bulbs in [N] ChatGPT plugins by Singularian2501

Awesome! Good to know it will work

harharveryfunny t1_jdic1s3 wrote on March 24, 2023 at 4:28 PM

Reply to comment by TikiTDO in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

>Obviously having the visual system provide data that the model can use directly is going to be far more effective, but nothing about dense object detection and description is going to be fundamentally incompatible with any level of detail you could extract into an embedding vectror. I'm not saying it would be a smart or effective solution, but it could be done.

I can't see how that could work for something like my face example. You could individually detect facial features, subclassified into hundreds of different eye/mouth/hair/etc/etc variants, and still fail to capture the subtle differences that differentiate one individual from another.

countalabs t1_jdibk1j wrote on March 24, 2023 at 4:25 PM

Reply to comment by iamspro in [N] ChatGPT plugins by Singularian2501

The "fine tuning" in OpenAI API can be few-shots. The other approach of putting the instruction or example in context should be called zero-shots.

[deleted] t1_jdib0n8 wrote on March 24, 2023 at 4:21 PM

Reply to comment by StellaAthena in [D] ICML 2023 Reviewer-Author Discussion by zy415

[deleted]

passerby251 t1_jdiao3b wrote on March 24, 2023 at 4:19 PM

Reply to comment by StellaAthena in [D] ICML 2023 Reviewer-Author Discussion by zy415

I think reminding could work. I finally receive my first response only after posting reminding comments to each reviewer. Hope the other responses would come in soon. Good luck!

CommunismDoesntWork t1_jdia6kb wrote on March 24, 2023 at 4:16 PM

Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

It can do this just fine

Imnimo t1_jdi9wze wrote on March 24, 2023 at 4:14 PM

Reply to [R] Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity by blabboy

Surely the Alternative Uses Test is all over the place in the LLM training data?

ReadSeparate t1_jdi9wic wrote on March 24, 2023 at 4:14 PM

Reply to comment by Maleficent_Refuse_11 in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII

What if some of the latent patterns in the training data that it's recreating are those that underlie creativity, critique, and theory of mind? Why are people so afraid of the idea that both of these things can be true? It's just re-creating patterns from its training data, and an emergent property from doing that at scale is a form of real intelligence because that's the best way to do it, because intelligence is how those patterns originated from in the first place.

ginger_beer_m t1_jdi9e5j wrote on March 24, 2023 at 4:11 PM

Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

Carry this to the conclusion. Maybe not GPT4, but future LLM could interpret what's on the screen and drive the interaction with the computer themselves. This would potentially displace millions of human out of job as they get automated by the model.

TikiTDO t1_jdi8ims wrote on March 24, 2023 at 4:06 PM

Reply to comment by harharveryfunny in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

The embeddings are still just a representation of information. They are extremely dense, effectively continuous representations, true, but in theory you could represent that information using other formats. It would just take far more space and require more processing.

Obviously having the visual system provide data that the model can use directly is going to be far more effective, but nothing about dense object detection and description is going to be fundamentally incompatible with any level of detail you could extract into an embedding vectror. I'm not saying it would be a smart or effective solution, but it could be done.

In fact, going to another level, LLMs aren't restricted to working with just words. You could train an LLM to receive a serialized embedding as text input, and then train it to interpret those. After all, it's effectively just a list of numbers. I'm not sure why you'd do that if you could just feed it in directly, but maybe it's more convenient to not have to train in on different types of inputs or something.

nicku_a OP t1_jdi7ks3 wrote on March 24, 2023 at 4:00 PM

Reply to comment by [deleted] in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

So this is a training framework that can be used to train agents in gym environments (and other custom environments that use the gym-style format).

You can select a gym environment from its name, e.g. 'LunarLander-v2' when creating the environment, and then train it. See the docs for more info.

man_im_rarted t1_jdi7jyx wrote on March 24, 2023 at 4:00 PM

Reply to comment by Maleficent_Refuse_11 in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII

I get that people are excited, but nobody with a basic understanding of how evolutionary biology works should give room to this. The problem is not just that it is IGF optimizing/blind hill climbing. At best it can randomly stumble onto useful patterns. There is no element of critique and no element of creativity. There is no theory of mind, there is just a stochastic reproduction maximizing, when prompted they just respond with what maximizes their odds of reproducing. Still, get the excitement. Am excited, too. But hype hurts the industry.

Chris_The_Pekka t1_jdi7gqr wrote on March 24, 2023 at 3:59 PM

Reply to [D] Simple Questions Thread by AutoModerator

Hello everyone, I have a dataset with news articles and real radio-messages written by journalists. Now I want to generate radio-messages that look like real radio-messages so that is must not be done manually anymore. I wanted to use a GAN structure that uses a CNN as Discriminator, and a LSTM as Generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT. Could I use GPT as both the Discriminator and the Generator, or only the Generator (using GPT as Generator seems to be good, but I will need to do prompt optimization). Has anyone got an opinion or suggestion (or paper/blog I could read into that I might have missed)? I am doing this for my thesis and it would help me out greatly. Or maybe I am too fixated in using a GAN structure, and you suggest me to look into something else.

[deleted] t1_jdi71ly wrote on March 24, 2023 at 3:56 PM

Reply to [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

[deleted]

Recent comments in /f/MachineLearning