Recent comments in /f/MachineLearning
acutelychronicpanic t1_jdhksvy wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Let it move a "mouse" and loop, looking at the next screen at some time interval. Probably not the best way to do it, but that seems to be how humans do it.
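Roughly this kind of loop (pyautogui is a real library; `ask_gpt4v()` is just a hypothetical stand-in for GPT-4's image input):

```python
# Sketch only: pyautogui is real; ask_gpt4v() is a hypothetical stand-in
# for sending a screenshot to GPT-4's image input.
import time
import pyautogui

def ask_gpt4v(screenshot) -> tuple[int, int]:
    """Hypothetical: ask the model where to click next."""
    raise NotImplementedError

def agent_loop(interval: float = 2.0) -> None:
    while True:
        screenshot = pyautogui.screenshot()  # look at the current screen
        x, y = ask_gpt4v(screenshot)         # model decides where to click
        pyautogui.moveTo(x, y)               # move the "mouse"
        pyautogui.click()
        time.sleep(interval)                 # then look at the next screen
```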
CptTombstone t1_jdhkpsp wrote
Reply to comment by Maleficent_Refuse_11 in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII
>At best it can recreate latent patterns in the training data.
Have you actually read the paper? On fresh LeetCode tests, GPT-4 significantly outperforms humans on questions of all difficulty levels, reaching nearly double the performance of LeetCode users on medium and hard questions. Those tests were recently added to LeetCode's database and were not in the training data. It also performs genuinely well at image generation through SVG code. The 3D modelling in JavaScript example (Figure 2.7) is way outside the domain of what you would expect from "just a transformer"; it demonstrates real understanding beyond the domain of the training data. It even outperforms purpose-trained image generation models like Stable Diffusion in some regards, namely adherence to instructions, although the generated images are not that visually pleasing compared to the likes of DALL-E or Stable Diffusion, which is a very unfair complaint for a freaking Language Model.
harharveryfunny t1_jdhkn99 wrote
Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
> GPT-4 with image input can interpret any computer screen
Not necessarily - it depends how they've implemented it. If it's just dense object and text detection, then that's all you're going to get.
For the model to actually "see" the image, they would need to feed it into the model at the level of neural-net representations, not as post-detection object descriptions.
For example, if you wanted the model to gauge whether two photos of someone not in its training set show the same person, it would need face embeddings to do that (to gauge distance). They could special-case all sorts of situations like this in addition to object detection, but you could always find something they missed.
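For illustration, a minimal sketch of that distance check, where `embed_face()` is a hypothetical stand-in for some face-embedding model:

```python
# Sketch: comparing two face embeddings by cosine distance.
# embed_face() is a hypothetical stand-in for a FaceNet-style model.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; smaller means more alike."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb1, emb2 = embed_face(photo1), embed_face(photo2)  # hypothetical calls
# same_person = cosine_distance(emb1, emb2) < 0.4      # threshold is model-specific
```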
The back-of-a-napkin hand-drawn website sketch demo is promising, but could have been done via object detection.
In the announcement of GPT-4, OpenAI said they're working with another company on the image/vision tech, and gave a link to an assistive vision company... for that type of use maybe dense labelling is enough.
eliminating_coasts t1_jdhkkw3 wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
You could in principle send it four images that align at a corner where the cursor is, if it can work out how the images fit together.
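Something like this (a sketch with PIL; whether the model can actually reassemble them is the speculative part):

```python
from PIL import Image

def quadrants_at_cursor(screenshot: Image.Image, cx: int, cy: int) -> list[Image.Image]:
    """Crop four images that all share a corner at the cursor position (cx, cy),
    so the cursor's location is implied by where the four images meet."""
    w, h = screenshot.size
    return [
        screenshot.crop((0, 0, cx, cy)),   # top-left
        screenshot.crop((cx, 0, w, cy)),   # top-right
        screenshot.crop((0, cy, cx, h)),   # bottom-left
        screenshot.crop((cx, cy, w, h)),   # bottom-right
    ]
```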
zy415 OP t1_jdhk9t8 wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
Is it just ICML where reviewers tend to ghost? My experience (as both a reviewer and an author) with NeurIPS and ICLR has been that reviewers tend to participate in the discussion with authors.
Balance- OP t1_jdhjm0f wrote
Reply to comment by Deep-Station-1746 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
It doesn’t have to use it for actions on its own yet, but it could be very useful context when asking questions.
[deleted] t1_jdhje54 wrote
Reply to comment by banmeyoucoward in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
[removed]
drcopus t1_jdhjddx wrote
Reply to comment by endless_sea_of_stars in [N] ChatGPT plugins by Singularian2501
Imo doing everything in-context seems more hacky - I would rather see a Toolformer approach but I understand that it probably requires more engineering and compute.
I reckon the in-context approach probably makes the plugins less stable as the model has to nail the syntax. ChatGPT is good at coding but it makes basic errors often enough to notice.
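A sketch of that failure mode (`call_model()` is hypothetical, and the real plugin wire format is OpenAI's, not this):

```python
import json

def parse_tool_call(model_output: str) -> dict | None:
    """In-context tool use only works if the model emits exactly the
    expected syntax; one malformed field and the call is lost."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if isinstance(call, dict) and "tool" in call and "arguments" in call:
        return call
    return None  # malformed call: retry, or fall back to plain text

# output = call_model(prompt_with_tool_docs)  # hypothetical model call
# if parse_tool_call(output) is None:
#     ...  # the "has to nail the syntax" failure mode
```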
[deleted] t1_jdhj5k5 wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
[deleted]
SkinnyJoshPeck t1_jdhis65 wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I imagine you could interpolate, given access to more info about the image post-GPT analysis. I.e., I’d like to think it has some boundary defined for the objects it identifies in the image, as part of metadata or something in the API.
race2tb t1_jdhilvd wrote
Reply to comment by frequenttimetraveler in [N] ChatGPT plugins by Singularian2501
They may no longer have a purpose. The Generative AI will just be fed directly by customers and producers. The Generative AI service will pay for portfolios of data content it cannot generate itself. People will get paid based on how much their feeds are woven into content.
ObiWanCanShowMe t1_jdhil2v wrote
Reply to comment by Deep-Station-1746 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
We are smarter locally, meaning relative to our own experience and capability; we are not "smarter" in the grand scheme.
regular-jackoff t1_jdhikdt wrote
Reply to comment by Deep-Station-1746 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Damn that’s rough
WarmSignificance1 t1_jdhi4rj wrote
Reply to comment by race2tb in [N] ChatGPT plugins by Singularian2501
I get the concept, and I see this working for a small subset of websites. But have you seen an average person interact with a website before? Having a non-deterministic GUI will absolutely kill UX, in my opinion. Not to mention that many businesses want way more control over what they display to users than an LLM will afford.
Deep-Station-1746 t1_jdhhbbg wrote
Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Nope. The ability to input something doesn't mean being able to use it reliably. For example, take this post - your eyes have the ability to input all the info on the screen, but as a contribution, this post is pretty worthless. And you are a lot smarter than GPT-4, I think.
Edit: spelling
WarmSignificance1 t1_jdhh6w9 wrote
Reply to comment by yokingato in [N] ChatGPT plugins by Singularian2501
I just don't see replacing GUIs with LLMs making sense in general.
Do people really want to access their bank via an LLM? I see that being an inferior user experience.
sEi_ t1_jdhh07m wrote
Reply to comment by utopiah in [N] ChatGPT plugins by Singularian2501
It's anyone's guess.
race2tb t1_jdhgx48 wrote
Reply to comment by WarmSignificance1 in [N] ChatGPT plugins by Singularian2501
Sites may not even exist; they may become feeds for the AI. The AI will access the service's schematic metadata info sheet, which trains the AI on the service's functionalities and content. Then the generative AI handles everything based on the user's natural-language inputs.
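Purely as illustration, such a metadata sheet might look something like this (every field name here is invented, loosely inspired by the plugin-manifest idea, not a real spec):

```python
# Hypothetical service manifest: all field names are invented for illustration.
service_manifest = {
    "name_for_model": "acme_bank",
    "description_for_model": "Check balances and list transactions "
                             "for a user's own accounts.",
    "functionalities": [
        {"name": "get_balance", "args": {"account_id": "string"}},
        {"name": "list_transactions",
         "args": {"account_id": "string", "since": "ISO-8601 date"}},
    ],
    "content_feed_url": "https://example.com/ai-feed",  # placeholder
}
```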
[deleted] t1_jdhgar2 wrote
banmeyoucoward t1_jdhg7kt wrote
Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I'd bet that screen recordings + mouse clicks + keyboard inputs made their way into the training data too.
Riboflavius t1_jdhg0ed wrote
Reply to [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a
That sounds fantastic, kudos to you! Great effort.
reditum t1_jdhfxfa wrote
[deleted] t1_jdhfke2 wrote
Jean-Porte t1_jdhf3vn wrote
Reply to [R] Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity by blabboy
Should have asked GPT-4 for the proper way to present data
Using line plots for this is absurd
Izzhov t1_jdhlnrg wrote
Reply to comment by Intrepid_Meringue_93 in [N] ChatGPT plugins by Singularian2501
I wrote a Python application for the first time using GPT-4 yesterday. In just a few hours I made something that can go into any folder and turn all the images in all its subfolders into a fullscreen slideshow that:

- uses a black background with no border, each image resized to fit the screen without changing its aspect ratio
- is navigated with the arrow keys, looping back around to the first image after the last one
- randomizes the order with the spacebar (pressing spacebar again restores the original order)
- toggles, with the q key, a display of the full image file path in white text with a black border in the upper left corner, which updates to match the image as I navigate
- hides my mouse cursor while the fullscreen window is focused, and automatically focuses the window once the program starts
- closes when I hit Esc
- when I hold an arrow key down, goes to the next image, pauses for one second, then proceeds through the following images at 10 per second until I lift the key
This, from knowing absolutely nothing about Python a few hours prior. Using GPT-4 to write code makes me feel like a god dang superhero.
Oh yeah, and I'd also never written a program that had a GUI before. In any language.
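For the curious, here's a trimmed-down sketch of the core of such a slideshow (fullscreen, black background, aspect-preserving resize, arrow keys, Esc to quit); not the actual generated code, just the skeleton, and it assumes Pillow is installed:

```python
# Minimal fullscreen slideshow sketch; a fraction of the features listed above.
import sys
from pathlib import Path
import tkinter as tk
from PIL import Image, ImageTk

images = sorted(p for p in Path(sys.argv[1]).rglob("*")
                if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".gif"})
index = 0

root = tk.Tk()
root.attributes("-fullscreen", True)
root.configure(background="black")
label = tk.Label(root, background="black")
label.pack(expand=True)

def show(i: int) -> None:
    img = Image.open(images[i])
    sw, sh = root.winfo_screenwidth(), root.winfo_screenheight()
    scale = min(sw / img.width, sh / img.height)   # fit screen, keep aspect ratio
    img = img.resize((int(img.width * scale), int(img.height * scale)))
    label.photo = ImageTk.PhotoImage(img)          # keep a reference alive
    label.configure(image=label.photo)

def step(delta: int) -> None:
    global index
    index = (index + delta) % len(images)          # wrap around at the ends
    show(index)

root.bind("<Right>", lambda e: step(1))
root.bind("<Left>", lambda e: step(-1))
root.bind("<Escape>", lambda e: root.destroy())
show(index)
root.mainloop()
```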