Recent comments in /f/MachineLearning

Izzhov t1_jdhlnrg wrote

I wrote a Python application for the first time using GPT-4 yesterday. In just a few hours I made something that can go into any folder and put all the images in all its subfolders into a fullscreen slideshow, with a black background, no border, and each image resized to fit the screen without changing the aspect ratio. It can:

- navigate with the arrow keys, looping back around to the first image after I hit the last one
- randomize the order with the spacebar (pressing spacebar again restores the original ordering)
- toggle a display of the full image file path with the q key, shown as white text with a black border in the upper-left corner and updated to match the image as I navigate
- hide my mouse cursor while I am focused on the fullscreen window, and automatically focus the window once the program starts
- close the program when I hit Esc
- when I hold an arrow key down, go to the next image, pause for one second, and then proceed through the following images at a rate of 10 per second until I lift the key

(A minimal sketch of the core loop is at the bottom of this comment.)

This from knowing absolutely nothing about Python a few hours prior. Using GPT-4 to write code makes me feel like a god dang superhero.

Oh yeah, and I'd also never written a program that had a GUI before. In any language.
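
Here's a minimal sketch of the core behavior (not the exact code GPT-4 produced), assuming tkinter from the standard library and Pillow; the hold-key fast-forward is left out for brevity:

```python
import random
import sys
from pathlib import Path
import tkinter as tk

from PIL import Image, ImageTk  # Pillow

EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}

root = tk.Tk()
root.attributes("-fullscreen", True)
root.configure(bg="black", cursor="none")  # black background, hidden cursor
W, H = root.winfo_screenwidth(), root.winfo_screenheight()

folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
original = sorted(p for p in folder.rglob("*") if p.suffix.lower() in EXTS)
if not original:
    sys.exit("No images found")
images, idx = list(original), 0

label = tk.Label(root, bg="black", bd=0)
label.pack(expand=True)
path_label = tk.Label(root, fg="white", bg="black", anchor="nw")  # toggled with q

def show():
    img = Image.open(images[idx])
    img.thumbnail((W, H))              # shrink to fit screen, preserving aspect ratio
    photo = ImageTk.PhotoImage(img)
    label.configure(image=photo)
    label.photo = photo                # keep a reference so it isn't garbage-collected
    path_label.configure(text=str(images[idx]))

def step(delta):
    global idx
    idx = (idx + delta) % len(images)  # wraps around past either end
    show()

def toggle_shuffle(_event):
    global images, idx
    # Shuffle on one press, restore the original ordering on the next.
    images = list(original) if images != list(original) else random.sample(original, len(original))
    idx = 0
    show()

def toggle_path(_event):
    if path_label.winfo_ismapped():
        path_label.place_forget()
    else:
        path_label.place(x=0, y=0)

root.bind("<Right>", lambda e: step(1))
root.bind("<Left>", lambda e: step(-1))
root.bind("<space>", toggle_shuffle)
root.bind("q", toggle_path)
root.bind("<Escape>", lambda e: root.destroy())
root.focus_force()                     # grab focus on startup
show()
root.mainloop()
```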

3

CptTombstone t1_jdhkpsp wrote

>At best it can recreate latent patterns in the training data.

Have you actually read the paper? On fresh LeetCode tests, GPT-4 significantly outperforms humans at every difficulty level, reaching nearly double the performance of LeetCode users on medium and hard questions. Those tests were recently added to LeetCode's database and were not in the training data. It also performs genuinely well at image generation through SVG code. The 3D modelling in JavaScript example (Figure 2.7) is way outside the domain of what you would expect from "just a transformer"; it demonstrates real understanding beyond the training data. It even outperforms purpose-trained image generation models like Stable Diffusion in some regards, namely adherence to instructions, although the generated images are not as visually pleasing as those from DALL-E or Stable Diffusion, which is a very unfair complaint to make of a freaking language model.

49

harharveryfunny t1_jdhkn99 wrote

> GPT-4 with image input can interpret any computer screen

Not necessarily; it depends on how they've implemented it. If it's just dense object and text detection, then that's all you're going to get.

For the model to be able to actually "see" the image, they would need to feed it into the model at the level of neural-net representations, not post-detection object descriptions.

For example, if you wanted the model to gauge whether two photos of someone not in its training set show the same person, it'd need face embeddings to do that (to measure distance between faces). They could special-case all sorts of scenarios like this in addition to object detection, but you could always find something they missed.
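
To make that concrete, here's a sketch of the kind of check I mean; `get_face_embedding` is a hypothetical stand-in for any face model (e.g. a FaceNet-style network) that maps a photo to a vector:

```python
# Raw object/text detection can't answer "same person?"; you need
# embeddings and a distance metric over them.
import numpy as np

def get_face_embedding(photo) -> np.ndarray:
    """Hypothetical: returns a face embedding vector for the photo."""
    raise NotImplementedError

def same_person(photo_a, photo_b, threshold: float = 0.6) -> bool:
    a, b = get_face_embedding(photo_a), get_face_embedding(photo_b)
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cosine >= threshold  # close in embedding space => likely same person
```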

The back-of-a-napkin hand-drawn website sketch demo is promising, but it could have been done via object detection.

In the GPT-4 announcement, OpenAI said they're working with another company on the image/vision tech and linked to an assistive-vision company... for that type of use, maybe dense labelling is enough.

30

zy415 OP t1_jdhk9t8 wrote

Is it just ICML where reviewers tend to ghost? My experience (as both a reviewer and an author) with NeurIPS and ICLR is that reviewers tend to participate in the discussion with authors.

2

drcopus t1_jdhjddx wrote

Imo, doing everything in-context seems hackier; I would rather see a Toolformer approach, but I understand that it probably requires more engineering and compute.

I reckon the in-context approach probably makes the plugins less stable, since the model has to nail the call syntax exactly (see the sketch below). ChatGPT is good at coding, but it makes basic errors often enough to notice.
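
As an illustration of the fragility, here's a sketch of what the host side has to do; the `<tool>...</tool>` wrapper is an invented convention for illustration, not OpenAI's actual plugin format:

```python
# The host has to parse whatever the model emits, so one malformed
# token is enough to break the whole tool call.
import json
import re

def extract_tool_call(model_output: str):
    match = re.search(r"<tool>(.*?)</tool>", model_output, re.DOTALL)
    if not match:
        return None                        # model didn't invoke a tool
    try:
        return json.loads(match.group(1))  # must be exact, valid JSON
    except json.JSONDecodeError:
        return None                        # one stray quote/comma and the call is lost

print(extract_tool_call('<tool>{"name": "weather", "args": {"city": "Paris"}}</tool>'))
print(extract_tool_call('<tool>{"name": "weather", "args": {city: Paris}}</tool>'))  # malformed -> None
```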

2

race2tb t1_jdhilvd wrote

Websites may no longer have a purpose. Generative AI will just be fed directly by customers and producers. The generative AI service will pay for portfolios of data content it cannot generate itself, and people will get paid based on how much their feeds are woven into generated content.

2

WarmSignificance1 t1_jdhi4rj wrote

Reply to comment by race2tb in [N] ChatGPT plugins by Singularian2501

I get the concept, and I see this working for a small subset of websites. But have you seen an average person interact with a website before? Having a non-deterministic GUI will absolutely kill UX, in my opinion. Not to mention that many businesses want way more control over what they display to users than an LLM will afford.

2

Deep-Station-1746 t1_jdhhbbg wrote

Nope. The ability to input something doesn't mean being able to use it reliably. For example, take this post: your eyes can take in all the info on the screen, but as a contribution, this post is pretty worthless. And you are a lot smarter than GPT-4, I think.

Edit: spelling

−19

race2tb t1_jdhgx48 wrote

Sites may not even exist; they may become feeds for the AI. The AI will access the service's schematic metadata info sheet, which trains the AI on the service's functionality and content. The generative AI then handles everything based on the user's natural-language input.
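
A hypothetical sketch of what such a "metadata info sheet" might look like (field names invented for illustration; ChatGPT plugins follow a similar pattern with a manifest plus an OpenAPI spec):

```python
# Made-up example of a service describing itself to a generative AI.
service_manifest = {
    "name": "acme_store",
    "description_for_model": "Search and order products from Acme.",
    "capabilities": ["search_products", "get_price", "place_order"],
    "content_feed": "https://example.com/feed.json",  # placeholder URL
}
```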

4