Recent comments in /f/MachineLearning

dudaspl t1_je3zq9d wrote

Exactly. LLMs mimic intelligence by just generating text, and since they are trained on civilization-level knowledge/data they do it very well and can seem as intelligent as humans.

The real test is to put them in novel scenarios and see whether their intelligence can produce solutions to those, e.g. put them in some sort of escape room and see if they can escape.

1

SnooMarzipans3021 t1_je3x9ah wrote

I'm unable to load the full-resolution image into the model and train it, even with batch size 1 and all sorts of optimizations. My idea is to add two small modules to my network: one at the front which downscales the image, and one at the back which upscales it again.

The problem here is the upscaling; it will need to be some sort of super-resolution model.
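
Roughly what I have in mind, as a minimal sketch (the module designs are placeholders, and the upscaling head is just a PixelShuffle block rather than a proper super-resolution model):

```python
import torch
import torch.nn as nn

class DownscaleHead(nn.Module):
    """Small learned downscaler in front of the backbone (placeholder design, 4x)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class UpscaleHead(nn.Module):
    """Sub-pixel (PixelShuffle) upscaler behind the backbone, SR-style (4x)."""
    def __init__(self, channels=3, factor=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels * factor ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(factor),  # trades channels for spatial resolution
        )

    def forward(self, x):
        return self.net(x)

# The backbone only ever sees the 4x-downscaled image; the output is full-res again.
backbone = nn.Identity()  # stand-in for the existing model
model = nn.Sequential(DownscaleHead(), backbone, UpscaleHead())
out = model(torch.randn(1, 3, 2048, 2048))
print(out.shape)  # torch.Size([1, 3, 2048, 2048])
```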

1

OkWrongdoer4091 t1_je3vscq wrote

My only hope is that my meta-reviewers will be diligent enough to reach out to the reviewers who didn't respond and ask whether they changed their minds after reading the rebuttals. Not responding to the rebuttals defeats the purpose of the rebuttal (and of the review itself). If you volunteer to serve the community as a reviewer, why not commit to doing a good job of it? I wish I could contact the ACs, but the buttons are no longer available on OpenReview.

1

Beautiful-Gur-9456 OP t1_je3uesm wrote

I haven't done it yet, but I'm working on it! Their suggested sampling procedure requires multiple FID calculations, so I'm thinking about how to incorporate it efficiently.

Their scale is indeed large; it would cost me a few hundred bucks to train on CIFAR-10. My checkpoint was trained at a much smaller scale 😆

1

Craksy t1_je3tzt3 wrote

Aah, got you. My bad. Well, I suppose most people mainly think of NLP in these kinds of contexts. That's where my mind went, anyway.

Training from scratch on a DSL is indeed an entirely different scale of problem (assuming it's not some enormous, complex DSL that relies heavily on context and thousands of years of culture to make sense of).

Sounds very interesting though. If you're allowed to share more information, I'd love to hear about it.

3

geekfolk t1_je3qyfr wrote

I don't know about this model, but GANs are typically smaller than diffusion models in terms of number of parameters. The image structure thing probably has something to do with the network architecture, since GANs rarely use attention blocks, whereas the architecture of diffusion models is more hybrid (typically CNN + attention).
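
As a rough illustration of what I mean by "hybrid" (a convolutional residual block followed by self-attention, in the style of typical diffusion U-Nets; the layer sizes here are made up):

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Conv residual block + self-attention, the mix typical of diffusion U-Nets."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.GroupNorm(8, channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.norm = nn.GroupNorm(8, channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        x = x + self.conv(x)  # local structure from the CNN part
        b, c, h, w = x.shape
        tokens = self.norm(x).flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)   # global interactions
        return x + attn_out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(2, 64, 16, 16)
print(HybridBlock()(x).shape)  # torch.Size([2, 64, 16, 16])
```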

1

Beautiful-Gur-9456 OP t1_je3qsdu wrote

Nope. I mean the LPIPS loss, which kinda acts like a discriminator in GANs. We can replace it with MSE without much degradation.

Distilling a SOTA diffusion model is obviously cheating 😂, so I didn't even think of it. In my view, they're just apples and oranges. We can augment diffusion models with GANs and vice versa to get the most out of both, but what's the point? That would make things way more complex. It's clear that diffusion models cannot beat SOTA GANs at one-step generation; GANs have been tailored for that particular task for years. But we're just exploring possibilities, right?

Aside from the complexity, I think it's worth a shot to drop the LPIPS loss and adversarially train a discriminator in its place. Using a pretrained VGG is cheating anyway. That would be an interesting direction to see!
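
For context, a minimal sketch of the LPIPS-vs-MSE swap I'm describing, using the `lpips` package (the tensors are placeholders, not my actual training code; the adversarial variant would swap the perceptual term for a discriminator loss):

```python
import torch
import lpips  # pip install lpips; learned perceptual image patch similarity

# Pretrained-VGG perceptual loss vs. a plain pixel-wise MSE baseline.
lpips_loss = lpips.LPIPS(net='vgg')
mse_loss = torch.nn.MSELoss()

# Placeholder tensors standing in for the one-step generation and its target,
# scaled to [-1, 1] as lpips expects.
generated = torch.rand(4, 3, 32, 32) * 2 - 1
target = torch.rand(4, 3, 32, 32) * 2 - 1

perceptual = lpips_loss(generated, target).mean()  # uses pretrained VGG features
pixelwise = mse_loss(generated, target)            # no pretrained network involved
print(perceptual.item(), pixelwise.item())
```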

2

LeEpicCheeseman t1_je3p1no wrote

Really depends on how "general" you define AGI to be.

To me, AGI means developing agents that can operate autonomously in the real world and make sensible decisions across a wide range of situations and domains. I don't think we're currently very close to developing these sorts of agents, although it probably isn't more than a couple decades away.

2