Recent comments in /f/MachineLearning

was_der_Fall_ist t1_je3ng6m wrote

Maybe that’s part of the benefit of using looped internal monologue/action systems. By iteratively storing their thoughts and actions in the context window, they no longer have to use the weights of the neural network to “re-think” every thought each time they predict a token. They can think more effectively by spending that computation on other operations that take the stored thoughts and actions as their basis.
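
Roughly, a minimal sketch of that kind of loop might look like the following (the `llm_complete` call and the prompt format are hypothetical placeholders, not any particular API):

```python
# Hypothetical sketch of a looped internal-monologue agent: earlier thoughts are
# kept verbatim in the context window, so the model reads them back on later
# steps instead of re-deriving them from its weights at every token.

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to some language-model API."""
    raise NotImplementedError

def solve_with_monologue(task: str, max_steps: int = 5) -> str:
    context = f"Task: {task}\n"
    for step in range(max_steps):
        # The growing context carries all previous thoughts forward.
        thought = llm_complete(context + f"Thought {step + 1}:")
        context += f"Thought {step + 1}: {thought}\n"
        if "FINAL ANSWER:" in thought:
            return thought.split("FINAL ANSWER:", 1)[1].strip()
    return llm_complete(context + "Final answer:")
```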

1

RandomScriptingQs t1_je3lv1g wrote

Is anyone able to contrast MIT's 6.034 "Artificial Intelligence, Fall 2010" with 18.065 "Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018"?
I want to use whichever lies slightly closer to the theoretical/foundational side as supplementary study, and I have really enjoyed listening to both instructors in the past.

2

martianunlimited t1_je3lmsp wrote

Relevant publication: https://cdn.openai.com/papers/gpt-4.pdf

I can take comfort in knowing that while GPT-4 scores 10 percentiles higher than me on GRE Verbal, I still score (slightly) better than it on GRE Quantitative and very similarly on GRE Writing. (English is not my first language.)

Side note: I am surprised at how poorly GPT-4 does in AP English Language and AP English Literature; I thought that, as a large language model, it would have an advantage on those sorts of questions. (Sorry, not an American, I could be misunderstanding what exactly is being tested in those subjects.)

2

geekfolk t1_je3io3b wrote

Using pretrained models is kind of cheating; some GANs use this trick too (projected GANs). But as a standalone model, it does not seem to work as well as SOTA GANs (judging by the numbers in the paper).

>Still, it's a lot easier than trying to solve any kind of minimax problem.

That was true for GANs in the early days; however, modern GANs have been proven not to suffer from mode collapse, and their training has been proven to converge.

>It's actually reminiscent of GANs since it uses pre-trained networks

I assume you mean distilling a diffusion model, as in the paper. There have been some attempts to combine diffusion and GANs to get the best of both worlds, but AFAIK none of them involved distillation. I'm curious whether anyone has tried distilling diffusion models into GANs.
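
To illustrate what I mean by distillation, here is a toy sketch where a frozen multi-step "teacher" is regressed into a one-step student; the tiny networks, the fake sampling loop, and the plain MSE objective are purely illustrative assumptions, not any paper's actual method:

```python
import torch
import torch.nn as nn

# Toy sketch of distilling a multi-step teacher into a one-step student:
# the frozen teacher turns noise into samples via many refinement steps,
# and the student is trained to reproduce those samples in a single pass.

class TinyNet(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):
        return self.net(x)

teacher = TinyNet()   # stand-in for a pretrained diffusion model (kept frozen)
student = TinyNet()   # one-step generator being trained
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

def teacher_sample(noise: torch.Tensor, steps: int = 20) -> torch.Tensor:
    # Stand-in for iterative denoising: repeatedly refine the sample.
    x = noise
    with torch.no_grad():
        for _ in range(steps):
            x = x - 0.05 * teacher(x)
    return x

noise = torch.randn(32, 16)
target = teacher_sample(noise)                          # expensive multi-step sampling
loss = nn.functional.mse_loss(student(noise), target)   # single forward pass
opt.zero_grad()
loss.backward()
opt.step()
```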

0

Beautiful-Gur-9456 OP t1_je3hxbn wrote

The training pipeline, honestly, is significantly simpler without adversarial training, so the design space is much smaller.

It's actually reminiscent of GANs since it uses pre-trained networks as a loss function to improve the quality, though it's completely optional. Still, it's a lot easier than trying to solve any kind of minimax problem.
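
As a rough illustration of the "pre-trained network as a loss function" part, something along these lines: a frozen feature extractor supplies the loss signal and the generator simply minimizes it, so there is no discriminator and no minimax game. (VGG16 and the chosen feature layer are assumed stand-ins here, not necessarily what the paper uses.)

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

# Minimal sketch of a perceptual loss from a frozen pretrained network:
# generated and reference images are compared in VGG16 feature space.
features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in features.parameters():
    p.requires_grad_(False)   # the pretrained net only provides the loss signal

def perceptual_loss(generated: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    # Plain regression in feature space: one minimization problem, no minimax.
    return nn.functional.mse_loss(features(generated), features(reference))

# Usage (hypothetical tensors): loss = perceptual_loss(fake_images, real_images)
```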

2

uiucecethrowaway999 t1_je3hpyz wrote

>Making the bold and unscientific assumption that this sub is at least decently representative of people “in the know” on ML...

The increasing number of posts like this indicates that that may no longer be the case.

I’m not trying to be snarky or mean when I say this, but these sorts of posts offer pretty much zero insight or discussion value. There are a lot of very knowledgeable minds on this subreddit, but you won’t get much out of them by asking such vague and sweeping questions.

11