Recent comments in /f/MachineLearning

Available_Lion_652 t1_jcjrfnx wrote

I know that autoregressive models hallucinate, but training them on an enormous clean corpus of probably several trillion tokens and images, and the fact that GPT-4 may be two orders of magnitude bigger than GPT-3, hasn't changed the problem. The model still hallucinates.

−13

shiva_2176 t1_jcjovuc wrote

Could someone please recommend a machine learning algorithm for building a "Flood Risk Matrix"? Additionally, any article or video tutorial on the subject that elaborates on the methodology would be much appreciated.
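Not an authoritative answer, but one common framing is to treat the risk level itself as a supervised classification target. The sketch below assumes a hypothetical CSV of features (rainfall, elevation, distance to river, imperviousness) with a labelled risk column, and uses scikit-learn's RandomForestClassifier; all file and column names are placeholders.

```python
# Minimal sketch: treating "flood risk level" as a supervised classification target.
# The file name and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("flood_data.csv")  # hypothetical dataset
X = df[["rainfall_mm", "elevation_m", "distance_to_river_km", "impervious_pct"]]
y = df["risk_level"]                # e.g. low / medium / high

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Cross-tabulating predicted likelihood vs. consequence classes would then give
# the cells of a risk matrix; here we just report per-class performance.
print(classification_report(y_test, clf.predict(X_test)))
```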

1

blueSGL t1_jcjgsl1 wrote

Exactly.

I'm just eager to see what fine-tunes are going to be made on LLaMA now, and how model merging affects them. The combination of those two techniques has led to some crazy advancements in the Stable Diffusion world. No idea if merging will work with LLMs as it does for diffusion models (has anyone even tried yet?).
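For anyone curious what that kind of merge looks like mechanically, here's a minimal sketch of naive checkpoint interpolation, assuming two fine-tunes that share the exact same architecture. This mirrors the linear weight blending used by Stable Diffusion merge tools rather than any established LLaMA recipe, and the file names are hypothetical.

```python
# Minimal sketch of naive weight merging, assuming both checkpoints come from
# the same base architecture (identical layer names and shapes).
import torch

def merge_checkpoints(path_a: str, path_b: str, alpha: float = 0.5,
                      out_path: str = "merged.pt") -> str:
    """Linearly interpolate two state dicts: alpha=1.0 keeps model A, alpha=0.0 keeps model B."""
    sd_a = torch.load(path_a, map_location="cpu")
    sd_b = torch.load(path_b, map_location="cpu")
    assert sd_a.keys() == sd_b.keys(), "checkpoints must have identical layers"

    merged = {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}
    torch.save(merged, out_path)
    return out_path

# e.g. a 50/50 blend of two hypothetical LLaMA fine-tunes:
# merge_checkpoints("llama-7b-ft-instruct.pt", "llama-7b-ft-roleplay.pt", alpha=0.5)
```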

3

crazymonezyy t1_jcjfp9o wrote

> There are solutions that cost less than GPT-4, and they don't require integration of a black box that is gatekept by a single provider.

Management has a different perspective on costs than you and me. The way cost-benefit is analyzed in a company is whether increasing the input cost by X% can increase profit by a corresponding Y% through an increase in scale (number of contracts). They are also shit scared of the new guy on the block, and of losing existing business to the 100 or so startups that will come up over the next week flashing the shiny new thing in front of customers. And they don't have the same perspective on openness as us: they see black boxes as a partnership opportunity.

I'm not saying you're wrong. In fact I agree with your sentiment, and I've tried to put some of these arguments to my boss for why we should still be building products in-house instead of GPT-everything. What I realised is that when you talk to somebody on the business side, you get a very different response to the ironclad defense that works perfectly in your head.

1

xEdwin23x t1_jcjfnlj wrote

First, this is not a "small" model, so size DOES matter. It may not be hundreds of billions of parameters, but it's definitely not small imo.

Second, it always has been about the data (astronaut pointing a gun meme).

30

saintshing t1_jcjc3zs wrote

stolen from vitalik

>70 years is the time between the first computer and modern smart watches.

>70 years is more than the time between the first heavier-than-air flight and landing on the moon.

>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.

2

mysteriousbaba t1_jcj9u7q wrote

Especially now that OpenAI have stopped publishing details of what goes into their black box. GPT-4 is the first time they haven't revealed details of their training architecture or dataset generation in the technical report.

2