Recent comments in /f/MachineLearning

currentscurrents t1_jdrpl3u wrote

I'm not really surprised. Anybody who's extensively used one of these tools has probably already run into their reasoning limitations.

Today's entire crop of self-supervised models can learn complex ideas, but they have a hard time manipulating them in complex ways. They can do a few operations on ideas (style transfer, translation, etc) but high-level reasoning involves many more operations that nobody understands yet.

But hey, at least there will still be problems left to solve by the time I graduate!

39

light24bulbs t1_jdrm9kh wrote

That's the part I wasn't getting. I assumed the fine-tuning involved a different process. I see now that it is in fact just more training data, often templated into a document in such a way that it's framed clearly for the LLM.

The confusing thing is that most of the LLM-as-a-service companies, OpenAI included, will ONLY take data in a question-answer format, as if that's the only data you'd want to use for fine-tuning.

What if I want to feed a book in so we can talk about the book? A set of legal documents? Documentation of my project? Transcriptions of TV shows?

There are so many use cases for training on top of an already pre-trained LLM that aren't just question answering.

I'm training LLaMA now. I simply took some training code I found, removed the JSON-parsing question-answer templating, and was done.
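To make the point concrete, here is a minimal sketch of the idea that instruction data is just templated text, while raw corpora need no template at all. The Alpaca-style template below is illustrative only, not any provider's exact format:

```python
# Sketch: "instruction fine-tuning data" is just text templated into a document.
# The template is illustrative (Alpaca-style), not OpenAI's actual format.
def to_training_text(example: dict) -> str:
    """Flatten an instruction/response pair into one plain-text training document."""
    return (
        "Below is an instruction. Write a response.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['response']}"
    )

def raw_to_training_text(passage: str) -> str:
    """A book, legal docs, or transcripts skip the template entirely."""
    return passage  # next-token prediction needs nothing more

qa = {"instruction": "Summarize chapter 1.", "response": "It introduces the narrator."}
print(to_training_text(qa))
```

Either way, the model just sees a stream of tokens to predict; the template only frames the task.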

1

baffo32 t1_jdrhj77 wrote

I was still confused by your response. I'm thinking that if you wanted a model to behave as if it had been given different pretraining data, you would probably first finetune on the different bulk data, and only then finetune on the target task, such as instruction following.

Instruction following is, of course, still just predicting the next word: only on data where the next word obeys the instructions that precede it.

1

Chris_The_Pekka t1_jdrfr4l wrote

Hello everyone, I have a dataset with news articles and real radio messages written by journalists. I want to generate radio messages that look like the real ones, so the task no longer has to be done manually. I was planning to use a GAN structure with a CNN as the discriminator and an LSTM as the generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT instead. Could I use GPT as both the discriminator and the generator, or only the generator? (Using GPT as the generator seems promising, but I will need to do prompt optimization.) Has anyone got an opinion or suggestion, or a paper/blog I could read that I might have missed? I am doing this for my thesis and it would help me out greatly. Or maybe I am too fixated on using a GAN structure, and you would suggest I look into something else.

1

farox t1_jdrfllu wrote

This is pretty much it. Just yesterday I needed to write some Python web UI. So I described roughly what I needed and it gave me a solution. It had a couple of errors but gave me a basis to work off of. It saved me a lot of "how do I do X with Flask" searching, but there was little complexity involved. For anything complex, I am sure it would take me longer to describe the problem than to implement the logic myself.
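For illustration, a minimal sketch of the kind of Flask boilerplate this workflow produces; the route and field names here are invented for the example, not from the actual session:

```python
# Illustrative Flask boilerplate: a JSON endpoint with GET and POST handling.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/items", methods=["GET", "POST"])
def items():
    if request.method == "POST":
        # Echo back the posted JSON to confirm creation.
        data = request.get_json()
        return jsonify({"created": data}), 201
    # GET: return an (empty) item list.
    return jsonify({"items": []})

# Start the dev server with: flask --app <module> run
```

This is exactly the sort of scaffolding that is tedious to look up but trivial to verify once generated.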

7

enryu42 OP t1_jdrezba wrote

I don't know about the IIT-JEE/Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair; it is more in the realm of "best human"-level intelligence.

To be fair, the hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skill set, and yet GPT-4 cannot solve any of them (and it doesn't look like it is lacking knowledge).

16

liqui_date_me t1_jdrd9dx wrote

Reply to comment by enryu42 in [D] GPT4 and coding problems by enryu42

One could argue that even standardized tests are somewhat boilerplate: if you practice enough SAT tests you'll eventually do quite well at them, since the questions are quite similar from exam to exam. Ditto for AP exams.

I think a serious test of GPT-4's intelligence would be one of the competitive entrance exams in some countries, like the IIT-JEE or the Gaokao, or the International Math Olympiad, where the questions are made by domain experts and are designed to be intentionally difficult and specialized.

19

enryu42 OP t1_jdrbyh5 wrote

Arithmetic can be solved in a Toolformer-like way, by just giving the model access to a calculator. But this wouldn't help with coding.
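A toy sketch of what Toolformer-style calculator access could look like: the `[Calc(...)]` tool-call syntax and the parsing are invented here for illustration, not Toolformer's exact format. The model would emit tool calls in its output, and a wrapper evaluates them and splices the results back in:

```python
import re

def expand_calls(text: str) -> str:
    """Replace [Calc(expr)] tool calls in model output with the computed value."""
    def run(match: re.Match) -> str:
        expr = match.group(1)
        # Only evaluate plain arithmetic; leave anything else untouched.
        if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
            return match.group(0)
        return str(eval(expr))
    return re.sub(r"\[Calc\((.*?)\)\]", run, text)

print(expand_calls("The total is [Calc(127 * 48)]."))  # -> The total is 6096.
```

The model only needs to learn *when* to call the tool; the arithmetic itself is offloaded, which is why this trick doesn't transfer to open-ended coding.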

Regarding the point about boilerplate, this is exactly what is surprising: GPT-4 performs very well on exams/tests, which supposedly require some amount of creative reasoning. So either the tests are poorly designed, or it can do some creative tasks while failing at others. If the latter is the case, it would be interesting to learn which areas it performs well in, and why.

21

russell616 t1_jdrbedj wrote

Dumb question that's probably been asked multiple times, but where should I continue learning ML? I went through the TensorFlow cert from Coursera and am yearning for more. I just don't know where to go now without a structured curriculum.

1

austintackaberry t1_jdrau92 wrote

Yes, that's correct.

>[@matei_zaharia] The code is at https://github.com/databrickslabs/dolly. You can also contact us for weights, just want to make sure people understand the restrictions on the fine tuning data (or you can get that data from Stanford and train it yourself).

https://twitter.com/matei_zaharia/status/1639357850807054336?s=20

1

liqui_date_me t1_jdr8516 wrote

This comment about GPT-4’s limited abilities in solving arithmetic was particularly interesting: https://www.reddit.com/r/singularity/comments/122ilav/why_is_maths_so_hard_for_llms/jdqsh5c/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3

Controversial take: GPT-4 is probably good for anything that needs lots of boilerplate code or text, like ingesting a book and writing an essay, or drafting rental contracts. There’s a lot of value in making that area of the economy more efficient for sure.

But for some of the more creative stuff it's probably not as powerful and might actually hinder productivity. It still makes mistakes, and programmers are going to have to go and fix those mistakes after the fact.

23

minhrongcon2000 t1_jdr6xtv wrote

Right now, yes! Most of the papers published recently (like Chinchilla, GPT, etc.) show a scaling law relating the amount of training data to the number of parameters in a model. If you want no-fuss training with little preprocessing, bigger models are mostly better. However, if you have sufficient data, the number of parameters needed may be reduced. That said, I feel like the required parameter count decreases really slowly as the data size grows. So yeah, we still somehow need larger models (of course, this also depends on the scenario where you apply the LLM; for example, you don't really need that big of a model for an e-commerce app).
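As a back-of-envelope version of the Chinchilla result: compute-optimal training uses roughly ~20 tokens per model parameter. The exact ratio below is an approximation of the paper's fit, not a precise constant:

```python
# Chinchilla rule of thumb: compute-optimal data scales ~linearly with params,
# at roughly 20 training tokens per parameter (approximate ratio).
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for params in (1e9, 7e9, 70e9):
    print(f"{params:.0e} params -> ~{chinchilla_optimal_tokens(params):.1e} tokens")
```

So a 70B-parameter model "wants" on the order of 1.4T tokens, which is why data, not parameters, is often the binding constraint.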

2

davidbun t1_jdr115n wrote

Full disclosure, I'm one of the creators of the project, but this is exactly why we've built Deep Lake, the Data Lake for Deep Learning. It addresses all your concerns. Specifically:

- Works with any framework (PyTorch, TensorFlow; you might also want to look into training models with MMDetection).

- Stores (and visualizes!) all your data, together with your metadata.

- Outperforms Zarr (we built on top of it in v1, but sadly were constrained a lot by it, so had to build everything from scratch), as well as various dataloaders in a variety of use cases.

- Achieves near-full or full GPU utilization regardless of scale (battle-tested on LAION-400M-scale image data). This holds regardless of which cloud you store your images on and where you train your model, e.g., streaming from EC2 to AWS SageMaker and achieving full GPU utilization at half the cost (no GPU idle time, thanks to the streaming capability).

27

theogognf t1_jdqy28l wrote

I stay up to date mainly by browsing https://paperswithcode.com/ in the morning and once a week at work. There have definitely been a good number of times that I've stumbled across some new method or repo to play around with for my main area of interest that ended up having an immediate return. I occasionally browse all topics there, but I usually only filter by my main interests. I can't imagine staying current without some similar site.

1