Recent comments in /f/MachineLearning
starfries t1_jdyx0xh wrote
Reply to comment by nxqv in [D] FOMO on the rapid pace of LLMs by 00001746
I feel like Eliezer Yudkowsky proves that anyone can be Eliezer Yudkowsky, going from a crazy guy with a Harry Potter fanfic and a blog to being mentioned in your post alongside those other two names.
abnormal_human t1_jdywyac wrote
Reply to comment by antonivs in [D] FOMO on the rapid pace of LLMs by 00001746
I'm in the midst of a similar project. It also doesn't require massively expensive compute because for domain specific tasks, you often don't need models with gajillions of parameters to achieve business-interesting results.
Craksy t1_jdywiwi wrote
Reply to comment by antonivs in [D] FOMO on the rapid pace of LLMs by 00001746
Well, that doesn't really contradict the previous comment. They did mention fine-tuning as an exception. GPT even stands for Generative Pre-trained Transformer. I'm sure some people like to draw hard lines between transfer learning, specialization, and fine-tuning (different task or just different data), but at any rate, what you're describing can hardly be considered "training from scratch".
Indeed very few will need to be able to train models on that scale. In fact that was the whole motivation behind GPT. Training LLMs from scratch consumes a tremendous amount of resources, and 99% of that work goes into building a foundation that happens to generalize very well across many different tasks.
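For what it's worth, the practical difference is just whether you start from a pretrained checkpoint or from random weights. A minimal sketch with the Hugging Face transformers library (GPT-2 here is only an assumed stand-in for whatever base model you'd actually use):

```python
from transformers import GPT2LMHeadModel, GPT2Config

# Fine-tuning / transfer learning: start from weights someone else already spent the compute on
base_model = GPT2LMHeadModel.from_pretrained("gpt2")

# "Training from scratch": same architecture, but randomly initialized weights --
# this is the part that consumes the tremendous resources mentioned above
scratch_model = GPT2LMHeadModel(GPT2Config())
```

Everything after that first line is the cheap part: you update the pretrained weights on your own task or data instead of rebuilding the foundation.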
rylo_ren_ t1_jdyw4pf wrote
Reply to comment by henkje112 in [D] Simple Questions Thread by AutoModerator
Thank you!! I’ll give it a try. And yes I’m using sklearn
CriticalTemperature1 t1_jdyubo2 wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
Unfortunately, the nature of this field is "the bitter lesson": scale trumps everything in machine learning. So, unfortunately/fortunately, we are getting interested in language models at a point where the scale is so large that it is impossible to make an impact on them unless you run your own $xxM company.
However, there are several interesting research avenues you can take:
- Improve small models with RLHF + fast implementations for a specific task (e.g. llama.cpp)
- Chaining models together with APIs to solve a real human problem (see the sketch after this list)
- Adding multimodal inputs to smaller LLMs
- Building platforms to make it easy to train and serve LLMs for many use cases
- Analyzing prompts and understanding how to make the most of the biggest LLMs
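On the chaining point, here is a minimal sketch of what that can look like, assuming the pre-1.0 openai Python client; the two-step "summarize, then extract action items" split is just an invented example, not a recommendation:

```python
import openai  # assumes the pre-1.0 openai client; reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """One chat-completion call; the model choice is only an example."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Chain two calls: the output of the first becomes part of the second prompt.
notes = "Long, messy meeting transcript goes here..."
summary = ask(f"Summarize these meeting notes in three sentences:\n{notes}")
action_items = ask(f"List the action items implied by this summary:\n{summary}")
print(action_items)
```

The interesting work is rarely the API call itself; it's deciding how to decompose the human problem into steps a model can actually handle.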
ginsunuva t1_jdyu8d2 wrote
Reply to comment by [deleted] in [D] FOMO on the rapid pace of LLMs by 00001746
Some things don't need impacting, and yet people feel the need to force an impact (which may make things worse) just to satisfy their ego, which usually soon goes back to needing more satisfaction once they realize the issue is psychological and always relative to the current situation. Not always, of course, but sometimes. I usually attribute it to OCD fixated on a fear of death without “legacy.”
OkWrongdoer4091 t1_jdytmme wrote
Reply to comment by StellaAthena in [D] ICML 2023 Reviewer-Author Discussion by zy415
Let's see what happens next. Given that the reviews were released later than the deadline (at least for me), maybe there will be late responses to the rebuttals
---AI--- t1_jdysc3x wrote
Reply to comment by djc1000 in [D] FOMO on the rapid pace of LLMs by 00001746
But this just isn't true. You can train GPT-3-level transformers for around $600.
djc1000 t1_jdys06n wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
Totally agree with you. I was able to do interesting work when you could do that on a 10k budget. Now almost everyone is boxed out. It’s incredibly frustrating.
[deleted] t1_jdyrtnq wrote
Reply to comment by rshah4 in [D] FOMO on the rapid pace of LLMs by 00001746
[removed]
[deleted] t1_jdyp2rj wrote
Reply to comment by rshah4 in [D] FOMO on the rapid pace of LLMs by 00001746
[deleted]
antonivs t1_jdyp1zw wrote
Reply to comment by rshah4 in [D] FOMO on the rapid pace of LLMs by 00001746
> I wouldn't get worried about training these models from scratch. Very few people are going to need those skills.
Not sure about that, unless you also mean that there are relatively few ML developers in general.
After the ChatGPT fuss began, one of our developers trained a GPT model on a couple of different subsets of our company's data, using one of the open-source GPT packages, which is obviously behind GPT-3, 3.5, or 4. He got very good results though, to the point that we're working on productizing it. Not every model needs to be trained on internet-sized corpora.
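For anyone wondering what that looks like in practice, the rough shape of a domain fine-tune is fairly small. A hedged sketch using Hugging Face transformers (the base model, file path, and hyperparameters below are placeholders, not our actual setup):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholders: "gpt2" and "company_corpus.txt" are illustrative, not the real setup.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One plain-text file of in-domain documents, one example per line.
dataset = load_dataset("text", data_files={"train": "company_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-gpt", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point is that the expensive pretraining is already done; the domain pass runs on a budget a single team can justify.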
cheddacheese148 t1_jdyo5w5 wrote
Reply to comment by belikeron in [D] FOMO on the rapid pace of LLMs by 00001746
Ignoring literally everything else about what you said, it’s insanely cool to think about the first colonists in another solar system being the like 10th group to make the journey. If this isn’t already a movie, it needs to be!
sineiraetstudio t1_jdymf8q wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Oh, RLHF absolutely has all sorts of benefits (playing with top-p only makes answers more consistent, but sometimes you want to optimize for something other than "most likely"), so it's definitely here to stay (for now?); it's just not purely positive. Ideally we'd have an RLHF version that's still well calibrated (or even better, some way to determine confidence without looking at logits that also works with chain-of-thought prompting).
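For the logit route, a minimal sketch of the kind of confidence check meant here, using GPT-2 via transformers purely as a stand-in (chat models behind an API generally don't expose logits this way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is only a stand-in; the point is reading a per-token probability from the logits.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # logits for the next token only
probs = torch.softmax(logits, dim=-1)

top_prob, top_id = probs.max(dim=-1)
print(tokenizer.decode(int(top_id)), float(top_prob))  # a crude token-level "confidence"
```

The catch the comment points at: once answers come out of a sampled, RLHF-tuned, chain-of-thought pipeline, a single next-token probability like this no longer tells you how much to trust the final answer.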
ZestyData t1_jdyli0a wrote
Reply to [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉 by aadityaubhat
Mods, can we crack down on students posting basic tutorial-tier side projects to this sub? It's becoming common lately.
xcviij t1_jdyl914 wrote
It's simply designed to give you the best response, whether that be real or fake. It's incredible at understanding things and responding.
BawkSoup t1_jdykxs6 wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
FOMO? This is peak 1st world problems.
It's work, man. Do your passions on your own time. Or start your own company.
lqstuart t1_jdykj1b wrote
Such a stupid technology
nxqv t1_jdyjvxe wrote
Reply to comment by ghostfaceschiller in [D] FOMO on the rapid pace of LLMs by 00001746
Yeah it's really somethin
memberjan6 t1_jdyjd65 wrote
Reply to [P] two copies of gpt-3.5 (one playing as the oracle, and another as the guesser) performs poorly on the game of 20 Questions (68/1823). by evanthebouncy
I did the game with it. It worked great.
AuspiciousApple t1_jdyjclk wrote
It could be doing something semi-fancy, or it might simply be prepending the translation prompt with the previous input, translation, and user-based edits so that it can adjust to your specific preferences. This is called in-context learning: the model doesn't change, so it doesn't learn in the standard sense, but it still learns from the current context.
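A rough sketch of that prompt-prepending idea (the format and the translation framing are assumptions for illustration, not how the actual tool works):

```python
# Hypothetical illustration of in-context learning via prompt construction: past sources
# and the user's corrected translations are prepended so the model can imitate the
# user's preferred style without any weight updates.
history = [
    {"source": "Das ist ein Beispiel.",
     "user_edit": "This is just an example."},
]

def build_prompt(history, new_source):
    parts = []
    for turn in history:
        parts.append(f"Source: {turn['source']}")
        parts.append(f"Translation (after user edits): {turn['user_edit']}")
    parts.append(f"Source: {new_source}")
    parts.append("Translation (after user edits):")
    return "\n".join(parts)

print(build_prompt(history, "Noch ein Satz."))
```

Nothing about the model changes between calls; the "learning" lives entirely in what gets stuffed into the context window.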
MootVerick t1_jdyj5x3 wrote
Reply to comment by nxqv in [D] FOMO on the rapid pace of LLMs by 00001746
If AI can do research better than us, we are basically at the singularity.
passerby251 t1_jdyj2bq wrote
Reply to comment by OkWrongdoer4091 in [D] ICML 2023 Reviewer-Author Discussion by zy415
No, but 2 out of 3 responded.
deepneuralnetwork t1_jdyis5x wrote
Reply to comment by ObiWanCanShowMe in [D] FOMO on the rapid pace of LLMs by 00001746
Came to the same conclusion after using GPT-4. It's kind of like a mediocre magic wand that still ends up making my job easier. It's not perfect by any means, but I've gotten way more value out of it already than the $20 I've paid into it so far.
kaisear t1_jdyxdbq wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
I feel the anxiety, too. At a deeper level, AGI will replace most of the jobs. Elon Musk says CEOs will be replaced before machine learning engineers. Society and the economy will need a structural change. Don't worry. We (humans) are all in the same boat.