Recent comments in /f/MachineLearning
LcuBeatsWorking t1_jcjref0 wrote
Reply to [D] GPT-4 is really dumb by [deleted]
There are tons of mistakes in these models, and many are more subtle than this. That's why people should stop hyping them so much. It's not just math.
cipri_tom t1_jcjr8rn wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Man, ChatRNN
The stars would be pouring onto the repo if you named it ChatRNN. People love an antagonist, and the idea of "going back to the old days" and proving they were better.
cipri_tom t1_jcjr74y wrote
Reply to comment by gliptic in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
The first vey is not vey! :)
Single_Blueberry t1_jcjr43p wrote
Reply to [D] GPT-4 is really dumb by [deleted]
Yes, it's well-known that current language models are pretty bad at math
No-Belt7582 t1_jcjqk6s wrote
PyTorch 2.0 keeps impressing every day
gliptic t1_jcjpy0h wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
What's wrong with Arveycavey? ;)
shiva_2176 t1_jcjovuc wrote
Reply to [D] Simple Questions Thread by AutoModerator
Could someone please recommend a machine learning algorithm for creating a "Flood Risk Matrix"? Additionally, any article or video tutorial on this subject that elaborates on the methodology would be much appreciated.
utopiah t1_jcjoj65 wrote
Reply to comment by iJeff in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
In case you didn't follow it: https://www.reuters.com/technology/chinese-search-giant-baidu-introduces-ernie-bot-2023-03-16/ But nothing open source AFAICT.
ControversialGirl t1_jcjn8ve wrote
Reply to comment by ReginaldIII in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
Smh smh carbon footprint
davorrunje t1_jcjjzqi wrote
Wow!!! This is fantastic!
supreme_harmony t1_jcjjx9p wrote
Reply to comment by nat_friedman in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
I definitely hope someone proves me wrong and I wish all the people attempting the challenge the best.
thoughtdrops t1_jcjjq48 wrote
Reply to comment by luaks1337 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
>Samsung Galaxy S22 Ultra.
Can you link to the Samsung Galaxy post? That sounds great
Necessary_Ad_9800 t1_jcjj8b6 wrote
Reply to comment by blueSGL in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Interesting. However, I find some merges in SD to be terrible. But I have no doubt the open-source community will make something amazing
blueSGL t1_jcjgsl1 wrote
Reply to comment by Necessary_Ad_9800 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Exactly.
I'm just eager to see what fine-tunes are going to be made on LLaMA now, and how model merging affects them. The combination of those two techniques has led to some crazy advancements in the Stable Diffusion world. No idea if merging will work with LLMs as it does for diffusion models. (Has anyone even tried yet?)
Necessary_Ad_9800 t1_jcjge23 wrote
Reply to comment by blueSGL in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Everyone with their own private oracle in their hands. Pretty cool tbh
blueSGL t1_jcjga2i wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Is it possible to split the model and do inference across multiple lower-VRAM GPUs, or does a single card have to have the minimum 16 GB of VRAM?
crazymonezyy t1_jcjfp9o wrote
Reply to comment by Oswald_Hydrabot in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
> There are solutions that cost less than GPT-4, and they don't require integration of a black box that is gatekept by a single provider.
Management has a different perspective on costs than you and me. The way cost-benefit is analyzed in a company is whether increasing the input cost by X% can then increase profit by a corresponding Y% through an increase in scale (number of contracts). They are also shit-scared of the new guy on the block and of losing existing business to the 100 or so startups that will come up over the next week flashing the shiny new thing in front of customers. They also don't have the same perspective on openness as we do; they see black boxes as a partnership opportunity.
I'm not saying you're wrong; in fact I agree with your sentiment, and I've tried to put forth some of these arguments to my boss for why we should still be building products in-house instead of GPT-everything. What I realised is that when you talk to somebody on the business side, you get a very different response to the ironclad defense that works perfectly in your head.
xEdwin23x t1_jcjfnlj wrote
Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
First, this is not a "small" model, so size DOES matter. It may not be hundreds of billions of parameters, but it's definitely not small imo.
Second, it always has been (about the data), astronaut-pointing-gun meme style.
hosjiu t1_jcjey3z wrote
Reply to comment by learn-deeply in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Sure, but its main focus is to help people in the academic community do the pretraining phase themselves, for fast, cheap, and reproducible research experiments.
cipri_tom t1_jcjeehj wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
This is great! It just needs a name that's as great as the work
RWKV is a tongue twister. How about Ruckus?
FallUpJV t1_jcje89y wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
I don't get this anymore. If it's not the model size nor the transformer architecture, then what is it?
Models were just not trained enough / not on the right data?
saintshing t1_jcjc3zs wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Stolen from Vitalik:
>70 years is the time between the first computer and modern smart watches.
>70 years is more than the time between the first heavier-than-air flight and landing on the moon.
>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.
mysteriousbaba t1_jcj9u7q wrote
Reply to comment by Oswald_Hydrabot in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Especially now that OpenAI have stopped publishing details of what goes into their black box. GPT-4 is the first time they haven't revealed details of their training architecture or dataset generation in the technical report.
Available_Lion_652 t1_jcjrfnx wrote
Reply to comment by Single_Blueberry in [D] GPT-4 is really dumb by [deleted]
I know that autoregressive models hallucinate, but training them on an enormous clean corpus of probably several trillion tokens and images, and the fact that GPT-4 may be two orders of magnitude bigger than GPT-3, didn't change the problem. The model still hallucinates.