Recent comments in /f/MachineLearning
LcuBeatsWorking t1_jcjref0 wrote
Reply to [D] GPT-4 is really dumb by [deleted]
There are tons of mistakes in these models, and many are more subtle than this. That's why people should stop hyping them so much. It's not just math.
cipri_tom t1_jcjr8rn wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Man, ChatRNN
The stars would be pouring onto the repo if you named it ChatRNN. People love an antagonist, and the idea of "going back to the old days" and proving they were better.
cipri_tom t1_jcjr74y wrote
Reply to comment by gliptic in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
The first vey is not vey! :)
Single_Blueberry t1_jcjr43p wrote
Reply to [D] GPT-4 is really dumb by [deleted]
Yes, it's well-known that current language models are pretty bad at math
No-Belt7582 t1_jcjqk6s wrote
PyTorch 2.0 keeps impressing every day
gliptic t1_jcjpy0h wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
What's wrong with Arveycavey? ;)
shiva_2176 t1_jcjovuc wrote
Reply to [D] Simple Questions Thread by AutoModerator
Could someone please recommend a machine learning algorithm for creating a "Flood Risk Matrix"? Additionally, any article or video tutorial on this subject that elaborates on the methodology would be much appreciated.
utopiah t1_jcjoj65 wrote
Reply to comment by iJeff in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
In case you didn't follow it: https://www.reuters.com/technology/chinese-search-giant-baidu-introduces-ernie-bot-2023-03-16/ But nothing open source AFAICT.
ControversialGirl t1_jcjn8ve wrote
Reply to comment by ReginaldIII in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
Smh smh carbon footprint
davorrunje t1_jcjjzqi wrote
Wow!!! This is fantastic!
supreme_harmony t1_jcjjx9p wrote
Reply to comment by nat_friedman in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
I definitely hope someone proves me wrong and I wish all the people attempting the challenge the best.
thoughtdrops t1_jcjjq48 wrote
Reply to comment by luaks1337 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
>Samsung Galaxy S22 Ultra.
Can you link to the Samsung Galaxy post? That sounds great
Necessary_Ad_9800 t1_jcjj8b6 wrote
Reply to comment by blueSGL in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Interesting. However, I find some merges in SD to be terrible. But I have no doubt the open-source community will make something amazing
blueSGL t1_jcjgsl1 wrote
Reply to comment by Necessary_Ad_9800 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Exactly.
I'm just eager to see what fine-tunes are going to be made on LLaMA now, and how model merging affects them. The combination of those two techniques has led to some crazy advancements in the Stable Diffusion world. No idea if merging will work with LLMs as it does for diffusion models. (Has anyone even tried yet?)
Necessary_Ad_9800 t1_jcjge23 wrote
Reply to comment by blueSGL in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Everyone with their own private oracle in their hands. Pretty cool tbh
blueSGL t1_jcjga2i wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Is it possible to split the model and do inference across multiple lower-VRAM GPUs, or does a single card have to have the minimum 16 GB of VRAM?
crazymonezyy t1_jcjfp9o wrote
Reply to comment by Oswald_Hydrabot in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
> There are solutions that cost less than GPT-4, and they don't require integration of a black box that is gatekept by a single provider.
Management has a different perspective on costs than you and me. The way cost-benefit is analyzed in a company is whether increasing the input cost by X% can then increase profit by a corresponding Y% through an increase in scale (number of contracts). They are also shit-scared of the new guy on the block and of losing existing business to the 100 or so startups that will come up over the next week flashing the shiny new thing in front of customers. They also don't have the same perspective on openness as we do; they see black boxes as a partnership opportunity.
I'm not saying you're wrong; in fact I agree with your sentiment, and I've tried to put forth some of these arguments to my boss for why we should still be building products in-house instead of GPT-everything. What I realised is that when you talk to somebody on the business side, you get a very different response to the ironclad defense that works perfectly in your head.
xEdwin23x t1_jcjfnlj wrote
Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
First, this is not a "small" model, so size DOES matter. It may not be hundreds of billions of parameters, but it's definitely not small imo.
Second, it always has been (about the data), astronaut-pointing-gun meme style.
hosjiu t1_jcjey3z wrote
Reply to comment by learn-deeply in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Sure, but its main focus is to help people in the academic community do the pretraining phase themselves, for fast, cheap, and reproducible research experiments.
cipri_tom t1_jcjeehj wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
This is great! It just needs a name that's as great as the work
RWKV is a tongue twister. How about Ruckus?
FallUpJV t1_jcje89y wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
I don't get this anymore. If it's not the model size nor the transformer architecture, then what is it?
Models were just not trained enough / not on the right data?
saintshing t1_jcjc3zs wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Stolen from Vitalik:
>70 years is the time between the first computer and modern smart watches.
>70 years is more than the time between the first heavier-than-air flight and landing on the moon.
>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.
mysteriousbaba t1_jcj9u7q wrote
Reply to comment by Oswald_Hydrabot in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Especially now that OpenAI have stopped publishing details of what goes into their black box. GPT-4 is the first time they haven't revealed details of their training architecture or dataset generation in the technical report.
Available_Lion_652 t1_jcjrfnx wrote
Reply to comment by Single_Blueberry in [D] GPT-4 is really dumb by [deleted]
I know that autoregressive models hallucinate, but training them on an enormous clean corpus of probably several trillion tokens and images, and the fact that GPT-4 may be two orders of magnitude bigger than GPT-3, didn't change the problem. The model still hallucinates.