Recent comments in /f/MachineLearning
bo_peng OP t1_jcjupnc wrote
Reply to comment by ThePerson654321 in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Soon :) Working on it. Meanwhile, take a look at https://github.com/ridgerchu/SpikeGPT, which is an SNN version of RWKV, so its paper has some explanation.
Available_Lion_652 t1_jcjukim wrote
Reply to comment by PM_ME_ENFP_MEMES in [D] GPT-4 is really dumb by [deleted]
Yes, there is currently a fix for this problem. In the LLaMA paper they split numbers into digits: 12345 became 1 2 3 4 5, and 29 December became 2 9 December.
It helps with addition and subtraction, but not with complex reasoning.
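As a rough sketch of what that kind of digit splitting looks like as a preprocessing step (this is an illustration, not code from the LLaMA tokenizer itself):

```python
import re

def split_digits(text: str) -> str:
    """Separate every run of digits so each digit stands alone,
    in the spirit of LLaMA-style digit splitting."""
    return re.sub(r"\d+", lambda m: " ".join(m.group(0)), text)

print(split_digits("12345"))        # -> "1 2 3 4 5"
print(split_digits("29 December"))  # -> "2 9 December"
```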
bo_peng OP t1_jcjuinf wrote
Reply to comment by yehiaserag in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
More ctxlen and slightly better trained :) Same speed & VRAM.
bo_peng OP t1_jcjuhix wrote
Reply to comment by blueSGL in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yes ChatRWKV v2 supports that :)
Take a look at the "strategy" guide: https://pypi.org/project/rwkv/
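For a sense of what the strategy string looks like with the rwkv pip package (the model path is a placeholder, and the exact streaming syntax should be checked against the strategy guide linked above):

```python
from rwkv.model import RWKV  # pip install rwkv

# Placeholder path to a downloaded RWKV checkpoint.
MODEL_PATH = "RWKV-4-Pile-14B-20230313-ctx8192-test1050"

# Keep every layer on the GPU in fp16 (fastest, most VRAM).
model = RWKV(model=MODEL_PATH, strategy="cuda fp16")

# First 20 layers on the GPU in int8, the rest in fp16,
# trading some speed for VRAM.
model = RWKV(model=MODEL_PATH, strategy="cuda fp16i8 *20 -> cuda fp16")

# Keep 10 layers resident on the GPU and stream the remaining
# layers on demand to save even more VRAM.
model = RWKV(model=MODEL_PATH, strategy="cuda fp16i8 *10+")
```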
bo_peng OP t1_jcjuejz wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
ChatRNN is indeed a great name :)
R W K V are the four major parameters in RWKV (similar to QKV for attention).
I guess you can pronounce it like "Rwakuv" (a bit like raccoon).
PM_ME_ENFP_MEMES t1_jcjubn0 wrote
Reply to [D] GPT-4 is really dumb by [deleted]
I read something about LLMs and why they’re so bad at math: during the tokenisation process, numbers don’t automatically get tokenised as the actual number. So 67 may be tokenised as a single token representing ‘67’, and all would be well.
However, it’s also possible that 67 gets tokenised as two tokens, ‘6’ and ’7’, which may confuse the bot if it’s asked to do 67^2.
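A quick way to see this for yourself is the tiktoken library (used here only as an illustration; GPT-4's exact tokenizer behaviour may differ):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI chat models

for text in ["67", "67^2", "4489", "123456789"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces}")

# The exact splits depend on the encoding, but multi-digit numbers
# are often broken into several tokens rather than one.
```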
yumiko14 t1_jcju8tw wrote
Reply to comment by NotARedditUser3 in [D] GPT-4 is really dumb by [deleted]
link to that article please
ShredForMe t1_jcjtlj5 wrote
Reply to comment by NotARedditUser3 in [D] GPT-4 is really dumb by [deleted]
then I might just as well do that myself
Available_Lion_652 t1_jcjtc6h wrote
Reply to comment by Single_Blueberry in [D] GPT-4 is really dumb by [deleted]
I don't understand why people downvoted. I saw a claim that GPT-4 was trained on 25k Nvidia A100s for several months, and that it used 100x more compute than GPT-3, based on that post. The 20B LLaMA model was trained on 1.4 trillion tokens. So yeah, I think my post is based on these claims.
pobtastic t1_jcjtadp wrote
Reply to comment by NotARedditUser3 in [D] GPT-4 is really dumb by [deleted]
I did try a few follow-up prompts, but nothing changed the structure at all. I mean, it wasn’t for any purpose other than testing it, but I definitely would have felt it was unsatisfactory if I’d really needed it for something work-related.
DamienLasseur t1_jcjt8gp wrote
Reply to comment by boostwtf in [D] GPT-4 is really dumb by [deleted]
Researchers have been using the term for a while now as well. It's mostly for when the model confidently outputs an incorrect answer such as fake website links etc.
[deleted] OP t1_jcjt6mc wrote
Reply to [D] GPT-4 is really dumb by [deleted]
[deleted]
Available_Lion_652 t1_jcjt3yi wrote
Reply to comment by NotARedditUser3 in [D] GPT-4 is really dumb by [deleted]
The tokenizer of LLaMA from Facebook splits numbers into digits so that the model is better at arithmetic. The question I asked the model involves more than adding or subtracting numbers: the model must understand what a perfect cube is, which it does, but it must also not hallucinate while reasoning, which is where it fails.
ChuckSeven t1_jcjt0je wrote
Reply to comment by No-Belt7582 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
This is the way.
bacon_boat t1_jcjt0bi wrote
Reply to [D] GPT-4 is really dumb by [deleted]
I know, that conditional probability distribution P(next_word|words) is really *dumb*.
boostwtf t1_jcjszyg wrote
Reply to [D] GPT-4 is really dumb by [deleted]
Are we using the term 'hallucinate' now? :D
2muchnet42day t1_jcjsy5s wrote
Reply to comment by farmingvillein in [D] What is the best way to fine tune a LLM with your own data and build a custom text classifier? by pgalgali
Why do you suggest RoBERTa and not something like LLaMA or Stanford Alpaca?
NotARedditUser3 t1_jcjsxlo wrote
Reply to comment by pobtastic in [D] GPT-4 is really dumb by [deleted]
I think there you just have to be more creative in your prompt... "I want you to restructure this code so that entirely different methods are called and the comments are different, but the result/output is still effectively the same."
Single_Blueberry t1_jcjsxa1 wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
>the fact that GPT 4 may be two magnitude orders bigger than GPT 3
I'm not aware of any reliable sources that claim that.
Intuitively I don't see why it would stop hallucinating. I imagine the corpus - as big as it may be - doesn't contain a lot of examples for the concept of "not knowing the answer".
That's something people use a lot in private conversation, but not in written language on the public internet or in books, which AFAIK is where most of the data comes from.
NotARedditUser3 t1_jcjsqta wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
All language models are currently trash at math. It's not an issue of training material, it's a core flaw in how they function.
People have found some success in getting reasonable outputs from language models by using input-output chains that break the task up into smaller increments. It's still possible to hallucinate, though. I saw one really good article explaining that even tool-assisted chains (where the model prints a token in one output to call a function in a PowerShell or Python script, and the function's result appears in the next input so the correct answer can be generated later on) can fail: when the 'trusted' tool returns an unexpected number, the model sometimes still disregards it if it's drastically farther off than what the model's own training would lead it to expect the answer to look like.
Which also makes sense: the way a language model works, as we all know, is by calculating which words (or tokens, to be more exact) look appropriate next to each other. The model very likely doesn't distinguish much between 123,456,789 and 123,684,849; both probably evaluate to roughly the same plausibility when it's looking for answers to a math question, in that both look better than some wildly different answer such as... 4.
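A rough illustration of the kind of tool-assisted chain described above (the model call is stubbed out, and the CALC(...) convention is invented purely for this sketch):

```python
import ast
import operator
import re

def fake_llm(prompt: str) -> str:
    """Stand-in for a real language model call."""
    if "TOOL RESULT" in prompt:
        return "The answer is 15241578750190521."
    return "Let me compute that: CALC(123456789 * 123456789)"

def safe_eval(expr: str) -> int:
    """Evaluate a simple arithmetic expression without using eval()."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}
    def walk(node):
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

prompt = "What is 123456789 squared?"
reply = fake_llm(prompt)
match = re.search(r"CALC\((.+?)\)", reply)
if match:
    result = safe_eval(match.group(1))       # run the tool
    prompt += f"\nTOOL RESULT: {result}"     # feed the result back to the model
    reply = fake_llm(prompt)                 # a real model may still ignore it
print(reply)
```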
seba07 t1_jcjsmxi wrote
Reply to [D] GPT-4 is really dumb by [deleted]
The model predicts (in a nutshell) the next word in the answer. It simply can't do math. That's just a known limitation.
Jean-Porte t1_jcjset8 wrote
Reply to [D] GPT-4 is really dumb by [deleted]
On this account, 90+% of humans are dumb
pobtastic t1_jcjs0yn wrote
Reply to [D] GPT-4 is really dumb by [deleted]
I asked it to rewrite a simple bash script “so it doesn’t look like I stole it” (just for kicks) and all it did was rename functions… literally everything else, even the comments, was exactly identical… Not very impressive.
bo_peng OP t1_jcjuvg9 wrote
Reply to comment by londons_explorer in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yeah, that would be cool. You are welcome to try it, and I can help.
The rwkv pip package: https://pypi.org/project/rwkv/
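A minimal generation sketch with the rwkv pip package, assuming a downloaded checkpoint and the 20B tokenizer JSON shipped with the ChatRWKV repo (both paths are placeholders):

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder paths: point these at a downloaded RWKV checkpoint
# and the tokenizer JSON from ChatRWKV.
model = RWKV(model="RWKV-4-Pile-14B-20230313-ctx8192-test1050",
             strategy="cuda fp16")
pipeline = PIPELINE(model, "20B_tokenizer.json")

# Sample 100 tokens from a short prompt.
args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
print(pipeline.generate("Here is a short poem about spring:\n",
                        token_count=100, args=args))
```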