Recent comments in /f/MachineLearning
bo_peng OP t1_jck4qkr wrote
Reply to comment by sanderbaduk in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
I manually disabled the <|endoftext|> token in the demo, so it can output irrelevant content after a task is completed :)
olmec-akeru t1_jck3hvp wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
Precisely right; I hadn't applied my mind to that expansion. My comment is erroneous.
Available_Lion_652 t1_jck2th4 wrote
Reply to comment by olmec-akeru in [D] GPT-4 is really dumb by [deleted]
Not quite :). The second step, (a + b + c)^2014 = a^2014 + b^2014 + c^2014, is false. It doesn't understand complex math operations. Frankly, solving the above problem would mean it can do math better than most humans.
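For anyone skimming: the step really is false for any nonzero values, which a quick sanity check in Python (illustrative only, using a = b = c = 1) makes obvious:

```python
# (a + b + c)**n is not a**n + b**n + c**n in general.
a, b, c, n = 1, 1, 1, 2014
lhs = (a + b + c) ** n   # 3**2014, an enormous number
rhs = a**n + b**n + c**n  # just 3
print(lhs == rhs)  # False
```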
olmec-akeru t1_jck223w wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
Yeah, totally right—and I understand that the specifics really matter in some cases (for example calculating a starship trajectory).
What intrigues me is that at the level of concept and logic, this specific error isn't meaningful. I.e., if the sum of three primes had initially been correct, the approach wouldn't be invalid. There is something in this.
Spiritual-Reply5896 t1_jcjz3fn wrote
Why is everyone talking about context length rather than some kind of memory retrieval? Is the assumption that by increasing context length we can eventually scale it to infinity, thus replacing any kind of external memory?
ml_head t1_jcjyon2 wrote
Reply to comment by Empty-Revolution7570 in [P] Multimedia GPT: Can ChatGPT/GPT-4 be used for vision / audio tasks just by prompt engineering? by Empty-Revolution7570
I'm sure that it does. And it would be a better demo of the technology. Maybe keep the Cinderella story too, since some people won't read your original story and wouldn't be able to tell whether the summary is good. You might want to add an image of your original story in a format that isn't easy to OCR, like a weird font on a noisy background. That way you make the story available to humans while taking measures to hide it from any web crawler used by language models.
sanderbaduk t1_jcjy47g wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Does it have trouble stopping? I see it ramble on, e.g. https://imgur.com/a/e6k7pSP
kittenkrazy t1_jcjxr0b wrote
I wonder if this can effectively be used in LLaMA. 32K context would be a game changer
SQLGene t1_jcjxkwp wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
There's a lot of hype and misinformation going around right now, thus the sassy remarks.
This is the best post I've seen about how it works behind the scenes. We shouldn't expect fancy autocomplete to be good at math.
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Available_Lion_652 t1_jcjxja6 wrote
Reply to comment by JaCraig in [D] GPT-4 is really dumb by [deleted]
This is a 5th grade math Olympiad problem. Sorry for not mentioning it. Good luck solving it with a basic app.
Available_Lion_652 t1_jcjxe16 wrote
Reply to comment by SQLGene in [D] GPT-4 is really dumb by [deleted]
Good remarks. This is my first post on this subreddit, and I didn't know what title to give it. I was angry at ClosedAI for not revealing model details and dataset details.
JaCraig t1_jcjx7lx wrote
Reply to [D] GPT-4 is really dumb by [deleted]
Genuine question: Why are you trying to use a language model to do something that you could write a basic app to calculate?
Like, I would have asked it to write an app in some popular language (JavaScript, Java, C#, etc.) to find four perfect cubes that represent a number. That'd probably get me 90% of the way there, and then I'm just fixing a couple of bugs. That seems like the more intuitive use case to me, but I'm also a dev by trade.
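The "basic app" idea above can be sketched in a few lines. This is a hypothetical illustration (in Python rather than the languages named), doing a naive brute-force search over non-negative cubes; representing arbitrary integers as sums of four cubes generally requires allowing negative terms, which this sketch doesn't cover:

```python
from itertools import combinations_with_replacement

def four_cubes(n, limit=50):
    """Search for non-negative a <= b <= c <= d with a^3+b^3+c^3+d^3 == n.
    Returns the first tuple found, or None."""
    for combo in combinations_with_replacement(range(limit + 1), 4):
        if sum(x ** 3 for x in combo) == n:
            return combo
    return None

print(four_cubes(100))  # (1, 2, 3, 4): 1 + 8 + 27 + 64 == 100
```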
SQLGene t1_jcjx3a6 wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
The title is "[D] GPT-4 is really dumb," and what would be accurate is "[D] GPT-4 is bad at math problems." This was a known issue with GPT-3.5, and I expect it to continue to be an issue, but I think it's a mischaracterization to say it's "dumb" when there are a number of non-mathematical applications where it's impressive so far.
So while my statement was a simplification, I stand by the intention. You are evaluating a tool based on an application I don't think it's meant for.
Available_Lion_652 t1_jcjwqg0 wrote
Reply to comment by SQLGene in [D] GPT-4 is really dumb by [deleted]
Probably I have to specify: the problem I gave GPT-4 to solve was a 5th grade math Olympiad problem. Your statement is unfounded.
SQLGene t1_jcjwhst wrote
Reply to [D] GPT-4 is really dumb by [deleted]
"I purchased a Swiss Army Knife but it's a terrible calculator"
Available_Lion_652 t1_jcjwf9c wrote
Reply to comment by olmec-akeru in [D] GPT-4 is really dumb by [deleted]
Yes, that was interesting :), but it failed at adding operations
Available_Lion_652 t1_jcjw5cf wrote
Reply to comment by kaoD in [D] GPT-4 is really dumb by [deleted]
I understood the post really well. My comment was meant to build on it. I think you did not understand what I said
olmec-akeru t1_jcjw333 wrote
Reply to [D] GPT-4 is really dumb by [deleted]
Right, so ignoring the specific error and thinking about the general approach: adding a^3 is a fourth term; and it happens that a = 0.
Sneaky, but not illogical.
Edit: the above is wrong; read the thread below for OP's insights.
kaoD t1_jcjvsmo wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
Looks like you don't understand the comment you're replying to.
Single_Blueberry t1_jcjvh6o wrote
Reply to comment by Available_Lion_652 in [D] GPT-4 is really dumb by [deleted]
Again, can't find a reliable source for that.
I personally doubt that GPT-4 is significantly larger than GPT 3.x, simply because that would also further inflate inference cost, which you generally want to avoid in a product (as opposed to a research feat).
Better architecture, better RLHF, more and better training data, more training compute? All seems reasonable.
Orders of magnitude larger again? I don't think so.
[deleted] t1_jcjvdyu wrote
[removed]
sweatierorc t1_jcjv2g4 wrote
Reply to [D] GPT-4 is really dumb by [deleted]
"hype is a hell of a drug", rick james
Logicalist t1_jcjv1pq wrote
Reply to comment by DamienLasseur in [D] GPT-4 is really dumb by [deleted]
"Delusion" would be entirely more accurate. "Hallucinate" is just wrong.
Available_Lion_652 t1_jcjuxp5 wrote
Reply to comment by yumiko14 in [D] GPT-4 is really dumb by [deleted]
It's not an article. Someone on Twitter estimated the total compute based on a report that Microsoft had 25k A100 GPU racks. That was all
iJeff t1_jck5f7y wrote
Reply to comment by utopiah in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
Thanks!