Recent comments in /f/MachineLearning
tdgros t1_jczzeqz wrote
Reply to comment by phira in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
"what does a cow drink?" "Milk"
satireplusplus t1_jczz8e6 wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
VRAM is the limiting factor to run these things though, not tensor cores
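Rough back-of-envelope for the weights alone (assuming weights dominate; KV cache and runtime overhead come on top):

```python
# Approximate VRAM needed just to hold LLaMA-30B's weights.
params = 30e9
for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits}-bit: ~{gib:.0f} GiB")
# 16-bit: ~56 GiB, 8-bit: ~28 GiB, 4-bit: ~14 GiB
```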
Carrasco_Santo t1_jczz5k9 wrote
Reply to comment by phira in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
In theory, Open Assistant should at least match the best corporate models if enough people start using the project and each contribute a little every week: creating prompts, ranking prompts, and so on.
If 10,000 people do this work every month, that's far more people than any corporate AI team has. The issue is the quality of the work.
i_sanitize_my_hands OP t1_jczyrsl wrote
Reply to comment by Joel_Duncan in [D] Determining quality of training images with some metrics by i_sanitize_my_hands
Not expecting a magic bullet solution. Been in the field long enough to know that.
However, any written record of the intelligent approaches you mentioned is valuable and worth going through.
One of the reasons this gets asked a lot is that image quality analysis doesn't seem to get enough air time. There are only a few papers, some as old as 2016, and they don't reflect the trends since 'Attention Is All You Need'.
Diligent-Wing-1486 t1_jczxi75 wrote
Reply to comment by boostwtf in [P] TherapistGPT by SmackMyPitchHup
Idk, it sounds like a cute diminutive this way 😂😂
I_will_delete_myself t1_jczvx4j wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
That, or just use the cloud until Nvidia releases a 48GB GPU (which will happen sooner than one would think; games are getting limited by VRAM).
currentscurrents t1_jczuqo8 wrote
Reply to comment by gybemeister in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Just price. They have the same amount of VRAM. The 4090 is faster of course.
gybemeister t1_jczucbf wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Any reason, beside price, to buy 3090s instead of 4090s?
phira t1_jczsx36 wrote
“How many months are there in the year?” “There are 365 days in the year”
Got a ways to go I guess, nice to see this stuff moving tho, I remember writing my first chat bot in 1999 and even the worst of the current models are brilliant in comparison
timedacorn369 t1_jczscaf wrote
It's mentioned as open source, so that means I can get the model weights and run them locally if I want to, right?
Civil_Collection7267 t1_jczrmem wrote
Reply to comment by tungns91 in [D] Best ChatBot that can be run locally? by rustymonster2000
Tom's Hardware has an article on that: https://www.tomshardware.com/news/running-your-own-chatbot-on-a-single-gpu
ericflo t1_jczqkmj wrote
Reply to comment by kross00 in [D] Best ChatBot that can be run locally? by rustymonster2000
LoRA is how you train LLaMA into Alpaca on consumer hardware.
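A minimal sketch of the idea with Hugging Face's peft library; the model path and hyperparameters here are illustrative, not the exact Alpaca recipe:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder path; point this at your local LLaMA weights in HF format.
model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Train as usual; only the small adapter matrices get gradients, which is
# what makes fine-tuning fit on a single consumer GPU.
```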
2muchnet42day t1_jczooi6 wrote
Reply to comment by UnusualClimberBear in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Yeah, I wouldn't buy AMD either. It's a shame that NVIDIA is basically a monopoly in AI, but it is what it is.
lucidraisin t1_jczoelv wrote
Reply to comment by Unlucky_Excitement_2 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email
currentscurrents t1_jczods2 wrote
Reply to comment by UnusualClimberBear in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
I'm hoping that non-von-Neumann chips will scale up in the next few years. There are some you can buy today, but they're small:
>NDP200 is designed to natively run deep neural networks (DNN) on a variety of architectures, such as CNN, RNN, and fully connected networks, and it performs vision processing with highly accurate inference at under 1mW.
>Up to 896k neural parameters in 8bit mode, 1.6M parameters in 4bit mode, and 7M+ in 1bit mode
An Arduino idles at about 10 mW, for comparison.
The idea is that if you're not shuffling the entire network weights across the memory bus every inference cycle, you save ludicrous amounts of time and energy. Someday, we'll use this kind of tech to run LLMs on our phones.
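A rough sense of scale, using ballpark energy figures from the hardware literature rather than measurements:

```python
# Ballpark energy spent streaming weights from DRAM for every token.
model_bytes = 7e9 * 2         # a 7B-parameter model at fp16
dram_pj_per_byte = 20         # rough literature figure for off-chip DRAM access
joules_per_token = model_bytes * dram_pj_per_byte * 1e-12
print(f"~{joules_per_token:.2f} J per token just moving weights")  # ~0.28 J
# On-chip memory costs orders of magnitude less per byte, hence the appeal
# of designs that keep weights next to the compute.
```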
Unlucky_Excitement_2 t1_jczo8wf wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Those are actually super compelling problems. I'll keep an eye out. Again thank you, you contribute so much.
lucidraisin t1_jcznnvh wrote
Reply to comment by Unlucky_Excitement_2 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
actually, i'm keeping an eye on Hyena! there are however a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.
Board_Stock t1_jczly8z wrote
Hello, I've recently run alpaca.cpp on my laptop, but I want to give it a context window so that it can remember conversations, and to make it voice-activated using Python. Can someone guide me on this?
UnusualClimberBear t1_jczl0bn wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Better to light a candle than to buy an AMD graphics card for anything close to cutting edge.
Xotchkass t1_jczkfku wrote
Reply to [D] Simple Questions Thread by AutoModerator
What is the input length of the LLaMA model? I can't find it anywhere.
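If there's an HF-format checkpoint around, I'd guess you can read it off the config (path here is just a placeholder):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("path/to/llama-hf")  # placeholder path
print(cfg.max_position_embeddings)  # 2048 for the original LLaMA
```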
currentscurrents t1_jczkbue wrote
Reply to comment by 2muchnet42day in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Honestly, they already cost more than I can afford to spend on a side project.
I'm just gonna have to wait and hope that AMD gets their act together on AI support.
Joel_Duncan t1_jczkbc0 wrote
I keep seeing this getting asked like people are expecting a magic bullet solution.
In general you can only get out something within the realm of what you put in.
There are intelligent ways to structure training and models, but you can't fill in expected gaps without training with a reference or a close approximation of what those gaps are.
My best suggestion is to limit your input data or muxed model to specific high-resolution subsets.
E.g., you can train a LoRA on a small, focused subset of data.
Unlucky_Excitement_2 t1_jczk2lm wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Since you're the OG with this, can I pick your brain? Do you not see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than flash attention. I notice on GitHub that you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?
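For reference, the PyTorch 2.0 native path being discussed is roughly this (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim); fp16 on CUDA is required for the flash kernel
q = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict dispatch to the fused flash-attention kernel.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```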
currentscurrents t1_jczjxbb wrote
Reply to comment by londons_explorer in [P] TherapistGPT by SmackMyPitchHup
Data is really hard to get because of privacy regulations too.
There are millions of brain MRI scans sitting in hospital databases but nobody can use them without individually asking each patient. Most published datasets are only a couple dozen scans, and plenty are N=1.
currentscurrents t1_jd007aa wrote
Reply to comment by satireplusplus in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Right. And even once you have enough VRAM, memory bandwidth limits the speed more than tensor core bandwidth.
They could pack more tensor cores in there if they wanted to, they just wouldn't be able to fill them with data fast enough.
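Quick arithmetic on why, with approximate numbers:

```python
# Token-rate ceiling when decoding is memory-bandwidth bound:
# each generated token streams the full weights through the memory bus once.
bandwidth_bytes_s = 936e9   # RTX 3090-class GDDR6X, ~936 GB/s
model_bytes = 30e9 / 2      # 30B parameters at 4-bit
print(f"~{bandwidth_bytes_s / model_bytes:.0f} tokens/s upper bound")  # ~62
```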