Recent comments in /f/MachineLearning
Eggy-Toast t1_jd18vf2 wrote
Reply to [P] TherapistGPT by SmackMyPitchHup
Probably good to display some sort of prominent disclaimer: "TherapistGPT and its creators do not practice medicine; TherapistGPT is not a substitute for actual therapeutic care administered by professionals," etc.
[deleted] t1_jd1758w wrote
Reply to comment by Radiant_Rhino in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
[removed]
The_frozen_one t1_jd125zf wrote
Reply to comment by mycall in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Not sure I understand. Is it better? Depends on what you're trying to do. I can say that alpaca-7B and alpaca-13B operate as better and more consistent chatbots than llama-7B and llama-13B. That's what standard alpaca has been fine-tuned to do.
Is it bigger? No, alpaca-7B and 13B are the same size as llama-7B and 13B.
VodkaHaze t1_jd11vhm wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
There are also the Tenstorrent chips coming out to the public, which are vastly more efficient than Nvidia's hardware.
currentscurrents t1_jd10ab5 wrote
Reply to comment by pier4r in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Llama.cpp uses the Neural Engine, and so does Stable Diffusion. And the speed is not that far off from VRAM, actually.
> Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.
By comparison, the Nvidia 4090 is clocking in at ~1000GB/s
Apple is clearly positioning their devices for AI.
lurkinginboston t1_jd0zr7c wrote
Reply to comment by Straight-Comb-6956 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
I will assume you are much more knowledgeable than I am in this space. I have a few basic questions that have been bothering me since all the craze around GPT and LLMs started recently.
I managed to get Alpaca working on my end using the above link and got very good results. LLaMA's biggest takeaway was that it can reproduce quality comparable to GPT at a much lower compute size. If this is the case, why is the output much shorter on LLaMA than what I get from ChatGPT? I would imagine the ChatGPT response is much longer because ... it is just bigger? What is the limiting factor preventing us from getting longer generated responses comparable to GPT?
ggml-alpaca-7b-q4.bin is only 4 gigabytes - I guess that's what is meant by 4-bit and 7 billion parameters. Not sure if it's rumor or fact that the GPT-3 model is 128B; does it mean that if we got the trained GPT model and managed to run 128B locally, it would give us the same results? Would it be possible to retrofit the GPT model into Alpaca.cpp with minor enhancements to get output JUST like ChatGPT? I have read that to fit 128B it requires multiple Nvidia A100s.
Last question: inference means getting output from a trained model. Meta/OpenAI/Stability.ai have the resources to train a model. If my understanding is correct, Alpaca.cpp or https://github.com/ggerganov/llama.cpp are a sort of 'front-end' for these models: they let us provide an input and get an output by running inference with the model. The question I am trying to ask is, what is so great about llama.cpp? Is it because it's in C? I know there is a Rust version of it out, but it uses llama.cpp behind the scenes. Is there any advantage to inference being written in Go or Python?
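A quick back-of-the-envelope sketch of the size math in the question above: at 4 bits per weight a 7B model lands near 4 GB, while a GPT-3-scale model (the commonly published figure for Davinci is 175B parameters, not 128B) is far larger, which is why it calls for multiple A100s or a lot of system RAM. This is a rough estimate that ignores file-format overhead and any layers kept at higher precision:

```python
# Rough parameter-count-to-file-size estimate; real quantized files add some
# overhead (scales, higher-precision layers), so treat these as lower bounds.
def approx_size_gb(n_params: float, bits_per_weight: int) -> float:
    return n_params * bits_per_weight / 8 / 1e9

print(approx_size_gb(7e9, 4))     # ~3.5 GB, in line with the ~4 GB ggml-alpaca-7b-q4.bin
print(approx_size_gb(13e9, 4))    # ~6.5 GB
print(approx_size_gb(175e9, 4))   # ~87.5 GB, GPT-3 Davinci scale even at 4-bit
print(approx_size_gb(175e9, 16))  # ~350 GB at fp16, hence "multiple Nvidia A100s"
```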
KingsmanVince t1_jd0z599 wrote
Reply to comment by tdgros in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
Technically the truth?
killerstorm t1_jd0yzdj wrote
Reply to [D] Simple Questions Thread by AutoModerator
Have people tried doing "textual inversion" for language models? (i.e. not in the context of Stable Diffusion)
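One way to read "textual inversion" for a plain language model is soft-prompt/embedding inversion: freeze the LM and learn one (or a few) new input embeddings by gradient descent, analogous to learning a pseudo-token in Stable Diffusion. A minimal sketch, assuming a Hugging Face causal LM; the model name, training text, and step count are placeholders, not a tested recipe:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with accessible input embeddings
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)  # the LM itself stays frozen, as in SD textual inversion

emb = model.get_input_embeddings()                                          # (vocab_size, hidden)
soft_token = torch.nn.Parameter(emb.weight.mean(0, keepdim=True).clone())   # one learnable pseudo-token
opt = torch.optim.Adam([soft_token], lr=1e-3)

text = "text that the new pseudo-token should help the model predict"       # placeholder data
ids = tok(text, return_tensors="pt").input_ids                              # (1, seq_len)

for _ in range(200):
    token_embs = emb(ids)                                                    # (1, seq_len, hidden)
    inputs = torch.cat([soft_token.unsqueeze(0), token_embs], dim=1)         # prepend the soft token
    logits = model(inputs_embeds=inputs).logits                              # (1, seq_len + 1, vocab)
    # position i predicts token i+1; drop the last position so targets align with `ids`
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)), ids.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only the single embedding vector is optimized; the learned pseudo-token can then be prepended at inference time the same way.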
mycall t1_jd0ytah wrote
Reply to comment by The_frozen_one in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
alpaca-30B > llama-30B ?
mycall t1_jd0yi8i wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
> if you're not shuffling the entire network weights across the memory bus every inference cycle
Isn't this common though?
ertgbnm t1_jd0xwfh wrote
Reply to comment by Civil_Collection7267 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Good to hear. Thanks!
blueSGL t1_jd0uijz wrote
Reply to comment by tdgros in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
Depends on the age of the cow, I suppose.
The_frozen_one t1_jd0sqd7 wrote
Reply to comment by RoyalCities in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
You can run llama-30B on a CPU using llama.cpp, it's just slow. The alpaca models I've seen are the same size as the llama model they are trained on, so I would expect that running the alpaca-30B models will be possible on any system capable of running llama-30B.
Bloaf t1_jd0qy7z wrote
Reply to comment by RoyalCities in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
You can run it on your CPU. My old i7-6700K spits out words from the 13B model a little slower than I can read them out loud. I'll test the 30B tonight on my 5600X.
pier4r t1_jd0pf1x wrote
Reply to comment by wojtek15 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
> 128GB of unified RAM which can be used by the CPU, GPU, or Neural Engine.
But it doesn't have the same bandwidth as the VRAM on the GPU card iirc.
Otherwise every integrated GPU would be better simply due to the available RAM.
The Neural Engine on M1 and M2 is usable, IIRC, only through Apple libraries, which may not be used by notable models yet.
Civil_Collection7267 t1_jd0pcqf wrote
Reply to comment by ertgbnm in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Untuned 30B LLaMA, you're saying? It's excellent and adept at storywriting, chatting, and so on, and it can output faster than ChatGPT at 4-bit precision. While I'm not into this myself, I understand that there is a very large RP community at subs like CharacterAI and Pygmalion, and the 30B model is genuinely great for feeling like talking to a real person. I'm using it with text-generation-webui and custom parameters and not the llama.cpp implementation.
For assistant tasks, I've been using either the ChatLLaMA 13B LoRA or the Alpaca 7B LoRA, both of which are very good as well. ChatLLaMA, for instance, was able to correctly answer a reasoning question that GPT-3.5 got wrong, but it has drawbacks in other areas.
The limitations so far are that none of these models can answer programming questions competently yet, and a finetune for that will be needed. They also have the tendency to hallucinate frequently unless parameters are made more restrictive.
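For readers wondering how a LoRA like the ones mentioned here gets applied to a LLaMA base model, here is a minimal sketch using Hugging Face transformers + peft. The repo IDs are examples of publicly posted conversions/adapters (assumptions, not endorsements of specific checkpoints), and device_map="auto" requires accelerate to be installed:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_repo = "decapoda-research/llama-7b-hf"   # example HF-format LLaMA conversion (assumption)
lora_repo = "tloen/alpaca-lora-7b"            # example Alpaca LoRA adapter (assumption)

tokenizer = LlamaTokenizer.from_pretrained(base_repo)
model = LlamaForCausalLM.from_pretrained(
    base_repo, torch_dtype=torch.float16, device_map="auto"  # fp16 base spread over available devices
)
model = PeftModel.from_pretrained(model, lora_repo)           # wraps the frozen base with the low-rank adapters

prompt = "Instruction: Explain what a LoRA is in one sentence.\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")    # assumes a CUDA GPU is present
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```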
xEdwin23x t1_jd0pc6v wrote
Active learning deals with using a small subset of representative images that should perform as well as a larger set of uncurated images. You can consider looking into that.
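To make that suggestion concrete, here is a minimal uncertainty-sampling sketch (one common flavor of active learning). The predict_proba method, the unlabeled pool array, and the budget follow the scikit-learn convention as assumptions rather than any specific library:

```python
import numpy as np

def select_for_labeling(model, unlabeled_pool, budget=100):
    """Pick the `budget` most uncertain samples to send to annotators."""
    probs = model.predict_proba(unlabeled_pool)             # (n_samples, n_classes)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # predictive entropy per sample
    return np.argsort(entropy)[-budget:]                    # indices of the highest-entropy samples

# Typical loop: train on the small labeled set, select, annotate, retrain, repeat.
```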
wojtek15 t1_jd0p206 wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Hey, recently I was wondering whether Apple Silicon Macs may be the best thing for AI in the future. The most powerful Mac Studio has 128GB of unified RAM which can be used by the CPU, GPU, or Neural Engine. If only memory size is considered, even an A100, let alone any consumer-oriented card, can't match it. With this amount of memory you could run a GPT-3 Davinci-sized model in 4-bit mode.
[deleted] OP t1_jd0nazd wrote
Reply to comment by RoyalCities in [P] The next generation of Stanford Alpaca by [deleted]
[removed]
RoyalCities t1_jd0m4vt wrote
Reply to [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Thanks. So, a bit confused here. It mentions needing an A100 to train. Am I able to run this off a 3090?
oathbreakerkeeper t1_jd0lu2p wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Am I looking in the wrong place? It seems like the torch 2.0 code still requires training==False in order to use FlashAttention:
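The snippet the comment points at isn't reproduced here, but for anyone wanting to check kernel selection themselves, PyTorch 2.0 exposes scaled-dot-product attention directly, and the sdp_kernel context manager can restrict it to the flash backend so a fallback becomes an error instead of happening silently. A minimal sketch; the shapes are illustrative, and the flash kernel requires CUDA with fp16/bf16 inputs:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)

# Allow only the FlashAttention backend; if the inputs don't qualify, this errors
# rather than silently dropping to the math or memory-efficient kernels.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```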
ItsGrandPi t1_jd0l3mp wrote
Reply to [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Time to see if I can get this running on Dalai
Educational-Net303 t1_jd0k6p6 wrote
Reply to comment by 42gether in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Which takes years
42gether t1_jd0juau wrote
Reply to comment by Educational-Net303 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Niche supercar gamers kick-start the industry, which will then lead to realistic VR, which will then lead to high-quality consumer stuff?
hosjiu t1_jd1a6az wrote
Reply to comment by Civil_Collection7267 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
"They also have the tendency to hallucinate frequently unless parameters are made more restrictive."
I don't really understand this point in technical terms.
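In practical terms, "making parameters more restrictive" usually refers to the sampling settings at generation time: lower temperature, tighter top-p/top-k, and a repetition penalty make the model pick higher-probability tokens and wander less, which tends to reduce (though not eliminate) hallucination. A minimal sketch with Hugging Face generate(); the model is a placeholder and the specific values are illustrative, not recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the same generate() kwargs apply to LLaMA/Alpaca checkpoints
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tok("The capital of France is", return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.3,        # lower = sharper, more conservative token choices
    top_p=0.75,             # nucleus sampling: keep only the most probable mass
    top_k=40,               # hard cap on candidate tokens per step
    repetition_penalty=1.2, # discourages loops and copied phrases
)
print(tok.decode(output_ids[0], skip_special_tokens=True))
```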