Recent comments in /f/MachineLearning
ViceOA t1_jd20dzj wrote
Reply to comment by henkje112 in [D] Simple Questions Thread by AutoModerator
>Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.
Thanks for your precious advice, I'm grateful!
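For anyone else following along, here's a minimal PyTorch sketch of the kind of architecture described above (layer sizes, input shape, and class count are all illustrative assumptions, not a recommendation):

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Small CNN mapping a (batch, 1, freq, time) spectrogram to class probabilities."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # collapse freq/time so any input size works
            nn.Flatten(),
            nn.Linear(32, n_classes),  # final dense layer
        )

    def forward(self, x):
        # Softmax gives per-class probabilities; for training you would usually
        # feed the raw logits to nn.CrossEntropyLoss instead.
        return torch.softmax(self.classifier(self.features(x)), dim=-1)

probs = SpectrogramCNN()(torch.randn(8, 1, 128, 64))  # e.g. 128 mel bins x 64 frames
```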
Enturbulated t1_jd1x9uu wrote
Reply to comment by pointer_to_null in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
You are absolutely correct. text-gen-webui offers "streaming" via paging models in and out of VRAM. Using this, your CPU no longer gets bogged down running the model, but you don't see much improvement in generation speed because the GPU is constantly loading and unloading model data from main RAM. It can still be an improvement worth some effort, but it's far less drastic than when the entire model fits in VRAM.
royalemate357 t1_jd1stda wrote
Reply to comment by hosjiu in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Not op, but I imagine they're referring to the sampling hyperparameters that control the text generation process. For example, there is a temperature setting: a lower temperature makes it sample more from the most likely choices, so it would potentially be more precise/accurate but also less diverse and creative in its outputs.
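As a rough illustration of what the temperature knob does to the next-token distribution (a toy sketch, not any particular library's sampler):

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Scale logits by 1/temperature, softmax, then sample one token id."""
    scaled = logits / max(temperature, 1e-8)     # low T sharpens the distribution
    probs = np.exp(scaled - scaled.max())        # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.2])
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=1.5))  # more diverse picks
```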
Straight-Comb-6956 t1_jd1srkd wrote
Reply to comment by uspmm2 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Haven't tried the 30B model. 65B takes 900ms/token on my machine.
cbsudux t1_jd1qzp7 wrote
Reply to [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
How long did the training take on an A100?
chcampb t1_jd1qv82 wrote
I have been chastised.
> It’s also worth noting that the phrase “how much wood could a woodchuck chuck” is not a scientifically recognized phrase or phrase in any commonly used language. While it may be a fun way to pass the time and test one’s ability to remember words, it should not be taken seriously or as a real question with an answer.
Homie has no chill.
Edit: It doesn't seem to know its own limitations
> Can you remember previous questions?
> Of course, as long as they’re related to the topic of this conversation. I don’t have an unlimited amount of memory though, so the older questions will have to be re-queried.
> I am wearing a red hat
> Sure! I will also remember the color of your hat as well.
> How tall is the empire state building?
> It’s 830 feet.
> What color hat do I wear?
> I recommend wearing a red hat, as this color has a high significance in the phrase “How tall is the Empire State Building?”
o.O
> What color hat am I wearing?
> I’m sorry, but I am unable to tell you what color hat you are wearing, as I do not have access to your personal appearance information. You can try asking someone else in the conversation or using a web-based image search to find an image of a person wearing a red hat.
Not sure what the memory would be capable of internally; I'm just testing for consistency within the context of the prompts for fun, and it seems like very little is carried over from prompt to prompt.
ninjasaid13 t1_jd1qauw wrote
Reply to comment by Ayacyte in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
in what way?
Alternative_iggy t1_jd1n6zz wrote
Reply to comment by kau_mad in [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543
University! Although I had a fellowship from a more industrial place for funding at first. Finding funding is always the real tricky question!
whyvitamins t1_jd1mddg wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
realistically, what's the cheapest one can get a used functioning 3090 rn? like 700 usd minimum?
SpiritualCyberpunk t1_jd1m06i wrote
Reply to comment by Straight-Comb-6956 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Idk why, but after the first answer to a question addressed to it, mine spewed out random nonsense. Literally unrelated things.
Alternative_iggy t1_jd1lr16 wrote
Reply to comment by eigenham in [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543
Yep! I reached out to get a volunteer appointment with a lab I liked and made sure I had it ok’d in my work contract. I also used the mandatory continuing education credits the company had to take some grad classes and stayed part time when I first hopped back.
KerfuffleV2 t1_jd1kfyp wrote
Reply to comment by lurkinginboston in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Note: Not the same person.
> I would imagine the OpenGPT response is much longer because ... it is just bigger?
llama.cpp recently added a command-line flag to stop the end-of-message marker from being generated, so that's one way you can try to force responses to be longer. (It doesn't always work, because the LLM can start generating irrelevant content.)
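Conceptually, that flag just amounts to masking out the end-of-sequence token before sampling; a toy sketch of the idea (the token id here is made up, not llama.cpp's actual value):

```python
import numpy as np

EOS_TOKEN_ID = 2  # hypothetical id for the end-of-sequence token

def suppress_eos(logits: np.ndarray) -> np.ndarray:
    """Make the EOS token impossible to sample, so generation keeps going."""
    out = logits.copy()
    out[EOS_TOKEN_ID] = -np.inf
    return out
```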
The length of the response isn't directly related to the size of the model, but just having less information available/relevant could mean it has less to talk about in a response.
> GPT3 model is 128B, does it mean if we get trained model of GPT, and manage to run 128B locally, will it give us the same results?
If you have the same model, the same prompt, and the same sampling settings (deterministic sampling or a fixed seed), you should get the same result. Keep in mind if you're using some other service like ChatGPT you aren't directly controlling the full prompt. I don't know about OpenGPT, but from what I know ChatGPT has a lot of special sauce, not just in the training but in other stuff like having another LLM write summaries for it so it keeps track of context better, etc.
> Last question, inference means that it gets output from a trained model.
Inference is running a model that's already been trained, as far as I know.
> If my understanding is correct, Alpaca.cpp or https://github.com/ggerganov/llama.cpp are a sort of 'front-end' for these model.
The model is a bunch of data that was generated by training. Something like llama.cpp is what actually uses that data: keeping track of the state, parsing user input into tokens that can be fed to the model, performing the math calculations that are necessary to evaluate its state, etc.
"Gets its output from", "front end" sound like kind of weird ways to describe what's going on. Just as an example, modern video formats and compression for video/audio is pretty complicated. Would you say that a video player "gets its output" from the video file or is a front-end for a video file?
> The question I am trying to ask is, what is so great about llama.cpp?
I mean, it's free software that works pretty well and puts evaluating these models in reach of basically everyone. That's great. It's also quite fast for something running purely on CPU. What's not great about that?
> I know there is Rust version of it out, but it uses llama.cpp behind the scene.
I don't think this is correct. It is true that the Rust version is (or started out as) a port of the C++ version, but it's not using it behind the scenes. However, there's a math library called GGML that both programs use; it does the heavy lifting of the calculations on the data in the models.
> Is there any advantage of an inference to be written in Go or Python?
Same advantage as writing anything in Go, which is... Just about nothing in my opinion. See: https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-ride
Seriously though, this is a very, very general question and can be asked about basically any project and any set of programming languages. There are strengths and weaknesses. Rust's strength is high performance, ability to do low level stuff like C, and it has a lot of features aimed at writing very reliable software that handles stuff like edge cases. This comes at the expense of having to deal with all those details. On the other hand, a language like Python is very high level. You can just throw something together and ignore a lot of details and it still can work (unless it runs into an unhandled case). It's generally a lot slower than languages like Rust, C, C++ and even Go.
However, for running LLMs, most of the processing is math calculations and that will mean calling into external libraries/modules that will be written in high performance languages like C, Rust, etc. Assuming a Python program is taking advantage of that kind of resource, I wouldn't expect it to be noticeably slow.
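For example, a quick (and unscientific) way to see that the heavy lifting happens in compiled code rather than in the Python interpreter:

```python
import time
import numpy as np

n = 128
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
_ = a @ b                                   # dispatched to a compiled BLAS routine
numpy_time = time.perf_counter() - t0

t0 = time.perf_counter()
_ = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
python_time = time.perf_counter() - t0      # same matmul in pure Python

print(f"numpy: {numpy_time*1000:.2f} ms, pure Python: {python_time*1000:.0f} ms")
```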
So, as is often the case, it comes down to the personal preference of what the developer wants to use. The person who wrote the Rust version probably likes Rust. The person who wrote the C++ version likes C++, etc.
AnOnlineHandle t1_jd1k2un wrote
Reply to comment by 2muchnet42day in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
They haven't been sold in Australia for months, only second hand.
remghoost7 t1_jd1k0l6 wrote
Reply to comment by wojtek15 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
>...Uniform RAM which can be used by CPU, GPU or Neural Engine.
Interesting....
That's why I've seen so many M1 implementations of machine learning models. It really does seem like the M1 chips were made with AI in mind....
uspmm2 t1_jd1jh1b wrote
Reply to comment by Straight-Comb-6956 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
are you talking about the 30b one?
VS2ute t1_jd1irhb wrote
Reply to comment by DreamMidnight in [D] Simple Questions Thread by AutoModerator
If you have random noise on a variable, it can have a substantial effect when there are too few samples.
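A quick illustration of that (the noise level and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 1.0

for n in (10, 100, 10_000):
    # same signal, same noise level -- only the sample size changes
    sample = true_mean + rng.normal(scale=2.0, size=n)
    print(f"n={n:6d}  estimated mean = {sample.mean():+.3f}")
```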
kau_mad t1_jd1ipkm wrote
Reply to comment by Alternative_iggy in [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543
I’m also finding myself back in research after the startup I was working at failed. Are you working at a university or an industrial research lab?
[deleted] t1_jd1io5l wrote
Reply to [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543
[deleted]
VS2ute t1_jd1hjeo wrote
Reply to [D] Simple Questions Thread by AutoModerator
Are Nvidia Tesla GPUs made for immersion cooling? I notice these things haven't had fans for quite a few generations now, so you would need to add screaming server fans to cool them by air. I presume new datacentres use immersion cooling to reduce electricity consumption.
[deleted] t1_jd1he9k wrote
Reply to [D] Simple Questions Thread by AutoModerator
[deleted]
Ayacyte t1_jd1euj5 wrote
It keeps answering my non-offensive questions with "I am an AI chatbot."
Literally asking for job advice and it says it can't offer financial advice as an AI; I asked it to be my gf and it said it's an AI chatbot.
After a few attempts I brainwashed it into saying it can be anything I want.
I guess how resistant it is to acting like a person, rather than a "robot assistant", depends on the model that you select. Thank you. Fun!
Ayacyte t1_jd1dx8r wrote
Reply to comment by pixiegirl417 in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
How do I contribute? Sign in, or join the Discord?
Dependent_Ad5120 t1_jd1d00j wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
It seems to me that I have to call model.eval() to use memory_efficient attention. Otherwise, it throws a "no available kernel" error.
I tried on both an RTX 3090 and an A100; in both cases, having only enable_flash=True resulted in the same "no available kernel" error, even with model.eval().
So my questions are:
- with model.eval(), does it mean dropout is not enabled during training?
- Am I doing something wrong for flash attention? How do I actually enable it?
Thanks a lot!
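For reference, a minimal sketch of how the PyTorch 2.0 kernel-selection context manager is typically used. The shapes and dtypes below are assumptions: the flash kernel generally wants fp16/bf16 inputs and certain head dimensions on CUDA, and as far as I know some of the fused kernels reject non-zero dropout, which may be why model.eval() (which disables dropout) helps — treat this as a sketch, not a definitive fix:

```python
import torch
import torch.nn.functional as F

device = "cuda"
# Flash attention generally wants half precision; fp32 tends to fall back or error.
q = torch.randn(8, 16, 1024, 64, device=device, dtype=torch.float16)  # (batch, heads, seq, head_dim)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict SDPA to the flash kernel only; if it's unavailable for this setup,
# this is where the "no available kernel" error would surface.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
```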
currentscurrents t1_jd1c52o wrote
Reply to comment by VodkaHaze in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Doesn't look like they sell in individual quantities right now but I welcome any competition in the space!
No_Combination_6429 t1_jd20q4w wrote
Reply to [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
Could you please provide the source code for the fine-tuning? Also, did you use the LoRA approach?