Recent comments in /f/MachineLearning

ajt9000 t1_jd5w735 wrote

Speaking of this, do you guys know of ways to run inference and/or train models on graphics cards with insufficient VRAM? I have had some success with breaking a model up into multiple smaller models and then running inference on them as a boosted ensemble, but that's obviously not possible with lots of architectures.

I'm just wondering if you can do that with an unfavorable architecture as long as it's pretrained.
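For concreteness, one related workaround is letting a framework spill layers to CPU/disk instead of splitting the model by hand. A minimal sketch assuming Hugging Face transformers + accelerate offloading (the model name is just a placeholder, and I haven't profiled this):

```python
# Minimal sketch: offload layers that don't fit in VRAM to CPU/disk using
# Hugging Face transformers + accelerate (pip install transformers accelerate).
# "gpt2-large" is only a placeholder; swap in whatever checkpoint you mean.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2-large",
    device_map="auto",         # put what fits on the GPU, spill the rest to CPU
    offload_folder="offload",  # optional disk offload for any overflow
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```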

1

itsnotlupus t1_jd5td54 wrote

It's not a user-facing product, it's a building block that would be useful for training music-oriented neural networks, be they diffusion models or other types of models.

It's probably going to take a little while before we see new models that leverage this library.

If you're looking for "stable diffusion but for music" right now, you could look at Riffusion (https://huggingface.co/riffusion/riffusion-model-v1)
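For what it's worth, that checkpoint loads like any other Stable Diffusion model. A rough sketch with the diffusers library (the prompt is arbitrary, and the output is a spectrogram image you would still need to convert back to audio yourself):

```python
# Rough sketch: Riffusion is a fine-tuned Stable Diffusion model, so it loads
# with the standard diffusers pipeline (pip install diffusers transformers).
# The output is a spectrogram image; converting it to audio is a separate step.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "riffusion/riffusion-model-v1", torch_dtype=torch.float16
).to("cuda")

spectrogram = pipe("funky jazz guitar loop").images[0]
spectrogram.save("spectrogram.png")
```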

2

neriticzone t1_jd5se2v wrote

Feedback on stratified k-fold cross-validation

I am doing some applied work with CNNs in the academic world.

I have a relatively small dataset.

I am doing 10-fold stratified cross-validation(?): I do an initial train-test split, and then the data in the train split is further cross-validated with a 10-fold train-validate split.

I then run the ensemble of 10 trained models against the test split, and I select the results from the model that performs best on the test data as the predicted values for the test data.

Is this a reasonable strategy? Thank you!
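To make the setup concrete, here is a minimal scikit-learn sketch of the split described above (a generic classifier and synthetic data stand in for the CNN and the real dataset):

```python
# Sketch of the described setup: hold-out test split, then 10-fold stratified CV
# on the train portion; LogisticRegression stands in for the CNN.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=500, n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models, scores = [], []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X_train, y_train):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[train_idx], y_train[train_idx])
    models.append(clf)
    scores.append(clf.score(X_test, y_test))  # each fold's model evaluated on the held-out test set

best = models[int(np.argmax(scores))]
y_pred = best.predict(X_test)  # best-performing fold model gives the final predictions
```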

1

asterisk2a t1_jd59igg wrote

Question about ML research breakthroughs and narratives.

AlexNet was not the first CNN to use Nvidia GPU CUDA cores for acceleration, not the fastest, and not the one that won the most prizes. So why is it so often named as the 'it' paper in the narrative around AI in the mainstream media and on AI YouTube channels? Even Jensen Huang, CEO of Nvidia, mentioned it in his keynote.

Is it because AlexNet can be traced back to 'Made in America' and was sold to Google? And because a co-author is now Chief Scientist at OpenAI, while the others aren't?

2

KerfuffleV2 t1_jd52brx wrote

> there's a number of efforts like llama.cpp/alpaca.cpp or openassistant but the problem is that fundamentally these things require a lot of compute, which you really cant step around.

It's honestly less than you'd expect. I have a Ryzen 5 1600 which I bought about 5 years ago for $200 (it's $79 now). I can run llama 7B on the CPU and it generates about 3 tokens/sec. That's close to what ChatGPT can do when it's fairly busy. Of course, llama 7B is no ChatGPT but still. This system has 32GB RAM (also pretty cheap) and I can run llama 30B as well, although it takes a second or so per token.

So you can't really chat in real time, but you can set it to generate something and come back later.

The 3- or 2-bit quantized versions of 65B or larger models would actually fit in memory. Of course, they would be even slower to run, but honestly, it's amazing it's possible to run them at all on 5-year-old hardware that wasn't cutting edge even back then.
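For reference, a minimal sketch of that kind of CPU-only setup, assuming the llama-cpp-python bindings rather than the raw llama.cpp binary (the model path is just a placeholder for whatever quantized file you converted):

```python
# Minimal sketch: CPU-only inference on a 4-bit quantized llama model via the
# llama-cpp-python bindings (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/ggml-model-q4_0.bin",  # placeholder path
    n_threads=6,   # roughly match your physical core count
    n_ctx=512,
)

out = llm("Q: What is the capital of France? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```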

19

not_particulary t1_jd51f0h wrote

There's a lot coming up. I'm looking into it right now; here's a tutorial I found:

https://medium.com/@martin-thissen/llama-alpaca-chatgpt-on-your-local-computer-tutorial-17adda704c23


Here's something unique, where a smaller LLM outperforms GPT-3.5 on specific tasks. It's multimodal and based on T5, which is much more runnable on consumer hardware.

https://arxiv.org/abs/2302.00923
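As a rough illustration of how runnable T5-sized models are locally, a stock t5-base checkpoint (not the paper's released weights) loads and runs on CPU with transformers:

```python
# Rough illustration only: a generic T5 checkpoint (~220M params for t5-base)
# runs fine on CPU, which is why T5-sized models are practical on consumer hardware.
# This is NOT the multimodal model from the linked paper, just stock t5-base.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

inputs = tokenizer("translate English to German: The house is small.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0], skip_special_tokens=True))
```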

28

Gody_ t1_jd4ak8v wrote

Hello guys, would you consider this supervised or unsupervised learning?

I am using a Keras LSTM to generate new text: I tokenize the text, make n-grams from it, and train the LSTM to predict the next word (token) by using the first n-1 tokens of each n-gram as the training sample and the last word (token) of the n-gram as the "label". Would you consider this supervised or unsupervised ML?

Technically, I do have a label for every n-gram (its own last word), but the dataset itself was not labeled beforehand. As I am new to ML I am a little bit confused, and even ChatGPT sometimes says that it's supervised and sometimes unsupervised ML.
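To make the setup concrete, a minimal Keras sketch of that pipeline (tokenize, build n-gram prefixes, use the last token as the label); the toy corpus and layer sizes are just placeholders:

```python
# Minimal sketch of the described setup: tokenize text, build n-gram prefixes,
# and train an LSTM to predict the last token of each n-gram.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

corpus = ["the cat sat on the mat", "the dog sat on the log"]  # placeholder corpus

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1

# Every prefix of each sentence becomes one training n-gram.
sequences = []
for line in corpus:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(2, len(tokens) + 1):
        sequences.append(tokens[:i])

max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding="pre")
X, y = sequences[:, :-1], sequences[:, -1]  # first n-1 tokens -> last token as the "label"

model = Sequential([
    Embedding(vocab_size, 32),
    LSTM(64),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=50, verbose=0)
```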

Thanks for any answers.

0