Recent comments in /f/MachineLearning
gdpoc t1_jcperei wrote
Reply to comment by nucLeaRStarcraft in [D] Unit and Integration Testing for ML Pipelines by Fender6969
Also depends on privacy constraints; sometimes you can't persist the data.
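One common workaround is to generate synthetic data inside the test itself instead of persisting real records. A rough sketch, with a hypothetical `clean_features` step standing in for a real pipeline stage:

```python
# Sketch: test a pipeline step against synthetic data generated on the fly,
# so no real (privacy-sensitive) records ever need to be persisted as fixtures.
# `clean_features` is a hypothetical pipeline step used here for illustration.
import numpy as np
import pandas as pd


def clean_features(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a real pipeline step: drop rows with missing age, clip outliers."""
    out = df.dropna(subset=["age"]).copy()
    out["age"] = out["age"].clip(0, 120)
    return out


def test_clean_features_handles_missing_and_outliers():
    rng = np.random.default_rng(0)
    # Synthetic rows with the same schema as production data, no real users involved.
    df = pd.DataFrame({
        "age": [25, None, 200, rng.integers(18, 90)],
        "clicks": rng.integers(0, 10, size=4),
    })
    cleaned = clean_features(df)
    assert cleaned["age"].isna().sum() == 0
    assert cleaned["age"].max() <= 120
```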
WASDx t1_jcpeej0 wrote
Reply to comment by crowwork in [P] Web Stable Diffusion by crowwork
Explicit as in requiring user approval? My web development skills are really out of date; can you link to the API?
KerfuffleV2 t1_jcp7qcz wrote
Reply to comment by Art10001 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I'm not sure I fully understand it, but it seems like it basically just adds extra context to the prompt it submits with each request. For obvious reasons, the prompt can only get so big. It also requires calls to OpenAI's embedding API, which isn't free: so you're paying for both the extra prompt tokens and the embedding requests.
I can definitely see how that approach could produce better results, but it's also not really unlimited memory. Note: I skimmed the source, but I'm not really a C++ person and I didn't actually set it up to use my OpenAI account via API.
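For anyone curious, the general pattern looks roughly like this - a sketch of the idea rather than the project's actual code (the notes, model name, and key are placeholders; this uses the pre-1.0 `openai` Python client):

```python
# Sketch: embed stored notes and the incoming query with OpenAI's embedding
# endpoint, then prepend the most similar notes to the prompt.
import numpy as np
import openai

openai.api_key = "sk-..."  # placeholder

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

notes = ["User's dog is named Biscuit.", "User prefers answers in metric units."]
note_vecs = [embed(n) for n in notes]           # extra (paid) embedding calls

def build_prompt(query: str, k: int = 1) -> str:
    q = embed(query)                            # another paid call per query
    sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in note_vecs]
    top = [notes[i] for i in np.argsort(sims)[::-1][:k]]
    # The retrieved "memory" is just more tokens stuffed into the prompt,
    # so it is still bounded by the model's context window.
    return "Relevant memory:\n" + "\n".join(top) + "\n\nUser: " + query

print(build_prompt("What's my dog called?"))
```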
hiptobecubic t1_jcp6kjn wrote
Reply to comment by Individual-Sky-778 in [Discussion] Future of ML after chatGPT. by [deleted]
I mean results steadily improving, regardless of method.
satireplusplus t1_jcp6bu4 wrote
Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
This model uses a "trick" to efficiently train RNNs at scale, and I still have to take a closer look to understand how it works. Hopefully the paper is out soon!
Otherwise, size is what matters! Getting there is a combination of factors - the transformer architecture scales well and was the first architecture that made it possible to train these LLMs cranked up to enormous sizes, plus enterprise GPU hardware with lots of memory (40GB, 80GB) and frameworks like PyTorch that make parallelizing training across multiple GPUs easy.
And OP's 14B model might be "small" by today's standards, but it's still gigantic compared to a few years ago. It's ~27GB of FP16 weights.
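Rough back-of-the-envelope check on that number (not a measurement of the actual checkpoint):

```python
# Back-of-the-envelope: 14B parameters at 2 bytes each (FP16).
params = 14e9
bytes_fp16 = params * 2
print(f"{bytes_fp16 / 1e9:.1f} GB")      # ~28.0 GB (decimal)
print(f"{bytes_fp16 / 2**30:.1f} GiB")   # ~26.1 GiB, i.e. roughly the ~27GB quoted
```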
Having access to 1TB of preprocessed text data that you can download right away without doing your own crawling (The Pile) is also neat.
gwbyrd t1_jcp0czq wrote
Does anyone know how to fine-tune the LLaMA or Alpaca model on my own Facebook posts and comments so that I could create a virtual avatar of myself, and could guide me through it?
elgringo0091 t1_jcovnrt wrote
Reply to [Discussion] Future of ML after chatGPT. by [deleted]
If you are a researcher, there is a lot that can be done to advance AI algorithmically, though you'd be limited by access to data and compute.
If you are an ML engineer working in a company, your concern might become valid for a period of time. So yes, you might need to learn how to become a good prompt engineer.
But at some point, these LLM+CV models will become accessible from a data and compute perspective, and then it will be super fun. Imagine hundreds of thousands of LLMs running as agents and interacting with each other and with millions of people.
crowwork OP t1_jcourze wrote
Reply to comment by WASDx in [P] Web Stable Diffusion by crowwork
Models are stored in the browser cache via an explicit browser Cache API, so each model only has to be fetched once.
turnip_burrito t1_jcoul9i wrote
Reply to comment by MysteryInc152 in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yes, exactly. Everyone keeps leaving the architecture's inductive structural priors out of the discussion.
It's not all about data! The model matters too!
Pale-Dentist330 t1_jcoukyt wrote
Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
The model is doing a pretty good job for its size (4GB). It's terminal based, and I wanted to create a REST server around it.
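One way to do that is to wrap the binary behind a small HTTP endpoint - a rough sketch, where the `./chat` command and its flags are placeholders rather than the actual Alpaca CLI:

```python
# Minimal sketch: wrap a terminal-based model binary behind a REST endpoint.
# The binary path and its flags are hypothetical placeholders.
import subprocess

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt):
    # Each request shells out to the CLI; fine for a demo, slow for real use
    # because the model weights get reloaded on every call.
    result = subprocess.run(
        ["./chat", "--prompt", prompt.text],   # placeholder command
        capture_output=True, text=True, timeout=120,
    )
    return {"completion": result.stdout}

# Run with: uvicorn server:app --port 8000
```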
jakderrida t1_jcotnis wrote
Reply to comment by DreamMidnight in [D] Simple Questions Thread by AutoModerator
The basis of this rule of thumb is that having too few observations relative to the number of predictor variables can lead to unstable estimates of the model parameters, making it difficult to generalize to new data. In particular, if the number of observations is small relative to the number of predictor variables, the model may fit the noise in the data rather than the underlying signal, leading to overfitting.
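A quick toy illustration of that failure mode (not from the thread, just a sketch):

```python
# Sketch: with nearly as many predictors as observations, ordinary least squares
# fits the training noise almost perfectly but generalizes badly.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_train, n_test, p = 22, 1000, 20           # 22 observations, 20 predictors

X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
# True signal depends on only one predictor; the rest is noise.
y_train = X_train[:, 0] + rng.normal(size=n_train)
y_test = X_test[:, 0] + rng.normal(size=n_test)

model = LinearRegression().fit(X_train, y_train)
print("train R^2:", round(model.score(X_train, y_train), 2))  # close to 1.0 (fits noise)
print("test  R^2:", round(model.score(X_test, y_test), 2))    # much lower, often near or below 0
```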
EmmyNoetherRing t1_jcotecm wrote
Reply to comment by boss_007 in [Discussion] Future of ML after chatGPT. by [deleted]
just like cars pass horses.
fferegrino t1_jcot78v wrote
Reply to [Discussion] Future of ML after chatGPT. by [deleted]
There are still apps that need to be built on top of these APIs, and niche tasks within your business domain that a generalist GPT will not be able to cover.
EmmyNoetherRing t1_jcot5t2 wrote
Reply to comment by banatage in [Discussion] Future of ML after chatGPT. by [deleted]
Is that true? OpenAI seems to think they’ll be able to train task-specific AI on top of their existing models for specific roles.
boss_007 t1_jcosyeu wrote
Reply to [Discussion] Future of ML after chatGPT. by [deleted]
This too shall pass
hund35 t1_jcorkdy wrote
Reply to [Discussion] Future of ML after chatGPT. by [deleted]
- One of the biggest advantages of hosted models like ChatGPT is not having to own or buy millions of dollars worth of hardware to run such big models. But that also seems to be one of the biggest disadvantages for OpenAI, since judging by their Discord they seem to have downtime multiple times a week or month.
- I think it will be harder to trust them with data in the future. Keep in mind they started as a research company that was pretty transparent and have slowly turned into a full-on business that has become less transparent (I understand they need to make money, since it's presumably quite expensive to run). There's also Microsoft laying off its ethical AI team (https://techcrunch.com/2023/03/13/microsoft-lays-off-an-ethical-ai-team-as-it-doubles-down-on-openai/) while getting more involved with OpenAI. I think it could be a problem especially for people living in EU countries, since it's already illegal to store data with and use certain services from countries that aren't deemed trusted by the EU (such as America).
- In some use cases it would still be useful to have models that aren't operated server-side and can be used offline.
ab3rratic t1_jcorj10 wrote
Reply to [Discussion] Future of ML after chatGPT. by [deleted]
There is life outside of NLP and CV.
wind_dude t1_jcoqe5z wrote
Reply to comment by banatage in [Discussion] Future of ML after chatGPT. by [deleted]
Nor with statistical models. Their accuracy has generally been higher, but LLMs are catching up in key NLP domains.
tripple13 t1_jcoq61v wrote
Reply to [Discussion] Future of ML after chatGPT. by [deleted]
Did you create an account, just to ask this question?
I don't think either CV or NLP is going away. CV is yet to be solved to the same extent as NLP, but I agree it might just be a matter of time.
Research wise, there are still tons of problems around uncertainty, complexity, causality, 'real-world' problem solving (domain adaptation) and so forth.
Just don't compete on having the largest cluster of GPUs.
race2tb t1_jcops9d wrote
Reply to [P] Web Stable Diffusion by crowwork
I think a p2p torrent backend for models needs to exist now.
fullstackai t1_jcopazt wrote
Reply to comment by banatage in [Discussion] Future of ML after chatGPT. by [deleted]
100% agree. Also, any AI that requires sensor data (e.g., in manufacturing) cannot easily be replaced by foundation models.
bogdantudorache OP t1_jcooz41 wrote
Reply to comment by ironmagnesiumzinc in ML models for User Recognition using Keystroke Dynamics [P] by bogdantudorache
Thanks for reading!
jakderrida t1_jcoobvg wrote
Reply to comment by banatage in [Discussion] Future of ML after chatGPT. by [deleted]
Nor with humans.
banatage t1_jcoo9ek wrote
Reply to comment by Individual-Sky-778 in [Discussion] Future of ML after chatGPT. by [deleted]
Factuality is not a guarantee either with LLMs...
NoRip7374 t1_jcpex1s wrote
Reply to comment by jakderrida in [Discussion] Future of ML after chatGPT. by [deleted]
Nor with self hosted models...