Recent comments in /f/MachineLearning

jan_antu t1_je9me91 wrote

I read a few of your posts. It seems like you're having a break from reality. I'm a scientist but not a psychologist; I think you should speak with one, or a psychiatrist. Things may be fine for now but you don't want to end up hurting yourself or someone else by accident as this progresses.

16

ZestyData t1_je9ly2p wrote

... Uh. I'm going to assume you're relatively new to the world of ML. Translation is one of the most common uses for SOTA LLMs.

It's how Google Translate works, to name just the most famous example.

What the SOTA translation tools don't yet use is instruct-tuning to give them conversational interfaces (i.e., the difference between GPT and ChatGPT), so they look different from ChatGPT. But it's largely the same Generative (technically Pretrained) Transformers under the hood.
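To make that concrete, here's a minimal sketch of transformer-based translation using the Hugging Face transformers library. The checkpoint name is just one example from the Hub; this is illustrative, not how Google Translate is actually deployed.

# Minimal sketch: translation with an off-the-shelf seq2seq transformer.
# "Helsinki-NLP/opus-mt-en-de" is just one example checkpoint; any
# translation model on the Hugging Face Hub works the same way.
from transformers import pipeline

translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
result = translator("Machine translation is one of the oldest applications of transformers.")
print(result[0]["translation_text"])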

30

RicketyCricket t1_je9lp7z wrote

Mainly that Spock is much lighter weight and really focuses on just configuration management and statefulness. Hydra has all these crazy bells and whistles (Ray integration, etc.) that could be useful for certain things but kinda starts meandering from the original purpose of configuration management, imo. Hydra is great, and if it works for you then use it. We built Spock internally when I was at Fidelity because Hydra didn't exist; it just so happens that FB/Meta was doing the same thing at the same time, so both libraries ended up covering a very similar usage space.
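For anyone who hasn't used either: a minimal Hydra setup looks roughly like the sketch below (the file layout and field names are placeholders, not taken from either project's docs). Spock covers similar ground with typed, decorated config classes, just with a smaller surface area.

# Minimal Hydra sketch; assumes a conf/config.yaml next to the script, e.g.:
#   lr: 0.001
#   batch_size: 32
import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Values come from the YAML and can be overridden on the CLI,
    # e.g. `python train.py lr=0.01`.
    print(cfg.lr, cfg.batch_size)

if __name__ == "__main__":
    main()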

1

EquipmentStandard892 t1_je9kmvi wrote

This is exactly what I was talking about. I'm studying llama.cpp to understand how this whole ML/LLM world works, and I've found it's pretty "simple" in terms of the programming itself. I'm a software engineer outside the ML field, and it was a pretty interesting deep dive. I'll take a deeper look into this RWKV proposal and maybe build something on top of it to test. If I find something interesting I'll comment here 😊

3

saintshing t1_je9iw85 wrote

Jeremy Howard tweeted about this new model, which is an RNN but can be trained in parallel. I haven't read the details, but it seems people are hyped that it can bypass the context length limit.

>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
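To illustrate what the "RNN mode" buys you at inference time, here's a toy recurrence. The update rule below is a stand-in, not RWKV's actual time-mix/channel-mix math: the point is just that each step consumes only the previous hidden state, so memory stays constant no matter how long the sequence gets.

# Toy recurrent update: shows why RNN-style inference only needs the state
# at position t to compute the state at t+1, instead of attending over the
# whole context. NOT RWKV's real equations, just the shape of the idea.
import numpy as np

d_model = 8
W_in = np.random.randn(d_model, d_model) * 0.1
W_state = np.random.randn(d_model, d_model) * 0.1

def step(state, token_embedding):
    return np.tanh(W_in @ token_embedding + W_state @ state)

state = np.zeros(d_model)
for token_embedding in np.random.randn(1000, d_model):  # arbitrarily long input
    state = step(state, token_embedding)  # memory stays O(d_model) per step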

4

icedrift t1_je9i0wk wrote

What I mean is, why generate the function when only the data needs to be generated? Let's say I need a function that takes the text content of a post and returns an array of recommended flairs for the user to click. Why do this

/**
 * This function takes a passage of text, and recommends up to 8
 * unique flairs for a user to select. Flairs can be thought of as labels
 * that categorize the type of post.
 *
 * @param textContent - the text content of a user's post
 *
 * @returns an array of flairs represented as strings
 *
 * @imaginary
 */

declare function recommendedFlairs(textContent: string): Promise<string[]>

When you could write out the function and only generate the data?

async function recommendedFlairs(textContent: string): Promise<string[]> {
  const OAIrequest = await someRequest(textContent);
  const flairs = formatResponse(OAIrequest);
  return flairs;
}

In writing all this out I think I figured it out. You're abstracting away a lot of the headaches that come with trying to get the correct outputs out of GPT?

2

EquipmentStandard892 t1_je9gc9y wrote

I've already seen langchain and it's truly amazing. The issue I've encountered and was trying to overcome is more of an architectural problem, actually: the token context span limit. I was looking to add a layer on top of the transformer architecture to bypass this limitation. I've seen that MRKL is able to handle higher context lengths, even claiming unlimited context span, although I need to study it more. I was not thinking about prompt engineering at all.
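Just to make the constraint concrete, a tiny sketch (tiktoken is used only as a convenient tokenizer example, and the window size is made up): anything past the model's context window simply can't be attended to, so long inputs have to be chunked, summarized, or routed, which is the kind of orchestration MRKL-style systems and langchain handle.

# Illustration of the context-span limit; numbers and tokenizer are examples.
import tiktoken

CONTEXT_LIMIT = 2048  # illustrative window size
enc = tiktoken.get_encoding("cl100k_base")

text = "some very long document " * 2000
tokens = enc.encode(text)
print(f"{len(tokens)} tokens; over the window by {max(0, len(tokens) - CONTEXT_LIMIT)}")

# Naive workaround: split into windows that each fit the model's context.
chunks = [tokens[i:i + CONTEXT_LIMIT] for i in range(0, len(tokens), CONTEXT_LIMIT)]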

7