Recent comments in /f/MachineLearning

JaCraig t1_jckmll4 wrote

My point is more that it's the wrong tool for the job. Something designed for calculations, like Wolfram Alpha and its API, is probably better suited:

https://www.wolframalpha.com/input?i=%28x%5E3%29%2B%28y%5E3%29%2B%28z%5E3%29+%3D+1024

BUT I did ask ChatGPT (so 3.5) to write an app to do it in a couple of languages, and it gave me a working app on the first try in each. It's not a very good app, since I could optimize it a lot more, but it works. GPT-4 gave a slightly better app in each instance.
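For illustration, a minimal sketch of what such an app boils down to (my own reconstruction under an assumed search bound, not ChatGPT's actual output): brute-force search for integer solutions of x^3 + y^3 + z^3 = 1024.

```python
# Brute-force search for integer solutions of x^3 + y^3 + z^3 = 1024.
# The search bound is an assumption; widen it to find more solutions.
TARGET = 1024
BOUND = 20

solutions = set()
for x in range(-BOUND, BOUND + 1):
    for y in range(-BOUND, BOUND + 1):
        for z in range(-BOUND, BOUND + 1):
            if x**3 + y**3 + z**3 == TARGET:
                solutions.add(tuple(sorted((x, y, z))))

for s in sorted(solutions):
    print(s)  # e.g. (0, 8, 8), since 8**3 + 8**3 + 0**3 == 1024
```

An obvious optimization (the kind I meant) would be to loop over x and y only and check whether TARGET - x**3 - y**3 is a perfect cube, dropping the inner loop entirely.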

0

farmingvillein t1_jckm5r2 wrote

Although note that OP does say that his data isn't labeled...and you of course need to label it for RoBERTa. So you're going to need to bootstrap that process via manual labeling or--ideally, if able--via an LLM labeling process.

If you go to the effort of setting up an LLM labeling pipeline, you might just find that it's easier to use the LLM as the classifier itself, instead of fine-tuning yet another model (depending on cost, quality, etc.).
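A minimal sketch of such a labeling pass, assuming the OpenAI v0.x SDK and a chat model; the label set and prompt wording are placeholders:

```python
import openai

LABELS = ["positive", "negative", "neutral"]  # assumed label set

def llm_label(text: str) -> str:
    # One chat call per example; temperature=0 keeps labels deterministic.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the user's text as one of: "
                        + ", ".join(LABELS) + ". Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return resp["choices"][0]["message"]["content"].strip().lower()
```

The outputs can bootstrap a RoBERTa fine-tune--or, if cost and latency are acceptable, the call above simply *is* your classifier.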

1

harharveryfunny t1_jckltrp wrote

> I think it should be possible to replicate even GPT-4 with open source tools something like Bloom + FlashAttention & fine-tune on 32k tokens.

So you mean build a model with a 32K attention window, but somehow initialize it with weights from BLOOM (2K window) and then fine-tune? Are you aware of any attempts to do this sort of thing?
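For what it's worth, BLOOM uses ALiBi attention biases rather than learned position embeddings, so the window isn't baked into the weights; extending it is mostly a matter of rebuilding the bias matrix. A rough sketch (slope rule per the ALiBi paper; assumes a power-of-two head count):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence with ratio 2^(-8/n_heads)
    # (the ALiBi paper's rule for power-of-two head counts).
    ratio = 2 ** (-8.0 / n_heads)
    slopes = torch.tensor([ratio ** (i + 1) for i in range(n_heads)])
    # Linear penalty on key distance; future positions are zeroed here and
    # handled by the usual causal mask anyway.
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()  # (q, k)
    return slopes[:, None, None] * distance[None, :, :]  # (heads, q, k)

bias = alibi_bias(16, 2048)  # pretraining-scale window
# The same function serves a 32K window, though a dense (16, 32768, 32768)
# tensor won't fit in memory -- it would have to be built blockwise.
```

Whether the model generalizes to positions it never saw at that scale during pretraining is exactly the part I'd want to see fine-tuning results for.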

10

MirrorBredda t1_jckho6w wrote

Subject: Template for creating a new library in the scikit-learn fit/predict API style

Hi everyone,

I have seen so many packages reusing the fit/predict API style that scikit-learn came up with, which is the most popular one nowadays.
I was wondering whether there is some sort of Python GitHub template project to fork and start from. It would be for creating a new library based on that fit/predict style, but as the lone researcher on the project, I am trying to plan my development sprints optimally and avoid losing time reinventing the wheel.
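For context, the contract such a template would need to scaffold looks roughly like this (a hedged sketch; the class name and logic are placeholders):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

class MyEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # store hyperparameters verbatim; no logic here

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        # ... real fitting logic goes here ...
        return self  # fit must return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # placeholder: always predict the first class seen in fit
        return np.full(X.shape[0], self.classes_[0])
```

Following BaseEstimator's conventions (parameters stored verbatim in __init__, fitted attributes with trailing underscores) is what makes such a class work inside Pipeline and GridSearchCV.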

Best wishes,

1

mikljohansson t1_jckedf9 wrote

Very interesting work! I've been following this project for a while now.

Can I ask a few questions?

  • What's the difference between RWKV-LM and ChatRWKV? E.g., is ChatRWKV mainly RWKV-LM but streamlined for inference and ease of use, or are there more differences?

  • Are you planning to fine-tune on the Stanford Alpaca dataset (as was recently done for LLaMA and GPT-J to create instruct versions of them), or a similar GPT-generated instruction dataset (rough sketch of the format below)? I'd love to see an instruct-tuned version of RWKV-LM 14B with an 8k+ context length!
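For reference, the Alpaca data is a single JSON file of instruction/input/output records; here's a hedged sketch of turning one record into a training prompt (template wording paraphrased from the Alpaca repo, file path assumed):

```python
import json

def format_example(rec: dict) -> str:
    # Each record has "instruction", "output", and an optional "input" field.
    if rec.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context.\n\n"
            f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Input:\n{rec['input']}\n\n"
            f"### Response:\n{rec['output']}"
        )
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Response:\n{rec['output']}"
    )

with open("alpaca_data.json") as f:  # path assumed
    prompts = [format_example(r) for r in json.load(f)]
```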

3

super_deap OP t1_jck82rd wrote

Nuance is proportional to context.

Imagine we want to ask the language model to improve a certain module in the Linux kernel.

If I understood them correctly, memory-augmented transformers won't be able to fit all the pieces together to understand what needs to be improved and how: they have to make repeated calls to memory and search/summarize the results just to get a basic understanding, and in doing so they miss important details.

Compare that to a huge context: everything needed is already sitting in the context, and (in the case of full attention) there is no loss of detail.
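A toy sketch of the contrast (every name here is a hypothetical stand-in, not a real API):

```python
def summarize(chunk: str) -> str:
    return chunk[:80]  # stand-in: each summary step discards detail

def memory_augmented(question: str, corpus: list[str], hops: int = 3) -> str:
    # Repeated retrieve-and-summarize: each hop sees one local slice,
    # and details the final answer needed can be lost at every step.
    notes = []
    for i in range(hops):
        chunk = corpus[i % len(corpus)]  # stand-in for a local memory search
        notes.append(summarize(chunk))
    return question + "\n" + " ".join(notes)

def full_context(question: str, corpus: list[str]) -> str:
    # Full attention over a huge context: the whole corpus is in the window
    # at once, so nothing is lost to retrieval or summarization
    # (at O(n^2) attention cost).
    return question + "\n" + "\n".join(corpus)
```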

17

CleanThroughMyJorts t1_jck7114 wrote

I don't think the two are mutually exclusive.

The problem with retrieval, though (at least in current implementations), is that the model can't attend to memory globally the way it does to its context; you're bottlenecked by the retrieval process, which has to bring things into context through a local search.

31