Recent comments in /f/MachineLearning
FallUpJV t1_jclpydo wrote
Reply to comment by xEdwin23x in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yes, it's definitely not small. I meant compared to the models people have been paying the most attention to over the last few years, I guess.
The astronaut-pointing-a-gun meme is a good analogy, almost a scary one. I wonder how much we could improve existing models with simply better data.
Franck_Dernoncourt t1_jclpll3 wrote
Thanks for sharing! How does it compare against other models (e.g., Alpaca or GPT-3.5/4)?
MysteryInc152 t1_jclpjzi wrote
Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
It's predicting language. As long as the architecture properly allows it to learn to predict language, you're good to go.
KerfuffleV2 t1_jclo0oh wrote
Reply to comment by felheartx in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I'm not an ML person, but it seems like that paper is just teaching the LLM to simulate a Turing machine. Actually making it respond normally while doing practical stuff like answering user queries would be a different thing.
Also, suppose the LLM has access to external memory. First, you have to teach it how to interact with that external memory (via special command sequences in its tokens, most likely). Then you have to teach it, or take steps to make it, appropriately note which things are important or not and store/retrieve them as necessary. All of this requires tokens for input/output, so it will increase processing time even when used perfectly, and these tokens will also consume the existing context window.
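Just to make that overhead concrete, here's roughly how I'd picture the loop (a toy Python sketch; the `MEM_WRITE`/`MEM_READ` markers and the `generate` function are made up for illustration, not any real API):

```python
# Toy sketch of an LLM driving an external key-value memory via
# special command sequences in its output. Every name here is
# hypothetical; a real system would need trained special tokens.
import re

memory = {}  # the external store the model reads from and writes to

def run_with_memory(generate, prompt, max_rounds=8):
    """Let the model emit MEM_WRITE/MEM_READ commands, execute them,
    and feed results back in. `generate` stands in for an LLM call."""
    context = prompt
    for _ in range(max_rounds):
        out = generate(context)  # hypothetical LLM call
        write = re.search(r"MEM_WRITE\[(.+?)=(.+?)\]", out)
        read = re.search(r"MEM_READ\[(.+?)\]", out)
        if write:
            memory[write.group(1)] = write.group(2)
        elif read:
            # the retrieved value gets appended, consuming context tokens
            out += f"\nMEM_RESULT[{memory.get(read.group(1), '')}]"
        else:
            return out  # no memory command: treat as the final answer
        context += out
    return context
```

Every round trip there costs prompt and output tokens, which is exactly the overhead I mean.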
One really big thing with LLMs now is that they don't seem to (and maybe can't) know what they know or don't know. They just predict tokens; they can't really do introspection. Of course, they can be trained to respond that they don't know certain things, but getting the LLM to decide it needs to use the external memory doesn't seem like the simplest thing.
I mean, take humans as an example: Are you effective at taking notes, organizing them in a way that lets you easily recall them in the future, etc? It's not even an easy skill for humans to develop, and we're relatively good at knowing what we don't know.
Another thing is that the paper you linked says it set the temperature to 0 to make the responses deterministic. Generally this makes them a lot less creative as well. If you turn up the temperature, you potentially increase the chances that the LLM generates malformed queries for the external memory or stuff like that.
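For reference, temperature just rescales the logits before sampling, so 0 collapses to picking the argmax (a minimal sketch, not any particular model's sampler):

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=np.random.default_rng()):
    """Temperature-scaled sampling: T -> 0 approaches greedy argmax,
    higher T flattens the distribution and raises the odds of
    malformed output like a broken memory query."""
    if temperature == 0:
        return int(np.argmax(logits))  # fully deterministic
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```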
Anyway, I don't know much about the technical side of increasing the context window, but as far as I know, once the context window is bigger the model can just use it. Taking advantage of some sort of external memory system seems like a very, very complicated thing to solve reliably.
Again, note this is coming from someone that doesn't really know much about ML, LLMs, etc. I'm just a normal developer, so take all this with a grain of salt.
lmericle t1_jcln487 wrote
Reply to comment by felheartx in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
You will find that in hype circles such as NLP there's a lot of thought-terminating cliches passed around by people who are not so deep in the weeds. Someone says something with confidence, another person doesn't know how to vet it and so just blindly passes it on, and all of a sudden a hack becomes a rumor becomes dogma. It seems to me to be this way with context vs memory.
Put another way: it's the kind of attitude that says "No, Mr. Ford, what we wanted was faster horses".
LappenX t1_jcln1pc wrote
You don't want to use jax without jit.
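A minimal illustration (the actual speedup depends on the function and hardware):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) @ jnp.cos(x).T  # some array math worth fusing

x = jnp.ones((1000, 1000))

f(x)                # dispatched op by op: slow
f_jit = jax.jit(f)
f_jit(x)            # first call compiles with XLA; later calls are fast
```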
Batteredcode t1_jcllc74 wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Thank you, this is really helpful. I think you're right that the CycleGAN is the way to go!
kreuzguy t1_jcliuwx wrote
Reply to comment by kittenkrazy in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Someone should definitely look into this!
felheartx t1_jclij93 wrote
Reply to comment by hfnuser0000 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I have no idea how many other ways there are, but this looks extremely promising: https://arxiv.org/abs/2301.04589#
So there's at least one :P
felheartx t1_jcli6si wrote
Reply to comment by -Rizhiy- in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
You said working with external memory is not as straightforward. Can you explain that?
I've read this: https://arxiv.org/abs/2301.04589# and even though I'm not super familiar with the details, to my untrained eye it seems like attaching external memory is easier than extending the context size.
Just from reading posts on this subreddit, I get the feeling that getting larger and larger context sizes is very difficult, whereas simply attaching this sort of "dictionary" thing seems pretty easy to do.
Paarthri t1_jcli5zq wrote
Reply to comment by bogdantudorache in ML models for User Recognition using Keystroke Dynamics [P] by bogdantudorache
Did you measure any performance metrics? It's standard to do that when writing about an ML project.
bogdantudorache OP t1_jclfnmt wrote
Reply to comment by Paarthri in ML models for User Recognition using Keystroke Dynamics [P] by bogdantudorache
You can always start one 😁
Sonicxc t1_jcleo4j wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Hey man, thanks for the input. I will look into what you have mentioned
fastinguy11 t1_jcle8cn wrote
Reply to comment by Nhabls in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
When will the GPT-4 32k API be available?
Paarthri t1_jcle34m wrote
I don't see any discussion of the models' performance…
limpbizkit4prez t1_jcld70o wrote
Reply to comment by r_linux_mod_isahoe in [N] Jumpy 1.0 has now been released by the Farama Foundation by jkterry1
Yeah, but if I have a code base written in numpy and want to use jax, wouldn't I need to do the same amount of refactoring to integrate this as I would with regular jax? Are there a lot of functions in numpy that don't exist in jax.numpy?
r_linux_mod_isahoe t1_jclc6rp wrote
Reply to comment by limpbizkit4prez in [N] Jumpy 1.0 has now been released by the Farama Foundation by jkterry1
Porting an existing codebase to JAX? Using any existing algorithm that's implemented in NumPy, but on a JAX backend? The opportunities are massive.
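The point is dispatch: one function body that runs on whichever backend the input lives on. A hand-rolled sketch of the idea (illustrative only, not Jumpy's actual implementation):

```python
import numpy as onp
import jax.numpy as jnp

def dispatching_sum(x):
    """Pick the backend from the input's type, so one code path serves
    plain NumPy arrays and JAX arrays alike. (Hypothetical helper,
    just to show the dispatch pattern.)"""
    xp = jnp if isinstance(x, jnp.ndarray) else onp
    return xp.sum(x ** 2)

dispatching_sum(onp.arange(4))   # runs on NumPy
dispatching_sum(jnp.arange(4))   # runs on JAX, jit/grad-friendly
```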
limpbizkit4prez t1_jclabza wrote
What's the value in using this instead of "import jax.numpy as np"?
Competitive-Rub-1958 t1_jcl97q0 wrote
Reply to comment by No-Belt7582 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
> Either autograd is disabled (using torch.inference_mode or torch.no_grad) or no tensor argument requires_grad
> training is disabled (using .eval())
What's the point of FlashAttention if you can't use it during training? 🤔
https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html
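For what it's worth, the fused kernel is also exposed directly as F.scaled_dot_product_attention, which does work under autograd. A minimal sketch (assumes a CUDA device; whether the flash kernel actually gets picked depends on device, dtype, and shapes):

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim), half precision on GPU
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16,
                requires_grad=True)
k, v = torch.randn_like(q), torch.randn_like(q)

# PyTorch 2.0 routes this to FlashAttention when the inputs qualify.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
out.sum().backward()  # gradients flow, so it is usable in training
```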
dats_ah_numba_wang t1_jcl7k6w wrote
Didn't know I needed this.
BungaBunga6767 t1_jcl6vf9 wrote
Reply to comment by harharveryfunny in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Longformer does it, but not with FlashAttention.
lucidraisin t1_jcl6ecd wrote
Reply to comment by super_deap in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
no worries, thanks for running the experiments and sharing your results 🙏
bubudumbdumb t1_jclromf wrote
Reply to comment by Paarthri in ML models for User Recognition using Keystroke Dynamics [P] by bogdantudorache
They don't seem to claim "we have a dataset of typing logs from DIFFERENT people working on DIFFERENT tasks while typing on DIFFERENT keyboards".
If they had it, I would be more concerned about the ethics of the data collection than about the model's accuracy.