Recent comments in /f/MachineLearning

remghoost7 t1_jcq8emm wrote

Ah, it was made with unreal.....? I didn't see that.

I always love adaptations of video game engines. One of the reasons I've been a huge fan of Unity for years. It's essentially just a wrapper for C# code with a pretty interface.

1

londons_explorer t1_jcpzan9 wrote

I would make 'fake' data which isn't HIPAA-protected and do most of your work on that.

Then do a final fine-tune on the HIPAA data on some rented servers. Your HIPAA data probably isn't more than a few hundred billion words anyway, so a few full passes over the dataset should be quick and cheap.
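A minimal sketch of the "fake data" half of this idea: generate synthetic stand-in records that share the shape of clinical notes but contain no real PHI. The field pools and record template below are hypothetical, purely for illustration.

```python
import random

# Hypothetical value pools -- none of these come from real records.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Casey", "Riley"]
CONDITIONS = ["hypertension", "type 2 diabetes", "asthma", "migraine"]

def synthetic_record(rng):
    """Generate one fake patient note containing no real PHI."""
    name = rng.choice(FIRST_NAMES)
    age = rng.randint(18, 90)
    condition = rng.choice(CONDITIONS)
    return f"Patient {name}, age {age}, presents with {condition}."

rng = random.Random(0)  # seeded for reproducibility
corpus = [synthetic_record(rng) for _ in range(1000)]
```

You'd pre-train/experiment on a corpus like this (or on public medical text), and only touch the real protected data for the final fine-tuning pass.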

1

Sad-Comedian-711 t1_jcpvahm wrote

So there is flash attention and then there is block sparse flash attention.

Flash attention by itself only got them to 16k tokens on an A100 for their model; to go further they needed windowed attention. You could already reach 16k with windowed attention before this paper without much issue.

The special thing about this windowed attention is that it works in blocks that fit into SRAM. From what I can tell, PyTorch's implementation of Flash Attention doesn't support block-sparse flash attention:

https://github.com/pytorch/pytorch/blob/eb32bb2ca6811ea21002699f4be884d3012dc362/aten/src/ATen/native/transformers/cuda/flash_attn/fmha_fprop_kernel_1xN.h

While Triton's looks like it does: https://github.com/openai/triton/blob/c9740f0870f6ae2480acd2a76a5fb4c920bc5ce5/python/triton/ops/flash_attention.py

I think the windowing must be done in blocks that align with the SRAM grid, so it kinda has to be part of the Flash Attention implementation. You might be able to throw normal BigBird block-sparse attention on top...

You also may be able to call out to triton's implementation:
https://github.com/violethaze74/pumpkin-py/blob/d9250933bec045e6add61b3930ff3dbbe08f6501/aten/src/ATen/native/transformers/attention.cpp#L726
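To make the windowed-attention pattern concrete, here's a rough sketch in plain PyTorch: a causal sliding-window mask fed to `scaled_dot_product_attention`. Note this materializes the full mask, so it has none of the memory savings of a fused block-sparse kernel — it only shows which positions a windowed pattern keeps. Sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

def sliding_window_mask(seq_len, window):
    """Boolean mask: position i may attend to j iff j <= i and i - j < window."""
    idx = torch.arange(seq_len)
    i, j = idx.unsqueeze(1), idx.unsqueeze(0)
    return (j <= i) & (i - j < window)

seq_len, window, dim = 16, 4, 8
q = torch.randn(1, 1, seq_len, dim)  # (batch, heads, seq, head_dim)
k = torch.randn(1, 1, seq_len, dim)
v = torch.randn(1, 1, seq_len, dim)

mask = sliding_window_mask(seq_len, window)  # True = attend
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

A fused implementation (like Triton's) instead skips whole key/value blocks whose window entries are all masked, which is where the long-context memory win comes from.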

3

MysteryInc152 OP t1_jcputc0 wrote

It uses relative positional encoding, so long context is possible in theory, but because it was trained with a 2048-token context, performance gradually declines beyond that. Fine-tuning for a longer context wouldn't be impossible, though.

You can run it with FP16 (13 GB RAM), 8-bit (10 GB), or 4-bit (6 GB) quantization.
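Those figures are roughly what weight-only arithmetic predicts for a ~7B-parameter model (an assumption on my part), plus runtime overhead — activations, KV cache, and layers kept in higher precision push real usage above the bare weight size:

```python
def weight_memory_gib(n_params, bits_per_param):
    """Rough memory for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

n = 7e9  # assumed ~7B-parameter model
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gib(n, bits):.1f} GiB weights")
```

So FP16 lands near 13 GiB for weights alone, while the 8-bit and 4-bit totals quoted above sit above the raw 6.5/3.3 GiB weight sizes because of that overhead.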

36

josejo9423 t1_jcpu2pe wrote

I would go with 1, but I wouldn't tune early stopping, just the number of estimators. XGBoost has the option of stopping iterations (early stopping) when there is no improvement in the metric. If you plot the eval metric and see that the model could have been stopped earlier, set the number of estimators to the value it reached before overfitting.

1

Oswald_Hydrabot t1_jcpqshf wrote

Those are bad managers. I certainly have had these conversations and I left companies over their response until I found one that listened.

You have to try harder. You have to stop accepting short-sighted near-term profit as "just how it is", or assuming that financial malpractice at scale is "good business". Because if you accept it and stop trying, failure is inevitable — and so are the corruption and corporate bailouts that take our tax revenue and cost us layoffs to pay for those mistakes. Stop being complacent if you cannot accept putting in the effort to make what you know is right a reality.

I have been involved in those conversations at the highest levels in some of the largest companies in the world. More often than not I told them to either listen to the consulting that they PAID me for, or I would take my business somewhere else — and I did. If you don't suck at what you do, then firing bad clients will not hurt you; in fact, it is critical to your own growth in your career. You need to treat your employer as a client.

1