Recent comments in /f/MachineLearning
remghoost7 t1_jcq8emm wrote
Reply to comment by LetMeGuessYourAlts in [P] Web Stable Diffusion by crowwork
Ah, it was made with Unreal...? I didn't see that.
I always love adaptations of video game engines. One of the reasons I've been a huge fan of Unity for years. It's essentially just a wrapper for C# code with a pretty interface.
[deleted] t1_jcq6jko wrote
Reply to comment by legendofbrando in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
[deleted]
fleanend t1_jcq5dz7 wrote
Reply to comment by fullstackai in [D] Unit and Integration Testing for ML Pipelines by Fender6969
I'm glad I'm not the only one
boss_007 t1_jcq4h66 wrote
Reply to comment by EmmyNoetherRing in [Discussion] Future of ML after chatGPT. by [deleted]
They are allowed to pass (gas) as well
votegoat t1_jcq47v6 wrote
Reply to comment by simpleuserhere in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
commenting to save for later
LetMeGuessYourAlts t1_jcq33nb wrote
Reply to comment by remghoost7 in [P] Web Stable Diffusion by crowwork
The Unreal Engine lets you compile to the browser with basically a few clicks, so that might be how they're doing it; it abstracts away a lot of the hard parts.
MysteryInc152 OP t1_jcpzgd4 wrote
Reply to comment by Temporary-Warning-34 in [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
Bootstrapping is basically taking a model's best/better outputs on a certain task and finetuning on that.
EDIT: Seems I'm wrong on that
londons_explorer t1_jcpzan9 wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
I would make 'fake' data which isn't HIPAA protected and do most of your work on that.
Then do a final fine-tune on the HIPAA data on some rented servers. Your HIPAA data probably isn't more than a few hundred billion words anyway, so a fine-tune doing a few full passes over the dataset should be quite quick and cheap.
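Not from the thread, just a minimal sketch of that two-stage idea using the Hugging Face Trainer; the tiny stand-in model, the toy datasets and the hyperparameters are all placeholder assumptions:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "sshleifer/tiny-gpt2"  # tiny stand-in so the sketch runs; swap in your real LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def to_dataset(texts):
    # Tokenize raw strings into a causal-LM training dataset.
    ds = Dataset.from_dict({"text": texts})
    return ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
                  batched=True, remove_columns=["text"])

synthetic = to_dataset(["synthetic, non-PHI clinical note ...", "another fake note ..."])
hipaa = to_dataset(["real protected note ..."])  # in practice, only touched on compliant hardware

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

def finetune(dataset, output_dir, epochs):
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=2, report_to=[])
    Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()

finetune(synthetic, "out/stage1_synthetic", epochs=3)  # bulk of the iteration on fake data
finetune(hipaa, "out/stage2_hipaa", epochs=1)          # short final pass on the HIPAA data
```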
legendofbrando t1_jcpybhl wrote
Anyone gotten it to run on iOS?
MysteryInc152 OP t1_jcpxcn5 wrote
Reply to comment by Temporary-Warning-34 in [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
Oh for sure. Changed it to "long context", I think that's better. I just meant there's no hard context limit.
Temporary-Warning-34 t1_jcpwx16 wrote
Temporary-Warning-34 t1_jcpwu9p wrote
Reply to [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
'Feedback bootstrap'. Lol.
Sorry. What does that mean?
Sad-Comedian-711 t1_jcpvahm wrote
So there is flash attention and then there is block sparse flash attention.
Flash attention by itself only got them to 16k on an A100 for their model; to go further they needed to use windowed attention... You could have already gone to 16k with windowed attention before this paper without much issue.
The special thing about this windowed attention is that it is done in blocks that can fit into SRAM. From what I can tell, the Python implementation of Flash Attention doesn't look like it supports block-sparse flash attention.
While Triton's looks like it does: https://github.com/openai/triton/blob/c9740f0870f6ae2480acd2a76a5fb4c920bc5ce5/python/triton/ops/flash_attention.py
I think windowing must be done in blocks that align with the SRAM grid so it kinda has to be part of the Flash Attention implementation. You might be able to throw normal Big Bird block sparse attention on top...
You also may be able to call out to triton's implementation:
https://github.com/violethaze74/pumpkin-py/blob/d9250933bec045e6add61b3930ff3dbbe08f6501/aten/src/ATen/native/transformers/attention.cpp#L726
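For intuition only (not the paper's actual kernel), here is a minimal PyTorch sketch of the block-local / windowed idea: each query block attends only within its own block, so the per-block working set is small enough to live in fast memory. Real block-sparse flash attention also handles neighboring/global blocks and fuses all of this into the kernel.

```python
import torch
import torch.nn.functional as F

def block_local_attention(q, k, v, block_size=256):
    # q, k, v: (batch, heads, seq_len, head_dim); seq_len assumed divisible by block_size.
    b, h, n, d = q.shape
    nb = n // block_size
    # Split the sequence into blocks; attention is computed independently per block,
    # so memory scales with n * block_size instead of n**2.
    q = q.reshape(b, h, nb, block_size, d)
    k = k.reshape(b, h, nb, block_size, d)
    v = v.reshape(b, h, nb, block_size, d)
    out = F.scaled_dot_product_attention(q, k, v)  # softmax(QK^T / sqrt(d)) V within each block
    return out.reshape(b, h, n, d)

q = k = v = torch.randn(1, 8, 1024, 64)
print(block_local_attention(q, k, v).shape)  # torch.Size([1, 8, 1024, 64])
```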
MysteryInc152 OP t1_jcputc0 wrote
Reply to [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
It uses relative positional encoding, so long context is possible in theory, but because it was trained on 2048 tokens of context, performance gradually declines beyond that. Fine-tuning for more context wouldn't be impossible though.
You can run it with FP16 (13 GB RAM), 8-bit (10 GB) and 4-bit (6 GB) quantization.
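If memory serves, loading the quantized variants looks roughly like this (based on the ChatGLM-6B README; the quantize()/chat() helpers come from the repo's remote code, so treat the exact calls as an assumption):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# FP16 (~13 GB); or call quantize(8) (~10 GB) / quantize(4) (~6 GB) before moving to GPU.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) \
    .half().quantize(4).cuda()

response, history = model.chat(tokenizer, "Hello, what can you do?", history=[])
print(response)
```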
josejo9423 t1_jcpu2pe wrote
Reply to comment by EcstaticStruggle in [D] Simple Questions Thread by AutoModerator
I would go with 1, but I would not tune early stopping, just the number of estimators. XGBoost has the option of stopping iterations (early stopping) when there are no improvements in the metric; if you plot the evaluation metric you'll see where the model could have been stopped early, and you can set the number of estimators to that value, before it starts overfitting.
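In other words (a minimal sketch with made-up data; hyperparameters are arbitrary): set the boosting rounds high, let early stopping halt on the validation metric, and read off how many trees were actually useful.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 20), np.random.rand(1000)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

dtrain = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

booster = xgb.train(
    {"objective": "reg:squarederror", "eta": 0.05, "eval_metric": "rmse"},
    dtrain,
    num_boost_round=2000,            # deliberately high upper bound
    evals=[(dval, "val")],
    early_stopping_rounds=50,        # stop when val RMSE hasn't improved for 50 rounds
)
print(booster.best_iteration)        # the number of estimators you actually needed
```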
simpleuserhere OP t1_jcpttav wrote
Reply to comment by Meddhouib10 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
This model is 4-bit quantized, so it will take less RAM (model size around 4 GB).
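Back-of-the-envelope for why 4-bit lands around that size (ignoring activation/KV-cache overhead):

```python
params = 7e9                     # Alpaca/LLaMA 7B
bits_per_weight = 4              # 4-bit quantization
weights_gb = params * bits_per_weight / 8 / 1e9
print(weights_gb)                # ~3.5 GB of weights, plus overhead -> roughly the ~4 GB model size
```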
Meddhouib10 t1_jcptalr wrote
What are the techniques to make such large models run on low resources?
EmmyNoetherRing t1_jcpr2q3 wrote
Reply to comment by boss_007 in [Discussion] Future of ML after chatGPT. by [deleted]
Are they? They graze, I’d think you’d have the same methane problem you’ve got with cows.
Oswald_Hydrabot t1_jcpqshf wrote
Reply to comment by crazymonezyy in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Those are bad managers. I certainly have had these conversations and I left companies over their response until I found one that listened.
You have to try harder. You have to stop accepting short-sighted near-term profit as "just how it is" or assuming that financial malpractice at scale is "good business", because if you don't keep trying, failure is inevitable. Corruption and corporate bailouts that take our tax revenue and cost us layoffs to pay for those mistakes are inevitable. Stop being complacent, and accept putting in the effort to make what you know is right a reality.
I have been involved in those conversations at the highest levels in some of the largest companies in the world. More often than not I told them to either listen to the consulting that they PAID me for, or I would take my business somewhere else, and I did. If you don't suck at what you do then firing bad clients will not hurt you; in fact it is critical to your own growth in your career. You need to treat your employer as a client.
crowwork OP t1_jcpojd6 wrote
Reply to comment by WASDx in [P] Web Stable Diffusion by crowwork
Explicitly means we need to write JS code to request the cache, as opposed to the browser's automatic caching. https://developer.mozilla.org/en-US/docs/Web/API/Cache
boss_007 t1_jcpo2ke wrote
Reply to comment by EmmyNoetherRing in [Discussion] Future of ML after chatGPT. by [deleted]
Horses are greener
simpleuserhere OP t1_jcpo1px wrote
I have tested the Alpaca 7B model on Android (Google Pixel 7).
imaginethezmell t1_jcpfo39 wrote
Reply to comment by spiritus_dei in [D] ChatGPT without text limits. by spiritus_dei
pretty much everyone already had to do this for any long-text implementation (rough sketch after the list):
embed everything
search embeddings
use prompt + search result for final prompt
profit
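A minimal version of that recipe (the embedding model and chunking here are arbitrary choices, not anything from the thread):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = ["first chunk of the long document ...",
          "second chunk ...",
          "third chunk ..."]
question = "what does the document say about X?"

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_emb = embedder.encode(chunks, normalize_embeddings=True)      # embed everything (once, then cache)
q_emb = embedder.encode([question], normalize_embeddings=True)[0]   # embed the query

scores = chunk_emb @ q_emb                                          # cosine similarity (unit vectors)
top = [chunks[i] for i in np.argsort(-scores)[:2]]                  # search embeddings, keep top-k

final_prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}\nAnswer:"
print(final_prompt)                                                 # prompt + search result -> LLM
```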
danielbln t1_jcpfkwg wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in [D] Newbie question about Stanford Alpaca 7b fine-tuning by [deleted]
~$100 on compute. The bulk of the ~$600 cost came from generating the data via davinci-003, which is 10x as expensive as gpt-3.5-turbo.
Prymu t1_jcqclnb wrote
Reply to comment by votegoat in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
You know that (new) reddit has a save feature