Recent comments in /f/MachineLearning
msgs t1_jd46yf9 wrote
Reply to comment by Straight-Comb-6956 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Do you have a link to a torrent/download for the 30B or 65B weights that works with Alpaca.cpp? Reddit DMs are fine if you don't want to post it publicly.
xtof54 t1_jd467f3 wrote
There are several: either collaboratively (look at together.computer, Hivemind, Petals) or on a single machine with no GPU using pipeline parallelism, but that requires reimplementing it for every model; see e.g. slowLLM on GitHub for BLOOM-176B.
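Roughly, the single-machine, no-GPU version boils down to running the pipeline stages one after another, with only one block's weights in memory at a time. A conceptual sketch (not slowLLM's actual code; load_block and num_blocks are hypothetical stand-ins):

    import torch

    # Conceptual sketch of layer-at-a-time CPU inference: keep only one
    # transformer block in memory at a time and stream the hidden states through.
    # load_block(i) is a hypothetical loader, e.g. torch.load(f"block_{i}.pt").
    def run_offloaded(hidden: torch.Tensor, num_blocks: int, load_block) -> torch.Tensor:
        for i in range(num_blocks):
            block = load_block(i)       # load this block's weights from disk
            with torch.no_grad():
                hidden = block(hidden)  # forward the hidden states through it
            del block                   # free the weights before loading the next block
        return hidden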
oathbreakerkeeper t1_jd43931 wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I'm using AMP mixed precision, which should be using fp16. It still requires training==False.
But the torch code also disables flash attention if autocast is enabled; I'm not sure how to deal with that one.
Competitive-Rub-1958 t1_jd40cwb wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Would that mean that to force MHA to use it, I should wrap the context manager around the line where I forward through it?
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True):
        x = x + self.attn_head(x, x, x, need_weights=False)[0]
because that doesn't really seem to work :(
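For reference, a minimal standalone sketch of what (as far as I understand) the fast path needs: fp16 inputs on CUDA, eval mode, batch_first, and need_weights=False; the sizes here are arbitrary. Restricting SDPA to only the flash backend should make it raise instead of silently falling back, which helps check whether flash is actually being used:

    import torch
    import torch.nn as nn

    device = torch.device("cuda")

    # The MHA fast path needs eval mode (training=False) and batch_first=True.
    attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
    attn = attn.to(device, dtype=torch.float16).eval()

    x = torch.randn(4, 1024, 512, device=device, dtype=torch.float16)

    # Allow only the flash backend; if it can't be used, this should error out
    # instead of silently falling back to the math kernel.
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        with torch.no_grad():
            out, _ = attn(x, x, x, need_weights=False)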
is_it_fun t1_jd3syu7 wrote
Reply to comment by Leo_D517 in [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
Thank you for the very detailed response!
2muchnet42day t1_jd3pu0m wrote
Reply to comment by benfavre in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Can you train with 24 gigs of VRAM?
SkullHero t1_jd3pkgh wrote
Reply to [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
Can't wait to try this out 😁
tekktokk OP t1_jd3orau wrote
Reply to comment by UnusualClimberBear in [R] What do we think about Meta-Interpretive Learning? by tekktokk
Got it. Appreciate the insight.
usc-ur OP t1_jd3o8te wrote
Reply to Smarty-GPT: wrapper of prompts/contexts [P] by usc-ur
I want to announce that we have released v1.1.0, which includes access to ChatGPT and GPT-4 for Plus subscribers! :)
usc-ur OP t1_jd3nzab wrote
Reply to comment by farmingvillein in Smarty-GPT: wrapper of prompts/contexts [P] by usc-ur
The main purpose of this project is to bring together, in a single environment, all the resources (models, prompts, APIs, etc.) related to LLMs. Moreover, we also think from an end-user perspective: it is highly unlikely that a user would type a complex context into a query to a model or search engine. In this project, we try to bias the different models' responses toward different behaviors/ways of answering, while hiding this from end-users.
usc-ur OP t1_jd3nx3w wrote
Reply to comment by Nezarah in Smarty-GPT: wrapper of prompts/contexts [P] by usc-ur
That's right :) Check the "purpose" in our readme
[deleted] t1_jd3nqda wrote
Reply to comment by Definitely_not_gpt3 in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
[deleted]
RedditLovingSun t1_jd3nidx wrote
Reply to comment by yahma in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
I don't think they can use LLaMA because of the restrictive, non-open-source license FB put on LLaMA. It wouldn't be as entirely open as Pythia.
UnusualClimberBear t1_jd3mklf wrote
Reply to comment by tekktokk in [R] What do we think about Meta-Interpretive Learning? by tekktokk
Even for protein folding it has been overtaken by deep models. It might be useful for critical tasks where errors are not allowed and everything is deterministic, but I'm not an expert in the field.
Dependent_Ad5120 t1_jd3m0ce wrote
Reply to comment by oathbreakerkeeper in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Try fp16; that doesn't require training=False, apparently.
tekktokk OP t1_jd3l4vl wrote
Reply to comment by UnusualClimberBear in [R] What do we think about Meta-Interpretive Learning? by tekktokk
Alright, thank you. Then I guess one last question, if you happen to know: what is the current state of ILP in the ML/AI industry? Is it pretty much dead? Is it merely an interesting theory that hasn't found much application in the market? Does anyone see a bright future for it?
Dependent_Ad5120 t1_jd3knio wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
OK, I found out why. To use flash attention, I had to use fp16. It is a bit faster than using memory-efficient attention in my test.
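For anyone who wants to reproduce the comparison, a rough sketch along these lines should do it (not a careful benchmark; shapes and iteration count are arbitrary, and it assumes PyTorch 2.0 on a GPU that supports both backends):

    import time
    import torch
    import torch.nn.functional as F

    device = torch.device("cuda")
    # (batch, heads, seq_len, head_dim); fp16 so the flash kernel is eligible
    q = torch.randn(8, 16, 4096, 64, device=device, dtype=torch.float16)
    k, v = q.clone(), q.clone()

    def bench(enable_flash, enable_mem_efficient):
        # Restrict SDPA to a single backend so we know which kernel ran.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=enable_flash,
            enable_math=False,
            enable_mem_efficient=enable_mem_efficient,
        ):
            torch.cuda.synchronize()
            start = time.time()
            for _ in range(20):
                F.scaled_dot_product_attention(q, k, v)
            torch.cuda.synchronize()
            return time.time() - start

    print("flash:         ", bench(True, False))
    print("mem_efficient: ", bench(False, True))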
[deleted] t1_jd3knaq wrote
Reply to [D] Simple Questions Thread by AutoModerator
[deleted]
paulgavrikov t1_jd3k9l9 wrote
Reply to [D]: Vanishing Gradients and Resnets by Blutorangensaft
Have you tried NFNets? https://arxiv.org/abs/2102.06171
paulgavrikov t1_jd3jf74 wrote
Currently there are no good methods for doing this. There's a discussion of existing methods and many insights into the problem in this paper: https://arxiv.org/abs/2206.14486
TL;DR: which images you should remove depends on the ratio of samples to parameters, no current method works anywhere near the ideal, but you may see improvements if you choose the most expensive methods.
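To make the TL;DR concrete, the basic recipe is score-then-prune. A toy sketch (not the paper's exact method; the scores here are a stand-in for e.g. proxy-model loss): rank examples by a difficulty score, then drop from the easy end when data is abundant relative to model size and from the hard end when it's scarce.

    import torch

    # Toy sketch of score-based data pruning. scores stands in for a
    # per-example difficulty measure (e.g. proxy-model loss); higher = harder.
    def prune_indices(scores: torch.Tensor, keep_frac: float, data_abundant: bool) -> torch.Tensor:
        n_keep = int(len(scores) * keep_frac)
        # Abundant data: keep the hardest examples; scarce data: keep the easiest.
        order = torch.argsort(scores, descending=data_abundant)
        return order[:n_keep]

    scores = torch.rand(10_000)                      # stand-in difficulty scores
    keep = prune_indices(scores, keep_frac=0.8, data_abundant=True)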
No_Combination_6429 t1_jd3ioav wrote
Reply to comment by juliensalinas in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
UnusualClimberBear t1_jd3gqap wrote
Reply to comment by tekktokk in [R] What do we think about Meta-Interpretive Learning? by tekktokk
Usually, the problem is the combinatorial number of possible rules that could apply. Here they seem to be able to find a subset of possible rules with polynomial complexity, but as Table 7 of the second paper contains tiny (w.r.t. ML/RL data) problem instances, I would answer yes to your question. ILP comes with strong guarantees, while ML comes with a statistical risk. These guarantees aren't free.
tekktokk OP t1_jd3f2xf wrote
Reply to comment by UnusualClimberBear in [R] What do we think about Meta-Interpretive Learning? by tekktokk
So the main problem with MIL or ILP is that it would not be able to handle the sheer quantity of raw input data that the system would have to process?
farmingvillein t1_jd47vh9 wrote
Reply to comment by usc-ur in Smarty-GPT: wrapper of prompts/contexts [P] by usc-ur
OK, insofar as you care about adoption, I'd encourage you to clean up the README to make it much clearer as to what you're doing. Right now, you've got API call examples, but it isn't clear what is actually happening, why this wrapper is helpful/necessary, etc.
I can guess/infer all the above, but you want your README to make it really, really quick and easy for your readers to figure out what is going on.