Recent comments in /f/MachineLearning
lifesthateasy t1_jcxras0 wrote
Reply to [P] TherapistGPT by SmackMyPitchHup
Hooo leee, imagine if this has any of the issues ChatGPT had
Alternative_iggy t1_jcxrabk wrote
Reply to [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543
I guess I’m at year 10+ now. In the last few years I’ve switched back to academia/ research!
I started out in research where I was happy but made no money, switched to a startup where I made a little more money but often got bored of the problems, then when the startup got bought by a bigger company I found myself working on sometimes WAY cooler problems but had to deal with a lot of bureaucracy, moved up into management roles for a bit, and now I've hopped back to straight-up research where I'm incredibly happy and have a lot more flexibility.
baffo32 t1_jcxqr2i wrote
Reply to comment by Art10001 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
I visited CVPR last year and people were saying that MoE was mostly what was being used; I haven't tried these things myself though
No_Combination_6429 t1_jcxqot2 wrote
Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Is it possible to do the same with other models as well? Like BLOOMZ etc…
darthstargazer t1_jcxqcgw wrote
Reply to [D] Simple Questions Thread by AutoModerator
Subject: Variational inference and generative networks
I've been trying to grasp the ideas behind variational autoencoders (Kingma et al.) vs normalizing flows (e.g., RealNVP).
If someone can explain the link between the two I'd be thankful! Aren't they trying to do the same thing?
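My rough understanding of the two objectives so far, as a sanity check: a VAE maximizes a lower bound on the log-likelihood (the ELBO), while a flow maximizes the exact log-likelihood via the change-of-variables formula:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) \quad \text{(VAE / ELBO)}$$

$$\log p_\theta(x) \;=\; \log p_z\!\left(f_\theta^{-1}(x)\right) + \log \left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right| \quad \text{(normalizing flow)}$$

So both fit $p_\theta(x)$ to the data, but the flow's $f_\theta$ must be invertible with a tractable Jacobian, which is why it gets an exact likelihood where the VAE only gets a bound. Is that the right way to see it?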
henkje112 t1_jcxlf74 wrote
Reply to comment by mmmfritz in [D] Simple Questions Thread by AutoModerator
Look into the Fact Extraction and VERification (FEVER) workshop :)
henkje112 t1_jcxlc7t wrote
Reply to comment by ViceOA in [D] Simple Questions Thread by AutoModerator
Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.
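A minimal sketch of that setup, assuming PyTorch and purely illustrative 1×128×128 log-mel spectrogram inputs with 10 classes:

```python
import torch
import torch.nn as nn

# Minimal CNN over spectrogram "images"; input size and class count are illustrative
class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        # Convolutional layers do the feature transformation
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Final dense layer with softmax activation
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, n_classes),
            nn.Softmax(dim=1),  # in practice, prefer raw logits + CrossEntropyLoss
        )

    def forward(self, x):  # x: (batch, 1, 128, 128) log-mel spectrogram
        return self.head(self.features(x))

model = SpectrogramCNN()
probs = model(torch.randn(4, 1, 128, 128))  # (4, 10) class probabilities
```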
tungns91 t1_jcxkd5z wrote
Reply to comment by Civil_Collection7267 in [D] Best ChatBot that can be run locally? by rustymonster2000
Do you have a specific chart comparing consumer hardware against the performance of LLaMA 7B to 65B? I'd like to know if my poor gaming PC could get a response in under 1 minute.
mrfreeman93 t1_jcxkc6e wrote
Reply to comment by adt in [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
well, that's a little out of date
SnooMarzipans3021 t1_jcxk1a3 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hello, does anyone have experience with vision transformers?
I get weird grid artifacts, especially on white/bright, textureless walls or sky.
Here is how it looks: https://imgur.com/a/dwF69Z3
I'm using the MAXIM architecture: https://github.com/vztu/maxim-pytorch
My general task is image enhancement (make the image prettier).
I have also tried simple GAN methods (https://github.com/eezkni/UEGAN), which don't have such issues.
I have researched a bit but I'm unable to formulate this problem properly. I have found that guided filters might help here but haven't tested them yet. Thanks
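For reference, the guided-filter idea might look something like this as a post-process; a sketch assuming opencv-contrib-python, with radius/eps values that are just untested guesses to tune:

```python
import cv2

# Sketch: smooth grid artifacts in flat regions while preserving edges,
# using the original input as the guide image. radius and eps are untested guesses.
guide = cv2.imread("input.png")    # original image
src = cv2.imread("enhanced.png")   # network output with grid artifacts

filtered = cv2.ximgproc.guidedFilter(guide, src, 8, 100)
cv2.imwrite("enhanced_filtered.png", filtered)
```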
henkje112 t1_jcxjx44 wrote
Reply to comment by rylo_ren_ in [D] Simple Questions Thread by AutoModerator
I'm assuming you're using sklearn for LinearRegression. You're initializing an instance of the LinearRegression class with a normalize parameter, but that parameter is not valid for this class (for a list of possible parameters, see the documentation).
I'm not sure what you're trying to do, but I think you want to normalize your input data? In that case you should look at MinMaxScaler, which transforms your features by scaling each feature to a given range.
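A minimal sketch of that, with made-up example data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Made-up example data
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
y = np.array([1.0, 2.0, 3.0])

# Scale each feature to [0, 1], then fit the regression on the scaled features
model = make_pipeline(MinMaxScaler(), LinearRegression())
model.fit(X, y)
print(model.predict([[2.5, 350.0]]))
```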
nenkoru t1_jcxjhy6 wrote
Reply to comment by Trolann in [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
Yep, understandable. I made a Dockerfile for the project so that it can be run in an isolated environment. Check out the pull request from me.
fromnighttilldawn t1_jcxgr6b wrote
Reply to [R] What are the current must-read papers representing the state of the art in machine learning research? by alfredr
I don't read any of the papers because there is basically no way to re-implement them or independently verify them. People can feel shocked, surprised, amazed, or enlightened all they want while reading a paper, but in truth people still have no idea how any of this truly works.
Before, you at least had a mathematical model to work with, which showed that even at small scale the idea could lead to something that works as promised in a larger-scale ML model.
Nowadays OpenAI can claim that Jesus came back and cleaned the data for their model, and we would have no way to verify the veracity of that claim.
mrcet007 t1_jcxge76 wrote
Reply to comment by derek_ml in [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
whats the benefit?
Civil_Collection7267 t1_jcx9jri wrote
LLaMA 13B/30B and LLaMA 7B with the Alpaca LoRA are the best that can be run locally on consumer hardware. LLaMA 65B exists but I wouldn't count that as something that can be run locally by most people.
From my own testing, the 7B model with the LoRA is comparable to 13B in coherency, and it's generally better than the recently released OpenAssistant model. If you'd like to see some examples, I answered many prompts in an r/singularity AMA for Alpaca. Go to this post and sort by new to see the responses. I continued where the OP left off.
SomewhereAtWork t1_jcx8aoc wrote
Reply to comment by UnusualClimberBear in [D] Is it possible to train LLaMa? by New_Yak1645
> if you have no fear of legal actions...
Legal actions? They can direct that to my Legal-LLaMA. ;-)
mike94025 t1_jcx5xvg wrote
Reply to comment by crude2refined in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Data type?
SDPA currently has 3 kernels:
- sdpa_math
- sdpa_flash
- sdpa_mem_eff
A kernel picker selects the best one given your constraints:
- Math is the trusted kernel from the equation in the paper.
- Flash only works for FP16 and BF16, and on SM80 (e.g., A100).
- The mem_efficient kernel works on older architecture levels and supports FP32, but the upside is limited due to the lack of compute capacity for FP32; FP16 or BF16 should help. Also, there are requirements on alignment, dropout values, etc. to qualify for the high-performance SDPA implementations; dropout is required to be 0 in PT 2.0.
Also, different kernels parallelize across different dimensions, so B=1 will not work with all of those kernels.
In a nutshell, performance comes at the price of generality. GPUs are finicky to get performance out of, so inputs must adhere to those constraints, and parallelization strategies matter for different combinations of dimensions.
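For concreteness, a minimal sketch (assuming PyTorch 2.0 with a CUDA device; shapes and dtype are illustrative) of restricting the kernel picker to the flash kernel:

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

# Illustrative shapes: (batch, heads, seq_len, head_dim); FP16 so flash qualifies
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)

# Disable the other kernels so only sdpa_flash can be picked
with sdp_kernel(enable_math=False, enable_flash=True, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0)  # dropout must be 0
```

If the inputs don't qualify (wrong dtype, unsupported architecture, nonzero dropout), the call should error out rather than silently fall back, which makes this a handy way to check which kernel you're actually getting.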
pkuba208 t1_jcx3d9i wrote
Reply to comment by Art10001 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Well... I run this model on a Raspberry Pi 4B, but you will need AT LEAST 8 GB of RAM.
gamerx88 t1_jcx0t9r wrote
Reply to comment by fullstackai in [D] Unit and Integration Testing for ML Pipelines by Fender6969
Ah, that makes sense.
kaikraul t1_jcx0hhm wrote
Yes, but you're working with older models. The advantage is that you can adapt them to yourself; you can get results that you can't get with any of the online models. But they lag behind the current models, and I just wonder: is it worth it? By the time I have downloaded the whole thing, trained it, and gotten results, the state of the art is already 2-3 versions further along. I'm running behind on quality.
banuk_sickness_eater t1_jcwz87s wrote
Reply to comment by nat_friedman in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
Thank you for doing this, doubling the corpus of literature from antiquity is absolutely a net positive for humanity.
disastorm t1_jcwyjyv wrote
Reply to [D] Simple Questions Thread by AutoModerator
I noticed that "text-generation" models have variable output, but a lot of other models like chatbots often give the exact same response for the same input prompt. Is there a reason for this? Is there perhaps a setting that would allow a chatbot, for example, to have variable responses, or is my understanding of this just wrong?
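For example, with a "text-generation" pipeline I can toggle the behavior I mean (a hypothetical transformers sketch; the model choice is illustrative):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # model choice is illustrative

prompt = "The weather today is"

# Greedy decoding: the same prompt always yields the same output
print(generator(prompt, do_sample=False, max_new_tokens=20)[0]["generated_text"])

# Sampling: do_sample=True with a temperature gives variable outputs
print(generator(prompt, do_sample=True, temperature=0.9, max_new_tokens=20)[0]["generated_text"])
```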
ninjasaid13 t1_jcwwgt5 wrote
Reply to comment by Art10001 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
now what?
michaelthwan_ai OP t1_jcxrcfu wrote
Reply to comment by BalorNG in [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github) by michaelthwan_ai
I agree with you. 3 thoughts from me:
- I think one direction for so-called safe AI, to make it give genuine answers, is to feed it factual/external info. I mean 1) a retrieval-based model like searchGPT, or 2) API calling like Toolformer (e.g., checking a weather API).
- An LLM is essentially solving a compression problem (I got the idea from Lambda Labs), but it cannot remember everything. Therefore an efficient way forward is retrieval: search a very large space (like PageRank/Google search), obtain a smaller result set, and let the LLM organize and filter the related content from it (see the sketch below).
- Humans are basically like that, right? If we get a query, we may need to read books (external retrieval), which is pretty slow. However, humans have a cool feature, long-term memory, to store things permanently. Imagine if an LLM could select appropriate things during your queries/chats and store them as text or in a knowledge base inside it; that would be a knowledge profile that permanently remembers the context bonded between you and the AI, instead of the current situation where ChatGPT forgets everything after a restart.
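A minimal retrieve-then-read sketch of that second point; search_api() and llm() are hypothetical stand-ins for a real search backend and a real LLM completion call:

```python
# Hypothetical stand-ins: swap in a real search API and a real LLM client.
def search_api(query: str, k: int = 5) -> list[str]:
    """Return the top-k text snippets for the query (hypothetical)."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Return an LLM completion for the prompt (hypothetical)."""
    raise NotImplementedError

def grounded_answer(query: str) -> str:
    # 1) Search a very large space, keeping only a small result set
    snippets = search_api(query, k=5)
    # 2) Let the LLM organize and filter the retrieved content
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using only the sources below, citing [n].\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)
```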