Recent comments in /f/MachineLearning
trajo123 t1_je7dgjz wrote
100 images??? Folks, neural nets are data hungry. If you don't have reams of data, don't fiddle with architectures, definitely not at first. The first thing to do when data is limited is to use pre-trained models. Then do data augmentation, and only then look at other things like architectures and losses, if you really have nothing better to do with your time.
SMP (segmentation_models.pytorch) offers a wide variety of segmentation models with the option to use pre-trained weights.
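For example, a minimal sketch of grabbing a pre-trained model through SMP (the encoder choice, channel count, and class count here are placeholder assumptions, not from the thread):

```python
import segmentation_models_pytorch as smp

# U-Net with an ImageNet-pretrained ResNet encoder; only the decoder
# starts from scratch, so far fewer labeled images are needed.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=1,   # grayscale x-rays
    classes=1,       # binary mask
)
```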
jaxolingo OP t1_je7clzh wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Nice!! I'll look into it
jaxolingo OP t1_je7c63z wrote
Reply to comment by Cantareus in [D] The best way to train an LLM on company data by jaxolingo
Cool, this is exactly the approach I had in mind for the model to take
currentscurrents t1_je7c29r wrote
The architecture probably isn't the problem. You only have 100 images; that's your problem.
If you can't get more labeled data, you should pretrain on unlabeled data that's as close as possible to your task (preferably other dental x-rays). Then you can finetune on your real dataset.
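A minimal sketch of that pretrain-then-finetune idea in PyTorch. The autoencoder objective and the tiny conv stacks are illustrative assumptions, as are the `unlabeled_loader`/`labeled_loader` DataLoaders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                      # shared backbone
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                      # reconstruction head
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
)

# 1) Pretrain on unlabeled x-rays with a reconstruction loss.
autoencoder = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for images in unlabeled_loader:               # placeholder DataLoader
    loss = F.mse_loss(autoencoder(images), images)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Reuse the pretrained encoder for segmentation and finetune on the
#    small labeled set (a lower learning rate is a common choice here).
seg_model = nn.Sequential(
    encoder,
    nn.Conv2d(64, 1, 1),                      # per-pixel logits
    nn.Upsample(scale_factor=4),              # back to input resolution
)
opt = torch.optim.Adam(seg_model.parameters(), lr=1e-4)
for images, masks in labeled_loader:          # placeholder DataLoader
    loss = F.binary_cross_entropy_with_logits(seg_model(images), masks)
    opt.zero_grad(); loss.backward(); opt.step()
```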
jaxolingo OP t1_je7b3lf wrote
Reply to comment by 13ass13ass in [D] The best way to train an LLM on company data by jaxolingo
Cool, I'll check it out, thanks!
I've been learning LangChain but haven't gotten to sqlchain just yet
Business-Lead2679 OP t1_je7aefg wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
Just like Alpaca. Even the JSON format is the same as the one released by Stanford, just with different inputs & outputs
Business-Lead2679 OP t1_je794o8 wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
I tried vast.ai, which didn't work. I'm a newbie, so maybe I'm doing something wrong
Business-Lead2679 OP t1_je792jz wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
Finetuning
WarProfessional3278 t1_je790g7 wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
By training do you mean finetuning with LoRA, or from the ground up like Alpaca? Realistically you could just rent an 8xA100 and spend 4 or 5 hours to get it done
lostmsu t1_je78jfg wrote
Reply to comment by WindForce02 in [D] Prediction time! Lets update those Bayesian priors! How long until human-level AGI? by LanchestersLaw
You are missing the idea entirely. I am sticking to the original Turing test to determine whether AI is already human-level or not.
The original Turing test is dead simple and can be applied to ChatGPT easily.
The only other thing in my comment is that "human-level" is vague, as intelligence differs from human to human, which allows for goalpost moving like in your comment. IQ is the best measure of intelligence we have, so it is reasonable to turn the Turing test into a family of tests Turing(I): each is a regular Turing test, except that the IQ of the humans participating (both the machine's opponent and the judge who must decide which one is the machine) is <= I.
My claim is that ChatGPT, or ChatGPT plus some trivial memory enhancement (like feeding previous failures back into prompts), quite possibly can already pass Turing(70).
WokeAssBaller t1_je783go wrote
Reply to comment by lambertb in [D] GPT4 and coding problems by enryu42
Fair enough, then give them problems to solve and measure their output. This feels like "90% of dentists claim Crest improves your dental health."
I'll take an independent study into consideration, but today I find it more of a novelty
CatalyzeX_code_bot t1_je774q1 wrote
Reply to [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Found relevant code at https://github.com/CerebrasResearch/Sparse-IFT
LetGoAndBeReal t1_je71r0g wrote
Reply to comment by machineko in [D] The best way to train an LLM on company data by jaxolingo
Fine-tuning can be great for getting better output from the model based on the knowledge the model already contains. I only meant that fine-tuning is not viable for getting new data/knowledge into a model. Fine-tuning does not accomplish knowledge absorption.
deep-yearning t1_je710wy wrote
What accuracy (Dice?) are you getting? 100 training images is pretty small. Have you tried cross-validation?
Business-Lead2679 OP t1_je70nka wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
I'd like to train it with these settings:
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024
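For what it's worth, a sketch of where those settings would plug into a Hugging Face Trainer setup (the output dir and batch sizes below are placeholders, not from the thread):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-65b-finetuned",   # placeholder
    num_train_epochs=3,                 # EPOCHS = 3
    learning_rate=2e-5,                 # LEARNING_RATE = 2e-5
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
# CUTOFF_LEN = 1024 applies at tokenization time, e.g.:
# tokenizer(text, truncation=True, max_length=1024)
```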
machineko t1_je70llx wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Why would you say that fine-tuning is not viable? There are many production use cases of fine-tuning a model using in-house proprietary data.
In fact, if you have the resources, you can both fine-tune an existing model (whether supervised or unsupervised) and also use it for retrieval-augmented generation.
bgighjigftuik t1_je6yfrg wrote
Associative memory on steroids. That's my bet on LLMs' "understanding"
modernzen t1_je6xujz wrote
Reply to comment by rshah4 in [D] FOMO on the rapid pace of LLMs by 00001746
Totally agree with this. Something like ChatGPT is overkill for most use cases and comes at a cost of both money (using the API) and latency. Clever prompting and fine-tuning can let you build free, fast models that are tailored towards your specific problem at hand.
EarthquakeBass t1_je6wa0g wrote
I think Azure might actually have support for private OpenAI stuff
Cantareus t1_je6vn79 wrote
Reply to comment by Im_Unlucky in [D] The best way to train an LLM on company data by jaxolingo
>shove data into vector db
>
>embed incoming query
>
>return topk matches from vector db
Can these steps be done through the OpenAI API? (For the vector-db steps, see the sketch after the example below.)
You only reply using JSON. Write 5 queries that will return useful data to aid you in answering the user's question "What was Apple's return compared to its sector last month". Return a JSON array with no explanations. Each query must run independently. Use the return format [{"Reason":"Why the query is useful","query":"The sql query"}]. The schema is:
(Imagine a valid schema is here)
GPT-4 Response:
[
{
"Reason": "Retrieve Apple's stock return last month",
"query": "SELECT stock_return FROM stocks WHERE company = 'Apple' AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE();"
},
{
"Reason": "Retrieve sector for Apple",
"query": "SELECT sector FROM companies WHERE company_name = 'Apple';"
},
{
"Reason": "Retrieve companies in Apple's sector",
"query": "SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple');"
},
{
"Reason": "Calculate average return for Apple's sector last month",
"query": "SELECT AVG(stock_return) as sector_return FROM stocks WHERE company IN (SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple')) AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE();"
},
{
"Reason": "Compare Apple's return to the average return of its sector last month",
"query": "SELECT (SELECT stock_return FROM stocks WHERE company = 'Apple' AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE()) - (SELECT AVG(stock_return) as sector_return FROM stocks WHERE company IN (SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple')) AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE()) as comparison;"
}
]
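And for the quoted vector-db steps themselves, a minimal sketch using the (2023-era) openai Python client; the in-memory "db", the placeholder documents, and the embedding model name are illustrative assumptions:

```python
import numpy as np
import openai

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# 1) Shove data into a (toy, in-memory) vector db.
docs = [
    "Apple monthly return: ...",              # placeholder documents
    "Technology sector monthly return: ...",
]
db = [(doc, embed(doc)) for doc in docs]

# 2) Embed the incoming query.
q = embed("What was Apple's return compared to its sector last month?")

# 3) Return top-k matches by cosine similarity.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

topk = sorted(db, key=lambda item: cosine(q, item[1]), reverse=True)[:2]
```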
Im_Unlucky t1_je6vjqd wrote
Reply to comment by visarga in [D] The best way to train an LLM on company data by jaxolingo
This is true. Good prompt engineering has definitely gone a long way for me in improving RAG performance, though. I don't think shoving snippets into the prompt is the long-term solution, but at this point in time it's the best option in my experience.
MadDanWithABox t1_je6vh9x wrote
Just a caveat: any training (fine-tuning, really) that you do on an LLM is NOT guaranteed to give you correct answers. The answers to your questions will seem plausible, and could be correct, but you'll need some system to verify them before you take action on them.
kromem t1_je6uv46 wrote
Reply to comment by visarga in [D] The best way to train an LLM on company data by jaxolingo
"Moar layers" doesn't only need to apply to the NN.
CoT prompting works by breaking analysis down into smaller steps that each generate their own additional context.
Doing something similar with DB analysis is absolutely possible, such as preemptively summarizing schema and using that summary as part of the retrieval to contextualize the specific fragments.
Additionally, having static example analyses on hand for related tables, fed into the prompt to go from zero-shot to few-shot, would go a long way toward reducing some of the issues you highlight.
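A rough sketch of that schema-summary idea, again with the 2023-era openai client (the prompts, model name, and schema here are illustrative assumptions):

```python
import openai

def chat(prompt):
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

schema = "CREATE TABLE stocks (company TEXT, date DATE, stock_return REAL);"

# Pre-compute a natural-language summary of the schema once...
summary = chat(f"Summarize this schema, noting keys and join paths:\n{schema}")

# ...then feed it back in with each question, along with cached example
# analyses for related tables, turning zero-shot prompts into few-shot ones.
answer = chat(
    f"{summary}\n\nExample analyses for related tables:\n...\n\n"
    "Question: What was Apple's return compared to its sector last month? "
    "Write the SQL."
)
```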
xander76 OP t1_je6ups2 wrote
Reply to comment by Intelligent_Rough_21 in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Ha, as a programmer, I too am doomed to laziness. :)
gmork_13 t1_je7dho0 wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
For more stable compute, check out Google Cloud GPUs.
Consider training a quantized model with LoRA. If you know enough, the model could perhaps be split between VRAM and DDR RAM so it can train on a smaller GPU.
edit: here, I found one: https://github.com/tloen/alpaca-lora
I think you could get this done for far less than your budget if need be.
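For the record, a rough sketch of the quantized + LoRA setup that alpaca-lora popularized, via peft and bitsandbytes-enabled transformers (the checkpoint path and hyperparameters are assumptions):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-65b",        # placeholder checkpoint
    load_in_8bit=True,          # quantize weights to fit in less VRAM
    device_map="auto",          # spill layers to CPU RAM if the GPU is small
)
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)  # only the small adapters get gradients
```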