Recent comments in /f/MachineLearning
trajo123 t1_je7dgjz wrote
100 images??? Folks, neural nets are data hungry. If you don't have reams of data, don't fiddle with architectures, definitely not at first. The first thing to do when data is limited is to use pre-trained models. Then do data augmentation, and only then look at other things like architectures and losses, if you really have nothing better to do with your time.
SMP (segmentation_models.pytorch) offers a wide variety of segmentation models with the option to use pre-trained weights.
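For example, a minimal sketch of grabbing a pre-trained model through SMP (the encoder choice, channel count, and class count here are placeholder assumptions, not from the thread):

```python
import segmentation_models_pytorch as smp

# U-Net with an ImageNet-pretrained ResNet encoder; only the decoder
# starts from scratch, so far fewer labeled images are needed.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=1,   # grayscale x-rays
    classes=1,       # binary mask
)
```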
jaxolingo OP t1_je7clzh wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Nice!! I'll look into it
jaxolingo OP t1_je7c63z wrote
Reply to comment by Cantareus in [D] The best way to train an LLM on company data by jaxolingo
Cool, this is exactly the approach I had in mind for the model to take
currentscurrents t1_je7c29r wrote
The architecture probably isn't the problem. You only have 100 images; that's your problem.
If you can't get more labeled data, you should pretrain on unlabeled data that's as close as possible to your task (preferably other dental x-rays). Then you can finetune on your real dataset.
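A minimal sketch of that pretrain-then-finetune idea in PyTorch. The autoencoder objective and the tiny conv stacks are illustrative assumptions, as are the `unlabeled_loader`/`labeled_loader` DataLoaders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                      # shared backbone
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                      # reconstruction head
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
)

# 1) Pretrain on unlabeled x-rays with a reconstruction loss.
autoencoder = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for images in unlabeled_loader:               # placeholder DataLoader
    loss = F.mse_loss(autoencoder(images), images)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Reuse the pretrained encoder for segmentation and finetune on the
#    small labeled set (a lower learning rate is a common choice here).
seg_model = nn.Sequential(
    encoder,
    nn.Conv2d(64, 1, 1),                      # per-pixel logits
    nn.Upsample(scale_factor=4),              # back to input resolution
)
opt = torch.optim.Adam(seg_model.parameters(), lr=1e-4)
for images, masks in labeled_loader:          # placeholder DataLoader
    loss = F.binary_cross_entropy_with_logits(seg_model(images), masks)
    opt.zero_grad(); loss.backward(); opt.step()
```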
jaxolingo OP t1_je7b3lf wrote
Reply to comment by 13ass13ass in [D] The best way to train an LLM on company data by jaxolingo
Cool, I'll check it out, thanks!
I've been learning LangChain but haven't gotten to sqlchain just yet
Business-Lead2679 OP t1_je7aefg wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
Just like Alpaca. Even the JSON format is the same as the one released by Stanford, just with different inputs & outputs
Business-Lead2679 OP t1_je794o8 wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
I tried vast.ai, which didn't work. I'm a newbie, so maybe I'm doing something wrong
Business-Lead2679 OP t1_je792jz wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
Finetuning
WarProfessional3278 t1_je790g7 wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
By training do you mean finetuning with LoRA, or from the ground up like Alpaca? Realistically you could just rent an 8xA100 and spend 4 or 5 hours to get it done
lostmsu t1_je78jfg wrote
Reply to comment by WindForce02 in [D] Prediction time! Lets update those Bayesian priors! How long until human-level AGI? by LanchestersLaw
You are missing the idea entirely. I am sticking to the original Turing test to determine whether AI is already human-level or not.
The original Turing test is dead simple and can be applied to ChatGPT easily.
The only other thing in my comment is that "human-level" is vague, as intelligence differs from human to human, which allows for goalpost moving like in your comment. IQ is the best measure of intelligence we have, so it is reasonable to turn the Turing test into a family of tests Turing(I): each is a regular Turing test, except that the IQ of the humans participating (both the machine's opponent and the judge who must decide which one is the machine) is <= I.
My claim is that ChatGPT, or ChatGPT plus some trivial memory enhancement (like feeding previous failures back into prompts), quite possibly can already pass Turing(70).
WokeAssBaller t1_je783go wrote
Reply to comment by lambertb in [D] GPT4 and coding problems by enryu42
Fair enough, then give them problems to solve and measure their output. This feels like "90% of dentists claim Crest improves your dental health."
I'll take an independent study into consideration, but today I find it more of a novelty
CatalyzeX_code_bot t1_je774q1 wrote
Reply to [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Found relevant code at https://github.com/CerebrasResearch/Sparse-IFT
LetGoAndBeReal t1_je71r0g wrote
Reply to comment by machineko in [D] The best way to train an LLM on company data by jaxolingo
Fine-tuning can be great for getting better output from the model based on the knowledge the model already contains. I only meant that fine-tuning is not viable for getting new data/knowledge into a model. Fine-tuning does not accomplish knowledge absorption.
deep-yearning t1_je710wy wrote
What accuracy (Dice?) are you getting? 100 training images is pretty small. Have you tried cross-validation?
Business-Lead2679 OP t1_je70nka wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
I'd like to train it with these settings:
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024
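For what it's worth, a sketch of where those settings would plug into a Hugging Face Trainer setup (the output dir and batch sizes below are placeholders, not from the thread):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-65b-finetuned",   # placeholder
    num_train_epochs=3,                 # EPOCHS = 3
    learning_rate=2e-5,                 # LEARNING_RATE = 2e-5
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
# CUTOFF_LEN = 1024 applies at tokenization time, e.g.:
# tokenizer(text, truncation=True, max_length=1024)
```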
machineko t1_je70llx wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Why would you say that fine-tuning is not viable? There are many production use cases of fine-tuning a model using in-house proprietary data.
In fact, if you have the resources, you can both fine-tune an existing model (whether supervised or unsupervised) and also use it for retrieval-augmented generation.
bgighjigftuik t1_je6yfrg wrote
Associative memory on steroids. That's my bet on LLMs' "understanding"
modernzen t1_je6xujz wrote
Reply to comment by rshah4 in [D] FOMO on the rapid pace of LLMs by 00001746
Totally agree with this. Something like ChatGPT is overkill for most use cases and comes at a cost of both money (using the API) and latency. Clever prompting and fine-tuning can let you build free, fast models that are tailored towards your specific problem at hand.
EarthquakeBass t1_je6wa0g wrote
I think Azure might actually have support for private OpenAI stuff
Cantareus t1_je6vn79 wrote
Reply to comment by Im_Unlucky in [D] The best way to train an LLM on company data by jaxolingo
>shove data into vector db
>
>embed incoming query
>
>return topk matches from vector db
Can these steps be done through the OpenAI API? (For the vector-db steps, see the sketch after the example below.)
You only reply using JSON. Write 5 queries that will return useful data to aid you in answering the user's question "What was Apple's return compared to its sector last month". Return a JSON array with no explanations. Each query must run independently. Use the return format [{"Reason":"Why the query is useful","query":"The sql query"}]. The schema is:
(Imagine a valid schema is here)
GPT-4 Response:
[
{
"Reason": "Retrieve Apple's stock return last month",
"query": "SELECT stock_return FROM stocks WHERE company = 'Apple' AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE();"
},
{
"Reason": "Retrieve sector for Apple",
"query": "SELECT sector FROM companies WHERE company_name = 'Apple';"
},
{
"Reason": "Retrieve companies in Apple's sector",
"query": "SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple');"
},
{
"Reason": "Calculate average return for Apple's sector last month",
"query": "SELECT AVG(stock_return) as sector_return FROM stocks WHERE company IN (SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple')) AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE();"
},
{
"Reason": "Compare Apple's return to the average return of its sector last month",
"query": "SELECT (SELECT stock_return FROM stocks WHERE company = 'Apple' AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE()) - (SELECT AVG(stock_return) as sector_return FROM stocks WHERE company IN (SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple')) AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE()) as comparison;"
}
]
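And for the quoted vector-db steps themselves, a minimal sketch using the (2023-era) openai Python client; the in-memory "db", the placeholder documents, and the embedding model name are illustrative assumptions:

```python
import numpy as np
import openai

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# 1) Shove data into a (toy, in-memory) vector db.
docs = [
    "Apple monthly return: ...",              # placeholder documents
    "Technology sector monthly return: ...",
]
db = [(doc, embed(doc)) for doc in docs]

# 2) Embed the incoming query.
q = embed("What was Apple's return compared to its sector last month?")

# 3) Return top-k matches by cosine similarity.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

topk = sorted(db, key=lambda item: cosine(q, item[1]), reverse=True)[:2]
```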
Im_Unlucky t1_je6vjqd wrote
Reply to comment by visarga in [D] The best way to train an LLM on company data by jaxolingo
This is true. Good prompt engineering has definitely gone a long way for me in improving RAG performance, though. I don't think shoving snippets into the prompt is the long-term solution, but at this point in time it's the best option in my experience.
MadDanWithABox t1_je6vh9x wrote
Just a caveat: any training (fine-tuning, really) that you do on an LLM is NOT guaranteed to give you correct answers. The answers to your questions will seem plausible, and could be correct, but you'll need some system to verify them before you take action on them.
kromem t1_je6uv46 wrote
Reply to comment by visarga in [D] The best way to train an LLM on company data by jaxolingo
"Moar layers" doesn't only need to apply to the NN.
CoT prompting works by breaking analysis down into smaller steps that each generate their own additional context.
Doing something similar with DB analysis is absolutely possible, such as preemptively summarizing schema and using that summary as part of the retrieval to contextualize the specific fragments.
Additionally, having static example analyses on hand for related tables, fed into the prompt to go from zero-shot to few-shot, would go a long way toward reducing some of the issues you highlight.
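A rough sketch of that schema-summary idea, again with the 2023-era openai client (the prompts, model name, and schema here are illustrative assumptions):

```python
import openai

def chat(prompt):
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

schema = "CREATE TABLE stocks (company TEXT, date DATE, stock_return REAL);"

# Pre-compute a natural-language summary of the schema once...
summary = chat(f"Summarize this schema, noting keys and join paths:\n{schema}")

# ...then feed it back in with each question, along with cached example
# analyses for related tables, turning zero-shot prompts into few-shot ones.
answer = chat(
    f"{summary}\n\nExample analyses for related tables:\n...\n\n"
    "Question: What was Apple's return compared to its sector last month? "
    "Write the SQL."
)
```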
xander76 OP t1_je6ups2 wrote
Reply to comment by Intelligent_Rough_21 in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Ha, as a programmer, I too am doomed to laziness. :)
gmork_13 t1_je7dho0 wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
For more stable compute, check out Google Cloud GPUs.
Consider training a quantized model with LoRA. If you know enough, the model could perhaps be split between VRAM and DDR RAM so it can train on a smaller GPU.
edit: here, I found one: https://github.com/tloen/alpaca-lora
I think you could get this done for far less than your budget if need be.
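For the record, a rough sketch of the quantized + LoRA setup that alpaca-lora popularized, via peft and bitsandbytes-enabled transformers (the checkpoint path and hyperparameters are assumptions):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-65b",        # placeholder checkpoint
    load_in_8bit=True,          # quantize weights to fit in less VRAM
    device_map="auto",          # spill layers to CPU RAM if the GPU is small
)
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)  # only the small adapters get gradients
```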