Recent comments in /f/MachineLearning
jd_3d t1_je7xkwq wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
Enough VRAM is key. With all the tricks (LoRA, int8, bitsandbytes) you'll need at least 120GB of VRAM. A full fine-tune would take even more. I'd go with 4 or 8x A100 80GB machines since it won't necessarily be more expensive (training will be highly parallel). See here for more info: https://www.storminthecastle.com/posts/alpaca_13B/
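Roughly the kind of setup that implies (8-bit weights via bitsandbytes, LoRA adapters via PEFT, layers sharded across GPUs with device_map). The checkpoint name and LoRA hyperparameters below are illustrative assumptions, not a tested recipe:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "huggyllama/llama-65b"  # assumption: any HF-format 65B checkpoint you have access to

tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # bitsandbytes int8 weights
    device_map="auto",   # shard layers across every visible GPU
)
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the 65B weights get gradients
```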
dreaming_geometry t1_je7wweh wrote
Reply to [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I've been thinking about trying something like this. Everything is moving so fast now in ML; I feel like nearly every new idea I have gets published before I even find the time to get started.
antonivs t1_je7ws1v wrote
Reply to comment by SlowThePath in [D] FOMO on the rapid pace of LLMs by 00001746
I was referring to what the OpenAI GPT models are trained on. For GPT-3, that involved about 45 TB of text data, part of which was Common Crawl, a multi-petabyte corpus obtained from 8 years of web crawling.
On top of that, 16% of its corpus was books, totaling about 67 billion tokens.
nuke-from-orbit t1_je7wkvr wrote
Reply to comment by light24bulbs in [P] Using ChatGPT plugins with LLaMA by balthierwings
That is exactly what is happening now.
wrossmorrow t1_je7vy2p wrote
Reply to comment by ustainbolt in [D] Training a 65b LLaMA model by Business-Lead2679
+1 for lambda labs
pengo t1_je7vr2t wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Yes, it can think critically, it just doesn't tell you whether it is or isn't at any one time.
dreaming_geometry t1_je7vmov wrote
Reply to comment by Business-Lead2679 in [D] Training a 65b LLaMA model by Business-Lead2679
If you're having trouble with Vast.ai, you can ask for help on the discord. Sounds like your desired use case is a good fit.
BrotherAmazing t1_je7vj9v wrote
People saying get more than 100 images are right (all else being equal, yes, get more images!), but you can likely make good progress on your problem with fewer images by using clever augmentation and a smaller network.
Here’s why:
- You only have to detect cavities. It's not some 1,000-class semantic segmentation problem.
- You should be working with single-channel grayscale images, not images whose pixels naturally come in 3-channel RGB color.
- This is X-ray data just of teeth, so you don't have nearly the amount of complex fine-detailed textures and patterns (with colors) that are exhibited in more general RGB optical datasets of all sorts of objects and environments.
Of course for a real operational system that you will use in commercial products you will need to get far more than 100 images. However, for a simple research problem or prototype demo, you can likely show good results and feasibility (without overfitting, yes) on your dataset with a smaller net and clever augmentation.
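As a rough illustration of the "smaller network" point, something on this scale can be a reasonable starting point for single-channel X-ray crops (layer sizes here are invented, not tuned):

```python
import torch.nn as nn

class TinyCavityNet(nn.Module):
    """Deliberately small classifier for single-channel (grayscale) X-ray crops."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 input channel: grayscale
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, 1, H, W)
        return self.classifier(self.features(x).flatten(1))
```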
Justice43 t1_je7vbe7 wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
I recommend looking into Lambda Cloud VMs. They're much cheaper than AWS, and their largest machine (8x A100, 80GB VRAM for each A100) should be enough to finetune the 65b LLaMA model.
light24bulbs t1_je7sn5e wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
The fact that fine-tuning can improve instruction following is EXACTLY that. There's no distinction between predicting the next word, following instructions, or deep knowledge. They are all the same thing as far as an LLM is concerned.
xander76 OP t1_je7rymw wrote
Reply to comment by bluenigma in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Great question! We spent a lot of time experimenting with how to cue GPT into returning the right JSON type, and it’s pretty darned compliant. GPT-3 doesn’t always do a good job adhering to strict JSON syntax, but we wrote an extremely lenient parser that understands the weird things that GPT-3 gets wrong. (Sometimes it uses single quotes instead of double, sometimes it puts new lines in strings, sometimes it decides a semicolon is a better choice than a comma!). GPT-4 and GPT-3.5 are significantly better at JSON syntax.
On the question of returning the actual type you asked for, we do a runtime type check to make sure it's right. So if you get a value back, you can be sure it's the type you wanted.
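Not how the library itself is implemented, just a minimal Python sketch of the strict-parse, lenient-fallback, runtime-type-check idea (the expected schema here is an invented example):

```python
import ast
import json
import jsonschema  # pip install jsonschema

EXPECTED = {"type": "array", "items": {"type": "string"}}  # invented return type: list of strings

def parse_llm_json(raw: str, schema: dict = EXPECTED):
    """Try strict JSON first, fall back to a more forgiving parse, then type-check."""
    try:
        value = json.loads(raw)  # well-formed JSON
    except json.JSONDecodeError:
        # Single quotes, trailing commas, etc. often still parse as Python literals.
        value = ast.literal_eval(raw.strip())
    jsonschema.validate(instance=value, schema=schema)  # runtime type check
    return value

print(parse_llm_json("['opening scene', 'the reveal']"))  # tolerates the single quotes
```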
DonutListen2Me t1_je7rr2p wrote
You don't want to input tabular data to an LLM. You should use a model built specifically for tabular data, like XGBoost.
What you want to do is embed your company's documents and use vector search over the embeddings to retrieve context to feed to an LLM.
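A bare-bones sketch of that embed-and-retrieve pattern (the embedding model, documents, and in-memory search are placeholders; a real setup would use a proper vector database):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

docs = [
    "Q1 revenue grew 12%, driven mostly by the APAC region.",
    "Employees accrue 20 vacation days per year.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # cosine similarity (vectors are unit length)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("How many vacation days do we get?")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How many vacation days do we get?"
# `prompt` then goes to whatever LLM you're using.
```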
LetGoAndBeReal t1_je7re7y wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Of course the fine-tuning data itself can have knowledge not in the model - that doesn’t prove anything.
What you need to show is that knowledge presumably added during fine-tuning was then retrieved from the model after fine-tuning.
bluenigma t1_je7r1ci wrote
Reply to comment by reditum in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Doesn't this also have pretty high risk of not actually adhering to the declared return type?
sdmat t1_je7pvwm wrote
Reply to comment by currentscurrents in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
We can test with things that are highly unlikely to be in the training data.
Model: GPT-4
How can Mars colonists use jeans to improve habitat survivability?
Mars colonists can use jeans to improve habitat survivability by repurposing them in several creative and resourceful ways. Here are some ideas on how jeans can be utilized to enhance the living conditions on Mars:
Insulation: Jeans can be cut into pieces and used as insulation material for habitats, providing an additional layer of thermal protection against the harsh Martian environment.
Air and Water Filtration: Denim fibers can be used to create simple air and water filters, helping to remove dust particles and impurities from both air and water resources.
Gardening and Agriculture: Jeans can be shredded and used as a component of a growth medium for plants, providing a source of organic material that can help retain moisture and support plant growth in Martian greenhouses.
Patching Material: The durable fabric of jeans can be used to patch holes or reinforce weak areas in habitat structures or protective suits.
Textile-Based Electronics: By weaving conductive threads into the denim fabric, jeans can be transformed into flexible, wearable electronic devices or sensors. These could be employed for monitoring vital signs, environmental conditions, or providing communication capabilities.
Emergency Tourniquets: In case of injuries, jeans can be cut into strips and used as improvised tourniquets to control bleeding.
Storage Solutions: Jeans can be fashioned into storage bags or pouches for organizing tools, equipment, and other essential items within the habitat.
Crafting and Construction: Denim fibers can be combined with other materials to create a variety of items, such as ropes, mats, or even structural components for the habitat.
While these suggestions highlight the potential for repurposing jeans in a Mars colony, it's essential to note that relying solely on jeans for habitat survivability would not be sufficient. However, creative reuse of available resources can undoubtedly contribute to the overall sustainability and resilience of a Martian habitat.
light24bulbs t1_je7pnxa wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
This IS training. That's what it is. This is how "knowledge" got into the model in the first place.
ustainbolt t1_je7plqi wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
For a 65b model you are probably going to have to parallelise the model parameters. See this link. As for training, it would be best to use a VM (any provider will work; Lambda and Vast.ai are cheap). I would recommend a 4x (or 8x) A100 machine. I'm sure you can find more information about all of this.
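A back-of-the-envelope calculation of why a single GPU won't cut it (rounded numbers, ignoring activations and gradients):

```python
params = 65e9                          # 65B parameters

fp16_weights_gb = params * 2 / 1e9     # ~130 GB just to hold the fp16 weights
int8_weights_gb = params * 1 / 1e9     # ~65 GB with bitsandbytes int8
adam_states_gb  = params * 8 / 1e9     # ~520 GB for Adam's two fp32 moment buffers (full fine-tune)

print(fp16_weights_gb, int8_weights_gb, adam_states_gb)
# Even the int8 weights barely fit on a single 80 GB A100 once activations are added,
# so the parameters have to be sharded across several GPUs.
```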
LetGoAndBeReal t1_je7p0l8 wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
In what way does this show that new knowledge was added to a large language model?
a_marklar t1_je7ogfc wrote
Reply to comment by NotDoingResearch2 in [D] 3d model generation by konstantin_lozev
Depends on what exactly "usable topology" means to you, but my first stop is always CGAL.
light24bulbs t1_je7ob17 wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Okay, here's my friend turning the alpaca instructions into training data
https://github.com/lxe/llama-peft-tuner/blob/main/convert_alpaca_to_text.py
See how it's just turning it into a fat string?
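Not the exact script at that link, but the gist is roughly this (the template is the standard Alpaca prompt; field names follow the Alpaca JSON format):

```python
import json

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

def alpaca_to_text(path: str) -> list[str]:
    """Flatten Alpaca-style {instruction, input, output} records into plain training strings."""
    with open(path) as f:
        records = json.load(f)
    texts = []
    for rec in records:
        instruction = rec["instruction"]
        if rec.get("input"):  # optional extra context field
            instruction += "\n\n" + rec["input"]
        texts.append(TEMPLATE.format(instruction=instruction, output=rec["output"]))
    return texts  # the "fat strings" the model is actually trained on
```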
itsyourboiirow t1_je7n7p8 wrote
Others have mentioned it, but do data augmentation: crop, resize, rotate, etc. You'll be able to increase the effective size of your dataset and improve results.
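For example, a typical torchvision pipeline (the specific transforms and parameters are arbitrary starting points, not a recommendation):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop + resize
    transforms.RandomRotation(degrees=10),                 # small rotations
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # lighting variation
    transforms.ToTensor(),
])
# Apply `augment` to each PIL image when loading training batches.
```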
LetGoAndBeReal t1_je7n1gc wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Instead of seeing who can talk more loudly about who's right, why don't you post a link to a script that does this?
gstatton t1_je7mwmv wrote
Reply to comment by jaxolingo in [D] The best way to train an LLM on company data by jaxolingo
I've done something similar to this. I created a view in Snowflake, dumped the output of 'describe table <table-name>' to CSV, used that CSV descriptor at the head of the prompt, and asked the LLM to generate a SQL query from the user's question using the provided table schema. I take the model's return and pass it directly to Snowflake, get the result back as a data frame, and then send that back to the LLM to get a text summary of the query results.
You can also use LangChain to set up the chain to do this automatically..
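Roughly that flow as a sketch (ask_llm and run_snowflake below are placeholders for whatever chat-model client and Snowflake connector you actually use, and the schema CSV is invented):

```python
SCHEMA_CSV = """name,type
ORDER_ID,NUMBER
ORDER_DATE,DATE
TOTAL_USD,FLOAT
"""  # output of `describe table orders`, dumped to CSV

def ask_llm(prompt: str) -> str:
    ...  # placeholder: call your chat model here

def run_snowflake(sql: str):
    ...  # placeholder: execute the query and return a data frame

def answer_question(question: str) -> str:
    sql = ask_llm(
        "Given this Snowflake table schema (CSV from `describe table orders`):\n"
        f"{SCHEMA_CSV}\n"
        f"Write a single SQL query that answers: {question}\nReturn only the SQL."
    )
    rows = run_snowflake(sql)  # pass the model's query straight to Snowflake
    return ask_llm(
        f"Question: {question}\nQuery results:\n{rows}\n"
        "Summarize the answer in plain English."
    )
```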
SlowThePath t1_je7xmaz wrote
Reply to comment by antonivs in [D] FOMO on the rapid pace of LLMs by 00001746
Definitely not denying that it was trained on a massive amount of data, because it was, but calling it internet-sized is not accurate. I guess you were speaking in hyperbole and I just didn't read it that way. I know what you mean.