Recent comments in /f/MachineLearning

light24bulbs t1_je7mr9p wrote

False, they all do. The process of fine-tuning is identical to the initial pre-training, though perhaps with different settings. Most of the scripts are set up to take Q&A data so that LLaMA follows instructions better, but under the hood that's just text wrapped in some context and passed in straight up.

I was very confused by this as well, but no, you can train new stuff into it.
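
Roughly what those fine-tuning scripts do under the hood, as a toy sketch (the template wording here is made up; every script uses its own variant):

    # Instruction data is just text wrapped in a template, then trained on
    # with the same next-token-prediction objective as pre-training.
    def to_training_text(instruction: str, response: str) -> str:
        # Hypothetical alpaca-style template; real scripts vary the wording.
        return (
            "Below is an instruction. Write a response.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Response:\n{response}"
        )

    example = to_training_text("Summarize this paragraph.", "It says ...")
    # `example` is plain text -- the model consumes it exactly like any
    # other document from pre-training.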

−1

light24bulbs t1_je7ilvq wrote

I disagree that it's not viable to train a model. The problem is that the best public model (LLaMA) is non-commercial.

That said, it's entirely possible to train things into it. There are a ton of new scripts floating around online, and the LoRA training in particular is very good.

The trouble with vectors is that they're so limited. They're fine if you need to look up one distinct thing (and the vector search gets the match right), but they're utterly useless if you'd like the model to learn about something in general.
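
To make the limitation concrete, here's a toy sketch of what a vector lookup does (random vectors standing in for real embeddings):

    import numpy as np

    # Vector lookup: fine for fetching the one best-matching chunk,
    # useless for teaching the model a topic in general.
    docs = ["invoice #123 was paid in March", "the API rate limit is 60 req/min"]
    doc_vecs = np.random.rand(len(docs), 384)  # stand-ins for real embeddings

    def lookup(query_vec: np.ndarray) -> str:
        # Cosine similarity against every stored chunk...
        sims = doc_vecs @ query_vec / (
            np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
        )
        # ...and only the single best match ever comes back.
        return docs[int(np.argmax(sims))]

    print(lookup(np.random.rand(384)))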

9

bgighjigftuik t1_je7if31 wrote

I told GPT-4 to write the code for a small 4-wheeled robot to act as a Roomba-like device. It wrote the MicroPython code for doing so (I didn't know that project existed). I bought the board (I had been using an Arduino), re-hooked everything together, and got it to work as expected on the second try. It even created what I believe is a kind of memory module for long-term storage of my dorm's shape, so the robot has memorized and optimized its cleaning routes on its own.

Not bad for 3 mins of prompting

2

gmork_13 t1_je7h3rc wrote

Having started with TF and moved to torch myself, I found torch just easier to work with when doing something a bit out of the ordinary. Since then it has gained in popularity, and with popularity come lots of walkthroughs, documentation, video guides, and research papers with GitHub repos.

1

t_minus_1 t1_je7gsc7 wrote

Please look at sketch and the langchain pandas/SQL plugins. I have seen excellent results with both of these approaches. Note that both of them require you to send metadata to OpenAI.
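
For the langchain pandas route, something like this (a sketch against the langchain API as of this writing; it changes fast, so check the docs):

    import pandas as pd
    from langchain.llms import OpenAI
    from langchain.agents import create_pandas_dataframe_agent

    df = pd.read_csv("sales.csv")  # hypothetical data file

    # The agent sends the dataframe's schema and a few sample rows
    # (the metadata mentioned above) to OpenAI, which writes and runs
    # pandas code to answer the question.
    agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
    agent.run("Which region had the highest total sales?")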

If you would like to do it yourself, maybe start with GPT-J / LoRA and use the same instruction fine-tuning approach that Databricks did.
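
If you go that route, the HF PEFT library is the usual entry point; a rough sketch (the target module names assume GPT-J's attention layout):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    # LoRA: freeze the base weights and train small low-rank adapters instead.
    config = LoraConfig(
        r=8, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # GPT-J attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% trainable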

9

gmork_13 t1_je7fmm8 wrote

I'm assuming you don't mean missing values in your dataset.

  1. You can create 'missing' data, but if you create it out of the data you already give to the model, you're sort of doing the work for it. For compute-efficiency reasons you might want to avoid giving it 'unnecessary' data; what counts as unnecessary can be hard to define. Think about what you want the model to grasp in the first place.

  2. I'm not sure what you mean by performing a test. If you were to train a language model, the context of a word would define its meaning. You can always take the output probabilities of a model and do something with them if you'd like (for instance, flag a prediction when the probability mass is spread over lots of low-probability alternatives); see the sketch below.
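
As a toy example of acting on those probabilities (the threshold and shapes here are made up):

    import torch

    def low_confidence(logits: torch.Tensor, threshold: float = 0.5) -> bool:
        # logits: (vocab_size,) scores for one predicted token
        probs = torch.softmax(logits, dim=-1)
        # A flat distribution (lots of low-probability alternatives)
        # means no single token dominates -- flag it.
        return probs.max().item() < threshold

    logits = torch.randn(50_000)      # stand-in for real model output
    print(low_confidence(logits))     # True: random logits are near-uniform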

1

currentscurrents OP t1_je7faup wrote

This seems to be a delay of the publishing process: it went up on arXiv in October but is getting attention now because it was finally published on March 21st.

I think the most interesting change since October is that GPT-4 is much better at many of the tricky sentences that linguists used to probe GPT-3. But it's still hard to prove the difference between "understanding" and "memorization" if you don't know what was in the training data, and we don't.

17

athos45678 t1_je7ercw wrote

Train a LLaMA LoRA model. The 30B model isn't too expensive to tune (40 bucks or so), and it's ridiculously capable.

You just need to format the data in one long text doc, with each example separated by two line breaks. I found it worked best in the alpaca style, where you have a single line break after the prompt, like “write a function that sorts this table in python” followed by the written-out code (“def sort(): ...”), and then the double line break to signal the start of the next input.
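
A sketch of that formatting (the example pairs are invented; the important part is the single vs. double line breaks):

    # One newline between prompt and completion, a blank line
    # (two newlines) between consecutive examples.
    examples = [
        ("write a function that sorts this table in python",
         "def sort(table):\n    return sorted(table)"),
        ("explain LoRA in one sentence",
         "LoRA fine-tunes a model by training small low-rank adapters."),
    ]

    with open("train.txt", "w") as f:
        f.write("\n\n".join(f"{p}\n{c}" for p, c in examples))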

Then use the simple-llama trainer app to make it all easy.

3

planetofthemapes15 t1_je7efmz wrote

This should basically disqualify it IMO; thanks for bringing up that point.

Edit: There are other suggestions that GPT-4 has abstract understanding. This paper is based on data collected before the release of GPT-4 or even GPT-3.5 (October 2022). To those drive-by downvoting my comment: explain why this paper is valuable in the face of contrary evidence such as https://arxiv.org/abs/2303.12712, which is actually based on the bleeding-edge technology that has generated all the recent interest in LLMs.

−11