Recent comments in /f/MachineLearning
LetGoAndBeReal t1_je7m1tq wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Take a closer look at every script/blog/video related to fine-tuning a model and you will see it doesn’t involve adding new knowledge to the model. If you find an exception I’d be delighted to see it.
NotDoingResearch2 t1_je7lqmi wrote
Reply to comment by a_marklar in [D] 3d model generation by konstantin_lozev
How easy is it to convert point clouds or SDFs to triangle-based meshes with usable topology?
netham91 t1_je7jpai wrote
Reply to comment by athos45678 in [D] The best way to train an LLM on company data by jaxolingo
Can you share more of the steps involved, along with some relevant links? Thanks.
netham91 t1_je7jejn wrote
Reply to comment by zeoNoeN in [D] The best way to train an LLM on company data by jaxolingo
Can you share more details or point me in the right direction?
light24bulbs t1_je7ilvq wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
I disagree that it's not viable to train a model. The problem is that the best public model (LLaMA) is non-commercial.
That said, it's extremely possible to train things into it. There are a ton of new scripts floating around online. The LoRA training is especially good.
The trouble with vectors is that they are so limited. They're fine if you need to look up one distinct thing (and the vector gets the match right), but they're utterly useless if you'd like the model to learn about something in general.
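To make that concrete, here's roughly what a LoRA fine-tune looks like with HuggingFace's peft library. Treat it as a minimal sketch: the checkpoint id, data file, and hyperparameters are placeholder assumptions, not taken from any particular script.

```python
# Minimal LoRA fine-tuning sketch with transformers + peft.
# Checkpoint id, data file, and hyperparameters are placeholder assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "decapoda-research/llama-7b-hf"  # hypothetical checkpoint id
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Only the small low-rank adapter matrices get trained; the base
# weights stay frozen, which is what keeps this cheap.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
))

data = load_dataset("text", data_files="company_docs.txt")["train"]
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments("lora-out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```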
bgighjigftuik t1_je7if31 wrote
Reply to [D] With ML tools progressing so fast, what are some ways you've taken advantage of them personally? by RedditLovingSun
I told GPT-4 to write the code for a small 4-wheeled robot to act as a Roomba-like device. It wrote the MicroPython code for doing so (I did not even know that project existed). Bought the board (I had been using an Arduino), re-hooked everything together, and got it to work as expected on the second try. It even created what I believe is a kind of memory module for long-term storage of my dorm's shape, so the robot has memorized and optimized its cleaning routes on its own.
Not bad for 3 mins of prompting
gmork_13 t1_je7hec6 wrote
Reply to comment by russell616 in [D] Simple Questions Thread by AutoModerator
What are you interested in?
I'd recommend covering some classification and generation using images and text, with several different models and data sets.
gmork_13 t1_je7h3rc wrote
Reply to comment by Various_Ad7388 in [D] Simple Questions Thread by AutoModerator
Having started with TF and moved to torch myself, I found torch just easier to work with when doing something a bit out of the ordinary. Since then it has gained in popularity, and with popularity come lots of walkthroughs, documentation, video guides, and research papers with GitHub repos.
Adventurous-Mouse849 t1_je7gyqe wrote
Reply to comment by BreakingCiphers in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
And also data augmentation: rotation, cropping, zooming. This is essential when data is scarce, as it usually is in medical imaging.
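As a rough illustration (the ranges here are arbitrary assumptions; tune them to your modality), a torchvision pipeline for this might look like:

```python
# Example augmentation pipeline with torchvision; the ranges are
# arbitrary assumptions and should be tuned to the imaging modality.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomRotation(degrees=15),                     # rotation
    T.RandomResizedCrop(size=256, scale=(0.8, 1.0)),  # cropping + zooming
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])
```

One caveat for segmentation: the same geometric transform has to hit the image and the mask together, so in practice you'd use paired/functional transforms (e.g., a library like albumentations) rather than independent random ones.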
gmork_13 t1_je7gvde wrote
Reply to comment by Various_Ad7388 in [D] Simple Questions Thread by AutoModerator
Definitely start with torch. It works all the way up; just start building more complex things.
t_minus_1 t1_je7gsc7 wrote
Please look at Sketch and the langchain pandas/SQL plugins. I have seen excellent results with both of these approaches. Both will require you to send metadata to OpenAI.
If you would like to do it yourself, maybe start with GPT-J / LoRA and use the same instruction fine-tuning approach that Databricks did.
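For reference, the langchain pandas route looks roughly like this. It's a sketch only: the import path has moved around between langchain versions, and the CSV name is made up.

```python
# Sketch of the langchain pandas route; the import path has moved
# between langchain versions, and the CSV name is made up.
import pandas as pd
from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import OpenAI

df = pd.read_csv("company_data.csv")  # hypothetical company table
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)

# Note: the agent sends schema/metadata (and possibly sampled rows) to OpenAI.
agent.run("What were the top five accounts by revenue last quarter?")
```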
gmork_13 t1_je7gluu wrote
Reply to comment by ReasonablyBadass in [D] Simple Questions Thread by AutoModerator
And not using RNNs haha
gmork_13 t1_je7fmm8 wrote
Reply to comment by james_mclellan in [D] Simple Questions Thread by AutoModerator
I'm assuming you don't mean missing values in your dataset.
-
You can create 'missing' data, but if you create it out of the data you already give to the model, you're sort of doing the work for it. For compute-efficiency reasons you might want to avoid giving it 'unnecessary' data. What counts as unnecessary can be hard to define; think about what you want the model to grasp in the first place.
-
I'm not sure what you mean by performing a test. If you were to train a language model, the context of the word would define its meaning. You can always take the output probs of a model and do something with that if you'd like (for instance, if the probability mass is spread across lots of low-probability alternatives, do something).
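For example, a tiny sketch of that last idea, flagging next-token predictions where no single alternative is confident (model choice and threshold are arbitrary):

```python
# Sketch: flag next-token predictions where the probability mass is
# spread thin across many alternatives. Model and threshold are arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    next_logits = model(**inputs).logits[0, -1]  # logits for the next token

probs = torch.softmax(next_logits, dim=-1)
top_p, top_id = probs.max(dim=-1)
if top_p.item() < 0.3:  # lots of low-probability alternatives
    print("model is uncertain here")
else:
    print("confident next token:", tokenizer.decode([top_id.item()]))
```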
currentscurrents OP t1_je7faup wrote
Reply to comment by throwaway957280 in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
This seems to be a delay of the publishing process; it went up on arXiv in October but is getting attention now because it was finally published March 21st.
I think the most interesting change since October is that GPT-4 is much better at many of the tricky sentences that linguists used to probe GPT-3. But it's still hard to prove the difference between "understanding" and "memorization" if you don't know what was in the training data, and we don't.
athos45678 t1_je7ercw wrote
Train a LLaMA LoRA model. The 30B model isn't too expensive to tune (40 bucks or so), and is ridiculously capable.
You just need to format the data in a long text doc with each example separated by two line breaks. I found it worked best in the alpaca style, where you have a single line break after the prompt, like “write a function that sorts this table in python def sort():” followed by the written-out code, and then the double line break to signal the start of the next input.
Then use the simple-llama trainer app to make it all easy.
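To make that concrete, here's a sketch of the formatting step; the (prompt, response) pairs and the file name are made up for illustration:

```python
# Sketch: write (prompt, response) pairs into one long text file,
# alpaca-style: a single line break after each prompt, and a blank
# line (two breaks) between examples. Pairs and path are made up.
pairs = [
    ("write a function that sorts this table in python",
     "def sort(table):\n    return sorted(table)"),
    # ... more (prompt, response) pairs
]

with open("train.txt", "w") as f:
    for prompt, response in pairs:
        f.write(f"{prompt}\n{response}\n\n")  # \n\n signals the next input
```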
gmork_13 t1_je7eo4s wrote
Reply to comment by Nobodyet94 in [D] Simple Questions Thread by AutoModerator
Does it have to be a transformer?
Have a look at this model: https://paperswithcode.com/method/deit. It's difficult to answer your question without knowing the compute you have access to, though.
Browse that site for some alternatives.
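If you end up trying DeiT, timm makes loading a pretrained one trivial; the variant name below is just one example, so pick based on your compute budget:

```python
# Example: load a pretrained DeiT through timm. The variant name is
# one of several available; pick based on your compute budget.
import timm

model = timm.create_model("deit_tiny_patch16_224", pretrained=True)
```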
planetofthemapes15 t1_je7efmz wrote
Reply to comment by throwaway957280 in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
This should basically disqualify it IMO; thanks for bringing up that point.
Edit: There are other suggestions that GPT-4 has abstract understanding. This paper is based on data collected before the release of GPT-4 or even GPT-3.5 (October '22). For those drive-by downvoting my comment: explain why this paper is valuable in the face of contrary evidence such as https://arxiv.org/abs/2303.12712, which is actually based on the bleeding-edge technology that has generated all the recent interest in LLMs.
throwaway957280 t1_je7du40 wrote
Worth noting this paper predates ChatGPT (3.5) by a few months.
BreakingCiphers t1_je7dlg5 wrote
While testing models and playing with hyperparams can be fun, the real problem is that you are trying to apply deep learning to 100 images.
Get more images.
OkWrongdoer4091 t1_je7dj86 wrote
Reply to comment by Firm-Act-3860 in [D] ICML 2023 Reviewer-Author Discussion by zy415
Great news!
light24bulbs t1_je7mr9p wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
False, they all do. The process of fine-tuning is identical to the initial pre-training, though perhaps with different settings. The scripts are mostly set up to take Q&A data for getting LLaMA to take instructions better, but actually that's just text wrapped in some context and passed in straight up.
I was very confused by this as well, but no, you can train new stuff.
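To illustrate: with HuggingFace transformers, "fine-tuning" on raw text is just the pretraining objective again, next-token cross-entropy, with nothing Q&A-specific anywhere. A minimal sketch (gpt2 is a stand-in model, chosen for size):

```python
# Sketch: 'fine-tuning' on raw text is the pretraining objective again.
# Passing labels=input_ids gives the standard next-token cross-entropy
# loss; nothing here is Q&A-specific. gpt2 is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

text = "Any raw text with the new knowledge, no template required."
batch = tokenizer(text, return_tensors="pt")

loss = model(**batch, labels=batch["input_ids"]).loss  # causal-LM loss
loss.backward()
optimizer.step()  # one step of what is effectively continued pretraining
```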