Recent comments in /f/MachineLearning
Rawvik t1_je6f9w8 wrote
I am also currently looking to do something like this for my own company data. Please let me know if you find something useful.
zeoNoeN t1_je6f3b5 wrote
I had a lot of success with implementing huggingface models in the last week, so that could be a starting point
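For example, the transformers pipeline API gets you surprisingly far. A minimal sketch (the task and input here are just placeholders, swap in whatever fits your data):

```python
# Minimal sketch using the transformers pipeline API; the task and
# default model are placeholders - pick whatever fits your use case.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default model
print(classifier("The onboarding flow felt much smoother this release."))
```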
NoRip7374 t1_je6e6rd wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
At least some good news!
_Arsenie_Boca_ t1_je6ayl7 wrote
Reply to comment by RicketyCricket in [D] Alternatives to fb Hydra? by alyflex
Thanks, looks like your library isn't far behind hydra in terms of functionality. Will definitely look into it more closely the next time I set up a project.
What would you say are the pros and cons of hydra versus spock?
2blazen t1_je6axjp wrote
Reply to comment by gigglegenius in [D] With ML tools progressing so fast, what are some ways you've taken advantage of them personally? by RedditLovingSun
>- Creative brainstorming for professional work
I struggle with this. I was trying to get it to help me come up with interesting thesis research questions in a very specific audio ML field, but it failed to come up with anything original, and I don't know if there's a certain way I should have phrased my questions or if it's just a creative limitation
visarga t1_je6a6w8 wrote
Reply to comment by harharveryfunny in [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
> its own output is its only working memory
All the fantastic feats LLMs can do are thanks to context conditioning.
2blazen t1_je69rl1 wrote
I read about this tool on this sub and it looks like what you're looking for: https://lm-code-binder.github.io/
only_short t1_je68peh wrote
Reply to comment by jaxolingo in [D] The best way to train an LLM on company data by jaxolingo
You can just call the GPT-4 API?
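Something like this, with the pre-1.0 openai Python package (a minimal sketch; assumes you have GPT-4 API access and OPENAI_API_KEY set):

```python
# Minimal sketch with the pre-1.0 openai package; assumes GPT-4 API
# access and OPENAI_API_KEY in the environment.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize last quarter's sales."}],
)
print(response["choices"][0]["message"]["content"])
```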
phb07jm t1_je676x4 wrote
Reply to comment by SkinnyJoshPeck in [D] The best way to train an LLM on company data by jaxolingo
Also you might want more than just one ML engineer! 🤣
memberjan6 t1_je65xrg wrote
Go watch YouTube videos on pinecone.ai and Milvus. Also, go watch the Office 365 Copilot video.
LetGoAndBeReal t1_je65ffo wrote
The comments here so far have addressed three possible approaches to this. Two of those approaches - i.e. training your own model and fine-tuning an existing model - are not currently viable. Training your own model would require a ridiculous amount of human and compute power and would not result in something where data could be easily added. Fine-tuning a model does not result in the model absorbing new data - it only conditions the output patterns from the model using data/knowledge the model gained during initial training.
The only viable approach is to use retrieval augmented generation, where data relating to user questions is retrieved from outside the model and fed to the model as part of the prompt. Tools like LangChain can help you build a RAG solution on your own. There are also many services coming out that provide this sort of capability, such as humata.ai.
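Roughly, the retrieval step looks like this. A minimal sketch using sentence-transformers for embeddings (the model name, documents, and question are just placeholder examples; LangChain wraps the same idea):

```python
# Minimal RAG sketch: embed documents, retrieve the best match for a
# question, and splice it into the prompt. Model name and data are
# placeholder examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Q1 revenue was $2M.", "We employ 40 people.", "HQ is in Berlin."]
doc_emb = model.encode(docs, convert_to_tensor=True)

question = "How much revenue did we make in Q1?"
q_emb = model.encode(question, convert_to_tensor=True)

best = int(util.cos_sim(q_emb, doc_emb).argmax())
prompt = f"Answer using this context:\n{docs[best]}\n\nQuestion: {question}"
print(prompt)  # feed this prompt to the LLM of your choice
```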
[deleted] t1_je654io wrote
currentscurrents OP t1_je631oa wrote
TL;DR:

- This is a survey paper. The authors summarize a variety of arguments about whether or not LLMs truly "understand" what they're learning.
- The major argument in favor of understanding is that LLMs are able to complete many real and useful tasks that seem to require understanding.
- The major argument against understanding is that LLMs are brittle in non-human ways, especially to small changes in their inputs. They also don't have real-world experience to ground their knowledge in (although multimodal LLMs may change this).
- A key issue is that no one has a solid definition of "understanding" in the first place. It's not clear how you would test for it. Tests intended for humans don't necessarily test understanding in LLMs.
I tend to agree with their closing summary. LLMs likely have a type of understanding, and humans have a different type of understanding.
>It could thus be argued that in recent years the field of AI has created machines with new modes of understanding, most likely new species in a larger zoo of related concepts, that will continue to be enriched as we make progress in our pursuit of the elusive nature of intelligence.
evergreensphere t1_je613n6 wrote
Reply to comment by patniemeyer in [D] The best way to train an LLM on company data by jaxolingo
Fine tuning is only available for the old GPT3. It is not available for GPT3.5 or GPT4.
Also, most people I've talked to found that fine tuning did not work as well as using things like vectorized search, or vectorized search combined with a graph index.
EverythingGoodWas t1_je612lg wrote
You aren’t going to train an LLM on company data. You could fine-tune an existing one with company data, but creating an LLM from scratch is an absolutely massive compute task. If you are trying to make a closed-domain question answering system that uses your company’s data, you basically need to create a full pipeline: parsing, searching, and finally pushing the context and question to a language model.
abnormal_human t1_je60s31 wrote
Yes, it's totally possible to train an LLM to understand tabular data. It's a very general purpose architecture. With enough resources it is well suited to a wide range of problems, and yes, Azure/Snowflake can do everything you need (at some price, assuming you know what to do with them).
You need to make a decision about whether you want to bake the info into the LLM, or whether you want to teach the LLM to find the answers and then format them for humans.
This will depend on your use case, budget, team size, competencies, data set size, and time-to-market requirements. Baking the info into the LLM is a lot harder than teaching it to find the answers, potentially 100x-1000x harder and more expensive, and without people with experience doing it, you will waste a lot of time/energy getting there.
DigThatData t1_je600q2 wrote
Reply to comment by alyflex in [D] Alternatives to fb Hydra? by alyflex
i misunderstood, i thought you were looking for an alternative config component. if you're looking for an alternative for managing hyperparameter search jobs, consider https://docs.ray.io/en/latest/tune/index.html . I think hydra actually might even integrate with ray.
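something like this for a basic grid search (minimal sketch; the objective and config values are placeholders, API as of older Ray releases):

```python
# Minimal Ray Tune sketch; the objective is a stand-in for a real
# training loop and the config values are placeholders.
from ray import tune

def objective(config):
    loss = (config["lr"] - 0.01) ** 2  # pretend training result
    tune.report(loss=loss)

analysis = tune.run(
    objective,
    config={"lr": tune.grid_search([0.001, 0.01, 0.1])},
)
print(analysis.get_best_config(metric="loss", mode="min"))
```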
deck4242 OP t1_je5zprk wrote
Reply to comment by ortegaalfredo in [D] llama 7b vs 65b ? by deck4242
Any chance I could get access to your discord to try out the 65b?
AdamEgrate t1_je5yxd9 wrote
Reply to comment by master-leaf in [D] The best way to train an LLM on company data by jaxolingo
Tapas is from 2020, which in this field makes it ancient.
alyflex OP t1_je5yia4 wrote
Reply to comment by DigThatData in [D] Alternatives to fb Hydra? by alyflex
That is certainly an option I was considering, but then I would have to make my own job planner / multirunner. (I've actually already done that for my current project, but this whole refactoring was an attempt to move away from my own custom functions and use more standardized methods.)
alyflex OP t1_je5y5gh wrote
Reply to comment by RicketyCricket in [D] Alternatives to fb Hydra? by alyflex
> https://github.com/fidelity/spock
This looks quite promising, and I like the Post hooks you linked below, but I don't see any way of running a series of experiments in a non-combinatoric way. There is the Optuna API (though I can't tell whether early pruning is supported), but I don't see any way of grouping parameters for a set of experiments.
OscarYouDotCom t1_je5xu0s wrote
I'd look into langchain
patniemeyer t1_je5wxiv wrote
Reply to comment by 3Street in [D] Simple Questions Thread by AutoModerator
The pricing page lists GPT-4. I think it was just added in the past day or two. (I have not confirmed that you can actually access it though)
EDIT: When I query the list of models through their API, I still do not see GPT-4, so maybe it's not actually available yet... or maybe I'm querying the wrong thing.
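For reference, a minimal sketch of the model-list query (pre-1.0 openai package; needs OPENAI_API_KEY in the environment):

```python
# Minimal sketch of listing available models with the pre-1.0 openai
# package; requires OPENAI_API_KEY in the environment.
import openai

models = openai.Model.list()
print(sorted(m["id"] for m in models["data"]))
```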
3Street t1_je5wk7v wrote
Reply to comment by patniemeyer in [D] Simple Questions Thread by AutoModerator
Interesting, thank you! The link only seems to mention gpt 3, though? I wonder if/when they'll offer it for gpt4
arnowaczynski t1_je6flvk wrote
Reply to [D] Alternatives to fb Hydra? by alyflex
dataclasses from the python standard library + dacite.from_dict
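A minimal sketch of that combo (the Config schema and raw dict here are made-up examples):

```python
# Minimal sketch of dataclasses + dacite.from_dict; the Config schema
# and raw dict are hypothetical examples.
from dataclasses import dataclass

import dacite  # pip install dacite

@dataclass
class Optimizer:
    lr: float
    name: str = "adam"

@dataclass
class Config:
    optimizer: Optimizer
    seed: int = 42

raw = {"optimizer": {"lr": 0.001}, "seed": 7}  # e.g. parsed from YAML/JSON
config = dacite.from_dict(data_class=Config, data=raw)
print(config.optimizer.lr)  # 0.001
```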