Recent comments in /f/MachineLearning

Sure_Cicada_4459 t1_je5w1bh wrote

Spin-off project based on Reflexion; apparently GPT-4 gets a 20% improvement on coding tasks: https://github.com/GammaTauAI/reflexion-human-eval

People are fine-tuning Llama using this prompt structure with much better results: https://twitter.com/Orwelian84/status/1639859947948363777?s=20

Someone already built an autonomous agent using feedback loops (not necessarily related to Reflexion): https://twitter.com/yoheinakajima/status/1640934493489070080

Seems to yield performance improvements only up to a certain point, obviously, but it's also a very basic prompt structure overall; one can imagine all kinds of "cognitive structures".
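A minimal sketch of what such a feedback loop can look like, assuming hypothetical `llm` and `run_tests` helpers (neither comes from the linked repos):

```python
def llm(prompt: str) -> str:
    """Hypothetical helper: call whatever model/API you use, return its text."""
    raise NotImplementedError

def run_tests(code: str) -> tuple[bool, str]:
    """Hypothetical helper: run the task's unit tests, return (passed, error_log)."""
    raise NotImplementedError

def reflexion_loop(task: str, max_iters: int = 3) -> str:
    attempt = llm(f"Write code for this task:\n{task}")
    for _ in range(max_iters):
        passed, log = run_tests(attempt)
        if passed:
            break
        # Ask the model to critique its own failure, then fold that
        # reflection back into the next attempt.
        reflection = llm(
            f"Task: {task}\nAttempt:\n{attempt}\n"
            f"It failed with:\n{log}\nIn a few sentences, what went wrong?"
        )
        attempt = llm(
            f"Task: {task}\nPrevious attempt:\n{attempt}\n"
            f"Reflection:\n{reflection}\nWrite an improved solution."
        )
    return attempt
```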

8

patniemeyer t1_je5v9m7 wrote

Yes, in fact OpenAI offers an API for this right now: https://platform.openai.com/docs/guides/fine-tuning

It *appears* from the terminology they are using that they are actually performing training on top of their model with your data (which you supply in JSON). They talk about learning rate and epochs, etc. as params; however, I have not seen real documentation of what they are doing.
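For reference, a minimal sketch of what submitting a job looks like with the pre-1.0 `openai` Python client (the file name is made up; `n_epochs` and `learning_rate_multiplier` are the kind of params they expose):

```python
import openai  # pre-1.0 client

# Training data is JSONL, one example per line, e.g.:
# {"prompt": "Q: ...\n\nA:", "completion": " ..."}
f = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = openai.FineTune.create(
    training_file=f["id"],
    model="davinci",
    n_epochs=4,                    # the "epochs" param they mention
    learning_rate_multiplier=0.1,  # scales their base learning rate
)
print(job["id"])
```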

2

ortegaalfredo t1_je5urre wrote

I run a Discord with all the models. Currently only 30B and 65B, because nobody uses the smaller LLMs.

Even if superficially they can both answer questions, on complex topics 65B is much better than 30B, and 7B doesn't even compare.

11

SkinnyJoshPeck t1_je5ue3b wrote

I'm not 100% sure what your infrastructure or background is, but generally you can just transform data to whatever data format works best for the model.

So, you would build a pipeline that goes

Snowflake -> Some ETL process -> Transformed Data Storage -> Model Training -> Model Saving -> Model Loading for API to ask questions

where that "Some ETL process" step transforms your data into whatever format the model needs, and your model trains from that.

For example, on AWS you might have something like

Redshift/RDS/Whatever -> SageMaker -> Output Model to S3 -> API for your model or something idk

or if it's all going to be on-prem and you won't have Cloud tech, you'd do something like

Snowflake/Azure/Any Data Source -> Airflow for running training -> Model Upload to Some Folder -> API in a docker container in Kubernetes or something for users to hit

or they can just download the model locally and use some script to ask it questions; I'm not 100% sure, it all depends on the model/language/etc. that you use.
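A rough sketch of that on-prem flavor as an Airflow DAG (Airflow 2.x; the task bodies are hypothetical placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task bodies -- swap in your real extract/train/upload code.
def extract_from_source():
    ...

def train_model():
    ...

def upload_model():
    ...

with DAG(
    dag_id="model_training_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    train = PythonOperator(task_id="train", python_callable=train_model)
    upload = PythonOperator(task_id="upload", python_callable=upload_model)

    # Data source -> training -> model upload, same shape as the arrows above.
    extract >> train >> upload
```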

This is a fairly complicated task; if your company is getting serious about this, y'all should hire someone who is an ML engineer to do this task. :)

32

james_mclellan t1_je5ru4r wrote

Two questions:

(1) Does anyone create missing data when constructing models? Examples: searching for stronger relationships between the data set and first and second derivatives of time series data, comparisons to the same day of week over the last N periods, the same holiday over the last N periods; examining distance to an urban center for geodata (see the sketch after question 2).

(2) Does anyone use a model that falls back on functions when a match is not 100%? For example, "apple" may mean fruit, music, machines, music companies or machine companies -- instead of a number from 0 to 1 for the probable meaning, does anyone use models where the code "performs a test" to better disambiguate?
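For concreteness on (1), a minimal pandas sketch of those derived features (toy data, made-up column names):

```python
import numpy as np
import pandas as pd

# Toy daily time series.
df = pd.DataFrame(
    {"value": np.arange(100, dtype=float)},
    index=pd.date_range("2023-01-01", periods=100, freq="D"),
)

# First and second derivatives of the series.
df["d1"] = df["value"].diff()
df["d2"] = df["d1"].diff()

# Same day of week, 1 and 2 weeks back.
df["same_dow_1w"] = df["value"].shift(7)
df["same_dow_2w"] = df["value"].shift(14)
```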

2

Technical-Vast1314 OP t1_je5oqm5 wrote

OK, panoptic segmentation means doing two kinds of segmentation tasks together: semantic segmentation and instance segmentation. Semantic segmentation can only segment categories like "sky", "car", and "person"; it's hard for it to separate individual instances. Instance segmentation is like object detection, which means it predicts a box with a mask for each instance~
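For intuition, a toy sketch of how the two outputs merge into one panoptic map (made-up arrays, not the output of any particular model):

```python
import numpy as np

H, W = 4, 6
semantic = np.zeros((H, W), dtype=np.int64)  # per-pixel class ids, e.g. 0 = "sky"
person = np.zeros((H, W), dtype=bool)        # one mask from the instance branch
person[1:3, 2:5] = True

# Panoptic map: semantic labels for "stuff", a unique id per "thing" instance.
panoptic = semantic.copy()
panoptic[person] = 1001                      # instance 1 of the "person" class
print(panoptic)
```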

6

mike94025 t1_je5nrdi wrote

You’re looking in the wrong place. What you’re looking at is the BT gen 1 fastpath, not the BT gen 2 custom kernels.

You need to look at F.multi_head_attention_forward().

For now, the fastpath still services inference, pending a full rewrite of activation.py that will hopefully land in a future release. (There’s always a tension between refactoring and introducing new features under a time- and staffing-constrained problem formulation.)
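For orientation (not official docs), the call chain on a stock module looks roughly like this; shapes are made up:

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
mha.eval()

x = torch.randn(2, 16, 512)  # (batch, seq, embed)
with torch.inference_mode():
    # nn.MultiheadAttention.forward (activation.py) either takes the gen 1
    # fastpath or falls through to F.multi_head_attention_forward -- the
    # latter is where to look for the gen 2 kernels.
    out, _ = mha(x, x, x, need_weights=False)
```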

1

mike94025 t1_je5mfa8 wrote

This doesn't force it. It says that flash is enabled, and so are the others. To force it, you have to disable all other kernels. Then it’s flash or bust.

You can find more in our blog, which got published today, and in the SDPA tutorial. Both are linked here: https://www.linkedin.com/posts/michael-gschwind-3704222_pytorch-activity-7046773418288955393-gOSh

PS: the context manager can be used anywhere outside the call as well, including around the call to model.forward.
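Concretely, something like this (PyTorch 2.0-era API; shapes and dtype are made up) disables the other kernels so it's flash or an error:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Disable the math and mem-efficient kernels; now it's flash or bust.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
```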

2