Recent comments in /f/MachineLearning
Sure_Cicada_4459 t1_je5w1bh wrote
Reply to [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
Spin-off project based on reflection; apparently GPT-4 gets a 20% improvement on coding tasks: https://github.com/GammaTauAI/reflexion-human-eval
People are finetuning LLaMA using this prompt structure with much better results: https://twitter.com/Orwelian84/status/1639859947948363777?s=20
Someone already built an autonomous agent using feedback loops (not necessarily related to Reflexion): https://twitter.com/yoheinakajima/status/1640934493489070080
Seems to yield performance improvements up to a certain point, obviously, but it's also a very basic prompt structure overall; one can imagine all kinds of "cognitive structures".
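A minimal sketch of what one such feedback loop could look like (the model name, prompts, and round count here are illustrative, not taken from the linked projects):
```
import openai

def ask(prompt):
    # Thin wrapper around the chat completions endpoint
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

def answer_with_reflection(question, rounds=3):
    answer = ask(question)
    for _ in range(rounds):
        # Feed the model its own answer and ask it to critique and revise it
        answer = ask(
            f"Question: {question}\n\nYour previous answer:\n{answer}\n\n"
            "Critique the answer above, then give an improved answer."
        )
    return answer
```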
Zealousideal-Ice9957 t1_je5vo1c wrote
Reply to comment by EvilMegaDroid in [D] FOMO on the rapid pace of LLMs by 00001746
You should have a look at the OpenAssistant initiative by LAION; their human-assisted data collection process is said to be of very good quality compared to the underpaid crowdworker-based one used by OpenAI.
patniemeyer t1_je5v9m7 wrote
Reply to comment by 3Street in [D] Simple Questions Thread by AutoModerator
Yes, in fact OpenAI offers an API for this right now: https://platform.openai.com/docs/guides/fine-tuning
It *appears* from the terminology they are using that they are actually performing training on top of their model with your data (which you supply as JSON). They talk about learning rate and epochs, etc. as params; however, I have not seen real documentation of what they are doing.
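Roughly, the workflow looks like this (file contents and hyperparameter values are illustrative; the linked docs are the source of truth):
```
import openai

# data.jsonl: one {"prompt": ..., "completion": ...} pair per line
upload = openai.File.create(file=open("data.jsonl", "rb"), purpose="fine-tune")

job = openai.FineTune.create(
    training_file=upload["id"],
    model="davinci",               # base model to fine-tune
    n_epochs=4,                    # the "epochs" param they expose
    learning_rate_multiplier=0.1,  # the "learning rate" param they expose
)
```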
ortegaalfredo t1_je5urre wrote
Reply to [D] llama 7b vs 65b ? by deck4242
I run a discord with all models. Currently only 30B and 65B because nobody uses the smaller LLMs.
Even if, superficially, they can both answer questions, on complex topics 65B is much better than 30B, so 7B doesn't even compare.
SkinnyJoshPeck t1_je5ue3b wrote
I'm not 100% sure what your infrastructure or background is, but generally you can just transform data to whatever data format works best for the model.
So, you would build a pipeline that goes
Snowflake -> Some ETL process -> Transformed Data Storage -> Model Training -> Model Saving -> Model Loading for API to ask questions
where that Some ETL process is a process that transforms your data to whatever the model needs, and your model trains from that.
For example, on AWS you might have something like
Redshift/RDS/Whatever -> SageMaker -> Output Model to S3 -> API for your model or something idk
or if it's all going to be on-prem and you won't have Cloud tech, you'd do something like
Snowflake/Azure/Any Data Source -> Airflow for running training -> Model Upload to Some Folder -> API in a docker container in Kubernetes or something for users to hit
or they can just download the model locally and use some script to ask it questions; I'm not 100% sure, it all depends on the model/language/etc. that you use.
This is a fairly complicated task; if your company is getting serious about this, y'all should hire someone who is an ML engineer to do this task. :)
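As a very rough sketch of the middle of such a pipeline (connector, table, and model choices are all placeholders):
```
import pandas as pd
import snowflake.connector
from sklearn.linear_model import LogisticRegression
import joblib

# ETL step: pull raw data out of the warehouse and shape it for training
conn = snowflake.connector.connect(account="...", user="...", password="...")
df = pd.read_sql("SELECT * FROM training_examples", conn)
X, y = df.drop(columns=["label"]), df["label"]

# Train and persist the model; the saved artifact is what the API layer loads
model = LogisticRegression().fit(X, y)
joblib.dump(model, "model.joblib")  # e.g. upload this file to S3 / a shared folder
```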
fmindme t1_je5ty5k wrote
Reply to [D] Alternatives to fb Hydra? by alyflex
I'm also using OmegaConf. It's a great lib: full of features, not opinionated, perfect for MLOps!
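A tiny example of the kind of thing it makes painless (file name and keys are made up):
```
from omegaconf import OmegaConf

# config.yaml (made-up contents):
#   model: {name: resnet50, lr: 0.001}
#   paths: {out: /tmp/${model.name}}   # variable interpolation
cfg = OmegaConf.load("config.yaml")
cfg = OmegaConf.merge(cfg, OmegaConf.from_cli())  # CLI overrides, e.g. model.lr=0.01

print(cfg.model.lr)   # attribute-style access
print(cfg.paths.out)  # -> /tmp/resnet50, interpolation resolved on access
```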
tripple13 t1_je5seed wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
james_mclellan t1_je5ru4r wrote
Reply to [D] Simple Questions Thread by AutoModerator
Two questions :
(1) Does anyone create missing data when constructing models? Examples: searching for stronger relationships between the data set and first and second derivatives of time series data, comparisons to the same day of week over the last N periods, or the same holiday over the last N periods; examining distance to an urban center for geodata (see the pandas sketch after question (2)).
(2) Does anyone use a model that falls back on functions when a match is not 100%? For example, "apple" may mean fruit, music, machines, music companies or machine companies -- instead of a number 0 to 1 of the probable meaning, does anyone use models where the code "performs a test" to better disambiguate?
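For concreteness, the kind of derived features meant in (1), sketched with pandas (column names and frequencies are made up):
```
import numpy as np
import pandas as pd

# Made-up daily series, just to illustrate the derived features
idx = pd.date_range("2023-01-01", periods=60, freq="D")
df = pd.DataFrame({"value": np.random.rand(60)}, index=idx)

df["d1"] = df["value"].diff()                       # first derivative
df["d2"] = df["d1"].diff()                          # second derivative
df["same_dow_last_week"] = df["value"].shift(7)     # same day of week, one period back
df["same_dow_4_weeks_ago"] = df["value"].shift(28)  # same day of week, N periods back
```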
Alternative_Staff431 t1_je5r9j7 wrote
Reply to comment by ghostfaceschiller in [D] FOMO on the rapid pace of LLMs by 00001746
I thought so too, but I actually genuinely appreciate what he says. His POV is valuable, and his recent posts aren't really that bad.
ntaylor- t1_je5qtl2 wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Nope. It's the same as all neural networks using the transformer architecture: just a big old series of matrix multiplications with some non-linear transformations, at the end of the day.
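A toy illustration of that point (single head, no layer norm, shapes simplified):
```
import torch

x = torch.randn(8, 512)                       # 8 tokens, d_model = 512
W_q, W_k, W_v = (torch.randn(512, 512) for _ in range(3))
W_ff1, W_ff2 = torch.randn(512, 2048), torch.randn(2048, 512)

# Attention is matrix multiplications plus a softmax...
q, k, v = x @ W_q, x @ W_k, x @ W_v
attn = torch.softmax(q @ k.T / 512 ** 0.5, dim=-1) @ v
# ...and the feed-forward block is matrix multiplications plus a pointwise non-linearity.
out = torch.relu((x + attn) @ W_ff1) @ W_ff2
```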
Dependent_Ad5120 t1_je5qfmp wrote
Reply to comment by oathbreakerkeeper in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I don't have a github repo for this, but it is pretty simple:
```
import torch, torch.nn as nn
from torch.backends.cuda import sdp_kernel

model = nn.Transformer().cuda().half()
src = torch.rand(10, 32, 512).cuda().half()  # (seq, batch, d_model)
tgt = torch.rand(20, 32, 512).cuda().half()

# disable the math and mem-efficient kernels so only flash attention can be picked
with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    output = model(src, tgt)
```
Something like that should be enough.
Beautiful-Gur-9456 OP t1_je5p8bu wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
was that a thing? lmao 🤣
Technical-Vast1314 OP t1_je5oqm5 wrote
Reply to comment by Peantoo in [R] You Only Segment Once: Towards Real-Time Panoptic Segmentation [CVPR 2023] by Technical-Vast1314
OK, panoptic segmentation means doing two kinds of segmentation tasks together: semantic segmentation and instance segmentation. Semantic segmentation can only segment categories like "sky", "car", "person", but it can't separate individual instances. Instance segmentation is like object detection, which means it predicts a box with a mask for each instance~
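One way to picture the combined output: each pixel gets both a class label and an instance id (toy numpy example, ids made up):
```
import numpy as np

# Classes: 0 = sky ("stuff"), 1 = car ("thing"), 2 = person ("thing")
class_map = np.array([[0, 0, 2, 2],
                      [1, 1, 0, 2]])
# "Stuff" pixels share instance id 0; each "thing" gets its own id,
# so the two people are kept apart even though they share a class.
instance_map = np.array([[0, 0, 1, 2],
                         [3, 3, 0, 2]])
```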
mike94025 t1_je5ojaw wrote
Reply to comment by tripple13 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
It is. Follow the call tree into F.multi_head_attention_forward
mike94025 t1_je5oe88 wrote
Technical-Vast1314 OP t1_je5ob6n wrote
Reply to comment by xnalonali in [R] You Only Segment Once: Towards Real-Time Panoptic Segmentation [CVPR 2023] by Technical-Vast1314
We use the MIT License in the YOSO project~
mike94025 t1_je5nrdi wrote
Reply to comment by oathbreakerkeeper in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
You’re looking in the wrong place. What you’re looking at is the BT gen 1 fastpath, not the BT gen 2 custom kernels.
You need to look at F.multi_head_attention_forward().
For now, the fastpath still services inference until a full rewrite of activation.py, which will hopefully happen in a future release. (There’s always a tension between refactoring and introducing new features under a time- and staffing-constrained problem formulation.)
13ass13ass t1_je5nc8b wrote
You could look at the natural language -> SQL query tools that are all the rage right now. I’d recommend checking out LangChain's SQL chain/agent since it’s open source.
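Rough idea of how that looks (class and import names are from LangChain around this time and may have moved; the connection string is a placeholder):
```
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

# Point the chain at an existing database; it writes and runs the SQL for you
db = SQLDatabase.from_uri("postgresql://user:pass@host/dbname")
llm = OpenAI(temperature=0)
chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

chain.run("How many customers signed up last month?")
```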
mike94025 t1_je5mfa8 wrote
Reply to comment by Competitive-Rub-1958 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
This doesn't force it. It says that flash is enabled, and so are the others. To force it, you have to disable all other kernels. Then it’s flash or bust.
You can find more in our blog which got published today and the SDPA tutorial. Both are linked here https://www.linkedin.com/posts/michael-gschwind-3704222_pytorch-activity-7046773418288955393-gOSh
PS: the context manager can be used anywhere outside the call as well, including around the call to model.forward.
RedditLovingSun t1_je5m80p wrote
Reply to [D] llama 7b vs 65b ? by deck4242
I guess it would be interesting to see if the performance difference gets wider or narrower after self-instruct optimizations like alpaca
jaxolingo OP t1_je5m4b1 wrote
Reply to comment by ianitic in [D] The best way to train an LLM on company data by jaxolingo
Oh that looks amazing thanks!
bttoddx t1_je5m3g6 wrote
Reply to comment by master-leaf in [D] The best way to train an LLM on company data by jaxolingo
Adding to this, it looks like a survey paper was released earlier this month that details a number of methods for the task OP is looking for. The bibliography is a great resource.
patniemeyer t1_je5wc4u wrote
Reply to [D] The best way to train an LLM on company data by jaxolingo
This may not be what you want, but I was not aware until recently that OpenAI offers an API to fine tune GPT-3/4 on your own data: https://platform.openai.com/docs/guides/fine-tuning
They charge you for training and for usage of your custom model, so it may or may not be economical for your use case.