Recent comments in /f/MachineLearning
RicketyCricket t1_je5ky7l wrote
Reply to comment by DigThatData in [D] Alternatives to fb Hydra? by alyflex
As the developer of Spock (posted in another comment) -- OmegaConf is also an awesome choice and super useful. I'd suggest checking it out too!
You can go even closer to the metal and use the attrs library as well (https://www.attrs.org/en/stable/).
Extreme_Photo t1_je5kuh4 wrote
Reply to [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
Can you give an example of a use of reflection showing the prompt and the response?
RicketyCricket t1_je5kgy4 wrote
Reply to comment by RicketyCricket in [D] Alternatives to fb Hydra? by alyflex
second favorite:
https://fidelity.github.io/spock/advanced_features/Post-Hooks
Basically it lets you run any validation necessary on your configs. Spock provides some basics (greater than, within bounds, etc.), but it's ultimately up to the user, via any simple asserts or validation functions they want to write.
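The idea, sketched in plain Python (Spock's actual decorator API differs — see the linked docs; the config fields here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class OptimizerConfig:
    lr: float
    warmup_steps: int

def post_hook(cfg: OptimizerConfig) -> None:
    # runs after the config object is built, like a post-hook would:
    # simple asserts covering the "greater than" / "within bounds" cases
    assert cfg.lr > 0, "lr must be positive"
    assert 0 <= cfg.warmup_steps <= 10_000, "warmup_steps out of bounds"

cfg = OptimizerConfig(lr=3e-4, warmup_steps=500)
post_hook(cfg)
```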
DigThatData t1_je5kfoc wrote
Reply to [D] Alternatives to fb Hydra? by alyflex
go closer to the metal and use omegaconf directly.
RicketyCricket t1_je5jchr wrote
Reply to comment by RicketyCricket in [D] Alternatives to fb Hydra? by alyflex
This being my favorite hidden one:
RicketyCricket t1_je5j2n9 wrote
Reply to comment by _Arsenie_Boca_ in [D] Alternatives to fb Hydra? by alyflex
Most of the cool stuff is buried in the docs under advanced features :-)
https://fidelity.github.io/spock/advanced_features/Composition
(full transparency I'm the author/maintainer/core-developer. I know the docs need a re-org to surface more of the useful features)
ianitic t1_je5j1jo wrote
You might like something like this as you use azure: https://azure.microsoft.com/en-us/products/bot-services/
mr_house7 t1_je5iuk0 wrote
Reply to comment by obolli in [N] OpenAI may have benchmarked GPT-4’s coding ability on its own training data by Balance-
Microsoft is the one in charge now.
Peantoo t1_je5iks3 wrote
Reply to [R] You Only Segment Once: Towards Real-Time Panoptic Segmentation [CVPR 2023] by Technical-Vast1314
I understand semantic segmentation but it's been a while since I've tinkered in the computer vision space. Can you explain the push for panoptic segmentation and why you think it's a valuable technology? What can it accomplish that really good semantic segmentation can't?
master-leaf t1_je5hhu6 wrote
Reply to comment by jaxolingo in [D] The best way to train an LLM on company data by jaxolingo
I would check the paper, but I think they fine-tune a pre-trained language model. They also created their own encodings to account for the structure of tabular data, such as the column headers, entity rows, etc.
I will note, though, that from what I remember the table sizes were pretty small.
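The simplest version of encoding a table for a text model is just linearizing it — this is a generic sketch, not the encoding the paper actually uses:

```python
def linearize_table(headers: list[str], rows: list[list[str]]) -> str:
    """Flatten a table (column headers + entity rows) into model-readable text."""
    lines = [" | ".join(headers)]
    for row in rows:
        lines.append(" | ".join(f"{h}: {v}" for h, v in zip(headers, row)))
    return "\n".join(lines)

text = linearize_table(
    ["company", "revenue"],
    [["Acme", "1.2M"], ["Globex", "900K"]],
)
print(text)
```

Approaches like TAPAS go further, adding dedicated row/column embeddings rather than relying on a flat string, which is why table size matters.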
xnalonali t1_je5goxk wrote
Reply to [R] You Only Segment Once: Towards Real-Time Panoptic Segmentation [CVPR 2023] by Technical-Vast1314
What license will you use for this implementation?
thedamian t1_je5eweg wrote
Reply to comment by thomasahle in [D] Simple Questions Thread by AutoModerator
Before answering the question, I would submit that you should be thinking of keeping your models behind an API. There's no need to have the model sitting on the client side (which is probably why it feels like you're asking the question).
And behind an API, the model can be as big as you'd like (or can afford) on your server.
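The behind-an-API setup can be as small as this — a hypothetical Flask endpoint with a stand-in for the model call:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(text: str) -> float:
    # stand-in for loading and calling a real model on the server;
    # the client never sees the weights, only this endpoint
    return float(len(text))

@app.route("/predict", methods=["POST"])
def predict_route():
    payload = request.get_json()
    return jsonify({"score": predict(payload["text"])})
```

The client just POSTs JSON and gets a score back; the model itself can be swapped or scaled without touching client code.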
jaxolingo OP t1_je5eunu wrote
Reply to comment by master-leaf in [D] The best way to train an LLM on company data by jaxolingo
From Hugging Face?
jaxolingo OP t1_je5eovq wrote
Reply to comment by TheDeviousPanda in [D] The best way to train an LLM on company data by jaxolingo
The end goal would be to add it into the product's current chat in the web app, so I can't be doing that :)
master-leaf t1_je5dtrm wrote
There was a paper I read a few months ago (I think it was called TAPAS). In this paper they show how to ingest tabular data into a transformer model.
TheDeviousPanda t1_je5ddm3 wrote
It’s going to be a lot easier to just take something like GPT-4 and feed in your data directly and ask questions.
_Arsenie_Boca_ t1_je5d04j wrote
Reply to comment by RicketyCricket in [D] Alternatives to fb Hydra? by alyflex
Looks interesting, and a bit more lightweight than Hydra. But it also misses a lot of cool features, like composing multiple YAML configs.
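For reference, the composition feature in question is Hydra's defaults list, which assembles a run config from several group files (group and option names here are made up):

```yaml
# config.yaml: Hydra merges the chosen file from each config group at run time
defaults:
  - model: resnet      # loads model/resnet.yaml
  - optimizer: adam    # loads optimizer/adam.yaml
  - _self_             # values in this file override the groups above
```

Each group can then be swapped from the command line, e.g. `optimizer=sgd`.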
OldManSaluki t1_je5cmjw wrote
Reply to comment by keepthepace in [D] Do model weights have the same license as the model architecture? by murphwalker
I'm leaning in this direction myself.
IANAL, but I think about the Feist Publications ruling, which dealt with raw listings of facts (white-pages names, addresses, and phone numbers organized in the most functional format: alphabetical). SCOTUS ruled that the raw data was not copyrightable even though it took a lot of effort to collect and compile it. It seems to me that the raw data here are the weights, which would make them not copyrightable. The structural design of the model might be, and more than likely the compiled model with weights would be copyrightable.
I suspect this will work its way through the courts just in time to be rendered moot.
geekfolk t1_je59x39 wrote
Reply to comment by Beautiful-Gur-9456 in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
>I think it's worth a shot to replace LPIPS loss and adversarially train it as a discriminator
that would be very similar to this: https://openreview.net/forum?id=HZf7UbpWHuA
WarmSignificance1 t1_je58y3c wrote
Reply to comment by _sbmaruf in [N] OpenAI may have benchmarked GPT-4’s coding ability on its own training data by Balance-
Looks interesting. Have you tried any of the GPT models against this benchmark?
AmbitiousTour t1_je582qj wrote
Reply to comment by [deleted] in [D] I've got a Job offer but I'm scared by [deleted]
I don't know.
Firm-Act-3860 t1_je581li wrote
Reply to comment by OkWrongdoer4091 in [D] ICML 2023 Reviewer-Author Discussion by zy415
One of my reviewers increased their score by 1 just today and updated their review by editing it. So don't lose hope yet :)
impossiblefork t1_je5l8k9 wrote
Reply to [D] Do model weights have the same license as the model architecture? by murphwalker
How would either an architecture or a model be copyrightable?
Architectures are algorithms. Unless they are patentable and, in addition to that, actually patented, they have no protection.
Model weights are a result of a mechanical procedure that fits a model to data, minimising some kind of error. That is not a work of human authorship.
Things that could be copyrightable are an article describing a model architecture, or a specific software implementation of a model.
As an argument for why model weights are unlikely to be copyrightable, consider the following parallel: we know that model output, for example a story generated by ChatGPT from a prompt, is certainly not copyrightable, since it's not a work of human authorship. But then, how is the model itself? We can view the selection of training examples as something similar to a prompt, and the training process as similar to the inference. Giving copyright protection to model weights might be reasonable, but I think it's unlikely that they currently have it.