Recent comments in /f/MachineLearning
Philpax t1_jch72b8 wrote
Reply to comment by ReginaldIII in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
I invite you to compare the GPT summary and the dot points in the article and tell me they are the same.
sam__izdat t1_jch4kn0 wrote
Reply to comment by currentscurrents in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
It is a "structured thing" because it has concrete definable grammatical rules, shared across essentially every language and dialect, and common features, like an infinite range of expression and recursion. If language didn't have syntactic structure we'd just be yelling signals at each other, instead of doing what we're doing now. There would be nothing for GPT to capture.
[deleted] t1_jch4hd3 wrote
Reply to [D] Simple Questions Thread by AutoModerator
[deleted]
ilrazziatore t1_jch3vpu wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Uhm... the BNN is built assuming distributions both on the parameters (i.e. the values taken by the network weights) and on the data (the last layer has 2 outputs: the predicted mean and the predicted variance). Those 2 values are then used to model the loss function, which is the likelihood and is a product of Gaussians. I think it's both model and data uncertainty.
Let's say I compare the predicted variances and mean values.
Do I have to set the same calibration and test datasets apart for both models, or use the entire dataset? The MCMC model can use the entire dataset without the risk of overfitting, but for the BNN it would be like cheating.
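To make that concrete, a minimal sketch of the two-output head and Gaussian likelihood I mean (assuming PyTorch; the layer sizes and names are just illustrative, not my actual architecture):

```python
import torch
import torch.nn as nn

class MeanVarianceHead(nn.Module):
    """Network whose last layer outputs a predicted mean and a predicted (log-)variance."""
    def __init__(self, in_features, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)  # predict log-variance for numerical stability

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.log_var(h)

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of a product of Gaussians (summed over samples)."""
    var = log_var.exp()
    return 0.5 * (((y - mean) ** 2) / var + log_var).sum()
```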
currentscurrents t1_jch3nic wrote
Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
So why do you think it is a structured formal thing?
BrotherAmazing t1_jch3dkl wrote
Reply to comment by justprotein in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
I agree with your sentiment and have no problem with that.
There just seem to be more than one or two people here with the idea that corporate entities have generally published a higher percentage of their R&D than they actually ever have, though. Some people (not saying you personally) go further and believe it is their duty to publish important IP and research.
I like that they publish and think it’s great, but I just don’t believe they ever have a “duty” to do so if they don’t want to, and I have seen companies that “publish” still hold a lot back behind the scenes.
BrotherAmazing t1_jch23xt wrote
Reply to comment by Jadien in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
In the real-world cases I have been involved in (granted, it was only four cases), things did not at all play out that way. Once it went to court but the defendant settled on terms favorable to the plaintiff, once the defendant complied with the cease and desist before the lawsuit was initiated, and the other two times actually went to trial and weren’t settled (which they told me was rare), with the plaintiffs winning once and the defendants winning once.
What you say really is not true because once you win or lose in court, it cannot be tried again and it’s a settled matter, and that process indeed does legally settle whether there is infringement or not. No one sits around after the verdict is read and scratches their head, wondering whether they are infringing or not.
BrotherAmazing t1_jch1gll wrote
Reply to comment by professorlust in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
I never said they don’t publish, re-read.
I can tell you firsthand that what they publish has to get approval, and a lot of things do not get approval to publish and are held as trade secrets. It boggles my mind that this sub clearly has so many people who have never worked on the corporate side of this industry yet have these strong ideas that the corporate side is, or has ever been, fully transparent and allows employees to publish anything and everything. That is so far from the truth it’s not funny.
For every model and paper published, there exists another model and many other papers that are not approved for publication; many exist in a different format as internal publications only. Other internal publications get watered down, and a lot of extra work is omitted, in order to get approval to publish. Or they publish “generation 3” to the world while they’re working on “generation 5” internally.
sam__izdat t1_jch1c32 wrote
Reply to comment by currentscurrents in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
I'm familiar with the terms, but saying e.g. "imaginary numbers don't exist because they're called imaginary" is not making a meaningful statement. All you've said is that German is not C++, and we have a funny name for that. And that's definitely one of the fuzzier interactions you can have about this, but I'm not sure how it proves that natural languages (apparently? if I'm reading this right...) lack structure.
Petersnajper t1_jch0sd5 wrote
Can I get an invite?
LeN3rd t1_jcgzk3c wrote
Reply to comment by ilrazziatore in [D] Simple Questions Thread by AutoModerator
If it is model uncertainty, the BNN should assume distributions only for the model parameters, no? If you make the samples a distribution, you assume data uncertainty. Also, I do not know exactly what your other model gives you, but as long as you get variances, I would just compare those at first. If the models give vastly different means, you should take that into account; there is probably a nice way to add this ensemble uncertainty to the uncertainty of the models. It would also strongly suggest that one model is biased and does not give you a correct estimate of the model uncertainty.
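One way to combine the two would be the law of total variance: average the predicted variances for the data uncertainty, take the spread of the predicted means for the model uncertainty. A rough sketch (assuming each model or ensemble member gives per-point means and variances, which may not match what your models actually output):

```python
import torch

def total_uncertainty(means, variances):
    """means, variances: tensors of shape (n_models, n_points)."""
    data_unc = variances.mean(dim=0)               # average predicted variance (data uncertainty)
    model_unc = means.var(dim=0, unbiased=False)   # spread of the predicted means (model uncertainty)
    return data_unc, model_unc, data_unc + model_unc
```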
ilrazziatore t1_jcgy9ya wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Model uncertainty. One model is a calibrated BNN (I split the dataset into a training, a calibration and a test set); the other is a mathematical model developed from some physical relations. For computational reasons, the BNN assumes i.i.d. samples normally distributed around their true values and maximizes the likelihood (modeled as a product of normal distributions); the mathematical model instead relies on 4 coefficients and is fitted using Monte Carlo with a multivariate likelihood with the full covariance matrix. I want to compare the quality of the model uncertainty estimates, but I don't know if I should do it on the test dataset for both. After all, models calibrated with MCMC methods do not overfit, so why split the dataset?
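For concreteness, this is the kind of comparison I have in mind, assuming both models give a predictive mean and standard deviation on the same held-out points (the metrics here are just examples, not settled choices):

```python
import numpy as np
from scipy.stats import norm

def uncertainty_scores(y_true, pred_mean, pred_std):
    """Average negative log-likelihood and empirical 95% coverage on a shared test set."""
    nll = -norm.logpdf(y_true, loc=pred_mean, scale=pred_std).mean()
    lo, hi = pred_mean - 1.96 * pred_std, pred_mean + 1.96 * pred_std
    coverage = np.mean((y_true >= lo) & (y_true <= hi))
    return nll, coverage
```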
bartturner t1_jcgy1re wrote
Reply to comment by minhrongcon2000 in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
That is NOT how patent law works. Maybe you are confusing it with trademark law?
wirefire07 t1_jcgx51q wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Have you already heard about this project? https://github.com/ggerganov/llama.cpp -> It's very fast!!
bartturner t1_jcgvl8h wrote
Reply to comment by existential_one in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
> I agree, but what I'm saying is that Deepmind is gonna stop publishing their good stuff. And it's not because of OpenAI.
I do not believe that will happen. But the behavior of OpenAI does not help.
But Google has been more of a leader than a follower so hopefully the crappy behavior by OpenAI does not change anything.
I think the sharing of the research papers was done for a variety of reasons.
First, I fully agree it was to keep and retain talent, which Google understood before others would be critical. That's why they were able to get DeepMind for $500 million; it would easily be 20x that today.
But the other reason is data. Nobody has more data than Google, or access to more data.
Google has the most popular website in history, and the second most popular on top of that. They also have the most popular operating system in history.
So if everyone had access to the same models it still keeps Google in a better position.
But the other reason is Google touches more people than any other company by a wide margin. Google now has 10 different services with over a billion daily active users.
Then the last reason is their hope that no one else gets something they cannot get. I believe Google's goal from day 1 has always been AGI; that is what search has been about since pretty much day 1.
They worry that someone will figure it out in some basement somewhere. Very unlikely. But possible. If they can help drive a culture of sharing then it is far less likely to happen.
existential_one t1_jcgur9j wrote
Reply to comment by bartturner in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
I agree, but what I'm saying is that Deepmind is gonna stop publishing their good stuff. And it's not because of OpenAI.
IMO ML research papers weren't profitable before, and companies benefited from the collective effort, plus it helped retain talent. But now we're seeing ML models having a huge impact on companies, and single incremental papers can actually improve the bottom line, so all companies are gonna start closing their doors.
LeN3rd t1_jcguoiy wrote
Reply to comment by No_Complaint_1304 in [D] Simple Questions Thread by AutoModerator
I didn't mean to discourage you. It's a fascinating field, but it is its own field of research for a reason. Start with BERT and see where that gets you.
These are also a nice quick watch:
bartturner t1_jcgu47z wrote
Reply to comment by existential_one in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
Love how much DeepMind shares with the papers. Same with Google Brain.
To me the issue is OpenAI. What makes it worse is they use breakthroughs from DeepMind, Google Brain and others and then do not share.
We call them filches.
LeN3rd t1_jcgu1z5 wrote
Reply to comment by Batteredcode in [D] Simple Questions Thread by AutoModerator
This is possible in multiple ways. Older methods would treat it as an inverse problem and apply an optimization method like ADMM or FISTA.
Since lots of data is missing (in your case the complete R and G channels), you should use a neural network for this. You are on the right track, though it could get hairy. If you have a prior (i.e. a dataset, and you want it to work on similar images), a (cycle)GAN or a retrained Stable Diffusion model could work.
I am unsure about VAEs for your problem, since you usually train them with the same input and output, and you shouldn't force the latent to be only the blue channel, since then the encoder is useless. Training only the decoder side is essentially what GANs and diffusion networks do, so I would start there.
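Before reaching for a GAN or diffusion model, a plain supervised baseline might be worth a sanity check: a small conv net that maps the blue channel directly to the missing R and G channels. A rough sketch in PyTorch (architecture and loss are illustrative assumptions, not a recommendation for your exact data):

```python
import torch
import torch.nn as nn

class BlueToRG(nn.Module):
    """Predict the R and G channels of an image from its B channel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Sigmoid(),  # two output channels: R and G
        )

    def forward(self, blue):            # (N, 1, H, W) -> (N, 2, H, W)
        return self.net(blue)

model = BlueToRG()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()
# for blue, rg_target in dataloader:   # dataloader is assumed, not defined here
#     loss = loss_fn(model(blue), rg_target)
#     opt.zero_grad(); loss.backward(); opt.step()
```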
existential_one t1_jcgtu4h wrote
Reply to comment by bartturner in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
I think the culture of publishing has been dying and people will think OpenAI was the one to trigger it, but in reality other companies already started restricting publications. Deepmind being the biggest one.
currentscurrents t1_jcgtqwz wrote
Reply to comment by impossiblefork in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
...for a toy-sized 250M parameter language model, yes.
fromnighttilldawn t1_jcgsst8 wrote
Absolutely not. These ethicists can find "bias" all day and every day, but become practically mute when it comes to condemning how their companies are in bed with capitalism and the military-industrial complex, which are far more dire threats to the fate of humanity.
LeN3rd t1_jcgsjxq wrote
Reply to comment by ilrazziatore in [D] Simple Questions Thread by AutoModerator
Define "probabilistic". Is it model uncertainty or data uncertainty? Either way, you should get a standard deviation from your model (either as an output parameter, or implicitly via ensembles) that you can compare.
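For the ensemble case, that standard deviation is just the spread of the individual predictions, e.g. (a sketch, assuming point-prediction models):

```python
import torch

def ensemble_std(models, x):
    """Implicit predictive mean and standard deviation from an ensemble."""
    preds = torch.stack([m(x) for m in models])  # stacked predictions, shape (n_models, ...)
    return preds.mean(dim=0), preds.std(dim=0)
```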
No_Complaint_1304 t1_jcgshk7 wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Well, I did expect this, but still, months! I'll look into everything you mentioned and drop the project for now. If I can't finish it by studying heavily, I might as well learn slowly but surely, absorb all the information, and then go back to a project that involves predictions and analyzing data. Thanks for your help.
impossiblefork t1_jch7ker wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Still, probably useful for research: validating alternatives to transformers, etc.