Recent comments in /f/MachineLearning
Oswald_Hydrabot t1_jci6a41 wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
You don't need a nuclear bomb to hunt elk.
On top of that, this is a solution you can fully own.
It has value.
programmerChilli t1_jci4fyx wrote
Reply to comment by logophobia in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
I've actually had pretty good success using torch.compile for some of the stuff that KeOps works well for!
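For example, something roughly like this (a KeOps-style Gaussian kernel reduction written in plain PyTorch and then compiled; the shapes and kernel are made up for illustration, not from any particular model):

```python
import torch

# A KeOps-style reduction written in plain PyTorch: for each row of x,
# sum a Gaussian kernel over all rows of y. This materializes the full
# N x M matrix (unlike KeOps), but torch.compile can fuse the elementwise ops.
def gaussian_kernel_sum(x, y, sigma=1.0):
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # (N, M) squared distances
    return torch.exp(-d2 / (2 * sigma ** 2)).sum(dim=1)   # (N,)

compiled = torch.compile(gaussian_kernel_sum)

x = torch.randn(1024, 3)
y = torch.randn(2048, 3)
out = compiled(x, y)   # same values as eager mode, potentially faster kernels
```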
[deleted] t1_jci47j9 wrote
Reply to comment by DamienLasseur in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
[removed]
Batteredcode t1_jci3t9m wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Great, thank you so much for the detailed answer. Do you have anything you could point me to (or explain further) about how I could modify a diffusion method to do this?
Also, in terms of the VAE, I was thinking I'd be able to feed 2 channels in and train it to output 3 channels. I believe the encoder wouldn't be useless in this case, and hence my latent would be more than merely the missing channel? Feel free to correct me if I'm wrong! My assumption is that even with this, a plain NN may well perform better, or at least make a simpler baseline. That said, my images will be similar in certain ways, so being able to model a distribution over the latents could presumably prove useful?
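To make the VAE idea concrete, I was imagining something roughly like this (layer sizes, image resolution, and latent dimension are all placeholders, not anything from the thread):

```python
import torch
import torch.nn as nn

# Rough sketch of the asymmetric VAE: encode the 2 known channels,
# decode all 3. Assumes 64x64 images; sizes are illustrative only.
class ChannelVAE(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.to_mu = nn.LazyLinear(latent_dim)
        self.to_logvar = nn.LazyLinear(latent_dim)
        self.decoder_fc = nn.Linear(latent_dim, 64 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),              # 3-channel output, 32 -> 64
        )

    def forward(self, x):                       # x: (B, 2, 64, 64)
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        h = self.decoder_fc(z).view(-1, 64, 16, 16)
        return self.decoder(h), mu, logvar      # (B, 3, 64, 64), mu, logvar
```

The reconstruction loss would then compare the 3-channel output against the full target image, plus the usual KL term on (mu, logvar).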
nat_friedman OP t1_jci2wcd wrote
Reply to comment by geminy123 in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
definitely not.
DreamMidnight t1_jchxtfy wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Yes, although I am specifically looking into the reasoning behind "at least 10 datapoints per variable."
What is the mathematical justification for this minimum?
nat_friedman OP t1_jchujo7 wrote
Reply to comment by londons_explorer in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
That's what I think too, but obviously people are free to solve this any way they want!
ReginaldIII t1_jchu5xq wrote
Reply to comment by Philpax in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
You're right. They're worse.
geminy123 t1_jchso71 wrote
You spent more money on the website than on the project itself…
edjez t1_jchqnxm wrote
Reply to comment by Hydreigon92 in In your experience, are AI Ethics teams valuable/effective? [D] by namey-name-name
Awesome!
edjez t1_jchqj0v wrote
Reply to comment by Hydreigon92 in In your experience, are AI Ethics teams valuable/effective? [D] by namey-name-name
Agree 100% that it is important to have people embedded in product teams who have accountability for it.
AI ethics teams are also useful because they understand and keep track of the metrics, benchmarks, and methods used to evaluate biases, risks, and harms. This is a super specialized area of knowledge that the whole company and community can capitalize on. It is also hard to keep up to date; it needs close ties to civil society and academic institutions, etc. Think of it as having to set up a "pipeline", a supply chain of practices, that starts with real-world insight and academic research and ends with actionable, implementable methods, code, and tools.
In very large orgs, having specialized teams helps scale up company wide processes for incident response, policy work, etc.
You can see some of the output of this work at Microsoft if you search for Sarah Bird's presentations.
(cheers from another ML person who also worked w reco)
josejo9423 t1_jchq421 wrote
Reply to comment by BM-is-OP in [D] Simple Questions Thread by AutoModerator
I am not quite familiar with deep learning, but don't you have a loss function where you can maximize recall, precision, or AUC? I believe accuracy would not apply in this case since you have an imbalanced dataset. Also, with oversampling, as it's handled in random forests, you are making up new images, and I don't know how good that is. Why don't you try undersampling instead, or class-weight adjustments?
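As a rough sketch of the weight-adjustment idea (the class counts are made up; this is just standard weighted cross-entropy in PyTorch, not the OP's actual setup):

```python
import torch
import torch.nn as nn

# Weight the loss inversely to class frequency instead of oversampling.
class_counts = torch.tensor([900.0, 100.0])            # majority, minority (illustrative)
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 2)                             # stand-in for model outputs
targets = torch.randint(0, 2, (8,))
loss = criterion(logits, targets)                      # minority-class errors cost more
```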
currentscurrents t1_jchn22q wrote
Reply to comment by Dankmemexplorer in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
Computers have gotten literally 100 million times faster within my lifetime. I'm not even that old!
Dankmemexplorer t1_jchlw3t wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
man it's funny that 250M is a toy now
how far we've come...
LeN3rd t1_jchkhuv wrote
Reply to comment by No_Complaint_1304 in [D] Simple Questions Thread by AutoModerator
What you can try is to start with linear or logistic regression and learn from Wikipedia. That might be fun and give you decent results.
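A toy example of the kind of baseline I mean, using scikit-learn and one of its built-in datasets (just a sketch, not tied to your project):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Any tabular dataset works; this one ships with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```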
learn-deeply t1_jchhzqo wrote
Reply to [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
The value that nanoGPT offers is that it is self-contained (minimal dependencies), easy-to-understand code. This repo is essentially a wrapper around Hugging Face's models, datasets, and accelerate libraries, which is not very useful for didactic purposes.
LeN3rd t1_jchht71 wrote
Reply to comment by ilrazziatore in [D] Simple Questions Thread by AutoModerator
Then I would just use a completely different test dataset. In a paper, I would also expect this.
No_Complaint_1304 t1_jchh2u9 wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Damn, I hope no one got me wrong. I wanted to learn the basics in a week (and finish my side project ASAP); I didn't claim I could study such a large and complex field in a week.
sam__izdat t1_jchg8nd wrote
Reply to comment by currentscurrents in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
I'll leave it to the linguists to debate UG and the specifics of what it does and doesn't mean, but commonalities like some sort of hierarchy, recursion, structure-dependence of rules, etc. clearly exist, whatever you want to call them. By shared I just mean there are specific things that human cognitive faculties are set up to do and then other (often computationally simpler) things they clearly don't do. But again, if you're just saying natural languages are not formal languages, I guess that's true by definition. It just sounded to me like you were implying something different.
codename_failure t1_jchft1r wrote
Reply to comment by nat_friedman in [N] A $250k contest to read ancient Roman papyrus scrolls with ML by nat_friedman
Thanks for funding this, it looks like a cool project.
LeDebardeur t1_jchfnr0 wrote
1 - Data engineering and DevOps
2 - It's way less stressful than ML because you have really clear requirements (I need to get data from a source into a certain target under given constraints). It can sometimes be challenging due to business requirements (time, consistency, and monitoring those pipelines), but I find it better than going into a project where I don't even know if it will be feasible or not.
3 - I was a good programmer before I got into ML, so for me it was like switching back to what I used to do; it was not a big deal. (My curriculum was a lot of software engineering / managing networks and pure dev.)
currentscurrents t1_jch9ulc wrote
Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
Oh, it is clearly structured. Words and phrases and sentences are all forms of structure and we're using them right now.
What it doesn't have is formal structure; it cannot be fully defined by any set of rules. This is why you can't build a rules-based parser that understands English and have to use an 800GB language model instead.
>shared across essentially every language and dialect
Noam Chomsky thinks this, but the idea of a universal grammar is controversial in modern linguistics.
logophobia t1_jch9ow5 wrote
Reply to [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
torch.compile is a neat concept, but it still has some limitations for the models I used it on (complex-valued tensors, pykeops, custom CUDA kernels). Some pretty great advancements otherwise. Will probably help when training transformers.
Oswald_Hydrabot t1_jci6kf3 wrote
Reply to [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
This is more interesting than GPT-4 to me, by a great deal. Thank you for sharing!
Optimization and ownership of your full product are important. This is how we combat being locked out of the gated community: by providing tangible value through running code.
I am going to check it out this evening!