CuriousCesarr OP t1_j54luzp wrote on January 20, 2023 at 10:34 AM

Reply to comment by andsmi97 in Looking for someone with good NN/ deep learning experience for a paid project by CuriousCesarr

I updated my post: a small team was excited to take on the task. Thank you for your implication though and have a great day! :)

CuriousCesarr OP t1_j54llrc wrote on January 20, 2023 at 10:31 AM

Reply to comment by Emergency-Fall232 in Looking for someone with good NN/ deep learning experience for a paid project by CuriousCesarr

In the end I found a small team that is interested in the task. :)

adubowski t1_j549298 wrote on January 20, 2023 at 7:41 AM

Reply to comment by LesleyFair in GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

Is your assumption that GPT-4 will stay the same size as GPT-3?

BrotherAmazing t1_j52hucj wrote on January 19, 2023 at 10:55 PM

Reply to A correct method beats pre-training any day by XecutionStyle

If OP asked this question in a court of law, the attorney would immediately yell “OBJECTION!” and the Judge would sustain, scold OP, but give them a chance to ask a question that doesn’t automatically pre-suppose and imply that pre-training cannot be “correct” or that there is always a “better” way than pre-training.

FWIW, I often avoid transfer learning or pre-training when it’s not needed, but I’m sure I could construct a problem that is not pathological and of practical importance where pre-training is “optimal” in some sense of that word.

Blacky372 t1_j521lej wrote on January 19, 2023 at 9:10 PM

Reply to GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

I like your article, thank you for sharing.

But writing "no spam, no nonsense" is a little weird to me if I get this when trying to subscribe.

Don't get me wrong, it's fine to monetize your content and to use your followers data to present them personalized ads. Acting like you're just enthusiastic about sharing info at the same time doesn't really fit.

LesleyFair OP t1_j51jnnr wrote on January 19, 2023 at 7:23 PM

Reply to comment by tehbuss_ in GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

Thanks! This means a lot!

tehbuss_ t1_j516emc wrote on January 19, 2023 at 6:03 PM

Reply to GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

This was a really good discussion!

JohnFatherJohn t1_j5161hj wrote on January 19, 2023 at 6:01 PM

Reply to comment by sEi_ in GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

People will be disappointed because they don't understand the relationship between model complexity and performance. There's so many irresponsible and/or uneducated articles suggesting that the orders of magnitude increase in the number of parameters will translate to orders of magnitude performance gains, which is obviously wrong.

sEi_ t1_j50efga wrote on January 19, 2023 at 3:09 PM

Reply to GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

About GPT-4 From the horses mouth:

Interview with Sam Altman (CEO OpenAI) from 2 days ago (17 jan).

Article in Verge:

>"OpenAI CEO Sam Altman on GPT-4: ‘people are begging to be disappointed and they will be’"

https://www.theverge.com/23560328/openai-gpt-4-rumor-release-date-sam-altman-interview

Video with the interview in 2 parts:

>StrictlyVC in conversation with Sam Altman

https://www.youtube.com/watch?v=57OU18cogJI&ab_channel=ConnieLoizos

LesleyFair OP t1_j501xt6 wrote on January 19, 2023 at 1:38 PM

Reply to comment by --dany-- in GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

First, thanks a lot for reading and thank you for the good questions:

A1) Current GPT-3 is 175B parameters. If GPT-4 would be 100T parameters, it would be a scale-up of roughly 500x.

A2) I got the calculation from the paper for the Turing NLG model. The total training time in seconds is reached by multiplying the number of tokens by the number of model parameters. That number is then divided by the number of GPUs times each GPU's FLOPs per second.

--dany-- t1_j4zx6lf wrote on January 19, 2023 at 12:57 PM

Reply to GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

Very good write up! Thanks for sharing your thoughts and observations. Some questions many other folks may have as well

how do you arrive at the number it’s 500x smaller or 200 million parameters?
Your estimate of 53 years for training a 100T model, can you elaborate how you got 53?

shironorey OP t1_j4zstg7 wrote on January 19, 2023 at 12:15 PM

Reply to comment by cma_4204 in Instance Segmentation on Android by shironorey

What a nice coincidence, I'm also looking on object detection for an alternative. Could found two answer then, nice.

cma_4204 t1_j4zrl0f wrote on January 19, 2023 at 12:02 PM

Reply to comment by shironorey in Instance Segmentation on Android by shironorey

That same page has an object detection demo, between the object detection and semantic segmentation examples there is probably enough info to figure out how to swap to instance seg

shironorey OP t1_j4zqttn wrote on January 19, 2023 at 11:54 AM

Reply to comment by SoopaFly_ in Instance Segmentation on Android by shironorey

I've done some project on plant leaf classification on android and surprisingly it works fine without straining even an old phone (using mobilenetv3) and I've also heard about computation problem before regarding deep learning especially on android quite frequent. But nice idea though could actually be an alternative for my problem. Will definetily look further on it.

shironorey OP t1_j4zq8bo wrote on January 19, 2023 at 11:47 AM

Reply to comment by cma_4204 in Instance Segmentation on Android by shironorey

I've seen alot of post that says the only model that support instance segmentation for android is Mask R-CNN, never check on semantic but maybe there's a point there to take. btw thanks, will check em

onkus t1_j4zmvdh wrote on January 19, 2023 at 11:07 AM

Reply to comment by XecutionStyle in A correct method beats pre-training any day by XecutionStyle

What do you mean by a better method?

onkus t1_j4zmty9 wrote on January 19, 2023 at 11:07 AM

Reply to A correct method beats pre-training any day by XecutionStyle

This is not a valid comparison.

andsmi97 t1_j4zaiyp wrote on January 19, 2023 at 8:18 AM

Reply to A correct method beats pre-training any day by XecutionStyle

Since you haven't said anything about data and problem the answer is no.

CatalyzeX_code_bot t1_j4z9jc1 wrote on January 19, 2023 at 8:05 AM

Reply to GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair

Found relevant code at https://github.com/nvidia/megatron-lm + all code implementations here

--

To opt out from receiving code links, DM me

XecutionStyle OP t1_j4yyvig wrote on January 19, 2023 at 5:58 AM

Reply to comment by FastestLearner in A correct method beats pre-training any day by XecutionStyle

Yes. A "better" method makes less sense in context it seems.

FastestLearner t1_j4yw7kj wrote on January 19, 2023 at 5:32 AM

Reply to A correct method beats pre-training any day by XecutionStyle

What do you mean by “correct method”?

I_will_delete_myself t1_j4ylmkp wrote on January 19, 2023 at 3:58 AM

Reply to comment by tsgiannis in Why a pretrained model returns better accuracy than the implementation from scratch by tsgiannis

He just said why. It's because there isn't a diverse and large amount data you are training on. Imaginet was trained on many different kind of objects (over a million images) and while your toy dataset may probably only have 50-100k.

junetwentyfirst2020 t1_j4wrzut wrote on January 18, 2023 at 8:18 PM

Reply to Why a pretrained model returns better accuracy than the implementation from scratch by tsgiannis

The way I like to think about this is that the algorithm has to model many things. If you’re trying to learn whether the image contains a dog or not, first you have to model natural imagery, correlations between features, and maybe even a little 2D-to-3D to simplify invariances. I’m speaking hypothetically here, because the underlying model is quite latent and hard to inspect.

If you train from scratch you need to do all of these tasks on a dataset that is likely much smaller than is required to do all of them without overfitting. If you use a pretrained model, instead of learning all of those tasks, you instead have a model that only has to learn just one additional thing on the same amount of data.

tsgiannis OP t1_j4wk889 wrote on January 18, 2023 at 7:30 PM

Reply to comment by nibbajenkem in Why a pretrained model returns better accuracy than the implementation from scratch by tsgiannis

>Less data means more underspecification and thus the model more readily gets stuck in local minima

Probably this is the answer to the my "why".

ruphan t1_j4winzi wrote on January 18, 2023 at 7:20 PM

Reply to Why a pretrained model returns better accuracy than the implementation from scratch by tsgiannis

It is definitely possible. Let me give an analogy first. In the context of education, let's assume our pretrained model is a person with multiple STEM degrees in fields like neuroscience, math etc.. And let your model that's trained from scratch be someone with no degree yet. We have a limited amount of resources like a couple of textbooks on deep learning. It's intuitive that the first person should not only be able to pick up deep learning faster but also be better than the latter, given that they have a better understanding of the fundamentals and experience.

To extend this analogy to your case, I believe that the pretrained model must be quite big for the limited amount of new data that you have. The pretrained model would have developed a better set of filters that just couldn't be learned with a relatively small dataset for a big model trained form scratch. This is just like the analogy where it doesn't matter if neuroscience and math are not exactly deep learning, having the fundamentals strong by pretraining on millions of images makes that model achieve better accuracy.

Maybe if you have a bigger fine-tuning dataset, this gap in accuracy should diminish eventually.

Recent comments in /f/deeplearning