UseNew5079 t1_je6lgyb wrote

Maybe a 7B model could reach GPT-4-level performance if trained for a _very_ long time. The Facebook paper showed that performance kept improving right up to the end of training, with no visible plateau. Maybe it's just very inefficient but still possible? Or maybe there is another way.
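A rough way to see why the curve might keep falling: a minimal sketch below, assuming the Chinchilla-style scaling law L(N, D) = E + A/N^α + B/D^β with the fitted constants reported by Hoffmann et al. (2022). That's a different paper from the Facebook one cited above, so treat it as an illustration, not that paper's result: for a fixed 7B-parameter model, predicted loss keeps dropping as the token count D grows.

```python
# Minimal sketch of the Chinchilla-style scaling law (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants below are the fitted values reported in that paper; the point is
# only to illustrate that, for fixed N, loss keeps falling as D grows.

E, A, B = 1.69, 406.4, 410.7     # fitted constants from Hoffmann et al. (2022)
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens, under the scaling law above."""
    return E + A / n_params**alpha + B / n_tokens**beta

N = 7e9  # fixed 7B-parameter model
for D in (0.3e12, 1e12, 3e12, 10e12):
    print(f"D = {D / 1e12:4.1f}T tokens -> predicted loss ~ {predicted_loss(N, D):.3f}")
```

Under this law the returns diminish (the D term decays like D^-0.28) but never hit zero, which matches the "no plateau, just very inefficient" reading.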

2

Akimbo333 t1_je9proo wrote

Why does performance increase with more training instead of more parameters?

1