Submitted by bradenjh t3_z26fui in MachineLearning
ayse_ww t1_ixgbva3 wrote
Reply to comment by bradenjh in [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
This is quite interesting. Is such a self-training scheme similar to a recurrent network?