Submitted by bo_peng t3_1135aew in MachineLearning
afireohno t1_j8qg9eq wrote
Reply to comment by maizeq in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
There is some work on "Frustratingly Short Attention Spans in Neural Language Modeling".