Predicting the Order of Upcoming Tokens Improves Language Modeling

5 points | by wavelander 14 hours ago

1 comment