Congratulations on the experiment. Super interesting product, and it seems to have a lot of traction already, with more than 700K performance predictions. How will the system itself evaluate the skills and evals submitted by the community and decide whether or not to include them in the benchmark?