DeepSWE: A contamination-free benchmark for long-horizon coding agents

38 points | by ammar_x 9 hours ago

11 comments