OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computers

77 points | by kristianpaul 16 days ago

39 comments