Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems

19 points | by PranoyP 8 hours ago

13 comments

jlukecarlson 2 hours ago
I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!
mlop99 7 hours ago
Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), i.e. one LLM agent testing another. Could that lead to a self-improving loop?
shailendra145 7 hours ago
A powerful move beyond benchmarks: this paper redefines LLM evaluation through realistic, behavior-driven testing.
papz2k 7 hours ago
Very interesting work.
raj_maddipati 4 hours ago
Excellent work
harshv_03 5 hours ago
Interesting
ankush9812 7 hours ago
Nice work
ashyash518 7 hours ago
Nice work
saurabh_xen 7 hours ago
Great work
quanta9 7 hours ago
Interesting