Benchmarking 7 coding agents on a real refactor
Same 12k-line TypeScript codebase, same task: extract a domain layer. I ran every agent twice and graded the diffs.
APR 22, 2026
"Worth reading the verified subset methodology before quoting any number from the headline board."
via MarkTechPost