Paul Gauthier 2024-12-21 14:11:21 -08:00
parent 2ea5a98ee0
commit ec44850646
4 changed files with 177 additions and 37 deletions

@@ -6,7 +6,7 @@ description: Quantitative benchmark of LLM code refactoring skill.
---
-## Aider refactoring leaderboard
+## Refactoring leaderboard
[Aider's refactoring benchmark](https://github.com/Aider-AI/refactor-benchmark) asks the LLM to refactor 89 large methods from large python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure [GPT-4 Turbo's "lazy coding" habit](/2023/12/21/unified-diffs.html).