---
parent: Aider LLM Leaderboards
highlight_image: /assets/leaderboard.jpg
nav_order: 100
description: Quantitative benchmark of LLM code refactoring skill.
---
# Refactoring leaderboard
Aider's refactoring benchmark asks the LLM to refactor 89 large methods from large Python classes. It is a more challenging benchmark, which tests a model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure GPT-4 Turbo's "lazy coding" habit.
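For context, here is a minimal sketch of the shape of one benchmark task; the class and method names are invented for illustration, not drawn from the benchmark suite. The model must lift a long method out of its class into a standalone, top-level function, emitting the entire body verbatim rather than eliding it:

```python
# Hypothetical task shape (names invented, not from the benchmark):
# the model is given a large class and asked to move one long method
# out into a standalone function, with no "..." elisions.

class ReportBuilder:
    # ... in the benchmark, many more methods and a much longer body ...

    def render_summary(self, records):
        lines = []
        for record in records:
            lines.append(f"{record.name}: {record.total}")
        return "\n".join(lines)


# Expected refactor: the method becomes a top-level function, with any
# data it previously read off `self` passed in explicitly.

def render_summary(records):
    lines = []
    for record in records:
        lines.append(f"{record.name}: {record.total}")
    return "\n".join(lines)
```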
The refactoring benchmark requires a large context window to work with large source files, so results are only available for models whose context windows are big enough.
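As a back-of-the-envelope illustration of that constraint, the sketch below checks whether a source file plausibly fits a context window. The ~4 characters/token heuristic, the 2x headroom for the prompt plus the fully re-emitted refactored code, the example window size, and the filename are all assumptions for illustration, not part of the benchmark harness:

```python
# Rough check of whether a source file fits a model's context window.
# The chars/token heuristic and 2x headroom are illustrative assumptions.

from pathlib import Path


def fits_context(path: str, context_tokens: int = 128_000) -> bool:
    text = Path(path).read_text(encoding="utf-8")
    approx_tokens = len(text) // 4              # crude chars-to-tokens estimate
    # Leave room for the prompt and for the model to re-emit the code in full.
    return 2 * approx_tokens < context_tokens


if __name__ == "__main__":
    print(fits_context("big_module.py"))        # hypothetical file
```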
{% assign refac_sorted = site.data.refactor_leaderboard | sort: 'pass_rate_1' | reverse %}

| Model | Percent completed correctly | Percent using correct edit format | Command | Edit format |
|---|---|---|---|---|
{% for row in refac_sorted %}| {{ row.model }} | {{ row.pass_rate_1 }}% | {{ row.percent_cases_well_formed }}% | `{{ row.command }}` | {{ row.edit_format }} |
{% endfor %}