
---
parent: Aider LLM Leaderboards
highlight_image: /assets/leaderboard.jpg
nav_order: 100
description: Quantitative benchmark of LLM code refactoring skill.
---

# Refactoring leaderboard

Aider's refactoring benchmark asks the LLM to refactor 89 large methods from large Python classes. This is a more challenging benchmark, testing the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure GPT-4 Turbo's "lazy coding" habit.

The refactoring benchmark requires a large context window to work with large source files. Therefore, results are available for fewer models.

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Percent completed correctly</th>
      <th>Percent using correct edit format</th>
      <th>Command</th>
      <th>Edit format</th>
    </tr>
  </thead>
  <tbody>
    {% assign refac_sorted = site.data.refactor_leaderboard | sort: 'pass_rate_1' | reverse %}
    {% for row in refac_sorted %}
      <tr>
        <td>{{ row.model }}</td>
        <td>{{ row.pass_rate_1 }}%</td>
        <td>{{ row.percent_cases_well_formed }}%</td>
        <td><code>{{ row.command }}</code></td>
        <td>{{ row.edit_format }}</td>
      </tr>
    {% endfor %}
  </tbody>
</table>
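
The table above is populated from the site's `_data/refactor_leaderboard.yml` data file. As a rough sketch of the shape that file takes, each entry would carry the fields the Liquid template reads; the field names below come from the template, but the values are purely hypothetical placeholders, not real benchmark results.

```yaml
# Illustrative entry only: field names match the Liquid template above,
# values are hypothetical and do not reflect any actual benchmark run.
- model: Example LLM v1
  pass_rate_1: 50.0                 # percent completed correctly
  percent_cases_well_formed: 95.0   # percent using the correct edit format
  command: aider --model example/example-llm-v1
  edit_format: udiff
```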