mirror of https://github.com/Aider-AI/aider.git synced 2025-06-01 02:05:00 +00:00

Paul Gauthier (aider) 3b10e3bcb5 style: Use min-height for leaderboard table data cells

2025-04-13 12:43:18 -07:00


---
highlight_image: /assets/leaderboard.jpg
nav_order: 950
description: Quantitative benchmarks of LLM code editing skill.
has_children: true
---

Aider LLM Leaderboards

Aider excels with LLMs skilled at writing and editing code, and uses benchmarks to evaluate an LLM's ability to follow instructions and edit code successfully without human intervention. Aider's polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.

Aider polyglot coding leaderboard

{% assign max_cost = 0 %}
{% for row in site.data.polyglot_leaderboard %}
{% if row.total_cost > max_cost %}
{% assign max_cost = row.total_cost %}
{% endif %}
{% endfor %}
{% if max_cost == 0 %}{% assign max_cost = 1 %}{% endif %}
{% assign edit_sorted = site.data.polyglot_leaderboard | sort: 'pass_rate_2' | reverse %}
{% for row in edit_sorted %}
{% comment %} Add loop index for unique IDs {% endcomment %}
{% assign row_index = forloop.index0 %}
{% endfor %}
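
For readers less familiar with Liquid, the preamble above can be sketched in Python. This is only an illustration, not aider's own code: the `prepare_leaderboard` helper and the sample rows are hypothetical, and only the field names `total_cost` and `pass_rate_2` come from the template.

```python
def prepare_leaderboard(rows):
    """Mirror the Liquid preamble: find the largest cost, then sort by pass rate."""
    # Start from 0 and keep the largest total_cost seen, as the template does.
    max_cost = 0
    for row in rows:
        if row.get("total_cost", 0) > max_cost:
            max_cost = row["total_cost"]
    # The template falls back to 1 when no row has a positive cost.
    if max_cost == 0:
        max_cost = 1
    # Equivalent of `sort: 'pass_rate_2' | reverse`: ascending sort, then reversed.
    edit_sorted = list(reversed(sorted(rows, key=lambda row: row["pass_rate_2"])))
    return max_cost, edit_sorted


# Hypothetical sample rows, for illustration only.
rows = [
    {"model": "model-a", "pass_rate_2": 40.0, "total_cost": 1.5},
    {"model": "model-b", "pass_rate_2": 72.9, "total_cost": 0},
    {"model": "model-c", "pass_rate_2": 55.1, "total_cost": 12.25},
]
max_cost, edit_sorted = prepare_leaderboard(rows)
print(max_cost, [row["model"] for row in edit_sorted])
```

The fallback to 1 presumably keeps any later scaling against `max_cost` well-defined. The template itself does not state the reason.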

| | Model | Percent correct | Cost (log scale) | Command | Correct edit format | Edit Format |
|---|---|---|---|---|---|---|
| | {{ row.model }} | {{ row.pass_rate_2 }}% | {% if row.total_cost > 0 %} {% endif %} {% assign rounded_cost = row.total_cost \| times: 1.0 \| round: 2 %} {% if row.total_cost == 0 or rounded_cost == 0.00 %}?{% else %}${{ rounded_cost }}{% endif %} | `{{ row.command }}` | {{ row.percent_cases_well_formed }}% | {{ row.edit_format }} |
{% for pair in row %} {% if pair[1] != "" and pair[1] != nil %} {% if pair[0] == 'percent_cases_well_formed' %} Percent cases well formed {% else %} {{ pair[0] \| replace: '_', ' ' \| capitalize }} {% endif %} : {% if pair[0] == 'command' %}`{{ pair[1] }}`{% else %}{{ pair[1] }}{% endif %} {% endif %} {% endfor %}
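
The cell logic in the row template can be sketched in Python as follows. This is a rough equivalent under stated assumptions rather than the site's implementation: `format_cost` and `format_label` are hypothetical helper names, and Python's `round` and `str.capitalize` only approximate the Liquid `round` and `capitalize` filters.

```python
def format_cost(total_cost):
    """Cost cell text: '?' when the cost is zero or rounds to zero, else '$<cost>'."""
    # Equivalent of `times: 1.0 | round: 2` in the template.
    rounded_cost = round(total_cost * 1.0, 2)
    if total_cost == 0 or rounded_cost == 0.00:
        return "?"
    return f"${rounded_cost}"


def format_label(key):
    """Detail-row label: one special-cased key, otherwise underscores become spaces."""
    if key == "percent_cases_well_formed":
        return "Percent cases well formed"
    # Equivalent of `replace: '_', ' ' | capitalize`.
    return key.replace("_", " ").capitalize()


print(format_cost(0), format_cost(12.25), format_label("pass_rate_2"))
```

A row with an unknown or negligible cost therefore shows `?` instead of `$0.0`.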

By Paul Gauthier, last updated
