2025-04-13 08:35:37 -07:00


| highlight_image | nav_order | description | has_children |
|---|---|---|---|
| /assets/leaderboard.jpg | 950 | Quantitative benchmarks of LLM code editing skill. | true |

Aider LLM Leaderboards

Aider excels with LLMs skilled at editing code, not just writing it. These benchmarks evaluate an LLM's ability to follow instructions and edit code successfully without human intervention. Aider works best with high-scoring models, though it can connect to almost any LLM.

Polyglot leaderboard

Aider's polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.

{% assign max_cost = 0 %}
{% for row in site.data.polyglot_leaderboard %}{% if row.total_cost > max_cost %}{% assign max_cost = row.total_cost %}{% endif %}{% endfor %}
{% if max_cost == 0 %}{% assign max_cost = 1 %}{% endif %}
{% assign edit_sorted = site.data.polyglot_leaderboard | sort: 'pass_rate_2' | reverse %}

<table>
  <thead>
    <tr><th>Model</th><th>Percent correct</th><th>Cost (log scale)</th><th>Command</th></tr>
  </thead>
  <tbody>
    {% for row in edit_sorted %}
    {% comment %} Add loop index for unique IDs {% endcomment %}
    {% assign row_index = forloop.index0 %}
    <tr>
      <td>{{ row.model }}</td>
      <td>{{ row.pass_rate_2 }}%</td>
      <td>{% if row.total_cost > 0 %}{% endif %}{% assign rounded_cost = row.total_cost | times: 1.0 | round: 2 %}{% if row.total_cost == 0 or rounded_cost == 0.00 %}?{% else %}${{ rounded_cost }}{% endif %}</td>
      <td><code>{{ row.command }}</code></td>
    </tr>
    <tr>
      <td colspan="4">
        <ul>
          {% for pair in row %}{% if pair[1] != "" and pair[1] != nil %}
          <li>{{ pair[0] | replace: '_', ' ' | capitalize }}: {% if pair[0] == 'command' %}<code>{{ pair[1] }}</code>{% else %}{{ pair[1] }}{% endif %}</li>
          {% endif %}{% endfor %}
        </ul>
      </td>
    </tr>
    {% endfor %}
  </tbody>
</table>
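The template above works in three steps. First it finds the largest `total_cost` and falls back to 1 when every cost is 0, presumably so the cost bars never divide by zero. Next it orders rows by `pass_rate_2`, highest first. Finally it shows the cost rounded to two decimals, or `?` when the cost is zero or rounds to zero.

The Python below is a minimal sketch of that logic, not code from aider. The function names `leaderboard_rows`, `format_cost`, and `cost_bar_width` are hypothetical. The log-scale bar formula in `cost_bar_width` is an assumption, because the page's actual bar computation is not shown here.

```python
import math


def leaderboard_rows(rows):
    """Mirror the template: find the max cost, then rank rows by pass_rate_2."""
    # Keep the largest total_cost, starting from 0 (missing costs count as 0).
    max_cost = 0
    for row in rows:
        if row.get("total_cost", 0) > max_cost:
            max_cost = row["total_cost"]
    # Same fallback as `{% if max_cost == 0 %}{% assign max_cost = 1 %}`.
    if max_cost == 0:
        max_cost = 1
    # Descending order, like `sort: 'pass_rate_2' | reverse`
    # (the order of tied rows may differ from Liquid's).
    ranked = sorted(rows, key=lambda r: r["pass_rate_2"], reverse=True)
    return max_cost, ranked


def format_cost(total_cost):
    """Render the Cost cell: '?' for zero or sub-cent costs, else '$' + rounded value."""
    # Note: Python's round() rounds exact halves to even, which can differ
    # from Liquid's `round` filter at values like 0.125.
    rounded = round(float(total_cost), 2)
    if total_cost == 0 or rounded == 0.0:
        return "?"
    return f"${rounded}"


def cost_bar_width(total_cost, max_cost):
    """Hypothetical log-scale bar width in percent (assumed formula)."""
    if total_cost <= 0:
        return 0.0
    # log1p maps max_cost to 100% and compresses the spread between cheap
    # and expensive models.
    return 100.0 * math.log1p(total_cost) / math.log1p(max_cost)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real leaderboard results.
    rows = [
        {"model": "model-a", "pass_rate_2": 61.3, "total_cost": 12.34},
        {"model": "model-b", "pass_rate_2": 72.0, "total_cost": 0},
    ]
    max_cost, ranked = leaderboard_rows(rows)
    for r in ranked:
        print(r["model"], f'{r["pass_rate_2"]}%', format_cost(r["total_cost"]))
    # prints "model-b 72.0% ?" then "model-a 61.3% $12.34"
```

Like `${{ rounded_cost }}`, `format_cost` prints the rounded number without padding. A cost of 1.5 shows as `$1.5`, not `$1.50`.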

By Paul Gauthier, last updated