aider/aider/website/docs/leaderboards/index.md

---
highlight_image: /assets/leaderboard.jpg
nav_order: 950
description: Quantitative benchmarks of LLM code editing skill.
has_children: true
---

Aider LLM Leaderboards

Aider excels with LLMs skilled at writing and editing code, and uses benchmarks to evaluate an LLM's ability to follow instructions and edit code successfully without human intervention. Aider's polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.

Aider polyglot coding leaderboard

{% assign max_cost = 0 %}
{% for row in site.data.polyglot_leaderboard %}
  {% if row.total_cost > max_cost %}
    {% assign max_cost = row.total_cost %}
  {% endif %}
{% endfor %}
{% if max_cost == 0 %}{% assign max_cost = 1 %}{% endif %}
{% assign edit_sorted = site.data.polyglot_leaderboard | sort: 'pass_rate_2' | reverse %}
{% for row in edit_sorted %}
  {% comment %} Add loop index for unique IDs {% endcomment %}
  {% assign row_index = forloop.index0 %}
{% endfor %}
Model Percent correct Cost (log scale) Command % Conform Edit Format
{{ row.model }}
{{ row.pass_rate_2 }}%
{% if row.total_cost > 0 %}
{% endif %} {% assign rounded_cost = row.total_cost | times: 1.0 | round: 2 %} {% if row.total_cost == 0 or rounded_cost == 0.00 %}?{% else %}${{ rounded_cost }}{% endif %}
{{ row.command }} {{ row.percent_cases_well_formed }}% {{ row.edit_format }}
{% for pair in row %}
{% if pair[1] != "" and pair[1] != nil %}
  • {% if pair[0] == 'percent_cases_well_formed' %}Percent cases well formed{% else %}{{ pair[0] | replace: '_', ' ' | capitalize }}{% endif %}: {% if pair[0] == 'command' %}{{ pair[1] }}{% else %}{{ pair[1] }}{% endif %}
{% endif %}
{% endfor %}
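
For readers less familiar with Liquid, the template logic above can be sketched in Python. This is only an illustration, not part of the site build. It assumes each leaderboard row is a dictionary with the `model`, `pass_rate_2`, and `total_cost` fields the template reads. The function names and the sample rows are hypothetical, and the numbers are made up.

```python
def max_total_cost(rows):
    """Largest total_cost in rows, or 1 if every cost is zero (mirrors max_cost)."""
    max_cost = 0
    for row in rows:
        if row["total_cost"] > max_cost:
            max_cost = row["total_cost"]
    return 1 if max_cost == 0 else max_cost


def sort_by_pass_rate(rows):
    """Ascending sort on pass_rate_2, then reversed, like `sort | reverse`."""
    return list(reversed(sorted(rows, key=lambda row: row["pass_rate_2"])))


def format_cost(total_cost):
    """Show '?' for unknown or negligible costs, else a dollar amount.

    Python's round() can differ from Liquid's `round` filter on exact ties,
    so treat this as an approximation of the template's output.
    """
    rounded = round(total_cost * 1.0, 2)
    if total_cost == 0 or rounded == 0:
        return "?"
    return f"${rounded}"


def detail_label(key):
    """Turn a field name such as 'edit_format' into 'Edit format'."""
    if key == "percent_cases_well_formed":
        return "Percent cases well formed"
    return key.replace("_", " ").capitalize()


# Hypothetical sample rows; the values are invented for illustration.
sample_rows = [
    {"model": "model-a", "pass_rate_2": 40.0, "total_cost": 12.5},
    {"model": "model-b", "pass_rate_2": 72.9, "total_cost": 0},
    {"model": "model-c", "pass_rate_2": 55.1, "total_cost": 3.25},
]

if __name__ == "__main__":
    for row in sort_by_pass_rate(sample_rows):
        print(row["model"], f'{row["pass_rate_2"]}%', format_cost(row["total_cost"]))
```

The fallback of `max_cost` to 1 presumably guards against dividing by zero when costs are later scaled for the cost column. The scaling itself is not visible in this section.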

By Paul Gauthier, last updated