mirror of
https://github.com/Aider-AI/aider.git
synced 2025-06-01 18:25:00 +00:00
parent
3b08327792
commit
c1dc473ed8
2 changed files with 43 additions and 28 deletions
|
@ -831,7 +831,7 @@
|
||||||
date: 2025-04-12
|
date: 2025-04-12
|
||||||
versions: 0.81.3.dev
|
versions: 0.81.3.dev
|
||||||
seconds_per_case: 45.3
|
seconds_per_case: 45.3
|
||||||
total_cost: 6.3174
|
total_cost: 0 # incorrect: 6.3174
|
||||||
|
|
||||||
- dirname: 2025-03-29-05-24-55--chatgpt4o-mar28-diff
|
- dirname: 2025-03-29-05-24-55--chatgpt4o-mar28-diff
|
||||||
test_cases: 225
|
test_cases: 225
|
||||||
|
|
|
@ -1,66 +1,81 @@
|
||||||
---
|
---
|
||||||
title: Gemini 2.5 Pro Preview 0325 benchmark pricing
|
title: Gemini 2.5 Pro Preview 03-25 benchmark cost
|
||||||
excerpt: The low price reported for Gemini 2.5 Pro Preview 0325 appears to be correct.
|
excerpt: The $6.32 benchmark cost reported for Gemini 2.5 Pro Preview 03-25 was incorrect.
|
||||||
draft: false
|
draft: true
|
||||||
nav_exclude: true
|
nav_exclude: true
|
||||||
---
|
---
|
||||||
{% if page.date %}
|
{% if page.date %}
|
||||||
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
|
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
|
||||||
# Gemini 2.5 Pro Preview 0325 benchmark pricing
|
# Gemini 2.5 Pro Preview 03-25 benchmark pricing
|
||||||
|
|
||||||
The $6.32 cost reported in the leaderboard to run the aider polyglot benchmark on
|
The $6.32 cost reported to run the aider polyglot benchmark on
|
||||||
Gemini 2.5 Pro Preview 0325 was incorrect.
|
Gemini 2.5 Pro Preview 03-25 was incorrect.
|
||||||
The true cost was higher, possibly significantly so.
|
The true cost was higher, possibly significantly so.
|
||||||
|
This note shares the results of an audit and root cause analysis
|
||||||
|
relating to this error.
|
||||||
|
|
||||||
This note reviews and audits the original 0325 benchmark results to investigate the reported cost.
|
|
||||||
Two possible causes were identified, both related to the litellm package that
|
Two possible causes were identified, both related to the litellm package that
|
||||||
aider uses to connect to LLM APIs.
|
aider uses to connect to LLM APIs:
|
||||||
|
|
||||||
- The litellm model database had an incorrect price-per-token for output tokens in their database at the time of the benchmark. This does not appear to be a contributing factor to the incorrect benchmark cost.
|
- The litellm model database had an incorrect price-per-token for Gemini 2.5 Pro Preview 03-25 in their costs database.
|
||||||
- The litellm package was incorrectly excluding reasoning tokens from the token counts it reported back to aider. This appears to be the cause of the incorrect benchmark cost.
|
This does not appear to be a contributing factor to the incorrect benchmark cost.
|
||||||
|
- The litellm package was incorrectly excluding reasoning tokens from the token counts it reported to aider. This appears to be the cause of the incorrect benchmark cost.
|
||||||
|
|
||||||
The incorrect litellm database entry does not appear to have affected the aider benchmark costs.
|
The incorrect litellm database entry does not appear to have affected the aider benchmark costs.
|
||||||
Aider maintains and uses its own database of costs for some models, and it contained
|
Aider maintains and uses its own database of costs for some models, and it contained
|
||||||
the correct pricing at the time of the benchmark.
|
the correct pricing at the time of the benchmark.
|
||||||
Aider appears to have
|
Aider appears to have
|
||||||
loaded the correct cost data from its database and made use of it during the benchmark.
|
loaded the correct cost data from its database and made use of it during the benchmark.
|
||||||
Since litellm appears to have been excluding reasoning tokens from the token counts it reported,
|
|
||||||
aider underestimated the API costs.
|
|
||||||
|
|
||||||
Litellm fixed this issue on April 21, 2025 in
|
The version of litellm available at that time appears to have been
|
||||||
|
excluding reasoning tokens from the token counts it reported.
|
||||||
|
So even though aider had correct per-token pricing, it did not have the correct token counts
|
||||||
|
used during the benchmark.
|
||||||
|
This resulted in an underestimate of the benchmark costs.
|
||||||
|
|
||||||
|
Litellm fixed the token counting issue on April 21, 2025 in
|
||||||
commit [a7db0df](https://github.com/BerriAI/litellm/commit/a7db0df0434bfbac2b68ebe1c343b77955becb4b).
|
commit [a7db0df](https://github.com/BerriAI/litellm/commit/a7db0df0434bfbac2b68ebe1c343b77955becb4b).
|
||||||
This fix was released in litellm v1.67.1.
|
This fix was released in litellm v1.67.1.
|
||||||
Aider picked up this fix April 28, 2025 when it upgraded its litellm dependency
|
Aider picked up this fix April 28, 2025 when it upgraded its litellm dependency
|
||||||
from v1.65.7 to v1.67.4.post1
|
from v1.65.7 to v1.67.4.post1
|
||||||
in commit [9351f37](https://github.com/Aider-AI/aider/commit/9351f37)
|
in commit [9351f37](https://github.com/Aider-AI/aider/commit/9351f37).
|
||||||
That change shipped on May 5, 2025 in aider v0.82.3.
|
That dependency change shipped on May 5, 2025 in aider v0.82.3.
|
||||||
|
|
||||||
Unfortunately,
|
The incorrect cost has been removed from the leaderboard.
|
||||||
|
Unfortunately, the 03-25 version of Gemini 2.5 Pro Preview is no longer available,
|
||||||
|
so it is not possible to re-run the benchmark to obtain an accurate cost.
|
||||||
|
|
||||||
|
As a possibly relevant comparison, the newer 05-06 version of Gemini 2.5 Pro Preview
|
||||||
|
completed the benchmark at a cost of $41.17.
|
||||||
|
|
||||||
# Investigation
|
# Investigation
|
||||||
|
|
||||||
Every aider benchmark report contains the git commit hash of the aider repo state used to
|
Every aider benchmark report contains the git commit hash of the aider repository state used to
|
||||||
run the benchmark.
|
run the benchmark.
|
||||||
The benchmark run in question was built from
|
The
|
||||||
|
[benchmark run in question](https://github.com/Aider-AI/aider/blob/edbfec0ce4e1fe86735c915cb425b0d8636edc32/aider/website/_data/polyglot_leaderboard.yml#L814)
|
||||||
|
was built from
|
||||||
commit [0282574](https://github.com/Aider-AI/aider/commit/0282574).
|
commit [0282574](https://github.com/Aider-AI/aider/commit/0282574).
|
||||||
|
|
||||||
Additional runs of the benchmark from that build verified that the error in litellm's
|
Additional runs of the benchmark from that build verified that the error in litellm's
|
||||||
model cost database appears not to have been a factor:
|
model cost database appears not to have been a factor:
|
||||||
|
|
||||||
- The local model database correctly overrides the litellm database, which contained an incorrect token cost at the time.
|
- Aider's local model database correctly overrides the litellm database, which contained an incorrect token cost at the time.
|
||||||
- The correct pricing is loaded from aider's local model database and produces similar costs as the original run.
|
- The correct pricing is loaded from aider's local model database and produces similar (incorrect) costs as the original run.
|
||||||
- Updating aider's local model database with an absurdly high token cost resulted in an appropriately high benchmark cost report.
|
- Updating aider's local model database with an absurdly high token cost resulted in an appropriately high benchmark cost report, demonstrating that the local database costs were in effect.
|
||||||
|
|
||||||
|
This specific build of aider was then updated with various versions of litellm using `git biset`
|
||||||
|
to identify the first litellm commit where correct tokens counts were returned.
|
||||||
|
|
||||||
That build of aider was updated with various versions of litellm using `git biset`
|
|
||||||
to identify the litellm commit where the reasoning tokens were added to litellm's
|
|
||||||
token count reporting.
|
|
||||||
|
|
||||||
|
|
||||||
# Timeline
|
# Timeline
|
||||||
|
|
||||||
Below is the full timeline of git commits related to this issue in the aider and litellm repositories.
|
Below is the full timeline of git commits related to this issue in the aider and litellm repositories.
|
||||||
|
Each entry has a UTC timestamp, followed by the original literal timestamp obtained from the
|
||||||
|
relevant source.
|
||||||
|
|
||||||
- 2025-04-04 19:54:45 UTC (Sat Apr 5 08:54:45 2025 +1300)
|
- 2025-04-04 19:54:45 UTC (Sat Apr 5 08:54:45 2025 +1300)
|
||||||
- Correct value `"output_cost_per_token": 0.000010` for `gemini/gemini-2.5-pro-preview-03-25` added to `aider/resources/model-metadata.json`
|
- Correct value `"output_cost_per_token": 0.000010` for `gemini/gemini-2.5-pro-preview-03-25` added to `aider/resources/model-metadata.json`
|
||||||
|
@ -75,9 +90,9 @@ Below is the full timeline of git commits related to this issue in the aider and
|
||||||
- Commit [ac4f32f](https://github.com/BerriAI/litellm/commit/ac4f32f) in litellm.
|
- Commit [ac4f32f](https://github.com/BerriAI/litellm/commit/ac4f32f) in litellm.
|
||||||
|
|
||||||
- 2025-04-12 04:55:50 UTC (2025-04-12-04-55-50 UTC)
|
- 2025-04-12 04:55:50 UTC (2025-04-12-04-55-50 UTC)
|
||||||
- Benchmark performed
|
- Benchmark performed.
|
||||||
- Aider repo hash [0282574 recorded in benchmark results](https://github.com/Aider-AI/aider/blob/7fbeafa1cfd4ad83f7499417837cdfa6b16fe7a1/aider/website/_data/polyglot_leaderboard.yml#L814), without a "dirty" annotation, indicating that the benchmark was run on a clean checkout of the aider repo at commit [0282574](https://github.com/Aider-AI/aider/commit/0282574).
|
- Aider repo hash [0282574 recorded in benchmark results](https://github.com/Aider-AI/aider/blob/7fbeafa1cfd4ad83f7499417837cdfa6b16fe7a1/aider/website/_data/polyglot_leaderboard.yml#L814), without a "dirty" annotation, indicating that the benchmark was run on a clean checkout of the aider repo at commit [0282574](https://github.com/Aider-AI/aider/commit/0282574).
|
||||||
- Correct value `"output_cost_per_token": 0.000010` is in `aider/resources/model-metadata.json` at this commit [0282574](https://github.com/Aider-AI/aider/blob/0282574/aider/resources/model-metadata.json#L357)
|
- Correct value `"output_cost_per_token": 0.000010` is in `aider/resources/model-metadata.json` at this commit [0282574](https://github.com/Aider-AI/aider/blob/0282574/aider/resources/model-metadata.json#L357).
|
||||||
|
|
||||||
- 2025-04-12 15:06:39 UTC (Apr 12 08:06:39 2025 -0700)
|
- 2025-04-12 15:06:39 UTC (Apr 12 08:06:39 2025 -0700)
|
||||||
- Benchmark results added to aider repo.
|
- Benchmark results added to aider repo.
|
||||||
|
@ -95,4 +110,4 @@ Below is the full timeline of git commits related to this issue in the aider and
|
||||||
- 2025-04-28 14:53:20 UTC (Mon Apr 28 07:53:20 2025 -0700)
|
- 2025-04-28 14:53:20 UTC (Mon Apr 28 07:53:20 2025 -0700)
|
||||||
- Aider upgraded its litellm dependency from v1.65.7 to v1.67.4.post1, which included the reasoning token count fix.
|
- Aider upgraded its litellm dependency from v1.65.7 to v1.67.4.post1, which included the reasoning token count fix.
|
||||||
- Commit [9351f37](https://github.com/Aider-AI/aider/commit/9351f37) in aider.
|
- Commit [9351f37](https://github.com/Aider-AI/aider/commit/9351f37) in aider.
|
||||||
- This change shipped on May 5, 2025 in aider v0.82.3.
|
- This dependency change shipped on May 5, 2025 in aider v0.82.3.
|
||||||
|
|
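The post changed in this diff attributes the low reported cost to reasoning tokens being left out of the token counts, even though the per-token price was correct. A minimal sketch of that arithmetic follows. It uses the `output_cost_per_token` value of 0.000010 quoted in the timeline, but the token counts and the `output_cost` helper are hypothetical, and it assumes reasoning tokens are billed at the output-token rate.

```python
# Output price quoted in the diff above (USD per output token).
OUTPUT_COST_PER_TOKEN = 0.000010


def output_cost(visible_tokens: int, reasoning_tokens: int, include_reasoning: bool) -> float:
    """Hypothetical helper: cost of one response's output tokens.

    When include_reasoning is False, reasoning tokens are dropped from the
    count, mimicking the under-reporting described in the post.
    """
    billed = visible_tokens + (reasoning_tokens if include_reasoning else 0)
    return billed * OUTPUT_COST_PER_TOKEN


# Hypothetical token counts for a single benchmark response.
visible = 2_000
reasoning = 10_000

reported = output_cost(visible, reasoning, include_reasoning=False)
actual = output_cost(visible, reasoning, include_reasoning=True)
print(f"reported ${reported:.4f}, actual ${actual:.4f}")
```

With these made-up counts the reported figure is a fraction of the actual one, even though the price per token is identical in both calls. This matches the post's conclusion that the pricing data was not the cause.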