From 46f8b6dfe4592c39a50216db943b5fd3cdb6361e Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Thu, 26 Sep 2024 08:07:36 -0700
Subject: [PATCH] feat: Update benchmark data table in senior-junior blog post

---
 .../website/_posts/2024-09-26-senior-junior.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/aider/website/_posts/2024-09-26-senior-junior.md b/aider/website/_posts/2024-09-26-senior-junior.md
index 1458187f7..9894e7686 100644
--- a/aider/website/_posts/2024-09-26-senior-junior.md
+++ b/aider/website/_posts/2024-09-26-senior-junior.md
@@ -13,19 +13,16 @@ nav_exclude: true
 
 Here's a table containing the benchmark data for different model configurations:
 
-| Senior | Junior | Junior Edit Format | Pass Rate 2 (%) | Seconds per Case | Total Cost ($) |
+| Senior | Junior | Junior Edit Format | Pass Rate (%) | Average Time (sec) | Total Cost ($) |
 |-------|--------------|---------------------|-----------------|-------------------|----------------|
-| claude-3.5-sonnet | claude-3.5-sonnet | junior-diff | 80.5 | 25.1 | 4.95 |
-|-------|--------------|---------------------|-----------------|-------------------|----------------|
-| gpt-4o | gpt-4o | junior-diff | 75.2 | 18.2 | 6.09 |
-|-------|--------------|---------------------|-----------------|-------------------|----------------|
-| o1-mini | deepseek | junior-diff | 69.2 | 52.2 | 5.79 |
-| o1-mini | gpt-4o | junior-diff | 70.7 | 23.7 | 9.32 |
-|-------|--------------|---------------------|-----------------|-------------------|----------------|
-| o1-preview | claude-3.5-sonnet | junior-diff | 82.7 | 44.9 | 37.62 |
-| o1-preview | deepseek | junior-diff | 80.5 | 73.2 | 35.79 |
 | o1-preview | deepseek | junior-whole | 85.0 | 67.4 | 35.32 |
+| o1-preview | claude-3.5-sonnet | junior-diff | 82.7 | 44.9 | 37.62 |
 | o1-preview | gpt-4o | junior-diff | 80.5 | 42.3 | 39.38 |
+| o1-preview | deepseek | junior-diff | 80.5 | 73.2 | 35.79 |
+| claude-3.5-sonnet | claude-3.5-sonnet | junior-diff | 80.5 | 25.1 | 4.95 |
+| gpt-4o | gpt-4o | junior-diff | 75.2 | 18.2 | 6.09 |
+| o1-mini | gpt-4o | junior-diff | 70.7 | 23.7 | 9.32 |
+| o1-mini | deepseek | junior-diff | 69.2 | 52.2 | 5.79 |
 |-------|--------------|---------------------|-----------------|-------------------|----------------|
 
 This table provides a comparison of different model configurations, showing their performance in terms of pass rate, processing time, and cost.