Commit graph

52 commits

Author SHA1 Message Date
Paul Gauthier (aider)
b3e29ab20e style: Apply linter formatting to benchmark code 2024-11-20 20:02:52 -08:00
Paul Gauthier (aider)
5504ac535b feat: Add simplified model names for legend labels 2024-11-20 20:02:48 -08:00
Paul Gauthier (aider)
4b3dd7f4ea style: Apply linter formatting to over_time.py 2024-11-20 19:59:43 -08:00
Paul Gauthier (aider)
8edf9540d5 feat: Add legend to plot and remove point labels 2024-11-20 19:59:38 -08:00
Paul Gauthier
1c62ecd1b5 style: Adjust x-axis label rotation angle for better readability 2024-11-20 19:59:36 -08:00
Paul Gauthier
7cf3d9f3ce style: Increase annotation font size in benchmark plot 2024-11-20 19:45:42 -08:00
Paul Gauthier
9b5a703307 updated models-over-time 2024-11-20 19:40:59 -08:00
Paul Gauthier (aider)
370993cbed style: Rotate point labels by 45 degrees in benchmark plot 2024-11-20 18:47:30 -08:00
Paul Gauthier
ddc538cdfa refactor: Adjust plot figure size and y-axis limits for better visualization 2024-11-20 18:47:28 -08:00
Paul Gauthier (aider)
062dc43c87 style: Make graph aspect ratio square 2024-11-20 18:43:18 -08:00
Paul Gauthier (aider)
7d9b986c04 feat: Add cyan color and line for Mistral models in visualization 2024-11-20 18:38:06 -08:00
Paul Gauthier
bd2b9a12ed style: Change Qwen model color from purple to darkblue 2024-11-20 18:38:04 -08:00
Paul Gauthier (aider)
2b55707738 feat: Add purple color and line for Qwen models in visualization 2024-11-20 18:35:25 -08:00
Paul Gauthier (aider)
093540507e feat: Add pink color and line for Haiku models in benchmark visualization 2024-11-20 18:33:54 -08:00
Paul Gauthier (aider)
8f1dcfda07 feat: Add brown color for DeepSeek models in benchmark visualization 2024-11-20 18:31:46 -08:00
Paul Gauthier
16b319174b refactor: Simplify model color detection logic for Sonnet models 2024-11-20 18:31:44 -08:00
Paul Gauthier (aider)
35115f5707 feat: Add orange color for Claude 3 Sonnet models in benchmark visualization 2024-11-20 18:30:09 -08:00
Paul Gauthier
822a8ab671 remove gpt-4o-mini from the gpt-4 trendline 2024-08-15 09:52:21 -07:00
Paul Gauthier (aider)
5ccdebf2c0 refactor: Extract color assignment logic into a separate function 2024-08-15 09:50:50 -07:00
Paul Gauthier (aider)
0a3c6bfbe7 feat: Change blue color to light blue in plot_over_time function 2024-08-14 06:29:48 -07:00
Paul Gauthier (aider)
d2b4846b95 feat: Replace orange color with purple for "-4o" models 2024-08-14 06:29:13 -07:00
Paul Gauthier (aider)
fb0b348bec fix: Remove unused blue_points variable 2024-08-14 06:28:28 -07:00
Paul Gauthier (aider)
a7290be843 style: Apply linter formatting changes 2024-08-14 06:27:51 -07:00
Paul Gauthier (aider)
1cdbc76974 feat: Connect model family lines in over_time plot 2024-08-14 06:27:48 -07:00
Paul Gauthier
714fd45f4d fix: Update color logic and font size in over_time.py 2024-08-14 06:27:47 -07:00
Paul Gauthier (aider)
1f6cadcc66 style: Refactor conditional logic in color assignment 2024-08-14 06:22:51 -07:00
Paul Gauthier (aider)
c4f70d81b7 feat: add new color for all "-4o-" models except "gpt-4o-mini" 2024-08-14 06:22:48 -07:00
Paul Gauthier (aider)
1f59687e9d style: Format code with linter 2024-08-14 06:21:48 -07:00
Paul Gauthier (aider)
d8c8c51156 The commit message for these changes would be:
feat: Improve graph visualization and add debugging

The changes made in this commit include:

1. Adjusting the y-axis limit to 100 to accommodate the higher pass rate values.
2. Rotating the x-axis labels for better readability.
3. Adding debug print statements to track the progress of figure generation and display.
4. Increasing the figure size for better visibility.
5. Adding additional debugging to ensure the data is being plotted correctly.

These improvements should help with the visualization and debugging of the graph generation process.
2024-08-14 06:21:45 -07:00
Paul Gauthier (aider)
d94d5aa3fa style: format code according to linter rules 2024-08-14 06:20:36 -07:00
Paul Gauthier (aider)
d2479f30f7 fix: Add debug prints and check for empty data in over_time.py 2024-08-14 06:20:32 -07:00
Paul Gauthier
56975d02a1 fix: Update path to edit_leaderboard.yml file 2024-08-14 06:20:31 -07:00
Paul Gauthier
fcfc05e2d3 aider: Make the x and y axis labels also #555.
# Aider chat conversation:

USER: make the x and y axis labels also #555

ASSISTANT: Here are the changes to make the x and y axis labels also #555 in `benchmark/swe_bench_lite.py` and `benchmark/over_time.py`:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    ax.set_ylabel("Pass rate (%)", fontsize=18)
    ax.set_title("SWE Bench Lite", fontsize=20)
    ax.set_ylim(0, 30)
    plt.xticks(rotation=45, ha="right", fontsize=16)
    plt.tight_layout(pad=3.0)
    plt.savefig("swe_bench_lite.jpg")
    plt.savefig("swe_bench_lite.svg")
    imgcat(fig)
=======
    ax.set_ylabel("Pass rate (%)", fontsize=18, color="#555")
    ax.set_title("SWE Bench Lite", fontsize=20)
    ax.set_ylim(0, 30)
    plt.xticks(rotation=45, ha="right", fontsize=16)
    plt.tight_layout(pad=3.0)
    plt.savefig("swe_bench_lite.jpg")
    plt.savefig("swe_bench_lite.svg")
    imgcat(fig)
    ax.xaxis.label.set_color("#555")
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    ax.set_xlabel("Model release date", fontsize=18)
    ax.set_ylabel("Aider code editing benchmark,\npercent completed correctly", fontsize=18)
    ax.set_title("LLM code editing skill by model release date", fontsize=20)
    ax.set_ylim(0, 30)
    plt.xticks(fontsize=14)
    plt.tight_layout(pad=3.0)
    plt.savefig("tmp_over_time.png")
    plt.savefig("tmp_over_time.svg")
    imgcat(fig)
=======
    ax.set_xlabel("Model release date", fontsize=18, color="#555")
    ax.set_ylabel("Aider code editing benchmark,\npercent completed correctly", fontsize=18, color="#555")
    ax.set_title("LLM code editing skill by model release date", fontsize=20)
    ax.set_ylim(0, 30)
    plt.xticks(fontsize=14)
    plt.tight_layout(pad=3.0)
    plt.savefig("tmp_over_time.png")
    plt.savefig("tmp_over_time.svg")
    imgcat(fig)
>>>>>>> REPLACE
```
2024-05-22 15:29:58 -07:00
Paul Gauthier
1f7ff54717 aider: Set font color using plt.rcParams to resolve KeyError for unrecognized key "font.color".
# Aider chat conversation:

USER: KeyError: 'Unrecognized key "font.color" for group "font" and name "color"'

ASSISTANT: To fix the `KeyError` related to the unrecognized key "font.color", we need to set the font color using `plt.rcParams` instead of `rc`.

Here are the changes:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10, "color": "#444444"})
=======
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
    plt.rcParams["text.color"] = "#444444"
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10, "color": "#444444"})
=======
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
    plt.rcParams["text.color"] = "#444444"
>>>>>>> REPLACE
```
2024-05-22 15:28:44 -07:00
Paul Gauthier
81ac2e18cc aider: Changed font color to dark grey in benchmark/swe_bench_lite.py and benchmark/over_time.py.
# Aider chat conversation:

USER: make the font color dark grey not black

ASSISTANT: Here are the changes to make the font color dark grey instead of black in `benchmark/swe_bench_lite.py` and `benchmark/over_time.py`:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
=======
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10, "color": "#444444"})
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
=======
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10, "color": "#444444"})
>>>>>>> REPLACE
```
2024-05-22 15:28:30 -07:00
Paul Gauthier
19e7823859 aider: Add more whitespace around the perimeter.
# Aider chat conversation:

USER: add more whitespace around the perimeter

ASSISTANT: Here are the changes to add more whitespace around the perimeter in `benchmark/swe_bench_lite.py` and `benchmark/over_time.py`:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    plt.tight_layout()
=======
    plt.tight_layout(pad=3.0)
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    plt.tight_layout()
=======
    plt.tight_layout(pad=3.0)
>>>>>>> REPLACE
```
2024-05-22 15:18:45 -07:00
Paul Gauthier
a6ae4cab06 aider: Make the model labels on the x axis larger.
# Aider chat conversation:

USER: make the model labels on the x axis larger

ASSISTANT: Here are the changes to make the model labels on the x-axis larger in `benchmark/swe_bench_lite.py` and `benchmark/over_time.py`:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    plt.xticks(rotation=45, ha='right')
=======
    plt.xticks(rotation=45, ha='right', fontsize=14)
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    plt.tight_layout()
=======
    plt.xticks(fontsize=14)
    plt.tight_layout()
>>>>>>> REPLACE
```
2024-05-22 15:18:20 -07:00
Paul Gauthier
5ef96a32e2 aider: Set the y-axis to go to 30.
# Aider chat conversation:

USER: make the y-axis go to 30

ASSISTANT: Here are the changes to make the y-axis go to 30 in `benchmark/swe_bench_lite.py`:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    plt.xticks(rotation=45, ha='right')
=======
    ax.set_ylim(0, 30)
    plt.xticks(rotation=45, ha='right')
>>>>>>> REPLACE
```

Here are the changes to make the y-axis go to 30 in `benchmark/over_time.py`:

benchmark/over_time.py
```python
<<<<<<< SEARCH
    plt.tight_layout()
=======
    ax.set_ylim(0, 30)
    plt.tight_layout()
>>>>>>> REPLACE
```
2024-05-22 15:14:38 -07:00
Paul Gauthier
74c0df8df8 svg 2024-05-15 11:54:00 -07:00
Paul Gauthier
c8d2649bb8 Update model names to remove parentheses and extra whitespace in benchmark plot. 2024-05-15 11:36:37 -07:00
Paul Gauthier
445c73267a Updated plot dimensions and axis labels for better visualization in benchmark over time. 2024-05-15 11:29:44 -07:00
Paul Gauthier
77ba518a1a aider: Made the border of the graph thinner and lighter by adjusting the spines properties of the axes.
# Aider chat conversation:

USER: make the border of the graph thinner and lighter

ASSISTANT: To make the border of the graph thinner and lighter, we can adjust the `spines` properties of the axes.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
=======
    for spine in ax.spines.values():
        spine.set_edgecolor('#DDDDDD')
        spine.set_linewidth(0.5)

    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
>>>>>>> REPLACE
```
2024-05-15 11:21:01 -07:00
Paul Gauthier
a018fdfe2c aider: Make the dots bigger.
# Aider chat conversation:

USER: make the dots bigger

ASSISTANT: To make the dots bigger, we can adjust the `s` parameter in the `scatter` function, which controls the size of the dots.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
    ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
=======
    ax.scatter(dates, pass_rates, c=colors, alpha=0.5, s=100)
>>>>>>> REPLACE
```
2024-05-15 11:20:05 -07:00
Paul Gauthier
3f1b86a1cc aider: Make the axes labels font larger.
# Aider chat conversation:

USER: I did `git reset --hard HEAD~1` to discard the last edits. Please wait for further instructions before attempting that change again. Feel free to ask relevant questions about why the changes were reverted.

ASSISTANT: Understood. If you have any specific instructions or further changes you'd like to make, please let me know!

USER: make the axes labels font larger

ASSISTANT: To make the axes labels font larger, we can adjust the `fontsize` parameter in the `set_xlabel` and `set_ylabel` methods.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
    ax.set_xlabel('Model release date')
    ax.set_ylabel('Aider code editing benchmark,\npercent completed correctly')
=======
    ax.set_xlabel('Model release date', fontsize=14)
    ax.set_ylabel('Aider code editing benchmark,\npercent completed correctly', fontsize=14)
>>>>>>> REPLACE
```
2024-05-15 11:14:48 -07:00
Paul Gauthier
05e34783c2 Update plot labels and annotations for better readability and clarity. 2024-05-15 11:13:17 -07:00
Paul Gauthier
75fa061016 aider: Adjust the position of labels to prevent overlap with dots.
# Aider chat conversation:

USER: make the labels not overlap the dots

ASSISTANT: To make the labels not overlap the dots, we can adjust the position of the labels slightly using the `xytext` parameter in the `annotate` function.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
=======
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75,
                    xytext=(5, 5), textcoords='offset points')
>>>>>>> REPLACE
```
2024-05-15 11:06:32 -07:00
Paul Gauthier
49eb72cc20 Updated image file format from SVG to PNG and added imgcat library for displaying the plot. 2024-05-15 11:06:31 -07:00
Paul Gauthier
8d670f187c aider: Used the colorscheme and fonts from the graphs in plots.py in over_time.py.
# Aider chat conversation:

USER: use the colorscheme and fonts from the graphs in plots.py

ASSISTANT: To use the colorscheme and fonts from the graphs in `plots.py`, we need to:

1. Import the necessary settings from `plots.py`.
2. Apply these settings to the graph in `over_time.py`.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
import yaml
from datetime import datetime
=======
import yaml
from datetime import datetime
from matplotlib import rc
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    plt.figure(figsize=(10, 6))
=======
    plt.rcParams["hatch.linewidth"] = 0.5
    plt.rcParams["hatch.color"] = "#444444"

    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.grid(axis="y", zorder=0, lw=0.2)
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
    plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
=======
    ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
>>>>>>> REPLACE
```

benchmark/over_time.py
```python
<<<<<<< SEARCH
        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    plt.xlabel('Release Date')
    plt.ylabel('Pass Rate 2')
    plt.title('Model Performance Over Time')
    plt.grid(True)
    plt.tight_layout()
    plt.show()
=======
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    ax.set_xlabel('Release Date')
    ax.set_ylabel('Pass Rate 2')
    ax.set_title('Model Performance Over Time')
    plt.tight_layout()
    plt.savefig("tmp_over_time.svg")
    imgcat(fig)
>>>>>>> REPLACE
```
2024-05-15 11:04:56 -07:00
Paul Gauthier
85d9d8430f aider: Color the "gpt-3.5" models green in the plotting code.
# Aider chat conversation:

USER: color the gpt-3.5 models a new color

ASSISTANT: To color the "gpt-3.5" models a new color, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-3.5" in the model name and assign a different color.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
    colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
=======
    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
>>>>>>> REPLACE
```
2024-05-15 11:04:14 -07:00
Paul Gauthier
0d11357980 aider: Color the "gpt-4" models differently.
# Aider chat conversation:

USER: color the "gpt-4" models differently

ASSISTANT: To color the "gpt-4" models differently, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-4" in the model name and assign a different color.

Here are the changes:

benchmark/over_time.py
```python
<<<<<<< SEARCH
    plt.scatter(dates, pass_rates, c='blue', alpha=0.5)
=======
    colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
    plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
>>>>>>> REPLACE
```

This change will color the points red if the model name contains "gpt-4", otherwise, it will color them blue.
2024-05-15 11:03:47 -07:00