Commit graph

300 commits

Author SHA1 Message Date
Paul Gauthier (aider)
1f59687e9d style: Format code with linter 2024-08-14 06:21:48 -07:00
Paul Gauthier (aider)
d8c8c51156 The commit message for these changes would be:
feat: Improve graph visualization and add debugging

The changes made in this commit include:

1. Adjusting the y-axis limit to 100 to accommodate the higher pass rate values.
2. Rotating the x-axis labels for better readability.
3. Adding debug print statements to track the progress of figure generation and display.
4. Increasing the figure size for better visibility.
5. Adding additional debugging to ensure the data is being plotted correctly.

These improvements should help with the visualization and debugging of the graph generation process.
2024-08-14 06:21:45 -07:00
Paul Gauthier (aider)
d94d5aa3fa style: format code according to linter rules 2024-08-14 06:20:36 -07:00
Paul Gauthier (aider)
d2479f30f7 fix: Add debug prints and check for empty data in over_time.py 2024-08-14 06:20:32 -07:00
Paul Gauthier
56975d02a1 fix: Update path to edit_leaderboard.yml file 2024-08-14 06:20:31 -07:00
Paul Gauthier
060c8ff89a override dotenv 2024-08-13 18:06:00 -07:00
Paul Gauthier
139f7992cb do not pass pretty to coder 2024-08-13 17:43:41 -07:00
Your Name
e799e89ff4 install aider with -e in benchmark docker 2024-07-28 17:23:25 -03:00
Paul Gauthier
e9bde684ae updated benchmark docker to use [dev] 2024-07-14 17:24:51 +01:00
Paul Gauthier
a8d4be441e Update benchmark dockerfile 2024-07-11 08:54:47 +01:00
Paul Gauthier
9fa70b4b05 Install the cpu-only version of torch 2024-07-09 09:37:10 +01:00
Paul Gauthier
3ff0c7ce35 copy 2024-06-02 09:28:56 -07:00
Paul Gauthier
2cb9a8ddc8 copy 2024-06-01 16:10:55 -07:00
Paul Gauthier
47a3cb8adf copy 2024-06-01 15:05:29 -07:00
Paul Gauthier
2febc663f3 copy 2024-06-01 14:48:12 -07:00
Paul Gauthier
bc4d39ddf2 copy 2024-06-01 11:46:53 -07:00
Paul Gauthier
26edbcc8f1 copy 2024-06-01 11:26:16 -07:00
Paul Gauthier
fcc62ebffc copy 2024-06-01 07:34:30 -07:00
Paul Gauthier
07d36b22c0 copy 2024-06-01 07:23:25 -07:00
Paul Gauthier
1b539e4466 rename 2024-06-01 07:04:12 -07:00
Paul Gauthier
7623b8e2e6 copy 2024-06-01 07:03:58 -07:00
Paul Gauthier
6a2d7e08c2 copy 2024-05-31 13:32:20 -07:00
Paul Gauthier
2c6b472946 copy 2024-05-31 11:28:38 -07:00
Paul Gauthier
676b644df4 aider: Added more whitespace below the graph to give more room for the note.
# Aider chat conversation:

USER: add more white space below the graph, to give more room for the note

ASSISTANT: Here is the change to add more white space below the graph:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    plt.tight_layout(pad=3.0)
=======
    plt.tight_layout(pad=3.0, rect=[0, 0.05, 1, 1])
>>>>>>> REPLACE
```

This adjusts the `tight_layout` call to reserve 5% of the figure height at the bottom for the note, using the `rect` parameter. The `rect` tuple specifies the left, bottom, right and top of the area the subplots should occupy as a fraction of figure width and height.
2024-05-31 11:26:27 -07:00
Paul Gauthier
38ac9de678 use underscores in graph fnames 2024-05-31 09:44:33 -07:00
Paul Gauthier
0120d434ff added draft article 2024-05-31 09:33:31 -07:00
Paul Gauthier
639121bb45 full swe bench graph 2024-05-31 08:16:49 -07:00
Paul Gauthier
20ee2bbf36 aider: Increase white space at the bottom of the plot to make room for the note by adjusting the plt.tight_layout call to include more padding.
# Aider chat conversation:

USER: add some white space at the bottom to make more room for the note

ASSISTANT: To add more white space at the bottom of the plot to make room for the note, we can adjust the `plt.tight_layout` call to include more padding.

Here is the *SEARCH/REPLACE* block:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    plt.tight_layout(pad=3.0)
=======
    plt.tight_layout(pad=5.0)
>>>>>>> REPLACE
```
2024-05-31 07:44:28 -07:00
Paul Gauthier
8bcebfa475 Added a note at the bottom of the graph in swe_bench_lite.py. 2024-05-31 07:44:27 -07:00
Paul Gauthier
ffec28cc15 aider: Added note at the bottom of the graph explaining the number of instances processed by the agent.
# Aider chat conversation:

USER: no, add it to the bottom of the graph

ASSISTANT: To add a note at the bottom of the graph explaining that (570) and (2294) refer to the number of instances processed by the agent, we can use the `text` function from Matplotlib to place the note at the bottom of the plot.

Here is the *SEARCH/REPLACE* block to make this change:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
ax.xaxis.label.set_color(font_color)
=======
ax.xaxis.label.set_color(font_color)

# Add note at the bottom of the graph
note = "(570) and (2294) refer to the number of instances that were processed by the agent."
plt.figtext(0.5, 0.01, note, wrap=True, horizontalalignment='center', fontsize=10, color=font_color)
>>>>>>> REPLACE
```
2024-05-31 07:42:33 -07:00
Paul Gauthier
d017b9b23f aider: Add note explaining that (570) and (2294) refer to the number of instances processed by the agent.
# Aider chat conversation:

USER: add a note at the bottom that explains that (570) and (2294) refer to the number of instances that were processed by the agent

ASSISTANT: Here is the change to add a note at the bottom of the script explaining that (570) and (2294) refer to the number of instances processed by the agent.

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
ax.xaxis.label.set_color(font_color)
=======
ax.xaxis.label.set_color(font_color)

# Note: (570) and (2294) refer to the number of instances that were processed by the agent.
>>>>>>> REPLACE
```
2024-05-31 07:42:18 -07:00
Paul Gauthier
4d02a8aa58 Refactored plot_swe_bench_lite function to dynamically handle file names and titles. 2024-05-31 07:42:17 -07:00
Paul Gauthier
6966936316 Updated graph to use pass@1 unhinted results from other agents 2024-05-30 15:29:33 -07:00
Paul Gauthier
d5ca9b3696 updated autocoderover results 2024-05-30 09:45:02 -07:00
Paul Gauthier
689786a875 copy 2024-05-25 12:14:48 -07:00
Paul Gauthier
c86e258551 added proper data 2024-05-24 15:05:31 -07:00
Paul Gauthier
9d20b5f43a use swe-bench-lite.txt 2024-05-23 13:56:17 -07:00
Paul Gauthier
b8feaed5f9 aider: Reversed the order of the bars from short to tall.
# Aider chat conversation:

USER: reverse the order of the bars, so they go short->tall

ASSISTANT: To reverse the order of the bars from short to tall, we can reverse the order of the `models` and `pass_rates` lists before plotting:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    models = []
    pass_rates = []

    for line in lines:
        if line.strip():
            pass_rate, model = line.split("%")
            model = model.strip()
            model = model.replace("|", "\n")
            models.append(model.strip())
            pass_rates.append(float(pass_rate.strip()))
=======
    models = []
    pass_rates = []

    for line in lines:
        if line.strip():
            pass_rate, model = line.split("%")
            model = model.strip()
            model = model.replace("|", "\n")
            models.insert(0, model.strip())
            pass_rates.insert(0, float(pass_rate.strip()))
>>>>>>> REPLACE
```

This change uses `insert(0, ...)` instead of `append()` to add each model and pass rate to the beginning of their respective lists. This will effectively reverse the order, so the bars will be plotted from shortest to tallest.
2024-05-23 13:12:57 -07:00
Paul Gauthier
c0f5cbb9b5 copy 2024-05-23 10:03:43 -07:00
Paul Gauthier
5ff4deb12e aider: Increased font weight for "Aider" labels to make them more visually distinct.
# Aider chat conversation:

USER: there's no visible difference bwteen the bold and light fonts?

ASSISTANT: To make the bold font more visible, we can increase the font weight. Here are the changes:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
        fontweight = "bold" if "Aider" in model else "normal"
=======
        fontweight = 800 if "Aider" in model else "normal"
>>>>>>> REPLACE
```

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
            label.set_fontweight("bold")
=======
            label.set_fontweight(800)
>>>>>>> REPLACE
```

This sets the font weight to 800 for the "Aider" labels, which should make them bolder and more visibly distinct from the other labels.
2024-05-23 10:01:57 -07:00
Paul Gauthier
a03e36f7f7 Increased the font size of "Aider" labels in the SWE benchmark plot. 2024-05-23 10:01:55 -07:00
Paul Gauthier
ed43cd278a aider: Ensure bold and light fonts are visibly different by using "normal" instead of "light" for the non-Aider bars.
# Aider chat conversation:

USER: there's no visible difference bwteen the bold and light fonts?

ASSISTANT: To ensure that the bold and light fonts are visibly different, we can use "normal" instead of "light" for the non-Aider bars. This should provide a more noticeable contrast.

Here are the *SEARCH/REPLACE* blocks:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
        fontweight = "bold" if "Aider" in model else "light"
=======
        fontweight = "bold" if "Aider" in model else "normal"
>>>>>>> REPLACE
```

Let's make this change and see if it improves the visibility of the font differences.
2024-05-23 10:01:14 -07:00
Paul Gauthier
6d75a10c8b Changed font weight to "light" for model "Aider" in plot_swe_bench_lite. 2024-05-23 10:01:12 -07:00
Paul Gauthier
d37bace5d7 aider: Make the percentages on top of Aider's bars and the Aider x labels bold.
# Aider chat conversation:

USER: make the percentages on top of aiders bars bold. make the aider x labels bold too

ASSISTANT: To make the percentages on top of Aider's bars bold and the Aider x labels bold, we need to modify the text properties in the plotting section.

Here are the *SEARCH/REPLACE* blocks:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
        if "Aider" in model:
            y = yval + 0.75
            va = "bottom"
        else:
            y = yval - 1.25
            va = "top"

        ax.text(
            bar.get_x() + bar.get_width() / 2,
            y,
            f"{yval}%",
            ha="center",
            va=va,
            fontsize=14,
            # alpha=0.75,
        )
=======
        y = yval + 0.75 if "Aider" in model else yval - 1.25
        va = "bottom" if "Aider" in model else "top"
        fontweight = "bold" if "Aider" in model else "normal"

        ax.text(
            bar.get_x() + bar.get_width() / 2,
            y,
            f"{yval}%",
            ha="center",
            va=va,
            fontsize=14,
            fontweight=fontweight,
        )
>>>>>>> REPLACE
```

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    plt.xticks(
        # rotation=45,
        # ha="center",
        fontsize=16,
        color=font_color,
    )
=======
    xticks = plt.xticks(
        fontsize=16,
        color=font_color,
    )
    for label in xticks[1]:
        if "Aider" in label.get_text():
            label.set_fontweight("bold")
>>>>>>> REPLACE
```
2024-05-23 09:56:04 -07:00
Paul Gauthier
cf1140ba71 Updated colors and alpha values for Aider models in SWE benchmark plot. 2024-05-23 09:56:03 -07:00
Paul Gauthier
484f625b2c copy 2024-05-23 09:49:47 -07:00
Paul Gauthier
ee7dcd5071 aider: Adjust alpha values for bars based on model name containing "Aider".
# Aider chat conversation:

USER: can we use more alpha for the aider bars and less alpha for the other bars?

ASSISTANT: To adjust the alpha values for the bars, we need to modify the section where the bars are created and set their alpha values based on whether the model name contains "Aider".

Here is the *SEARCH/REPLACE* block:

benchmark/swe_bench_lite.py
```python
<<<<<<< SEARCH
    bars = ax.bar(models, pass_rates, color=colors, alpha=0.75, zorder=3)
=======
    bars = []
    for model, pass_rate, color in zip(models, pass_rates, colors):
        alpha = 0.9 if "Aider" in model else 0.5
        bar = ax.bar(model, pass_rate, color=color, alpha=alpha, zorder=3)
        bars.append(bar[0])
>>>>>>> REPLACE
```
2024-05-23 09:48:53 -07:00
Paul Gauthier
15669b7ae8 copy 2024-05-23 08:28:42 -07:00
Paul Gauthier
2f3baf7cdd copy 2024-05-23 08:04:37 -07:00
Paul Gauthier
15c228097b copy 2024-05-23 07:45:54 -07:00