Paul Gauthier 2024-05-16 18:12:24 -07:00
parent 8c0bfeb392
commit 47eebdb716


@ -10,20 +10,24 @@ highlight_image: /assets/models-over-time.png
I recently wanted to draw a graph showing how LLM code editing skill has been
changing over time as new models have been released by OpenAI, Anthropic and others.
I have all the
[data in a yaml file](https://github.com/paul-gauthier/aider/blob/main/_data/edit_leaderboard.yml) that is used to render
[aider's LLM leaderboards](https://aider.chat/docs/leaderboards/).
Below is the aider chat transcript, which shows:
- I launch aider with the yaml file, a file with other plots I've done recently (so GPT can crib the style) and an empty file called `over_time.py`.
- Then I ask GPT to draw the scatterplot I want.
- I run the resulting script and share the output with GPT so it can fix a small bug.
- I ask it to color the points for GPT-4 and GPT-3.5 family models differently, to better see trends within those model families.
- I add a PNG of the graph to the chat, so GPT can see the current graph and understand why I want it to move the point labels so they don't overlap the points.
- I work through a series of other small style changes, like changing fonts and the graph border.
In the end I have the graph, and I also have the Python code that generates it in my repo,
so I can easily regenerate the graph whenever I add new entries to the yaml data file.
## Aider chat transcript
<div class="chat-transcript" markdown="1">
> Aider v0.35.1-dev
@ -85,6 +89,9 @@ plot_over_time('_data/edit_leaderboard.yml')
> Applied edit to benchmark/over_time.py
> Commit 49ac9ca aider: Added scatter plot functionality for YAML entries with a `released` date and `pass_rate_2`.
</div>
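The scatter plot in this first turn only uses leaderboard entries that have both a `released` date and a `pass_rate_2` score. Below is a minimal sketch of that selection step, assuming the YAML has already been loaded into a list of dicts (for example with PyYAML's `yaml.safe_load`). The `model` key, the `plottable_points` helper and the sample entries are hypothetical, not taken from aider's `over_time.py`.

```python
from datetime import date


def plottable_points(entries):
    """Return (released, pass_rate_2, model) tuples for entries that have both fields.

    Each entry is assumed to be a dict; `model` is a hypothetical key for the
    model name. Entries missing either field are skipped rather than plotted.
    """
    points = []
    for entry in entries:
        released = entry.get("released")
        pass_rate = entry.get("pass_rate_2")
        if released is None or pass_rate is None:
            continue
        points.append((released, pass_rate, entry.get("model", "")))
    return points


# Hypothetical sample data in the shape yaml.safe_load might produce.
sample = [
    {"model": "model-b", "released": date(2024, 3, 1), "pass_rate_2": 60.0},
    {"model": "model-a", "released": date(2023, 6, 13), "pass_rate_2": 50.0},
    {"model": "model-c", "pass_rate_2": 55.0},  # no release date, so it is skipped
]

for released, rate, model in plottable_points(sample):
    print(released, rate, model)
```

Skipping incomplete entries instead of raising lets a model be added to the yaml file before its release date is filled in without breaking the script.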
<div class="chat-transcript" markdown="1">
#### /run python benchmark/over_time.py
> Traceback (most recent call last):
File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
@ -118,6 +125,9 @@ benchmark/over_time.py
> Applied edit to benchmark/over_time.py
> Commit af2f4b7 aider: Use the `released` field directly in the plot instead of parsing it with `datetime.strptime`.
</div>
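The traceback is truncated in this view, so its exact message isn't shown. One plausible cause, stated as an assumption rather than something the transcript confirms: PyYAML's `safe_load` turns an unquoted ISO date such as `2024-05-13` into a `datetime.date` object, and `datetime.strptime` only accepts strings, so it raises a `TypeError`. That would explain why the fix was to use the `released` field directly. A defensive helper that accepts either form might look like this; `as_date` is a hypothetical name, not part of aider.

```python
from datetime import date, datetime


def as_date(value):
    """Normalize a `released` value to a datetime.date.

    PyYAML's safe_load already returns datetime.date for unquoted ISO dates,
    so those pass through unchanged; strings are parsed as YYYY-MM-DD.
    """
    # datetime is a subclass of date, so check for it first and drop the time part.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y-%m-%d").date()


# strptime rejects non-string input, which is the suspected failure mode.
try:
    datetime.strptime(date(2024, 5, 13), "%Y-%m-%d")
except TypeError as exc:
    print("TypeError:", exc)

print(as_date(date(2024, 5, 13)))  # already a date, returned as-is
print(as_date("2024-05-13"))       # parsed from a string
```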
<div class="chat-transcript" markdown="1">
#### color the "gpt-4" models differently
To color the "gpt-4" models differently, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-4" in the model name and assign a different color.
@ -139,6 +149,9 @@ This change will color the points red if the model name contains "gpt-4", otherw
> Applied edit to benchmark/over_time.py
> Commit 0d11357 aider: Color the "gpt-4" models differently.
</div>
<div class="chat-transcript" markdown="1">
#### color the gpt-3.5 models a new color
To color the "gpt-3.5" models a new color, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-3.5" in the model name and assign a different color.
@ -157,6 +170,9 @@ benchmark/over_time.py
> Applied edit to benchmark/over_time.py
> Commit 85d9d84 aider: Color the "gpt-3.5" models green in the plotting code.
</div>
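The two coloring turns amount to a substring check on the model name. Going by the commit messages, "gpt-4" models are drawn red and "gpt-3.5" models green. The fallback color for all other models is cut off in the hunk header, so the `"tab:blue"` default below is a hypothetical placeholder, as is the `model_color` helper name. Lowercasing the name first is an extra choice made here so that spellings like "GPT-4o" also match.

```python
def model_color(model_name, default="tab:blue"):
    """Pick a point color from the model family named in `model_name`.

    "gpt-4" -> red and "gpt-3.5" -> green follow the transcript's commit
    messages; `default` is a placeholder color for every other model.
    """
    name = model_name.lower()
    if "gpt-4" in name:
        return "red"
    if "gpt-3.5" in name:
        return "green"
    return default


models = ["gpt-4-0613", "gpt-3.5-turbo-0125", "claude-3-opus-20240229"]
colors = [model_color(m) for m in models]
print(colors)
```

The resulting list can be passed as `c=colors` to matplotlib's `Axes.scatter`, which accepts one color spec per point.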
<div class="chat-transcript" markdown="1">
#### use the colorscheme and fonts from the other graphs in plots.py
To use the colorscheme and fonts from the graphs in `plots.py`, we need to:
@ -229,6 +245,9 @@ benchmark/over_time.py
> Commit 8d670f1 aider: Used the colorscheme and fonts from the graphs in `plots.py` in `over_time.py`.
</div>
<div class="chat-transcript" markdown="1">
#### /add tmp_over_time.png
> Added /Users/gauthier/Projects/aider/tmp_over_time.png to the chat
@ -254,6 +273,9 @@ benchmark/over_time.py
> Commit 75fa061 aider: Adjust the position of labels to prevent overlap with dots.
</div>
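The label fix is described only as adjusting label positions so they don't overlap the dots. One common way to do that in matplotlib, offered as an assumption about the approach rather than the committed code, is `Axes.annotate` with `textcoords="offset points"`, which shifts each label a fixed number of points away from its data point regardless of the axis scale. The `label_points` helper, the offsets and the sample data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def label_points(ax, xs, ys, labels, dx=6, dy=6):
    """Annotate each point with its label, offset (dx, dy) points up and to the right.

    "offset points" keeps the gap constant in screen space, so labels sit
    beside the markers whatever the axis ranges are.
    """
    annotations = []
    for x, y, text in zip(xs, ys, labels):
        ann = ax.annotate(
            text,
            xy=(x, y),             # the data point being labeled
            xytext=(dx, dy),       # offset of the label from that point
            textcoords="offset points",
            fontsize=8,
            ha="left",
            va="bottom",
        )
        annotations.append(ann)
    return annotations


fig, ax = plt.subplots()
xs, ys = [1, 2, 3], [40.0, 55.0, 62.0]
ax.scatter(xs, ys)
anns = label_points(ax, xs, ys, ["model-a", "model-b", "model-c"])
fig.canvas.draw()  # force a render to make sure the annotations draw cleanly
plt.close(fig)
```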
<div class="chat-transcript" markdown="1">
#### make the dots bigger
To make the dots bigger, we can adjust the `s` parameter in the `scatter` function, which controls the size of the dots.
@ -272,6 +294,9 @@ benchmark/over_time.py
> Applied edit to benchmark/over_time.py
> Commit a018fdf aider: Make the dots bigger.
</div>
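The `s` parameter of matplotlib's `Axes.scatter` is a marker area in points squared, not a diameter, so doubling the visible size of a dot takes four times the `s` value. A small sketch with hypothetical data; the sizes are illustrative, not the values aider chose.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

xs, ys = [1, 2, 3], [40.0, 55.0, 62.0]
fig, ax = plt.subplots()

# matplotlib's default marker area is rcParams["lines.markersize"] ** 2, i.e. 6**2 = 36 points^2.
small = ax.scatter(xs, ys, s=36, label="default size")

# s is an area, so doubling the marker diameter means 2**2 = 4x the value.
big = ax.scatter(xs, [y + 5 for y in ys], s=36 * 4, label="2x diameter")

ax.legend()
fig.canvas.draw()
plt.close(fig)
print(small.get_sizes(), big.get_sizes())
```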
<div class="chat-transcript" markdown="1">
#### make the border of the graph thinner and lighter
To make the border of the graph thinner and lighter, we can adjust the `spines` properties of the axes.