diff --git a/_posts/2024-05-13-models-over-time.md b/_posts/2024-05-13-models-over-time.md
new file mode 100644
index 000000000..dbac39361
--- /dev/null
+++ b/_posts/2024-05-13-models-over-time.md
@@ -0,0 +1,297 @@
+---
+title: Drawing graphs with aider, GPT-4o and matplotlib
+excerpt: Use GPT-4o to draw graphs with matplotlib, including adjusting styles and making visual changes. You get the graph, but you also get the code in your repo.
+highlight_image: assets/models-over-time.png
+---
+# Drawing graphs with aider, GPT-4o and matplotlib
+
+![LLM coding skill over time](assets/models-over-time.svg)
+
+I recently wanted to draw a graph showing how LLM code editing skill has been
+changing over time as new models have been released by OpenAI, Anthropic and others.
+I have all the
+[data in the yaml file](https://github.com/paul-gauthier/aider/blob/main/_data/edit_leaderboard.yml) that is used to render
+[aider's LLM leaderboards](https://aider.chat/docs/leaderboards/).
+
+Below is the aider chat transcript, which shows:
+
+- I launch aider with the yaml file, a file with other plots I've done recently (so GPT can crib the style) and an empty file called `over_time.py`.
+- Then I ask GPT to draw the scatterplot I want.
+- I ask it to color the points for GPT-4 and GPT-3.5 family models differently, to better see trends within those model families.
+- I add a PNG of the graph to the chat, so GPT can see the current graph and understand why I ask it to move the point labels so they don't overlap the points.
+- I work through a series of other small style changes, like changing fonts and the graph border.
+
+In the end I have the graph, but I also have the python code in my repo.
+So I can update this graph easily whenever I add new entries to the yaml data file.
+
+
+
+> Aider v0.35.1-dev
+> Models: gpt-4o with diff edit format, weak model gpt-3.5-turbo
+> Added _data/edit_leaderboard.yml to the chat.
+> Added benchmark/over_time.py to the chat.
+> Added benchmark/plots.py to the chat.
+
+#### in over_time.py, draw a graph for each of the yaml entries with a `released` date. plot date on x, `pass_rate_2` on y and label with the `model` name. do a scatter plot. use colors/styles similar to the other graphs.
+
+To draw a scatter plot for each of the YAML entries with a `released` date, we need to:
+
+1. Read the YAML file.
+2. Extract the `released` date and `pass_rate_2` for each entry.
+3. Plot the data using matplotlib.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+=======
+import matplotlib.pyplot as plt
+import yaml
+from datetime import datetime
+
+def plot_over_time(yaml_file):
+    with open(yaml_file, 'r') as file:
+        data = yaml.safe_load(file)
+
+    dates = []
+    pass_rates = []
+    models = []
+
+    for entry in data:
+        if 'released' in entry and 'pass_rate_2' in entry:
+            dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
+            pass_rates.append(entry['pass_rate_2'])
+            models.append(entry['model'])
+
+    plt.figure(figsize=(10, 6))
+    plt.scatter(dates, pass_rates, c='blue', alpha=0.5)
+
+    for i, model in enumerate(models):
+        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
+
+    plt.xlabel('Release Date')
+    plt.ylabel('Pass Rate 2')
+    plt.title('Model Performance Over Time')
+    plt.grid(True)
+    plt.tight_layout()
+    plt.show()
+
+# Example usage
+plot_over_time('_data/edit_leaderboard.yml')
+>>>>>>> REPLACE
+```
+
+> Applied edit to benchmark/over_time.py
+> Commit 49ac9ca aider: Added scatter plot functionality for YAML entries with a `released` date and `pass_rate_2`.
+
+#### /run python benchmark/over_time.py
+> Traceback (most recent call last):
+  File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
+    plot_over_time('_data/edit_leaderboard.yml')
+  File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 15, in plot_over_time
+    dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
+                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+TypeError: strptime() argument 1 must be str, not datetime.date
+> Add the output to the chat? y
+> Traceback (most recent call last):
+>   File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
+>     plot_over_time('_data/edit_leaderboard.yml')
+>   File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 15, in plot_over_time
+>     dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
+>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+> TypeError: strptime() argument 1 must be str, not datetime.date
+
+The error indicates that the `released` field in the YAML file is already a `datetime.date` object, so we don't need to parse it with `datetime.strptime`. We can directly use it in the plot.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+            dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
+=======
+            dates.append(entry['released'])
+>>>>>>> REPLACE
+```
+
+> Applied edit to benchmark/over_time.py
+> Commit af2f4b7 aider: Use the `released` field directly in the plot instead of parsing it with `datetime.strptime`.
+
+#### color the "gpt-4" models differently
+
+To color the "gpt-4" models differently, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-4" in the model name and assign a different color.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+    plt.scatter(dates, pass_rates, c='blue', alpha=0.5)
+=======
+    colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
+    plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
+>>>>>>> REPLACE
+```
+
+This change will color the points red if the model name contains "gpt-4", otherwise, it will color them blue.
+
+> Applied edit to benchmark/over_time.py
+> Commit 0d11357 aider: Color the "gpt-4" models differently.
+
+#### color the gpt-3.5 models a new color
+
+To color the "gpt-3.5" models a new color, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-3.5" in the model name and assign a different color.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+    colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
+=======
+    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
+>>>>>>> REPLACE
+```
+
+> Applied edit to benchmark/over_time.py
+> Commit 85d9d84 aider: Color the "gpt-3.5" models green in the plotting code.
+
+#### use the colorscheme and fonts from the other graphs in plots.py
+
+To use the colorscheme and fonts from the graphs in `plots.py`, we need to:
+
+1. Import the necessary settings from `plots.py`.
+2. Apply these settings to the graph in `over_time.py`.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+import yaml
+from datetime import datetime
+=======
+import yaml
+from datetime import datetime
+from matplotlib import rc
+>>>>>>> REPLACE
+```
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+    plt.figure(figsize=(10, 6))
+=======
+    plt.rcParams["hatch.linewidth"] = 0.5
+    plt.rcParams["hatch.color"] = "#444444"
+
+    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
+
+    fig, ax = plt.subplots(figsize=(6, 4))
+    ax.grid(axis="y", zorder=0, lw=0.2)
+>>>>>>> REPLACE
+```
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+    plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
+=======
+    ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
+>>>>>>> REPLACE
+```
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
+
+    plt.xlabel('Release Date')
+    plt.ylabel('Pass Rate 2')
+    plt.title('Model Performance Over Time')
+    plt.grid(True)
+    plt.tight_layout()
+    plt.show()
+=======
+        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
+
+    ax.set_xlabel('Release Date')
+    ax.set_ylabel('Pass Rate 2')
+    ax.set_title('Model Performance Over Time')
+    plt.tight_layout()
+    plt.savefig("tmp_over_time.svg")
+    imgcat(fig)
+>>>>>>> REPLACE
+```
+
+> Applied edit to benchmark/over_time.py
+> Commit 8d670f1 aider: Used the colorscheme and fonts from the graphs in `plots.py` in `over_time.py`.
+
+
+#### /add tmp_over_time.png
+> Added /Users/gauthier/Projects/aider/tmp_over_time.png to the chat
+
+#### make the labels not overlap the dots
+
+To make the labels not overlap the dots, we can adjust the position of the labels slightly using the `xytext` parameter in the `annotate` function.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
+=======
+        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75,
+                    xytext=(5, 5), textcoords='offset points')
+>>>>>>> REPLACE
+```
+
+> Committing benchmark/over_time.py before applying edits.
+> Commit 49eb72c Updated image file format from SVG to PNG and added imgcat library for displaying the plot.
+> Applied edit to benchmark/over_time.py
+> Commit 75fa061 aider: Adjust the position of labels to prevent overlap with dots.
+
+
+#### make the dots bigger
+
+To make the dots bigger, we can adjust the `s` parameter in the `scatter` function, which controls the size of the dots.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+    ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
+=======
+    ax.scatter(dates, pass_rates, c=colors, alpha=0.5, s=100)
+>>>>>>> REPLACE
+```
+
+> Applied edit to benchmark/over_time.py
+> Commit a018fdf aider: Make the dots bigger.
+
+#### make the border of the graph thinner and lighter
+
+To make the border of the graph thinner and lighter, we can adjust the `spines` properties of the axes.
+
+Here are the changes:
+
+benchmark/over_time.py
+```python
+<<<<<<< SEARCH
+    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
+=======
+    for spine in ax.spines.values():
+        spine.set_edgecolor('#DDDDDD')
+        spine.set_linewidth(0.5)
+
+    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
+>>>>>>> REPLACE
+```
+
+> Applied edit to benchmark/over_time.py
+> Commit 77ba518 aider: Made the border of the graph thinner and lighter by adjusting the `spines` properties of the axes.
+
+
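Reviewer note on the post above: applying the transcript's SEARCH/REPLACE edits in order should leave `benchmark/over_time.py` looking roughly like the sketch below. This is a reconstruction under stated assumptions, not the exact file in the repo. The helper names `model_color` and `collect_points` are hypothetical and do not appear in the chat. The `imgcat` call is left out because the transcript never shows how it was imported. The output name `tmp_over_time.png` is inferred from the `/add` command and the "SVG to PNG" commit message. matplotlib and PyYAML are imported inside the plotting function only so the two helpers can be exercised without them.

```python
def model_color(model):
    """Pick a point color by model family, mirroring the transcript's conditional."""
    if "gpt-4" in model:
        return "red"
    if "gpt-3.5" in model:
        return "green"
    return "blue"


def collect_points(entries):
    """Keep only entries that carry both a release date and a pass rate."""
    return [
        (entry["released"], entry["pass_rate_2"], entry["model"])
        for entry in entries
        if "released" in entry and "pass_rate_2" in entry
    ]


def plot_over_time(yaml_file, out_file="tmp_over_time.png"):
    # Third-party imports live here so the helpers above have no dependencies.
    import matplotlib.pyplot as plt
    import yaml
    from matplotlib import rc

    with open(yaml_file, "r") as file:
        entries = yaml.safe_load(file)

    # Per the transcript, `released` is already a datetime.date after loading.
    points = collect_points(entries)
    dates = [released for released, _, _ in points]
    pass_rates = [rate for _, rate, _ in points]
    colors = [model_color(model) for _, _, model in points]

    # Style settings taken from the transcript's plots.py-inspired edit.
    plt.rcParams["hatch.linewidth"] = 0.5
    plt.rcParams["hatch.color"] = "#444444"
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.grid(axis="y", zorder=0, lw=0.2)

    # Thinner, lighter border.
    for spine in ax.spines.values():
        spine.set_edgecolor("#DDDDDD")
        spine.set_linewidth(0.5)

    ax.scatter(dates, pass_rates, c=colors, alpha=0.5, s=100)

    # Offset each label by 5 points so it does not sit on top of its dot.
    for released, rate, model in points:
        ax.annotate(model, (released, rate), fontsize=8, alpha=0.75,
                    xytext=(5, 5), textcoords="offset points")

    ax.set_xlabel("Release Date")
    ax.set_ylabel("Pass Rate 2")
    ax.set_title("Model Performance Over Time")
    plt.tight_layout()
    plt.savefig(out_file)
    plt.close(fig)
```

Calling `plot_over_time('_data/edit_leaderboard.yml')` from the repo root would then write the image. Pulling the color choice into a function is only a readability preference over the nested conditional expression in the transcript; the behavior is intended to be the same.
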
diff --git a/assets/models-over-time.png b/assets/models-over-time.png
new file mode 100644
index 000000000..eaed94a53
Binary files /dev/null and b/assets/models-over-time.png differ