Commit graph

3126 commits

Author SHA1 Message Date
Paul Gauthier
eed4864af1 Changed the label of the y-axis in the plot to "Average GPT response time per exercise (sec)". 2023-11-08 10:11:46 -08:00
Paul Gauthier
207bbbc4df aider: Implemented the plot_timing function to create a bar chart showing the average duration of each (model, edit_format) combination, with consistent styling as plot_outcomes.
# Aider chat conversation:

ASSISTANT: To implement the `plot_timing` function and style the graph similarly to `plot_outcomes`, we will follow these steps:

1. Define the `plot_timing` function to create a bar chart that shows the average duration for each combination of model and edit format.
2. Use similar styling options as used in `plot_outcomes` for consistency.

Let's start by implementing the `plot_timing` function in the `benchmark/benchmark.py` file.

<source>python
benchmark/benchmark.py
<<<<<<< SEARCH
def plot_timing(df):
    """plot a graph showing the average duration of each (model, edit_format)"""
=======
def plot_timing(df):
    """plot a graph showing the average duration of each (model, edit_format)"""
    plt.rcParams["hatch.linewidth"] = 0.5
    plt.rcParams["hatch.color"] = "#444444"

    from matplotlib import rc
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.grid(axis="y", zorder=0, lw=0.2)

    zorder = 1
    grouped = df.groupby(["model", "edit_format"])["avg_duration"].mean().unstack()
    num_models, num_formats = grouped.shape

    pos = np.array(range(num_models))
    width = 0.8 / num_formats

    formats = grouped.columns
    models = grouped.index

    for i, fmt in enumerate(formats):
        edge = dict(edgecolor="#ffffff", linewidth=1.5)
        color = "#b3e6a8" if "diff" in fmt else "#b3d1e6"
        hatch = "////" if "func" in fmt else ""
        rects = ax.bar(
            pos + i * width,
            grouped[fmt],
            width * 0.95,
            label=fmt,  # label each series so ax.legend() below has entries to show
            color=color,
            hatch=hatch,
            zorder=zorder + 1,
            **edge,
        )
        ax.bar_label(rects, padding=4, labels=[f"{v:.1f}s" for v in grouped[fmt]], size=6)

    ax.set_xticks([p + 0.5 * width for p in pos])
    ax.set_xticklabels(models)

    ax.set_ylabel("Average duration (seconds)")
    ax.set_title("GPT Code Editing - Timing")
    ax.legend(
        title="Edit Format",
        loc="upper left",
    )
    ax.set_ylim(top=max(grouped.max()) * 1.1)  # Set y-axis limit to 10% more than the max value

    plt.tight_layout()
    plt.savefig("tmp_timing.svg")
    imgcat(fig)
>>>>>>> REPLACE
</source>

This code block adds the implementation of the `plot_timing` function, which creates a bar chart to visualize the average duration of tests for each model and edit format. The styling is kept consistent with the `plot_outcomes` function, using similar colors, hatches, and font settings. The resulting graph is saved as an SVG file and displayed in the terminal using `imgcat`.
2023-11-08 10:07:29 -08:00
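
The layout arithmetic behind the `plot_timing` commit above (mean duration per (model, edit_format), then one bar per format offset by `i * width` within each model group) can be sketched without pandas or matplotlib. This is a minimal illustration, not the repository's code: the helper name `grouped_bar_layout` and the sample rows are hypothetical, and sorting models and formats is an assumption chosen to mimic the default ordering of `groupby(...).unstack()`.

```python
from collections import defaultdict
from statistics import mean


def grouped_bar_layout(rows, total_width=0.8):
    """Compute bar positions and heights for a grouped bar chart.

    rows: iterable of (model, edit_format, avg_duration) tuples.
    Returns (models, formats, width, bars), where each bar is
    (x_position, mean_duration, model, edit_format).
    Hypothetical helper, written only to illustrate the arithmetic.
    """
    # Collect every duration sample under its (model, edit_format) key.
    samples = defaultdict(list)
    for model, fmt, duration in rows:
        samples[(model, fmt)].append(duration)

    # Assumption: sorted order, similar to pandas' default groupby ordering.
    models = sorted({m for m, _ in samples})
    formats = sorted({f for _, f in samples})

    # Each model group spans total_width, split evenly across formats.
    width = total_width / len(formats)

    bars = []
    for i, fmt in enumerate(formats):
        for pos, model in enumerate(models):
            key = (model, fmt)
            if key in samples:  # skip combinations with no data
                bars.append((pos + i * width, mean(samples[key]), model, fmt))
    return models, formats, width, bars


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    rows = [
        ("gpt-4", "diff", 10.0),
        ("gpt-4", "diff", 20.0),
        ("gpt-4", "whole", 30.0),
        ("gpt-3.5", "whole", 5.0),
    ]
    models, formats, width, bars = grouped_bar_layout(rows)
    for x, height, model, fmt in bars:
        print(f"{model:8s} {fmt:6s} x={x:.2f} height={height:.1f}s")
```

Each tuple in `bars` corresponds to one call's worth of data for `ax.bar` in the commit: the x position is the model index plus the format's offset, and the height is the mean duration that the commit formats as the bar label.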
Paul Gauthier
ab55b6ff0a Copy ORIGINAL_DNAME to dirname. 2023-11-08 10:07:27 -08:00
Paul Gauthier
59fed25cd1 copy 2023-11-08 08:36:48 -08:00
Paul Gauthier
2c83287c46 refac 2023-11-08 06:21:16 -08:00
Paul Gauthier
c86a957cf5 copy 2023-11-07 18:18:02 -08:00
Paul Gauthier
c55aff87e6 copy 2023-11-07 14:25:46 -08:00
Paul Gauthier
d81cef956e copy 2023-11-07 13:58:16 -08:00
Paul Gauthier
e5dc3b66c3 copy 2023-11-07 11:52:11 -08:00
paul-gauthier
a6f39f63a3 Update benchmarks-1106.md 2023-11-07 11:38:16 -08:00
paul-gauthier
cb2388dc9e Update benchmarks-1106.md 2023-11-07 11:33:25 -08:00
Paul Gauthier
426819e703 copy 2023-11-07 10:53:27 -08:00
Paul Gauthier
ca3ef646ce copy 2023-11-07 10:21:36 -08:00
Paul Gauthier
5da64a6abc copy 2023-11-07 07:22:26 -08:00
Paul Gauthier
6d86fd9161 copy 2023-11-07 06:37:37 -08:00
Paul Gauthier
5b103329e6 copy 2023-11-07 05:25:03 -08:00
Paul Gauthier
ef700997cd Merge remote-tracking branch 'origin/main' 2023-11-07 05:24:45 -08:00
Paul Gauthier
433dca5b80 copy 2023-11-07 05:24:07 -08:00
paul-gauthier
a6dfd10f65 Update benchmarks-1106.md 2023-11-06 21:06:52 -08:00
paul-gauthier
d14aa23ece Update benchmarks-1106.md 2023-11-06 21:06:15 -08:00
paul-gauthier
e9254070b1 Update benchmarks-1106.md 2023-11-06 21:05:12 -08:00
Paul Gauthier
beb026cc25 copy 2023-11-06 20:46:17 -08:00
Paul Gauthier
0253588616 set version to 0.17.1-dev 2023-11-06 20:27:37 -08:00
Paul Gauthier
393878120e version bump to 0.17.0 2023-11-06 20:27:12 -08:00
Paul Gauthier
dfb921c509 copy 2023-11-06 20:24:44 -08:00
Paul Gauthier
6b6935e56d Updated HISTORY 2023-11-06 20:21:10 -08:00
Paul Gauthier
5098be6172 Add context windows and pricing for 1106 models 2023-11-06 20:18:40 -08:00
paul-gauthier
c8b95b486f Update benchmarks-1106.md 2023-11-06 19:50:23 -08:00
paul-gauthier
a2d52536a5 Update benchmarks-1106.md 2023-11-06 19:48:30 -08:00
Paul Gauthier
a6027721c1 copy 2023-11-06 19:15:01 -08:00
Paul Gauthier
2675ed5e87 copy 2023-11-06 19:12:22 -08:00
Paul Gauthier
15a1c754a4 copy 2023-11-06 18:58:24 -08:00
Paul Gauthier
11424739a0 copy 2023-11-06 18:56:21 -08:00
Paul Gauthier
e91b12c71c copy 2023-11-06 18:53:49 -08:00
Paul Gauthier
c55fbe582a copy 2023-11-06 18:51:45 -08:00
Paul Gauthier
87d3e6d763 copy 2023-11-06 18:50:33 -08:00
Paul Gauthier
df7f254b38 copy 2023-11-06 18:48:42 -08:00
Paul Gauthier
35754247b4 copy 2023-11-06 18:48:14 -08:00
Paul Gauthier
93aa497220 copy 2023-11-06 18:37:13 -08:00
Paul Gauthier
f658a6575a copy 2023-11-06 18:30:48 -08:00
Paul Gauthier
a238ab9bd5 copy 2023-11-06 18:30:26 -08:00
Paul Gauthier
08fd051a37 new benchmark results 2023-11-06 18:26:14 -08:00
Paul Gauthier
976fc7a836 update benchmarking script 2023-11-06 18:26:02 -08:00
Paul Gauthier
5ddda920b3 new benchmark results 2023-11-06 18:25:52 -08:00
Paul Gauthier
44388dbc6d Updated dev-requirements.in to include all benchmark dependencies 2023-11-06 13:08:48 -08:00
Paul Gauthier
af71638b06 less simple, but docker image builds 2023-11-03 14:53:22 -07:00
Paul Gauthier
208fc7ae78 simpler 2023-11-03 14:45:49 -07:00
Paul Gauthier
65f9d39d95 copy 2023-11-03 14:42:47 -07:00
Paul Gauthier
c16210cdd7 copy 2023-11-03 14:39:47 -07:00
Paul Gauthier
87ab5495a7 Simplify scripting aider 2023-11-03 14:29:45 -07:00