Commit graph

3138 commits

Author SHA1 Message Date
Paul Gauthier
7c92370a3b copy 2023-11-08 11:34:34 -08:00
Paul Gauthier
200414cee0 copy 2023-11-08 11:19:36 -08:00
Paul Gauthier
ed0dc95a10 copy 2023-11-08 11:16:56 -08:00
Paul Gauthier
fc747ef589 copy 2023-11-08 11:12:18 -08:00
Paul Gauthier
6acc3689e5 copy 2023-11-08 11:11:47 -08:00
Paul Gauthier
cb63b61411 copy 2023-11-08 10:56:46 -08:00
Paul Gauthier
be0296318f speed results 2023-11-08 10:53:03 -08:00
Paul Gauthier
3969aa98a3 copy 2023-11-08 10:28:54 -08:00
Paul Gauthier
f2a3e489e1 labels 2023-11-08 10:26:22 -08:00
Paul Gauthier
02b604adc3 Add labels to the legend in the speed graph 2023-11-08 10:20:28 -08:00
Paul Gauthier
72f1c390a1 copy 2023-11-08 10:20:12 -08:00
Paul Gauthier
867a0cc33f aider: Add labels to the bars in the plot_timing function to display them correctly in the legend.
# Aider chat conversation:

ASSISTANT: To address the issue with the empty legend for the edit format in the `plot_timing` function, we need to add labels to the bars so that the legend can display them correctly. We will modify the `benchmark/benchmark.py` file to include labels for each edit format.

Here's the change that needs to be made:

<source>python
benchmark/benchmark.py
<<<<<<< SEARCH
    for i, fmt in enumerate(formats):
        edge = dict(edgecolor="#ffffff", linewidth=1.5)
        color = "#b3e6a8" if "diff" in fmt else "#b3d1e6"
        hatch = "////" if "func" in fmt else ""
        rects = ax.bar(
            pos + i * width,
            grouped[fmt],
            width * 0.95,
            color=color,
            hatch=hatch,
            zorder=zorder + 1,
            **edge,
        )
=======
    # Create a mapping for the labels and colors to ensure consistency
    format_labels = {
        'diff': 'Diff',
        'func': 'Function',
        'whole': 'Whole',
        'diff-func': 'Diff-Function',
        'diff-whole': 'Diff-Whole',
        'func-whole': 'Function-Whole',
        # Add other edit formats here if necessary
    }
    format_colors = {
        'diff': '#b3e6a8',
        'func': '#b3d1e6',
        'whole': '#e6b3b3',
        'diff-func': '#e6d1b3',
        'diff-whole': '#b3e6d1',
        'func-whole': '#d1b3e6',
        # Add other edit format colors here if necessary
    }
    format_hatches = {
        'diff': '////',
        'func': '....',
        'whole': '',
        'diff-func': '\\\\\\\\',
        'diff-whole': 'xxxx',
        'func-whole': '----',
        # Add other edit format hatches here if necessary
    }

    for i, fmt in enumerate(formats):
        label = format_labels.get(fmt, fmt)
        color = format_colors.get(fmt, '#b3d1e6')
        hatch = format_hatches.get(fmt, '')
        edge = dict(edgecolor="#ffffff", linewidth=1.5)
        rects = ax.bar(
            pos + i * width,
            grouped[fmt],
            width * 0.95,
            label=label,
            color=color,
            hatch=hatch,
            zorder=zorder + 1,
            **edge,
        )
>>>>>>> REPLACE
</source>

This change adds a mapping for labels, colors, and hatches for each edit format. It also ensures that each bar in the bar chart has a label associated with it, which will be used to generate the legend. The colors and hatches are placeholders and should be adjusted to match the desired styling.
2023-11-08 10:11:47 -08:00
Paul Gauthier
eed4864af1 Changed the label of the y-axis in the plot to "Average GPT response time per exercise (sec)". 2023-11-08 10:11:46 -08:00
Paul Gauthier
207bbbc4df aider: Implemented the plot_timing function to create a bar chart showing the average duration of each (model, edit_format) combination, with consistent styling as plot_outcomes.
# Aider chat conversation:

ASSISTANT: To implement the `plot_timing` function and style the graph similarly to `plot_outcomes`, we will follow these steps:

1. Define the `plot_timing` function to create a bar chart that shows the average duration for each combination of model and edit format.
2. Use similar styling options as used in `plot_outcomes` for consistency.

Let's start by implementing the `plot_timing` function in the `benchmark/benchmark.py` file.

<source>python
benchmark/benchmark.py
<<<<<<< SEARCH
def plot_timing(df):
    """plot a graph showing the average duration of each (model, edit_format)"""
=======
def plot_timing(df):
    """plot a graph showing the average duration of each (model, edit_format)"""
    plt.rcParams["hatch.linewidth"] = 0.5
    plt.rcParams["hatch.color"] = "#444444"

    from matplotlib import rc
    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.grid(axis="y", zorder=0, lw=0.2)

    zorder = 1
    grouped = df.groupby(["model", "edit_format"])["avg_duration"].mean().unstack()
    num_models, num_formats = grouped.shape

    pos = np.array(range(num_models))
    width = 0.8 / num_formats

    formats = grouped.columns
    models = grouped.index

    for i, fmt in enumerate(formats):
        edge = dict(edgecolor="#ffffff", linewidth=1.5)
        color = "#b3e6a8" if "diff" in fmt else "#b3d1e6"
        hatch = "////" if "func" in fmt else ""
        rects = ax.bar(
            pos + i * width,
            grouped[fmt],
            width * 0.95,
            color=color,
            hatch=hatch,
            zorder=zorder + 1,
            **edge,
        )
        ax.bar_label(rects, padding=4, labels=[f"{v:.1f}s" for v in grouped[fmt]], size=6)

    ax.set_xticks([p + 0.5 * width for p in pos])
    ax.set_xticklabels(models)

    ax.set_ylabel("Average duration (seconds)")
    ax.set_title("GPT Code Editing - Timing")
    ax.legend(
        title="Edit Format",
        loc="upper left",
    )
    ax.set_ylim(top=max(grouped.max()) * 1.1)  # Set y-axis limit to 10% more than the max value

    plt.tight_layout()
    plt.savefig("tmp_timing.svg")
    imgcat(fig)
>>>>>>> REPLACE
</source>

This code block adds the implementation of the `plot_timing` function, which creates a bar chart to visualize the average duration of tests for each model and edit format. The styling is kept consistent with the `plot_outcomes` function, using similar colors, hatches, and font settings. The resulting graph is saved as an SVG file and displayed in the terminal using `imgcat`.
2023-11-08 10:07:29 -08:00
Paul Gauthier
ab55b6ff0a Copy ORIGINAL_DNAME to dirname. 2023-11-08 10:07:27 -08:00
Paul Gauthier
59fed25cd1 copy 2023-11-08 08:36:48 -08:00
Paul Gauthier
2c83287c46 refac 2023-11-08 06:21:16 -08:00
Paul Gauthier
c86a957cf5 copy 2023-11-07 18:18:02 -08:00
Paul Gauthier
c55aff87e6 copy 2023-11-07 14:25:46 -08:00
Paul Gauthier
d81cef956e copy 2023-11-07 13:58:16 -08:00
Paul Gauthier
e5dc3b66c3 copy 2023-11-07 11:52:11 -08:00
paul-gauthier
a6f39f63a3
Update benchmarks-1106.md 2023-11-07 11:38:16 -08:00
paul-gauthier
cb2388dc9e
Update benchmarks-1106.md 2023-11-07 11:33:25 -08:00
Paul Gauthier
426819e703 copy 2023-11-07 10:53:27 -08:00
Paul Gauthier
ca3ef646ce copy 2023-11-07 10:21:36 -08:00
Paul Gauthier
5da64a6abc copy 2023-11-07 07:22:26 -08:00
Paul Gauthier
6d86fd9161 copy 2023-11-07 06:37:37 -08:00
Paul Gauthier
5b103329e6 copy 2023-11-07 05:25:03 -08:00
Paul Gauthier
ef700997cd Merge remote-tracking branch 'origin/main' 2023-11-07 05:24:45 -08:00
Paul Gauthier
433dca5b80 copy 2023-11-07 05:24:07 -08:00
paul-gauthier
a6dfd10f65
Update benchmarks-1106.md 2023-11-06 21:06:52 -08:00
paul-gauthier
d14aa23ece
Update benchmarks-1106.md 2023-11-06 21:06:15 -08:00
paul-gauthier
e9254070b1
Update benchmarks-1106.md 2023-11-06 21:05:12 -08:00
Paul Gauthier
beb026cc25 copy 2023-11-06 20:46:17 -08:00
Paul Gauthier
0253588616 set version to 0.17.1-dev 2023-11-06 20:27:37 -08:00
Paul Gauthier
393878120e version bump to 0.17.0 2023-11-06 20:27:12 -08:00
Paul Gauthier
dfb921c509 copy 2023-11-06 20:24:44 -08:00
Paul Gauthier
6b6935e56d Updated HISTORY 2023-11-06 20:21:10 -08:00
Paul Gauthier
5098be6172 Add context windows and pricing for 1106 models 2023-11-06 20:18:40 -08:00
paul-gauthier
c8b95b486f
Update benchmarks-1106.md 2023-11-06 19:50:23 -08:00
paul-gauthier
a2d52536a5
Update benchmarks-1106.md 2023-11-06 19:48:30 -08:00
Paul Gauthier
a6027721c1 copy 2023-11-06 19:15:01 -08:00
Paul Gauthier
2675ed5e87 copy 2023-11-06 19:12:22 -08:00
Paul Gauthier
15a1c754a4 copy 2023-11-06 18:58:24 -08:00
Paul Gauthier
11424739a0 copy 2023-11-06 18:56:21 -08:00
Paul Gauthier
e91b12c71c copy 2023-11-06 18:53:49 -08:00
Paul Gauthier
c55fbe582a copy 2023-11-06 18:51:45 -08:00
Paul Gauthier
87d3e6d763 copy 2023-11-06 18:50:33 -08:00
Paul Gauthier
df7f254b38 copy 2023-11-06 18:48:42 -08:00
Paul Gauthier
35754247b4 copy 2023-11-06 18:48:14 -08:00