Commit graph

258 commits

Author SHA1 Message Date
Paul Gauthier
ca306b6a4a remove the 1 try results 2023-06-30 14:01:29 -07:00
Paul Gauthier
7c56363d86 copy 2023-06-30 13:07:24 -07:00
Paul Gauthier
f84e3d6ab7 show both tries; repeats 2023-06-30 10:20:06 -07:00
Paul Gauthier
a9cc3f81b2 nice graph 2023-06-30 06:44:37 -07:00
Paul Gauthier
cba75a3bfe prettier 2023-06-29 22:08:09 -07:00
Paul Gauthier
73f99996df aider: Set bar color/fill style for each edit-format in show_stats function using a color map.
# Context:
USER: how can i set the bar color/fill style for each edit-format
ASSISTANT: To set the bar color/fill style for each edit-format, you need to modify the `show_stats` function in the `benchmark/benchmark.py` file.

Here are the steps:

1. Define a dictionary mapping each edit-format to a specific color or fill style.
2. Use the `color` or `fill` parameter in the `plot` function to set the color or fill style for each bar.

Here is the code change:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
    fig, ax = plt.subplots(figsize=(10, 6))
    df_grouped2.unstack().plot(kind="barh", ax=ax)
=======
    fig, ax = plt.subplots(figsize=(10, 6))
    color_map = {"edit_format1": "blue", "edit_format2": "green", "edit_format3": "red"}  # Define your color map here
    df_grouped2.unstack().plot(kind="barh", ax=ax, color=[color_map[i] for i in df_grouped2.index])
>>>>>>> UPDATED
```

Please replace `"edit_format1"`, `"edit_format2"`, and `"edit_format3"` with your actual edit formats, and `"blue"`, `"green"`, and `"red"` with your desired colors.
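For a concrete picture of how the color map is used, here is a standalone sketch with synthetic data; the model names, edit format names ("diff", "whole"), and pass rates below are illustrative placeholders, not benchmark results:

```python
# Standalone sketch with synthetic, illustrative data; the model names,
# edit formats, and pass rates are placeholders, not real benchmark results.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "model": ["model-a", "model-a", "model-b", "model-b"],
        "edit_format": ["diff", "whole", "diff", "whole"],
        "pass_rate_1": [40.0, 45.0, 60.0, 65.0],
    }
)
grouped = df.groupby(["model", "edit_format"])["pass_rate_1"].mean().unstack()

# Map each edit_format (a column after unstack) to a fixed color.
color_map = {"diff": "blue", "whole": "green"}
fig, ax = plt.subplots(figsize=(10, 6))
grouped.plot(kind="barh", ax=ax, color=[color_map[c] for c in grouped.columns])
ax.legend(title="Edit Format")
plt.show()
```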
2023-06-29 20:56:58 -07:00
Paul Gauthier
a02cadbc36 tight 2023-06-29 20:56:12 -07:00
Paul Gauthier
490a59cba4 copy 2023-06-29 20:47:49 -07:00
Paul Gauthier
cf171bdd1e noop 2023-06-29 19:43:06 -07:00
Paul Gauthier
a577a76a44 check for dup model/edit rows 2023-06-29 19:11:28 -07:00
Paul Gauthier
0c9deaa118 aider: Sort the grouped columns by descending average.
# Context:
USER: sort the grouped columns by descending average
ASSISTANT: To sort the grouped columns by descending average, we need to modify the `show_stats` function in the `benchmark/benchmark.py` file. Specifically, we need to sort the `df_grouped` DataFrame by its values in descending order.

Here is the change:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
    df_grouped = df.groupby(["model", "edit_format"])["pass_rate_1"].mean()
=======
    df_grouped = df.groupby(["model", "edit_format"])["pass_rate_1"].mean().sort_values(ascending=False)
>>>>>>> UPDATED
```

The `sort_values(ascending=False)` call sorts the grouped DataFrame `df_grouped` by its values in descending order, so the largest averages appear first.
2023-06-29 19:05:16 -07:00
Paul Gauthier
937ee478ad show with imgcat 2023-06-29 19:04:48 -07:00
Paul Gauthier
0f6a90bfed aider: Switched the plt stuff to use an explicit fig.
# Context:
USER: switch the plt stuff to use an explicit fig
ASSISTANT: To switch the matplotlib plotting to use an explicit figure, we need to modify the `show_stats` function in the `benchmark/benchmark.py` file.

Here are the changes:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
    df_grouped.unstack().plot(kind="barh", figsize=(10, 6))

    plt.xlabel("Pass Rate 1")
    plt.ylabel("Model")
    plt.title("Pass Rate 1 for each Model/Edit Format")
    plt.legend(title="Edit Format")
    plt.show()
=======
    fig, ax = plt.subplots(figsize=(10, 6))
    df_grouped.unstack().plot(kind="barh", ax=ax)

    ax.set_xlabel("Pass Rate 1")
    ax.set_ylabel("Model")
    ax.set_title("Pass Rate 1 for each Model/Edit Format")
    ax.legend(title="Edit Format")
    plt.show()
>>>>>>> UPDATED
```

This change creates a new figure and axes before plotting, and then uses the axes for setting labels, title, and legend. This is a more explicit way of handling figures and axes in matplotlib.
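As a side note on this design choice: holding an explicit `Figure` handle also makes it easy to write the plot to a file instead of (or in addition to) showing it. A minimal sketch, where the data and output filename are hypothetical:

```python
# Minimal sketch: with an explicit Figure handle, the plot can be saved
# without relying on matplotlib's implicit "current figure" state.
# The data and filename are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"pass_rate_1": [40.0, 60.0]}, index=["model-a", "model-b"])
fig, ax = plt.subplots(figsize=(10, 6))
df.plot(kind="barh", ax=ax)
ax.set_xlabel("Pass Rate 1")
fig.savefig("tmp.benchmark.png", dpi=150, bbox_inches="tight")
```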
2023-06-29 18:58:50 -07:00
Paul Gauthier
e075fe9f0d hello, graph 2023-06-29 18:58:21 -07:00
Paul Gauthier
c7ba0a5dba aider: Modified show_stats function to plot a bar graph with the average of "pass_rate_1" for each "edit_format" within each model, grouping the bars for each model together.
# Context:
USER: don't combine the edit formats. plot a different bar for each model/edit_format combo. But group the bars for each model together.
ASSISTANT: To implement this change, we need to modify the `show_stats` function in the `benchmark/benchmark.py` file. We will use the pandas and matplotlib libraries to create the bar plot.

Here are the steps:

1. In the `show_stats` function, after creating the DataFrame `df`, sort it based on "model" and "edit_format".
2. Group the DataFrame by "model" and "edit_format" and plot a bar graph for "pass_rate_1".

Here are the changes:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
def show_stats(dirnames):
    rows = []
    for dirname in dirnames:
        row = summarize_results(dirname)
        rows.append(vars(row))

    df = pd.DataFrame.from_records(rows)
    df.sort_values(by=["model", "edit_format"], inplace=True)

    df_grouped = df.groupby("model")["pass_rate_1"].mean()
    df_grouped.plot(kind='barh', figsize=(10, 6))

    plt.xlabel('Pass Rate 1')
    plt.ylabel('Model')
    plt.title('Pass Rate 1 for each Model/Edit Format')
    plt.show()

    df.to_csv("tmp.benchmark.csv")
=======
def show_stats(dirnames):
    rows = []
    for dirname in dirnames:
        row = summarize_results(dirname)
        rows.append(vars(row))

    df = pd.DataFrame.from_records(rows)
    df.sort_values(by=["model", "edit_format"], inplace=True)

    df_grouped = df.groupby(["model", "edit_format"])["pass_rate_1"].mean()
    df_grouped.unstack().plot(kind='barh', figsize=(10, 6))

    plt.xlabel('Pass Rate 1')
    plt.ylabel('Model')
    plt.title('Pass Rate 1 for each Model/Edit Format')
    plt.legend(title='Edit Format')
    plt.show()

    df.to_csv("tmp.benchmark.csv")
>>>>>>> UPDATED
```

This will create a bar graph with the average of "pass_rate_1" for each "edit_format" within each model. The bars for each model are grouped together.
2023-06-29 18:52:08 -07:00
Paul Gauthier
d5921871f7 aider: Sort DataFrame by "model" and "edit_format" and plot a horizontal bar graph of the average "pass_rate_1" for each model.
# Context:
USER: in `show_stats` plot a horizontal bar graph. sort on "model" and "edit_format". group the bars by model. plot the pass_rate_1 for each model/edit_format.
ASSISTANT: To implement this change, we need to modify the `show_stats` function in the `benchmark/benchmark.py` file. We will use the pandas and matplotlib libraries to create the bar plot.

Here are the steps:

1. Import the matplotlib library at the top of the file.
2. In the `show_stats` function, after creating the DataFrame `df`, sort it based on "model" and "edit_format".
3. Group the DataFrame by "model" and plot a bar graph for "pass_rate_1".

Here are the changes:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
import pandas as pd
=======
import pandas as pd
import matplotlib.pyplot as plt
>>>>>>> UPDATED
```

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
def show_stats(dirnames):
    rows = []
    for dirname in dirnames:
        row = summarize_results(dirname)
        rows.append(vars(row))

    df = pd.DataFrame.from_records(rows)

    print(df)

    df.to_csv("tmp.benchmark.csv")
=======
def show_stats(dirnames):
    rows = []
    for dirname in dirnames:
        row = summarize_results(dirname)
        rows.append(vars(row))

    df = pd.DataFrame.from_records(rows)
    df.sort_values(by=["model", "edit_format"], inplace=True)

    df_grouped = df.groupby("model")["pass_rate_1"].mean()
    df_grouped.plot(kind='barh', figsize=(10, 6))

    plt.xlabel('Pass Rate 1')
    plt.ylabel('Model')
    plt.title('Pass Rate 1 for each Model/Edit Format')
    plt.show()

    df.to_csv("tmp.benchmark.csv")
>>>>>>> UPDATED
```

Please note that this will create a bar graph with the average of "pass_rate_1" for each model. If you want to plot "pass_rate_1" for each "edit_format" within each model, we would need to modify the grouping and plotting code.
2023-06-29 18:50:59 -07:00
Paul Gauthier
fe0aed2420 pull summary data into a df 2023-06-29 18:48:20 -07:00
Paul Gauthier
6706b01eb8 aider: Import the List class from the typing module in benchmark.py.
# Context:
USER: benchmark/benchmark.py:60:15: F821 undefined name 'List'
ASSISTANT: To fix the undefined name 'List' error, we need to import the List class from the typing module.

Here is the change:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
import datetime
import json
import os
import random
import re
import shutil
import subprocess
import time
from collections import defaultdict
from json.decoder import JSONDecodeError
from pathlib import Path
=======
import datetime
import json
import os
import random
import re
import shutil
import subprocess
import time
from collections import defaultdict
from json.decoder import JSONDecodeError
from pathlib import Path
from typing import List
>>>>>>> UPDATED
```
2023-06-29 18:20:15 -07:00
Paul Gauthier
4bb174d040 refac to support multiple dirs 2023-06-29 18:19:57 -07:00
Paul Gauthier
3057b077bb aider: The dirname command line argument has been modified to accept multiple directory names as a list of strings.
# Context:
USER: make the `dirname` cmd line arg accept many dirnames
ASSISTANT: To make the `dirname` command line argument accept multiple directory names, we need to modify the `main` function in the `benchmark/benchmark.py` file.

We will change the `dirname` argument from a single string to a list of strings. This will allow the user to input multiple directory names separated by spaces.

Here is the change:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
def main(
    dirname: str = typer.Argument(..., help="Directory name"),
=======
def main(
    dirname: List[str] = typer.Argument(..., help="Directory names"),
>>>>>>> UPDATED
```

Please note that this change will require modifications in the rest of the code where `dirname` is used, as it will now be a list of strings instead of a single string. We will need to iterate over the list and apply the existing logic to each directory name.

If you agree with this change, I will proceed to identify and modify the rest of the code accordingly.
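For illustration, a minimal sketch of what iterating over the list of directory names could look like; the helper `run_benchmark_dir` is hypothetical and stands in for the existing single-directory logic:

```python
# Hypothetical sketch: dirname arrives as a list of strings, and the
# existing per-directory logic (represented by run_benchmark_dir) is
# applied to each entry in turn.
from pathlib import Path
from typing import List

import typer


def run_benchmark_dir(dirname: Path) -> None:
    # Placeholder for the existing single-directory benchmark logic.
    print(f"processing {dirname}")


def main(dirname: List[str] = typer.Argument(..., help="Directory names")):
    for name in dirname:
        run_benchmark_dir(Path(name))


if __name__ == "__main__":
    typer.run(main)
```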
2023-06-29 18:14:30 -07:00
Paul Gauthier
417305a569 AIDER_BENCHMARK_DIR 2023-06-29 18:13:54 -07:00
Paul Gauthier
3bbfebc390 fixed console.print 2023-06-29 15:34:10 -07:00
Paul Gauthier
b11a6fa7ee track num_exhausted_context_windows 2023-06-29 15:32:45 -07:00
Paul Gauthier
92c2d47e20 added docker warning 2023-06-29 14:59:38 -07:00
Paul Gauthier
5fddb9c90a removed remaining 2023-06-29 14:32:00 -07:00
Paul Gauthier
1e40dca4f1 added --cont 2023-06-29 14:22:24 -07:00
Paul Gauthier
5ee455a05b docker environment for the whole benchmark 2023-06-29 14:19:00 -07:00
Paul Gauthier
5eb1b2bdec added timeout reporting 2023-06-29 11:55:28 -07:00
Paul Gauthier
b852c35828 added back timeout 2023-06-29 11:15:51 -07:00
Paul Gauthier
ec4dc74de4 subprocess to unittest 2023-06-29 11:01:57 -07:00
Paul Gauthier
1f4e08d20c removed timeouts 2023-06-29 10:42:09 -07:00
Paul Gauthier
13a87c7700 show test output 2023-06-29 10:33:53 -07:00
Paul Gauthier
24589b543e drive unittest directly 2023-06-29 10:26:33 -07:00
Paul Gauthier
81ec0e1608 refactor timeouts 2023-06-29 09:22:11 -07:00
Paul Gauthier
4f81f88c91 Added --no-aider, better error if docker is not available 2023-06-29 07:22:53 -07:00
Paul Gauthier
3c63552628 Revert "aider: Refactor run_unit_tests to kill the subprocess when it times out."
This reverts commit 6632367978.
2023-06-29 07:08:53 -07:00
Paul Gauthier
6632367978 aider: Refactor run_unit_tests to kill the subprocess when it times out.
# Context:
USER: when `subprocess` `timeout` fires, do we need to kill the subprocess? just in `run_unit_tests`
ASSISTANT: When a subprocess times out, it is not automatically terminated. So, yes, if you want to ensure that the subprocess is terminated when it times out, you should explicitly kill it.

Here's how you can modify the `run_unit_tests` function to kill the subprocess when it times out:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                timeout=timeout,
            )
            if result.returncode != 0:
                all_tests_passed = False
                print(f"Test {test_file} failed")

            res = cleanup_test_output(result.stdout)

        except subprocess.TimeoutExpired:
            all_tests_passed = False
            res = f"Test {test_file} timed out after {timeout} seconds."
=======
        with subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as proc:
            try:
                stdout, _ = proc.communicate(timeout=timeout)
                if proc.returncode != 0:
                    all_tests_passed = False
                    print(f"Test {test_file} failed")
                res = cleanup_test_output(stdout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
                all_tests_passed = False
                res = f"Test {test_file} timed out after {timeout} seconds."
>>>>>>> UPDATED
```

This change uses `subprocess.Popen` to create the subprocess and `communicate` to interact with it. If `communicate` times out, it raises `subprocess.TimeoutExpired`, and then we kill the subprocess with `proc.kill()`.
2023-06-28 18:29:58 -07:00
Paul Gauthier
d1f0590911 move benchmark prompts into their own file 2023-06-28 18:25:39 -07:00
Paul Gauthier
0845de7a7d ssh 2023-06-28 16:01:52 -07:00
Paul Gauthier
03a5598c60 Add simple sanity check for docker 2023-06-28 15:55:42 -07:00
Paul Gauthier
06656998e1 run tests with unittest, shrink output 2023-06-28 15:52:24 -07:00
Paul Gauthier
ea83df1ea3 remove long ==--- lines to save tokens 2023-06-28 15:44:23 -07:00
Paul Gauthier
a0844839ca wip 2023-06-28 15:36:34 -07:00
Paul Gauthier
67114d1a80 Fail if docker build fails 2023-06-28 11:02:47 -07:00
Paul Gauthier
67c27fd9ef Changed --stats-only -> --stats 2023-06-28 10:56:37 -07:00
Paul Gauthier
e24d03060d rename --retries to --tries 2023-06-27 17:45:57 -07:00
Paul Gauthier
11dddad4c1 zip the hashes 2023-06-27 17:42:52 -07:00
Paul Gauthier
10049d8695 remove timing info, to avoid randomizing the response to GPT 2023-06-27 17:15:05 -07:00
Paul Gauthier
9cfebc0b60 record response hashes too 2023-06-27 16:11:05 -07:00
Paul Gauthier
9556313819 Added functionality to use pre-existing benchmark results if they exist. 2023-06-27 14:27:07 -07:00