Commit graph

4056 commits

Author SHA1 Message Date
Paul Gauthier
201bb7fb54 copy 2024-05-06 12:14:36 -07:00
Paul Gauthier
b5f2dcaeae copy 2024-05-06 12:06:24 -07:00
Paul Gauthier
ac46a7ee7d copy 2024-05-06 12:02:46 -07:00
Paul Gauthier
7a5e88c2e6 finishing row.percent_cases_well_formed 2024-05-06 11:58:16 -07:00
Paul Gauthier
4b903a3bb8 cleanup 2024-05-06 11:50:49 -07:00
Paul Gauthier
89db1d8673 aider: Added a second bar for row.percent_cases_well_formed to each graph.
# Aider chat conversation:

USER: in each graph, give each model a second bar showing row.percent_cases_well_formed

ASSISTANT: Here are the changes to add a second bar for `row.percent_cases_well_formed` to each graph:

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

This adds a second dataset to each chart's `leaderboardData`, with a label of "Percent cases well formed". It sets different colors for the bars and borders.

Then in the loop that populates the data, it pushes the `row.percent_cases_well_formed` value into the second dataset.

With these changes, each model will have two bars - one for the pass rate and one for the percent of cases that were well formed.
2024-05-06 11:50:10 -07:00
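Note on commit 89db1d8673: the Liquid loop in that change fills two parallel datasets, one entry per model. The shape of the resulting structure can be sketched in plain Python. This is an illustration only, not code from the repository: the function name `build_leaderboard_data` and the sample rows are hypothetical, and only the field names `model`, `pass_rate_2`, and `percent_cases_well_formed` come from the commit message.

```python
def build_leaderboard_data(rows, pass_rate_key="pass_rate_2"):
    """Build a labels list plus two parallel datasets, one value per row."""
    data = {
        "labels": [],
        "datasets": [
            {"label": "Percent correct on code editing tasks", "data": []},
            {"label": "Percent cases well formed", "data": []},
        ],
    }
    for row in rows:
        # One label and one value in each dataset per model, so the
        # two bars for a model always share the same index.
        data["labels"].append(row["model"])
        data["datasets"][0]["data"].append(row[pass_rate_key])
        data["datasets"][1]["data"].append(row["percent_cases_well_formed"])
    return data


# Hypothetical sample rows, made up for illustration only.
sample_rows = [
    {"model": "model-a", "pass_rate_2": 60.0, "percent_cases_well_formed": 95.0},
    {"model": "model-b", "pass_rate_2": 45.5, "percent_cases_well_formed": 100.0},
]
leaderboard_data = build_leaderboard_data(sample_rows)
```

Passing `pass_rate_key="pass_rate_1"` would correspond to the refactoring chart, which reads `row.pass_rate_1` instead.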
Paul Gauthier
92ea428b82 cleaned up edit data 2024-05-06 11:48:10 -07:00
Paul Gauthier
17b5dbe804 moved edit results to yaml 2024-05-06 11:44:29 -07:00
Paul Gauthier
fc3a43ef41 completed moving refac to yml 2024-05-06 11:25:14 -07:00
Paul Gauthier
e58ce69154 move refac data to yml 2024-05-06 11:21:38 -07:00
Paul Gauthier
a7b08c7354 format output as yaml 2024-05-06 11:15:19 -07:00
Paul Gauthier
3162d42262 cleanup 2024-05-06 10:46:09 -07:00
Paul Gauthier
5fb7a323ec refactored plots 2024-05-06 10:44:34 -07:00
Paul Gauthier
e61857ef09 summarize as many messages as will fit into the summarizer context 2024-05-06 09:47:14 -07:00
Paul Gauthier
e51e0219ee aider: Add tqdm progress bar to summarize_chat_history_markdown loop.
# Aider chat conversation:

USER: add a tqdm to `summarize_chat_history_markdown` loop

ASSISTANT: ```diff
--- aider/history.py
+++ aider/history.py
@@ ... @@
 from aider.sendchat import simple_send_with_retries
+from tqdm import tqdm

 class ChatSummary:
     def __init__(self, model=None, max_tokens=1024):
@@ ... @@
     def summarize_chat_history_markdown(self, text):
         messages = []
         assistant = []
+        lines = text.splitlines(keepends=True)
+        for line in tqdm(lines, desc="Summarizing chat history"):
-        for line in text.splitlines(keepends=True):
             if line.startswith("# "):
                 continue
             if line.startswith(">"):
                 continue
             if line.startswith("#### /"):
                 continue

             if line.startswith("#### "):
                 if assistant:
                     assistant = "".join(assistant)
                     if assistant.strip():
                         messages.append(dict(role="assistant", content=assistant))
                     assistant = []

                 content = line[5:]
                 if content.strip() and content.strip() != "<blank>":
                     messages.append(dict(role="user", content=line[5:]))
                 continue

             assistant.append(line)

         summary = self.summarize(messages[-40:])
         return summary
```

This diff adds the `tqdm` progress bar to the `summarize_chat_history_markdown` method in the `aider/history.py` file. It wraps the loop that processes the chat history lines with `tqdm`, providing a description for the progress bar.
2024-05-06 09:31:32 -07:00
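Note on commit e51e0219ee: the loop shown in that diff splits a markdown chat history into user and assistant messages, with `tqdm` wrapped around the line iterator. A standalone sketch of the same loop follows, assuming the line prefixes shown in the diff. The function name `split_chat_history_markdown` and the sample text are hypothetical, the `tqdm` import is made optional here so the sketch runs without the third-party package, and the call to `self.summarize` is left out.

```python
try:
    from tqdm import tqdm  # third-party progress bar, optional in this sketch
except ImportError:
    def tqdm(iterable, **kwargs):
        """Fallback: iterate without showing a progress bar."""
        return iterable


def split_chat_history_markdown(text):
    """Split a markdown chat history into role/content message dicts."""
    messages = []
    assistant = []
    lines = text.splitlines(keepends=True)
    for line in tqdm(lines, desc="Summarizing chat history"):
        # Skip headings, quoted tool output, and slash commands.
        if line.startswith(("# ", ">", "#### /")):
            continue

        if line.startswith("#### "):
            # A new user line closes any pending assistant text.
            if assistant:
                joined = "".join(assistant)
                if joined.strip():
                    messages.append(dict(role="assistant", content=joined))
                assistant = []

            content = line[5:]
            if content.strip() and content.strip() != "<blank>":
                messages.append(dict(role="user", content=content))
            continue

        assistant.append(line)

    # As in the diff, assistant text after the last user line is not flushed.
    return messages


# Hypothetical history text, made up for illustration only.
sample_history = "# chat started\n#### hello\nHi there\n#### /help\n#### bye\n"
sample_messages = split_chat_history_markdown(sample_history)
```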
Paul Gauthier
64470955d4 pass max_chat_history_tokens into ChatSummary 2024-05-06 09:29:21 -07:00
Paul Gauthier
1747525418 aider: Added max_chat_history_tokens as an explicit argument to the Coder.__init__ method and passed args.max_chat_history_tokens when creating the Coder instance in main().
# Aider chat conversation:

USER: don't use kwargs, add it as an explicit arg

ASSISTANT: Here is the *SEARCH/REPLACE block* to add `max_chat_history_tokens` as an explicit argument instead of using `**kwargs`:

aider/coders/base_coder.py
```python
<<<<<<< SEARCH
    def __init__(
        self,
        main_model,
        io,
        fnames=None,
        git_dname=None,
        pretty=True,
        show_diffs=False,
        auto_commits=True,
        dirty_commits=True,
        dry_run=False,
        map_tokens=1024,
        verbose=False,
        assistant_output_color="blue",
        code_theme="default",
        stream=True,
        use_git=True,
        voice_language=None,
        aider_ignore_file=None,
        cur_messages=None,
        done_messages=None,
    ):
=======
    def __init__(
        self,
        main_model,
        io,
        fnames=None,
        git_dname=None,
        pretty=True,
        show_diffs=False,
        auto_commits=True,
        dirty_commits=True,
        dry_run=False,
        map_tokens=1024,
        verbose=False,
        assistant_output_color="blue",
        code_theme="default",
        stream=True,
        use_git=True,
        voice_language=None,
        aider_ignore_file=None,
        cur_messages=None,
        done_messages=None,
        max_chat_history_tokens=None,
    ):
>>>>>>> REPLACE
```

aider/main.py
```python
<<<<<<< SEARCH
        coder = Coder.create(
            main_model=main_model,
            edit_format=args.edit_format,
            io=io,
            ##
            fnames=fnames,
            git_dname=git_dname,
            pretty=args.pretty,
            show_diffs=args.show_diffs,
            auto_commits=args.auto_commits,
            dirty_commits=args.dirty_commits,
            dry_run=args.dry_run,
            map_tokens=args.map_tokens,
            verbose=args.verbose,
            assistant_output_color=args.assistant_output_color,
            code_theme=args.code_theme,
            stream=args.stream,
            use_git=args.git,
            voice_language=args.voice_language,
            aider_ignore_file=args.aiderignore,
        )
=======
        coder = Coder.create(
            main_model=main_model,
            edit_format=args.edit_format,
            io=io,
            ##
            fnames=fnames,
            git_dname=git_dname,
            pretty=args.pretty,
            show_diffs=args.show_diffs,
            auto_commits=args.auto_commits,
            dirty_commits=args.dirty_commits,
            dry_run=args.dry_run,
            map_tokens=args.map_tokens,
            verbose=args.verbose,
            assistant_output_color=args.assistant_output_color,
            code_theme=args.code_theme,
            stream=args.stream,
            use_git=args.git,
            voice_language=args.voice_language,
            aider_ignore_file=args.aiderignore,
            max_chat_history_tokens=args.max_chat_history_tokens,
        )
>>>>>>> REPLACE
```

This adds `max_chat_history_tokens` as an explicit argument to the `Coder.__init__` method, and passes `args.max_chat_history_tokens` when creating the `Coder` instance in `main()`.
2024-05-06 09:25:04 -07:00
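Note on commit 1747525418: the commit message shows only the new signature and the call site. One way the explicit argument could replace the earlier `kwargs` lookup is sketched below, under the assumption that `None` means "use the model's default". The `Model` and `Coder` classes here are simplified, hypothetical stand-ins and not the real aider classes.

```python
class Model:
    """Hypothetical stand-in for the model object; only one attribute matters."""

    def __init__(self, max_chat_history_tokens):
        self.max_chat_history_tokens = max_chat_history_tokens


class Coder:
    """Simplified stand-in showing an explicit keyword argument with a fallback."""

    def __init__(self, main_model, max_chat_history_tokens=None):
        self.main_model = main_model
        # None means the caller did not override the value, so fall back
        # to the model's own default.
        if max_chat_history_tokens is None:
            max_chat_history_tokens = main_model.max_chat_history_tokens
        self.max_chat_history_tokens = max_chat_history_tokens


default_coder = Coder(Model(1024))
override_coder = Coder(Model(1024), max_chat_history_tokens=2048)
```

Compared with reading from `**kwargs`, an explicit parameter makes a misspelled argument name fail at the call site instead of being silently ignored.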
Paul Gauthier
3a79b6230b aider: Added a --max-chat-history-tokens switch and passed it to ChatSummary in Coder.
# Aider chat conversation:

USER: add a --max-chat-history-tokens switch, pass it into Coder to `ChatSummary`

ASSISTANT: Here are the changes to add a --max-chat-history-tokens switch and pass it to ChatSummary in Coder:

aider/args.py
```python
<<<<<<< SEARCH
    parser.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Number of tokens to use for the repo map. Set to 0 to disable.",
    )
=======
    parser.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Number of tokens to use for the repo map. Set to 0 to disable.",
    )
    parser.add_argument(
        "--max-chat-history-tokens",
        type=int,
        default=None,
        help="Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.",
    )
>>>>>>> REPLACE
```

aider/coders/base_coder.py
```python
<<<<<<< SEARCH
        self.summarizer = ChatSummary(
            self.main_model.weak_model,
            self.main_model.max_chat_history_tokens,
        )
=======
        max_chat_history_tokens = self.main_model.max_chat_history_tokens
        if kwargs.get("max_chat_history_tokens") is not None:
            max_chat_history_tokens = kwargs["max_chat_history_tokens"]

        self.summarizer = ChatSummary(
            self.main_model.weak_model,
            max_chat_history_tokens,
        )
>>>>>>> REPLACE
```

aider/main.py
```python
<<<<<<< SEARCH
            map_tokens=args.map_tokens,
=======
            map_tokens=args.map_tokens,
            max_chat_history_tokens=args.max_chat_history_tokens,
>>>>>>> REPLACE
```

This adds the new command line argument `--max-chat-history-tokens` which defaults to None. If not specified, it will use the model's default `max_chat_history_tokens`.

It passes this argument from `main()` into the `Coder` constructor.

In the `Coder` constructor, it checks if `max_chat_history_tokens` was provided in the `kwargs`. If so, it uses that value, otherwise it falls back to the model's default.

Let me know if you have any other questions!

USER: InvalidEditBlock: edit failed!

aider/args.py does not contain the *exact chunk* of SEARCH lines you specified.
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The SEARCH block needs to be EXACTLY the same as the lines in aider/args.py with nothing missing!

aider/args.py does not contain these 6 exact lines in a row:
```
    parser.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Number of tokens to use for the repo map. Set to 0 to disable.",
    )
```

ASSISTANT: I apologize for the mistake. Here is the corrected change for aider/args.py:

aider/args.py
```python
<<<<<<< SEARCH
    group.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Max number of tokens to use for repo map, use 0 to disable (default: 1024)",
    )
=======
    group.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Max number of tokens to use for repo map, use 0 to disable (default: 1024)",
    )
    group.add_argument(
        "--max-chat-history-tokens",
        type=int,
        default=None,
        help="Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.",
    )
>>>>>>> REPLACE
```
2024-05-06 09:23:37 -07:00
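Note on commit 3a79b6230b: the switch defaults to `None` so that later code can tell "not specified" apart from an explicit value. A minimal `argparse` sketch of that behaviour follows. The group title "History" and the function name `build_parser` are hypothetical; the two option names, types, and defaults are taken from the commit message.

```python
import argparse


def build_parser():
    """Build a parser with the two options discussed in the commit message."""
    parser = argparse.ArgumentParser()
    group = parser.add_argument_group("History")  # hypothetical group title
    group.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Max number of tokens to use for repo map, use 0 to disable (default: 1024)",
    )
    group.add_argument(
        "--max-chat-history-tokens",
        type=int,
        default=None,
        help="Maximum number of tokens to use for chat history.",
    )
    return parser


parser = build_parser()
defaults = parser.parse_args([])
overridden = parser.parse_args(["--max-chat-history-tokens", "2048"])
```

`argparse` converts the dashes in the option name to underscores, so the value is read as `args.max_chat_history_tokens`, which matches the attribute passed in `aider/main.py`.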
Paul Gauthier
eac8e4cb8f announce reload of chat history 2024-05-05 19:08:00 -07:00
Paul Gauthier
00c12efd3f cleanup 2024-05-05 19:04:24 -07:00
Paul Gauthier
b3e19cf5be reload history 2024-05-05 19:03:17 -07:00
Paul Gauthier
cd84cc0d3e refactored summarize_chat_history_markdown 2024-05-05 18:58:16 -07:00
Paul Gauthier
3bb237bdc1 handle tasks with exceptions in the stats output 2024-05-05 08:24:45 -07:00
Paul Gauthier
a0649ba5fa fixed sendchat test 2024-05-05 08:03:26 -07:00
Paul Gauthier
6f8c1cf780 gemini refac results 2024-05-05 08:02:07 -07:00
Paul Gauthier
6b6548bd37 Merge remote-tracking branch 'origin/main' 2024-05-05 08:00:53 -07:00
paul-gauthier
fab8b8ae40 Update index.md 2024-05-04 20:07:32 -07:00
Paul Gauthier
9cdd9e12c3 catch all exceptions in the benchmark 2024-05-04 17:52:46 -07:00
Paul Gauthier
3e4fca2675 max_time not max_tries 2024-05-04 17:48:01 -07:00
Paul Gauthier
1b35ca25c4 copy 2024-05-04 17:45:08 -07:00
Paul Gauthier
7c9c4fe788 should_giveup? 2024-05-04 17:43:26 -07:00
Paul Gauthier
2d91ee8dbb copy 2024-05-04 17:33:25 -07:00
Paul Gauthier
425cb2941b Merge remote-tracking branch 'origin/main' 2024-05-04 16:26:36 -07:00
Paul Gauthier
366743493c renamed leaderboards 2024-05-04 16:25:46 -07:00
Paul Gauthier
812a620711 copy 2024-05-04 16:25:22 -07:00
paul-gauthier
df94f70a57 Update leaderboard.md 2024-05-04 11:32:43 -07:00
Paul Gauthier
b74edcf350 copy 2024-05-04 11:19:32 -07:00
Paul Gauthier
fbb3749270 copy 2024-05-04 11:14:15 -07:00
Paul Gauthier
ec07b6e556 updated refac 2024-05-04 11:11:34 -07:00
Paul Gauthier
e524dd9203 added refac leaderboard 2024-05-04 11:05:32 -07:00
Paul Gauthier
b02320b08f layout 2024-05-04 10:35:49 -07:00
Paul Gauthier
d96dc9ef63 Revert "aider: Change the graph to a horizontal bar graph."
This reverts commit 753ea441ea.
2024-05-04 10:35:26 -07:00
Paul Gauthier
26856c729e horizontal 2024-05-04 10:23:18 -07:00
Paul Gauthier
753ea441ea aider: Change the graph to a horizontal bar graph.
# Aider chat conversation:

USER: make the graph a horizontal bar graph

ASSISTANT: ```diff
--- docs/leaderboard.md
+++ docs/leaderboard.md
@@ ... @@
 <script>
   document.addEventListener('DOMContentLoaded', function () {
     var ctx = document.getElementById('leaderboardChart').getContext('2d');
     var leaderboardData = {
       labels: [],
       datasets: [{
         label: 'Percent correct',
         data: [],
         backgroundColor: 'rgba(54, 162, 235, 0.2)',
         borderColor: 'rgba(54, 162, 235, 1)',
         borderWidth: 1
       }]
     };

     {% for row in sorted %}
       leaderboardData.labels.push('{{ row.model }}');
       leaderboardData.datasets[0].data.push({{ row.second }});
     {% endfor %}

-    var leaderboardChart = new Chart(ctx, {
-      type: 'bar',
+    var leaderboardChart = new Chart(ctx, {
+      type: 'horizontalBar',
       data: leaderboardData,
       options: {
         scales: {
-          yAxes: [{
+          xAxes: [{
             scaleLabel: {
               display: true,
             },
             ticks: {
               beginAtZero: true
             }
           }]
         }
       }
     });
   });
 </script>
```
2024-05-04 10:22:54 -07:00
Paul Gauthier
e9228e0ed8 Update leaderboard chart with new label and style. 2024-05-04 10:22:53 -07:00
Paul Gauthier
9b4e643436 graph 2024-05-04 10:20:56 -07:00
Paul Gauthier
bc582723e3 aider: Add y-axis label "Percent correct" to the leaderboard chart.
# Aider chat conversation:

USER: add y-axis label "Percent correct"

ASSISTANT: ```diff
--- docs/leaderboard.md
+++ docs/leaderboard.md
@@ ... @@
 var leaderboardChart = new Chart(ctx, {
   type: 'bar',
   data: leaderboardData,
   options: {
     scales: {
       yAxes: [{
+        scaleLabel: {
+          display: true,
+          labelString: 'Percent correct'
+        },
         ticks: {
           beginAtZero: true
         }
       }]
     }
   }
 });
```
2024-05-04 10:20:42 -07:00
Paul Gauthier
e8ae216bbd graph 2024-05-04 10:19:46 -07:00
Paul Gauthier
98c8395d48 aider: Added Chart.js library to the docs/leaderboard.md file.
# Aider chat conversation:

USER: ReferenceError: Can't find variable: Chart

ASSISTANT: The error message "ReferenceError: Can't find variable: Chart" suggests that the Chart.js library is not being loaded before the script that is trying to use it. To fix this, we need to ensure that the Chart.js library is included in the HTML document before the script that initializes the chart.

Here is the diff to add the Chart.js library to the `docs/leaderboard.md` file:

```diff
--- docs/leaderboard.md
+++ docs/leaderboard.md
@@ ... @@
 <canvas id="leaderboardChart" width="800" height="450"></canvas>
+<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
 <script>
   document.addEventListener('DOMContentLoaded', function () {
     var ctx = document.getElementById('leaderboardChart').getContext('2d');
     var leaderboardData = {
       labels: [],
       datasets: [{
         label: 'Percent correct',
         data: [],
         backgroundColor: 'rgba(54, 162, 235, 0.2)',
         borderColor: 'rgba(54, 162, 235, 1)',
         borderWidth: 1
       }]
     };
@@ ... @@
```

This diff adds the Chart.js library from a CDN before the script tag that contains the chart initialization code. This should resolve the `ReferenceError` by ensuring that the `Chart` variable is defined when the script runs.
2024-05-04 10:16:08 -07:00
Paul Gauthier
9e8e47f1ce aider: Add script to draw a bar graph using Chart.js with model names as labels and percentages as bar heights in the docs/leaderboard.md file.
# Aider chat conversation:

USER: use chart.js to draw a bar graph in the canvas. use liquid templating to pull the "model" as bar label and "second" as bar height

ASSISTANT: ```diff
--- docs/leaderboard.md
+++ docs/leaderboard.md
@@ ... @@
-<canvas id="leaderboardChart" width="800" height="450"></canvas>
+<canvas id="leaderboardChart" width="800" height="450"></canvas>
+<script>
+  document.addEventListener('DOMContentLoaded', function () {
+    var ctx = document.getElementById('leaderboardChart').getContext('2d');
+    var leaderboardData = {
+      labels: [],
+      datasets: [{
+        label: 'Percent correct',
+        data: [],
+        backgroundColor: 'rgba(54, 162, 235, 0.2)',
+        borderColor: 'rgba(54, 162, 235, 1)',
+        borderWidth: 1
+      }]
+    };
+
+    {% for row in site.data.leaderboard | sort: 'second' | reverse %}
+      leaderboardData.labels.push('{{ row.model }}');
+      leaderboardData.datasets[0].data.push({{ row.second }});
+    {% endfor %}
+
+    var leaderboardChart = new Chart(ctx, {
+      type: 'bar',
+      data: leaderboardData,
+      options: {
+        scales: {
+          yAxes: [{
+            ticks: {
+              beginAtZero: true
+            }
+          }]
+        }
+      }
+    });
+  });
+</script>
```

This diff adds a script tag to the `docs/leaderboard.md` file that uses Chart.js to draw a bar graph on the canvas element with the id `leaderboardChart`. The script waits for the DOM to be fully loaded before executing. It then initializes the chart with the model names as labels and the corresponding percentages as the bar heights, using Liquid templating to pull the data from the site's data files.
2024-05-04 10:14:12 -07:00
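Note on commit 9e8e47f1ce: the template in that diff orders rows with `sort: 'second' | reverse`, so the highest score is drawn first. The same ordering can be sketched in Python as an ascending sort followed by a reversal. The function name `sort_rows_descending` and the sample rows are hypothetical; only the field names `model` and `second` come from the commit message.

```python
def sort_rows_descending(rows, key="second"):
    """Sort ascending by `key`, then reverse, mirroring sort followed by reverse."""
    ascending = sorted(rows, key=lambda row: row[key])
    return list(reversed(ascending))


# Hypothetical sample rows, made up for illustration only.
sample_leaderboard = [
    {"model": "model-a", "second": 45.5},
    {"model": "model-b", "second": 72.0},
    {"model": "model-c", "second": 60.0},
]
ordered = sort_rows_descending(sample_leaderboard)
labels = [row["model"] for row in ordered]
heights = [row["second"] for row in ordered]
```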