4062 commits

Author SHA1 Message Date
Paul Gauthier
eab864c518 copy 2024-05-06 14:38:02 -07:00
Paul Gauthier
c67a94ee99 copy 2024-05-06 14:37:03 -07:00
Paul Gauthier
a58bc98f51 copy 2024-05-06 12:42:14 -07:00
Paul Gauthier
92f53d4d7b copy 2024-05-06 12:41:02 -07:00
Paul Gauthier
673be4647c copy 2024-05-06 12:23:06 -07:00
Paul Gauthier
c1e2829d21 copy 2024-05-06 12:21:14 -07:00
Paul Gauthier
201bb7fb54 copy 2024-05-06 12:14:36 -07:00
Paul Gauthier
b5f2dcaeae copy 2024-05-06 12:06:24 -07:00
Paul Gauthier
ac46a7ee7d copy 2024-05-06 12:02:46 -07:00
Paul Gauthier
7a5e88c2e6 finishing row.percent_cases_well_formed 2024-05-06 11:58:16 -07:00
Paul Gauthier
4b903a3bb8 cleanup 2024-05-06 11:50:49 -07:00
Paul Gauthier
89db1d8673 aider: Added a second bar for row.percent_cases_well_formed to each graph.
# Aider chat conversation:

USER: in each graph, give each model a second bar showing row.percent_cases_well_formed

ASSISTANT: Here are the changes to add a second bar for `row.percent_cases_well_formed` to each graph:

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

This adds a second dataset to each chart's `leaderboardData`, with a label of "Percent cases well formed". It sets different colors for the bars and borders.

Then in the loop that populates the data, it pushes the `row.percent_cases_well_formed` value into the second dataset.

With these changes, each model will have two bars - one for the pass rate and one for the percent of cases that were well formed.
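
As a rough illustration of the data shape described above, here is a minimal Python sketch that builds the same two-dataset structure from a list of rows. The function name `build_leaderboard_data` and the sample rows are hypothetical placeholders, not part of the commit and not real benchmark results; only the field names `model`, `pass_rate_2` and `percent_cases_well_formed` come from the template above.

```python
import json


def build_leaderboard_data(rows, pass_rate_key="pass_rate_2"):
    """Build a Chart.js-style data object with two bars per model.

    Hypothetical helper: mirrors the template loop above, in Python.
    """
    data = {
        "labels": [],
        "datasets": [
            {"label": "Percent correct on code editing tasks", "data": []},
            {"label": "Percent cases well formed", "data": []},
        ],
    }
    for row in rows:
        data["labels"].append(row["model"])
        # First bar: the pass rate for this model.
        data["datasets"][0]["data"].append(row[pass_rate_key])
        # Second bar: share of cases where the edit format was well formed.
        data["datasets"][1]["data"].append(row["percent_cases_well_formed"])
    return data


# Placeholder rows for illustration only (not benchmark results).
rows = [
    {"model": "model-a", "pass_rate_2": 60.0, "percent_cases_well_formed": 95.0},
    {"model": "model-b", "pass_rate_2": 45.5, "percent_cases_well_formed": 80.0},
]
leaderboard_data = build_leaderboard_data(rows)
print(json.dumps(leaderboard_data, indent=2))
```

Both datasets share the same `labels` list, so the two values appended for a model end up at the same index and are drawn side by side.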
2024-05-06 11:50:10 -07:00
Paul Gauthier
92ea428b82 cleaned up edit data 2024-05-06 11:48:10 -07:00
Paul Gauthier
17b5dbe804 moved edit results to yaml 2024-05-06 11:44:29 -07:00
Paul Gauthier
fc3a43ef41 completed moving refac to yml 2024-05-06 11:25:14 -07:00
Paul Gauthier
e58ce69154 move refac data to yml 2024-05-06 11:21:38 -07:00
Paul Gauthier
a7b08c7354 format output as yaml 2024-05-06 11:15:19 -07:00
Paul Gauthier
3162d42262 cleanup 2024-05-06 10:46:09 -07:00
Paul Gauthier
5fb7a323ec refactored plots 2024-05-06 10:44:34 -07:00
Paul Gauthier
e61857ef09 summarize as many messages as will fit into the summarizer context 2024-05-06 09:47:14 -07:00
Paul Gauthier
e51e0219ee aider: Add tqdm progress bar to summarize_chat_history_markdown loop.
# Aider chat conversation:

USER: add a tqdm to `summarize_chat_history_markdown` loop

ASSISTANT:

```diff
--- aider/history.py
+++ aider/history.py
@@ ... @@
 from aider.sendchat import simple_send_with_retries
+from tqdm import tqdm

 class ChatSummary:
     def __init__(self, model=None, max_tokens=1024):
@@ ... @@
     def summarize_chat_history_markdown(self, text):
         messages = []
         assistant = []
+        lines = text.splitlines(keepends=True)
+        for line in tqdm(lines, desc="Summarizing chat history"):
-        for line in text.splitlines(keepends=True):
             if line.startswith("# "):
                 continue
             if line.startswith(">"):
                 continue
             if line.startswith("#### /"):
                 continue

             if line.startswith("#### "):
                 if assistant:
                     assistant = "".join(assistant)
                     if assistant.strip():
                         messages.append(dict(role="assistant", content=assistant))
                     assistant = []

                 content = line[5:]
                 if content.strip() and content.strip() != "<blank>":
                     messages.append(dict(role="user", content=line[5:]))
                 continue

             assistant.append(line)

         summary = self.summarize(messages[-40:])
         return summary
```

This diff adds the `tqdm` progress bar to the `summarize_chat_history_markdown` method in the `aider/history.py` file. It wraps the loop that processes the chat history lines with `tqdm`, providing a description for the progress bar.
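
The loop in the diff above can be sketched as a standalone function, assuming the same line prefixes (`# `, `>`, `#### /`, `#### `). The name `parse_chat_history_markdown` and the optional `progress` parameter are hypothetical; the parameter stands in for `tqdm` so the sketch runs without extra dependencies. As in the diff, assistant text after the last user line is not flushed.

```python
def parse_chat_history_markdown(text, progress=None):
    """Split a markdown chat log into user/assistant messages.

    `progress` is an optional callable that wraps the iterable of lines,
    for example a tqdm wrapper (hypothetical parameter, not in the commit).
    """
    lines = text.splitlines(keepends=True)
    if progress is not None:
        lines = progress(lines)

    messages = []
    assistant = []
    for line in lines:
        # Skip headings, quoted tool output, and slash commands.
        if line.startswith("# "):
            continue
        if line.startswith(">"):
            continue
        if line.startswith("#### /"):
            continue

        if line.startswith("#### "):
            # A new user line ends the assistant reply collected so far.
            if assistant:
                joined = "".join(assistant)
                if joined.strip():
                    messages.append(dict(role="assistant", content=joined))
                assistant = []

            content = line[5:]
            if content.strip() and content.strip() != "<blank>":
                messages.append(dict(role="user", content=content))
            continue

        assistant.append(line)

    return messages


sample = "# aider chat started\n#### hello\nHi there\n#### /help\n#### thanks\n"
parsed = parse_chat_history_markdown(sample)
for message in parsed:
    print(message["role"], repr(message["content"]))
```

With `tqdm` installed, a progress bar could be attached by passing `progress=lambda it: tqdm(it, desc="Summarizing chat history")`.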
2024-05-06 09:31:32 -07:00
Paul Gauthier
64470955d4 pass max_chat_history_tokens into ChatSummary 2024-05-06 09:29:21 -07:00
Paul Gauthier
1747525418 aider: Added max_chat_history_tokens as an explicit argument to the Coder.__init__ method and passed args.max_chat_history_tokens when creating the Coder instance in main().
# Aider chat conversation:

USER: don't use kwargs, add it as an explicit arg

ASSISTANT: Here is the *SEARCH/REPLACE block* to add `max_chat_history_tokens` as an explicit argument instead of using `**kwargs`:

aider/coders/base_coder.py
```python
<<<<<<< SEARCH
    def __init__(
        self,
        main_model,
        io,
        fnames=None,
        git_dname=None,
        pretty=True,
        show_diffs=False,
        auto_commits=True,
        dirty_commits=True,
        dry_run=False,
        map_tokens=1024,
        verbose=False,
        assistant_output_color="blue",
        code_theme="default",
        stream=True,
        use_git=True,
        voice_language=None,
        aider_ignore_file=None,
        cur_messages=None,
        done_messages=None,
    ):
=======
    def __init__(
        self,
        main_model,
        io,
        fnames=None,
        git_dname=None,
        pretty=True,
        show_diffs=False,
        auto_commits=True,
        dirty_commits=True,
        dry_run=False,
        map_tokens=1024,
        verbose=False,
        assistant_output_color="blue",
        code_theme="default",
        stream=True,
        use_git=True,
        voice_language=None,
        aider_ignore_file=None,
        cur_messages=None,
        done_messages=None,
        max_chat_history_tokens=None,
    ):
>>>>>>> REPLACE
```

aider/main.py
```python
<<<<<<< SEARCH
        coder = Coder.create(
            main_model=main_model,
            edit_format=args.edit_format,
            io=io,
            ##
            fnames=fnames,
            git_dname=git_dname,
            pretty=args.pretty,
            show_diffs=args.show_diffs,
            auto_commits=args.auto_commits,
            dirty_commits=args.dirty_commits,
            dry_run=args.dry_run,
            map_tokens=args.map_tokens,
            verbose=args.verbose,
            assistant_output_color=args.assistant_output_color,
            code_theme=args.code_theme,
            stream=args.stream,
            use_git=args.git,
            voice_language=args.voice_language,
            aider_ignore_file=args.aiderignore,
        )
=======
        coder = Coder.create(
            main_model=main_model,
            edit_format=args.edit_format,
            io=io,
            ##
            fnames=fnames,
            git_dname=git_dname,
            pretty=args.pretty,
            show_diffs=args.show_diffs,
            auto_commits=args.auto_commits,
            dirty_commits=args.dirty_commits,
            dry_run=args.dry_run,
            map_tokens=args.map_tokens,
            verbose=args.verbose,
            assistant_output_color=args.assistant_output_color,
            code_theme=args.code_theme,
            stream=args.stream,
            use_git=args.git,
            voice_language=args.voice_language,
            aider_ignore_file=args.aiderignore,
            max_chat_history_tokens=args.max_chat_history_tokens,
        )
>>>>>>> REPLACE
```

This adds `max_chat_history_tokens` as an explicit argument to the `Coder.__init__` method, and passes `args.max_chat_history_tokens` when creating the `Coder` instance in `main()`.
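
A minimal sketch of why an explicit keyword argument may be preferable to `**kwargs` here. The classes `ModelSettings` and `CoderSketch` are hypothetical stand-ins, not aider's real classes; only the parameter name `max_chat_history_tokens` comes from the change above.

```python
class ModelSettings:
    """Hypothetical stand-in for a model object with a default token budget."""

    def __init__(self, max_chat_history_tokens):
        self.max_chat_history_tokens = max_chat_history_tokens


class CoderSketch:
    """Hypothetical stand-in showing an explicit keyword argument."""

    def __init__(self, main_model, max_chat_history_tokens=None):
        # None means "not specified": fall back to the model's default.
        if max_chat_history_tokens is None:
            max_chat_history_tokens = main_model.max_chat_history_tokens
        self.max_chat_history_tokens = max_chat_history_tokens


model = ModelSettings(max_chat_history_tokens=1024)
print(CoderSketch(model).max_chat_history_tokens)
print(CoderSketch(model, max_chat_history_tokens=2048).max_chat_history_tokens)

# A misspelled keyword fails immediately instead of being silently ignored.
try:
    CoderSketch(model, max_chat_history_token=2048)
except TypeError as err:
    print("rejected:", err)
```

With `**kwargs`, the misspelled name in the last call would be accepted and the override would quietly never take effect.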
2024-05-06 09:25:04 -07:00
Paul Gauthier
3a79b6230b aider: Added a --max-chat-history-tokens switch and passed it to ChatSummary in Coder.
# Aider chat conversation:

USER: add a --max-chat-history-tokens switch, pass it into Coder to `ChatSummary`

ASSISTANT: Here are the changes to add a --max-chat-history-tokens switch and pass it to ChatSummary in Coder:

aider/args.py
```python
<<<<<<< SEARCH
    parser.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Number of tokens to use for the repo map. Set to 0 to disable.",
    )
=======
    parser.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Number of tokens to use for the repo map. Set to 0 to disable.",
    )
    parser.add_argument(
        "--max-chat-history-tokens",
        type=int,
        default=None,
        help="Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.",
    )
>>>>>>> REPLACE
```

aider/coders/base_coder.py
```python
<<<<<<< SEARCH
        self.summarizer = ChatSummary(
            self.main_model.weak_model,
            self.main_model.max_chat_history_tokens,
        )
=======
        max_chat_history_tokens = self.main_model.max_chat_history_tokens
        if kwargs.get("max_chat_history_tokens") is not None:
            max_chat_history_tokens = kwargs["max_chat_history_tokens"]

        self.summarizer = ChatSummary(
            self.main_model.weak_model,
            max_chat_history_tokens,
        )
>>>>>>> REPLACE
```

aider/main.py
```python
<<<<<<< SEARCH
            map_tokens=args.map_tokens,
=======
            map_tokens=args.map_tokens,
            max_chat_history_tokens=args.max_chat_history_tokens,
>>>>>>> REPLACE
```

This adds the new command line argument `--max-chat-history-tokens` which defaults to None. If not specified, it will use the model's default `max_chat_history_tokens`.

It passes this argument from `main()` into the `Coder` constructor.

In the `Coder` constructor, it checks whether `max_chat_history_tokens` was provided in the `kwargs`. If so, it uses that value; otherwise, it falls back to the model's default.

Let me know if you have any other questions!
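
The command line side of this change can be sketched with `argparse` alone. This is a minimal illustration under stated assumptions: `build_parser` and `resolve_max_chat_history_tokens` are hypothetical helper names, and the model default is passed in as a plain integer rather than read from a real model object.

```python
import argparse


def build_parser():
    """Hypothetical parser containing only the switch discussed above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--max-chat-history-tokens",
        type=int,
        default=None,
        help=(
            "Maximum number of tokens to use for chat history. "
            "If not specified, uses the model's max_chat_history_tokens."
        ),
    )
    return parser


def resolve_max_chat_history_tokens(args, model_default):
    """Prefer the command line value; otherwise use the model's default."""
    if args.max_chat_history_tokens is not None:
        return args.max_chat_history_tokens
    return model_default


parser = build_parser()
unset = parser.parse_args([])
explicit = parser.parse_args(["--max-chat-history-tokens", "2048"])
print(resolve_max_chat_history_tokens(unset, model_default=1024))
print(resolve_max_chat_history_tokens(explicit, model_default=1024))
```

Using `default=None` rather than a number lets the code distinguish "switch not given" from any explicit value, including small ones.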

USER: InvalidEditBlock: edit failed!

aider/args.py does not contain the *exact chunk* of SEARCH lines you specified.
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The SEARCH block needs to be EXACTLY the same as the lines in aider/args.py with nothing missing!

aider/args.py does not contain these 6 exact lines in a row:
```
    parser.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Number of tokens to use for the repo map. Set to 0 to disable.",
    )
```

ASSISTANT: I apologize for the mistake. Here is the corrected change for aider/args.py:

aider/args.py
```python
<<<<<<< SEARCH
    group.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Max number of tokens to use for repo map, use 0 to disable (default: 1024)",
    )
=======
    group.add_argument(
        "--map-tokens",
        type=int,
        default=1024,
        help="Max number of tokens to use for repo map, use 0 to disable (default: 1024)",
    )
    group.add_argument(
        "--max-chat-history-tokens",
        type=int,
        default=None,
        help="Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.",
    )
>>>>>>> REPLACE
```
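
The failure above follows from exact matching: the first SEARCH block used `parser.add_argument(` and a different help string, while the file uses `group.add_argument(`. A minimal sketch of that kind of check, assuming a plain substring match, is shown below. The function `apply_search_replace` and the local `InvalidEditBlock` class are hypothetical; aider's actual matching logic may differ.

```python
class InvalidEditBlock(ValueError):
    """Raised when the SEARCH text is not found verbatim (hypothetical class)."""


def apply_search_replace(content, search, replace):
    """Replace the first exact occurrence of `search` in `content`."""
    if search not in content:
        raise InvalidEditBlock(
            "content does not contain the exact chunk of SEARCH lines"
        )
    return content.replace(search, replace, 1)


# Shortened stand-in for the file contents, for illustration only.
source = 'group.add_argument(\n    "--map-tokens",\n)\n'
bad_search = 'parser.add_argument(\n    "--map-tokens",\n)\n'
good_search = source
replacement = source + 'group.add_argument(\n    "--max-chat-history-tokens",\n)\n'

try:
    apply_search_replace(source, bad_search, replacement)
except InvalidEditBlock as err:
    print("edit failed:", err)

updated = apply_search_replace(source, good_search, replacement)
print(updated)
```

Because the match is verbatim, a single differing token such as `parser` versus `group` is enough to reject the edit.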
2024-05-06 09:23:37 -07:00
Paul Gauthier
eac8e4cb8f announce reload of chat history 2024-05-05 19:08:00 -07:00
Paul Gauthier
00c12efd3f cleanup 2024-05-05 19:04:24 -07:00
Paul Gauthier
b3e19cf5be reload history 2024-05-05 19:03:17 -07:00
Paul Gauthier
cd84cc0d3e refactored summarize_chat_history_markdown 2024-05-05 18:58:16 -07:00
Paul Gauthier
3bb237bdc1 handle tasks with exceptions in the stats output 2024-05-05 08:24:45 -07:00
Paul Gauthier
a0649ba5fa fixed sendchat test 2024-05-05 08:03:26 -07:00
Paul Gauthier
6f8c1cf780 gemini refac results 2024-05-05 08:02:07 -07:00
Paul Gauthier
6b6548bd37 Merge remote-tracking branch 'origin/main' 2024-05-05 08:00:53 -07:00
paul-gauthier
fab8b8ae40 Update index.md 2024-05-04 20:07:32 -07:00
Paul Gauthier
9cdd9e12c3 catch all exceptions in the benchmark 2024-05-04 17:52:46 -07:00
Paul Gauthier
3e4fca2675 max_time not max_tries 2024-05-04 17:48:01 -07:00
Paul Gauthier
1b35ca25c4 copy 2024-05-04 17:45:08 -07:00
Paul Gauthier
7c9c4fe788 should_giveup? 2024-05-04 17:43:26 -07:00
Paul Gauthier
2d91ee8dbb copy 2024-05-04 17:33:25 -07:00
Paul Gauthier
425cb2941b Merge remote-tracking branch 'origin/main' 2024-05-04 16:26:36 -07:00
Paul Gauthier
366743493c renamed leaderboards 2024-05-04 16:25:46 -07:00
Paul Gauthier
812a620711 copy 2024-05-04 16:25:22 -07:00
paul-gauthier
df94f70a57 Update leaderboard.md 2024-05-04 11:32:43 -07:00
Paul Gauthier
b74edcf350 copy 2024-05-04 11:19:32 -07:00
Paul Gauthier
fbb3749270 copy 2024-05-04 11:14:15 -07:00
Paul Gauthier
ec07b6e556 updated refac 2024-05-04 11:11:34 -07:00
Paul Gauthier
e524dd9203 added refac leaderboard 2024-05-04 11:05:32 -07:00
Paul Gauthier
b02320b08f layout 2024-05-04 10:35:49 -07:00
Paul Gauthier
d96dc9ef63 Revert "aider: Change the graph to a horizontal bar graph."
This reverts commit 753ea441ea.
2024-05-04 10:35:26 -07:00
Paul Gauthier
26856c729e horizontal 2024-05-04 10:23:18 -07:00
Paul Gauthier
753ea441ea aider: Change the graph to a horizontal bar graph.
# Aider chat conversation:

USER: make the graph a horizontal bar graph

ASSISTANT:

```diff
--- docs/leaderboard.md
+++ docs/leaderboard.md
@@ ... @@
 <script>
   document.addEventListener('DOMContentLoaded', function () {
     var ctx = document.getElementById('leaderboardChart').getContext('2d');
     var leaderboardData = {
       labels: [],
       datasets: [{
         label: 'Percent correct',
         data: [],
         backgroundColor: 'rgba(54, 162, 235, 0.2)',
         borderColor: 'rgba(54, 162, 235, 1)',
         borderWidth: 1
       }]
     };

     {% for row in sorted %}
       leaderboardData.labels.push('{{ row.model }}');
       leaderboardData.datasets[0].data.push({{ row.second }});
     {% endfor %}

-    var leaderboardChart = new Chart(ctx, {
-      type: 'bar',
+    var leaderboardChart = new Chart(ctx, {
+      type: 'horizontalBar',
       data: leaderboardData,
       options: {
         scales: {
-          yAxes: [{
+          xAxes: [{
             scaleLabel: {
               display: true,
             },
             ticks: {
               beginAtZero: true
             }
           }]
         }
       }
     });
   });
 </script>
```
2024-05-04 10:22:54 -07:00