4018 commits

Author SHA1 Message Date
Paul Gauthier
c29d4860ab Added qwen1.5-110b-chat to leaderboard 2024-05-07 13:55:12 -07:00
Paul Gauthier
70b1c0c20c load .env in benchmark.py 2024-05-07 13:32:19 -07:00
Paul Gauthier
dc7e61f3c9 added deepseek-chat v2 (diff) to leaderboard 2024-05-07 09:38:46 -07:00
Paul Gauthier
e9da4016c8 deepseek-chat with sys examples and sys reminder 2024-05-07 08:48:49 -07:00
Paul Gauthier
086ccd64ee updated history 2024-05-07 08:48:28 -07:00
Paul Gauthier
04b224962e send the reminder if unknown context window size 2024-05-07 08:42:33 -07:00
Paul Gauthier
ca0faf7cc7 Merge remote-tracking branch 'origin/main' 2024-05-07 06:27:20 -07:00
Paul Gauthier
ecca737803 added deepseek-chat v2 2024-05-07 06:26:39 -07:00
Paul Gauthier
b1cae73a85 cleaned up csv output 2024-05-07 05:59:31 -07:00
paul-gauthier
6fb4983d88 Update edit_leaderboard.yml 2024-05-06 20:42:23 -07:00
paul-gauthier
18761b70be Update index.md 2024-05-06 17:16:29 -07:00
paul-gauthier
6bc18b9591 Update index.md 2024-05-06 17:15:26 -07:00
Paul Gauthier
db994fb203 aider: Updated the reference to highlight_image in default.html to use page.highlight_image.
# Aider chat conversation:

USER: in default.html, how should i reference the `highlight_image` declared in the markdown front matter?

ASSISTANT: To reference the `highlight_image` front matter variable in `default.html`, you can use the Liquid syntax `{{ page.highlight_image }}`.

Here's the change in a *SEARCH/REPLACE block*:

_layouts/default.html
<source>html
<<<<<<< SEARCH
    {% if highlight_image %}
    <meta property="og:image" content="{{ site.url }}{{ highlight_image }}">
=======
    {% if page.highlight_image %}
    <meta property="og:image" content="{{ site.url }}{{ page.highlight_image }}">
>>>>>>> REPLACE
</source>

This checks if `page.highlight_image` is defined in the front matter of the current page. If so, it sets the `og:image` meta tag to the full URL of the highlight image by prepending `site.url` to `page.highlight_image`.
2024-05-06 16:26:56 -07:00
Paul Gauthier
0c839b835c layout 2024-05-06 16:24:39 -07:00
Paul Gauthier
64938eafee Use highlight as og:image if present 2024-05-06 16:21:22 -07:00
Paul Gauthier
eab864c518 copy 2024-05-06 14:38:02 -07:00
Paul Gauthier
c67a94ee99 copy 2024-05-06 14:37:03 -07:00
Paul Gauthier
a58bc98f51 copy 2024-05-06 12:42:14 -07:00
Paul Gauthier
92f53d4d7b copy 2024-05-06 12:41:02 -07:00
Paul Gauthier
673be4647c copy 2024-05-06 12:23:06 -07:00
Paul Gauthier
c1e2829d21 copy 2024-05-06 12:21:14 -07:00
Paul Gauthier
201bb7fb54 copy 2024-05-06 12:14:36 -07:00
Paul Gauthier
b5f2dcaeae copy 2024-05-06 12:06:24 -07:00
Paul Gauthier
ac46a7ee7d copy 2024-05-06 12:02:46 -07:00
Paul Gauthier
7a5e88c2e6 finishing row.percent_cases_well_formed 2024-05-06 11:58:16 -07:00
Paul Gauthier
4b903a3bb8 cleanup 2024-05-06 11:50:49 -07:00
Paul Gauthier
89db1d8673 aider: Added a second bar for row.percent_cases_well_formed to each graph.
# Aider chat conversation:

USER: in each graph, give each model a second bar showing row.percent_cases_well_formed

ASSISTANT: Here are the changes to add a second bar for `row.percent_cases_well_formed` to each graph:

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

This adds a second dataset to each chart's `leaderboardData`, with a label of "Percent cases well formed". It sets different colors for the bars and borders.

Then in the loop that populates the data, it pushes the `row.percent_cases_well_formed` value into the second dataset.

With these changes, each model will have two bars - one for the pass rate and one for the percent of cases that were well formed.
2024-05-06 11:50:10 -07:00
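
The two-dataset structure described in commit 89db1d8673 can be sketched in plain JavaScript, without the Liquid loop. This is a minimal illustration, not the code in `docs/leaderboards/index.md`: the `buildLeaderboardData` helper and the sample rows are hypothetical stand-ins for the values Jekyll would render from the leaderboard data.

```javascript
// Hypothetical rows; the real values come from the site's leaderboard data files.
const rows = [
  { model: "model-a", pass_rate_2: 60.0, percent_cases_well_formed: 100.0 },
  { model: "model-b", pass_rate_2: 45.5, percent_cases_well_formed: 92.5 },
];

// Hypothetical helper: builds a Chart.js-style data object with two bars per model.
function buildLeaderboardData(rows, passRateKey, passRateLabel) {
  const data = {
    labels: [],
    datasets: [
      {
        label: passRateLabel,
        data: [],
        backgroundColor: "rgba(54, 162, 235, 0.2)",
        borderColor: "rgba(54, 162, 235, 1)",
        borderWidth: 1,
      },
      {
        label: "Percent cases well formed",
        data: [],
        backgroundColor: "rgba(255, 99, 132, 0.2)",
        borderColor: "rgba(255, 99, 132, 1)",
        borderWidth: 1,
      },
    ],
  };

  // One label per model; the same index in each dataset holds that model's bar.
  for (const row of rows) {
    data.labels.push(row.model);
    data.datasets[0].data.push(row[passRateKey]);
    data.datasets[1].data.push(row.percent_cases_well_formed);
  }
  return data;
}

const leaderboardData = buildLeaderboardData(
  rows,
  "pass_rate_2",
  "Percent correct on code editing tasks"
);
```

Passing `"pass_rate_1"` and the refactoring label instead would produce the data for the second chart, which is the only difference between the two SEARCH/REPLACE blocks in the commit.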
Paul Gauthier
92ea428b82 cleaned up edit data 2024-05-06 11:48:10 -07:00
Paul Gauthier
17b5dbe804 moved edit results to yaml 2024-05-06 11:44:29 -07:00
Paul Gauthier
fc3a43ef41 completed moving refac to yml 2024-05-06 11:25:14 -07:00
Paul Gauthier
e58ce69154 move refac data to yml 2024-05-06 11:21:38 -07:00
Paul Gauthier
a7b08c7354 format output as yaml 2024-05-06 11:15:19 -07:00
Paul Gauthier
3162d42262 cleanup 2024-05-06 10:46:09 -07:00
Paul Gauthier
5fb7a323ec refactored plots 2024-05-06 10:44:34 -07:00
Paul Gauthier
3bb237bdc1 handle tasks with exceptions in the stats output 2024-05-05 08:24:45 -07:00
Paul Gauthier
a0649ba5fa fixed sendchat test 2024-05-05 08:03:26 -07:00
Paul Gauthier
6f8c1cf780 gemini refac results 2024-05-05 08:02:07 -07:00
Paul Gauthier
6b6548bd37 Merge remote-tracking branch 'origin/main' 2024-05-05 08:00:53 -07:00
paul-gauthier
fab8b8ae40 Update index.md 2024-05-04 20:07:32 -07:00
Paul Gauthier
9cdd9e12c3 catch all exceptions in the benchmark 2024-05-04 17:52:46 -07:00
Paul Gauthier
3e4fca2675 max_time not max_tries 2024-05-04 17:48:01 -07:00
Paul Gauthier
1b35ca25c4 copy 2024-05-04 17:45:08 -07:00
Paul Gauthier
7c9c4fe788 should_giveup? 2024-05-04 17:43:26 -07:00
Paul Gauthier
2d91ee8dbb copy 2024-05-04 17:33:25 -07:00
Paul Gauthier
425cb2941b Merge remote-tracking branch 'origin/main' 2024-05-04 16:26:36 -07:00
Paul Gauthier
366743493c renamed leaderboards 2024-05-04 16:25:46 -07:00
Paul Gauthier
812a620711 copy 2024-05-04 16:25:22 -07:00
paul-gauthier
df94f70a57 Update leaderboard.md 2024-05-04 11:32:43 -07:00
Paul Gauthier
b74edcf350 copy 2024-05-04 11:19:32 -07:00
Paul Gauthier
fbb3749270 copy 2024-05-04 11:14:15 -07:00