Commit graph

3962 commits

Author SHA1 Message Date
Paul Gauthier
ca0faf7cc7 Merge remote-tracking branch 'origin/main' 2024-05-07 06:27:20 -07:00
Paul Gauthier
ecca737803 added deepseek-chat v2 2024-05-07 06:26:39 -07:00
Paul Gauthier
b1cae73a85 cleaned up csv output 2024-05-07 05:59:31 -07:00
paul-gauthier
6fb4983d88 Update edit_leaderboard.yml 2024-05-06 20:42:23 -07:00
paul-gauthier
18761b70be Update index.md 2024-05-06 17:16:29 -07:00
paul-gauthier
6bc18b9591 Update index.md 2024-05-06 17:15:26 -07:00
Paul Gauthier
db994fb203 aider: Updated the reference to highlight_image in default.html to use page.highlight_image.
# Aider chat conversation:

USER: in default.html, how should i reference the `highlight_image` declared in the markdown front matter?

ASSISTANT: To reference the `highlight_image` front matter variable in `default.html`, you can use the Liquid syntax `{{ page.highlight_image }}`.

Here's the change in a *SEARCH/REPLACE block*:

_layouts/default.html
```html
<<<<<<< SEARCH
    {% if highlight_image %}
    <meta property="og:image" content="{{ site.url }}{{ highlight_image }}">
=======
    {% if page.highlight_image %}
    <meta property="og:image" content="{{ site.url }}{{ page.highlight_image }}">
>>>>>>> REPLACE
```

This checks if `page.highlight_image` is defined in the front matter of the current page. If so, it sets the `og:image` meta tag to the full URL of the highlight image by prepending `site.url` to `page.highlight_image`.
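
As a rough illustration of that conditional, the same logic can be mirrored in plain JavaScript. This is only a sketch: `ogImageTag` and its `site` and `page` arguments are hypothetical stand-ins for the Liquid context, and the example URL and path are made up.

```javascript
// Illustration only: mirrors the template logic in plain JavaScript.
// `ogImageTag`, `site`, and `page` are hypothetical stand-ins for the Liquid context.
function ogImageTag(site, page) {
  const image = page.highlight_image;
  // Liquid treats only nil and false as falsy, so an empty string still passes the `if`.
  if (image === undefined || image === null || image === false) {
    return '';
  }
  // Prepend the site URL to the page-relative image path.
  return `<meta property="og:image" content="${site.url}${image}">`;
}

// Example values are made up.
const withImage = ogImageTag(
  { url: 'https://example.com' },
  { highlight_image: '/assets/highlight.png' }
);
const withoutImage = ogImageTag({ url: 'https://example.com' }, {});
```

A page without `highlight_image` in its front matter yields no tag at all, which is why the `page.` prefix matters: the unprefixed `highlight_image` is never defined in the layout's scope.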
2024-05-06 16:26:56 -07:00
Paul Gauthier
0c839b835c layout 2024-05-06 16:24:39 -07:00
Paul Gauthier
64938eafee Use highlight as og:image if present 2024-05-06 16:21:22 -07:00
Paul Gauthier
eab864c518 copy 2024-05-06 14:38:02 -07:00
Paul Gauthier
c67a94ee99 copy 2024-05-06 14:37:03 -07:00
Paul Gauthier
a58bc98f51 copy 2024-05-06 12:42:14 -07:00
Paul Gauthier
92f53d4d7b copy 2024-05-06 12:41:02 -07:00
Paul Gauthier
673be4647c copy 2024-05-06 12:23:06 -07:00
Paul Gauthier
c1e2829d21 copy 2024-05-06 12:21:14 -07:00
Paul Gauthier
201bb7fb54 copy 2024-05-06 12:14:36 -07:00
Paul Gauthier
b5f2dcaeae copy 2024-05-06 12:06:24 -07:00
Paul Gauthier
ac46a7ee7d copy 2024-05-06 12:02:46 -07:00
Paul Gauthier
7a5e88c2e6 finishing row.percent_cases_well_formed 2024-05-06 11:58:16 -07:00
Paul Gauthier
4b903a3bb8 cleanup 2024-05-06 11:50:49 -07:00
Paul Gauthier
89db1d8673 aider: Added a second bar for row.percent_cases_well_formed to each graph.
# Aider chat conversation:

USER: in each graph, give each model a second bar showing row.percent_cases_well_formed

ASSISTANT: Here are the changes to add a second bar for `row.percent_cases_well_formed` to each graph:

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code editing tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in edit_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

docs/leaderboards/index.md
```markdown
<<<<<<< SEARCH
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
    {% endfor %}
=======
    var leaderboardData = {
      labels: [],
      datasets: [{
        label: 'Percent correct on code refactoring tasks',
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1
      }]
    };

    {% for row in refac_sorted %}
      leaderboardData.labels.push('{{ row.model }}');
      leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
      leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
    {% endfor %}
>>>>>>> REPLACE
```

This adds a second dataset to each chart's `leaderboardData`, with a label of "Percent cases well formed". It sets different colors for the bars and borders.

Then in the loop that populates the data, it pushes the `row.percent_cases_well_formed` value into the second dataset.

With these changes, each model will have two bars - one for the pass rate and one for the percent of cases that were well formed.
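
The two SEARCH/REPLACE blocks differ only in the row collection, the pass-rate field, and the first label, so the pattern can be sketched as one function. This is a minimal sketch under assumptions, not code from the repository: `buildLeaderboardData` is a hypothetical helper, `rows` stands in for the Liquid collections `edit_sorted` and `refac_sorted`, and the sample values are made up.

```javascript
// Hypothetical helper: builds Chart.js-style data with two datasets per model.
// `rows` stands in for the Liquid collections (edit_sorted / refac_sorted).
function buildLeaderboardData(rows, passRateKey, passRateLabel) {
  const data = {
    labels: [],
    datasets: [
      {
        label: passRateLabel,
        data: [],
        backgroundColor: 'rgba(54, 162, 235, 0.2)',
        borderColor: 'rgba(54, 162, 235, 1)',
        borderWidth: 1,
      },
      {
        label: 'Percent cases well formed',
        data: [],
        backgroundColor: 'rgba(255, 99, 132, 0.2)',
        borderColor: 'rgba(255, 99, 132, 1)',
        borderWidth: 1,
      },
    ],
  };
  // Each row contributes one label and one value per dataset, so the
  // arrays stay index-aligned and each model gets two bars.
  for (const row of rows) {
    data.labels.push(row.model);
    data.datasets[0].data.push(row[passRateKey]);
    data.datasets[1].data.push(row.percent_cases_well_formed);
  }
  return data;
}

// Made-up rows for illustration only; not real benchmark results.
const exampleRows = [
  { model: 'model-a', pass_rate_2: 70.0, percent_cases_well_formed: 100.0 },
  { model: 'model-b', pass_rate_2: 55.5, percent_cases_well_formed: 92.5 },
];
const exampleData = buildLeaderboardData(
  exampleRows,
  'pass_rate_2',
  'Percent correct on code editing tasks'
);
```

For the refactoring chart, the same call would pass `'pass_rate_1'` and the refactoring label instead.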
2024-05-06 11:50:10 -07:00
Paul Gauthier
92ea428b82 cleaned up edit data 2024-05-06 11:48:10 -07:00
Paul Gauthier
17b5dbe804 moved edit results to yaml 2024-05-06 11:44:29 -07:00
Paul Gauthier
fc3a43ef41 completed moving refac to yml 2024-05-06 11:25:14 -07:00
Paul Gauthier
e58ce69154 move refac data to yml 2024-05-06 11:21:38 -07:00
Paul Gauthier
a7b08c7354 format output as yaml 2024-05-06 11:15:19 -07:00
Paul Gauthier
3162d42262 cleanup 2024-05-06 10:46:09 -07:00
Paul Gauthier
5fb7a323ec refactored plots 2024-05-06 10:44:34 -07:00
Paul Gauthier
3bb237bdc1 handle tasks with exceptions in the stats output 2024-05-05 08:24:45 -07:00
Paul Gauthier
a0649ba5fa fixed sendchat test 2024-05-05 08:03:26 -07:00
Paul Gauthier
6f8c1cf780 gemini refac results 2024-05-05 08:02:07 -07:00
Paul Gauthier
6b6548bd37 Merge remote-tracking branch 'origin/main' 2024-05-05 08:00:53 -07:00
paul-gauthier
fab8b8ae40 Update index.md 2024-05-04 20:07:32 -07:00
Paul Gauthier
9cdd9e12c3 catch all exceptions in the benchmark 2024-05-04 17:52:46 -07:00
Paul Gauthier
3e4fca2675 max_time not max_tries 2024-05-04 17:48:01 -07:00
Paul Gauthier
1b35ca25c4 copy 2024-05-04 17:45:08 -07:00
Paul Gauthier
7c9c4fe788 should_giveup? 2024-05-04 17:43:26 -07:00
Paul Gauthier
2d91ee8dbb copy 2024-05-04 17:33:25 -07:00
Paul Gauthier
425cb2941b Merge remote-tracking branch 'origin/main' 2024-05-04 16:26:36 -07:00
Paul Gauthier
366743493c renamed leaderboards 2024-05-04 16:25:46 -07:00
Paul Gauthier
812a620711 copy 2024-05-04 16:25:22 -07:00
paul-gauthier
df94f70a57 Update leaderboard.md 2024-05-04 11:32:43 -07:00
Paul Gauthier
b74edcf350 copy 2024-05-04 11:19:32 -07:00
Paul Gauthier
fbb3749270 copy 2024-05-04 11:14:15 -07:00
Paul Gauthier
ec07b6e556 updated refac 2024-05-04 11:11:34 -07:00
Paul Gauthier
e524dd9203 added refac leaderboard 2024-05-04 11:05:32 -07:00
Paul Gauthier
b02320b08f layout 2024-05-04 10:35:49 -07:00
Paul Gauthier
d96dc9ef63 Revert "aider: Change the graph to a horizontal bar graph."
This reverts commit 753ea441ea.
2024-05-04 10:35:26 -07:00
Paul Gauthier
26856c729e horizontal 2024-05-04 10:23:18 -07:00
Paul Gauthier
753ea441ea aider: Change the graph to a horizontal bar graph.
# Aider chat conversation:

USER: make the graph a horizontal bar graph

ASSISTANT: ```diff
--- docs/leaderboard.md
+++ docs/leaderboard.md
@@ ... @@
 <script>
   document.addEventListener('DOMContentLoaded', function () {
     var ctx = document.getElementById('leaderboardChart').getContext('2d');
     var leaderboardData = {
       labels: [],
       datasets: [{
         label: 'Percent correct',
         data: [],
         backgroundColor: 'rgba(54, 162, 235, 0.2)',
         borderColor: 'rgba(54, 162, 235, 1)',
         borderWidth: 1
       }]
     };

     {% for row in sorted %}
       leaderboardData.labels.push('{{ row.model }}');
       leaderboardData.datasets[0].data.push({{ row.second }});
     {% endfor %}

-    var leaderboardChart = new Chart(ctx, {
-      type: 'bar',
+    var leaderboardChart = new Chart(ctx, {
+      type: 'horizontalBar',
       data: leaderboardData,
       options: {
         scales: {
-          yAxes: [{
+          xAxes: [{
             scaleLabel: {
               display: true,
             },
             ticks: {
               beginAtZero: true
             }
           }]
         }
       }
     });
   });
 </script>
```
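
The diff above uses the `horizontalBar` type and the `xAxes` array, which belong to the Chart.js 2.x API. Chart.js 3 and later removed `horizontalBar`; a horizontal bar chart there is a `bar` chart with `indexAxis: 'y'`, and scales are keyed by axis id. A minimal sketch of the equivalent configuration, assuming Chart.js 3 or later (the helper name `horizontalBarConfig` and the sample data are hypothetical):

```javascript
// Hypothetical helper: returns a Chart.js 3+ config for a horizontal bar chart.
function horizontalBarConfig(leaderboardData) {
  return {
    type: 'bar', // 'horizontalBar' no longer exists in Chart.js 3+
    data: leaderboardData,
    options: {
      indexAxis: 'y', // categories on the y axis, so bars run horizontally
      scales: {
        // The value axis is now x; beginAtZero sits on the scale itself.
        x: { beginAtZero: true },
      },
    },
  };
}

// Made-up data for illustration only.
const exampleConfig = horizontalBarConfig({
  labels: ['model-a'],
  datasets: [{ label: 'Percent correct', data: [50] }],
});
// In a page that loads Chart.js 3+: new Chart(ctx, exampleConfig);
```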
2024-05-04 10:22:54 -07:00