From 5d08c69ba02bbbb08de09861d5de3b3f2a0ac280 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Mon, 3 Jun 2024 11:23:22 -0700
Subject: [PATCH] copy

---
 _posts/2024-06-02-main-swe-bench.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_posts/2024-06-02-main-swe-bench.md b/_posts/2024-06-02-main-swe-bench.md
index a4d963567..47af20f72 100644
--- a/_posts/2024-06-02-main-swe-bench.md
+++ b/_posts/2024-06-02-main-swe-bench.md
@@ -238,6 +238,7 @@ The "aider agent" internally makes multiple "attempts" at solving the problem,
 but it picks and returns one single candidate solution.
 Only that one candidate solution is evaluated with the acceptance tests
 and contributes to the benchmark score.
+Thus it is a pass@1 result.
 This is contrast to a pass@N result for N>1, where N attempts are made
 and all N solutions are evaluated by the acceptance tests.