From 6382153597af092bfdac4ea30104d3243720502e Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Fri, 24 May 2024 16:00:17 -0700
Subject: [PATCH] copy

---
 _posts/2024-05-22-swe-bench-lite.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index 922fd82d4..9aee4e78c 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -115,7 +115,7 @@ These first two attempts obtained ~75% of all plausible and ~90% of all resolved
 - A long tail of solutions continued to be found using both models including one correctly resolved solution on the final, sixth attempt of that problem.
-| Attempt | Agent |Number of<br>plausible<br>solutions|Percent of<br>plausible<br>solutions| Number of<br>correctly<br>resolved<br>solutions | Percent of<br>correctly<br>resolved<br>solutions | Score on<br>SWE Bench<br>Lite<br>(resolved/300) |
+| Attempt | Agent |Number of<br>plausible<br>solutions|Percent of<br>plausible<br>solutions| Number of<br>correctly<br>resolved<br>solutions | Percent of<br>correctly<br>resolved<br>solutions | Score on<br>SWE Bench<br>Lite |
 |:--------:|------------|---------:|---------:|----:|---:|--:|
 | 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% | 20.3% |
 | 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% | 3.3% |