diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index 8a553f296..05e5aee0e 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -446,8 +446,3 @@ which is not comparable to the unhinted aider results being reported here.
 [OpenDevin reported hinted results](https://x.com/gneubig/status/1791498953709752405)
 without noting that hints were used.
 
-
-The [official SWE Bench Lite leaderboard](https://www.swebench.com)
-only accepts pass@1 results that do not use `hints_text`.
-
-
diff --git a/_posts/2024-06-02-main-swe-bench.md b/_posts/2024-06-02-main-swe-bench.md
index 47af20f72..ad0b751c8 100644
--- a/_posts/2024-06-02-main-swe-bench.md
+++ b/_posts/2024-06-02-main-swe-bench.md
@@ -261,8 +261,3 @@ Table 2 of their [paper](https://arxiv.org/pdf/2404.05427v2)
 reports an `ACR-avg` result of 10.59%
 which is an average pass@1 result.
 
-The results presented here for aider are all pass@1, as
-the [official SWE Bench leaderboard](https://www.swebench.com)
-only accepts pass@1 results.
-
-