From 18e3f55c4ed3ef0d8f781938deb8719dca50542b Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Wed, 22 May 2024 17:18:59 -0700
Subject: [PATCH] copy

---
 _posts/2024-05-22-swe-bench-lite.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index e7af45bc4..d67af98c8 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -273,12 +273,12 @@ A repo's test suite can be run in three ways:
 2. Run tests after aider has modified the repo.
 So the pre-existing test cases are still present, but may have been modified by aider.
 Aider may have also added new tests.
-3. Run the final "acceptance tests" to judge if the coding agent has
+3. Run the final "acceptance tests" to judge if aider has
 successfully resolved the problem.
 SWE Bench verifies both pre-existing tests and
 a set of held out acceptance tests (from the so called `test_patch`)
 to check that the issue is properly resolved. During this final acceptance testing,
-any aider edits to tests are discard to ensure a faithful test of whether the
+any aider edits to tests are discarded to ensure a faithful test of whether the
 issue was resolved.
 
 For the benchmark, aider is configured with a test command that will run the tests