Paul Gauthier 2024-05-31 05:21:01 -07:00
parent ec26d5f783
commit ff49ff9122

@@ -414,19 +414,19 @@ displayed in the graph at the beginning of this article.
 Note, the graph was updated on 5/30/24 as follows.
 The graph now contains AutoCodeRover's pass@1 results.
-Previously it was reporting the pass@3 results, which are
+Previously it displayed the pass@3 results, which are
 not comparable
-to the pass@1 aider results being reported here.
+to the pass@1 results for aider being reported here.
 The [AutoCodeRover GitHub page](https://github.com/nus-apr/auto-code-rover)
 features the pass@3 results
 without being clearly labeled.
 The graph now contains the best OpenDevin results obtained without using
-the `hints_text` to provide hints to the agent.
+the SWE Bench `hints_text` to provide hints to the agent.
 The previous graph contained their hinted result,
 which is not comparable
 to the unhinted aider results being reported here.
-OpenDevin's [hinted result was reported](https://x.com/gneubig/status/1791498953709752405)
+[OpenDevin reported hinted results](https://x.com/gneubig/status/1791498953709752405)
 without noting that hints were used.
 The [official SWE Bench Lite leaderboard](https://www.swebench.com)