Succeeded at a tricky task in the grep-ast codebase:
- checkout ed714ffe58734 / tricky-search-and-replace-state
- "read and parse .gitignore once, not each time we recurse `enumerate_files`"
- the model had a lot of trouble creating a head/updated block that matched the actual source code
- the new search/replace block does much better
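The refactor named in the task (parse .gitignore once, then reuse it while `enumerate_files` recurses) can be sketched roughly as below. This is a hypothetical illustration, not grep-ast's actual code: `load_ignore_patterns` and `is_ignored` are made-up helpers, and matching uses stdlib `fnmatch` on bare names, a deliberate simplification of real gitignore semantics (no `!` negation, anchoring, or directory-only rules).

```python
import fnmatch
from pathlib import Path


def load_ignore_patterns(root):
    """Hypothetical helper: read and parse .gitignore exactly ONCE.

    Simplified parsing: skips blank lines and comments, strips a trailing "/".
    Negation, anchoring and directory-only rules are NOT handled.
    """
    gitignore = Path(root) / ".gitignore"
    if not gitignore.is_file():
        return []
    patterns = []
    for line in gitignore.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(path, patterns):
    # Simplification: match only the bare file/dir name against each pattern.
    return any(fnmatch.fnmatch(path.name, pat) for pat in patterns)


def enumerate_files(path, patterns):
    """Recursively yield file paths, reusing the already-parsed patterns."""
    path = Path(path)
    if path.is_file():
        yield str(path)
        return
    for child in sorted(path.iterdir()):
        # Skip dotfiles/dot-dirs and anything matched by the ignore patterns.
        if child.name.startswith(".") or is_ignored(child, patterns):
            continue
        # The parsed patterns are passed down; .gitignore is never re-read here.
        yield from enumerate_files(child, patterns)


# Usage: parse once at the top level, then recurse.
#   patterns = load_ignore_patterns(".")
#   for fname in enumerate_files(".", patterns):
#       print(fname)
```

The point of the design is that the cost of reading and parsing the ignore file is paid once per run instead of once per directory visited.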
The benchmark had the *best* try 1 result and the *lowest* num_error_outputs ever seen on gpt-4-0613.
A low num_error_outputs means the model is less likely to elide/... code in the before block (original/search).
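Why elided code in the before block is costly can be illustrated with a minimal sketch, assuming the edit is applied by exact substring match. `apply_search_replace` is a hypothetical function, and aider's real matching logic may differ (for example, it may tolerate some whitespace differences); treating a failed match as the kind of failure num_error_outputs tallies is also an assumption.

```python
def apply_search_replace(source, search, replace):
    """Hypothetical: apply one search/replace edit by exact substring match.

    Raises ValueError when the search text does not occur verbatim, which is
    what happens if lines are elided (e.g. written as "...") in the search part.
    """
    if search not in source:
        raise ValueError("search block does not match the source")
    # Replace only the first occurrence, as a single targeted edit.
    return source.replace(search, replace, 1)


source = """def greet(name):
    msg = "hello " + name
    print(msg)
    return msg
"""

# Search text copied verbatim from the source: the edit applies cleanly.
ok = apply_search_replace(
    source,
    '    msg = "hello " + name\n',
    '    msg = f"hello {name}"\n',
)

# Search text with an elided middle: no verbatim match, so the edit fails.
try:
    apply_search_replace(source, "def greet(name):\n    ...\n    return msg\n", "")
except ValueError as err:
    print("edit failed:", err)
```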
──────────── tmp.benchmarks/2023-10-25-22-03-19--search-and-replace-and-think ─────────────
test-cases: 133
model: gpt-4
edit_format: diff
commit_hash: c9c2ddb
num_error_outputs: 6
num_user_asks: 0
num_exhausted_context_windows 0
test_timeouts: 2
50.4% correct after try 0
66.2% correct after try 1
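The two percentages are fractions of the 133 test cases. Back-calculating which whole-number pass counts round to them (the counts are inferred here; the report does not print them) gives a small check on the arithmetic. `counts_matching` is a hypothetical helper.

```python
TEST_CASES = 133  # from the "test-cases" line of the report


def counts_matching(pct, total):
    """All pass counts k whose rate 100*k/total formats to pct at one decimal."""
    return [k for k in range(total + 1) if f"{100 * k / total:.1f}" == f"{pct:.1f}"]


try0 = counts_matching(50.4, TEST_CASES)
try1 = counts_matching(66.2, TEST_CASES)
print(try0, try1)
```

Each percentage corresponds to exactly one count (67 and 88 of 133), so the second try fixed 21 additional test cases.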