move into website/

This commit is contained in:
Paul Gauthier 2024-06-05 14:28:39 -07:00
parent 5a4d38418d
commit 56519361e2
103 changed files with 5 additions and 12 deletions

View file

@@ -0,0 +1,477 @@
- dirname: 2024-05-01-20-05-59--direct-opus-filenames-outside-fence
test_cases: 133
model: claude-3-opus-20240229
released: 2024-02-29
edit_format: diff
commit_hash: f4b1797-dirty, f4b1797
pass_rate_1: 53.4
pass_rate_2: 68.4
percent_cases_well_formed: 100.0
error_outputs: 2
num_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --opus
date: 2024-05-01
versions: 0.30.2-dev
seconds_per_case: 32.4
total_cost: 13.8395
- dirname: 2024-03-06-16-42-00--claude3-sonnet-whole
test_cases: 133
model: claude-3-sonnet-20240229
released: 2024-02-29
edit_format: whole
commit_hash: a5f8076-dirty
pass_rate_1: 43.6
pass_rate_2: 54.9
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 1
syntax_errors: 2
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 7
command: aider --sonnet
date: 2024-03-06
versions: 0.25.1-dev
seconds_per_case: 23.1
total_cost: 0.0000
- dirname: 2024-04-29-19-17-28--deepseek-coder-whole
test_cases: 132
model: deepseek-coder
released: 2024-01-25
edit_format: whole
commit_hash: c07f793-dirty
pass_rate_1: 47.0
pass_rate_2: 54.5
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 2
syntax_errors: 13
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 2
command: aider --model deepseek/deepseek-coder
date: 2024-04-29
versions: 0.30.2-dev
seconds_per_case: 26.7
total_cost: 0.0000
- dirname: 2024-05-03-20-47-24--gemini-1.5-pro-diff-fenced
test_cases: 133
model: gemini-1.5-pro-latest
edit_format: diff-fenced
commit_hash: 3a48dfb, 5d32dd7
pass_rate_1: 45.9
pass_rate_2: 57.1
percent_cases_well_formed: 87.2
error_outputs: 60
num_malformed_responses: 17
user_asks: 3
lazy_comments: 0
syntax_errors: 8
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model gemini/gemini-1.5-pro-latest
date: 2024-05-03
versions: 0.31.2-dev
seconds_per_case: 21.3
total_cost: 0.0000
- dirname: 2024-05-08-20-59-15--may-gpt-3.5-turbo-whole
test_cases: 133
model: gpt-3.5-turbo-0125
released: 2024-01-25
edit_format: whole
commit_hash: 1d55f74
pass_rate_1: 41.4
pass_rate_2: 50.4
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 3
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 4
command: aider -3
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 6.5
total_cost: 0.5032
- dirname: 2023-11-06-21-23-59--gpt-3.5-turbo-0301
test_cases: 133
model: gpt-3.5-turbo-0301
released: 2023-03-01
edit_format: whole
commit_hash: 44388db-dirty
pass_rate_1: 50.4
pass_rate_2: 57.9
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 8
command: aider --model gpt-3.5-turbo-0301
date: 2023-11-06
versions: 0.16.4-dev
seconds_per_case: 6.5
total_cost: 0.4822
- dirname: 2023-11-07-02-41-07--gpt-3.5-turbo-0613
test_cases: 133
model: gpt-3.5-turbo-0613
released: 2023-06-13
edit_format: whole
commit_hash: 93aa497-dirty
pass_rate_1: 38.3
pass_rate_2: 50.4
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 5
command: aider --model gpt-3.5-turbo-0613
date: 2023-11-07
versions: 0.16.4-dev
seconds_per_case: 18.0
total_cost: 0.5366
- dirname: 2024-04-30-21-40-51--litellm-gpt-3.5-turbo-1106-again
test_cases: 132
model: gpt-3.5-turbo-1106
edit_format: whole
commit_hash: 7b14d77
pass_rate_1: 45.5
pass_rate_2: 56.1
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 19
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-3.5-turbo-1106
date: 2024-04-30
versions: 0.30.2-dev
seconds_per_case: 5.3
total_cost: 0.3261
- dirname: 2024-01-25-23-37-15--jan-exercism-gpt-4-0125-preview-udiff
test_cases: 133
model: gpt-4-0125-preview
released: 2024-01-25
edit_format: udiff
commit_hash: edcf9b1
pass_rate_1: 55.6
pass_rate_2: 66.2
percent_cases_well_formed: 97.7
error_outputs: 6
num_malformed_responses: 3
user_asks: 0
lazy_comments: 0
syntax_errors: 3
indentation_errors: 7
exhausted_context_windows: 0
test_timeouts: 4
command: aider --model gpt-4-0125-preview
date: 2024-01-25
versions: 0.22.1-dev
seconds_per_case: 44.8
total_cost: 14.6428
- dirname: 2024-05-04-15-07-30--redo-gpt-4-0314-diff-reminder-rules
test_cases: 133
model: gpt-4-0314
released: 2023-03-14
edit_format: diff
commit_hash: 0d43468
pass_rate_1: 50.4
pass_rate_2: 66.2
percent_cases_well_formed: 93.2
error_outputs: 28
num_malformed_responses: 9
user_asks: 1
lazy_comments: 3
syntax_errors: 9
indentation_errors: 7
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model gpt-4-0314
date: 2024-05-04
versions: 0.31.2-dev
seconds_per_case: 19.8
total_cost: 16.2689
- dirname: 2023-12-16-21-24-28--editblock-gpt-4-0613-actual-main
test_cases: 133
model: gpt-4-0613
released: 2023-06-13
edit_format: diff
commit_hash: 3aa17c4
pass_rate_1: 46.6
pass_rate_2: 67.7
percent_cases_well_formed: 100.0
error_outputs: 14
num_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 2
command: aider -4
date: 2023-12-16
versions: 0.18.2-dev
seconds_per_case: 33.6
total_cost: 17.4657
- dirname: 2024-05-08-21-16-03--may-gpt-4-1106-preview-udiff
test_cases: 133
model: gpt-4-1106-preview
released: 2023-11-06
edit_format: udiff
commit_hash: 87664dc
pass_rate_1: 51.9
pass_rate_2: 65.4
percent_cases_well_formed: 92.5
error_outputs: 30
num_malformed_responses: 10
user_asks: 0
lazy_comments: 3
syntax_errors: 11
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model gpt-4-1106-preview
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 20.4
total_cost: 6.6061
- dirname: 2024-05-01-02-09-20--gpt-4-turbo-examples
test_cases: 133
model: gpt-4-turbo-2024-04-09 (udiff)
released: 2024-04-09
edit_format: udiff
commit_hash: e610e5b-dirty
pass_rate_1: 48.1
pass_rate_2: 63.9
percent_cases_well_formed: 97.0
error_outputs: 12
num_malformed_responses: 4
user_asks: 0
lazy_comments: 0
syntax_errors: 4
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 3
command: aider --gpt-4-turbo
date: 2024-05-01
versions: 0.30.2-dev
seconds_per_case: 22.8
total_cost: 6.3337
- dirname: 2024-05-03-22-24-48--openrouter--llama3-diff-examples-sys-msg
test_cases: 132
model: llama3-70b-8192
released: 2024-04-18
edit_format: diff
commit_hash: b5bb453
pass_rate_1: 38.6
pass_rate_2: 49.2
percent_cases_well_formed: 73.5
error_outputs: 105
num_malformed_responses: 35
user_asks: 0
lazy_comments: 0
syntax_errors: 1
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model groq/llama3-70b-8192
date: 2024-05-03
versions: 0.31.2-dev
seconds_per_case: 14.5
total_cost: 0.4311
- dirname: 2024-05-06-18-31-08--command-r-plus-whole-final
test_cases: 133
model: command-r-plus
released: 2024-04-04
edit_format: whole
commit_hash: fc3a43e-dirty
pass_rate_1: 21.8
pass_rate_2: 31.6
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 1
syntax_errors: 5
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 7
command: aider --model command-r-plus
date: 2024-05-06
versions: 0.31.2-dev
seconds_per_case: 22.9
total_cost: 2.7494
- dirname: 2024-05-07-12-55-06--deepseek-chat-v2-whole
test_cases: 133
model: deepseek-chat v2 (whole)
edit_format: whole
commit_hash: b1cae73, db994fb
pass_rate_1: 50.4
pass_rate_2: 60.2
percent_cases_well_formed: 100.0
error_outputs: 3
num_malformed_responses: 0
user_asks: 3
lazy_comments: 13
syntax_errors: 0
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model deepseek/deepseek-chat --edit-format whole
date: 2024-05-07
versions: 0.31.2-dev
seconds_per_case: 42.4
total_cost: 0.0000
- dirname: 2024-05-09-18-57-52--deepseek-chat-v2-diff-reverted-and-helpful-assistant2
test_cases: 133
model: deepseek-chat v2 (diff)
released: 2024-05-06
edit_format: diff
commit_hash: 80a3f6d
pass_rate_1: 44.4
pass_rate_2: 60.9
percent_cases_well_formed: 97.0
error_outputs: 14
num_malformed_responses: 4
user_asks: 2
lazy_comments: 0
syntax_errors: 13
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model deepseek/deepseek-chat
date: 2024-05-09
versions: 0.33.1-dev
seconds_per_case: 86.8
total_cost: 0.0941
- dirname: 2024-05-07-20-32-37--qwen1.5-110b-chat-whole
test_cases: 133
model: qwen1.5-110b-chat
released: 2024-02-04
edit_format: whole
commit_hash: 70b1c0c
pass_rate_1: 30.8
pass_rate_2: 37.6
percent_cases_well_formed: 100.0
error_outputs: 3
num_malformed_responses: 0
user_asks: 3
lazy_comments: 20
syntax_errors: 0
indentation_errors: 6
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model together_ai/qwen/qwen1.5-110b-chat
date: 2024-05-07
versions: 0.31.2-dev
seconds_per_case: 46.9
total_cost: 0.0000
- dirname: 2024-05-07-20-57-04--wizardlm-2-8x22b-whole
test_cases: 133
model: WizardLM-2 8x22B
edit_format: whole
commit_hash: 8e272bf, bbe8639
pass_rate_1: 27.8
pass_rate_2: 44.4
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 1
syntax_errors: 2
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model openrouter/microsoft/wizardlm-2-8x22b
date: 2024-05-07
versions: 0.31.2-dev
seconds_per_case: 36.6
total_cost: 0.0000
- dirname: 2024-05-13-17-39-05--gpt-4o-diff
test_cases: 133
model: gpt-4o
released: 2024-05-13
edit_format: diff
commit_hash: b6cd852
pass_rate_1: 60.2
pass_rate_2: 72.9
percent_cases_well_formed: 96.2
error_outputs: 103
num_malformed_responses: 5
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider
date: 2024-05-13
versions: 0.34.1-dev
seconds_per_case: 6.0
total_cost: 0.0000
- dirname: 2024-04-12-22-18-20--gpt-4-turbo-2024-04-09-plain-diff
test_cases: 33
model: gpt-4-turbo-2024-04-09 (diff)
edit_format: diff
commit_hash: 9b2e697-dirty
pass_rate_1: 48.5
pass_rate_2: 57.6
percent_cases_well_formed: 100.0
error_outputs: 15
num_malformed_responses: 0
user_asks: 15
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-turbo-2024-04-09
date: 2024-04-12
versions: 0.28.1-dev
seconds_per_case: 17.6
total_cost: 1.6205

View file

@@ -0,0 +1,146 @@
- dirname: 2024-05-04-23-27-02--refac-gemini
test_cases: 89
model: gemini/gemini-1.5-pro-latest
edit_format: diff-fenced
commit_hash: a0649ba-dirty, 425cb29, 1b35ca2-dirty, 3e4fca2-dirty
pass_rate_1: 49.4
percent_cases_well_formed: 7.9
error_outputs: 247
num_malformed_responses: 82
user_asks: 0
lazy_comments: 4
syntax_errors: 0
indentation_errors: 8
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gemini/gemini-1.5-pro-latest
date: 2024-05-04
versions: 0.31.2-dev
seconds_per_case: 55.7
total_cost: 0.0000
- dirname: 2024-05-04-17-45-53--refac-opus
test_cases: 83
model: claude-3-opus-20240229
edit_format: diff
commit_hash: b02320b-dirty
pass_rate_1: 72.3
percent_cases_well_formed: 79.5
error_outputs: 51
num_malformed_responses: 17
user_asks: 0
lazy_comments: 2
syntax_errors: 1
indentation_errors: 3
exhausted_context_windows: 0
test_timeouts: 0
command: aider --opus
date: 2024-05-04
versions: 0.31.2-dev
seconds_per_case: 67.8
total_cost: 27.9176
- dirname: 2024-04-09-21-49-54--refac-gpt-4-turbo-2024-04-09
test_cases: 88
model: gpt-4-turbo-2024-04-09 (udiff)
edit_format: udiff
commit_hash: b75fdb9
pass_rate_1: 34.1
percent_cases_well_formed: 30.7
error_outputs: 183
num_malformed_responses: 61
user_asks: 0
lazy_comments: 1
syntax_errors: 3
indentation_errors: 15
exhausted_context_windows: 0
test_timeouts: 0
command: aider --gpt-4-turbo
date: 2024-04-09
versions: 0.27.1-dev
seconds_per_case: 42.4
total_cost: 19.6556
- dirname: 2024-05-08-22-25-41--may-refac-gpt-4-0125-preview-ex-sys
test_cases: 89
model: gpt-4-0125-preview
edit_format: udiff
commit_hash: bf09bd3-dirty
pass_rate_1: 33.7
percent_cases_well_formed: 47.2
error_outputs: 142
num_malformed_responses: 47
user_asks: 0
lazy_comments: 1
syntax_errors: 2
indentation_errors: 16
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-0125-preview
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 56.6
total_cost: 20.3270
- dirname: 2024-05-08-21-24-16--may-refac-gpt-4-1106-preview
test_cases: 89
model: gpt-4-1106-preview
edit_format: udiff
commit_hash: eaa2514-dirty
pass_rate_1: 50.6
percent_cases_well_formed: 39.3
error_outputs: 164
num_malformed_responses: 54
user_asks: 1
lazy_comments: 17
syntax_errors: 0
indentation_errors: 8
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-1106-preview
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 61.8
total_cost: 18.3844
- dirname: 2024-05-13-17-42-22--refac-gpt-4o-diff
test_cases: 89
model: gpt-4o
edit_format: diff
commit_hash: b6cd852
pass_rate_1: 62.9
percent_cases_well_formed: 53.9
error_outputs: 9025
num_malformed_responses: 41
user_asks: 0
lazy_comments: 2
syntax_errors: 0
indentation_errors: 5
exhausted_context_windows: 0
test_timeouts: 0
command: aider
date: 2024-05-13
versions: 0.34.1-dev
seconds_per_case: 27.8
total_cost: 0.0000
- dirname: 2024-04-10-13-26-18--refac-gpt-4-turbo-2024-04-09-diff
test_cases: 88
model: gpt-4-turbo-2024-04-09 (diff)
edit_format: diff
commit_hash: 7875418
pass_rate_1: 21.4
percent_cases_well_formed: 6.8
error_outputs: 247
num_malformed_responses: 82
user_asks: 1
lazy_comments: 2
syntax_errors: 3
indentation_errors: 8
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-turbo-2024-04-09
date: 2024-04-10
versions: 0.28.1-dev
seconds_per_case: 67.8
total_cost: 20.4889

View file

@@ -0,0 +1,20 @@
{% if page.highlight_image %}
<meta property="og:image" content="{{ site.url }}{{ page.highlight_image }}">
<meta property="twitter:image" content="{{ site.url }}{{ page.highlight_image }}">
{% else %}
<meta property="og:image" content="{{ site.url }}/assets/aider.jpg">
<meta property="twitter:image" content="{{ site.url }}/assets/aider-square.jpg">
{% endif %}
<link rel="preconnect" href="https://fonts.gstatic.com">
<link rel="preload" href="https://fonts.googleapis.com/css?family=Open+Sans:400,700&display=swap" as="style" type="text/css" crossorigin>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#157878">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<link rel="icon" type="image/png" sizes="32x32" href="{{ '/assets/icons/favicon-32x32.png' | relative_url }}">
<link rel="icon" type="image/png" sizes="16x16" href="{{ '/assets/icons/favicon-16x16.png' | relative_url }}">
<link rel="apple-touch-icon" sizes="180x180" href="{{ '/assets/icons/apple-touch-icon.png' | relative_url }}">
<link rel="manifest" href="{{ '/assets/icons/site.webmanifest' | relative_url }}">
<link rel="mask-icon" href="{{ '/assets/icons/safari-pinned-tab.svg' | relative_url }}" color="#5bbad5">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="theme-color" content="#ffffff">
<link rel="stylesheet" href="{{ '/assets/css/style.css?v=' | append: site.github.build_revision | relative_url }}">

View file

@@ -0,0 +1,10 @@
<!DOCTYPE html>
<html lang="en-US">
<meta charset="utf-8">
<title>Redirecting&hellip;</title>
<link rel="canonical" href="{{ page.redirect.to }}">
<script>location="{{ page.redirect.to }}"</script>
<meta http-equiv="refresh" content="0; url={{ page.redirect.to }}">
<h1>Redirecting&hellip;</h1>
<a href="{{ page.redirect.to }}">Click here if you are not redirected.</a>
</html>

View file

@@ -0,0 +1 @@
../docs/ctags.md

View file

@@ -0,0 +1 @@
../docs/benchmarks.md

View file

@@ -0,0 +1 @@
../docs/repomap.md

View file

@@ -0,0 +1 @@
../docs/benchmarks-1106.md

View file

@@ -0,0 +1 @@
../docs/benchmarks-speed-1106.md

View file

@@ -0,0 +1 @@
../docs/unified-diffs.md

View file

@@ -0,0 +1 @@
../docs/benchmarks-0125.md

View file

@@ -0,0 +1,89 @@
---
title: Claude 3 beats GPT-4 on Aider's code editing benchmark
excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
highlight_image: /assets/2024-03-07-claude-3.jpg
nav_exclude: true
---
# Claude 3 beats GPT-4 on Aider's code editing benchmark
[![benchmark results](/assets/2024-03-07-claude-3.svg)](https://aider.chat/assets/2024-03-07-claude-3.svg)
[Anthropic just released their new Claude 3 models](https://www.anthropic.com/news/claude-3-family)
with evals showing better performance on coding tasks.
With that in mind, I've been benchmarking the new models
using Aider's code editing benchmark suite.
Claude 3 Opus outperforms all of OpenAI's models,
making it the best available model for pair programming with AI.
To use Claude 3 Opus with aider:
```
pip install aider-chat
export ANTHROPIC_API_KEY=sk-...
aider --opus
```
## Aider's code editing benchmark
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you
pair program with AI on code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try and complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise,
Exercism provides a starting python file with stubs for the needed functions,
a natural language description of the problem to solve
and a test suite to evaluate whether the coder has correctly solved the problem.
The LLM gets two tries to solve each problem:
1. On the first try, it gets the initial stub code and the English description of the coding task. If the tests all pass, we are done.
2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task, as sketched below.
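The two-try flow above can be sketched as a small loop. This is a hypothetical illustration, not the actual benchmark harness; `solve` and `run_tests` are stand-in helpers:
```python
# Hypothetical sketch of the two-try protocol; `solve` and `run_tests`
# are stand-ins, not the real benchmark code.
def attempt_exercise(solve, run_tests, exercise):
    # First try: the stub code plus the natural language instructions.
    code = solve(exercise.stub_code, exercise.instructions)
    result = run_tests(exercise, code)
    if result.all_passed:
        return True

    # Second try: show the failing test output and ask for a fix.
    code = solve(code, result.failing_output)
    return run_tests(exercise, code).all_passed
```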
## Benchmark results
### Claude 3 Opus
- The new `claude-3-opus-20240229` model got the highest score ever on this benchmark, completing 68.4% of the tasks with two tries.
- Its single-try performance was comparable to the latest GPT-4 Turbo model `gpt-4-0125-preview`, at 54.1%.
- While Opus got the highest score, it was only a few points higher than the GPT-4 Turbo results. Given the extra costs of Opus and the slower response times, it remains to be seen which is the most practical model for daily coding use.
### Claude 3 Sonnet
- The new `claude-3-sonnet-20240229` model performed similarly to OpenAI's GPT-3.5 Turbo models with an overall score of 54.9% and a first-try score of 43.6%.
## Code editing
It's highly desirable to have the LLM send back code edits as
some form of diffs, rather than having it send back an updated copy of the
entire source code.
Weaker models like GPT-3.5 are unable to use diffs, and are stuck sending back
updated copies of entire source files.
Aider uses more efficient
[search/replace blocks](https://aider.chat/2023/07/02/benchmarks.html#diff)
with the original GPT-4
and
[unified diffs](https://aider.chat/2023/12/21/unified-diffs.html#unified-diff-editing-format)
with the newer GPT-4 Turbo models.
Claude 3 Opus works best with the search/replace blocks, allowing it to send back
code changes efficiently.
Unfortunately, the Sonnet model was only able to work reliably with whole files,
which limits it to editing smaller source files and uses more tokens, money and time.
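For illustration, a search/replace block looks roughly like this (a hypothetical file and edit, shown in the same format aider uses in its chat transcripts):
greeting.py
```python
<<<<<<< SEARCH
def greeting():
    print("hi")
=======
def greeting():
    print("hello world")
>>>>>>> REPLACE
```
Only the changed region is sent back, which is what makes this format cheaper and faster than resending the whole file.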
## Other observations
There are a few other things worth noting:
- Claude 3 Opus and Sonnet are both slower and more expensive than OpenAI's models. You can get almost the same coding skill faster and cheaper with OpenAI's models.
- Claude 3 has a 2X larger context window than the latest GPT-4 Turbo, which may be an advantage when working with larger code bases.
- The Claude models refused to perform a number of coding tasks and returned the error "Output blocked by content filtering policy". They refused to code up the [beer song](https://exercism.org/tracks/python/exercises/beer-song) program, which makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
- The Claude APIs seem somewhat unstable, returning HTTP 5xx errors of various sorts. Aider automatically recovers from these errors with exponential backoff retries, but it's a sign that Anthropic may be struggling under surging demand.

View file

@@ -0,0 +1,70 @@
---
title: GPT-4 Turbo with Vision is a step backwards for coding
excerpt: OpenAI's GPT-4 Turbo with Vision model scores worse on aider's code editing benchmarks than all the previous GPT-4 models. In particular, it seems much more prone to "lazy coding" than the existing GPT-4 Turbo "preview" models.
highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.jpg
nav_exclude: true
---
# GPT-4 Turbo with Vision is a step backwards for coding
[OpenAI just released GPT-4 Turbo with Vision](https://twitter.com/OpenAIDevs/status/1777769463258988634)
and it performs worse on aider's coding benchmark suites than all the previous GPT-4 models.
In particular, it seems much more prone to "lazy coding" than the
existing GPT-4 Turbo "preview" models.
## Code editing skill
[![benchmark results](/assets/2024-04-09-gpt-4-turbo.svg)](https://aider.chat/assets/2024-04-09-gpt-4-turbo.svg)
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html#the-benchmark)
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try and complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise, the LLM gets two tries to solve each problem:
1. On the first try, it gets initial stub code and the English description of the coding task. If the tests all pass, we are done.
2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
**GPT-4 Turbo with Vision
scores only 62% on this benchmark,
the lowest score of any of the existing GPT-4 models.**
The other models scored 63-66%, so this represents only a small
regression, and is likely statistically insignificant when compared
against `gpt-4-0613`.
## Lazy coding
[![benchmark results](/assets/2024-04-09-gpt-4-turbo-laziness.svg)](https://aider.chat/assets/2024-04-09-gpt-4-turbo-laziness.svg)
The GPT-4 Turbo "preview" models have been widely criticized for being "lazy"
when coding.
They often omit needed code
and instead leave comments with homework assignments like "implement method here".
```
def some_complex_method(foo, bar):
    # ... implement method here ...
```
Aider uses a ["laziness" benchmark suite](https://github.com/paul-gauthier/refactor-benchmark)
which is designed to both provoke and quantify lazy coding.
It consists of
89 python refactoring tasks
which tend to make GPT-4 Turbo code in that lazy manner.
**The new GPT-4 Turbo with Vision model scores only 34% on aider's
refactoring benchmark, making it the laziest coder of all the GPT-4 Turbo models
by a significant margin.**
## Conclusions
Aider has full support for the new GPT-4 Turbo with Vision
model, which you can access using the switch `--model gpt-4-turbo-2024-04-09`.
But aider will continue to use `gpt-4-1106-preview` by default,
as it is by far the strongest coder of the GPT-4 models.

View file

@@ -0,0 +1,52 @@
---
title: Aider in your browser
excerpt: Aider has an experimental browser UI, allowing you to collaborate with LLMs on code in your local git repo.
highlight_image: /assets/browser.jpg
nav_order: 800
---
# Aider in your browser
<div class="video-container">
<video controls loop poster="/assets/browser.jpg">
<source src="/assets/aider-browser-social.mp4" type="video/mp4">
<a href="/assets/aider-browser-social.mp4">Aider browser UI demo video</a>
</video>
</div>
<style>
.video-container {
position: relative;
padding-bottom: 101.89%; /* 1080 / 1060 = 1.0189 */
height: 0;
overflow: hidden;
}
.video-container video {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
</style>
Use aider's new experimental browser UI to collaborate with LLMs
to edit code in your local git repo.
Aider will directly edit the code in your local source files,
and [git commit the changes](https://aider.chat/docs/faq.html#how-does-aider-use-git)
with sensible commit messages.
You can start a new project or work with an existing git repo.
Aider works well with GPT-3.5, GPT-4, GPT-4 Turbo with Vision,
and Claude 3 Opus.
It also supports [connecting to almost any LLM](https://aider.chat/docs/llms.html).
Use the `--browser` switch to launch the browser version of aider:
```
pip install aider-chat
export OPENAI_API_KEY=<key> # Mac/Linux
setx OPENAI_API_KEY <key> # Windows
aider --browser
```

View file

@@ -0,0 +1,324 @@
---
title: Drawing graphs with aider, GPT-4o and matplotlib
excerpt: Use GPT-4o to draw graphs with matplotlib, including adjusting styles and making visual changes. You get the graph, but you also get the code in your repo.
highlight_image: /assets/models-over-time.png
nav_exclude: true
---
![LLM coding skill over time](/assets/models-over-time.svg)
# Drawing graphs with aider, GPT-4o and matplotlib
I recently wanted to draw a graph showing how LLM code editing skill has been
changing over time as new models have been released by OpenAI, Anthropic and others.
I have all the
[data in a yaml file](https://github.com/paul-gauthier/aider/blob/main/_data/edit_leaderboard.yml) that is used to render
[aider's LLM leaderboards](https://aider.chat/docs/leaderboards/).
Below is the aider chat transcript, which shows:
- I launch aider with the yaml file, a file with other plots I've done recently (so GPT can crib the style) and an empty file called `over_time.py`.
- Then I ask GPT to draw the scatterplot I want.
- I run the resulting script and share the error output with GPT so it can fix a small bug.
- I ask it to color the points for GPT-4 and GPT-3.5 family models differently, to better see trends within those model families.
- I add a PNG of the graph to the chat, so GPT can see the current graph and understand why I want it to move the point labels so they don't overlap the points.
- I work through a series of other small style changes, like changing fonts and the graph border.
In the end I have the graph, but I also have the python code in my repo.
So I can update this graph easily whenever I add new entries to the yaml data file.
## Aider chat transcript
<div class="chat-transcript" markdown="1">
> Aider v0.35.1-dev
> Models: gpt-4o with diff edit format, weak model gpt-3.5-turbo
> Added _data/edit_leaderboard.yml to the chat.
> Added benchmark/over_time.py to the chat.
> Added benchmark/plots.py to the chat.
#### in over_time.py, draw a graph for each of the yaml entries with a `released` date. plot date on x, `pass_rate_2` on y and label with the `model` name. do a scatter plot. use colors/styles similar to the other graphs.
To draw a scatter plot for each of the YAML entries with a `released` date, we need to:
1. Read the YAML file.
2. Extract the `released` date and `pass_rate_2` for each entry.
3. Plot the data using matplotlib.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
=======
import matplotlib.pyplot as plt
import yaml
from datetime import datetime
def plot_over_time(yaml_file):
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    dates = []
    pass_rates = []
    models = []

    for entry in data:
        if 'released' in entry and 'pass_rate_2' in entry:
            dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
            pass_rates.append(entry['pass_rate_2'])
            models.append(entry['model'])

    plt.figure(figsize=(10, 6))
    plt.scatter(dates, pass_rates, c='blue', alpha=0.5)

    for i, model in enumerate(models):
        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    plt.xlabel('Release Date')
    plt.ylabel('Pass Rate 2')
    plt.title('Model Performance Over Time')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Example usage
plot_over_time('_data/edit_leaderboard.yml')
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 49ac9ca aider: Added scatter plot functionality for YAML entries with a `released` date and `pass_rate_2`.
</div>
<div class="chat-transcript" markdown="1">
#### /run python benchmark/over_time.py
> Traceback (most recent call last):
File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
plot_over_time('_data/edit_leaderboard.yml')
File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 15, in plot_over_time
dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: strptime() argument 1 must be str, not datetime.date
> Add the output to the chat? y
> Traceback (most recent call last):
> File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
> plot_over_time('_data/edit_leaderboard.yml')
> File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 15, in plot_over_time
> dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> TypeError: strptime() argument 1 must be str, not datetime.date
The error indicates that the `released` field in the YAML file is already a `datetime.date` object, so we don't need to parse it with `datetime.strptime`. We can directly use it in the plot.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
=======
dates.append(entry['released'])
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit af2f4b7 aider: Use the `released` field directly in the plot instead of parsing it with `datetime.strptime`.
</div>
<div class="chat-transcript" markdown="1">
#### color the "gpt-4" models differently
To color the "gpt-4" models differently, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-4" in the model name and assign a different color.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
plt.scatter(dates, pass_rates, c='blue', alpha=0.5)
=======
colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
>>>>>>> REPLACE
```
This change will color the points red if the model name contains "gpt-4", otherwise, it will color them blue.
> Applied edit to benchmark/over_time.py
> Commit 0d11357 aider: Color the "gpt-4" models differently.
</div>
<div class="chat-transcript" markdown="1">
#### color the gpt-3.5 models a new color
To color the "gpt-3.5" models a new color, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-3.5" in the model name and assign a different color.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
=======
colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 85d9d84 aider: Color the "gpt-3.5" models green in the plotting code.
</div>
<div class="chat-transcript" markdown="1">
#### use the colorscheme and fonts from the other graphs in plots.py
To use the colorscheme and fonts from the graphs in `plots.py`, we need to:
1. Import the necessary settings from `plots.py`.
2. Apply these settings to the graph in `over_time.py`.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
import yaml
from datetime import datetime
=======
import yaml
from datetime import datetime
from matplotlib import rc
>>>>>>> REPLACE
```
benchmark/over_time.py
```python
<<<<<<< SEARCH
plt.figure(figsize=(10, 6))
=======
plt.rcParams["hatch.linewidth"] = 0.5
plt.rcParams["hatch.color"] = "#444444"
rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
fig, ax = plt.subplots(figsize=(6, 4))
ax.grid(axis="y", zorder=0, lw=0.2)
>>>>>>> REPLACE
```
benchmark/over_time.py
```python
<<<<<<< SEARCH
plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
=======
ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
>>>>>>> REPLACE
```
benchmark/over_time.py
```python
<<<<<<< SEARCH
        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    plt.xlabel('Release Date')
    plt.ylabel('Pass Rate 2')
    plt.title('Model Performance Over Time')
    plt.grid(True)
    plt.tight_layout()
    plt.show()
=======
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    ax.set_xlabel('Release Date')
    ax.set_ylabel('Pass Rate 2')
    ax.set_title('Model Performance Over Time')
    plt.tight_layout()
    plt.savefig("tmp_over_time.svg")
    imgcat(fig)
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 8d670f1 aider: Used the colorscheme and fonts from the graphs in `plots.py` in `over_time.py`.
</div>
<div class="chat-transcript" markdown="1">
#### /add tmp_over_time.png
> Added /Users/gauthier/Projects/aider/tmp_over_time.png to the chat
#### make the labels not overlap the dots
To make the labels not overlap the dots, we can adjust the position of the labels slightly using the `xytext` parameter in the `annotate` function.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
=======
ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75,
            xytext=(5, 5), textcoords='offset points')
>>>>>>> REPLACE
```
> Committing benchmark/over_time.py before applying edits.
> Commit 49eb72c Updated image file format from SVG to PNG and added imgcat library for displaying the plot.
> Applied edit to benchmark/over_time.py
> Commit 75fa061 aider: Adjust the position of labels to prevent overlap with dots.
</div>
<div class="chat-transcript" markdown="1">
#### make the dots bigger
To make the dots bigger, we can adjust the `s` parameter in the `scatter` function, which controls the size of the dots.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
=======
ax.scatter(dates, pass_rates, c=colors, alpha=0.5, s=100)
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit a018fdf aider: Make the dots bigger.
</div>
<div class="chat-transcript" markdown="1">
#### make the border of the graph thinner and lighter
To make the border of the graph thinner and lighter, we can adjust the `spines` properties of the axes.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
=======
for spine in ax.spines.values():
    spine.set_edgecolor('#DDDDDD')
    spine.set_linewidth(0.5)
colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 77ba518 aider: Made the border of the graph thinner and lighter by adjusting the `spines` properties of the axes.
</div>

View file

@@ -0,0 +1,11 @@
---
title: A draft post.
excerpt: With a draft summary.
highlight_image: /assets/linting.jpg
draft: true
nav_exclude: true
---
# A draft post
Content TBD.

View file

@@ -0,0 +1,146 @@
---
title: Linting code for LLMs with tree-sitter
excerpt: Aider now lints code after every LLM edit and automatically fixes errors, using tree-sitter and AST-aware code context.
highlight_image: /assets/linting.jpg
nav_exclude: true
---
[![Linting code](/assets/linting.jpg)](https://aider.chat/assets/linting.jpg)
# Linting code for LLMs with tree-sitter
Aider now lints your code after every LLM edit, and offers to automatically fix
any linting errors.
You can also use aider's lint-and-fix functionality on your source files any time
you like, to speedily resolve issues with code written by humans.
Aider shows linting errors to the LLM in a novel format,
using tree-sitter
to help display relevant code context for each
error.
This increases the ability of the LLM to understand the problem and
make the correct changes to resolve it.
Aider ships with basic linters built with tree-sitter that support
[most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
These built-in linters will detect syntax errors and other fatal problems with the code.
You can also configure aider to use your preferred linters.
This allows aider to check for a larger class of problems, keep the code style
aligned with the rest of your team, etc.
## Linting and fixing your code
Aider now lints each source file after it applies the edits
suggested by an LLM.
If problems are found, aider will ask if you'd like it to
attempt to fix the errors.
If so, aider will send the LLM a report of the lint errors
and request changes to fix them. This process may iterate a few times
as the LLM works to fully resolve all the issues.
You can also lint and fix files any time, on demand from within the aider chat or via the
command line:
- The in-chat `/lint` command will lint and fix all the files which have
been added to the chat by default. Or you can name any files
in your git repo as arguments.
- From the command line, you can run `aider --lint` to lint and fix
all the dirty files in the repo.
Or you can specify specific filenames on the command line.
## An LLM-friendly lint report
Most linting tools produce terse and cryptic output,
which is one reason many engineers appreciate IDEs that highlight
linting errors.
LLMs don't have the luxury of using an IDE, so aider sends
the linting errors in an LLM-friendly format.
Here's an example of raw output of the `flake8` python linter:
```
app.py:23:36: F821 undefined name 'num'
app.py:41:16: F541 f-string is missing placeholders
```
This sort of output depends on the user to reference line numbers to find and fix
each reported error.
LLMs are quite bad at working with source code line numbers, often
making off-by-one errors and other mistakes even when provided with
a fully numbered code listing.
Aider augments the raw linter output by displaying and
highlighting the lines that have errors within their
containing functions, methods, and classes.
To do this, aider uses tree-sitter to obtain the code's AST and analyzes it
in light of the linting errors.
LLMs are more effective at editing code that's provided
with context like this.
```
app.py:23:36: F821 undefined name 'num'
app.py:41:16: F541 f-string is missing placeholders
app.py:
...⋮...
 6│class LongNum:
 7│    def __init__(self, num):
 8│        """
 9│        Initialize the number.
10│        """
...⋮...
19│    def __str__(self):
20│        """
21│        Render the number as a string.
22│        """
23█        return str(num)
24│
25│
26│@app.route('/subtract/<int:x>/<int:y>')
...⋮...
38│@app.route('/divide/<int:x>/<int:y>')
39│def divide(x, y):
40│    if y == 0:
41█        return f"Error: Cannot divide by zero"
42│    else:
43│        result = x / y
44│        return str(result)
45│
...⋮...
```
## Basic linters for most popular languages
Aider comes batteries-included with built-in linters for
[most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
This provides wide support for linting without requiring
users to manually install a linter and configure it to work with aider.
Aider's built-in language-agnostic linter uses tree-sitter to parse
the AST of each file.
When tree-sitter encounters a syntax error or other fatal issue
parsing a source file, it inserts an AST node with type `ERROR`.
Aider simply uses these `ERROR` nodes to identify all the lines
with syntax or other types of fatal error, and displays
them in the LLM-friendly format described above.
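As a rough sketch of this idea (not aider's actual linter), the `ERROR` and missing nodes can be collected by walking the tree-sitter parse tree; this assumes the `tree_sitter_languages` package for ready-made parsers:
```python
# Rough sketch, not aider's actual linter: walk the tree-sitter AST and
# collect the lines covered by ERROR or missing nodes.
from tree_sitter_languages import get_parser  # assumed helper package

def syntax_error_lines(source: str, lang: str = "python") -> list[int]:
    parser = get_parser(lang)
    tree = parser.parse(source.encode("utf-8"))

    lines = set()
    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        if node.type == "ERROR" or node.is_missing:
            lines.add(node.start_point[0] + 1)  # 1-based line numbers
        stack.extend(node.children)
    return sorted(lines)

print(syntax_error_lines("def broken(:\n    pass\n"))
```
The reported lines can then be rendered along with their containing functions and classes, as in the report shown earlier.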
## Configuring your preferred linters
You can optionally configure aider to use
your preferred linters with the `--lint-cmd` switch.
```
# To lint javascript with jslint
aider --lint-cmd javascript:jslint
# To lint python with flake8 using some specific args:
aider --lint-cmd "python:flake8 --select=E9,F821,F823..."
```
You can provide multiple `--lint-cmd` switches
to set linters for various languages.
You can also durably set linters in your `.aider.conf.yml` file.

View file

@@ -0,0 +1,451 @@
---
title: How aider scored SOTA 26.3% on SWE Bench Lite
excerpt: Aider achieved this result mainly through its existing features that focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.
highlight_image: /assets/swe_bench_lite.jpg
nav_exclude: true
---
# How aider scored SOTA 26.3% on SWE Bench Lite
[Aider scored 26.3%](https://github.com/swe-bench/experiments/pull/7)
on the
[SWE Bench Lite benchmark](https://www.swebench.com),
achieving a state-of-the-art result.
The previous top leaderboard entry was 20.3%
from Amazon Q Developer Agent.
See also [aider's SOTA result on the main SWE Bench](https://aider.chat/2024/06/02/main-swe-bench.html).
[![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
**All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.**
All results in the above chart are unhinted pass@1 results.
Please see the [references](#references)
for details on the data presented in this chart.
The chart was corrected on 5/30/24 to reflect apples-to-apples comparisons,
using pass@1 results from AutoCodeRover
and results from OpenDevin that don't use hints.
The [official SWE Bench Lite leaderboard](https://www.swebench.com)
only accepts pass@1 results that do not use hints.
## Interactive, not agentic
Aider achieved this result mainly through its existing features that focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.
Aider intentionally has quite limited and narrow "agentic behavior"
to avoid long delays, high token costs
and the need for users to repeatedly code review incorrect solutions.
It's also worth noting that aider currently does not use
RAG, vector search, tools or give the LLM access to search the web
or unilaterally execute code.
Aider is first and foremost an interactive tool for engineers to get real work done in
real code bases using a chat interface.
Aider provides a pair programming UX where users can ask for a change
and see the edits performed in real-time.
Aider can also offer additional help like fixing lint or test errors,
but the user is always in full interactive control.
This lets them quickly steer misunderstandings back on course and
avoid wasting time and token costs.
## Benchmark methodology
For the benchmark,
aider was launched in each problem's git repository
with the problem statement
submitted as the opening chat message from "the user."
After that aider runs as normal, with the following modifications:
- Aider's suggestions were always accepted without user approval.
- A simple harness was used to retry the SWE Bench problem if aider produced code that wasn't *plausibly correct*.
Plausibly correct means that aider reported that it had successfully edited the repo
without causing syntax errors or breaking any *pre-existing* tests.
- If the solution isn't plausible, the harness launches aider to try again from scratch,
alternating between using aider with GPT-4o and Opus.
- If no plausible solution is found after six tries, the harness picks the solution
with the fewest edit/lint/test problems.
It's important to be clear that
*aider and the benchmark harness
only had access to the pre-existing tests in each problem's repo*.
The held out "acceptance tests" were *only* used
after benchmarking to compute statistics on which problems aider
correctly resolved.
The [full harness to run aider on SWE Bench Lite is available on GitHub](https://github.com/paul-gauthier/aider-swe-bench).
The benchmarking process was similar to how a developer might use aider to
resolve a GitHub issue:
- They could launch aider in their repo with the command below, which
tells aider they want to accept every suggestion
and to use pytest to run tests.
- `aider --yes --test-cmd pytest`
- They could start the chat by pasting in the URL or text of a GitHub issue.
Aider will pull in the URL's content and then try and solve the issue.
- If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe using aider with a different LLM this time.
[Aider is tightly integrated with git](https://aider.chat/docs/faq.html#how-does-aider-use-git),
so it's always easy to revert AI changes that don't pan out.
Outside a benchmark setting, it's probably
unwise or at least highly inefficient
to let *any* AI agent run unsupervised on your code base.
The reason aider is intended to be used interactively
is so that the user can participate and direct aider's work and approve suggestions.
This way the user can offer immediate feedback or corrections if their initial
instructions turn out to be ambiguous,
or if the AI starts going down a wrong path.
## Aider with GPT-4o alone was SOTA
Running the benchmark harness
only using aider with GPT-4o to find plausible solutions
achieved a score of 25.0%.
This result was itself state-of-the-art, before being surpassed by the main
result being reported here
that used aider with both GPT-4o & Opus.
As noted below, a single attempt using Aider with GPT-4o tied
the current top entry on the leaderboard.
## Aider with GPT-4o & Opus
The benchmark harness alternated between running aider with GPT-4o and Opus.
The harness proceeded in a fixed order, always starting with GPT-4o and
then alternating with Opus until a plausible solution was found for each
problem.
The table below breaks down the plausible solutions that
were found for the 300 problems.
It also provides details on the 79 that were ultimately
verified as correctly resolving their issue.
Some noteworthy observations:
- *Just the first attempt* of Aider with GPT-4o resolved 20.3% of the problems, which ties the Amazon Q Developer Agent currently atop the official leaderboard.
- Including the second attempt, Aider with GPT-4o and Opus scored 23.6% on the benchmark.
These first two attempts obtained ~75% of all plausible and ~90% of all resolved solutions.
- A long tail of solutions continued to be found using both models including one correctly resolved solution on the final, sixth attempt of that problem.
| Attempt | Agent |Number&nbsp;of<br>plausible<br>solutions|Percent&nbsp;of<br>plausible<br>solutions| Number&nbsp;of<br/>correctly<br>resolved<br>solutions | Percent&nbsp;of<br>correctly<br>resolved<br>solutions | Score&nbsp;on<br>SWE&nbsp;Bench<br>Lite |
|:--------:|------------|---------:|---------:|----:|---:|--:|
| 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% | 20.3% |
| 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% | 3.3% |
| 3 | Aider with GPT-4o | 20 | 6.7% | 3 | 3.8% | 1.0% |
| 4 | Aider with Opus | 9 | 3.0% | 2 | 2.5% | 0.7% |
| 5 | Aider with GPT-4o | 11 | 3.7% | 2 | 2.5% | 0.7% |
| 6 | Aider with Opus | 3 | 1.0% | 1 | 1.3% | 0.3% |
| **Total** | | **300** | **100%** | **79** | **100%** | **26.3%** |
If we break down the solutions solely by model,
we can see that aider with GPT-4o outperforms Opus.
This isn't a fair and direct comparison, because GPT-4o always took the first
turn and therefore got first crack at all the "easiest" problems.
Aider with Opus only ever saw problems that GPT-4o failed to
find plausible solutions for on its first try.
Aider with GPT-4o was producing higher quality plausible solutions,
with a greater chance of going on to be accepted as resolving the issue.
Again, this is biased by the turn ordering.
But other anecdotal evidence from earlier runs of the benchmark
also supports the observation that aider with GPT-4o is significantly stronger than Opus
for this benchmark.
| Agent | Number&nbsp;of<br>plausible<br>solutions | Number&nbsp;of<br>correctly<br>resolved<br>solutions | Percent&nbsp;of<br>plausible<br>which<br>correctly<br>resolved<br>|
|------------|---------:|---------:|---:|
| Aider with GPT-4o | 239 | 66 |27.6% |
| Aider with Opus | 61 | 13 |21.3% |
| **Total** | **300** | **79** |**26.3%** |
## Repository map, not RAG
The crucial first step in solving a SWE Bench problem is figuring out
which parts of the repo are relevant and which files need to be edited.
Most coding agents use some combination of RAG, vector search
and providing the LLM with
tools to interactively explore the code base.
Aider instead uses a
[repository map](https://aider.chat/2023/10/22/repomap.html)
to help the LLM understand the
layout, code structure, and content of a git repo.
The repo map is created through static analysis of the code's
abstract syntax tree and call graph
to provide a compact and powerful summary of the entire code base.
The map is constantly
tailored to show
repo context that is relevant to the current state of the chat conversation.
This is done by performing a graph optimization on the code's call graph.
When the user asks for a change to their code, the LLM can use the repo map
to decide which files to edit.
The LLM simply returns a normal text response explaining which files
it needs to edit and why.
Aider notices when the LLM mentions filenames from the repo,
and asks the user if they should be added to the chat.
Adding a file to the chat allows the LLM to see the full contents
of the file and edit it.
<div class="chat-transcript" markdown="1">
#### Please add a new /factorial/N endpoint.
To add a new /factorial/N endpoint, the most likely file that needs to be edited is app.py.
Please add app.py to the chat so I can proceed with the changes.
> app.py
> Add these files to the chat? yes
</div>
This is a convenient and natural workflow for interactive chat,
and it worked well for the SWE Bench problems.
Aider successfully identified the correct file to edit
in 70.3% of the benchmark tasks.
We can determine which file needs to be edited using the "gold" patch
which is associated with each SWE Bench task.
This patch was created by a human developer
to solve the issue, and therefore reveals a file which can
be edited to solve the problem.
Of course aider is not able to see or use the gold patch
or the file names it contains in any way.
This information was only used to compute
statistics outside the benchmarking process.
## Reliable code editing
Once files have been selected for editing,
the next step is of course to edit the source code to fix the problem.
Aider goes to great lengths to ensure that LLMs can not just write code,
but reliably *edit* code.
Aider has a collection of prompting strategies and code editing backends which have
been honed through
[extensive benchmarking](https://aider.chat/docs/leaderboards/).
These foundational capabilities help ensure that aider can
properly integrate code from LLMs into an existing code base and source files.
The repository map helps here too, making sure that the LLM
can see relevant classes, functions and variables from the entire repo.
This helps ensure that the project's existing APIs and conventions are
respected and utilized when new code is added.
Regardless, there are still cases where aider may be unable to cleanly
complete the edits specified by the LLM.
This is usually because the LLM has failed to conform to the editing
instructions in its system prompt.
When aider completes, it returns an editing outcome that indicates
whether it was able to successfully apply all edits.
The benchmark harness uses this editing status as
one criterion to determine if aider has
created a plausible solution.
## Linting and fixing
Another key criteria for a plausible solution is that it passes basic
linting, which means that the code has no syntax
or other fatal errors.
[Aider lints code](https://aider.chat/2024/05/22/linting.html)
after every LLM edit and offers to automatically fix
any problems.
Aider ships with built-in linters based on tree-sitter
which work with most popular programming languages.
Aider shows linting errors to the LLM in a novel format,
using the abstract syntax tree to display relevant code context for each
error.
This context helps LLMs understand the problem and
make the correct changes to resolve it.
<div class="chat-transcript" markdown="1">
```
app.py:23:36: F821 undefined name 'num'
app.py:
...⋮...
6│class LongNum:
...⋮...
19│    def expound(self, threshold):
20│        number = self.basis
21│        while number < threshold:
22│            number *= self.factor
23█        return num
24│
25│
...⋮...
```
> Attempt to fix lint errors? yes
</div>
In the benchmark, these linting suggestions are always accepted.
At completion,
aider reports a linting outcome that
indicates if it was able to produce
code without any outstanding linting errors.
The benchmark harness uses this status as
one of the criteria to determine if aider has
created a plausible solution.
## Testing and fixing
The final criterion for a plausible solution is that
all tests must be passing.
Aider can be configured with the command to run tests for a repo,
and will automatically attempt to fix any test failures.
A user working on a python project might configure testing
by launching aider like this:
```
aider --test-cmd pytest
```
For the benchmark, aider is configured with a test command that will run the
tests that already exist in each problem's repository.
SWE Bench problems are based on repositories from large open
source projects with extensive existing test suites.
This means that
testing will fail if aider has broken any of these
pre-existing tests or if any new
tests that it created aren't passing.
As with editing and linting, aider reports a testing outcome
that indicates if it completed with any outstanding failing tests.
The benchmark harness uses this status when deciding if aider
has produced a plausible solution.
To be clear, *aider cannot run or even see the held out "acceptance tests"* that
are used to judge if a proposed solution correctly
resolves the problem.
Those tests are only run outside of aider and the benchmark harness,
to compute the final benchmark statistics.
## Finding a plausible solution
Each time aider executes, it reports
the outcome of the editing, linting, and testing
steps.
Each of these steps may complete successfully or
return a status that indicates that there were outstanding
problems that remain unresolved.
The benchmark harness uses these outcomes to determine if
aider has produced a plausible
solution to the current SWE Bench task.
A plausible solution is one where aider
returns saying that it
edited the repo with no outstanding
edit, lint, or test errors.
In this case, aider's changes are recorded
as the SWE Bench `model_patch` to be evaluated later with the
acceptance tests.
If the solution is not plausible, another
instance of aider is launched again from scratch on the same problem.
The harness alternates launching aider with GPT-4o and Opus to solve the problem,
and gives each model three attempts -- for a total of six attempts.
As soon as a plausible solution is found, it is accepted and the
harness moves on to the next SWE Bench instance.
It's worth noting that repositories may have lint or test errors
present before aider even starts to edit them.
Whether unresolved errors were caused by aider or were pre-existing,
there will be instances where
no plausible solution is
found after six tries.
If all six attempts fail to produce a plausible solution,
then the "best" solution available is selected as the
`model_patch`.
Which of the non-plausible solutions to use is determined
by ignoring the testing outcome
and prioritizing solutions in the following order, as sketched below:
- Pick a solution where editing and linting were completed successfully.
- Pick a solution where editing was at least partially successful and linting succeeded.
- Pick a solution where editing was successful.
- Pick a solution where editing was at least partially successful.
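Here is a minimal sketch of this retry-and-fallback logic; `run_aider` and the `Attempt` fields are hypothetical stand-ins, not the actual harness code:
```python
# Minimal sketch of the harness loop; `run_aider` is a hypothetical stand-in
# for launching aider on the problem and collecting its outcomes.
from dataclasses import dataclass

@dataclass
class Attempt:
    diff: str
    edit_ok: bool        # all edits applied cleanly
    edit_partial: bool   # at least some edits applied
    lint_ok: bool
    tests_ok: bool

def solve_instance(problem, models=("gpt-4o", "opus"), max_attempts=6):
    attempts = []
    for i in range(max_attempts):
        model = models[i % len(models)]     # alternate GPT-4o and Opus
        result = run_aider(problem, model)  # hypothetical: returns an Attempt
        attempts.append(result)
        if result.edit_ok and result.lint_ok and result.tests_ok:
            return result.diff              # plausible solution: accept it

    # No plausible solution after six tries: ignore testing and prefer the
    # cleanest edit/lint outcome, in the priority order listed above.
    def rank(a):
        return (
            not (a.edit_ok and a.lint_ok),
            not (a.edit_partial and a.lint_ok),
            not a.edit_ok,
            not a.edit_partial,
        )
    return min(attempts, key=rank).diff
```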
## Computing the benchmark score
The benchmark harness produced a plausible solution for each of the 300
SWE Bench Lite instances and saved it as the `model_patch`.
A separate evaluation script was used to
test each of these solutions with the full test suite,
including the held out acceptance tests.
For this final acceptance testing, any edits that aider made to tests
are discarded.
This ensures that the correct,
unmodified test suite is used for acceptance testing.
The evaluation script compares the test results
with results from testing
the "gold" patch that was developed by a human to correctly solve the issue.
If they match, the candidate solution has correctly resolved the issue.
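Conceptually, the check boils down to confirming that the candidate patch produces the same
pass/fail pattern as the gold patch. A minimal sketch, assuming each test run is summarized as a
mapping from test name to outcome (the real SWE Bench evaluation scripts are more involved):

```python
# Illustrative comparison of a candidate's test results against the gold patch's.
def resolves_issue(candidate_results: dict, gold_results: dict) -> bool:
    if candidate_results.keys() != gold_results.keys():
        return False
    # The pass/fail pattern must match on every test, including the acceptance tests.
    return all(candidate_results[test] == gold_results[test] for test in gold_results)
```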
These acceptance tests are only ever run outside of aider
and the benchmark harness, and only to compute the number of
correctly resolved instances.
They are never run, used, or even visible during aider's attempts to solve the problems.
Aider correctly resolved 79 out of 300 SWE Bench Lite instances, or 26.3%.
## Acknowledgments
Much thanks to the team behind the
[SWE Bench](https://www.swebench.com)
family of AI coding benchmarks.
Also thanks to Albert Örwall who has
[dockerized the SWE Bench evaluation scripts](https://github.com/aorwall/SWE-bench-docker)
making it faster, easier, and more reliable to run the acceptance tests.
## References
All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.
The "aider agent" internally makes multiple "attempts" at solving the problem,
but it picks and returns one single candidate solution.
Only that one candidate solution is evaluated with the acceptance tests
and contributes to the benchmark score.
Thus it is a pass@1 result.
This is in contrast to a pass@N result for N>1, where N attempts are made
and all N solutions are evaluated by the acceptance tests.
If *any* of the N solutions passes, that counts as a pass@N success.
Below are the references for the other pass@1 unhinted SWE-Bench results
displayed in the graph at the beginning of this article.
- [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
- [19.0% AutoCodeRover](https://www.swebench.com/)
- [18.0% SWE-Agent + GPT-4](https://www.swebench.com)
- [16.7% OpenDevin](https://github.com/OpenDevin/OpenDevin/issues/2149)
- [11.7% SWE-Agent + Opus](https://www.swebench.com)
Note, the graph was corrected on 5/30/24 as follows.
The graph now contains AutoCodeRover's average pass@1 results.
Previously it displayed pass@3 results, which are
not comparable
to the pass@1 results for aider being reported here.
The [AutoCodeRover GitHub page](https://github.com/nus-apr/auto-code-rover)
features pass@3 results that are not clearly labeled as such.
The graph now contains the best OpenDevin results obtained without using
the SWE Bench `hints_text` to provide hints to the agent.
The previous graph contained their hinted result,
which is not comparable
to the unhinted aider results being reported here.
[OpenDevin reported hinted results](https://x.com/gneubig/status/1791498953709752405)
without noting that hints were used.

View file

@ -0,0 +1,67 @@
---
title: Aider has written 7% of its own code
excerpt: Aider has written 7% of its own code, via 600+ commits that inserted 4.8K and deleted 1.5K lines of code.
highlight_image: /assets/self-assembly.jpg
nav_exclude: true
---
# Aider has written 7% of its own code
[![self assembly](/assets/self-assembly.jpg)](https://aider.chat/assets/self-assembly.jpg)
The
[aider git repo](https://github.com/paul-gauthier/aider)
currently contains about 4K commits and 14K lines of code.
Aider made 15% of the commits, inserting 4.8K and deleting 1.5K lines of code.
About 7% of the code now in the repo is attributable to an aider commit
using `git blame`.
This number is probably a significant undercount, because periodic reformatting
by `black` is likely obscuring aider's authorship of many lines.
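For reference, line attribution like this can be computed from `git blame --line-porcelain`
output. The sketch below assumes aider's commits can be recognized by a marker in their commit
summaries, which is purely an assumption for illustration; the repo's actual `scripts/blame.py`
may work differently:

```python
# Rough sketch: count lines last touched by commits whose summary carries a marker.
import subprocess

def aider_line_count(path: str, marker: str = "aider:") -> tuple[int, int]:
    blame = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        capture_output=True, text=True, check=True,
    ).stdout
    total = attributed = 0
    is_aider_commit = False
    for line in blame.splitlines():
        if line.startswith("summary "):
            is_aider_commit = line[len("summary "):].startswith(marker)
        elif line.startswith("\t"):            # the actual source line content
            total += 1
            attributed += is_aider_commit
    return attributed, total
```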
Here's the breakdown of the code aider wrote in the current code base
according to `git blame`.
| File | Lines | Percent |
|---|---:|---:|
|aider/args.py| 6 of 449 | 1.3% |
|aider/coders/base_coder.py| 37 of 1354 | 2.7% |
|aider/coders/editblock_coder.py| 14 of 507 | 2.8% |
|aider/coders/editblock_func_coder.py| 6 of 141 | 4.3% |
|aider/coders/udiff_coder.py| 2 of 421 | 0.5% |
|aider/coders/wholefile_coder.py| 5 of 146 | 3.4% |
|aider/coders/wholefile_func_coder.py| 4 of 134 | 3.0% |
|aider/commands.py| 67 of 703 | 9.5% |
|aider/diffs.py| 15 of 129 | 11.6% |
|aider/gui.py| 2 of 533 | 0.4% |
|aider/history.py| 19 of 124 | 15.3% |
|aider/io.py| 55 of 368 | 14.9% |
|aider/linter.py| 30 of 240 | 12.5% |
|aider/main.py| 30 of 466 | 6.4% |
|aider/mdstream.py| 3 of 122 | 2.5% |
|aider/models.py| 22 of 549 | 4.0% |
|aider/repo.py| 19 of 266 | 7.1% |
|aider/repomap.py| 17 of 518 | 3.3% |
|aider/scrape.py| 12 of 199 | 6.0% |
|aider/versioncheck.py| 10 of 37 | 27.0% |
|aider/voice.py| 9 of 104 | 8.7% |
|benchmark/benchmark.py| 33 of 730 | 4.5% |
|benchmark/over_time.py| 32 of 60 | 53.3% |
|benchmark/swe_bench_lite.py| 40 of 71 | 56.3% |
|scripts/blame.py| 55 of 212 | 25.9% |
|scripts/versionbump.py| 96 of 123 | 78.0% |
|setup.py| 11 of 47 | 23.4% |
|tests/test_coder.py| 48 of 612 | 7.8% |
|tests/test_commands.py| 135 of 588 | 23.0% |
|tests/test_editblock.py| 23 of 403 | 5.7% |
|tests/test_io.py| 30 of 65 | 46.2% |
|tests/test_main.py| 13 of 239 | 5.4% |
|tests/test_models.py| 6 of 28 | 21.4% |
|tests/test_repo.py| 2 of 296 | 0.7% |
|tests/test_repomap.py| 70 of 217 | 32.3% |
|tests/test_udiff.py| 7 of 119 | 5.9% |
|tests/test_wholefile.py| 37 of 321 | 11.5% |
| **Total** | **1022 of 14219** | 7.2% |

View file

@ -0,0 +1,264 @@
---
title: Aider is SOTA for both SWE Bench and SWE Bench Lite
excerpt: Aider sets SOTA for the main SWE Bench, after recently setting SOTA for the Lite version.
highlight_image: /assets/swe_bench.jpg
nav_exclude: true
---
# Aider is SOTA for both SWE Bench and SWE Bench Lite
Aider scored 18.9%
on the main
[SWE Bench benchmark](https://www.swebench.com),
achieving a state-of-the-art result.
The current top leaderboard entry is 13.8%
from Amazon Q Developer Agent.
The best result reported elsewhere seems to be
[13.9% from Devin](https://www.cognition.ai/post/swe-bench-technical-report).
This result on the main SWE Bench builds on
[aider's recent SOTA result on the easier SWE Bench Lite](https://aider.chat/2024/05/22/swe-bench-lite.html).
[![SWE Bench results](/assets/swe_bench.svg)](https://aider.chat/assets/swe_bench.svg)
**All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.**
Aider was benchmarked on the same
[570 randomly selected SWE Bench problems](https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs)
that were used in the
[Devin evaluation](https://www.cognition.ai/post/swe-bench-technical-report).
See the [references](#references)
for more details on the data presented in this chart.
## Interactive, not agentic
Aider achieved this result mainly through its existing features that focus on static
code analysis, reliable LLM code editing, and pragmatic UX for automatically
fixing linting and testing errors.
Aider intentionally has quite limited and narrow "agentic behavior"
to avoid long delays, high token costs
and the need for users to repeatedly code review incorrect solutions.
It's also worth noting that aider currently does not use
RAG, vector search, tools or give the LLM access to search the web
or unilaterally execute code.
Aider is first and foremost an interactive tool for engineers to get real work done in
real code bases using a chat interface.
Aider provides a pair programming UX where users can ask for a change
and see code edits performed in real-time.
Aider can also offer additional help like fixing lint or test errors,
but the user is always in full interactive control.
This allows them to quickly steer misunderstandings back on course and
avoid wasting time and token costs.
## Benchmark methodology
Benchmarking was conducted as follows:
- Aider with GPT-4o was launched in each problem's git repository
with the problem statement
submitted as the opening chat message from "the user".
- After that aider ran as normal, except all of aider's
suggestions were always accepted without user approval.
- A [simple harness](https://github.com/paul-gauthier/aider-swe-bench#the-aider-agent) was used to retry the SWE Bench problem if aider produced code that wasn't *plausibly correct*.
Plausibly correct means that aider reported that it had successfully edited the repo
without causing syntax errors or breaking any *pre-existing* tests.
- If the solution from aider with GPT-4o wasn't plausible, the harness launched aider to try again from scratch using Claude 3 Opus.
- If no plausible solution was found after those two tries, the harness picked the "most plausible" solution with the fewest edit/lint/test problems.
It's important to be clear that
*aider and the benchmark harness
only had access to the pre-existing tests in each problem's repo*.
The held out "acceptance tests" were *only* used
after benchmarking to compute statistics on which problems aider
correctly resolved.
This is the same approach
that was used for
[aider's recent SOTA result on SWE Bench Lite](https://aider.chat/2024/05/22/swe-bench-lite.html).
For the Lite benchmark,
aider alternated between GPT-4o and Opus for up to six total attempts.
To manage the cost of running the main SWE Bench benchmark,
aider was limited to two total attempts:
one with GPT-4o and one with Opus.
For a detailed discussion of the benchmark
methodology, see the
[article about aider's SWE Bench Lite results](https://aider.chat/2024/05/22/swe-bench-lite.html).
Also, the
[aider SWE Bench repository on GitHub](https://github.com/paul-gauthier/aider-swe-bench)
contains the harness and statistics code used for the benchmarks.
The benchmarking process was similar to how a developer might use aider to
resolve a GitHub issue:
- They could launch aider in their repo with the command below, which
tells aider they want to accept every suggestion
and to use pytest to run tests.
- `aider --yes --test-cmd pytest`
- They could start the chat by pasting in the URL or text of a GitHub issue.
Aider will pull in the URL's content and then try and resolve the issue.
- If aider doesn't produce code that lints and tests clean, the user might decide to
[use git to revert the changes](https://aider.chat/docs/faq.html#how-does-aider-use-git),
and try again with `aider --opus`.
## Aider with GPT-4o alone was SOTA
Using aider with GPT-4o to make a single attempt at resolving each problem
achieved a score of 17.0%.
This was itself a state-of-the-art result, before being surpassed by the main
result being reported here
that used aider with both GPT-4o & Opus.
## Aider with GPT-4o & Opus
The benchmark harness started by using aider with GPT-4o to try
and resolve each problem.
For problems where this didn't produce a plausible solution,
the harness tried again using aider with Opus.
So at most, two attempts were made for each problem.
The table below breaks down the proposed solutions that
were found from each attempt at the 570 problems.
A proposed solution is either:
- A plausible solution where
aider reported no outstanding errors from editing, linting and testing.
- Or, the "most plausible" solution generated by either attempt, with the
[fewest outstanding editing, linting or testing errors](https://aider.chat/2024/05/22/swe-bench-lite.html#finding-a-plausible-solution).
The table also provides details on the 108 solutions that were ultimately
verified as correctly resolving their issue.
| Attempt | Agent |Number&nbsp;of<br>proposed<br>solutions|Percent&nbsp;of<br>proposed<br>solutions| Number&nbsp;of<br/>correctly<br>resolved<br>solutions | Percent&nbsp;of<br>correctly<br>resolved<br>solutions | Score&nbsp;on<br>SWE&nbsp;Bench |
|:--------:|------------|---------:|---------:|----:|---:|--:|
| 1 | Aider with GPT-4o | 419 | 73.5% | 87 | 80.6% | 15.3% |
| 2 | Aider with Opus | 151 | 26.5% | 21 | 19.4% | 3.7% |
| **Total** | | **570** | **100%** | **108** | **100%** | **18.9%** |
## Non-plausible but correct solutions?
A solution doesn't actually have to be plausible in order to correctly resolve the issue.
Recall that plausible is simply defined as aider
reporting that it successfully completed all file edits,
repaired and resolved any linting errors
and resolved any test failures.
But there are many reasons why aider might fail to do those things
and yet still produce a solution that will pass
acceptance testing:
- There may have been pre-existing failing tests in the repo,
before aider even started working on the SWE Bench problem.
Aider may not have resolved such issues, and yet they may not be
relevant to the acceptance testing.
The SWE Bench acceptance testing just confirms that tests pass or fail
in the same pattern as the "gold patch" developed by a human to resolve the
problem.
Some tests may fail during acceptance testing,
and that's ok as long as they failed for the gold
patch too.
- There may have been pre-existing linting problems in the repo.
If lingering linting issues affected code paths that are not well tested,
they may not impact acceptance testing.
- Aider may have reported file editing errors because the LLM specified
edits that aider wasn't able to successfully apply.
This can only happen when the LLM specified edits in
a way that doesn't comply with the editing instructions in the system prompt.
Given that the LLM isn't complying with the system prompt,
it may have become confused and
asked for redundant or otherwise irrelevant edits.
Such outstanding edit errors might not be fatal for acceptance testing.
- Etc.
Keeping all this in mind, we can understand why
GPT-4o accounts for 15.3% of the benchmark score in the table above,
but benchmarking with just one attempt of aider with GPT-4o scored 17.0%.
When an Opus attempt is allowed after GPT-4o,
it may propose some *incorrect* solutions which
are "more plausible" than some of GPT-4o's non-plausible solutions.
These more plausible, incorrect solutions can
eclipse some of
the earlier non-plausible correct solutions that GPT-4o generated.
This is why GPT-4o's score in the table
showing the combined GPT-4o & Opus results (15.3%)
is lower than the result from just one try using aider with GPT-4o (17.0%).
For these reasons, adding additional attempts is not guaranteed to monotonically
increase the number of resolved problems.
New solutions may resolve some new problems but they may also
eclipse and discard some of the previous non-plausible correct solutions.
Luckily, the net effect of additional attempts
usually increases or at least maintains the
number of resolved problems.
This was the case for all the attempts made in both this main SWE Bench result and the
earlier Lite result.
## Computing the benchmark score
The benchmark harness produced one proposed solution for each of
the 570 SWE Bench problems.
A separate evaluation script was used to
test each of these solutions with the full test suite,
including the held out acceptance tests.
For this final acceptance testing, any edits that aider made to tests
were discarded.
This ensured that the correct,
unmodified test suite was used for acceptance testing.
The evaluation script compared each proposed solution's test results
with results from testing
the "gold" patch that was developed by a human to correctly resolve the issue.
If they matched, the proposed solution correctly resolved the issue.
These acceptance tests were only ever run outside of aider
and the benchmark harness, and only to compute statistics about the
correctly resolved instances.
They were never run, used, or even visible during aider's attempts to resolve the problems.
Aider correctly resolved 108 out of 570 SWE Bench instances that were benchmarked,
or 18.9%.
## Acknowledgments
Much thanks to the team behind the
[SWE Bench](https://www.swebench.com)
family of AI coding benchmarks.
Also thanks to Albert Örwall who has
[dockerized the SWE Bench evaluation scripts](https://github.com/aorwall/SWE-bench-docker)
making it faster, easier, and more reliable to run the acceptance tests.
## References
All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.
The "aider agent" internally makes multiple "attempts" at solving the problem,
but it picks and returns one single candidate solution.
Only that one candidate solution is evaluated with the acceptance tests
and contributes to the benchmark score.
Thus it is a pass@1 result.
This is in contrast to a pass@N result for N>1, where N attempts are made
and all N solutions are evaluated by the acceptance tests.
If *any* of the N solutions passes, that counts as a pass@N success.
Below are the references for the other pass@1 unhinted SWE-Bench results
displayed in the graph at the beginning of this article.
- [13.9% Devin, benchmarked on 570 instances.](https://www.cognition.ai/post/swe-bench-technical-report)
- [13.8% Amazon Q Developer Agent, benchmarked on 2,294 instances.](https://www.swebench.com)
- [12.5% SWE-Agent + GPT-4, benchmarked on 2,294 instances.](https://www.swebench.com)
- [10.6% AutoCodeRover, benchmarked on 2,294 instances.](https://arxiv.org/pdf/2404.05427v2)
- [10.5% SWE-Agent + Opus, benchmarked on 2,294 instances.](https://www.swebench.com)
The graph contains average pass@1 results for AutoCodeRover.
The [AutoCodeRover GitHub page](https://github.com/nus-apr/auto-code-rover)
features their pass@3 results, which are not clearly labeled as such.
Table 2 of their
[paper](https://arxiv.org/pdf/2404.05427v2)
reports an `ACR-avg` result of 10.59% which is an average pass@1 result.

View file

@ -0,0 +1,95 @@
.btn {
display: inline-block;
width: auto;
}
.btn + .btn {
margin-top: 0;
margin-left: 0.75rem;
}
.post {
background: #fff;
box-shadow: 0 2px 5px rgba(0,0,0,0.1);
margin-bottom: 2em;
padding: 1em;
border-radius: 4px;
}
.post-date {
color: #777;
font-size: 0.85em;
margin-bottom: 1em;
display: block;
}
.post-highlight {
max-width: 20em; /* assuming a 16px base font-size, 20em is 320px */
margin-right: 1em;
margin-left: 1em;
}
@media (max-width: 768px) {
.post-highlight {
max-width: 30em; /* Larger size on mobile */
}
}
.post-content {
display: flex;
align-items: flex-start;
flex: 1;
}
.chat-transcript {
font-family: 'Courier New', Courier, monospace;
background-color: #000;
color: #0f0;
padding: 1em 1em 1em 1em;
border-radius: 5px;
margin-top: 50px;
}
.chat-transcript blockquote {
padding: 0;
margin-left: 0;
color: #819198;
border-left: none
}
.chat-transcript blockquote>:first-child {
margin-top: 0
}
.chat-transcript blockquote>:last-child {
margin-bottom: 0
}
.chat-transcript li,
.chat-transcript p {
color: #00FFFF;
}
.chat-transcript h1 {
display: none;
}
.chat-transcript h4 {
color: #32FF32;
border-top: 1px solid #32FF32;
padding-top: 10px;
}
.chat-transcript h4::before {
content: '> ';
}
.chat-transcript blockquote p {
color: #ffffff;
}
.chat-transcript::before {
content: url('data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" width="90" height="25"><circle cx="7" cy="15" r="7" fill="%23f92672"/><circle cx="27" cy="15" r="7" fill="%23f4bf75"/><circle cx="47" cy="15" r="7" fill="%23a6e22e"/></svg>');
display: block;
margin-bottom: 0;
position: relative;
top: -0.5em;
}

Binary image assets added in this commit (diffs not shown), including website/assets/aider.jpg, website/assets/browser.jpg, and website/assets/figure.png.
View file

@ -0,0 +1,9 @@
<?xml version="1.0" encoding="utf-8"?>
<browserconfig>
<msapplication>
<tile>
<square150x150logo src="/assets/icons/mstile-150x150.png"/>
<TileColor>#da532c</TileColor>
</tile>
</msapplication>
</browserconfig>

Binary icon assets added (not shown).
View file

@ -0,0 +1,32 @@
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="436.000000pt" height="436.000000pt" viewBox="0 0 436.000000 436.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.14, written by Peter Selinger 2001-2017
</metadata>
<g transform="translate(0.000000,436.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M0 2180 l0 -2180 2180 0 2180 0 0 2180 0 2180 -2180 0 -2180 0 0
-2180z m2705 1818 c20 -20 28 -121 30 -398 l2 -305 216 -5 c118 -3 218 -8 222
-12 3 -3 10 -46 15 -95 5 -48 16 -126 25 -172 17 -86 17 -81 -17 -233 -14 -67
-13 -365 2 -438 21 -100 22 -159 5 -247 -24 -122 -24 -363 1 -458 23 -88 23
-213 1 -330 -9 -49 -17 -109 -17 -132 l0 -43 203 0 c111 0 208 -4 216 -9 10
-6 18 -51 27 -148 8 -76 16 -152 20 -168 7 -39 -23 -361 -37 -387 -10 -18 -21
-19 -214 -16 -135 2 -208 7 -215 14 -22 22 -33 301 -21 501 6 102 8 189 5 194
-8 13 -417 12 -431 -2 -12 -12 -8 -146 8 -261 8 -55 8 -95 1 -140 -6 -35 -14
-99 -17 -143 -9 -123 -14 -141 -41 -154 -18 -8 -217 -11 -679 -11 l-653 0 -11
33 c-31 97 -43 336 -27 533 5 56 6 113 2 128 l-6 26 -194 0 c-211 0 -252 4
-261 28 -12 33 -17 392 -6 522 15 186 -2 174 260 180 115 3 213 8 217 12 4 4
1 52 -5 105 -7 54 -17 130 -22 168 -7 56 -5 91 11 171 10 55 22 130 26 166 4
36 10 72 15 79 7 12 128 15 665 19 l658 5 8 30 c5 18 4 72 -3 130 -12 115 -7
346 11 454 10 61 10 75 -1 82 -8 5 -300 9 -650 9 l-636 0 -27 25 c-18 16 -26
34 -26 57 0 18 -5 87 -10 153 -10 128 5 449 22 472 5 7 26 13 46 15 78 6 1281
3 1287 -4z"/>
<path d="M1360 1833 c0 -5 -1 -164 -3 -356 l-2 -347 625 -1 c704 -1 708 -1
722 7 5 4 7 20 4 38 -29 141 -32 491 -6 595 9 38 8 45 -7 57 -15 11 -139 13
-675 14 -362 0 -658 -3 -658 -7z"/>
</g>
</svg>

View file

@ -0,0 +1,19 @@
{
"name": "",
"short_name": "",
"icons": [
{
"src": "/assets/icons/android-chrome-192x192.png",
"sizes": "192x192",
"type": "image/png"
},
{
"src": "/assets/icons/android-chrome-384x384.png",
"sizes": "384x384",
"type": "image/png"
}
],
"theme_color": "#ffffff",
"background_color": "#ffffff",
"display": "standalone"
}

Additional binary and large image assets added (diffs not shown), including website/assets/linting.jpg, website/assets/llms.jpg, website/assets/swe_bench.svg, and website/assets/udiffs.jpg.
35
website/blog/index.html Normal file
View file

@ -0,0 +1,35 @@
---
layout: default
title: Blog
nav_order: 1000
---
<div class="posts">
{% for post in site.posts %}
{% unless post.draft %}
<article class="post">
<h2><a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a></h2>
<div class="entry">
<div class="post-content">
{% if post.excerpt %}
{{ post.excerpt }}
{% else %}
{{ post.content | strip_html | truncatewords: 100 }}...
{% endif %}
{% if post.highlight_image %}
<a href="{{ site.baseurl }}{{ post.url }}">
<div class="post-highlight">
<img src="{{ site.baseurl }}{{ post.highlight_image }}" alt="Highlight Image">
</div>
</a>
{% endif %}
</div>
</div>
<p class="post-date">
{{ post.date | date: "%b %-d, %Y" | upcase }}
</p>
</article>
{% endunless %}
{% endfor %}
</div>

View file

@ -0,0 +1,41 @@
---
title: The January GPT-4 Turbo is lazier than the last version
excerpt: The new `gpt-4-0125-preview` model is quantitatively lazier at coding than previous GPT-4 versions, according to a new "laziness" benchmark.
highlight_image: /assets/benchmarks-0125.jpg
nav_exclude: true
---
# The January GPT-4 Turbo is lazier than the last version
[![benchmark results](/assets/benchmarks-0125.svg)](https://aider.chat/assets/benchmarks-0125.svg)
[OpenAI just released a new version of GPT-4 Turbo](https://openai.com/blog/new-embedding-models-and-api-updates).
This new model is intended to reduce the "laziness" that has been widely observed with the previous `gpt-4-1106-preview` model:
> Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of “laziness” where the model doesn't complete a task.
With that in mind, I've been benchmarking the new model using
aider's existing
[lazy coding benchmark](https://aider.chat/docs/unified-diffs.html).
## Benchmark results
Overall,
the new `gpt-4-0125-preview` model seems lazier
than the November `gpt-4-1106-preview` model:
- It gets worse benchmark scores when using the [unified diffs](https://aider.chat/docs/unified-diffs.html) code editing format.
- Using aider's older [SEARCH/REPLACE block](https://github.com/paul-gauthier/aider/blob/9033be74bf74ae70459013e54b2ae6a97c47c2e6/aider/coders/editblock_prompts.py#L75-L80) editing format, the new January model outperforms the older November model. But it still performs worse than both models using unified diffs.
## Related reports
This is one in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models.
You can review the other reports
for additional information:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html).
- [Aider's lazy coding benchmark](https://aider.chat/docs/unified-diffs.html).

View file

@ -0,0 +1,88 @@
---
title: Code editing benchmarks for OpenAI's "1106" models
excerpt: A quantitative comparison of the code editing capabilities of the new GPT-3.5 and GPT-4 versions that were released in Nov 2023.
highlight_image: /assets/benchmarks-1106.jpg
nav_exclude: true
---
# Code editing benchmarks for OpenAI's "1106" models
[![benchmark results](/assets/benchmarks-1106.svg)](https://aider.chat/assets/benchmarks-1106.svg)
[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)
[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot
of interest in their ability to code compared to the previous versions.
With that in mind, I've been benchmarking the new models.
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
To do this, aider needs to be able to reliably recognize when GPT wants to edit
your source code,
determine which files it wants to modify
and accurately apply the changes it's trying to make.
Doing a good job on this "code editing" task requires a good LLM, good prompting and
a good tool driving the interactions with the LLM.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate
performance
whenever one of these things changes.
For example,
whenever I change aider's prompting or the backend which drives LLM conversations,
I run the benchmark to make sure these changes produce improvements (not regressions).
The benchmark uses aider to try and complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise, Exercism provides a starting python file with stubs for the needed functions,
a natural language description of the problem to solve
and a test suite to evaluate whether the coder has correctly solved the problem.
The benchmark gives aider two tries to complete the task:
1. On the first try, aider gives GPT the stub code file to edit and the natural language instructions that describe the problem. This reflects how you code with aider. You add your source code files to the chat and ask for changes, which are automatically applied.
2. If the test suite fails after the first try, aider gives GPT the test error output and asks it to fix the code. Aider supports this sort of interaction using a command like `/run pytest` to run and share pytest results in the chat with GPT. You can `/run` whatever tests/linters/etc make sense for your language/framework/situation.
## Benchmark results
### gpt-4-1106-preview
For now, I have only benchmarked the GPT-4 models using the `diff` edit method.
This is the edit format that aider uses by default with gpt-4.
- The new `gpt-4-1106-preview` model seems **2-2.5X faster** than the June GPT-4 model.
- **It seems better at producing correct code on the first try**. It gets
53% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
- The new model seems to perform similarly
(~65%) to the old models (63-64%) after their second chance to correct bugs by reviewing test suite error output.
### gpt-3.5-turbo-1106
I benchmarked the GPT-3.5 models with both the `whole` and `diff` edit format.
None of the gpt-3.5 models seem able to effectively use the `diff` edit format, including the newest November (1106) model.
The comments below only focus on comparing the `whole` edit format results:
- The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
- The success rate after the first try of 42% is comparable to the previous June (0613) model. The new November and previous June models are both worse than the original March (0301) model's 50% result on the first try.
- The new model's 56% success rate after the second try seems comparable to the original March model, and somewhat better than the June model's 50% score.
## Related reports
This is one in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models.
You can review the other reports
for additional information:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing speed benchmarks for OpenAI's "1106" models](https://aider.chat/2023/11/06/benchmarks-speed-1106.html) compares the performance of the new GPT models.
## Updates
Last updated 11/14/23.
OpenAI has relaxed rate limits so these results are no longer considered preliminary.

View file

@ -0,0 +1,55 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
canonical_url: https://aider.chat/2023/11/06/benchmarks-speed-1106.html
highlight_image: /assets/benchmarks-speed-1106.jpg
nav_exclude: true
---
# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>
[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)
[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot
of interest in their capabilities and performance.
With that in mind, I've been benchmarking the new models.
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate
performance.
This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review previous
reports to get more background on aider's benchmark suite:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing skill benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html) compares the older models to the November (1106) models.
## Speed
This report compares the **speed** of the various GPT models.
Aider's benchmark measures the response time of the OpenAI chat completion
endpoint each time it asks GPT to solve a programming exercise in the benchmark
suite. These results measure only the time spent waiting for OpenAI to
respond to the prompt.
So they are measuring
how fast these models can
generate responses which primarily consist of source code.
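As a minimal sketch of that measurement, using the current `openai` Python client for
illustration (the benchmark itself may drive the API differently):

```python
# Illustrative timing of a single chat completion request.
import time
from openai import OpenAI

client = OpenAI()

def timed_completion(model: str, messages: list):
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=messages)
    elapsed = time.monotonic() - start     # only the time spent waiting on the API
    return response, elapsed
```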
Some observations:
- **GPT-3.5 got 6-11x faster.** The `gpt-3.5-turbo-1106` model is 6-11x faster than the June (0613) version which has been the default `gpt-3.5-turbo` model.
- **GPT-4 Turbo is 2-2.5x faster.** The new `gpt-4-1106-preview` model is 2-2.5x faster than the June (0613) version which has been the default `gpt-4` model.
- The old March (0301) version of GPT-3.5 is actually faster than the June (0613) version. This was a surprising discovery.
## Updates
Last updated 11/14/23.
OpenAI has relaxed rate limits so these results are no longer considered preliminary.

364
website/docs/benchmarks.md Normal file
View file

@ -0,0 +1,364 @@
---
title: GPT code editing benchmarks
excerpt: Benchmarking GPT-3.5 and GPT-4 code editing skill using a new code editing benchmark suite based on the Exercism python exercises.
highlight_image: /assets/benchmarks.jpg
nav_exclude: true
---
# GPT code editing benchmarks
[![benchmark results](/assets/benchmarks.svg)](https://aider.chat/assets/benchmarks.svg)
Aider is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
To do this, aider needs to be able to reliably recognize when GPT wants to edit local files,
determine which files it wants to modify and what changes to save.
Such automated
code editing hinges on using the system prompt
to tell GPT how to structure code edits in its responses.
Aider currently asks GPT to use simple text based "edit formats", but
[OpenAI's new function calling
API](https://openai.com/blog/function-calling-and-other-api-updates)
looks like a promising way to create more structured edit formats.
After implementing a couple of function based edit formats,
I wanted
to measure the potential benefits
of switching aider to use them by default.
With this in mind, I developed a
benchmark based on the [Exercism
python](https://github.com/exercism/python) coding exercises.
This
benchmark evaluates how effectively aider and GPT can translate a
natural language coding request into executable code saved into
files that pass unit tests.
It provides an end-to-end evaluation of not just
GPT's coding ability, but also its capacity to *edit existing code*
and *format those code edits* so that aider can save the
edits to the local source files.
I ran the benchmark
on all the ChatGPT models (except `gpt-4-32k`), using a variety of edit formats.
The results were interesting:
- **Plain text edit formats worked best.** Asking GPT to return an updated copy of the whole file in a standard markdown fenced code block proved to be the most reliable and effective edit format across all GPT-3.5 and GPT-4 models. The results for this `whole` edit format are shown in solid blue in the graph.
- **Function calls performed worse.** Using the new functions API for edits performed worse than the above whole file method, for all the models. GPT-3.5 especially produced inferior code and frequently mangled this output format. This was surprising, as the functions API was introduced to enhance the reliability of structured outputs. The results for these `...-func` edit methods are shown as patterned bars in the graph (both green and blue).
- **The new June GPT-3.5 models did a bit worse than the old March model.** The performance of the new June (`0613`) versions of GPT-3.5 appears to be a bit worse than the March (`0301`) version. This is visible if you look at the "first attempt" markers on the first three solid blue bars and also by comparing the first three solid green `diff` bars.
- **GPT-4 does better than GPT-3.5,** as expected.
The quantitative benchmark results agree with my intuitions
about prompting GPT for complex tasks like coding. It's beneficial to
minimize the "cognitive overhead" of formatting the response, allowing
GPT to concentrate on the coding task at hand.
As a thought experiment, imagine a slack conversation with a junior developer where
you ask them to write the code to add some new feature to your app.
They're going to type the response back to you by hand in the chat.
Should they type out the
code and wrap it in a normal markdown code block?
Or should they type up a properly escaped and
syntactically correct json data structure
that contains the text of the new code?
Using more complex output formats with GPT seems to cause two issues:
- It makes GPT write worse code. Keeping the output format simple seems to allow GPT to devote more attention to the actual coding task.
- It reduces GPT's adherence to the output format, making it more challenging for tools like aider to accurately identify and apply the edits GPT is attempting to make.
I was expecting to start using function call based edits in aider for both GPT-3.5 and GPT-4.
But given these benchmark results, I won't be adopting the functions API
at this time.
I will certainly plan to benchmark functions again with future versions of the models.
More details on the benchmark, edit formats and results are discussed below.
## The benchmark
The benchmark uses
[133 practice exercises from the Exercism python repository](https://github.com/exercism/python/tree/main/exercises/practice).
These
exercises were designed to help individuals learn Python and hone
their coding skills.
Each exercise includes:
- [Instructions](https://github.com/exercism/python/blob/main/exercises/practice/anagram/.docs/instructions.md), provided in markdown files.
- [Stub python code](https://github.com/exercism/python/blob/main/exercises/practice/anagram/anagram.py) in an *implementation file*, specifying the functions or classes that need to be implemented.
- [Unit tests](https://github.com/exercism/python/blob/main/exercises/practice/anagram/anagram_test.py) in a separate python file.
The goal is for GPT to read the instructions, implement the provided function/class skeletons
and pass all the unit tests. The benchmark measures what percentage of
the 133 exercises are completed successfully, causing all the associated unit tests to pass.
To start each exercise, aider sends GPT
the initial contents of the implementation file,
the Exercism instructions
and a final instruction:
```
Use the above instructions to modify the supplied files: <implementation file>
Keep and implement the existing function or class stubs, they will be called from unit tests.
Only use standard python libraries, don't suggest installing any packages.
```
Aider updates the implementation file based on GPT's reply and runs
the unit tests. If all tests pass, the exercise is considered
complete. If some tests fail, aider sends GPT a second message with
the test error output. It only sends the first 50 lines of test errors
to try and avoid exceeding the context window of the smaller models. Aider
also includes this final instruction:
```
See the testing errors above.
The tests are correct.
Fix the code in <implementation file> to resolve the errors.
```
Requiring GPT to fix its first implementation in response to test failures
is another way in which this benchmark stresses code editing skill.
This second chance is also important because it
gives GPT the opportunity to adjust if the
instructions were imprecise with respect to the
specific requirements of the unit tests.
Many of the exercises have multiple paragraphs of instructions,
and most human coders would likely fail some tests on their
first try.
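Putting the two tries together, the per-exercise flow can be sketched roughly like this; the
callables stand in for the real benchmark plumbing and are not aider's actual code:

```python
# Condensed, illustrative sketch of the benchmark's two-try flow for one exercise.
def run_exercise(send_to_gpt, apply_edits, run_tests) -> bool:
    apply_edits(send_to_gpt("instructions + stub implementation file"))
    errors = run_tests()
    if not errors:
        return True                                   # passed on the first try
    truncated = "\n".join(errors.splitlines()[:50])   # only the first 50 lines of errors
    apply_edits(send_to_gpt(truncated + "\nThe tests are correct. Fix the code."))
    return not run_tests()                            # passed on the second try?
```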
The bars in the graph show the percent of exercises that were completed by
each model and edit format combination. The full bar height represents
the final outcome following both coding attempts.
Each bar also has a horizontal mark that shows
the intermediate performance after the first coding attempt,
without the benefit of the second try that includes the test error output.
It's worth noting that GPT never gets to see the source code of the
unit tests during the benchmark. It only sees the error output from
failed tests. Of course, all of this code was probably part of its
original training data!
In summary, passing an exercise means GPT was able to:
- Write the required code (possibly after reviewing test error output),
- Correctly package all of the code edits into the edit format so that aider can process and save it to the implementation file.
Conversely, failing an exercise only requires a breakdown in one of
those steps. In practice, GPT fails at different steps in different
exercises. Sometimes it simply writes the wrong code. Other times, it
fails to format the code edits in a way that conforms to the edit
format, resulting in the code not being saved correctly.
It's worth keeping in mind that changing the edit format often affects
both aspects of GPT's performance.
Complex edit formats often lead GPT to write worse code *and* make it less
successful at formatting the edits correctly.
## Edit formats
I benchmarked 4 different edit formats, described below.
Each description includes a sample response that GPT might provide to a user who
requests:
"Change the print from hello to goodbye."
### whole
The
[whole](https://github.com/paul-gauthier/aider/blob/main/aider/coders/wholefile_prompts.py)
format asks GPT to return an updated copy of the entire file, including any changes.
The file should be
formatted with normal markdown triple-backtick fences, inlined with the rest of its response text.
This format is very similar to how ChatGPT returns code snippets during normal chats, except with the addition of a filename right before the opening triple-backticks.
````
Here is the updated copy of your file demo.py:
demo.py
```python
def main():
    print("goodbye")
```
````
### diff
The [diff](https://github.com/paul-gauthier/aider/blob/main/aider/coders/editblock_prompts.py)
format also asks GPT to return edits as part of the normal response text,
in a simple diff format.
Each edit is a fenced code block that
specifies the filename and a chunk of ORIGINAL and UPDATED code.
GPT provides some original lines from the file and then a new updated set of lines.
````
Here are the changes you requested to demo.py:
```python
demo.py
<<<<<<< ORIGINAL
print("hello")
=======
print("goodbye")
>>>>>>> UPDATED
```
````
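For a sense of how such a block can be applied, here is a bare-bones sketch that performs an
exact string replacement; aider's real editblock coder is considerably more forgiving than this:

```python
# Illustrative application of one ORIGINAL/UPDATED edit block to a file's text.
def apply_edit_block(file_text: str, block: str) -> str:
    _header, rest = block.split("<<<<<<< ORIGINAL\n", 1)
    original, remainder = rest.split("\n=======\n", 1)
    updated = remainder.split("\n>>>>>>> UPDATED", 1)[0]
    if original not in file_text:
        raise ValueError("ORIGINAL text not found in the file")
    return file_text.replace(original, updated, 1)
```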
### whole-func
The [whole-func](https://github.com/paul-gauthier/aider/blob/main/aider/coders/wholefile_func_coder.py)
format requests updated copies of whole files to be returned using the function call API.
```
{
    "explanation": "Changed hello to goodbye.",
    "files": [
        {
            "path": "demo.py",
            "content": "def main():\n    print(\"goodbye\")\n"
        }
    ]
}
```
### diff-func
The
[diff-func](https://github.com/paul-gauthier/aider/blob/main/aider/coders/editblock_func_coder.py)
format requests a list of
original/updated style edits to be returned using the function call API.
```
{
    "explanation": "Changed hello to goodbye.",
    "edits": [
        {
            "path": "demo.py",
            "original_lines": [
                "    print(\"hello\")"
            ],
            "updated_lines": [
                "    print(\"goodbye\")"
            ]
        }
    ]
}
```
## GPT-3.5's performance
### The `0613` models seem worse?
The GPT-3.5 benchmark results have me fairly convinced that the new
`gpt-3.5-turbo-0613` and `gpt-3.5-turbo-16k-0613` models
are a bit worse at code editing than
the older `gpt-3.5-turbo-0301` model.
This is visible in the "first attempt"
portion of each result, before GPT gets a second chance to edit the code.
Look at the horizontal white line in the middle of the first three blue bars.
Performance with the `whole` edit format was 46% for the
March model and only 39% for the June models.
But also note how much the solid green `diff` bars
degrade between the March and June GPT-3.5 models.
They drop from 30% down to about 19%.
I saw other signs of this degraded performance
in earlier versions of the
benchmark as well.
### Pathological use of `diff`
When GPT-3.5 is able to correctly generate the `diff` edit format,
it often uses it in a pathological manner. It places the *entire*
original source file in the ORIGINAL block and the entire updated file
in the UPDATED block. This is strictly worse than just using the
`whole` edit format, as GPT is sending two full copies of the file.
### Hallucinated function calls
When GPT-3.5 uses the functions API
it is prone to ignoring the JSON Schema that specifies valid functions.
It often returns a completely novel and semantically
invalid `function_call` fragment with `"name": "python"`.
The `arguments` attribute is supposed to be a set of key/value pairs
with the arguments to the function specified in the `name` field.
Instead, GPT-3.5 frequently just stuffs an entire python
file into that field.
```
"function_call": {
"name": "python",
"arguments": "def main():\n print(\"hello\")\n"
},
```
It seems like it might be getting confused by fine-tuning that was
done for the ChatGPT code interpreter plugin?
## Randomness
The benchmark attempts to be deterministic, always sending identical
requests for each exercise on repeated runs.
As part of this effort,
when sending test error output to GPT,
it removes the wall-clock timing information that
is normally included by the `unittest` module.
The benchmark harness also logs SHA hashes of
all the OpenAI API requests and replies.
This makes it possible to
detect randomness or nondeterminism
in the benchmarking process.
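A small sketch of that idea: hash a canonicalized form of each request and reply, then compare
the hashes across repeated runs (illustrative only; the benchmark's actual logging may differ):

```python
# Illustrative fingerprinting of API payloads to detect nondeterminism.
import hashlib
import json

def payload_hash(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha1(canonical).hexdigest()
```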
It turns out that the OpenAI chat APIs are not deterministic, even at
`temperature=0`. The same identical request will produce multiple
distinct responses, usually less than 5-10 variations. This suggests
that OpenAI may be load balancing their API across a number of
slightly different instances of the model?
For certain exercises, some of these variable responses pass the unit tests while
other variants do not. Results for exercises like this, which are
"on the bubble",
are therefore a bit random, depending on which variant OpenAI returns.
Given that, it would be ideal to run all 133 exercises many times for each
model/edit-format combination and report an average performance.
This would average away the effect of the API variance.
It would also significantly increase the cost of this sort of benchmarking.
So I didn't do that.
Benchmarking against 133 exercises already provides some robustness, since
we are measuring the performance across many exercises.
But to get a sense of how much the API variance impacts the benchmark outcomes,
I ran all 133 exercises 10 times each
against `gpt-3.5-turbo-0613` with the `whole` edit format.
You'll see one set of error bars in the graph, which show
the range of results from those 10 runs.
The OpenAI API randomness doesn't seem to
cause a large variance in the overall benchmark results.
## Conclusions
Based on these benchmark results, aider will continue to use
the `whole` edit format for GPT-3.5, and `diff` for GPT-4.
GPT-4 gets comparable results with the `whole` and `diff` edit formats,
but using `whole` significantly increases costs and latency compared to `diff`.
The latency of streaming back the entire updated copy of each edited file
is a real challenge with the `whole` format.
The GPT-3.5 models are quite responsive, and can
stream back entire files at reasonable speed.
Aider displays a progress bar and
live diffs of the files as they stream in,
which helps pass the time.
The GPT-4 models are much slower, and waiting for even small files
to be completely "retyped" on each request is probably unacceptable.

1
website/docs/browser.md Symbolic link
View file

@ -0,0 +1 @@
../_posts/2024-05-02-browser.md

53
website/docs/commands.md Normal file
View file

@ -0,0 +1,53 @@
---
nav_order: 50
---
# In-chat commands
## In-chat commands
Aider supports commands from within the chat, which all start with `/`.
Here are some of the most useful in-chat commands:
- `/help`: Show help about all commands
- `/exit`: Exit the application
## Context management
- `/add <file>`: Add matching files to the chat session using glob patterns
- `/drop <file>`: Remove matching files from the chat session
- `/web <url>`: Add the contents of a webpage to the chat
- `/clear`: Clear the chat history
- `/ls`: List all known files and those included in the chat session
- `/tokens`: Report on the number of tokens used by the current chat context
## Git
- `/undo`: Undo the last git commit if it was done by aider
- `/diff`: Display the diff of the last aider commit
- `/commit <message>`: Commit edits to the repo made outside the chat (commit message optional)
- `/git <command>`: Run a git command
## Other
- `/run <command>`: Run a shell command and optionally add the output to the chat, or use `!<command>`.
- `/test <command>`: Like run, but automatically adds the output to the chat on non-zero exit codes.
- `/voice`: Speak to aider to [request code changes with your voice](https://aider.chat/docs/voice.html).
# Keybindings
The interactive prompt is built with [prompt-toolkit](https://github.com/prompt-toolkit/python-prompt-toolkit), which provides a lot of Emacs and Vi-style keybindings. Some Emacs bindings you may find useful are:
- `Ctrl-A` : Move cursor to the start of the line.
- `Ctrl-B` : Move cursor back one character.
- `Ctrl-D` : Delete the character under the cursor.
- `Ctrl-E` : Move cursor to the end of the line.
- `Ctrl-F` : Move cursor forward one character.
- `Ctrl-K` : Delete from the cursor to the end of the line.
- `Ctrl-L` : Clear the screen.
- `Ctrl-N` : Move down to the next history entry.
- `Ctrl-P` : Move up to the previous history entry.
- `Ctrl-R` : Reverse search in command history.
Note: aider currently exits vi normal mode after a single command (maybe something to do with the esc keybinding?).
Feel free to investigate and make a PR if you would like to see it fully supported.

View file

@ -0,0 +1,85 @@
---
nav_order: 100
---
# Specifying coding conventions
Sometimes you want GPT to be aware of certain coding guidelines,
like whether to provide type hints, which libraries or packages
to prefer, etc.
The easiest way to do that with aider is to simply create
a small markdown file and include it in the chat.
For example, say we want our python code to:
```
- Prefer httpx over requests for making http requests.
- Use types everywhere possible.
```
We would simply create a file like `CONVENTIONS.md` with those lines
and then we can add it to the aider chat, along with the file(s)
that we want to edit:
<div class="chat-transcript" markdown="1">
> $ aider CONVENTIONS.md useragent.py
>
> Aider v0.24.2-dev
> Added CONVENTIONS.md to the chat.
> Added useragent.py to the chat.
#### write a function that uses httpbin to check the user agent on the request. return the ua.
To fulfill your request, I will:
1. Add an import for `httpx`.
2. Write a function named `check_user_agent` that makes a GET request to `httpbin.org/user-agent` and returns the user agent string.
Here is the diff for the `useragent.py` file:
```diff
--- useragent.py
+++ useragent.py
@@ ... @@
+import httpx
+
+def check_user_agent() -> str:
+ response = httpx.get('https://httpbin.org/user-agent')
+ return response.json()['user-agent']
```
This function uses `httpx` to make a GET request to `httpbin.org/user-agent`, which returns a JSON response containing the user agent of the HTTP client. The function then extracts the user agent string from the JSON response and returns it.
> Applied edit to useragent.py
</div>
GPT correctly used `httpx` and provided type hints.
See below for the transcript of this same
coding task conducted *without*
using `CONVENTIONS.md`.
In this case, GPT used `requests` and skipped types,
which is perhaps more typical in small python scripts.
<div class="chat-transcript" markdown="1">
> $ aider useragent.py
>
> Aider v0.24.2-dev
> Added useragent.py to the chat.
#### write a function that uses httpbin to check the user agent on the request. return the ua.
```diff
--- /dev/null
+++ useragent.py
@@ ... @@
+import requests
+
+def check_user_agent():
+ response = requests.get('https://httpbin.org/user-agent')
+ user_agent = response.json().get('user-agent')
+ return user_agent
```
> Applied edit to useragent.py
</div>

246
website/docs/ctags.md Normal file
View file

@ -0,0 +1,246 @@
---
title: Improving GPT-4's codebase understanding with ctags
excerpt: Using ctags to build a "repository map" to increase GPT-4's ability to understand a large code base.
highlight_image: /assets/robot-flowchart.png
nav_exclude: true
---
# Improving GPT-4's codebase understanding with ctags
![robot flowchart](/assets/robot-flowchart.png)
## Updated
Aider no longer uses ctags to build a repo map.
Please see the newer article about
[using tree-sitter to build a better repo map](https://aider.chat/docs/repomap.html).
-------
GPT-4 is extremely useful for "self-contained" coding tasks,
like generating brand new code or modifying a pure function
that has no dependencies.
But it's difficult to use GPT-4 to modify or extend
a large, complex pre-existing codebase.
To modify such code, GPT needs to understand the dependencies and APIs
which interconnect its subsystems.
Somehow we need to provide this "code context" to GPT
when we ask it to accomplish a coding task. Specifically, we need to:
- Help GPT understand the overall codebase, so that it
can decipher the meaning of code with complex dependencies and generate
new code that respects and utilizes existing abstractions.
- Convey all of this "code context" to GPT in an
efficient manner that fits within the 8k-token context window.
To address these issues, `aider` now
sends GPT a **concise map of your whole git repository**
that includes
all declared variables and functions with call signatures.
This *repo map* is built automatically using `ctags`, which
extracts symbol definitions from source files. Historically,
ctags were generated and indexed by IDEs and editors to
help humans search and navigate large codebases.
Instead, we're going to use ctags to help GPT better comprehend, navigate
and edit code in larger repos.
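As a rough illustration, universal-ctags can emit symbol definitions (including signatures) as
JSON, which a tool can then condense into a map. The exact flags aider used are not shown here,
so treat this invocation as an assumption:

```python
# Illustrative extraction of symbol definitions with universal-ctags.
import json
import subprocess

def ctags_symbols(path: str) -> list[dict]:
    out = subprocess.run(
        ["ctags", "--output-format=json", "--fields=+S", "-f", "-", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [json.loads(line) for line in out.splitlines() if line.strip()]
```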
To get a sense of how effective this can be, this
[chat transcript](https://aider.chat/examples/add-test.html)
shows GPT-4 creating a black box test case, **without being given
access to the source code of the function being tested or any of the
other code in the repo.**
Using only the meta-data in the repo map, GPT is able to figure out how to
call the method to be tested, as well as how to instantiate multiple
class objects that are required to prepare for the test.
To code with GPT-4 using the techniques discussed here:
- Install [aider](https://aider.chat/docs/install.html).
- Install [universal ctags](https://aider.chat/docs/install.html#install-universal-ctags-optional).
- Run `aider` inside your repo, and it should say "Repo-map: universal-ctags using 1024 tokens".
## The problem: code context
GPT-4 is great at "self contained" coding tasks, like writing or
modifying a pure function with no external dependencies.
GPT can easily handle requests like "write a
Fibonacci function" or "rewrite the loop using list
comprehensions", because they require no context beyond the code
being discussed.
Most real code is not pure and self-contained; it is intertwined with
and depends on code from many different files in a repo.
If you ask GPT to "switch all the print statements in class Foo to
use the BarLog logging system", it needs to see the code in the Foo class
with the prints, and it also needs to understand the project's BarLog
subsystem.
A simple solution is to **send the entire codebase** to GPT along with
each change request. Now GPT has all the context! But this won't work
for even moderately
sized repos, because they won't fit into the 8k-token context window.
A better approach is to be selective,
and **hand pick which files to send**.
For the example above, you could send the file that
contains the Foo class
and the file that contains the BarLog logging subsystem.
This works pretty well, and is supported by `aider` -- you
can manually specify which files to "add to the chat" you are having with GPT.
But it's not ideal to have to manually identify the right
set of files to add to the chat.
And sending whole files is a bulky way to send code context,
wasting the precious 8k context window.
GPT doesn't need to see the entire implementation of BarLog,
it just needs to understand it well enough to use it.
You may quickly run out of context window if you
send many files worth of code just to convey context.
## Using a repo map to provide context
The latest version of `aider` sends a **repo map** to GPT along with
each change request. The map contains a list of all the files in the
repo, along with the symbols which are defined in each file. Callables
like functions and methods also include their signatures.
Here's a
sample of the map of the aider repo, just showing the maps of
[main.py](https://github.com/paul-gauthier/aider/blob/main/aider/main.py)
and
[io.py](https://github.com/paul-gauthier/aider/blob/main/aider/io.py)
:
```
aider/
...
main.py:
function
main (args=None, input=None, output=None)
variable
status
...
io.py:
class
FileContentCompleter
InputOutput
FileContentCompleter
member
__init__ (self, fnames, commands)
get_completions (self, document, complete_event)
InputOutput
member
__init__ (self, pretty, yes, input_history_file=None, chat_history_file=None, input=None, output=None)
ai_output (self, content)
append_chat_history (self, text, linebreak=False, blockquote=False)
confirm_ask (self, question, default="y")
get_input (self, fnames, commands)
prompt_ask (self, question, default=None)
tool (self, *messages, log_only=False)
tool_error (self, message)
...
```
Mapping out the repo like this provides some benefits:
- GPT can see variables, classes, methods and function signatures from everywhere in the repo. This alone may give it enough context to solve many tasks. For example, it can probably figure out how to use the API exported from a module just based on the details shown in the map.
- If it needs to see more code, GPT can use the map to figure out by itself which files it needs to look at. GPT will then ask to see these specific files, and `aider` will automatically add them to the chat context (with user approval).
Of course, for large repositories even just the map might be too large
for the context window. However, this mapping approach opens up the
ability to collaborate with GPT-4 on larger codebases than previous
methods. It also reduces the need to manually curate which files to
add to the chat context, empowering GPT to autonomously identify
relevant files for the task at hand.
## Using ctags to make the map
Under the hood, `aider` uses
[universal ctags](https://github.com/universal-ctags/ctags)
to build the
map. Universal ctags can scan source code written in many
languages, and extract data about all the symbols defined in each
file.
Historically, ctags were generated and indexed by IDEs or code editors
to make it easier for a human to search and navigate a
codebase, find the implementation of functions, etc.
Instead, we're going to use ctags to help GPT navigate and understand the codebase.
Here is the type of output you get when you run ctags on source code. Specifically,
this is the
`ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
```json
{
"_type": "tag",
"name": "main",
"path": "aider/main.py",
"pattern": "/^def main(args=None, input=None, output=None):$/",
"kind": "function",
"signature": "(args=None, input=None, output=None)"
}
{
"_type": "tag",
"name": "status",
"path": "aider/main.py",
"pattern": "/^ status = main()$/",
"kind": "variable"
}
```
The repo map is built using this type of `ctags` data,
but formatted into the space
efficient hierarchical tree format shown earlier.
This is a format that GPT can easily understand
and which conveys the map data using a
minimal number of tokens.
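The exact formatting logic is part of aider, but as a rough sketch (assuming the JSON-lines `ctags` output shown above; the helper below and its behavior are illustrative, not aider's real code), the transformation could look something like this:

```python
import json
import sys
from collections import defaultdict

def build_repo_map(ctags_json_lines):
    # Group each tag by file path, then by kind (class, function, variable, ...)
    tree = defaultdict(lambda: defaultdict(list))
    for line in ctags_json_lines:
        if not line.strip():
            continue
        tag = json.loads(line)
        if tag.get("_type") != "tag":
            continue
        name = tag["name"]
        if "signature" in tag:
            name += " " + tag["signature"]
        tree[tag["path"]][tag["kind"]].append(name)

    # Render the grouped tags as an indented, token-efficient outline
    out = []
    for path, kinds in sorted(tree.items()):
        out.append(f"{path}:")
        for kind, names in sorted(kinds.items()):
            out.append(f"   {kind}")
            out.extend(f"      {n}" for n in sorted(names))
    return "\n".join(out)

if __name__ == "__main__":
    # e.g. ctags --fields=+S --output-format=json aider/*.py | python build_map.py
    print(build_repo_map(sys.stdin))
```

Piping `ctags` JSON output through a script like this produces an outline in the same spirit as the map shown earlier.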
## Example chat transcript
This
[chat transcript](https://aider.chat/examples/add-test.html)
shows GPT-4 creating a black box test case, **without being given
access to the source code of the function being tested or any of the
other code in the repo.** Instead, GPT is operating solely off
the repo map.
Using only the meta-data in the map, GPT is able to figure out how to call the method to be tested, as well as how to instantiate multiple class objects that are required to prepare for the test.
GPT makes one reasonable mistake writing the first version of the test, but is
able to quickly fix the issue after being shown the `pytest` error output.
## Future work
Just as "send the whole codebase to GPT with every request"
is not an efficient solution to this problem,
there are probably better approaches than
"send the whole repo map with every request".
Sending an appropriate subset of the repo map would help `aider` work
better with even larger repositories which have large maps.
Some possible approaches to reducing the amount of map data are:
- Distill the global map, to prioritize important symbols and discard "internal" or otherwise less globally relevant identifiers. Possibly enlist `gpt-3.5-turbo` to perform this distillation in a flexible and language agnostic way.
- Provide a mechanism for GPT to start with a distilled subset of the global map, and let it ask to see more detail about subtrees or keywords that it feels are relevant to the current coding task.
- Attempt to analyze the natural language coding task given by the user and predict which subset of the repo map is relevant, possibly by analyzing prior coding chats within the specific repo. Work on certain files or types of features may require somewhat predictable context from elsewhere in the repo. Vector and keyword search against the chat history, repo map or codebase may help here.
One key goal is to prefer solutions which are language agnostic or
which can be easily deployed against most popular code languages.
The `ctags` solution has this benefit, since it comes pre-built
with support for most popular languages.
I suspect that Language Server Protocol might be an even
better tool than `ctags` for this problem.
But it is more cumbersome to deploy for a broad
array of languages.
Users would need to stand up an LSP server for their
specific language(s) of interest.
## Try it out
To use this experimental repo map feature:
- Install [aider](https://aider.chat/docs/install.html).
- Install [universal ctags](https://aider.chat/docs/install.html#install-universal-ctags-optional).
- Run `aider` inside your repo, and it should say "Repo-map: universal-ctags using 1024 tokens".

35
website/docs/docker.md Normal file
View file

@@ -0,0 +1,35 @@
---
nav_order: 100
---
# Run aider with docker
You can run aider via docker without doing any local installation, like this:
```
docker pull paulgauthier/aider
docker run -it --volume $(pwd):/app paulgauthier/aider --openai-api-key $OPENAI_API_KEY [...other aider args...]
```
You should run the above commands from the root of your git repo,
since the `--volume` arg maps your current directory into the
docker container.
Given that, you need to be in the root of your git repo for aider to be able to
see the repo and all its files.
You should be sure that your
git repo config contains your user name and email, since the
docker container won't have your global git config.
Run these commands while in your git repo, before
you do the `docker run` command:
```
git config user.email "you@example.com"
git config user.name "Your Name"
```
## Limitations
- When you use the in-chat `/run` command, it will be running shell commands *inside the docker container*. So those commands won't be running in your local environment, which may make it tricky to `/run` tests, etc for your project.
- The `/voice` command won't work unless you can figure out how to give the docker container access to your host audio device. The container has libportaudio2 installed, so it should work if you can do that.

312
website/docs/faq.md Normal file
View file

@@ -0,0 +1,312 @@
---
nav_order: 60
---
# Frequently asked questions
{: .no_toc }
- TOC
{:toc}
## How does aider use git?
Aider works best with code that is part of a git repo.
Aider is tightly integrated with git, which makes it easy to:
- Use git to undo any aider changes that you don't like
- Go back in the git history to review the changes that aider made to your code
- Manage a series of aider's changes on a git branch
Aider specifically uses git in these ways:
- It asks to create a git repo if you launch it in a directory without one.
- Whenever aider edits a file, it commits those changes with a descriptive commit message. This makes it easy to undo or review aider's changes.
- Aider takes special care before editing files that already have uncommitted changes (dirty files). Aider will first commit any preexisting changes with a descriptive commit message. This keeps your edits separate from aider's edits, and makes sure you never lose your work if aider makes an inappropriate change.
Aider also allows you to use in-chat commands to `/diff` or `/undo` the last change.
To do more complex management of your git history, you can use raw `git` commands,
either by using `/git` within the chat, or with standard git tools outside of aider.
While it is not recommended, you can disable aider's use of git in a few ways:
- `--no-auto-commits` will stop aider from git committing each of its changes.
- `--no-dirty-commits` will stop aider from committing dirty files before applying its edits.
- `--no-git` will completely stop aider from using git on your files. You should ensure you are keeping sensible backups of the files you are working with.
## Can I run aider in Google Colab?
User [imabutahersiddik](https://github.com/imabutahersiddik)
has provided this
[Colab notebook](https://colab.research.google.com/drive/1J9XynhrCqekPL5PR6olHP6eE--rnnjS9?usp=sharing).
## How can I run aider locally from source code?
To run the project locally, follow these steps:
```
# Clone the repository:
git clone git@github.com:paul-gauthier/aider.git
# Navigate to the project directory:
cd aider
# Install the dependencies listed in the `requirements.txt` file:
pip install -r requirements.txt
# Run the local version of Aider:
python -m aider.main
```
## Can I script aider?
You can script aider via the command line or python.
## Command line
Aider takes a `--message` argument, where you can give it a natural language instruction.
It will do that one thing, apply the edits to the files and then exit.
So you could do:
```bash
aider --message "make a script that prints hello" hello.js
```
Or you can write simple shell scripts to apply the same instruction to many files:
```bash
for FILE in *.py ; do
aider --message "add descriptive docstrings to all the functions" $FILE
done
```
Use `aider --help` to see all the command line options, but these are useful for scripting:
```
--stream, --no-stream
Enable/disable streaming responses (default: True) [env var:
AIDER_STREAM]
--message COMMAND, --msg COMMAND, -m COMMAND
Specify a single message to send GPT, process reply then exit
(disables chat mode) [env var: AIDER_MESSAGE]
--message-file MESSAGE_FILE, -f MESSAGE_FILE
Specify a file containing the message to send GPT, process reply,
then exit (disables chat mode) [env var: AIDER_MESSAGE_FILE]
--yes Always say yes to every confirmation [env var: AIDER_YES]
--auto-commits, --no-auto-commits
Enable/disable auto commit of GPT changes (default: True) [env var:
AIDER_AUTO_COMMITS]
--dirty-commits, --no-dirty-commits
Enable/disable commits when repo is found dirty (default: True) [env
var: AIDER_DIRTY_COMMITS]
--dry-run, --no-dry-run
Perform a dry run without modifying files (default: False) [env var:
AIDER_DRY_RUN]
--commit Commit all pending changes with a suitable commit message, then exit
[env var: AIDER_COMMIT]
```
## Python
You can also script aider from python:
```python
from aider.coders import Coder
from aider.models import Model
# This is a list of files to add to the chat
fnames = ["greeting.py"]
model = Model("gpt-4-turbo", weak_model="gpt-3.5-turbo")
# Create a coder object
coder = Coder.create(main_model=model, fnames=fnames)
# This will execute one instruction on those files and then return
coder.run("make a script that prints hello world")
# Send another instruction
coder.run("make it say goodbye")
```
See the
[Coder.create() and Coder.__init__() methods](https://github.com/paul-gauthier/aider/blob/main/aider/coders/base_coder.py)
for all the supported arguments.
It can also be helpful to set the equivalent of `--yes` by doing this:
```python
from aider.io import InputOutput
io = InputOutput(yes=True)
# ...
coder = Coder.create(main_model=model, fnames=fnames, io=io)
```
## What code languages does aider support?
Aider supports pretty much all the popular coding languages.
This is partly because top LLMs are fluent in most mainstream languages,
and familiar with popular libraries, packages and frameworks.
In fact, coding with aider is sometimes the most magical
when you're working in a language that you
are less familiar with.
The LLM often knows the language better than you,
and can generate all the boilerplate to get to the heart of your
problem.
The LLM will often solve your problem in an elegant way
using a library or package that you weren't even aware of.
Aider uses tree-sitter to do code analysis and help
the LLM navigate larger code bases by producing
a [repository map](https://aider.chat/docs/repomap.html).
Aider can currently produce repository maps for most mainstream languages, listed below.
But aider should work quite well for other languages, even without repo map support.
- C
- C#
- C++
- Emacs Lisp
- Elixir
- Elm
- Go
- Java
- Javascript
- OCaml
- PHP
- Python
- QL
- Ruby
- Rust
- Typescript
## How to use pipx to avoid python package conflicts?
If you are using aider to work on a python project, sometimes your project will require
specific versions of python packages which conflict with the versions that aider
requires.
If this happens, the `pip install` command may return errors like these:
```
aider-chat 0.23.0 requires somepackage==X.Y.Z, but you have somepackage U.W.V which is incompatible.
```
You can avoid this problem by installing aider using `pipx`,
which will install it globally on your system
within its own python environment.
This way you can use aider to work on any python project,
even if that project has conflicting dependencies.
Install [pipx](https://pipx.pypa.io/stable/) then just do:
```
pipx install aider-chat
```
## Aider isn't editing my files?
Sometimes the LLM will reply with some code changes that don't get applied to your local files.
In these cases, aider might say something like "Failed to apply edit to *filename*".
This usually happens because the LLM is not specifying the edits
to make in the format that aider expects.
GPT-3.5 is especially prone to disobeying the system prompt instructions in this manner, but it also happens with stronger models.
Aider makes every effort to get the LLM
to conform, and works hard to deal with
replies that are "almost" correctly formatted.
If Aider detects an improperly formatted reply, it gives
the LLM feedback to try again.
Also, before each release new versions of aider are
[benchmarked](https://aider.chat/docs/benchmarks.html).
This helps prevent regressions in the code editing
performance of an LLM that could have been inadvertently
introduced.
But sometimes the LLM just won't cooperate.
In these cases, here are some things you might try:
- Use `/drop` to remove files from the chat session which aren't needed for the task at hand. This will reduce distractions and may help GPT produce properly formatted edits.
- Use `/clear` to remove the conversation history, again to help GPT focus.
- Try a different LLM.
## How can I add ALL the files to the chat?
People regularly ask about how to add **many or all of their repo's files** to the chat.
This is probably not a good idea and will likely do more harm than good.
The best approach is to think about which files need to be changed to accomplish
the task you are working on. Just add those files to the chat.
Usually when people want to add "all the files" it's because they think it
will give the LLM helpful context about the overall code base.
Aider will automatically give the LLM a bunch of additional context about
the rest of your git repo.
It does this by analyzing your entire codebase in light of the
current chat to build a compact
[repository map](https://aider.chat/2023/10/22/repomap.html).
Adding a bunch of files that are mostly irrelevant to the
task at hand will often distract or confuse the LLM.
The LLM will give worse coding results, and sometimes even fail to correctly edit files.
Adding extra files will also increase the token costs on your OpenAI invoice.
Again, it's usually best to just add the files to the chat that will need to be modified.
If you still wish to add lots of files to the chat, you can:
- Use a wildcard when you launch aider: `aider src/*.py`
- Use a wildcard with the in-chat `/add` command: `/add src/*.py`
- Give the `/add` command a directory name and it will recursively add every file under that dir: `/add src`
## Can I specify guidelines or conventions?
Sometimes you want the LLM to be aware of certain coding guidelines,
like whether to provide type hints, which libraries or packages
to prefer, etc.
Just put any extra instructions in a file
like `CONVENTIONS.md` and then add it to the chat.
For more details, see this documentation on
[using a conventions file with aider](https://aider.chat/docs/conventions.html).
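For example, a minimal `CONVENTIONS.md` might contain nothing more than a few bullet points (the exact contents are entirely up to you):

```
# CONVENTIONS.md

- Prefer httpx over requests for making HTTP requests.
- Always include type hints in function signatures.
```

Then launch aider with the conventions file plus your source files, e.g. `aider CONVENTIONS.md <files-to-edit>`, or add it during a chat with `/add CONVENTIONS.md`.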
## Can I change the system prompts that aider uses?
Aider is set up to support different system prompts and edit formats
in a modular way. If you look in the `aider/coders` subdirectory, you'll
see there's a base coder with base prompts, and then there are
a number of
different specific coder implementations.
If you're thinking about experimenting with system prompts
this document about
[benchmarking GPT-3.5 and GPT-4 on code editing](https://aider.chat/docs/benchmarks.html)
might be useful background.
While it's not well documented how to add new coder subsystems, you may be able
to modify an existing implementation or use it as a template to add another.
To get started, try looking at and modifying these files.
The wholefile coder is currently used by GPT-3.5 by default. You can manually select it with `--edit-format whole`.
- wholefile_coder.py
- wholefile_prompts.py
The editblock coder is currently used by GPT-4o by default. You can manually select it with `--edit-format diff`.
- editblock_coder.py
- editblock_prompts.py
The universal diff coder is currently used by GPT-4 Turbo by default. You can manually select it with `--edit-format udiff`.
- udiff_coder.py
- udiff_prompts.py
When experimenting with coder backends, it helps to run aider with `--verbose --no-pretty` so you can see
all the raw information being sent to/from the LLM in the conversation.
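For example, to compare how the different coder backends behave on the same request, you could run each edit format in turn and watch the raw traffic:

```bash
# Same task, three edit formats, with the raw LLM conversation shown
aider --edit-format whole --verbose --no-pretty
aider --edit-format diff --verbose --no-pretty
aider --edit-format udiff --verbose --no-pretty
```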
You can also refer to the
[instructions for installing a development version of aider](https://aider.chat/docs/install.html#install-development-versions-of-aider-optional).

186
website/docs/install.md Normal file
View file

@@ -0,0 +1,186 @@
---
title: Installation
nav_order: 20
---
# Installing aider
{: .no_toc }
- TOC
{:toc}
## Quick start
```
$ pip install aider-chat
# To work with GPT-4o
$ export OPENAI_API_KEY=your-key-goes-here
$ aider
# To work with Claude 3 Opus:
$ export ANTHROPIC_API_KEY=your-key-goes-here
$ aider --opus
```
Or read on for more detailed installation instructions.
## Install git
Make sure you have git installed.
Here are
[instructions for installing git in various environments](https://github.com/git-guides/install-git).
## Get your API key
To work with OpenAI's GPT 3.5 or GPT-4 models you need a paid
[OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key).
Note that this is different than being a "ChatGPT Plus" subscriber.
To work with Anthropic's models like Claude 3 Opus you need a paid
[Anthropic API key](https://docs.anthropic.com/claude/reference/getting-started-with-the-api).
## Windows install
```
# Install aider
py -m pip install aider-chat
# To work with GPT-4o:
aider --openai-api-key sk-xxx...
# To work with Claude 3 Opus:
aider --anthropic-api-key sk-xxx... --opus
```
## Mac/Linux install
```
# Install aider
python -m pip install aider-chat
# To work with GPT-4o:
aider --openai-api-key sk-xxx...
# To work with Claude 3 Opus:
aider --anthropic-api-key sk-xxx... --opus
```
## Working with other LLMs
Aider works well with GPT-4o, Claude 3 Opus, GPT-3.5 and supports [connecting to almost any LLM](https://aider.chat/docs/llms.html).
## You are done!
See the [usage instructions](https://aider.chat/docs/usage.html) to start coding with aider.
---
# Optional steps
The steps below are completely optional.
- [Store your API key](#store-your-api-key-optional)
- [Enable Playwright](#enable-playwright-optional) to enhance the `/web <url>` command.
- [Enable voice coding](#enable-voice-coding-optional)
- [Add aider to your editor](#add-aider-to-your-editor-optional)
- [Install development versions of aider](#install-development-versions-of-aider-optional)
## Store your api key (optional)
{: .no_toc }
You can place your api key in an environment variable:
* `export OPENAI_API_KEY=sk-...` on Linux or Mac
* `setx OPENAI_API_KEY sk-...` in Windows PowerShell
Or you can create a `.aider.conf.yml` file in your home directory.
Put a line in it like this to specify your api key:
```
openai-api-key: sk-...
```
## Enable Playwright (optional)
{: .no_toc }
Aider supports adding web pages to the chat with the `/web <url>` command.
When you add a url to the chat, aider fetches the page and scrapes its
content.
By default, aider uses the `httpx` library to scrape web pages, but this only
works on a subset of web pages.
Some sites explicitly block requests from tools like httpx.
Others rely heavily on javascript to render the page content,
which isn't possible using only httpx.
Aider works best with all web pages if you install
Playwright's chromium browser and its dependencies:
```
playwright install --with-deps chromium
```
See the
[Playwright for Python documentation](https://playwright.dev/python/docs/browsers#install-system-dependencies)
for additional information.
## Enable voice coding (optional)
{: .no_toc }
Aider supports [coding with your voice](https://aider.chat/docs/voice.html)
using the in-chat `/voice` command.
Aider uses the [PortAudio](http://www.portaudio.com) library to
capture audio.
Installing PortAudio is completely optional, but can usually be accomplished like this:
- For Windows, there is no need to install PortAudio.
- For Mac, do `brew install portaudio`
- For Linux, do `sudo apt-get install libportaudio2`
## Add aider to your editor (optional)
{: .no_toc }
Other projects have integrated aider into some IDE/editors.
It's not clear if they are tracking the latest
versions of aider,
so it may be best to just run the latest
aider in a terminal alongside your editor.
### NeoVim
{: .no_toc }
[joshuavial](https://github.com/joshuavial) provided a NeoVim plugin for aider:
[https://github.com/joshuavial/aider.nvim](https://github.com/joshuavial/aider.nvim)
### VS Code
{: .no_toc }
joshuavial also confirmed that aider works inside a VS Code terminal window.
Aider detects if it is running inside VSCode and turns off pretty/color output,
since the VSCode terminal doesn't seem to support it well.
[MattFlower](https://github.com/MattFlower) provided a VSCode plugin for aider:
[https://marketplace.visualstudio.com/items?itemName=MattFlower.aider](https://marketplace.visualstudio.com/items?itemName=MattFlower.aider)
### Other editors
{: .no_toc }
If you are interested in creating an aider plugin for your favorite editor,
please let me know by opening a
[GitHub issue](https://github.com/paul-gauthier/aider/issues).
## Install development versions of aider (optional)
{: .no_toc }
If you want to install the very latest development version of aider:
* GitHub hosts the latest version, which is under active development:
* `python -m pip install git+https://github.com/paul-gauthier/aider.git`
* If you've git cloned the aider repository already, you can install "live" from your local copy. This is mostly useful if you are developing aider and want your current modifications to take effect immediately.
* `python -m pip install -e .`

View file

@@ -0,0 +1,224 @@
---
highlight_image: /assets/leaderboard.jpg
nav_order: 900
---
# Aider LLM Leaderboards
Aider works best with LLMs which are good at *editing* code, not just good at writing
code.
To evaluate an LLM's editing skill, aider uses a pair of benchmarks that
assess a model's ability to consistently follow the system prompt
to successfully edit code.
The leaderboards below report the results from a number of popular LLMs.
While [aider can connect to almost any LLM](/docs/llms.html),
it works best with models that score well on the benchmarks.
## GPT-4o takes the #1 & #2 spots
GPT-4o tops the aider LLM code editing leaderboard at 72.9%, versus 68.4% for Opus. GPT-4o takes second on aider's refactoring leaderboard with 62.9%, versus Opus at 72.3%.
GPT-4o did much better than the 4-turbo models, and seems *much* less lazy.
GPT-4o is also able to use aider's established "diff" edit format that uses
`SEARCH/REPLACE` blocks.
This diff format is used by all the other capable models, including Opus and
the original GPT-4 models.
The GPT-4 Turbo models have all required the "udiff" edit format, due to their
tendency toward lazy coding.
## Code editing leaderboard
[Aider's code editing benchmark](/docs/benchmarks.html#the-benchmark) asks the LLM to edit python source files to complete 133 small coding exercises. This benchmark measures the LLM's coding ability, but also whether it can consistently emit code edits in the format specified in the system prompt.
<table style="width: 100%; max-width: 800px; margin: auto; border-collapse: collapse; box-shadow: 0 2px 4px rgba(0,0,0,0.1); font-size: 14px;">
<thead style="background-color: #f2f2f2;">
<tr>
<th style="padding: 8px; text-align: left;">Model</th>
<th style="padding: 8px; text-align: center;">Percent completed correctly</th>
<th style="padding: 8px; text-align: center;">Percent using correct edit format</th>
<th style="padding: 8px; text-align: left;">Command</th>
<th style="padding: 8px; text-align: center;">Edit format</th>
</tr>
</thead>
<tbody>
{% assign edit_sorted = site.data.edit_leaderboard | sort: 'pass_rate_2' | reverse %}
{% for row in edit_sorted %}
<tr style="border-bottom: 1px solid #ddd;">
<td style="padding: 8px;">{{ row.model }}</td>
<td style="padding: 8px; text-align: center;">{{ row.pass_rate_2 }}%</td>
<td style="padding: 8px; text-align: center;">{{ row.percent_cases_well_formed }}%</td>
<td style="padding: 8px;"><code>{{ row.command }}</code></td>
<td style="padding: 8px; text-align: center;">{{ row.edit_format }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<canvas id="editChart" width="800" height="450" style="margin-top: 20px"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
document.addEventListener('DOMContentLoaded', function () {
var ctx = document.getElementById('editChart').getContext('2d');
var leaderboardData = {
labels: [],
datasets: [{
label: 'Percent completed correctly',
data: [],
backgroundColor: 'rgba(54, 162, 235, 0.2)',
borderColor: 'rgba(54, 162, 235, 1)',
borderWidth: 1
},
{
label: 'Percent using correct edit format',
data: [],
backgroundColor: 'rgba(255, 99, 132, 0.2)',
borderColor: 'rgba(255, 99, 132, 1)',
borderWidth: 1
}]
};
{% for row in edit_sorted %}
leaderboardData.labels.push('{{ row.model }}');
leaderboardData.datasets[0].data.push({{ row.pass_rate_2 }});
leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
{% endfor %}
var leaderboardChart = new Chart(ctx, {
type: 'bar',
data: leaderboardData,
options: {
scales: {
yAxes: [{
scaleLabel: {
display: true,
},
ticks: {
beginAtZero: true
}
}]
}
}
});
});
</script>
## Code refactoring leaderboard
[Aider's refactoring benchmark](https://github.com/paul-gauthier/refactor-benchmark) asks the LLM to refactor 89 large methods from large python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure [GPT-4 Turbo's "lazy coding" habit](/2023/12/21/unified-diffs.html).
The refactoring benchmark requires a large context window to
work with large source files.
Therefore, results are available for fewer models.
<table style="width: 100%; max-width: 800px; margin: auto; border-collapse: collapse; box-shadow: 0 2px 4px rgba(0,0,0,0.1); font-size: 14px;">
<thead style="background-color: #f2f2f2;">
<tr>
<th style="padding: 8px; text-align: left;">Model</th>
<th style="padding: 8px; text-align: center;">Percent completed correctly</th>
<th style="padding: 8px; text-align: center;">Percent using correct edit format</th>
<th style="padding: 8px; text-align: left;">Command</th>
<th style="padding: 8px; text-align: center;">Edit format</th>
</tr>
</thead>
<tbody>
{% assign refac_sorted = site.data.refactor_leaderboard | sort: 'pass_rate_1' | reverse %}
{% for row in refac_sorted %}
<tr style="border-bottom: 1px solid #ddd;">
<td style="padding: 8px;">{{ row.model }}</td>
<td style="padding: 8px; text-align: center;">{{ row.pass_rate_1 }}%</td>
<td style="padding: 8px; text-align: center;">{{ row.percent_cases_well_formed }}%</td>
<td style="padding: 8px;"><code>{{ row.command }}</code></td>
<td style="padding: 8px; text-align: center;">{{ row.edit_format }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<canvas id="refacChart" width="800" height="450" style="margin-top: 20px"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
document.addEventListener('DOMContentLoaded', function () {
var ctx = document.getElementById('refacChart').getContext('2d');
var leaderboardData = {
labels: [],
datasets: [{
label: 'Percent completed correctly',
data: [],
backgroundColor: 'rgba(54, 162, 235, 0.2)',
borderColor: 'rgba(54, 162, 235, 1)',
borderWidth: 1
},
{
label: 'Percent using correct edit format',
data: [],
backgroundColor: 'rgba(255, 99, 132, 0.2)',
borderColor: 'rgba(255, 99, 132, 1)',
borderWidth: 1
}]
};
{% for row in refac_sorted %}
leaderboardData.labels.push('{{ row.model }}');
leaderboardData.datasets[0].data.push({{ row.pass_rate_1 }});
leaderboardData.datasets[1].data.push({{ row.percent_cases_well_formed }});
{% endfor %}
var leaderboardChart = new Chart(ctx, {
type: 'bar',
data: leaderboardData,
options: {
scales: {
yAxes: [{
scaleLabel: {
display: true,
},
ticks: {
beginAtZero: true
}
}]
}
}
});
});
</script>
## LLM code editing skill by model release date
[![connecting to many LLMs](/assets/models-over-time.svg)](https://aider.chat/assets/models-over-time.svg)
## Notes on benchmarking results
The key benchmarking results are:
- **Percent completed correctly** - Measures what percentage of the coding tasks that the LLM completed successfully. To complete a task, the LLM must solve the programming assignment *and* edit the code to implement that solution.
- **Percent using correct edit format** - Measures the percent of coding tasks where the LLM complied with the edit format specified in the system prompt. If the LLM makes edit mistakes, aider will give it feedback and ask for a fixed copy of the edit. The best models can reliably conform to the edit format, without making errors.
## Notes on the edit format
Aider uses different "edit formats" to collect code edits from different LLMs.
The "whole" format is the easiest for an LLM to use, but it uses a lot of tokens
and may limit how large a file can be edited.
Models which can use one of the diff formats are much more efficient,
using far fewer tokens.
Models that use a diff-like format are able to
edit larger files with less cost and without hitting token limits.
Aider is configured to use the best edit format for the popular OpenAI and Anthropic models
and the [other models recommended on the LLM page](/docs/llms.html).
For lesser known models aider will default to using the "whole" editing format
since it is the easiest format for an LLM to use.
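As a rough illustration (the exact markers are defined by aider's prompts, so this is not a verbatim example), a "whole" reply re-emits the entire updated file, while a SEARCH/REPLACE style "diff" reply only names the file and the region to change:

```
greeting.py
<<<<<<< SEARCH
print("hello")
=======
print("hello world")
>>>>>>> REPLACE
```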
## Contributing benchmark results
Contributions of benchmark results are welcome!
See the
[benchmark README](https://github.com/paul-gauthier/aider/blob/main/benchmark/README.md)
for information on running aider's code editing benchmarks.
Submit results by opening a PR with edits to the
[benchmark results data files](https://github.com/paul-gauthier/aider/blob/main/_data/).

460
website/docs/llms.md Normal file
View file

@@ -0,0 +1,460 @@
---
title: Connecting to LLMs
nav_order: 70
---
# Aider can connect to most LLMs
[![connecting to many LLMs](/assets/llms.jpg)](https://aider.chat/assets/llms.jpg)
## Best models
**Aider works best with [GPT-4o](#openai) and [Claude 3 Opus](#anthropic),**
as they are the very best models for editing code.
## Free models
Aider works with a number of **free** API providers:
- Google's [Gemini 1.5 Pro](#gemini) is the most capable free model to use with aider, with
code editing capabilities similar to GPT-3.5.
- You can use [Llama 3 70B on Groq](#groq) which is comparable to GPT-3.5 in code editing performance.
- The [Deepseek Chat v2](#deepseek) model works well with aider, better than GPT-3.5. Deepseek currently offers 5M free tokens of API usage.
- Cohere also offers free API access to their [Command-R+ model](#cohere), which works with aider as a *very basic* coding assistant.
## Local models
Aider can also work with local models, for example using [Ollama](#ollama).
It can also access
local models that provide an
[Open AI compatible API](#openai-compatible-apis).
## Use a capable model
Check
[Aider's LLM leaderboards](https://aider.chat/docs/leaderboards/)
to see which models work best with aider.
Be aware that aider may not work well with less capable models.
If you see the model returning code, but aider isn't able to edit your files
and commit the changes...
this is usually because the model isn't capable of properly
returning "code edits".
Models weaker than GPT 3.5 may have problems working well with aider.
## Configuring models
- [OpenAI](#openai)
- [Anthropic](#anthropic)
- [Gemini](#gemini)
- [Groq & Llama3](#groq)
- [Cohere](#cohere)
- [Azure](#azure)
- [OpenRouter](#openrouter)
- [Ollama](#ollama)
- [Deepseek](#deepseek)
- [OpenAI compatible APIs](#openai-compatible-apis)
- [Other LLMs](#other-llms)
- [Model warnings](#model-warnings)
- [Editing format](#editing-format)
- [Using a .env file](#using-a-env-file)
Aider uses the LiteLLM package to connect to LLM providers.
The [LiteLLM provider docs](https://docs.litellm.ai/docs/providers)
contain more detail on all the supported providers,
their models and any required environment variables.
## OpenAI
To work with OpenAI's models, you need to provide your
[OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key)
either in the `OPENAI_API_KEY` environment variable or
via the `--openai-api-key` command line switch.
Aider has some built in shortcuts for the most popular OpenAI models and
has been tested and benchmarked to work well with them:
```
pip install aider-chat
export OPENAI_API_KEY=<key> # Mac/Linux
setx OPENAI_API_KEY <key> # Windows
# GPT-4o is the best model, used by default
aider
# GPT-4 Turbo (1106)
aider --4-turbo
# GPT-3.5 Turbo
aider --35-turbo
# List models available from OpenAI
aider --models openai/
```
You can use `aider --model <model-name>` to use any other OpenAI model.
For example, if you want to use a specific version of GPT-4 Turbo
you could do `aider --model gpt-4-0125-preview`.
## Anthropic
To work with Anthropic's models, you need to provide your
[Anthropic API key](https://docs.anthropic.com/claude/reference/getting-started-with-the-api)
either in the `ANTHROPIC_API_KEY` environment variable or
via the `--anthropic-api-key` command line switch.
Aider has some built in shortcuts for the most popular Anthropic models and
has been tested and benchmarked to work well with them:
```
pip install aider-chat
export ANTHROPIC_API_KEY=<key> # Mac/Linux
setx ANTHROPIC_API_KEY <key> # Windows
# Claude 3 Opus
aider --opus
# Claude 3 Sonnet
aider --sonnet
# List models available from Anthropic
aider --models anthropic/
```
You can use `aider --model <model-name>` to use any other Anthropic model.
For example, if you want to use a specific version of Opus
you could do `aider --model claude-3-opus-20240229`.
## Gemini
Google currently offers
[*free* API access to the Gemini 1.5 Pro model](https://ai.google.dev/pricing).
This is the most capable free model to use with aider,
with code editing capability that's comparable to GPT-3.5.
You'll need a [Gemini API key](https://aistudio.google.com/app/u/2/apikey).
```
pip install aider-chat
export GEMINI_API_KEY=<key> # Mac/Linux
setx GEMINI_API_KEY <key> # Windows
aider --model gemini/gemini-1.5-pro-latest
# List models available from Gemini
aider --models gemini/
```
## GROQ
Groq currently offers *free* API access to the models they host.
The Llama 3 70B model works
well with aider and is comparable to GPT-3.5 in code editing performance.
You'll need a [Groq API key](https://console.groq.com/keys).
To use **Llama3 70B**:
```
pip install aider-chat
export GROQ_API_KEY=<key> # Mac/Linux
setx GROQ_API_KEY <key> # Windows
aider --model groq/llama3-70b-8192
# List models available from Groq
aider --models groq/
```
## Cohere
Cohere offers *free* API access to their models.
Their Command-R+ model works well with aider
as a *very basic* coding assistant.
You'll need a [Cohere API key](https://dashboard.cohere.com/welcome/login).
To use **Command-R+**:
```
pip install aider-chat
export COHERE_API_KEY=<key> # Mac/Linux
setx COHERE_API_KEY <key> # Windows
aider --model command-r-plus
# List models available from Cohere
aider --models cohere_chat/
```
## Azure
Aider can connect to the OpenAI models on Azure.
```
pip install aider-chat
# Mac/Linux:
export AZURE_API_KEY=<key>
export AZURE_API_VERSION=2023-05-15
export AZURE_API_BASE=https://myendpt.openai.azure.com
# Windows:
setx AZURE_API_KEY <key>
setx AZURE_API_VERSION 2023-05-15
setx AZURE_API_BASE https://myendpt.openai.azure.com
aider --model azure/<your_deployment_name>
# List models available from Azure
aider --models azure/
```
## OpenRouter
Aider can connect to [models provided by OpenRouter](https://openrouter.ai/models?o=top-weekly):
You'll need an [OpenRouter API key](https://openrouter.ai/keys).
```
pip install aider-chat
export OPENROUTER_API_KEY=<key> # Mac/Linux
setx OPENROUTER_API_KEY <key> # Windows
# Or any other open router model
aider --model openrouter/<provider>/<model>
# List models available from OpenRouter
aider --models openrouter/
```
In particular, Llama3 70B works well with aider, at low cost:
```
pip install aider-chat
export OPENROUTER_API_KEY=<key> # Mac/Linux
setx OPENROUTER_API_KEY <key> # Windows
aider --model openrouter/meta-llama/llama-3-70b-instruct
```
## Ollama
Aider can connect to local Ollama models.
```
# Pull the model
ollama pull <model>
# Start your ollama server
ollama serve
# In another terminal window...
pip install aider-chat
export OLLAMA_API_BASE=http://127.0.0.1:11434 # Mac/Linux
setx OLLAMA_API_BASE http://127.0.0.1:11434 # Windows
aider --model ollama/<model>
```
In particular, `llama3:70b` works very well with aider:
```
ollama pull llama3:70b
ollama serve
# In another terminal window...
export OLLAMA_API_BASE=http://127.0.0.1:11434 # Mac/Linux
setx OLLAMA_API_BASE http://127.0.0.1:11434 # Windows
aider --model ollama/llama3:70b
```
Also see the [model warnings](#model-warnings)
section for information on warnings which will occur
when working with models that aider is not familiar with.
## Deepseek
Aider can connect to the Deepseek.com API.
Deepseek appears to grant 5M tokens of free API usage to new accounts.
```
pip install aider-chat
export DEEPSEEK_API_KEY=<key> # Mac/Linux
setx DEEPSEEK_API_KEY <key> # Windows
# Use Deepseek Chat v2
aider --model deepseek/deepseek-chat
```
See the [model warnings](#model-warnings)
section for information on warnings which will occur
when working with models that aider is not familiar with.
## OpenAI compatible APIs
Aider can connect to any LLM which is accessible via an OpenAI compatible API endpoint.
```
pip install aider-chat
# Mac/Linux:
export OPENAI_API_BASE=<endpoint>
export OPENAI_API_KEY=<key>
# Windows:
setx OPENAI_API_BASE <endpoint>
setx OPENAI_API_KEY <key>
# Prefix the model name with openai/
aider --model openai/<model-name>
```
See the [model warnings](#model-warnings)
section for information on warnings which will occur
when working with models that aider is not familiar with.
## Other LLMs
Aider uses the [litellm](https://docs.litellm.ai/docs/providers) package
to connect to hundreds of other models.
You can use `aider --model <model-name>` to use any supported model.
To explore the list of supported models you can run `aider --models <model-name>`
with a partial model name.
If the supplied name is not an exact match for a known model, aider will
return a list of possible matching models.
For example:
```
$ aider --models turbo
Aider v0.29.3-dev
Models which match "turbo":
- gpt-4-turbo-preview (openai/gpt-4-turbo-preview)
- gpt-4-turbo (openai/gpt-4-turbo)
- gpt-4-turbo-2024-04-09 (openai/gpt-4-turbo-2024-04-09)
- gpt-3.5-turbo (openai/gpt-3.5-turbo)
- ...
```
See the [list of providers supported by litellm](https://docs.litellm.ai/docs/providers)
for more details.
## Model warnings
Aider supports connecting to almost any LLM,
but it may not work well with less capable models.
If you see the model returning code, but aider isn't able to edit your files
and commit the changes...
this is usually because the model isn't capable of properly
returning "code edits".
Models weaker than GPT 3.5 may have problems working well with aider.
Aider tries to sanity check that it is configured correctly
to work with the specified model:
- It checks to see that all required environment variables are set for the model. These variables are required to configure things like API keys, API base URLs, etc.
- It checks a metadata database to look up the context window size and token costs for the model.
Sometimes one or both of these checks will fail, so aider will issue
some of the following warnings.
#### Missing environment variables
```
Model azure/gpt-4-turbo: Missing these environment variables:
- AZURE_API_BASE
- AZURE_API_VERSION
- AZURE_API_KEY
```
You need to set the listed environment variables.
Otherwise you will get error messages when you start chatting with the model.
#### Unknown which environment variables are required
```
Model gpt-5: Unknown which environment variables are required.
```
Aider is unable to verify the environment because it doesn't know
which variables are required for the model.
If required variables are missing,
you may get errors when you attempt to chat with the model.
You can look in the
[litellm provider documentation](https://docs.litellm.ai/docs/providers)
to see if the required variables are listed there.
#### Unknown model, did you mean?
```
Model gpt-5: Unknown model, context window size and token costs unavailable.
Did you mean one of these?
- gpt-4
```
If you specify a model that aider has never heard of, you will get an
"unknown model" warning.
This means aider doesn't know the context window size and token costs
for that model.
Some minor functionality will be limited when using such models, but
it's not really a significant problem.
Aider will also try to suggest similarly named models,
in case you made a typo or mistake when specifying the model name.
## Editing format
Aider uses different "edit formats" to collect code edits from different LLMs.
The "whole" format is the easiest for an LLM to use, but it uses a lot of tokens
and may limit how large a file can be edited.
Models which can use one of the diff formats are much more efficient,
using far fewer tokens.
Models that use a diff-like format are able to
edit larger files with less cost and without hitting token limits.
Aider is configured to use the best edit format for the popular OpenAI and Anthropic models
and the [other models recommended on the LLM page](https://aider.chat/docs/llms.html).
For lesser known models aider will default to using the "whole" editing format
since it is the easiest format for an LLM to use.
If you would like to experiment with the more advanced formats, you can
use these switches: `--edit-format diff` or `--edit-format udiff`.
# Using a .env file
Aider will read environment variables from a `.env` file in
the root of your git repo or in the current directory.
You can give it an explicit file to load with the `--env-file <filename>` parameter.
You can use a `.env` file to store various keys and other settings for the
models you use with aider.
Here is an example `.env` file:
```
OPENAI_API_KEY=<key>
ANTHROPIC_API_KEY=<key>
GROQ_API_KEY=<key>
OPENROUTER_API_KEY=<key>
AZURE_API_KEY=<key>
AZURE_API_VERSION=2023-05-15
AZURE_API_BASE=https://example-endpoint.openai.azure.com
OLLAMA_API_BASE=http://127.0.0.1:11434
```

264
website/docs/repomap.md Normal file
View file

@@ -0,0 +1,264 @@
---
title: Building a better repository map with tree sitter
excerpt: Tree-sitter allows aider to build a repo map that better summarizes large code bases.
highlight_image: /assets/robot-ast.png
nav_exclude: true
---
# Building a better repository map with tree sitter
![robot flowchart](/assets/robot-ast.png)
GPT-4 is extremely useful for "self-contained" coding tasks,
like generating or modifying a simple function
that has no dependencies. Tools like GitHub Copilot serve
these simple coding tasks well.
But making complex changes in a larger, pre-existing codebase
is much more difficult, for both humans and AIs.
To do this successfully, you need to:
1. Find the code that needs to be changed.
2. Understand how that code relates to the rest of the codebase.
3. Make the correct code change to accomplish the task.
GPT-4 is actually great at making the code changes (3),
once you tell it which files need to be changed (1)
and show it how they fit into the rest of the codebase (2).
This article is going to focus on step (2), providing "code context":
- We need to help GPT understand the overall codebase.
- This will help it understand the code it needs to change, which may depend on other parts of the codebase.
- It will also help GPT write new code and modify the existing code in a way
that respects and utilizes existing libraries, modules and abstractions
found elsewhere in the codebase.
- We must convey all of this "code context" to GPT in an
efficient manner that fits within the limited context window.
To address these issues, aider
sends GPT a **concise map of your whole git repository**
that includes
the most important classes and functions along with their types and call signatures.
This **repository map** is now built automatically using
[tree-sitter](https://tree-sitter.github.io/tree-sitter/)
to extract symbol definitions from source files.
Tree-sitter is used by many IDEs, editors and LSP servers to
help humans search and navigate large codebases.
Aider now uses it to help GPT better comprehend, navigate
and edit code in larger repos.
*To code with GPT-4 using the techniques discussed here, just install [aider](https://aider.chat/docs/install.html).*
## The problem: code context
GPT-4 is great at "self contained" coding tasks, like writing or
modifying a pure function with no external dependencies.
GPT can easily handle requests like "write a
Fibonacci function" or "rewrite this loop using list
comprehensions", because they require no context beyond the code
being discussed.
Most real code is not pure and self-contained; it is intertwined with
and depends on code from many different files in a repo.
If you ask GPT to "switch all the print statements in class Foo to
use the BarLog logging system", it needs to see and
modify the code in the Foo class, but it also needs to understand
how to use
the project's BarLog
subsystem.
A simple solution is to **send the entire codebase** to GPT along with
each change request. Now GPT has all the context! But this won't work
for even moderately
sized repos, because they won't fit into the context window.
A better approach is to be selective,
and **hand pick which files to send**.
For the example above, you could send the file that
contains the Foo class
and the file that contains the BarLog logging subsystem.
This works pretty well, and is supported by aider -- you
can manually specify which files to "add to the chat" you are having with GPT.
But sending whole files is a bulky way to send code context,
wasting the precious context window.
GPT doesn't need to see the entire implementation of BarLog,
it just needs to understand it well enough to use it.
You may quickly run out of context window by sending
full files of code
just to convey context.
Aider also strives to reduce the manual work involved in
coding with AI.
So in an ideal world, we'd like aider to automatically
identify and provide the needed code context.
## Using a repo map to provide context
Aider sends a **repo map** to GPT along with
each request from the user to make a code change.
The map contains a list of the files in the
repo, along with the key symbols which are defined in each file.
It shows how each of these symbols is defined in the
source code, by including the critical lines of code for each definition.
Here's a
sample of the map of the aider repo, just showing the maps of
[base_coder.py](https://github.com/paul-gauthier/aider/blob/main/aider/coders/base_coder.py)
and
[commands.py](https://github.com/paul-gauthier/aider/blob/main/aider/commands.py)
:
```
aider/coders/base_coder.py:
⋮...
│class Coder:
│ abs_fnames = None
⋮...
@classmethod
│ def create(
│ self,
│ main_model,
│ edit_format,
│ io,
│ skip_model_availabily_check=False,
│ **kwargs,
⋮...
│ def abs_root_path(self, path):
⋮...
│ def run(self, with_message=None):
⋮...
aider/commands.py:
⋮...
│class Commands:
│ voice = None
⋮...
│ def get_commands(self):
⋮...
│ def get_command_completions(self, cmd_name, partial):
⋮...
│ def run(self, inp):
⋮...
```
Mapping out the repo like this provides some key benefits:
- GPT can see classes, methods and function signatures from everywhere in the repo. This alone may give it enough context to solve many tasks. For example, it can probably figure out how to use the API exported from a module just based on the details shown in the map.
- If it needs to see more code, GPT can use the map to figure out by itself which files it needs to look at in more detail. GPT will then ask to see these specific files, and aider will automatically add them to the chat context.
## Optimizing the map
Of course, for large repositories even just the repo map might be too large
for GPT's context window.
Aider solves this problem by sending just the **most relevant**
portions of the repo map.
It does this by analyzing the full repo map using
a graph ranking algorithm, computed on a graph
where each source file is a node and edges connect
files which have dependencies.
Aider optimizes the repo map by
selecting the most important parts of the codebase
which will
fit into the token budget assigned by the user
(via the `--map-tokens` switch, which defaults to 1k tokens).
The sample map shown above doesn't contain *every* class, method and function from those
files.
It only includes the most important identifiers,
the ones which are most often referenced by other portions of the code.
These are the key pieces of context that GPT needs to know to understand
the overall codebase.
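Aider's actual ranking code is more involved, but the core idea can be sketched with a generic graph library. Here is a minimal illustration, using networkx and PageRank purely as stand-ins for "a graph ranking algorithm", with made-up dependency data:

```python
import networkx as nx

def rank_files(dependencies):
    """Rank files by how central they are in the dependency graph.

    dependencies maps each file to the files whose symbols it references.
    Files that many other files depend on end up with the highest rank.
    """
    graph = nx.DiGraph()
    for src, targets in dependencies.items():
        graph.add_node(src)
        for dst in targets:
            # An edge src -> dst means src references something defined in dst
            graph.add_edge(src, dst)
    ranks = nx.pagerank(graph)
    return sorted(ranks, key=ranks.get, reverse=True)

# Made-up example data, just to show the shape of the input
deps = {
    "aider/main.py": ["aider/io.py", "aider/coders/base_coder.py"],
    "aider/coders/base_coder.py": ["aider/io.py"],
    "aider/io.py": [],
}
print(rank_files(deps))  # io.py ranks highest, since everything references it
```

The highest ranked portions of the map are then included until they fill the token budget set by `--map-tokens`.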
## Using tree-sitter to make the map
Under the hood, aider uses
[tree sitter](https://tree-sitter.github.io/tree-sitter/)
to build the
map.
It specifically uses the
[py-tree-sitter-languages](https://github.com/grantjenks/py-tree-sitter-languages)
python module,
which provides simple, pip-installable binary wheels for
[most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
Tree-sitter parses source code into an Abstract Syntax Tree (AST) based
on the syntax of the programming language.
Using the AST, we can identify where functions, classes, variables, types and
other definitions occur in the source code.
We can also identify where else in the code these things are used or referenced.
Aider uses all of these definitions and references to
determine which are the most important identifiers in the repository,
and to produce the repo map that shows just those key
lines from the codebase.
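Aider's real implementation drives this with per-language `tags.scm` query files (see the credits below), but a simplified sketch of pulling definitions out of Python source with the `py-tree-sitter-languages` package could look like this:

```python
from tree_sitter_languages import get_parser

def list_python_definitions(source_code: str):
    """Return (kind, name) pairs for the class and function defs in the source."""
    src = source_code.encode("utf8")
    parser = get_parser("python")
    tree = parser.parse(src)

    kinds = {"class_definition": "class", "function_definition": "def"}
    found = []

    def walk(node):
        if node.type in kinds:
            name_node = node.child_by_field_name("name")
            if name_node is not None:
                name = src[name_node.start_byte:name_node.end_byte].decode("utf8")
                found.append((kinds[node.type], name))
        for child in node.children:
            walk(child)

    walk(tree.root_node)
    return found

print(list_python_definitions("class Foo:\n    def bar(self):\n        pass\n"))
# [('class', 'Foo'), ('def', 'bar')]
```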
## What about ctags?
The tree-sitter repository map replaces the
[ctags based map](https://aider.chat/docs/ctags.html)
that aider originally used.
Switching from ctags to tree-sitter provides a bunch of benefits:
- The map is richer, showing full function call signatures and other details straight from the source files.
- Thanks to `py-tree-sitter-languages`, we get full support for many programming languages via a python package that's automatically installed as part of the normal `pip install aider-chat`.
- We remove the requirement for users to manually install `universal-ctags` via some external tool or package manager (brew, apt, choco, etc).
- Tree-sitter integration is a key enabler for future work and capabilities for aider.
## Future work
You'll recall that we identified the 3 key steps
required to use GPT
to complete a coding task within a large, pre-existing codebase:
1. Find the code that needs to be changed.
2. Understand how that code relates to the rest of the codebase.
3. Make the correct code change to accomplish the task.
We're now using tree-sitter to help solve the code context problem (2),
but it's also an important foundation
for future work on automatically finding all the code which
will need to be changed (1).
Right now, aider relies on the user to specify which source files
will need to be modified to complete their request.
Users manually "add files to the chat" using aider's `/add` command,
which makes those files available for GPT to modify.
This works well, but a key piece of future work is to harness the
power of GPT and tree-sitter to automatically identify
which parts of the code will need changes.
## Try it out
To code with GPT-4 using the techniques discussed here,
just install [aider](https://aider.chat/docs/install.html).
## Credits
Aider uses
[modified versions of the tags.scm files](https://github.com/paul-gauthier/aider/tree/main/aider/queries)
from these
open source tree-sitter language implementations:
* [https://github.com/tree-sitter/tree-sitter-c](https://github.com/tree-sitter/tree-sitter-c) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-c-sharp](https://github.com/tree-sitter/tree-sitter-c-sharp) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-cpp](https://github.com/tree-sitter/tree-sitter-cpp) — licensed under the MIT License.
* [https://github.com/Wilfred/tree-sitter-elisp](https://github.com/Wilfred/tree-sitter-elisp) — licensed under the MIT License.
* [https://github.com/elixir-lang/tree-sitter-elixir](https://github.com/elixir-lang/tree-sitter-elixir) — licensed under the Apache License, Version 2.0.
* [https://github.com/elm-tooling/tree-sitter-elm](https://github.com/elm-tooling/tree-sitter-elm) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-go](https://github.com/tree-sitter/tree-sitter-go) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-java](https://github.com/tree-sitter/tree-sitter-java) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-javascript](https://github.com/tree-sitter/tree-sitter-javascript) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-ocaml](https://github.com/tree-sitter/tree-sitter-ocaml) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-php](https://github.com/tree-sitter/tree-sitter-php) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-python](https://github.com/tree-sitter/tree-sitter-python) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-ql](https://github.com/tree-sitter/tree-sitter-ql) — licensed under the MIT License.
* [https://github.com/r-lib/tree-sitter-r](https://github.com/r-lib/tree-sitter-r) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-ruby](https://github.com/tree-sitter/tree-sitter-ruby) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-rust](https://github.com/tree-sitter/tree-sitter-rust) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-typescript](https://github.com/tree-sitter/tree-sitter-typescript) — licensed under the MIT License.

website/docs/tips.md Normal file
@@ -0,0 +1,18 @@
---
nav_order: 50
---
## Tips
* Think about which files need to be edited to make your change and add them to the chat.
Aider can help the LLM figure out which files to edit all by itself, but the most efficient approach is to add the needed files to the chat yourself.
* Don't add *everything* to the chat, just the files you think need to be edited.
Aider also sends the LLM a [map of your entire git repo](https://aider.chat/docs/repomap.html), so it can see all the other relevant parts of your code base.
* Large changes are best performed as a sequence of thoughtful bite-sized steps, where you plan out the approach and overall design. Walk the LLM through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.
* Use Control-C to safely interrupt the LLM if it isn't providing a useful response. The partial response remains in the conversation, so you can refer to it when you reply to the LLM with more information or direction.
* Use the `/run` and `/test` commands to run the code or tests and show the output to the LLM so it can fix any issues.
* Use Meta-ENTER (Esc+ENTER in some environments) to enter multiline chat messages. Or enter `{` alone on the first line to start a multiline message and `}` alone on the last line to end it.
* If your code is throwing an error, share the error output with the LLM using `/run` or by pasting it into the chat. Let the LLM figure out and fix the bug.
* LLMs know about a lot of standard tools and libraries, but may get some of the fine details wrong about APIs and function arguments. You can paste doc snippets into the chat to resolve these issues.

website/docs/tutorials.md Normal file
@@ -0,0 +1,13 @@
---
nav_order: 25
---
## Tutorial videos
Here are a few tutorial videos made by aider users:
- [Aider : the production ready AI coding assistant you've been waiting for](https://www.youtube.com/watch?v=zddJofosJuM) -- Learn Code With JV
- [Holy Grail: FREE Coding Assistant That Can Build From EXISTING CODE BASE](https://www.youtube.com/watch?v=df8afeb1FY8) -- Matthew Berman
- [Aider: This AI Coder Can Create AND Update Git Codebases](https://www.youtube.com/watch?v=EqLyFT78Sig) -- Ian Wootten

@@ -0,0 +1,381 @@
---
title: Unified diffs make GPT-4 Turbo 3X less lazy
excerpt: GPT-4 Turbo has a problem with lazy coding, which can be significantly improved by asking for code changes formatted as unified diffs.
highlight_image: /assets/benchmarks-udiff.jpg
nav_exclude: true
---
# Unified diffs make GPT-4 Turbo 3X less lazy
![robot flowchart](/assets/benchmarks-udiff.svg)
Aider now asks GPT-4 Turbo to use
[unified diffs](#choose-a-familiar-editing-format)
to edit your code.
This dramatically improves GPT-4 Turbo's performance on a
challenging
new benchmark
and significantly reduces its bad habit of "lazy" coding,
where it writes
code with comments
like "...add logic here...".
Aider's new "laziness" benchmark suite
is designed to both provoke and quantify lazy coding.
It consists of
89 python refactoring tasks
which tend to make GPT-4 Turbo write lazy comments like
"...include original method body...".
This new laziness benchmark produced the following results with `gpt-4-1106-preview`:
- **GPT-4 Turbo only scored 20% as a baseline** using aider's existing "SEARCH/REPLACE block" edit format. It outputs "lazy comments" on 12 of the tasks.
- **Aider's new unified diff edit format raised the score to 61%**. Using this format reduced laziness by 3X, with GPT-4 Turbo only using lazy comments on 4 of the tasks.
- **It's worse to add a prompt that says the user is blind, has no hands, will tip $2000 and fears truncated code trauma.** Widely circulated "emotional appeal" folk remedies
produced worse benchmark scores
for both the baseline SEARCH/REPLACE and new unified diff editing formats.
The older `gpt-4-0613` also did better on the laziness benchmark using unified diffs:
- **The June GPT-4's baseline was 26%** using aider's existing "SEARCH/REPLACE block" edit format.
- **Aider's new unified diff edit format raised June GPT-4's score to 59%**.
- The benchmark was designed to use large files, and
28% of them are too large to fit in June GPT-4's 8k context window.
This puts a hard ceiling of 72% on how well the June model could possibly score.
With unified diffs, GPT acts more like it's writing textual data intended to be read by a program,
not talking to a person.
Diffs are
usually
consumed by the
[patch](https://www.gnu.org/software/diffutils/manual/html_node/Merging-with-patch.html)
program, which is fairly rigid.
This seems to encourage rigor, making
GPT less likely to
leave informal editing instructions in comments
or be lazy about writing all the needed code.
Aider's new unified diff editing format
outperforms other solutions I evaluated by a wide margin.
I explored many other approaches including:
prompts about being tireless and diligent,
OpenAI's function/tool calling capabilities,
numerous variations on aider's existing editing formats,
line number based formats
and other diff-like formats.
The results shared here reflect
an extensive investigation and benchmark evaluations of many approaches.
The rest of this article will describe
aider's new editing format and refactoring benchmark.
It will highlight some key design decisions,
and evaluate their significance using ablation experiments.
## Unified diff editing format
The design and implementation of aider's new unified diff editing format
helped clarify some general principles
for GPT-4 code editing:
- FAMILIAR - Choose an edit format that GPT is already familiar with.
- SIMPLE - Choose a simple format that avoids escaping, syntactic overhead and brittle specifiers like line numbers or line counts.
- HIGH LEVEL - Encourage GPT to structure edits as new versions of substantive code blocks (functions, methods, etc), not as a series of surgical/minimal changes to individual lines of code.
- FLEXIBLE - Strive to be maximally flexible when interpreting GPT's edit instructions.
A helpful shortcut here is to have empathy for GPT, and imagine you
are the one being asked to specify code edits.
Would you want to hand type a properly escaped json data structure
to invoke surgical insert, delete, replace operations on specific code line numbers?
Do you want to use a brittle format, where any mistake
causes an error that discards all your work?
GPT is quantitatively better at code editing when you reduce the
burden of formatting edits by using a familiar, simple, high level
and flexible editing format.
### Choose a familiar editing format
Unified diffs are perhaps the most common way to show
code edits, because they're the
default output format of `git diff`:
```diff
--- a/greeting.py
+++ b/greeting.py
@@ -1,5 +1,5 @@
def main(args):
# show a greeting
- print("Hello!")
+ print("Goodbye!")
return
```
Choosing such a popular format means that GPT has
seen *many* examples in its training data.
It's been trained to generate
text that conforms to the unified diff syntax.
### Use a simple editing format
Aider's [previous benchmark results](https://aider.chat/docs/benchmarks.html) made
it clear that simple editing formats
work best.
Even though OpenAI provides extensive support for
structured formats like json and function calls,
GPT is worse at editing code if you use them.
I repeated these and other similar benchmarks against GPT-4 Turbo,
and again reached these same conclusions.
Informally, this is probably because stuffing *source code* into JSON is complicated
and error prone.
Wrapping the python code
`print("On Windows use \"C:\\\"")`
as valid json is pretty painful.
Due to escaping issues GPT's code is often syntactically incorrect when it's
unpacked from JSON,
or the JSON decode just fails entirely.
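
As a small, concrete illustration, round-tripping that exact line through `json.dumps` shows how quickly the escaping piles up:

```python
import json

# The one-line python program from above, held in a Python string.
code = 'print("On Windows use \\"C:\\\\\\"")'

print(code)
# print("On Windows use \"C:\\\"")

print(json.dumps(code))
# "print(\"On Windows use \\\"C:\\\\\\\"\")"
```

GPT has to emit the doubly-escaped form on the right, and any slip in the backslashes either corrupts the code or breaks the JSON decode.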
On the other hand, the core of the unified diff format is very simple.
You include a hunk of the file that needs to be changed,
with every line prefixed by a character
to indicate unchanged, new or deleted lines.
A unified diff looks pretty much like the code it is modifying.
The one complicated piece is the line numbers found at the start
of each hunk. They look something like this: `@@ -2,4 +3,5 @@`.
GPT is terrible at working with source code line numbers.
This is a general observation about *any* use of line
numbers in editing formats,
backed up by many quantitative benchmark experiments.
You've probably ignored the line numbers in every diff you've seen,
because the diffs usually still make sense without them.
Aider tells GPT not to include line numbers,
and just interprets each hunk from the unified diffs
as a search and replace operation:
This diff:
```diff
@@ ... @@
def main(args):
# show a greeting
- print("Hello!")
+ print("Goodbye!")
return
```
Means we need to search the file for the
*space* and *minus* `-` lines:
```python
def main(args):
# show a greeting
print("Hello!")
return
```
And replace them with the *space* and *plus* `+` lines:
```python
def main(args):
# show a greeting
print("Goodbye!")
return
```
Simple, right?
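
A minimal sketch of that interpretation (not aider's actual code, and using `greeting.py` as a hypothetical target file) looks something like this:

```python
def hunk_to_search_replace(hunk):
    """Split one unified diff hunk (no line numbers) into the text to
    search for and the text to replace it with."""
    lines = hunk.splitlines(keepends=True)
    search = [line[1:] for line in lines if line.startswith((" ", "-"))]
    replace = [line[1:] for line in lines if line.startswith((" ", "+"))]
    return "".join(search), "".join(replace)

hunk = (
    " def main(args):\n"
    "     # show a greeting\n"
    '-    print("Hello!")\n'
    '+    print("Goodbye!")\n'
    "     return\n"
)
search, replace = hunk_to_search_replace(hunk)
original = open("greeting.py").read()            # hypothetical target file
patched = original.replace(search, replace, 1)   # apply the hunk as search & replace
```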
### Encourage high level edits
The example unified diffs we've seen so far have all been single line changes,
which makes them pretty easy to read and understand.
Consider this slightly more complex change, which renames the variable `n` to
`number`:
```diff
@@ ... @@
-def factorial(n):
+def factorial(number):
- if n == 0:
+ if number == 0:
return 1
else:
- return n * factorial(n-1)
+ return number * factorial(number-1)
```
The following "high level diff" of the same
change is not as succinct as the minimal diff above,
but it is much easier to see two different coherent versions of the
`factorial()` function.
```diff
@@ ... @@
-def factorial(n):
- if n == 0:
- return 1
- else:
- return n * factorial(n-1)
+def factorial(number):
+ if number == 0:
+ return 1
+ else:
+ return number * factorial(number-1)
```
Aider's system prompt encourages
GPT to produce these high level diffs.
This makes GPT better at producing correct diffs, which can be successfully
applied to the original file.
**Experiments without "high level diff" prompting
produce a 30-50% increase in editing errors,**
where diffs fail to apply or apply incorrectly and
produce invalid code.
When a patch fails, aider needs to ask GPT for a corrected version of the diff.
This takes time, costs tokens and sometimes fails to produce a successful edit
even after multiple retries.
There are probably a couple of reasons why high level diffs
help:
- It's easier to produce diffs that both correctly match the original code and correctly produce the intended new code. There is less risk of GPT getting confused, compared to generating a series of surgical edits that interleave lines of old and new code.
- High level hunks often contain more lines than a surgical hunk, so they are less likely to accidentally match unrelated parts of the code. This is helpful because GPT can't reliably give us line numbers to specify exactly where in the file to make changes.
### Be flexible when applying edits
GPT frequently makes imperfect diffs that won't apply cleanly.
They exhibit a variety of problems:
- GPT forgets things like comments, docstrings, blank lines, etc. Or it skips over some code that it doesn't intend to change.
- GPT forgets the leading *plus* `+` character to mark novel lines that it wants to add to the file. It incorrectly includes them with a leading *space* as if they were already there.
- GPT outdents all of the code, removing all the leading white space which is shared across the lines. So a chunk of deeply indented code is shown in a diff with only the leading white space that changes between the lines in the chunk.
- GPT jumps ahead to show edits to a different part of the file without starting a new hunk with a `@@ ... @@` divider.
As an example of the first issue, consider this source code:
```python
import sys
def main(args):
# show a greeting
print("Hello!")
return
main(sys.argv[1:])
```
**The diff below is missing the "show a greeting" comment line**,
and represents a common type of mistake GPT might make.
When we search for the *minus* `-` lines, we won't find them
in the original file
because of the missing comment.
```diff
@@ ... @@
-def main(args):
- print("Hello!")
- return
+def main(args):
+ print("Goodbye!")
+ return
```
Aider tries to be very flexible when applying diffs,
in order to handle defects.
If a hunk doesn't apply cleanly, aider uses a number of strategies:
- Normalize the hunk, by taking the *minus* `-` and *space* lines as one version of the hunk and the *space* and *plus* `+` lines as a second version and doing an actual unified diff on them.
- Try and discover new lines that GPT is trying to add but which it forgot to mark with *plus* `+` markers. This is done by diffing the *minus* `-` and *space* lines back against the original file.
- Try and apply the hunk using "relative leading white space", so we can match and patch correctly even if the hunk has been uniformly indented or outdented.
- Break a large hunk apart into an overlapping sequence of smaller hunks, which each contain only one contiguous run of *plus* `+` and *minus* `-` lines. Try and apply each of these sub-hunks independently.
- Vary the size and offset of the "context window" of *space* lines from the hunk that are used to localize the edit to a specific part of the file.
- Combine the above mechanisms to progressively become more permissive about how to apply the hunk.
These flexible patching strategies are critical, and
removing them
radically increases the number of hunks which fail to apply.
**Experiments where flexible patching is disabled show a 9X increase in editing errors** on aider's original Exercism benchmark.
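
As one concrete illustration, here is a simplified sketch (not aider's actual code, and assuming 4-space indentation steps) of the "relative leading white space" strategy: re-indent the hunk's search and replace text until the search text actually matches somewhere in the file:

```python
def apply_with_relative_indent(content, search, replace, max_indent=12):
    """Retry a search & replace with extra leading indentation, in case
    the model uniformly outdented the hunk."""
    for width in range(0, max_indent + 1, 4):
        pad = " " * width
        indented_search = "".join(
            pad + line if line.strip() else line
            for line in search.splitlines(keepends=True))
        if indented_search in content:
            indented_replace = "".join(
                pad + line if line.strip() else line
                for line in replace.splitlines(keepends=True))
            return content.replace(indented_search, indented_replace, 1)
    return None  # let the caller fall back to other strategies
```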
## Refactoring benchmark
Aider has long used a
benchmark suite based on 133 Exercism python exercises.
But these are mostly small coding problems,
usually requiring only a few dozen lines of code.
GPT-4 Turbo is typically only lazy on 2-3 of these exercises:
the ones with the most code and which involve refactoring.
Based on this observation, I set out to build a benchmark based on refactoring
a non-trivial amount of code found in fairly large files.
To do this, I used python's `ast` module to analyze
[9 popular open source python repositories](https://github.com/paul-gauthier/refactor-benchmark)
to identify challenging refactoring tasks.
The goal was to find:
- Source files that contain classes with non-trivial methods, having 100-250+ AST nodes in their implementation.
- Focus on methods that are part of a larger class, which has at least twice as much code as the method itself.
- Select methods that don't use their `self` parameter, so they can be trivially refactored out of the class.
We can then turn each of these source files into a task for the benchmark,
where we ask GPT to do something like:
> Refactor the `_set_csrf_cookie` method in the `CsrfViewMiddleware` class to be a stand alone, top level function.
> Name the new function `_set_csrf_cookie`, exactly the same name as the existing method.
> Update any existing `self._set_csrf_cookie` calls to work with the new `_set_csrf_cookie` function.
A [simple python AST scanning script](https://github.com/paul-gauthier/aider/blob/main/benchmark/refactor_tools.py)
found 89 suitable files
and packaged them up as benchmark tasks.
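A rough sketch of that kind of scan (not the linked benchmark script itself, and with an arbitrary node-count threshold) might look like:

```python
import ast

def node_count(node):
    """Count all AST nodes inside `node`, including itself."""
    return sum(1 for _ in ast.walk(node))

def refactor_candidates(source, min_method_nodes=100):
    """Yield (class, method, size) for large methods that never use
    `self` and belong to a class at least twice their size."""
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        cls_size = node_count(cls)
        for item in cls.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            size = node_count(item)
            uses_self = any(
                isinstance(n, ast.Name) and n.id == "self"
                for n in ast.walk(item))
            if size >= min_method_nodes and cls_size >= 2 * size and not uses_self:
                yield cls.name, item.name, size
```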
Each task has a test
that checks if the refactor
was performed roughly correctly:
- The updated source file must parse as valid python, to detect misapplied edits which produce invalid code.
- The target method must now exist as a top-level function in the file.
- This new top-level function must contain approximately the same number of AST nodes as the original class method. This ensures that GPT didn't elide code and replace it with comments.
- The original class must still be present in the file, and it must be smaller by about the number of AST nodes in the method which was removed. This helps confirm that the method was removed from the class, without other significant modifications.
To be clear, this is not a rigorous test that the refactor was performed correctly.
But it does serve as a basic sanity check that the refactor was essentially done as a cut & paste, without eliding any code as comments.
And it correlates well with other laziness metrics
gathered during benchmarking, like the
introduction of new comments that contain "...".
The result is a pragmatic
[benchmark suite that provokes, detects and quantifies GPT coding laziness](https://github.com/paul-gauthier/refactor-benchmark).
## Conclusions and future work
Based on the refactor benchmark results,
aider's new unified diff format seems
to dramatically increase GPT-4 Turbo's skill at more complex coding tasks.
It also seems very effective at reducing the lazy coding
which has been widely noted as a problem with GPT-4 Turbo.
Unified diffs were one of the very first edit formats I tried
when originally building aider.
I think a lot of other AI coding assistant projects have also
tried going down this path.
It seems like any naive or direct use of structured diff formats
is pretty much doomed to failure.
But the techniques described here and
incorporated into aider provide
a highly effective way to harness GPT's knowledge of unified diffs.
There could be significant benefits to
fine-tuning models on
aider's simple, high level style of unified diffs.
Dropping line numbers from the hunk headers and focusing on diffs of
semantically coherent chunks of code
seems to be an important part of successful GPT code editing
(besides the relentless focus on flexibly applying edits).
Most LLMs will have already seen plenty of unified diffs
in their normal training data, and so should be
amenable to fine-tuning towards this
particular diff style.

website/docs/usage.md Normal file
@@ -0,0 +1,31 @@
---
nav_order: 30
---
## Usage
Run `aider` with the source code files you want to edit.
These files will be "added to the chat session", so that the LLM can see their
contents and edit them according to your instructions.
```
aider <file1> <file2> ...
```
Be selective, and just add the files that the LLM will need to edit.
If you add a bunch of unrelated files, the LLM can get overwhelmed
and confused (and it costs more tokens).
Aider will automatically
share snippets from other, related files with the LLM so it can
[understand the rest of your code base](https://aider.chat/docs/repomap.html).
You can also just launch aider anywhere in a git repo without naming
files on the command line. It will discover all the files in the
repo. You can then add and remove individual files in the chat
session with the `/add` and `/drop` chat commands described below.
If you or the LLM mention any of the repo's filenames in the conversation,
aider will ask if you'd like to add them to the chat.
Aider also has many other options which can be set with
command line switches, environment variables or via a configuration file.
See `aider --help` for details.

website/docs/voice.md Normal file
@@ -0,0 +1,119 @@
---
nav_order: 100
---
# Voice-to-code with aider
Speak with GPT about your code! Request new features, test cases or bug fixes using your voice and let GPT do the work of editing the files in your local git repo. As with all of aider's capabilities, you can use voice-to-code with an existing repo or to start a new project.
Voice support fits quite naturally into aider's AI pair programming
chat interface. Now you can fluidly switch between voice and text chat
when you ask GPT to edit your code.
## How to use voice-to-code
Use the in-chat `/voice` command to start recording,
and press `ENTER` when you're done speaking.
Your voice coding instructions will be transcribed
and sent to GPT, as if you had typed them into
the aider chat session.
See the [installation instructions](https://aider.chat/docs/install.html#install-portaudio-optional) for
information on how to enable the `/voice` command.
<br/>
<div class="chat-transcript" markdown="1">
> Aider v0.11.2-dev
> Added app.py to the chat.
#### /voice
<blockquote>
<div class="recording-line">
<p style="display: inline;">Recording, press ENTER when done... <span id="timer">3.5</span>sec</p>
<div id="progress-bar">
<div id="progress"></div>
</div>
</div>
</blockquote>
<style>
.recording-line > * {
display: inline-block;
vertical-align: middle;
}
#progress-bar {
width: 100px;
height: 20px;
background-color: #333;
margin-left: 10px;
}
#progress {
height: 100%;
background-color: #f0f0f0;
animation: progress 3s infinite;
}
@keyframes progress {
0% { width: 10%; }
20% { width: 90%; }
50% { width: 30%; }
60% { width: 90%; }
80% { width: 20%; }
100% { width: 10%; }
}
</style>
<script>
window.onload = function() {
var counter = 0;
var timer = document.getElementById('timer');
setInterval(function() {
counter = (counter + 0.1) % 9;
timer.textContent = counter.toFixed(1);
}, 100);
}
</script>
<div style="display: inline-block; position: relative; width: 200px; background-color: #f0f0f0; border-radius: 10px; padding: 10px; color: #333; font-family: 'Comic Sans MS', 'Comic Sans', 'Marker Felt', sans-serif;">
<span style="position: absolute; top: 0; left: 50%; transform: translate(-50%, -100%); width: 0; height: 0; border-left: 10px solid transparent; border-right: 10px solid transparent; border-bottom: 10px solid #f0f0f0;"></span>
&nbsp;add a factorial endpoint that uses math factorial&nbsp;
</div>
#### Add a factorial endpoint that uses math.factorial.
To add a factorial endpoint to the Flask application, we need to modify the `app.py` file.
Here are the steps:
1. Add a new route `/fact/<int:x>` to the Flask application.
2. In the function for this route, call `math.factorial(x)` and return the result as a string.
Here is the *edit block* for these changes:
```python
app.py
<<<<<<< HEAD
if __name__ == '__main__':
print("Starting...")
app.run()
=======
@app.route('/fact/<int:x>')
def factorial(x):
result = math.factorial(x)
return str(result)
if __name__ == '__main__':
print("Starting...")
app.run()
>>>>>>> updated
```
> Applied edit to app.py
> Commit ef9e3e7 aider: Add a factorial endpoint that uses math.factorial.
</div>
<br/>
<br/>
<br/>

@@ -0,0 +1,96 @@
---
parent: Example chat transcripts
nav_order: 2
---
# Modify an open source 2048 game with aider
In this chat transcript, the user explores and modifies an open source [javascript 2048 game repo](https://github.com/gabrielecirulli/2048).
Notice that the LLM asks to see a particular source file, and aider automatically includes it in the chat session after getting permission from the user.
<div class="chat-transcript" markdown="1">
> $ git clone https://github.com/gabrielecirulli/2048
> $ cd 2048
> $ aider
#### what is this repo?
Based on the file names you provided, it seems like this repository is for a web-based game, possibly a 2048 game clone. The files include JavaScript files for game logic, HTML and CSS files for the user interface, and various meta files for icons and touch startup images.
If you need help with any specific part of the code or want to add new features, please let me know, and I'll be happy to assist you.
</div>
<div class="chat-transcript" markdown="1">
#### how does the scoring work?
To help you with the scoring mechanism, I need to see the relevant code. Please provide the contents of the `js/game_manager.js` file, as it is likely to contain the scoring logic.
> js/game_manager.js
> Add these files to the chat? y
Apologies for the oversight. I see the contents of the `js/game_manager.js` file now.
The scoring in this 2048 game works as follows:
1. When two tiles with the same value merge, they create a new tile with a value equal to the sum of the merged tiles' values.
2. The score is updated by adding the value of the new merged tile.
Here's the relevant code snippet from the `js/game_manager.js` file:
```javascript
if (next && next.value === tile.value && !next.mergedFrom) {
var merged = new Tile(positions.next, tile.value * 2);
merged.mergedFrom = [tile, next];
self.grid.insertTile(merged);
self.grid.removeTile(tile);
// Converge the two tiles' positions
tile.updatePosition(positions.next);
// Update the score
self.score += merged.value;
// The mighty 2048 tile
if (merged.value === 2048) self.won = true;
}
```
In this code snippet, when two tiles with the same value (`tile` and `next`) are found, they are merged into a new tile (`merged`). The value of the new tile is added to the current score (`self.score += merged.value`). If the merged tile's value is 2048, the game is considered won (`self.won = true`).
</div>
<div class="chat-transcript" markdown="1">
#### can we add some randomness? maybe 10% of the time give the user a 10x bonus?
Sure, we can add a 10% chance of giving the user a 10x bonus when two tiles merge. To implement this, we'll modify the scoring logic in the `js/game_manager.js` file. Here's the step-by-step plan:
1. Generate a random number between 0 and 1.
2. Check if the random number is less than or equal to 0.1 (10% chance).
3. If the condition is met, multiply the merged tile's value by 10 before adding it to the score.
Here's the code change for the `js/game_manager.js` file:
```javascript
js/game_manager.js
<<<<<<< ORIGINAL
// Update the score
self.score += merged.value;
=======
// Update the score with a 10% chance of 10x bonus
var bonus = Math.random() <= 0.1 ? 10 : 1;
self.score += merged.value * bonus;
>>>>>>> UPDATED
```
This change will give the user a 10x bonus 10% of the time when two tiles merge.
> Applied edit to js/game_manager.js
> Commit eb9d882 aider: Added a 10% chance of giving the user a 10x bonus when two tiles merge.
</div>

@@ -0,0 +1,63 @@
---
title: Example chat transcripts
has_children: true
nav_order: 80
has_toc: false
---
# Example chat transcripts
Below are some chat transcripts showing what it's like to code with aider.
In the chats, you'll see a variety of coding tasks like generating new code, editing existing code, debugging, exploring unfamiliar code, etc.
* [**Hello World Flask App**](https://aider.chat/examples/hello-world-flask.html): Start from scratch and have aider create a simple Flask app with various endpoints, such as adding two numbers and calculating the Fibonacci sequence.
* [**Javascript Game Modification**](https://aider.chat/examples/2048-game.html): Dive into an existing open-source repo, and get aider's help to understand it and make modifications.
* [**Complex Multi-file Change with Debugging**](https://aider.chat/examples/complex-change.html): Aider makes a complex code change that is coordinated across multiple source files, and resolves bugs by reviewing error output and doc snippets.
* [**Create a Black Box Test Case**](https://aider.chat/examples/add-test.html): Aider creates a "black box" test case without access to the source of the method being tested, using only a [high level map of the repository based on ctags](https://aider.chat/docs/ctags.html).
* [**Honor the NO_COLOR env var**](https://aider.chat/examples/no-color.html): The user pastes the NO_COLOR spec from no-color.org into the chat, and aider modifies the application to conform.
* [**Download, analyze and plot US Census data**](https://aider.chat/examples/census.html): Aider downloads census data, suggests some hypotheses to test, tests one and then summarizes and plots a graph of the results.
* [**Semantic Search & Replace**](semantic-search-replace.md): Updating a collection of function calls, which requires dealing with various formatting and semantic differences in the various function call sites.
* [**Pong Game with Pygame**](pong.md): Creating a simple Pong game using the Pygame library, with customizations for paddle size and color, and ball speed adjustments.
* [**CSS Exercise: Animation Dropdown Menu**](css-exercises.md): A small CSS exercise involving adding animation to a dropdown menu.
* [**Automatically Update Docs**](update-docs.md): Automatically updating documentation based on the latest version of the main() function.
* [**Editing an Asciinema Cast File**](asciinema.md): Editing escape sequences in an `asciinema` screencast file.
## What's happening in these chats?
To better understand the chat transcripts, it's worth knowing that:
- Each time the LLM suggests a code change, `aider` automatically applies it to the source files.
- After applying the edits, `aider` commits them to git with a descriptive commit message.
- The LLM can only see and edit files which have been "added to the chat session". The user adds files either via the command line or the in-chat `/add` command. If the LLM asks to see specific files, `aider` asks the user for permission to add them to the chat. The transcripts contain notifications from `aider` whenever a file is added or dropped from the session.
## Transcript formatting
<div class="chat-transcript" markdown="1">
> This is output from the aider tool.
#### These are chat messages written by the user.
Chat responses from the LLM are in a blue font like this, and often include colorized "edit blocks" that specify edits to the code.
Here's a sample edit block that switches from printing "hello" to "goodbye":
```python
hello.py
<<<<<<< ORIGINAL
print("hello")
=======
print("goodbye")
>>>>>>> UPDATED
```
</div>

@@ -0,0 +1,181 @@
---
parent: Example chat transcripts
nav_order: 4
---
# Create a "black box" test case
This transcript shows aider creating a black box test case,
**without being given
access to the source code of the function being tested or any of the
other code in the repo.**
Instead, aider is operating entirely off a
[high level map of the repository based on ctags](https://aider.chat/docs/ctags.html),
which provides details on all the symbols and function signatures in the repo.
Using only the meta-data in the map, the LLM is able to figure out how to call the method to be tested, as well as how to instantiate multiple class objects that are required to prepare for the test.
In detail, aider is able to use the repo map to:
- Find the function signature of the `cmd_add()` function which the user wants a test case for.
- Determine that it is a method of the `Command` class, so the test case will need to instantiate an instance to conduct the test.
- Identify that creating a `Command` instance requires passing in `InputOutput` and `Coder` instances.
- Figure out the arguments required to instantiate the `InputOutput` instance.
- Decide that the `Coder` class looks complex enough to use a `MagicMock`.
The `cmd_add()` method is defined without type annotations, so aider makes a
reasonable but incorrect guess that it accepts a `list` of files. The
user invokes the `/run` command to run the new unit test, which throws an
error based on this misunderstanding.
The LLM reviews the error message and fixes the call to `cmd_add()` to
correctly pass a `str` with a space-separated list of filenames
instead. The test then passes on the next `/run`.
<div class="chat-transcript" markdown="1">
> $ aider tests/test_commands.py
> Creating empty file tests/test_commands.py
> Added tests/test_commands.py to the chat
> Files not tracked in .git:
> - tests/test_commands.py
>
> Add them? y
> Added tests/test_commands.py to the git repo
> Commit b9cea2c Added new files to the git repo: tests/test_commands.py
> Using git repo: .git
#### add a test for cmd_add that passes in 'foo.txt' and 'bar.txt' and ensures they have both been created after the test. run the test in a tmp dir.
I will add a new test method called `test_cmd_add` in the `tests/test_commands.py` file. This test will create a temporary directory, change the current working directory to the temporary directory, and then call the `cmd_add` method with 'foo.txt' and 'bar.txt' as arguments. After that, it will check if both files have been created in the temporary directory.
Here's the code change:
```python
tests/test_commands.py
<<<<<<< ORIGINAL
=======
import os
import tempfile
from unittest import TestCase
from unittest.mock import MagicMock
from aider.commands import Commands
from aider.io import InputOutput
class TestCommands(TestCase):
def test_cmd_add(self):
# Create a temporary directory and change the current working directory
with tempfile.TemporaryDirectory() as tmpdir:
os.chdir(tmpdir)
# Initialize the Commands and InputOutput objects
io = InputOutput(pretty=False, yes=True)
coder = MagicMock()
commands = Commands(io, coder)
# Call the cmd_add method with 'foo.txt' and 'bar.txt'
commands.cmd_add(["foo.txt", "bar.txt"])
# Check if both files have been created in the temporary directory
self.assertTrue(os.path.exists("foo.txt"))
self.assertTrue(os.path.exists("bar.txt"))
>>>>>>> UPDATED
```
This test will ensure that the `cmd_add` method creates the specified files in the current working directory.
> Applied edit to tests/test_commands.py
> Commit e2d5212 aider: Added a test method `test_cmd_add` that creates a temporary directory, calls the `cmd_add` method with 'foo.txt' and 'bar.txt', and checks if both files have been created in the temporary directory.
</div>
<div class="chat-transcript" markdown="1">
#### /run pytest tests/test_commands.py
> ```
> ============================= test session starts ==============================
> platform darwin -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0
> rootdir: /Users/gauthier/Projects/aider
> collected 1 item
>
> tests/test_commands.py F [100%]
>
> =================================== FAILURES ===================================
> __________________________ TestCommands.test_cmd_add ___________________________
>
> self = <tests.test_commands.TestCommands testMethod=test_cmd_add>
>
> def test_cmd_add(self):
> # Create a temporary directory and change the current working directory
> with tempfile.TemporaryDirectory() as tmpdir:
> os.chdir(tmpdir)
>
> # Initialize the Commands and InputOutput objects
> io = InputOutput(pretty=False, yes=True)
> coder = MagicMock()
> commands = Commands(io, coder)
>
> # Call the cmd_add method with 'foo.txt' and 'bar.txt'
> > commands.cmd_add(["foo.txt", "bar.txt"])
>
> /Users/gauthier/Projects/aider/tests/test_commands.py:20:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self = <aider.commands.Commands object at 0x10b412b50>
> args = ['foo.txt', 'bar.txt']
>
> def cmd_add(self, args):
> "Add matching files to the chat session"
>
> added_fnames = []
> files = self.coder.get_all_relative_files()
> > for word in args.split():
> E AttributeError: 'list' object has no attribute 'split'
>
> /Users/gauthier/Projects/aider/aider/commands.py:161: AttributeError
> =========================== short test summary info ============================
> FAILED tests/test_commands.py::TestCommands::test_cmd_add - AttributeError: '...
> ============================== 1 failed in 0.34s ===============================
> ```
>
> Add the output to the chat? [y/n] (y): y
It seems that the `cmd_add` method in `aider/commands.py` is expecting a string as input, but in the test, we are passing a list. I will update the test to pass a string instead of a list.
Here's the code change:
```python
tests/test_commands.py
<<<<<<< ORIGINAL
# Call the cmd_add method with 'foo.txt' and 'bar.txt'
commands.cmd_add(["foo.txt", "bar.txt"])
=======
# Call the cmd_add method with 'foo.txt' and 'bar.txt' as a single string
commands.cmd_add("foo.txt bar.txt")
>>>>>>> UPDATED
```
Now, run the test again with the updated test method:
```
pytest tests/test_commands.py
```
> Applied edit to tests/test_commands.py
> Commit d1551b4 aider: Updated test to pass a string instead of a list when calling `cmd_add` method.
</div>
<div class="chat-transcript" markdown="1">
#### /run pytest tests/test_commands.py
> ```
> ============================= test session starts ==============================
> platform darwin -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0
> rootdir: /Users/gauthier/Projects/aider
> collected 1 item
>
> tests/test_commands.py . [100%]
>
> ============================== 1 passed in 0.38s ===============================
> ```
</div>

@@ -0,0 +1,29 @@
---
parent: Example chat transcripts
---
# Editing an asciinema cast file with aider
The user needs to make some tedious edits to the escape sequences in an `asciinema` screencast file.
<div class="chat-transcript" markdown="1">
> $ aider hello.cast
> Added hello.cast to the chat
#### when the "hello.py> " prompt is displayed, it's including some escape codes to add a few blank lines below the prompt and then bringing the cursor back up. remove all that. just show the prompt followed by a space and leave the cursor showing right after that.
```python
hello.cast
<<<<<<< ORIGINAL
[19.534013, "o", "\u001b[?2004h\u001b[?1l\u001b[?25l\u001b[0m\u001b[?7l\u001b[0m\u001b[J\u001b[0;38;5;28mhello.py> \u001b[0m\r\r\n\r\r\n\r\r\n\u001b[0;38;5;28m \r\u001b[64C \r\u001b[3A\u001b[10C\u001b[?7h\u001b[0m\u001b[?12l\u001b[?25h"]
=======
[19.534013, "o", "\u001b[?2004h\u001b[?1l\u001b[?25l\u001b[0m\u001b[?7l\u001b[0m\u001b[J\u001b[0;38;5;28mhello.py> \u001b[0m\u001b[?7h\u001b[0m\u001b[?12l\u001b[?25h"]
>>>>>>> UPDATED
```
> Applied edit to hello.cast
> Commit a3fa4d2 aider: Removed escape codes to display prompt and cursor correctly.
</div>

website/examples/census.md Normal file
@@ -0,0 +1,326 @@
---
parent: Example chat transcripts
---
# Download, analyze and plot US Census data
Inspired by
[Ethan Mollick's recent blog article about ChatGPT plugins](https://www.oneusefulthing.org/p/it-is-starting-to-get-strange),
I used `aider` and GPT-4 to do data analysis of some US Census data.
I asked aider how to download census data and to suggest some
hypotheses to test. I asked it to write the code to get the data, test one of the
hypotheses and provide a summary and plot of the results.
<div class="chat-transcript" markdown="1">
> $ aider
> Using git repo: .git
#### what is a good basic us census dataset to analyze for interesting patterns?
A good basic US Census dataset to analyze for interesting patterns is the American Community Survey (ACS) dataset. The ACS is an ongoing survey conducted by the US Census Bureau that provides vital information on a yearly basis about the US population and its characteristics. The dataset includes information on demographics, social, economic, and housing characteristics.
You can find the ACS dataset on the US Census Bureau's website: https://www.census.gov/programs-surveys/acs/data.html
Some interesting patterns to analyze in the ACS dataset could include:
1. Population growth and decline in different regions.
2. Changes in racial and ethnic composition.
3. Income inequality and poverty levels.
4. Educational attainment and its relationship with income.
5. Housing affordability and homeownership rates.
Remember to consider the dataset's documentation and guidelines to understand the data and its limitations.
</div>
<div class="chat-transcript" markdown="1">
#### do you know how to download American Community Survey data?
Yes, I can guide you on how to download American Community Survey (ACS) data. You can use the Census Bureau's API to access the data. To do this, you'll need an API key, which you can obtain from the Census Bureau's website.
Here's a Python example using the `requests` library to download ACS data:
1. Install the `requests` library if you haven't already:
```
pip install requests
```
2. Use the following code to download ACS data:
```python
import requests
# Replace 'your_api_key' with your actual API key
api_key = 'your_api_key'
# Define the API endpoint and parameters
url = 'https://api.census.gov/data/2019/acs/acs5'
params = {
'get': 'NAME,B01001_001E', # Variables to retrieve (NAME: Geographic area name, B01001_001E: Total population)
'for': 'state:*', # Geographic area (all states)
'key': api_key
}
# Send the request and parse the response
response = requests.get(url, params=params)
data = response.json()
# Print the data
for row in data:
print(row)
```
Replace `'your_api_key'` with the API key you obtained from the Census Bureau's website. This example retrieves the total population (variable `B01001_001E`) for all states. You can modify the `get` and `for` parameters to download different variables and geographic areas.
For more information on available variables and geographic areas, refer to the ACS documentation:
- Variables: https://api.census.gov/data/2019/acs/acs5/variables.html
- Geographic Areas: https://api.census.gov/data/2019/acs/acs5/geography.html
</div>
<div class="chat-transcript" markdown="1">
#### how do i get an api key?
To obtain an API key from the Census Bureau, follow these steps:
1. Go to the Census Bureau's API key request page: https://api.census.gov/data/key_signup.html
2. Fill out the form with your name, email address, organization (if applicable), and a brief description of your intended use of the API.
3. Click the "Request Key" button.
4. You will receive an email containing your API key. It may take a few minutes for the email to arrive. Make sure to check your spam folder if you don't see it in your inbox.
Once you have your API key, you can use it in your Python code to access the American Community Survey data, as shown in the previous example. Replace `'your_api_key'` with the API key you received in the email.
</div>
<div class="chat-transcript" markdown="1">
#### please save your download script as `getdata.py`
Here's the `getdata.py` file with the download script:
```python
new_file/getdata.py
<<<<<<< ORIGINAL
=======
import requests
# Replace 'your_api_key' with your actual API key
api_key = 'your_api_key'
# Define the API endpoint and parameters
url = 'https://api.census.gov/data/2019/acs/acs5'
params = {
'get': 'NAME,B01001_001E', # Variables to retrieve (NAME: Geographic area name, B01001_001E: Total population)
'for': 'state:*', # Geographic area (all states)
'key': api_key
}
# Send the request and parse the response
response = requests.get(url, params=params)
data = response.json()
# Print the data
for row in data:
print(row)
>>>>>>> UPDATED
```
Save this code in a new file named `getdata.py`. Remember to replace `'your_api_key'` with your actual API key obtained from the Census Bureau.
> Allow creation of new file new_file/getdata.py? y
> Add new_file/getdata.py to git? y
> Applied edit to new_file/getdata.py
> Commit afcb9d1 aider: Added `getdata.py` file with download script.
</div>
<div class="chat-transcript" markdown="1">
#### add my api key: 4ae31ed2a0f0dd62609e51cd95deadbeefe7774f
I will update the API key in the file.
```python
new_file/getdata.py
<<<<<<< ORIGINAL
api_key = 'your_api_key'
=======
api_key = '4ae31ed2a0f0dd62609e51cd95deadbeefe7774f'
>>>>>>> UPDATED
```
> Applied edit to new_file/getdata.py
> Commit 1d8b4a6 aider: Updated the API key in new_file/getdata.py.
</div>
<div class="chat-transcript" markdown="1">
#### what are some interesting hypothesis that you might be able to test using ACS data?
Here are some interesting hypotheses that you can test using ACS (American Community Survey) data:
1. Relationship between income and education level: Higher education levels lead to higher income.
2. Impact of population density on housing prices: Higher population density areas have higher housing prices.
3. Relationship between employment rate and crime rate: Higher employment rates lead to lower crime rates.
4. Effect of public transportation availability on car ownership: Areas with better public transportation have lower car ownership rates.
5. Relationship between age distribution and healthcare facilities: Areas with a higher percentage of elderly population have more healthcare facilities.
These are just a few examples. ACS data provides a wealth of information on various demographic, social, economic, and housing characteristics, allowing you to explore many different hypotheses.
</div>
<div class="chat-transcript" markdown="1">
#### can you download the data to test #2 Impact of population density on housing prices: Higher population density areas have higher housing prices.
As an AI language model, I cannot directly download or access data. However, I can help you with the code to download the required data for testing hypothesis #2 using the ACS data.
To test the hypothesis, you'll need to obtain population density and housing price data. You can use the following ACS variables:
- B01003_001E: Total population
- B25077_001E: Median value of owner-occupied housing units
Here's the updated code to fetch the required data:
```python
new_file/getdata.py
<<<<<<< ORIGINAL
'get': 'NAME,B01001_001E', # Variables to retrieve (NAME: Geographic area name, B01001_001E: Total population)
=======
'get': 'NAME,B01003_001E,B25077_001E', # Variables to retrieve (NAME: Geographic area name, B01003_001E: Total population, B25077_001E: Median value of owner-occupied housing units)
>>>>>>> UPDATED
```
After running the updated code, you'll have the population and housing price data for each geographic area. You can then analyze the data to test the hypothesis.
> Applied edit to new_file/getdata.py
> Commit 81549ab aider: Updated code to fetch population and housing price data for testing hypothesis #2.
</div>
<div class="chat-transcript" markdown="1">
#### instead of printing all the data, print a summary table that shows whether the data supports the hypothesis
To print a summary table, you can use the pandas library to analyze the data and calculate the correlation between population density and housing prices. Here's how to modify the code:
1. Import pandas.
2. Convert the JSON data to a pandas DataFrame.
3. Calculate the correlation between population density and housing prices.
4. Print the correlation as a summary table.
Here are the changes to the code:
```python
new_file/getdata.py
<<<<<<< ORIGINAL
import requests
=======
import requests
import pandas as pd
>>>>>>> UPDATED
```
```python
new_file/getdata.py
<<<<<<< ORIGINAL
data = response.json()
=======
data = response.json()
# Convert the JSON data to a pandas DataFrame
df = pd.DataFrame(data[1:], columns=data[0])
df[['B01003_001E', 'B25077_001E']] = df[['B01003_001E', 'B25077_001E']].apply(pd.to_numeric)
>>>>>>> UPDATED
```
```python
new_file/getdata.py
<<<<<<< ORIGINAL
for row in data:
print(row)
=======
# Calculate the correlation between population density and housing prices
correlation = df['B01003_001E'].corr(df['B25077_001E'])
# Print the correlation as a summary table
print("Summary Table:")
print("Correlation between population density and housing prices:", correlation)
>>>>>>> UPDATED
```
After applying these changes, the code will print a summary table showing the correlation between population density and housing prices. A positive correlation value indicates that higher population density areas have higher housing prices, supporting the hypothesis.
> Applied edit to new_file/getdata.py
> Applied edit to new_file/getdata.py
> Applied edit to new_file/getdata.py
> Commit 2f4fcd4 aider: Added pandas library to analyze data and calculate correlation between population density and housing prices, and printed the correlation as a summary table.
</div>
<div class="chat-transcript" markdown="1">
#### /run python3 new_file/getdata.py
>```
>Summary Table:
>Correlation between population density and housing prices: 0.16020298445808737
>```
>
> Add the output to the chat? y
Great! The script seems to be working as expected. The correlation between population density and housing prices is 0.1602. If you need any further assistance or modifications to the code, please let me know.
</div>
<div class="chat-transcript" markdown="1">
#### can you also plot a graph which would show any correlation?
To plot a graph showing the correlation between population density and housing prices, we'll need to modify the `new_file/getdata.py` file. We'll use the `matplotlib` library to create the plot. If you don't have it installed, you can install it using `pip install matplotlib`.
Here are the changes to make:
1. Import the `matplotlib.pyplot` module.
2. Create a scatter plot of population density (x-axis) and housing prices (y-axis).
3. Add labels to the axes and a title to the plot.
4. Display the plot.
Here's the updated code:
```python
new_file/getdata.py
<<<<<<< ORIGINAL
import requests
import pandas as pd
=======
import requests
import pandas as pd
import matplotlib.pyplot as plt
>>>>>>> UPDATED
...
new_file/getdata.py
<<<<<<< ORIGINAL
print("Summary Table:")
print("Correlation between population density and housing prices:", correlation)
=======
print("Summary Table:")
print("Correlation between population density and housing prices:", correlation)
# Create a scatter plot of population density and housing prices
plt.scatter(df['B01003_001E'], df['B25077_001E'])
plt.xlabel('Population Density')
plt.ylabel('Housing Prices')
plt.title('Correlation between Population Density and Housing Prices')
plt.show()
>>>>>>> UPDATED
```
This will create a scatter plot of population density and housing prices, which will help visualize the correlation between the two variables.
> Applied edit to new_file/getdata.py
> Applied edit to new_file/getdata.py
> Commit 9ff3666 aider: Added scatter plot of population density and housing prices to visualize correlation.
</div>
![aider screenshot](../assets/figure.png)

@@ -0,0 +1,451 @@
---
parent: Example chat transcripts
---
# Improve css styling of chat transcripts
<div class="chat-transcript" markdown="1">
> $ aider index.html assets/css/style.css
> Added index.html to the chat
> Added assets/css/style.css to the chat
#### add a rounded border to all sides of the boxquote
```css
assets/css/style.css
<<<<<<< ORIGINAL
.main-content blockquote{padding:0 1rem;margin-left:0;color:#819198;border-left:0.3rem solid #dce6f0}
=======
.main-content blockquote{
padding: 0 1rem;
margin
```
> ^C KeyboardInterrupt
> ^C again to quit
</div>
<div class="chat-transcript" markdown="1">
#### make all the edits to the embedded style tag in the html
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border-left-color: #159957;
}
.btn {
display: inline-block;
width: auto;
}
.btn + .btn {
margin-top: 0;
}
</style>
=======
<style>
.main-content blockquote {
border-left-color: #159957;
border: 0.3rem solid #dce6f0;
border-radius: 0.5rem;
}
.btn {
display: inline-block;
width: auto;
}
.btn + .btn {
margin-top: 0;
}
</style>
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit e7a5cb5 aider: Added rounded border to all sides of the boxquote.
</div>
<div class="chat-transcript" markdown="1">
#### make the blockquote font color the same as the p color
> Git repo has uncommitted changes.
> Commit before the chat proceeds [y/n/commit message]? Added note about missing method and incorrect method calls in index.html.
> Commit 2d31f5b Added note about missing method and incorrect method calls in index.html.
```css
assets/css/style.css
<<<<<<< ORIGINAL
.main-content blockquote{padding:0 1rem;margin-left:0;color:#819198;border-left
```
> ^C KeyboardInterrupt
#### only edit index.html
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
}
.btn {
display: inline-block;
width: auto;
}
.btn + .btn {
margin-top: 0;
}
</style>
=======
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
color: #606c71;
}
.btn {
display: inline-block;
width: auto;
}
.btn + .btn {
margin-top: 0;
}
</style>
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit 8645217 aider: Changed blockquote font color to match <p> color in index.html.
</div>
<div class="chat-transcript" markdown="1">
#### make the blockquote background a lighter shade of the same green as the border
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
}
=======
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
}
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit 57b7b10 aider: Changed blockquote background color to a lighter shade of green.
</div>
<div class="chat-transcript" markdown="1">
#### add some space inside the border of the blockquote
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
}
=======
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
padding: 1rem;
}
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit dd7dcb6 aider: Added padding inside the border of the blockquote.
</div>
<div class="chat-transcript" markdown="1">
#### can we add a label "aider" along the top of the blockquote box only using style
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
padding: 1rem;
}
</style>
=======
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
padding: 1rem;
position: relative;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -0.5rem;
left: 1rem;
background-color: #e0f2ef;
padding: 0 0.5rem;
font-weight: bold;
color: #159957;
}
</style>
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit 0b0e332 aider: Added label "aider" to the top of the blockquote box using style.
</div>
<div class="chat-transcript" markdown="1">
#### add more padding on the top of the box and bring the label into the box so it doesn't intersect the border
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
padding: 1rem;
position: relative;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -0.5rem;
left: 1rem;
background-color: #e0f2ef;
padding: 0 0.5rem;
font-weight: bold;
color: #159957;
}
</style>
=======
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
padding: 1.5rem 1rem 1rem 1rem;
position: relative;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: 0.5rem;
left: 1rem;
background-color: #e0f2ef;
padding: 0 0.5rem;
font-weight: bold;
color: #159957;
}
</style>
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit eee902e aider: Added more padding on the top of the box and brought the label into the box so it doesn't intersect the border.
</div>
<div class="chat-transcript" markdown="1">
#### change all the blockquote colors to be shades of blue instead of green. choose the blue based on the blue in the header gradient
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #159957;
border-radius: 0.5rem;
background-color: #e0f2ef;
padding: 0.75rem;
position: relative;
margin-top: 2rem;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -1.75rem;
left: -0.5rem;
padding: 0 0.5rem;
font-weight: bold;
color: #159957;
}
</style>
=======
<style>
.main-content blockquote {
border: 0.1rem solid #155799;
border-radius: 0.5rem;
background-color: #e0ecf4;
padding: 0.75rem;
position: relative;
margin-top: 2rem;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -1.75rem;
left: -0.5rem;
padding: 0 0.5rem;
font-weight: bold;
color: #155799;
}
</style>
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit a26c135 aider: Changed blockquote colors to shades of blue based on the blue in the header gradient.
</div>
<div class="chat-transcript" markdown="1">
#### can you style the "aider" label on the blockquote as a "tab" that docks onto the top of the box?
```html
index.html
<<<<<<< ORIGINAL
.main-content blockquote {
border: 0.1rem solid #155799;
border-radius: 0.5rem;
background-color: #e0ecf4;
padding: 0.75rem;
position: relative;
margin-top: 2rem;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -1.75rem;
left: -0.5rem;
padding: 0 0.5rem;
font-weight: bold;
color: #155799;
}
=======
.main-content blockquote {
border: 0.1rem solid #155799;
border-radius: 0.5rem;
background-color: #e0ecf4;
padding: 0.75rem;
position: relative;
margin-top: 2rem;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -0.5rem;
left: 0;
padding: 0.25rem 0.5rem;
font-weight: bold;
color: #155799;
background-color: #e0ecf4;
border-top-left-radius: 0.5rem;
border-top-right-radius: 0.5rem;
border: 0.1rem solid #155799;
border-bottom: none;
}
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit 292eda3 aider: Styled the "aider" label on the blockquote as a "tab" that docks onto the top of the box.
</div>
<div class="chat-transcript" markdown="1">
#### apply the same box around the h4 element
```html
index.html
<<<<<<< ORIGINAL
<style>
.main-content blockquote {
border: 0.1rem solid #155799;
border-radius: 0.5rem;
background-color: #e0ecf4;
padding: 1.25rem 0.75rem 0.75rem 0.75rem;
position: relative;
margin-top: 2rem;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -1rem;
left: 0.5;
padding: 0rem 0.5rem;
font-weight: bold;
color: #155799;
background-color: #e0ecf4;
border: 0.05rem solid #155799;
border-radius: 0.5rem;
}
</style>
=======
<style>
.main-content blockquote, .main-content h4 {
border: 0.1rem solid #155799;
border-radius: 0.5rem;
background-color: #e0ecf4;
padding: 1.25rem 0.75rem 0.75rem 0.75rem;
position: relative;
margin-top: 2rem;
}
.main-content blockquote::before {
content: "aider";
display: inline-block;
position: absolute;
top: -1rem;
left: 0.5;
padding: 0rem 0.5rem;
font-weight: bold;
color: #155799;
background-color: #e0ecf4;
border: 0.05rem solid #155799;
border-radius: 0.5rem;
}
</style>
>>>>>>> UPDATED
```
> Applied edit to index.html
> Commit 6a8cd27 aider: Added box around h4 element.
</div>

View file

@@ -0,0 +1,371 @@
---
parent: Example chat transcripts
nav_order: 3
---
# A complex multi-file change, with debugging
This is a chat transcript of a fairly complex code change. The user is trying to update tests to use an input mock provided by `prompt_toolkit`. This requires changes to multiple source files.
Some things to notice:
- The initial changes don't work, so the user provides error messages and snippets from the `prompt_toolkit` docs to collaboratively debug and solve the problem.
- The user edits a file outside the chat using their editor. `aider` notices these out-of-band changes and offers to commit them.
- At one point the user hits `^C` to interrupt the LLM response, since it isn't useful. The user gives clarifications and gets a better result on the next reply.
<div class="chat-transcript" markdown="1">
> $ aider tests/test_main.py aider/getinput.py
> Added tests/test_main.py to the chat
> Added aider/getinput.py to the chat
#### instead of using /dev/null to send no input in the main tests, can we use the prompt_toolkit input mocking functions?
Yes, we can use the `prompt_toolkit.input` module to mock the input in the tests. Here's how to modify the `tests/test_main.py` file:
```python
tests/test_main.py
<<<<<<< ORIGINAL
import os
import sys
import tempfile
from unittest import TestCase
from aider.main import main
import subprocess
=======
import os
import tempfile
from unittest import TestCase
from aider.main import main
import subprocess
from prompt_toolkit.input import create_pipe_input
from prompt_toolkit.output import DummyOutput
>>>>>>> UPDATED
```
Now, update the tests to use the `create_pipe_input` and `DummyOutput`:
```python
tests/test_main.py
<<<<<<< ORIGINAL
with open(os.devnull, "r") as dev_null:
save_stdin = sys.stdin
sys.stdin = dev_null
main([])
sys.stdin = save_stdin
=======
pipe_input = create_pipe_input()
save_stdin = sys.stdin
sys.stdin = pipe_input
main([], input=pipe_input, output=DummyOutput())
sys.stdin = save_stdin
pipe_input.close()
>>>>>>> UPDATED
```
```python
tests/test_main.py
<<<<<<< ORIGINAL
with open(os.devnull, "r") as dev_null:
save_stdin = sys.stdin
sys.stdin = dev_null
main(["foo.txt"])
sys.stdin = save_stdin
self.assertTrue(os.path.exists("foo.txt"))
=======
pipe_input = create_pipe_input()
save_stdin = sys.stdin
sys.stdin = pipe_input
main(["foo.txt"], input=pipe_input, output=DummyOutput())
sys.stdin = save_stdin
pipe_input.close()
self.assertTrue(os.path.exists("foo.txt"))
>>>>>>> UPDATED
```
```python
tests/test_main.py
<<<<<<< ORIGINAL
with open(os.devnull, "r") as dev_null:
save_stdin = sys.stdin
sys.stdin = dev_null
main(["--yes", "foo.txt"])
sys.stdin = save_stdin
self.assertTrue(os.path.exists("foo.txt"))
=======
pipe_input = create_pipe_input()
save_stdin = sys.stdin
sys.stdin = pipe_input
main(["--yes", "foo.txt"], input=pipe_input, output=DummyOutput())
sys.stdin = save_stdin
pipe_input.close()
self.assertTrue(os.path.exists("foo.txt"))
>>>>>>> UPDATED
```
These changes will replace the usage of `/dev/null` with `prompt_toolkit`'s input mocking functions.
> Applied edit to tests/test_main.py
> Applied edit to tests/test_main.py
> Applied edit to tests/test_main.py
> Applied edit to tests/test_main.py
> Commit c177e29 aider: Replaced usage of /dev/null with prompt_toolkit's input mocking functions in tests.
> Git repo has uncommitted changes.
> Suggested commit message:
> Added import statement for sys module in test_main.py.
> Commit before the chat proceeds [y/n/commit message]? y
> Commit 6724fcd Added import statement for sys module in test_main.py.
</div>
<div class="chat-transcript" markdown="1">
#### /add aider/main.py
> Added aider/main.py to the chat
#### I am getting this error: FAILED tests/test_main.py::TestMain::test_main_with_empty_git_dir_new_file - TypeError: main() got an unexpected keyword argument 'input'
I apologize for the confusion. It seems I made a mistake in the `main()` function definition in `aider/main.py`. The `main()` function should accept `input` and `output` keyword arguments. Here's the fix:
```python
aider/main.py
<<<<<<< ORIGINAL
def main(args=None):
if args is None:
args = sys.argv[1:]
=======
def main(args=None, input=None, output=None):
if args is None:
args = sys.argv[1:]
if input is None:
input = sys.stdin
if output is None:
output = sys.stdout
>>>>>>> UPDATED
```
Now, the `main()` function will accept the `input` and `output` keyword arguments, and the tests should pass.
> Applied edit to aider/main.py
> Commit 79dfba9 aider: Fixed main() function to accept input and output keyword arguments.
</div>
<div class="chat-transcript" markdown="1">
#### we need to pass those into InputOutput and to prompt_toolkit right?
Yes, you're right. We need to pass the `input` and `output` arguments to the `InputOutput` class and then to the `prompt_toolkit.prompt()` function. Here's how to modify the `aider/getinput.py` and `aider/main.py` files:
In `aider/getinput.py`, update the `InputOutput` class to accept `input` and `output` arguments:
```python
aider/getinput.py
<<<<<<< ORIGINAL
class InputOutput:
def __init__(self, pretty, yes, input_history_file, chat_history_file):
=======
class InputOutput:
def __init__(self, pretty, yes, input_history_file, chat_history_file, input=None, output=None):
self.input = input
self.output = output
>>>>>>> UPDATED
```
Now, update the `prompt()` function calls in the `get_input()` method to use the `input` and `output` arguments:
```python
aider/getinput.py
<<<<<<< ORIGINAL
line = prompt(
show,
completer=completer_instance,
history=FileHistory(self.input_history_file),
style=style,
reserve_space_for_menu=4,
complete_style=CompleteStyle.MULTI_COLUMN,
)
=======
line = prompt(
show,
completer=completer_instance,
history=FileHistory(self.input_history_file),
style=style,
reserve_space_for_menu=4,
complete_style=CompleteStyle.MULTI_COLUMN,
input=self.input,
output=self.output,
)
>>>>>>> UPDATED
```
In `aider/main.py`, pass the `input` and `output` arguments to the `InputOutput` class:
```python
aider/main.py
<<<<<<< ORIGINAL
io = InputOutput(args.pretty, args.yes, args.input_history_file, args.chat_history_file)
=======
io = InputOutput(args.pretty, args.yes, args.input_history_file, args.chat_history_file, input=input, output=output)
>>>>>>> UPDATED
```
These changes will ensure that the `input` and `output` arguments are passed to the `InputOutput` class and then to the `prompt_toolkit.prompt()` function.
> Applied edit to aider/getinput.py
> Applied edit to aider/getinput.py
> Applied edit to aider/main.py
> Commit cea9d2a aider: Updated InputOutput class to accept input and output arguments and passed them to prompt_toolkit.prompt() function.
</div>
<div class="chat-transcript" markdown="1">
#### FAILED tests/test_main.py::TestMain::test_main_with_empty_git_dir_new_file - TypeError: prompt() got an unexpected keyword argument 'input'
I apologize for the confusion. It seems I made a mistake in the `prompt_toolkit.prompt()` function call. The `prompt()` function accepts `stdin` and `stdout` arguments instead of `input` and `output`. Here's the fix:
Update the `prompt()` function call in the `get_input()` method in `aider/getinput.py`:
```python
aider/getinput.py
<<<<<<< ORIGINAL
input=self.input,
output=self.output,
=======
stdin=self.input,
stdout=self.output,
>>>>>>> UPDATED
```
These changes will ensure that the `input` and `output` arguments are passed correctly to the `prompt_toolkit.prompt()` function.
> Applied edit to aider/getinput.py
> Commit 55d3540 aider: Fixed `prompt()` function call in `get_input()` method to pass `stdin` and `stdout` arguments instead of `input` and `output`.
</div>
<div class="chat-transcript" markdown="1">
#### prompt_toolkit.shortcuts.prompt(message: AnyFormattedText | None = None, *, history: History | None = None, editing_mode: EditingMode | None = None, refresh_interval: float | None = None, vi_mode: bool | None = None, lexer: Lexer | None = None, completer: Completer | None = None, complete_in_thread: bool | None = None, is_password: bool | None = None, key_bindings: KeyBindingsBase | None = None, bottom_toolbar: AnyFormattedText | None = None, style: BaseStyle | None = None, color_depth: ColorDepth | None = None, cursor: AnyCursorShapeConfig = None, include_default_pygments_style: FilterOrBool | None = None, style_transformation: StyleTransformation | None = None, swap_light_and_dark_colors: FilterOrBool | None = None, rprompt: AnyFormattedText | None = None, multiline: FilterOrBool | None = None, prompt_continuation: PromptContinuationText | None = None, wrap_lines: FilterOrBool | None = None, enable_history_search: FilterOrBool | None = None, search_ignore_case: FilterOrBool | None = None, complete_while_typing: FilterOrBool | None = None, validate_while_typing: FilterOrBool | None = None, complete_style: CompleteStyle | None = None, auto_suggest: AutoSuggest | None = None, validator: Validator | None = None, clipboard: Clipboard | None = None, mouse_support: FilterOrBool | None = None, input_processors: list[Processor] | None = None, placeholder: AnyFormattedText | None = None, reserve_space_for_menu: int | None = None, enable_system_prompt: FilterOrBool | None = None, enable_suspend: FilterOrBool | None = None, enable_open_in_editor: FilterOrBool | None = None, tempfile_suffix: str | Callable[[], str] | None = None, tempfile: str | Callable[[], str] | None = None, default: str = '', accept_default: bool = False, pre_run: Callable[[], None] | None = None) → str
I apologize for the confusion earlier. It seems that the `prompt_toolkit.prompt()` function does not accept `stdin` and `stdout` arguments directly. Instead, we need to use the `Application` class from `prompt_toolkit` to create a custom application with the desired input and output. Here's how to modify the `aider/getinput.py` file:
First, import the necessary classes and functions from `prompt_tool
> ^C KeyboardInterrupt
</div>
<div class="chat-transcript" markdown="1">
#### class prompt_toolkit.shortcuts.PromptSession(message: AnyFormattedText = '', *, multiline: FilterOrBool = False, wrap_lines: FilterOrBool = True, is_password: FilterOrBool = False, vi_mode: bool = False, editing_mode: EditingMode = <EditingMode.EMACS: 'EMACS'>, complete_while_typing: FilterOrBool = True, validate_while_typing: FilterOrBool = True, enable_history_search: FilterOrBool = False, search_ignore_case: FilterOrBool = False, lexer: Lexer | None = None, enable_system_prompt: FilterOrBool = False, enable_suspend: FilterOrBool = False, enable_open_in_editor: FilterOrBool = False, validator: Validator | None = None, completer: Completer | None = None, complete_in_thread: bool = False, reserve_space_for_menu: int = 8, complete_style: CompleteStyle = <CompleteStyle.COLUMN: 'COLUMN'>, auto_suggest: AutoSuggest | None = None, style: BaseStyle | None = None, style_transformation: StyleTransformation | None = None, swap_light_and_dark_colors: FilterOrBool = False, color_depth: ColorDepth | None = None, cursor: AnyCursorShapeConfig = None, include_default_pygments_style: FilterOrBool = True, history: History | None = None, clipboard: Clipboard | None = None, prompt_continuation: PromptContinuationText | None = None, rprompt: AnyFormattedText = None, bottom_toolbar: AnyFormattedText = None, mouse_support: FilterOrBool = False, input_processors: list[Processor] | None = None, placeholder: AnyFormattedText | None = None, key_bindings: KeyBindingsBase | None = None, erase_when_done: bool = False, tempfile_suffix: str | Callable[[], str] | None = '.txt', tempfile: str | Callable[[], str] | None = None, refresh_interval: float = 0, input: Input | None = None, output: Output | None = None)
Thank you for providing the `PromptSession` class information. We can use the `PromptSession` class to create a custom session with the desired input and output. Here's how to modify the `aider/getinput.py` file:
First, import the `PromptSession` class from `prompt_toolkit.shortcuts`:
```python
aider/getinput.py
<<<<<<< ORIGINAL
from prompt_toolkit import prompt
=======
from prompt_toolkit.shortcuts import PromptSession
>>>>>>> UPDATED
```
Now, update the `get_input()` method to use the `PromptSession` class:
```python
aider/getinput.py
<<<<<<< ORIGINAL
line = prompt(
show,
completer=completer_instance,
history=FileHistory(self.input_history_file),
style=style,
reserve_space_for_menu=4,
complete_style=CompleteStyle.MULTI_COLUMN,
input=self.input,
output=self.output,
)
=======
session = PromptSession(
message=show,
completer=completer_instance,
history=FileHistory(self.input_history_file),
style=style,
reserve_space_for_menu=4,
complete_style=CompleteStyle.MULTI_COLUMN,
input=self.input,
output=self.output,
)
line = session.prompt()
>>>>>>> UPDATED
```
These changes will ensure that the `input` and `output` arguments are passed correctly to the `prompt_toolkit`'s `PromptSession` class.
> Applied edit to aider/getinput.py
> Applied edit to aider/getinput.py
> Commit ac92cca aider: Replaced `prompt()` with `PromptSession()` to correctly pass input and output arguments.
</div>
<div class="chat-transcript" markdown="1">
#### try using create_input(StringIO('')) in the tests
I understand that you want to use `create_input(StringIO(''))` in the tests. However, I need more information about which test cases you want to apply this change to and how you want to use it. Please provide more details or examples.
#### the ones that currently use create_pipe_input()
I see that you want to replace `create_pipe_input()` with `create_input(StringIO(''))` in the test cases. Here are the changes for the `tests/test_main.py` file:
```python
tests/test_main.py
<<<<<<< ORIGINAL
from prompt_toolkit.input import create_pipe_input
=======
from prompt_toolkit.input import create_input
from io import StringIO
>>>>>>> UPDATED
```
```python
tests/test_main.py
<<<<<<< ORIGINAL
pipe_input = create_pipe_input()
=======
pipe_input = create_input(StringIO(''))
>>>>>>> UPDATED
```
```python
tests/test_main.py
<<<<<<< ORIGINAL
pipe_input = create_pipe_input()
=======
pipe_input = create_input(StringIO(''))
>>>>>>> UPDATED
```
```python
tests/test_main.py
<<<<<<< ORIGINAL
pipe_input = create_pipe_input()
=======
pipe_input = create_input(StringIO(''))
>>>>>>> UPDATED
```
These changes will replace the usage of `create_pipe_input()` with `create_input(StringIO(''))` in the test cases.
> Applied edit to tests/test_main.py
> Applied edit to tests/test_main.py
> Applied edit to tests/test_main.py
> Applied edit to tests/test_main.py
> Commit 4bb043f aider: Replaced the usage of `create_pipe_input()` with `create_input(StringIO(''))` in the test cases.
</div>
<div class="chat-transcript" markdown="1">
#### nice, that worked!
I'm glad the changes worked for you! If you have any more requests or need further assistance, feel free to ask.
</div>
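The pattern the chat converges on, a `PromptSession` built with an explicit `input` and `output`, can be exercised on its own. Here is a minimal sketch (not part of the transcript) that feeds scripted keystrokes to a `PromptSession` via `create_pipe_input()` and `DummyOutput`. It assumes a prompt_toolkit 3.x release where `create_pipe_input()` is a context manager.
```python
# Minimal sketch (not from the transcript) of driving a PromptSession with
# scripted input, the same idea the tests above rely on.
# Assumes prompt_toolkit 3.x, where create_pipe_input() is a context manager.
from prompt_toolkit.input import create_pipe_input
from prompt_toolkit.output import DummyOutput
from prompt_toolkit.shortcuts import PromptSession


def read_scripted_line(text: str) -> str:
    with create_pipe_input() as pipe_input:
        pipe_input.send_text(text + "\n")
        session = PromptSession(input=pipe_input, output=DummyOutput())
        return session.prompt("> ")


if __name__ == "__main__":
    assert read_scripted_line("hello") == "hello"
```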

View file

@@ -0,0 +1,86 @@
---
parent: Example chat transcripts
---
# Complete a css exercise with aider
In this chat transcript, the user asks about the open source [CSS Exercises repo](https://github.com/TheOdinProject/css-exercises) and completes an animation exercise.
<div class="chat-transcript" markdown="1">
> $ aider
#### what is this repo?
Based on the file list provided, it seems like this repository contains a collection of CSS and HTML exercises, solutions, and examples. The exercises cover various topics such as Flexbox, Grid, Animation, Foundations, and Margin & Padding. The repository also includes images and other assets used in the exercises.
</div>
<div class="chat-transcript" markdown="1">
#### /add animation/03-dropdown-menu/README.md
> Added animation/03-dropdown-menu/README.md to the chat
Yes, I see the content of the `animation/03-dropdown-menu/README.md` file. It describes an exercise to add animation to a dropdown menu, creating a bounce illusion when the dropdown expands close to its final end state. The desired outcome is shown in a GIF file, and the hints provided suggest using the `transform-origin` property and adding an intermediate step to the keyframe at-rule to achieve the bounce effect.
Is there any specific change or improvement you would like me to make to this file or any other file in the repository?
</div>
<div class="chat-transcript" markdown="1">
#### can you make the changes suggested in the readme?
To make the changes suggested in the `animation/03-dropdown-menu/README.md`, I need to see the contents of the following files:
1. `animation/03-dropdown-menu/style.css`
2. `animation/03-dropdown-menu/script.js`
Please provide the contents of these files so I can suggest the necessary changes.
> animation/03-dropdown-menu/style.css
> animation/03-dropdown-menu/script.js
> Add these files to the chat? y
Apologies for the oversight. I see the content of the `animation/03-dropdown-menu/style.css` and `animation/03-dropdown-menu/script.js` files now. I will suggest the necessary changes to add the bounce animation to the dropdown menu.
```css
animation/03-dropdown-menu/style.css
<<<<<<< ORIGINAL
.visible {
display: block;
}
=======
.visible {
display: block;
animation: expand 0.3s ease-out;
transform-origin: top;
}
@keyframes expand {
0% {
transform: scaleY(0);
}
80% {
transform: scaleY(1.1);
}
100% {
transform: scaleY(1);
}
}
>>>>>>> UPDATED
```
These changes add the bounce animation to the dropdown menu by creating a new keyframe animation called `expand`. The `transform-origin` property is set to `top` to make the dropdown menu start transforming from the top. The intermediate step at 80% of the animation creates the bounce illusion.
> Applied edit to animation/03-dropdown-menu/style.css
> Commit bb3a813 aider: Added bounce animation to dropdown menu.
</div>
<div class="chat-transcript" markdown="1">
#### that worked!
I'm glad the changes worked for you! If you have any more requests or need assistance with other parts of the code, feel free to ask. I'm here to help!
</div>

View file

@@ -0,0 +1,134 @@
---
parent: Example chat transcripts
nav_order: 1
---
# Create a simple flask app with aider
In this chat transcript, the user asks to make a simple flask app with various endpoints.
<div class="chat-transcript" markdown="1">
> $ aider app.py
> Creating empty file app.py
> Added app.py to the chat
#### make a flask app with a /hello endpoint that returns hello world
To create a Flask app with a `/hello` endpoint that returns "Hello, World!", we need to install Flask, import it, create an app instance, and define a route for the `/hello` endpoint. Here's how to update the `app.py` file:
```python
app.py
<<<<<<< ORIGINAL
=======
from flask import Flask
app = Flask(__name__)
@app.route('/hello')
def hello():
return "Hello, World!"
if __name__ == '__main__':
app.run()
>>>>>>> UPDATED
```
Make sure to install Flask in your project environment using `pip install flask` before running the app.
> Applied edit to app.py
> Commit 414c394 aider: Added a `/hello` endpoint that returns "Hello, World!" to the Flask app.
</div>
<div class="chat-transcript" markdown="1">
#### add an endpoint like /add/3/5 which returns the sum of the 2 numbers
To add an endpoint like `/add/3/5` that returns the sum of the two numbers, you can create a new route with two URL parameters. Update the `app.py` file as follows:
```python
app.py
<<<<<<< ORIGINAL
@app.route('/hello')
def hello():
return "Hello, World!"
=======
@app.route('/add/<int:num1>/<int:num2>')
def add(num1, num2):
return str(num1 + num2)
@app.route('/hello')
def hello():
return "Hello, World!"
>>>>>>> UPDATED
```
Now, when you visit `/add/3/5`, it will return the sum of the two numbers, which is `8`.
> Applied edit to app.py
> Commit 0872c51 aider: Added an endpoint `/add/<int:num1>/<int:num2>` that returns the sum of the two numbers.
</div>
<div class="chat-transcript" markdown="1">
#### add /fibonacci/X
To add a `/fibonacci/X` endpoint that returns the Xth number in the Fibonacci sequence, you can create a new route with a URL parameter for X and a function to calculate the Fibonacci number. Update the `app.py` file as follows:
```python
app.py
<<<<<<< ORIGINAL
@app.route('/hello')
def hello():
return "Hello, World!"
=======
def fibonacci(n):
if n <= 0:
return "Invalid input. Please enter a positive integer."
elif n == 1:
return 0
elif n == 2:
return 1
else:
a, b = 0, 1
for _ in range(n - 2):
a, b = b, a + b
return b
@app.route('/fibonacci/<int:x>')
def fibonacci_x(x):
return str(fibonacci(x))
@app.route('/hello')
def hello():
return "Hello, World!"
>>>>>>> UPDATED
```
Now, when you visit `/fibonacci/X`, it will return the Xth number in the Fibonacci sequence. For example, `/fibonacci/5` will return `3`.
> Applied edit to app.py
> Commit 163fba8 aider: Added a `/fibonacci/X` endpoint that returns the Xth number in the Fibonacci sequence.
</div>
<div class="chat-transcript" markdown="1">
#### remove the hello endpoint
To remove the `/hello` endpoint, you can simply delete the corresponding route and function in the `app.py` file. Here's the update:
```python
app.py
<<<<<<< ORIGINAL
@app.route('/hello')
def hello():
return "Hello, World!"
=======
>>>>>>> UPDATED
```
Now, the `/hello` endpoint has been removed from the Flask app.
> Applied edit to app.py
> Commit eab6620 aider: Removed the `/hello` endpoint from the Flask app.
</div>
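For reference, this is a sketch of how `app.py` plausibly looks after the edits above are applied in order: the `/add` and `/fibonacci` endpoints remain, and the `/hello` endpoint has been removed. It is assembled from the edit blocks, not copied from a repo.
```python
# Sketch of app.py after applying the edits above in order
# (assembled from the edit blocks, not copied from a repo).
from flask import Flask

app = Flask(__name__)


@app.route('/add/<int:num1>/<int:num2>')
def add(num1, num2):
    return str(num1 + num2)


def fibonacci(n):
    if n <= 0:
        return "Invalid input. Please enter a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        a, b = 0, 1
        for _ in range(n - 2):
            a, b = b, a + b
        return b


@app.route('/fibonacci/<int:x>')
def fibonacci_x(x):
    return str(fibonacci(x))


if __name__ == '__main__':
    app.run()
```
Running it with `python app.py` and then visiting `/add/3/5` returns `8`, matching the transcript.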

35
website/examples/hello.md Normal file
View file

@@ -0,0 +1,35 @@
---
parent: Example chat transcripts
---
# Hello aider!
Here is the simplest possible "hello world" example of using aider:
<br/>
<br/>
<br/>
<div class="chat-transcript" markdown="1">
#### change hello to goodbye
To change the output from "hello" to "goodbye", we need to modify the `hello.py` file.
Here is the change:
```python
hello.py
<<<<<<< ORIGINAL
print("hello")
=======
print("goodbye")
>>>>>>> UPDATED
```
> Applied edit to hello.py
> Commit 672ae42 aider: Changed output from "hello" to "goodbye" in hello.py.
</div>
<br/>
<br/>
<br/>

View file

@@ -0,0 +1,167 @@
---
parent: Example chat transcripts
---
# Honor the NO_COLOR environment variable
The user pastes the specification of the NO_COLOR environment variable protocol
from [no-color.org](https://no-color.org) into the aider chat.
Aider figures out which files need to be edited, and asks permission to make changes.
It then disables all the color and pretty-printing functions if NO_COLOR is
set, so the tool conforms to the specification.
The user then asks for a test case, which aider adds as well.
<div class="chat-transcript" markdown="1">
> $ aider
> Using git repo: .git
> Use /help to see in-chat commands.
#### Command-line software which adds ANSI color to its output by default should check for a NO_COLOR environment variable that, when present and not an empty string (regardless of its value), prevents the addition of ANSI color.
To implement this feature, we need to modify the `aider/io.py` file. Please make it *read-write* so I can provide the necessary changes.
> aider/io.py
> Add these files to the chat? y
Great! Now, I will make the necessary changes to check for the `NO_COLOR` environment variable and disable ANSI color when it is present and not an empty string.
```python
aider/io.py
<<<<<<< ORIGINAL
import os
=======
import os
import os.path
>>>>>>> UPDATED
```
```python
aider/io.py
<<<<<<< ORIGINAL
def __init__(
self,
pretty=True,
yes=False,
input_history_file=None,
chat_history_file=None,
input=None,
output=None,
user_input_color="blue",
tool_output_color=None,
tool_error_color="red",
):
self.user_input_color = user_input_color
self.tool_output_color = tool_output_color
self.tool_error_color = tool_error_color
self.input = input
self.output = output
self.pretty = pretty
self.yes = yes
self.input_history_file = input_history_file
if chat_history_file is not None:
self.chat_history_file = Path(chat_history_file)
else:
self.chat_history_file = None
if pretty:
self.console = Console()
else:
self.console = Console(force_terminal=True, no_color=True)
=======
def __init__(
self,
pretty=True,
yes=False,
input_history_file=None,
chat_history_file=None,
input=None,
output=None,
user_input_color="blue",
tool_output_color=None,
tool_error_color="red",
):
no_color = os.environ.get("NO_COLOR")
if no_color is not None and no_color != "":
pretty = False
self.user_input_color = user_input_color if pretty else None
self.tool_output_color = tool_output_color if pretty else None
self.tool_error_color = tool_error_color if pretty else None
self.input = input
self.output = output
self.pretty = pretty
self.yes = yes
self.input_history_file = input_history_file
if chat_history_file is not None:
self.chat_history_file = Path(chat_history_file)
else:
self.chat_history_file = None
self.console = Console(force_terminal=not pretty, no_color=not pretty)
>>>>>>> UPDATED
```
These changes will make the software check for the `NO_COLOR` environment variable and disable ANSI color when it is present and not an empty string.
> Applied edit to aider/io.py
> Applied edit to aider/io.py
> Commit 3218242 aider: Added check for NO_COLOR environment variable to disable ANSI color.
</div>
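The heart of the change is the NO_COLOR convention itself: any value that is present and non-empty disables color, regardless of what the value is. A standalone sketch of that check (an illustration, not aider's actual code):
```python
# Sketch of the NO_COLOR convention described above: color is disabled
# whenever NO_COLOR is set to any non-empty string. Not aider's actual code.
import os


def color_enabled() -> bool:
    no_color = os.environ.get("NO_COLOR")
    return no_color is None or no_color == ""
```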
<div class="chat-transcript" markdown="1">
#### add a test
To add a test for the `NO_COLOR` environment variable, we will create a new test function in the `tests/test_io.py` file. Since this file does not exist, we will create it first.
```python
tests/test_io.py
<<<<<<< ORIGINAL
=======
import os
import unittest
from unittest.mock import patch
from aider.io import InputOutput
class TestInputOutput(unittest.TestCase):
def test_no_color_environment_variable(self):
with patch.dict(os.environ, {"NO_COLOR": "1"}):
io = InputOutput()
self.assertFalse(io.pretty)
if __name__ == "__main__":
unittest.main()
>>>>>>> UPDATED
```
This test function checks if the `pretty` attribute of the `InputOutput` class is set to `False` when the `NO_COLOR` environment variable is set.
> Allow creation of new file tests/test_io.py? y
> Add tests/test_io.py to git? y
> Applied edit to tests/test_io.py
> Commit 52a9df5 aider: Added a test function to check if the `pretty` attribute of the `InputOutput` class is set to `False` when the `NO_COLOR` environment variable is set.
</div>
<div class="chat-transcript" markdown="1">
#### /run pytest
> ============================= test session starts ==============================
> platform darwin -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0
> rootdir: /Users/gauthier/Projects/aider
> collected 25 items
>
> tests/test_coder.py ...... [ 24%]
> tests/test_commands.py . [ 28%]
> tests/test_io.py . [ 32%]
> tests/test_main.py .... [ 48%]
> tests/test_repomap.py ... [ 60%]
> tests/test_utils.py .......... [100%]
>
> ============================== 25 passed in 1.41s ==============================
>
> Add the output to the chat? y
</div>

315
website/examples/pong.md Normal file
View file

@@ -0,0 +1,315 @@
---
parent: Example chat transcripts
---
# Build pong with aider and pygame
In this chat transcript, the user asks to make a pong game using pygame.
After setting up a basic game, the user asks for some customization of the colors and game mechanics.
<div class="chat-transcript" markdown="1">
> $ aider
#### can you make a simple game of pong with pygame?
Sure, I can help you create a simple Pong game using Pygame. First, let's break down the steps needed to create the game:
1. Install Pygame library.
2. Initialize Pygame and create a game window.
3. Create Paddle and Ball classes.
4. Initialize the paddles and the ball.
5. Create the game loop.
6. Handle user input for paddle movement.
7. Update the ball position and check for collisions.
8. Draw the paddles and ball on the screen.
9. Check for scoring and update the score.
10. Display the score on the screen.
Now, let's start with the code. I'll provide you with the code changes in an *EDIT BLOCK*. Since we are creating a new file, I'll name it `pong_game.py`.
```python
pong_game.py
<<<<<<< ORIGINAL
=======
import pygame
import sys
# Initialize Pygame
pygame.init()
# Set up the game window
screen_width = 640
screen_height = 480
screen = pygame.display.set_mode((screen_width, screen_height))
pygame.display.set_caption("Pong")
# Game loop
running = True
while running:
for event in pygame.event.get():
if event.type == pygame.QUIT:
running = False
# Update game objects
# Draw game objects
# Update the display
pygame.display.flip()
# Quit Pygame
pygame.quit()
sys.exit()
>>>>>>> UPDATED
```
This is the basic structure of the game. Now, let's add the Paddle and Ball classes, handle user input, update the ball position, draw the objects, and handle scoring. I'll provide the code changes in the next message.
> Allow creation of new file pong_game.py? y
> Add pong_game.py to git? y
> Applied edit to pong_game.py
> Commit 14799cb aider: Added basic Pygame initialization and game loop structure to `pong_game.py`.
</div>
<div class="chat-transcript" markdown="1">
#### go ahead
Now, let's add the Paddle and Ball classes, handle user input, update the ball position, draw the objects, and handle scoring. I'll provide the code changes in an *EDIT BLOCK*.
```python
pong_game.py
<<<<<<< ORIGINAL
import pygame
import sys
# Initialize Pygame
pygame.init()
# Set up the game window
screen_width = 640
screen_height = 480
screen = pygame.display.set_mode((screen_width, screen_height))
pygame.display.set_caption("Pong")
=======
import pygame
import sys
import random
# Initialize Pygame
pygame.init()
# Set up the game window
screen_width = 640
screen_height = 480
screen = pygame.display.set_mode((screen_width, screen_height))
pygame.display.set_caption("Pong")
# Paddle and Ball classes
class Paddle(pygame.sprite.Sprite):
def __init__(self, x, y):
super().__init__()
self.width = 10
self.height = 60
self.image = pygame.Surface((self.width, self.height))
self.image.fill((255, 255, 255))
self.rect = self.image.get_rect()
self.rect.x = x
self.rect.y = y
def move(self, dy):
if self.rect.y + dy >= 0 and self.rect.y + dy <= screen_height - self.height:
self.rect.y += dy
class Ball(pygame.sprite.Sprite):
def __init__(self):
super().__init__()
self.width = 10
self.height = 10
self.image = pygame.Surface((self.width, self.height))
self.image.fill((255, 255, 255))
self.rect = self.image.get_rect()
self.rect.x = screen_width // 2 - self.width // 2
self.rect.y = screen_height // 2 - self.height // 2
self.dx = random.choice([-1, 1]) * 4
self.dy = random.choice([-1, 1]) * 4
def update(self):
self.rect.x += self.dx
self.rect.y += self.dy
# Collision with top and bottom
if self.rect.y <= 0 or self.rect.y >= screen_height - self.height:
self.dy = -self.dy
# Scoring
if self.rect.x <= 0 or self.rect.x >= screen_width - self.width:
self.reset()
def reset(self):
self.rect.x = screen_width // 2 - self.width // 2
self.rect.y = screen_height // 2 - self.height // 2
self.dx = random.choice([-1, 1]) * 4
self.dy = random.choice([-1, 1]) * 4
def collide_with(self, paddle):
if self.rect.colliderect(paddle.rect):
self.dx = -self.dx
>>>>>>> UPDATED
```
Now, let's initialize the paddles and the ball, handle user input, update the ball position, draw the objects, and handle scoring.
```python
pong_game.py
<<<<<<< ORIGINAL
# Game loop
running = True
while running:
for event in pygame.event.get():
if event.type == pygame.QUIT:
running = False
# Update game objects
# Draw game objects
# Update the display
pygame.display.flip()
=======
# Initialize paddles and ball
left_paddle = Paddle(10, screen_height // 2 - 30)
right_paddle = Paddle(screen_width - 20, screen_height // 2 - 30)
ball = Ball()
# Game loop
running = True
while running:
for event in pygame.event.get():
if event.type == pygame.QUIT:
running = False
# Handle user input
keys = pygame.key.get_pressed()
if keys[pygame.K_w]:
left_paddle.move(-5)
if keys[pygame.K_s]:
left_paddle.move(5)
if keys[pygame.K_UP]:
right_paddle.move(-5)
if keys[pygame.K_DOWN]:
right_paddle.move(5)
# Update game objects
ball.update()
ball.collide_with(left_paddle)
ball.collide_with(right_paddle)
# Draw game objects
screen.fill((0, 0, 0))
screen.blit(left_paddle.image, left_paddle.rect)
screen.blit(right_paddle.image, right_paddle.rect)
screen.blit(ball.image, ball.rect)
# Update the display
pygame.display.flip()
pygame.time.delay(16)
>>>>>>> UPDATED
```
Now, you should have a simple Pong game using Pygame. You can further improve the game by adding a scoring system and displaying the score on the screen.
> Applied edit to pong_game.py
> Applied edit to pong_game.py
> Commit b20bcee aider: Added Paddle and Ball classes, handled user input, updated ball position, drew objects, and handled scoring.
</div>
<div class="chat-transcript" markdown="1">
#### nice, it works. but the ball moves way too fast.
I'm glad it works! Let's slow down the ball by reducing its speed. We can do this by changing the values of `self.dx` and `self.dy` in the `Ball` class.
```python
pong_game.py
<<<<<<< ORIGINAL
self.dx = random.choice([-1, 1]) * 4
self.dy = random.choice([-1, 1]) * 4
=======
self.dx = random.choice([-1, 1]) * 2
self.dy = random.choice([-1, 1]) * 2
>>>>>>> UPDATED
```
This change will reduce the ball's speed by half. You can adjust the values further if you want the ball to move even slower.
> Applied edit to pong_game.py
> Commit f4ceed4 aider: Reduced ball speed by half.
</div>
<div class="chat-transcript" markdown="1">
#### can you make the paddles a bit bigger? and make one paddle red and one blue?
Sure, let's make the paddles bigger and change their colors. We can do this by modifying the `Paddle` class in the `pong_game.py` file.
```python
pong_game.py
<<<<<<< ORIGINAL
self.width = 10
self.height = 60
self.image = pygame.Surface((self.width, self.height))
self.image.fill((255, 255, 255))
=======
self.width = 15
self.height = 80
self.image = pygame.Surface((self.width, self.height))
self.color = (255, 255, 255) if x < screen_width // 2 else (0, 0, 255)
self.image.fill(self.color)
>>>>>>> UPDATED
```
Now, let's change the color of the left paddle to red.
```python
pong_game.py
<<<<<<< ORIGINAL
left_paddle = Paddle(10, screen_height // 2 - 30)
=======
left_paddle = Paddle(10, screen_height // 2 - 40)
left_paddle.color = (255, 0, 0)
left_paddle.image.fill(left_paddle.color)
>>>>>>> UPDATED
```
With these changes, the paddles will be larger, and the left paddle will be red while the right paddle will be blue.
> Applied edit to pong_game.py
> Applied edit to pong_game.py
> Commit 2c645c9 aider: Changed paddle size and color, and set left paddle to red.
</div>
<div class="chat-transcript" markdown="1">
#### each time the ball bounces off the paddle, can you make it go 10% faster?
Sure, let's increase the ball's speed by 10% each time it bounces off a paddle. We can do this by modifying the `collide_with` method in the `Ball` class.
```python
pong_game.py
<<<<<<< ORIGINAL
def collide_with(self, paddle):
if self.rect.colliderect(paddle.rect):
self.dx = -self.dx
=======
def collide_with(self, paddle):
if self.rect.colliderect(paddle.rect):
self.dx = -self.dx * 1.1
self.dy *= 1.1
>>>>>>> UPDATED
```
With this change, the ball's speed will increase by 10% in both the x and y directions each time it bounces off a paddle.
> Applied edit to pong_game.py
> Commit 693faf4 aider: Increased ball speed by 10% in both x and y directions each time it bounces off a paddle.
</div>
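One consequence of this change is that the speed compounds: with the base speed of 2 set earlier, the ball moves about `2 * 1.1**n` pixels per frame after n paddle bounces. A tiny sketch (not part of the game code) of that growth:
```python
# Rough growth of the ball speed from the 10% per-bounce increase above,
# starting from the base speed of 2 px/frame (sketch, not game code).
for bounces in (0, 10, 20):
    print(bounces, round(2 * 1.1 ** bounces, 1))  # 0 2.0, 10 5.2, 20 13.5
```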

View file

@@ -0,0 +1,59 @@
---
parent: Example chat transcripts
---
# Semantic search & replace code with aider
In this chat transcript, the user asks to modify a series of function calls.
Notice that aider isn't doing a simple exact string replacement.
Making the change requires dealing with various formatting and semantic differences in the calls that it is updating.
Some edits were elided from this transcript for brevity.
<div class="chat-transcript" markdown="1">
> $ aider aider/coder.py
> Added aider/coder.py to the chat
#### replace all the self.console.print() calls that contain [red] with calls to self.io.tool_error() with [red] removed from the string
```python
aider/coder.py
<<<<<<< ORIGINAL
self.console.print(
f"[red]This tool will almost certainly fail to work with {main_model}"
)
=======
self.io.tool_error(f"This tool will almost certainly fail to work with {main_model}")
>>>>>>> UPDATED
```
```python
aider/coder.py
<<<<<<< ORIGINAL
self.console.print("[red]Files are not in a git repo.")
=======
self.io.tool_error("Files are not in a git repo.")
>>>>>>> UPDATED
```
```python
aider/coder.py
<<<<<<< ORIGINAL
self.console.print("[red]Malformed ORIGINAL/UPDATE blocks, retrying...")
self.console.print("[red]", Text(err))
=======
self.io.tool_error("Malformed ORIGINAL/UPDATE blocks, retrying...")
self.io.tool_error(Text(err))
>>>>>>> UPDATED
```
Please note that I couldn't find the `self.io.tool_error()` method in the provided code. You may need to create this method in the appropriate class or provide the correct method name to replace the `self.console.print()` calls.
> Applied edit to aider/coder.py
> Applied edit to aider/coder.py
> Applied edit to aider/coder.py
> Commit 919cfc2 aider: Replaced all self.console.print() calls with self.io.tool_error() calls with [red] removed from the string.
</div>
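As the reply notes, `self.io.tool_error()` did not yet exist in the code that was provided. A hypothetical helper along these lines would satisfy the calls that were generated (a sketch under that assumption, not aider's actual implementation):
```python
# Hypothetical sketch of an io.tool_error() helper assumed by the edits above;
# aider's real InputOutput class may differ.
from rich.console import Console


class InputOutput:
    def __init__(self):
        self.console = Console()

    def tool_error(self, message=""):
        # Print in red, matching the old self.console.print("[red]...") calls.
        self.console.print(message, style="red")
```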

Some files were not shown because too many files have changed in this diff.