Merge branch 'main' into call-graph

2025-05-31 17:55:01 +00:00 · 2023-05-26 17:07:17 -07:00 · 2023-05-26 17:07:17 -07:00 · 1e1feeaa21
commit 1e1feeaa21
parent 599a3b2730 c33169756a
9 changed files with 222 additions and 92 deletions
--- a/README.md
+++ b/README.md
@@ -33,10 +33,10 @@ You can find more chat transcripts on the [examples page](https://aider.chat/exa
 * `aider` will apply the edits suggested by GPT-4 directly to your source files.
 * `aider` will automatically commit each changeset to your local git repo with a descriptive commit message. These frequent, automatic commits provide a safety net. It's easy to undo `aider` changes or use standard git workflows to manage longer sequences of changes.
 * `aider` can review multiple source files at once and make coordinated code changes across all of them in a single changeset/commit.
-* `aider` gives GPT a
+* `aider` can give GPT a
 [map of your entire git repo](https://aider.chat/docs/ctags.html),
-so it can ask for permission to review whichever files seem relevant to your requests.
+which helps it understand and modify large codebases.
-* You can also edit the files using your editor while chatting with `aider`.
+* You can edit the files by hand using your editor while chatting with `aider`.
  * `aider` will notice if you edit the files outside the chat.
  * It will help you commit these out-of-band changes, if you'd like.
  * It will bring the updated file contents into the chat.
@@ -49,8 +49,11 @@ so it can ask for permission to review whichever files seem relevant to your req
 1. Install the package:
    * From GitHub: `pip install git+https://github.com/paul-gauthier/aider.git`
    * From your local copy of the repo in develop mode to pick up local edits immediately: `pip install -e .` 
 2. Set up your OpenAI API key as an environment variable `OPENAI_API_KEY` or by including it in a `.env` file.
 3. Optionally, install [universal ctags](https://github.com/universal-ctags/ctags). This is helpful if you plan to work with repositories with more than a handful of files.  This allows `aider --ctags` to build a [map of your entire git repo](https://aider.chat/docs/ctags.html) and share it with GPT to help it better understand and modify large codebases.
 ## Usage
 Run the `aider` tool by executing the following command:
@@ -65,19 +68,40 @@ You can also just launch `aider` anywhere in a git repo without naming files on
 It will discover all the files in the repo.
 You can then add and remove individual files in the chat session with the `/add` and `/drop` chat commands described below.
-You can also use additional command-line options to customize the behavior of the tool. The following options are available, along with their corresponding environment variable overrides:
+You can also use additional command-line options, environment variables or configuration file
 to set many options:
- `--input-history-file INPUT_HISTORY_FILE`: Specify the chat input history file (default: .aider.input.history). Override the default with the environment variable `AIDER_INPUT_HISTORY_FILE`.
+```
- `--chat-history-file CHAT_HISTORY_FILE`: Specify the chat history file (default: .aider.chat.history.md). Override the default with the environment variable `AIDER_CHAT_HISTORY_FILE`.
+  -h, --help            show this help message and exit
- `--model MODEL`: Specify the model to use for the main chat (default: gpt-4). Override the default with the environment variable `AIDER_MODEL`.
+  -c CONFIG_FILE, --config CONFIG_FILE
- `-3`: Use gpt-3.5-turbo model for the main chat (not advised). No environment variable override.
+                        Specify the config file (default: search for
- `--ctags`: Add ctags to the chat to help GPT understand the codebase (default: False, `AIDER_CTAGS`). Requires [universal ctags](https://github.com/universal-ctags/ctags). Override the default with the environment variable `AIDER_CTAGS`.
+                        .aider.conf.yml in git root or home directory)
- `--no-pretty`: Disable pretty, colorized output. Override the default with the environment variable `AIDER_PRETTY` (default: 1 for enabled, 0 for disabled).
+  --input-history-file INPUT_HISTORY_FILE
- `--no-auto-commits`: Disable auto commit of changes. Override the default with the environment variable `AIDER_AUTO_COMMITS` (default: 1 for enabled, 0 for disabled).
+                        Specify the chat input history file (default:
- `--show-diffs`: Show diffs when committing changes (default: False). Override the default with the environment variable `AIDER_SHOW_DIFFS` (default: 0 for False, 1 for True).
+                        .aider.input.history) [env var: AIDER_INPUT_HISTORY_FILE]
- `--yes`: Always say yes to every confirmation (default: False).
+  --chat-history-file CHAT_HISTORY_FILE
-
+                        Specify the chat history file (default:
-For more information, run `aider --help`.
+                        .aider.chat.history.md) [env var: AIDER_CHAT_HISTORY_FILE]
  --model MODEL         Specify the model to use for the main chat (default: gpt-4)
                        [env var: AIDER_MODEL]
  -3                    Use gpt-3.5-turbo model for the main chat (not advised)
  --pretty              Enable pretty, colorized output (default: True) [env var:
                        AIDER_PRETTY]
  --no-pretty           Disable pretty, colorized output
  --apply FILE          Apply the changes from the given file instead of running
                        the chat (debug)
  --auto-commits        Enable auto commit of changes (default: True) [env var:
                        AIDER_AUTO_COMMIT]
  --no-auto-commits     Disable auto commit of changes
  --dry-run             Perform a dry run without applying changes (default: False)
  --show-diffs          Show diffs when committing changes (default: False) [env
                        var: AIDER_SHOW_DIFFS]
  --ctags [CTAGS]       Add ctags to the chat to help GPT understand the codebase
                        (default: check for ctags executable) [env var:
                        AIDER_CTAGS]
  --yes                 Always say yes to every confirmation
  -v, --verbose         Enable verbose output
 ```
 ## Chat commands
--- a/aider/coder.py
+++ b/aider/coder.py
@@ -168,6 +168,9 @@ class Coder:
        if self.abs_fnames:
            files_content = prompts.files_content_prefix
            files_content += self.get_files_content()
        else:
            files_content = prompts.files_no_full_files
        all_content += files_content
        other_files = set(self.get_all_abs_files()) - set(self.abs_fnames)
@@ -204,7 +207,7 @@ class Coder:
                self.num_control_c += 1
                if self.num_control_c >= 2:
                    break
-                self.io.tool_error("^C again to quit")
+                self.io.tool_error("^C again or /exit to quit")
            except EOFError:
                return
--- a/aider/commands.py
+++ b/aider/commands.py
@@ -1,8 +1,8 @@
 import sys
 import os
 import git
 import subprocess
 import shlex
 from rich.prompt import Confirm
 from prompt_toolkit.completion import Completion
 from aider import prompts
@@ -16,20 +16,8 @@ class Commands:
        if inp[0] == "/":
            return True
    def help(self):
        "Show help about all commands"
        commands = self.get_commands()
        for cmd in commands:
            cmd_method_name = f"cmd_{cmd[1:]}"
            cmd_method = getattr(self, cmd_method_name, None)
            if cmd_method:
                description = cmd_method.__doc__
                self.io.tool(f"{cmd} {description}")
            else:
                self.io.tool(f"{cmd} No description available.")
    def get_commands(self):
-        commands = ["/help"]
+        commands = []
        for attr in dir(self):
            if attr.startswith("cmd_"):
                commands.append("/" + attr[4:])
@@ -62,15 +50,15 @@ class Commands:
        all_commands = self.get_commands()
        matching_commands = [cmd for cmd in all_commands if cmd.startswith(first_word)]
        if len(matching_commands) == 1:
            if matching_commands[0] == "/help":
                self.help()
            else:
            return self.do_run(matching_commands[0][1:], rest_inp)
        elif len(matching_commands) > 1:
-            self.io.tool_error("Ambiguous command: ', '.join(matching_commands)}")
+            self.io.tool_error(f"Ambiguous command: {', '.join(matching_commands)}")
        else:
            self.io.tool_error(f"Error: {first_word} is not a valid command.")
    # any method called cmd_xxx becomes a command automatically.
    # each one must take an args param.
    def cmd_commit(self, args):
        "Commit edits to the repo made outside the chat (commit message optional)"
@@ -251,6 +239,10 @@ class Commands:
        )
        return msg
    def cmd_exit(self, args):
        "Exit the application"
        sys.exit()
    def cmd_ls(self, args):
        "List all known files and those included in the chat session"
@@ -274,3 +266,15 @@ class Commands:
            self.io.tool("\nRepo files not in the chat:\n")
        for file in other_files:
            self.io.tool(f"  {file}")
    def cmd_help(self, args):
        "Show help about all commands"
        commands = sorted(self.get_commands())
        for cmd in commands:
            cmd_method_name = f"cmd_{cmd[1:]}"
            cmd_method = getattr(self, cmd_method_name, None)
            if cmd_method:
                description = cmd_method.__doc__
                self.io.tool(f"{cmd} {description}")
            else:
                self.io.tool(f"{cmd} No description available.")
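The `get_commands` and `cmd_help` changes above rely on a naming convention: any method called `cmd_xxx` becomes a `/xxx` command, and a unique prefix is enough to run it. A minimal standalone sketch of that convention follows. It assumes a hypothetical `MiniCommands` class that returns strings instead of writing through `self.io`, so it is an illustration of the pattern rather than the code in this commit:

```python
class MiniCommands:
    """Hypothetical stand-in for the Commands class touched by this diff."""

    def cmd_exit(self, args):
        "Exit the application"
        return "exit"

    def cmd_ls(self, args):
        "List all known files"
        return "ls"

    def cmd_help(self, args):
        "Show help about all commands"
        lines = []
        for cmd in sorted(self.get_commands()):
            method = getattr(self, f"cmd_{cmd[1:]}", None)
            description = method.__doc__ if method else "No description available."
            lines.append(f"{cmd} {description}")
        return "\n".join(lines)

    def get_commands(self):
        # Every cmd_* attribute is exposed as a slash command, /help included.
        return ["/" + attr[4:] for attr in dir(self) if attr.startswith("cmd_")]

    def run(self, inp):
        words = inp.strip().split()
        if not words:
            return "Error: empty command."
        first_word, rest = words[0], " ".join(words[1:])
        matches = [cmd for cmd in self.get_commands() if cmd.startswith(first_word)]
        if len(matches) == 1:
            # Dispatch to the single matching cmd_* method.
            return getattr(self, f"cmd_{matches[0][1:]}")(rest)
        if len(matches) > 1:
            return f"Ambiguous command: {', '.join(matches)}"
        return f"Error: {first_word} is not a valid command."
```

Because `/help` is discovered like every other `cmd_*` method, the dispatcher needs no special case for it, which appears to be the motivation for moving `help` to `cmd_help`.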
--- a/aider/main.py
+++ b/aider/main.py
@@ -1,47 +1,85 @@
 import os
 import sys
-import argparse
+import git
 import configargparse
 from dotenv import load_dotenv
 from aider.coder import Coder
 from aider.io import InputOutput
 def get_git_root():
    try:
        repo = git.Repo(search_parent_directories=True)
        return repo.working_tree_dir
    except git.InvalidGitRepositoryError:
        return None
 def main(args=None, input=None, output=None):
    if args is None:
        args = sys.argv[1:]
    load_dotenv()
    env_prefix = "AIDER_"
-    parser = argparse.ArgumentParser(description="aider - chat with GPT about your code")
+
    default_config_files = [
        os.path.expanduser("~/.aider.conf.yml"),
    ]
    git_root = get_git_root()
    if git_root:
        default_config_files.insert(0, os.path.join(git_root, ".aider.conf.yml"))
    parser = configargparse.ArgumentParser(
        description="aider - chat with GPT about your code",
        add_config_file_help=True,
        default_config_files=default_config_files,
        config_file_parser_class=configargparse.YAMLConfigFileParser,
    )
    parser.add_argument(
        "-c",
        "--config",
        is_config_file=True,
        metavar="CONFIG_FILE",
        help=(
            "Specify the config file (default: search for .aider.conf.yml in git root or home"
            " directory)"
        ),
    )
    parser.add_argument(
        "files",
        metavar="FILE",
        nargs="*",
        help="a list of source code files (optional)",
    )
    default_input_history_file = (
        os.path.join(git_root, ".aider.input.history") if git_root else ".aider.input.history"
    )
    default_chat_history_file = (
        os.path.join(git_root, ".aider.chat.history.md") if git_root else ".aider.chat.history.md"
    )
    parser.add_argument(
        "--input-history-file",
        metavar="INPUT_HISTORY_FILE",
-        default=os.environ.get(f"{env_prefix}INPUT_HISTORY_FILE", ".aider.input.history"),
+        env_var=f"{env_prefix}INPUT_HISTORY_FILE",
-        help=(
+        default=default_input_history_file,
-            "Specify the chat input history file (default: .aider.input.history,"
+        help=f"Specify the chat input history file (default: {default_input_history_file})",
            f" ${env_prefix}INPUT_HISTORY_FILE)"
        ),
    )
    parser.add_argument(
        "--chat-history-file",
        metavar="CHAT_HISTORY_FILE",
-        default=os.environ.get(f"{env_prefix}CHAT_HISTORY_FILE", ".aider.chat.history.md"),
+        env_var=f"{env_prefix}CHAT_HISTORY_FILE",
-        help=(
+        default=default_chat_history_file,
-            "Specify the chat history file (default: .aider.chat.history.md,"
+        help=f"Specify the chat history file (default: {default_chat_history_file})",
            f" ${env_prefix}CHAT_HISTORY_FILE)"
        ),
    )
    parser.add_argument(
        "--model",
        metavar="MODEL",
-        default=os.environ.get(f"{env_prefix}MODEL", "gpt-4"),
+        env_var=f"{env_prefix}MODEL",
-        help=f"Specify the model to use for the main chat (default: gpt-4, ${env_prefix}MODEL)",
+        default="gpt-4",
        help="Specify the model to use for the main chat (default: gpt-4)",
    )
    parser.add_argument(
        "-3",
@@ -50,24 +88,38 @@ def main(args=None, input=None, output=None):
        const="gpt-3.5-turbo",
        help="Use gpt-3.5-turbo model for the main chat (not advised)",
    )
    parser.add_argument(
        "--pretty",
        action="store_true",
        env_var=f"{env_prefix}PRETTY",
        default=True,
        help="Enable pretty, colorized output (default: True)",
    )
    parser.add_argument(
        "--no-pretty",
        action="store_false",
        dest="pretty",
-        help=f"Disable pretty, colorized output (${env_prefix}PRETTY)",
+        help="Disable pretty, colorized output",
        default=bool(int(os.environ.get(f"{env_prefix}PRETTY", 1))),
    )
    parser.add_argument(
        "--apply",
        metavar="FILE",
        help="Apply the changes from the given file instead of running the chat (debug)",
    )
    parser.add_argument(
        "--auto-commits",
        action="store_true",
        env_var=f"{env_prefix}AUTO_COMMIT",
        default=True,
        help="Enable auto commit of changes (default: True)",
    )
    parser.add_argument(
        "--no-auto-commits",
        action="store_false",
-        dest="auto_commits",
+        dest="auto_commit",
-        help=f"Disable auto commit of changes (${env_prefix}AUTO_COMMITS)",
+        help="Disable auto commit of changes",
        default=bool(int(os.environ.get(f"{env_prefix}AUTO_COMMITS", 1))),
    )
    parser.add_argument(
        "--dry-run",
@@ -78,17 +130,21 @@ def main(args=None, input=None, output=None):
    parser.add_argument(
        "--show-diffs",
        action="store_true",
-        help=f"Show diffs when committing changes (default: False, ${env_prefix}SHOW_DIFFS)",
+        env_var=f"{env_prefix}SHOW_DIFFS",
-        default=bool(int(os.environ.get(f"{env_prefix}SHOW_DIFFS", 0))),
+        help="Show diffs when committing changes (default: False)",
        default=False,
    )
    parser.add_argument(
        "--ctags",
-        action="store_true",
+        type=lambda x: (str(x).lower() == "true"),
        nargs="?",
        const=True,
        default=None,
        env_var=f"{env_prefix}CTAGS",
        help=(
-            "Add ctags to the chat to help GPT understand the codebase (default: False,"
+            "Add ctags to the chat to help GPT understand the codebase (default: check for ctags"
-            f" ${env_prefix}CTAGS)"
+            " executable)"
        ),
        default=bool(int(os.environ.get(f"{env_prefix}CTAGS", 0))),
    )
    parser.add_argument(
        "--yes",
@@ -97,7 +153,8 @@ def main(args=None, input=None, output=None):
        default=False,
    )
    parser.add_argument(
-        "-v", "--verbose",
+        "-v",
        "--verbose",
        action="store_true",
        help="Enable verbose output",
        default=False,
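The paired `--pretty` / `--no-pretty` flags and the tri-state `--ctags [CTAGS]` option in this file can be tried in isolation. The sketch below uses the standard library's `argparse` instead of `configargparse`, so it deliberately leaves out `env_var` and config file handling. The `build_parser` helper is a hypothetical name, not something defined in `aider/main.py`:

```python
import argparse


def build_parser():
    """Hypothetical helper that isolates the flag patterns from this diff."""
    parser = argparse.ArgumentParser(description="flag sketch")

    # Paired flags write to the same destination, so the last one given wins.
    parser.add_argument("--pretty", action="store_true", dest="pretty", default=True)
    parser.add_argument("--no-pretty", action="store_false", dest="pretty")

    # Tri-state option:
    #   absent          -> None  (caller decides, e.g. by probing for ctags)
    #   bare --ctags    -> True  (taken from const)
    #   --ctags <value> -> True only if the value is "true", case-insensitively
    parser.add_argument(
        "--ctags",
        type=lambda x: (str(x).lower() == "true"),
        nargs="?",
        const=True,
        default=None,
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

A paired `--x` / `--no-x` flag only behaves this way when both arguments resolve to the same `dest`. The `None` default for `--ctags` is what lets the caller tell "not specified" apart from an explicit false.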
--- a/aider/prompts.py
+++ b/aider/prompts.py
@@ -8,14 +8,12 @@ Take requests for changes to the supplied code.
 If the request is ambiguous, ask questions.
 Once you understand the request you MUST:
-1. List the files you need to modify.
+1. List the files you need to modify. If they are *read-only* ask the user to make them *read-write* using the file's full path name.
 2. Think step-by-step and explain the needed changes.
 3. Describe each change with an *edit block* per the example below.
 """
-system_reminder = """Base any edits off the files shown in the user's last msg.
+system_reminder = """You MUST format EVERY code change with an *edit block* like this:
 You MUST format EVERY code change with an *edit block* like this:
 ```python
 some/dir/example.py
@@ -29,11 +27,11 @@ some/dir/example.py
    def add(a,b):
 >>>>>>> UPDATED
 Every *edit block* must be fenced w/triple backticks with the correct code language.
 Every *edit block* must start with the full path! *NEVER* propose edit blocks for *read-only* files.
 The ORIGINAL section must be an *exact* set of lines from the file:
 - NEVER SKIP LINES!
 - Include all original leading spaces and indentation!
 Every *edit block* must be fenced w/triple backticks with the correct code language.
 Every *edit block* must start with the full path!
 Edits to different parts of a file each need their own *edit block*.
@@ -54,11 +52,13 @@ files_content_gpt_no_edits = "I didn't see any properly formatted edits in your
 files_content_local_edits = "I edited the files myself."
-files_content_prefix = "Propose changes to *only* these files (ask before editing others):\n"
+files_content_prefix = "These are the *read-write* files:\n"
 files_no_full_files = "I am not sharing any *read-write* files yet."
 repo_content_prefix = (
-    "Here is a map of all the {other}files{ctags_msg}. You *must* ask with the"
+    "All the files below here are *read-only* files. Notice that files in directories are indented."
-    " full path before editing these:\n\n"
+    " Use their parent dirs to build their full path.\n"
 )
--- a/aider/repomap.py
+++ b/aider/repomap.py
@@ -3,6 +3,7 @@ import json
 import sys
 import subprocess
 import tiktoken
 import tempfile
 from collections import defaultdict
 from aider import prompts, utils
@@ -48,14 +49,20 @@ def fname_to_components(fname, with_colon):
 class RepoMap:
-    def __init__(self, use_ctags=True, root=None, main_model="gpt-4"):
+    ctags_cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json"]
    def __init__(self, use_ctags=None, root=None, main_model="gpt-4"):
        if not root:
            root = os.getcwd()
        self.use_ctags = use_ctags
        self.tokenizer = tiktoken.encoding_for_model(main_model)
        self.root = root
        if use_ctags is None:
            self.use_ctags = self.check_for_ctags()
        else:
            self.use_ctags = use_ctags
        self.tokenizer = tiktoken.encoding_for_model(main_model)
    def get_repo_map(self, chat_files, other_files):
        res = self.choose_files_listing(other_files)
        if not res:
@@ -123,7 +130,7 @@ class RepoMap:
    def split_path(self, path):
        path = os.path.relpath(path, self.root)
-        return fname_to_components(path, True)
+        return [path + ":"]
    def run_ctags(self, filename):
        # Check if the file is in the cache and if the modification time has not changed
@@ -132,7 +139,7 @@ class RepoMap:
        if cache_key in TAGS_CACHE and TAGS_CACHE[cache_key]["mtime"] == file_mtime:
            return TAGS_CACHE[cache_key]["data"]
-        cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json", filename]
+        cmd = self.ctags_cmd + [filename]
        output = subprocess.check_output(cmd).decode("utf-8")
        output = output.splitlines()
@@ -169,6 +176,17 @@ class RepoMap:
        return tags
    def check_for_ctags(self):
        try:
            with tempfile.TemporaryDirectory() as tempdir:
                hello_py = os.path.join(tempdir, "hello.py")
                with open(hello_py, "w") as f:
                    f.write("def hello():\n    print('Hello, world!')\n")
                self.get_tags(hello_py)
        except Exception:
            return False
        return True
 def find_py_files(directory):
    if not os.path.isdir(directory):
@@ -197,6 +215,7 @@ def call_map():
    """
    rm = RepoMap()
    # res = rm.get_tags_map(fnames)
    # print(res)
@@ -222,7 +241,7 @@ def call_map():
            # dump("ref", fname, ident)
            references[ident].append(show_fname)
-    for ident,fname in defines.items():
+    for ident, fname in defines.items():
        dump(fname, ident)
    idents = set(defines.keys()).intersection(set(references.keys()))
@@ -256,7 +275,9 @@ def call_map():
    ranked = nx.pagerank(G, weight="weight")
    # drop low weight edges for plotting
-    edges_to_remove = [(node1, node2) for node1, node2, data in G.edges(data=True) if data['weight'] < 1]
+    edges_to_remove = [
        (node1, node2) for node1, node2, data in G.edges(data=True) if data["weight"] < 1
    ]
    G.remove_edges_from(edges_to_remove)
    # Remove isolated nodes (nodes with no edges)
    dump(G.nodes())
@@ -272,8 +293,8 @@ def call_map():
        dot.node(fname, penwidth=str(pen))
    max_w = max(edges.values())
-    for refs,defs,data in G.edges(data=True):
+    for refs, defs, data in G.edges(data=True):
-        weight = data['weight']
+        weight = data["weight"]
        r = random.randint(0, 255)
        g = random.randint(0, 255)
@@ -286,7 +307,7 @@ def call_map():
        print()
        print(name)
        for ident in sorted(labels[name]):
-            print('\t', ident)
+            print("\t", ident)
        # print(f"{refs} -{weight}-> {defs}")
    top_rank = sorted([(rank, node) for (node, rank) in ranked.items()], reverse=True)
@@ -296,5 +317,6 @@ def call_map():
    dot.render("tmp", format="pdf", view=True)
 if __name__ == "__main__":
    call_map()
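The new `check_for_ctags` method probes for a working tool by running it against a tiny throwaway file, rather than only checking that an executable is on the `PATH`. A minimal sketch of that idea is below. It assumes a hypothetical free function `probe_tool` that shells out directly, whereas the method in this diff goes through `self.get_tags`:

```python
import os
import subprocess
import tempfile


def probe_tool(cmd):
    """Return True if `cmd + [sample_file]` runs successfully, else False.

    Hypothetical helper: `cmd` is a list such as the `ctags_cmd` class
    attribute introduced in this diff.
    """
    try:
        with tempfile.TemporaryDirectory() as tempdir:
            sample = os.path.join(tempdir, "hello.py")
            with open(sample, "w") as f:
                f.write("def hello():\n    print('Hello, world!')\n")
            # Raises if the executable is missing or exits with a non-zero status.
            subprocess.check_output(cmd + [sample], stderr=subprocess.DEVNULL)
    except Exception:
        return False
    return True
```

Catching a broad `Exception` mirrors the diff: a missing binary, an incompatible ctags flavor that rejects the flags, and a failed run are all treated the same way, as "ctags is not usable".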
--- a/assets/robot-flowchart.png
+++ b/assets/robot-flowchart.png
--- a/docs/ctags.md
+++ b/docs/ctags.md
@@ -1,5 +1,7 @@
-# Improving GPT-4's codebase understanding with ctags
+# Improving GPT-4's codebase understanding with a map
 ![robot flowchat](../assets/robot-flowchart.png)
 GPT-4 is extremely useful for "self-contained" coding tasks,
 like generating brand new code or modifying a pure function
@@ -8,7 +10,7 @@ that has no dependencies.
 But it's difficult to use GPT-4 to modify or extend
 a large, complex pre-existing codebase.
 To modify such code, GPT needs to understand the dependencies and APIs
-which interconnect all of its subsystems.
+which interconnect its subsystems.
 Somehow we need to provide this "code context" to GPT
 when we ask it to accomplish a coding task. Specifically, we need to:
@@ -22,8 +24,11 @@ To address these issues, `aider` now
 sends GPT a **concise map of your whole git repository**
 that includes
 all declared variables and functions with call signatures.
-This *repo map* is built using `ctags`
+This *repo map* is built automatically using `ctags`, which
-and enables GPT to better comprehend, navigate
+extracts symbol definitions from source files. Historically,
 ctags were generated and indexed by IDEs and editors to
 help humans search and navigate large codebases.
 Instead, we're going to use ctags to help GPT better comprehend, navigate
 and edit code in larger repos.
 To get a sense of how effective this can be, this
@@ -35,6 +40,12 @@ Using only the meta-data in the repo map, GPT is able to figure out how to
 call the method to be tested, as well as how to instantiate multiple
 class objects that are required to prepare for the test.
 To code with GPT-4 using the techniques discussed here:
  - Install [aider](https://github.com/paul-gauthier/aider#installation).
  - Install [universal ctags](https://github.com/universal-ctags/ctags).
  - Run `aider --ctags` inside your repo.
 ## The problem: code context
@@ -63,7 +74,7 @@ For the example above, you could send the file that
 contains the Foo class
 and the file that contains the BarLog logging subsystem.
 This works pretty well, and is supported by `aider` -- you
-can manually specify which files to "add to the chat".
+can manually specify which files to "add to the chat" you are having with GPT.
 But it's not ideal to have to manually identify the right
 set of files to add to the chat.
@@ -139,7 +150,14 @@ map. Universal ctags can scan source code written in many
 languages, and extract data about all the symbols defined in each
 file.
-For example, here is the `ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
+Historically, ctags were generated and indexed by IDEs or code editors 
 to make it easier for a human to search and navigate a
 codebase, find the implementation of functions, etc.
 Instead, we're going to use ctags to help GPT navigate and understand the codebase.
 Here is the type of output you get when you run ctags on source code. Specifically,
 this is the
 `ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
 ```json
 {
@@ -160,8 +178,8 @@ For example, here is the `ctags --fields=+S --output-format=json` output for the
 ```
 The repo map is built using this type of `ctags` data,
-formatted into the space
+but formatted into the space
-efficient hierarchical tree format shown above.
+efficient hierarchical tree format shown earlier.
 This is a format that GPT can easily understand
 and which conveys the map data using a
 minimal number of tokens.
@@ -198,7 +216,7 @@ Some possible approaches to reducing the amount of map data are:
 One key goal is to prefer solutions which are language agnostic or
 which can be easily deployed against most popular code languages.
 The `ctags` solution has this benefit, since it comes pre-built
-with tooling for most popular languages.
+with support for most popular languages.
 I suspect that Language Server Protocol might be an even
 better tool than `ctags` for this problem.
 But it is more cumbersome to deploy for a broad
@@ -212,5 +230,5 @@ To use this experimental repo map feature:
  - Install [aider](https://github.com/paul-gauthier/aider#installation).
  - Install [universal ctags](https://github.com/universal-ctags/ctags).
-  - Run `aider` with the `--ctags` option inside your repo.
+  - Run `aider --ctags` inside your repo.
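The doc changes above describe turning line-oriented `ctags --fields=+S --output-format=json` output into a compact, token-efficient map. A rough sketch of that transformation is below. The sample records are illustrative rather than captured from a real run, the `compact_map` name is hypothetical, and the flat per-file grouping is a simplification of the hierarchical tree format the document shows:

```python
import json
from collections import defaultdict

# Illustrative records only; real ctags JSON output may carry more fields.
SAMPLE = """\
{"_type": "tag", "name": "main", "path": "aider/main.py", "kind": "function", "signature": "(args=None, input=None, output=None)"}
{"_type": "tag", "name": "get_git_root", "path": "aider/main.py", "kind": "function", "signature": "()"}
{"_type": "tag", "name": "env_prefix", "path": "aider/main.py", "kind": "variable"}
"""


def compact_map(ctags_json_lines):
    """Group tag records by file and render one short line per symbol."""
    by_file = defaultdict(list)
    for line in ctags_json_lines.splitlines():
        if not line.strip():
            continue
        tag = json.loads(line)
        if tag.get("_type") != "tag":
            continue  # skip any non-tag records
        # Signatures are only present for callables when --fields=+S is used.
        entry = f"{tag['kind']} {tag['name']}{tag.get('signature', '')}"
        by_file[tag["path"]].append(entry)

    out = []
    for path in sorted(by_file):
        out.append(path + ":")
        out.extend("  " + entry for entry in sorted(by_file[path]))
    return "\n".join(out)


if __name__ == "__main__":
    print(compact_map(SAMPLE))
```

Dropping fields such as the search pattern and keeping only kind, name and signature is one way to spend fewer tokens per symbol; which fields the actual repo map keeps is not visible in this diff.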
--- a/requirements.txt
+++ b/requirements.txt
@@ -24,3 +24,5 @@ wcwidth==0.2.6
 yarl==1.9.2
 pytest==7.3.1
 tiktoken==0.4.0
 configargparse
 PyYAML