mirror of
https://github.com/Aider-AI/aider.git
synced 2025-05-31 17:55:01 +00:00
Merge branch 'main' into call-graph
commit
1e1feeaa21
9 changed files with 222 additions and 92 deletions
54
README.md
|
@ -33,10 +33,10 @@ You can find more chat transcripts on the [examples page](https://aider.chat/exa
|
||||||
* `aider` will apply the edits suggested by GPT-4 directly to your source files.
|
* `aider` will apply the edits suggested by GPT-4 directly to your source files.
|
||||||
* `aider` will automatically commit each changeset to your local git repo with a descriptive commit message. These frequent, automatic commits provide a safety net. It's easy to undo `aider` changes or use standard git workflows to manage longer sequences of changes.
|
* `aider` will automatically commit each changeset to your local git repo with a descriptive commit message. These frequent, automatic commits provide a safety net. It's easy to undo `aider` changes or use standard git workflows to manage longer sequences of changes.
|
||||||
* `aider` can review multiple source files at once and make coordinated code changes across all of them in a single changeset/commit.
|
* `aider` can review multiple source files at once and make coordinated code changes across all of them in a single changeset/commit.
|
||||||
* `aider` gives GPT a
|
* `aider` can give GPT a
|
||||||
[map of your entire git repo](https://aider.chat/docs/ctags.html),
|
[map of your entire git repo](https://aider.chat/docs/ctags.html),
|
||||||
so it can ask for permission to review whichever files seem relevant to your requests.
|
which helps it understand and modify large codebases.
|
||||||
* You can also edit the files using your editor while chatting with `aider`.
|
* You can edit the files by hand using your editor while chatting with `aider`.
|
||||||
* `aider` will notice if you edit the files outside the chat.
|
* `aider` will notice if you edit the files outside the chat.
|
||||||
* It will help you commit these out-of-band changes, if you'd like.
|
* It will help you commit these out-of-band changes, if you'd like.
|
||||||
* It will bring the updated file contents into the chat.
|
* It will bring the updated file contents into the chat.
|
||||||
|
@ -49,8 +49,11 @@ so it can ask for permission to review whichever files seem relevant to your req
|
||||||
1. Install the package:
|
1. Install the package:
|
||||||
* From GitHub: `pip install git+https://github.com/paul-gauthier/aider.git`
|
* From GitHub: `pip install git+https://github.com/paul-gauthier/aider.git`
|
||||||
* From your local copy of the repo in develop mode to pick up local edits immediately: `pip install -e .`
|
* From your local copy of the repo in develop mode to pick up local edits immediately: `pip install -e .`
|
||||||
|
|
||||||
2. Set up your OpenAI API key as an environment variable `OPENAI_API_KEY` or by including it in a `.env` file.
|
2. Set up your OpenAI API key as an environment variable `OPENAI_API_KEY` or by including it in a `.env` file.
|
||||||
|
|
||||||
|
3. Optionally, install [universal ctags](https://github.com/universal-ctags/ctags). This is helpful if you plan to work with repositories with more than a handful of files. This allows `aider --ctags` to build a [map of your entire git repo](https://aider.chat/docs/ctags.html) and share it with GPT to help it better understand and modify large codebases.
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
Run the `aider` tool by executing the following command:
|
Run the `aider` tool by executing the following command:
|
||||||
|
@ -65,19 +68,40 @@ You can also just launch `aider` anywhere in a git repo without naming files on
|
||||||
It will discover all the files in the repo.
|
It will discover all the files in the repo.
|
||||||
You can then add and remove individual files in the chat session with the `/add` and `/drop` chat commands described below.
|
You can then add and remove individual files in the chat session with the `/add` and `/drop` chat commands described below.
|
||||||
|
|
||||||
You can also use additional command-line options to customize the behavior of the tool. The following options are available, along with their corresponding environment variable overrides:
|
You can also use additional command-line options, environment variables or configuration file
|
||||||
|
to set many options:
|
||||||
|
|
||||||
- `--input-history-file INPUT_HISTORY_FILE`: Specify the chat input history file (default: .aider.input.history). Override the default with the environment variable `AIDER_INPUT_HISTORY_FILE`.
|
```
|
||||||
- `--chat-history-file CHAT_HISTORY_FILE`: Specify the chat history file (default: .aider.chat.history.md). Override the default with the environment variable `AIDER_CHAT_HISTORY_FILE`.
|
-h, --help show this help message and exit
|
||||||
- `--model MODEL`: Specify the model to use for the main chat (default: gpt-4). Override the default with the environment variable `AIDER_MODEL`.
|
-c CONFIG_FILE, --config CONFIG_FILE
|
||||||
- `-3`: Use gpt-3.5-turbo model for the main chat (not advised). No environment variable override.
|
Specify the config file (default: search for
|
||||||
- `--ctags`: Add ctags to the chat to help GPT understand the codebase (default: False, `AIDER_CTAGS`). Requires [universal ctags](https://github.com/universal-ctags/ctags). Override the default with the environment variable `AIDER_CTAGS`.
|
.aider.conf.yml in git root or home directory)
|
||||||
- `--no-pretty`: Disable pretty, colorized output. Override the default with the environment variable `AIDER_PRETTY` (default: 1 for enabled, 0 for disabled).
|
--input-history-file INPUT_HISTORY_FILE
|
||||||
- `--no-auto-commits`: Disable auto commit of changes. Override the default with the environment variable `AIDER_AUTO_COMMITS` (default: 1 for enabled, 0 for disabled).
|
Specify the chat input history file (default:
|
||||||
- `--show-diffs`: Show diffs when committing changes (default: False). Override the default with the environment variable `AIDER_SHOW_DIFFS` (default: 0 for False, 1 for True).
|
.aider.input.history) [env var: AIDER_INPUT_HISTORY_FILE]
|
||||||
- `--yes`: Always say yes to every confirmation (default: False).
|
--chat-history-file CHAT_HISTORY_FILE
|
||||||
|
Specify the chat history file (default:
|
||||||
For more information, run `aider --help`.
|
.aider.chat.history.md) [env var: AIDER_CHAT_HISTORY_FILE]
|
||||||
|
--model MODEL Specify the model to use for the main chat (default: gpt-4)
|
||||||
|
[env var: AIDER_MODEL]
|
||||||
|
-3 Use gpt-3.5-turbo model for the main chat (not advised)
|
||||||
|
--pretty Enable pretty, colorized output (default: True) [env var:
|
||||||
|
AIDER_PRETTY]
|
||||||
|
--no-pretty Disable pretty, colorized output
|
||||||
|
--apply FILE Apply the changes from the given file instead of running
|
||||||
|
the chat (debug)
|
||||||
|
--auto-commits Enable auto commit of changes (default: True) [env var:
|
||||||
|
AIDER_AUTO_COMMIT]
|
||||||
|
--no-auto-commits Disable auto commit of changes
|
||||||
|
--dry-run Perform a dry run without applying changes (default: False)
|
||||||
|
--show-diffs Show diffs when committing changes (default: False) [env
|
||||||
|
var: AIDER_SHOW_DIFFS]
|
||||||
|
--ctags [CTAGS] Add ctags to the chat to help GPT understand the codebase
|
||||||
|
(default: check for ctags executable) [env var:
|
||||||
|
AIDER_CTAGS]
|
||||||
|
--yes Always say yes to every confirmation
|
||||||
|
-v, --verbose Enable verbose output
|
||||||
|
```
|
||||||
|
|
||||||
## Chat commands
|
## Chat commands
|
||||||
|
|
||||||
|
|
|
@ -168,6 +168,9 @@ class Coder:
|
||||||
if self.abs_fnames:
|
if self.abs_fnames:
|
||||||
files_content = prompts.files_content_prefix
|
files_content = prompts.files_content_prefix
|
||||||
files_content += self.get_files_content()
|
files_content += self.get_files_content()
|
||||||
|
else:
|
||||||
|
files_content = prompts.files_no_full_files
|
||||||
|
|
||||||
all_content += files_content
|
all_content += files_content
|
||||||
|
|
||||||
other_files = set(self.get_all_abs_files()) - set(self.abs_fnames)
|
other_files = set(self.get_all_abs_files()) - set(self.abs_fnames)
|
||||||
|
@ -204,7 +207,7 @@ class Coder:
|
||||||
self.num_control_c += 1
|
self.num_control_c += 1
|
||||||
if self.num_control_c >= 2:
|
if self.num_control_c >= 2:
|
||||||
break
|
break
|
||||||
self.io.tool_error("^C again to quit")
|
self.io.tool_error("^C again or /exit to quit")
|
||||||
except EOFError:
|
except EOFError:
|
||||||
return
|
return
|
||||||
|
|
||||||
|
|
|
@ -1,8 +1,8 @@
|
||||||
|
import sys
|
||||||
import os
|
import os
|
||||||
import git
|
import git
|
||||||
import subprocess
|
import subprocess
|
||||||
import shlex
|
import shlex
|
||||||
from rich.prompt import Confirm
|
|
||||||
from prompt_toolkit.completion import Completion
|
from prompt_toolkit.completion import Completion
|
||||||
from aider import prompts
|
from aider import prompts
|
||||||
|
|
||||||
|
@ -16,20 +16,8 @@ class Commands:
|
||||||
if inp[0] == "/":
|
if inp[0] == "/":
|
||||||
return True
|
return True
|
||||||
|
|
||||||
def help(self):
|
|
||||||
"Show help about all commands"
|
|
||||||
commands = self.get_commands()
|
|
||||||
for cmd in commands:
|
|
||||||
cmd_method_name = f"cmd_{cmd[1:]}"
|
|
||||||
cmd_method = getattr(self, cmd_method_name, None)
|
|
||||||
if cmd_method:
|
|
||||||
description = cmd_method.__doc__
|
|
||||||
self.io.tool(f"{cmd} {description}")
|
|
||||||
else:
|
|
||||||
self.io.tool(f"{cmd} No description available.")
|
|
||||||
|
|
||||||
def get_commands(self):
|
def get_commands(self):
|
||||||
commands = ["/help"]
|
commands = []
|
||||||
for attr in dir(self):
|
for attr in dir(self):
|
||||||
if attr.startswith("cmd_"):
|
if attr.startswith("cmd_"):
|
||||||
commands.append("/" + attr[4:])
|
commands.append("/" + attr[4:])
|
||||||
|
@ -62,15 +50,15 @@ class Commands:
|
||||||
all_commands = self.get_commands()
|
all_commands = self.get_commands()
|
||||||
matching_commands = [cmd for cmd in all_commands if cmd.startswith(first_word)]
|
matching_commands = [cmd for cmd in all_commands if cmd.startswith(first_word)]
|
||||||
if len(matching_commands) == 1:
|
if len(matching_commands) == 1:
|
||||||
if matching_commands[0] == "/help":
|
|
||||||
self.help()
|
|
||||||
else:
|
|
||||||
return self.do_run(matching_commands[0][1:], rest_inp)
|
return self.do_run(matching_commands[0][1:], rest_inp)
|
||||||
elif len(matching_commands) > 1:
|
elif len(matching_commands) > 1:
|
||||||
self.io.tool_error("Ambiguous command: ', '.join(matching_commands)}")
|
self.io.tool_error(f"Ambiguous command: {', '.join(matching_commands)}")
|
||||||
else:
|
else:
|
||||||
self.io.tool_error(f"Error: {first_word} is not a valid command.")
|
self.io.tool_error(f"Error: {first_word} is not a valid command.")
|
||||||
|
|
||||||
|
# any method called cmd_xxx becomes a command automatically.
|
||||||
|
# each one must take an args param.
|
||||||
|
|
||||||
def cmd_commit(self, args):
|
def cmd_commit(self, args):
|
||||||
"Commit edits to the repo made outside the chat (commit message optional)"
|
"Commit edits to the repo made outside the chat (commit message optional)"
|
||||||
|
|
||||||
|
@ -251,6 +239,10 @@ class Commands:
|
||||||
)
|
)
|
||||||
return msg
|
return msg
|
||||||
|
|
||||||
|
def cmd_exit(self, args):
|
||||||
|
"Exit the application"
|
||||||
|
sys.exit()
|
||||||
|
|
||||||
def cmd_ls(self, args):
|
def cmd_ls(self, args):
|
||||||
"List all known files and those included in the chat session"
|
"List all known files and those included in the chat session"
|
||||||
|
|
||||||
|
@ -274,3 +266,15 @@ class Commands:
|
||||||
self.io.tool("\nRepo files not in the chat:\n")
|
self.io.tool("\nRepo files not in the chat:\n")
|
||||||
for file in other_files:
|
for file in other_files:
|
||||||
self.io.tool(f" {file}")
|
self.io.tool(f" {file}")
|
||||||
|
|
||||||
|
def cmd_help(self, args):
|
||||||
|
"Show help about all commands"
|
||||||
|
commands = sorted(self.get_commands())
|
||||||
|
for cmd in commands:
|
||||||
|
cmd_method_name = f"cmd_{cmd[1:]}"
|
||||||
|
cmd_method = getattr(self, cmd_method_name, None)
|
||||||
|
if cmd_method:
|
||||||
|
description = cmd_method.__doc__
|
||||||
|
self.io.tool(f"{cmd} {description}")
|
||||||
|
else:
|
||||||
|
self.io.tool(f"{cmd} No description available.")
|
||||||
|
|
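The `aider/commands.py` hunks above rely on a naming convention: any method called `cmd_xxx` becomes the chat command `/xxx`, and a unique prefix is enough to select it. The following is a minimal sketch of that dispatch pattern, not the file's actual code. The class body, the `output` list (standing in for `self.io.tool` and `self.io.tool_error`), the `run` method name, and the `cmd_history` command are all assumptions made for illustration.

```python
class Commands:
    """Hypothetical, stripped-down stand-in for the Commands class in the diff."""

    def __init__(self):
        # Collected messages; the real class writes through an io object instead.
        self.output = []

    def get_commands(self):
        # Every attribute named cmd_xxx is exposed as the chat command /xxx.
        # dir() returns names alphabetically, so the result is sorted.
        return ["/" + attr[4:] for attr in dir(self) if attr.startswith("cmd_")]

    def run(self, inp):
        # Assumes inp is non-empty and starts with "/" (checked by the caller).
        words = inp.strip().split(maxsplit=1)
        first_word = words[0]
        rest = words[1] if len(words) > 1 else ""

        matches = [cmd for cmd in self.get_commands() if cmd.startswith(first_word)]
        if len(matches) == 1:
            # Unique prefix: look up the method by name and call it.
            return getattr(self, "cmd_" + matches[0][1:])(rest)
        if len(matches) > 1:
            self.output.append(f"Ambiguous command: {', '.join(matches)}")
        else:
            self.output.append(f"Error: {first_word} is not a valid command.")
        return None

    def cmd_ls(self, args):
        "List files"
        return "ls"

    def cmd_history(self, args):
        "Hypothetical command, present only to create an ambiguous prefix"
        return "history"

    def cmd_help(self, args):
        "Show help about all commands"
        for cmd in sorted(self.get_commands()):
            doc = getattr(self, "cmd_" + cmd[1:]).__doc__
            self.output.append(f"{cmd} {doc or 'No description available.'}")
        return "help"
```

With this shape, `/help` needs no special case in the dispatcher, which appears to be the point of moving `help` to `cmd_help` in the diff: it is discovered like every other command.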
109
aider/main.py
|
@ -1,47 +1,85 @@
|
||||||
import os
|
import os
|
||||||
import sys
|
import sys
|
||||||
import argparse
|
import git
|
||||||
|
import configargparse
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from aider.coder import Coder
|
from aider.coder import Coder
|
||||||
from aider.io import InputOutput
|
from aider.io import InputOutput
|
||||||
|
|
||||||
|
|
||||||
|
def get_git_root():
|
||||||
|
try:
|
||||||
|
repo = git.Repo(search_parent_directories=True)
|
||||||
|
return repo.working_tree_dir
|
||||||
|
except git.InvalidGitRepositoryError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def main(args=None, input=None, output=None):
|
def main(args=None, input=None, output=None):
|
||||||
if args is None:
|
if args is None:
|
||||||
args = sys.argv[1:]
|
args = sys.argv[1:]
|
||||||
|
|
||||||
load_dotenv()
|
load_dotenv()
|
||||||
env_prefix = "AIDER_"
|
env_prefix = "AIDER_"
|
||||||
parser = argparse.ArgumentParser(description="aider - chat with GPT about your code")
|
|
||||||
|
default_config_files = [
|
||||||
|
os.path.expanduser("~/.aider.conf.yml"),
|
||||||
|
]
|
||||||
|
git_root = get_git_root()
|
||||||
|
if git_root:
|
||||||
|
default_config_files.insert(0, os.path.join(git_root, ".aider.conf.yml"))
|
||||||
|
|
||||||
|
parser = configargparse.ArgumentParser(
|
||||||
|
description="aider - chat with GPT about your code",
|
||||||
|
add_config_file_help=True,
|
||||||
|
default_config_files=default_config_files,
|
||||||
|
config_file_parser_class=configargparse.YAMLConfigFileParser,
|
||||||
|
)
|
||||||
|
|
||||||
|
parser.add_argument(
|
||||||
|
"-c",
|
||||||
|
"--config",
|
||||||
|
is_config_file=True,
|
||||||
|
metavar="CONFIG_FILE",
|
||||||
|
help=(
|
||||||
|
"Specify the config file (default: search for .aider.conf.yml in git root or home"
|
||||||
|
" directory)"
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"files",
|
"files",
|
||||||
metavar="FILE",
|
metavar="FILE",
|
||||||
nargs="*",
|
nargs="*",
|
||||||
help="a list of source code files (optional)",
|
help="a list of source code files (optional)",
|
||||||
)
|
)
|
||||||
|
default_input_history_file = (
|
||||||
|
os.path.join(git_root, ".aider.input.history") if git_root else ".aider.input.history"
|
||||||
|
)
|
||||||
|
default_chat_history_file = (
|
||||||
|
os.path.join(git_root, ".aider.chat.history.md") if git_root else ".aider.chat.history.md"
|
||||||
|
)
|
||||||
|
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--input-history-file",
|
"--input-history-file",
|
||||||
metavar="INPUT_HISTORY_FILE",
|
metavar="INPUT_HISTORY_FILE",
|
||||||
default=os.environ.get(f"{env_prefix}INPUT_HISTORY_FILE", ".aider.input.history"),
|
env_var=f"{env_prefix}INPUT_HISTORY_FILE",
|
||||||
help=(
|
default=default_input_history_file,
|
||||||
"Specify the chat input history file (default: .aider.input.history,"
|
help=f"Specify the chat input history file (default: {default_input_history_file})",
|
||||||
f" ${env_prefix}INPUT_HISTORY_FILE)"
|
|
||||||
),
|
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--chat-history-file",
|
"--chat-history-file",
|
||||||
metavar="CHAT_HISTORY_FILE",
|
metavar="CHAT_HISTORY_FILE",
|
||||||
default=os.environ.get(f"{env_prefix}CHAT_HISTORY_FILE", ".aider.chat.history.md"),
|
env_var=f"{env_prefix}CHAT_HISTORY_FILE",
|
||||||
help=(
|
default=default_chat_history_file,
|
||||||
"Specify the chat history file (default: .aider.chat.history.md,"
|
help=f"Specify the chat history file (default: {default_chat_history_file})",
|
||||||
f" ${env_prefix}CHAT_HISTORY_FILE)"
|
|
||||||
),
|
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--model",
|
"--model",
|
||||||
metavar="MODEL",
|
metavar="MODEL",
|
||||||
default=os.environ.get(f"{env_prefix}MODEL", "gpt-4"),
|
env_var=f"{env_prefix}MODEL",
|
||||||
help=f"Specify the model to use for the main chat (default: gpt-4, ${env_prefix}MODEL)",
|
default="gpt-4",
|
||||||
|
help="Specify the model to use for the main chat (default: gpt-4)",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"-3",
|
"-3",
|
||||||
|
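The hunk above builds `default_config_files` with the home-directory file first and then inserts the git-root file at position 0. The sketch below reproduces only that list ordering using the standard library. It swaps GitPython's `git.Repo(search_parent_directories=True)` for a plain walk up the directory tree looking for a `.git` entry, so it is an approximation. `find_git_root`, `default_config_files`, and the `home` parameter are names assumed for illustration. It makes no claim about which file the config parser treats as taking precedence.

```python
import os
from pathlib import Path


def find_git_root(start):
    """Return the nearest ancestor of `start` that contains a .git entry, or None.

    Stand-in for GitPython's search_parent_directories=True lookup.
    """
    path = Path(start).resolve()
    for candidate in [path, *path.parents]:
        if (candidate / ".git").exists():
            return str(candidate)
    return None


def default_config_files(start, home):
    """Build the candidate config list: git-root file first, then the home file."""
    files = [os.path.join(home, ".aider.conf.yml")]
    git_root = find_git_root(start)
    if git_root:
        files.insert(0, os.path.join(git_root, ".aider.conf.yml"))
    return files
```

Outside a git repository the list contains only the home-directory entry, which matches the `if git_root:` guard in the diff.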
@ -50,24 +88,38 @@ def main(args=None, input=None, output=None):
|
||||||
const="gpt-3.5-turbo",
|
const="gpt-3.5-turbo",
|
||||||
help="Use gpt-3.5-turbo model for the main chat (not advised)",
|
help="Use gpt-3.5-turbo model for the main chat (not advised)",
|
||||||
)
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--pretty",
|
||||||
|
action="store_true",
|
||||||
|
env_var=f"{env_prefix}PRETTY",
|
||||||
|
default=True,
|
||||||
|
help="Enable pretty, colorized output (default: True)",
|
||||||
|
)
|
||||||
|
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--no-pretty",
|
"--no-pretty",
|
||||||
action="store_false",
|
action="store_false",
|
||||||
dest="pretty",
|
dest="pretty",
|
||||||
help=f"Disable pretty, colorized output (${env_prefix}PRETTY)",
|
help="Disable pretty, colorized output",
|
||||||
default=bool(int(os.environ.get(f"{env_prefix}PRETTY", 1))),
|
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--apply",
|
"--apply",
|
||||||
metavar="FILE",
|
metavar="FILE",
|
||||||
help="Apply the changes from the given file instead of running the chat (debug)",
|
help="Apply the changes from the given file instead of running the chat (debug)",
|
||||||
)
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--auto-commits",
|
||||||
|
action="store_true",
|
||||||
|
env_var=f"{env_prefix}AUTO_COMMIT",
|
||||||
|
default=True,
|
||||||
|
help="Enable auto commit of changes (default: True)",
|
||||||
|
)
|
||||||
|
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--no-auto-commits",
|
"--no-auto-commits",
|
||||||
action="store_false",
|
action="store_false",
|
||||||
dest="auto_commits",
|
dest="auto_commit",
|
||||||
help=f"Disable auto commit of changes (${env_prefix}AUTO_COMMITS)",
|
help="Disable auto commit of changes",
|
||||||
default=bool(int(os.environ.get(f"{env_prefix}AUTO_COMMITS", 1))),
|
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--dry-run",
|
"--dry-run",
|
||||||
|
@ -78,17 +130,21 @@ def main(args=None, input=None, output=None):
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--show-diffs",
|
"--show-diffs",
|
||||||
action="store_true",
|
action="store_true",
|
||||||
help=f"Show diffs when committing changes (default: False, ${env_prefix}SHOW_DIFFS)",
|
env_var=f"{env_prefix}SHOW_DIFFS",
|
||||||
default=bool(int(os.environ.get(f"{env_prefix}SHOW_DIFFS", 0))),
|
help="Show diffs when committing changes (default: False)",
|
||||||
|
default=False,
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--ctags",
|
"--ctags",
|
||||||
action="store_true",
|
type=lambda x: (str(x).lower() == "true"),
|
||||||
|
nargs="?",
|
||||||
|
const=True,
|
||||||
|
default=None,
|
||||||
|
env_var=f"{env_prefix}CTAGS",
|
||||||
help=(
|
help=(
|
||||||
"Add ctags to the chat to help GPT understand the codebase (default: False,"
|
"Add ctags to the chat to help GPT understand the codebase (default: check for ctags"
|
||||||
f" ${env_prefix}CTAGS)"
|
" executable)"
|
||||||
),
|
),
|
||||||
default=bool(int(os.environ.get(f"{env_prefix}CTAGS", 0))),
|
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--yes",
|
"--yes",
|
||||||
|
@ -97,7 +153,8 @@ def main(args=None, input=None, output=None):
|
||||||
default=False,
|
default=False,
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"-v", "--verbose",
|
"-v",
|
||||||
|
"--verbose",
|
||||||
action="store_true",
|
action="store_true",
|
||||||
help="Enable verbose output",
|
help="Enable verbose output",
|
||||||
default=False,
|
default=False,
|
||||||
|
|
|
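The `--ctags` change above replaces a plain `store_true` flag with a three-state option: not given (`None`, meaning "check for the ctags executable"), given bare (`True`), or given with an explicit value. The sketch below isolates that pattern using the standard library's `argparse`, which `configargparse` builds on. The `env_var=` keyword is left out because plain `argparse` does not accept it, and `parse_ctags` is a helper name assumed for illustration.

```python
import argparse


def parse_ctags(argv):
    """Parse only the --ctags option and return None, True, or False."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--ctags",
        # Only the literal string "true" (any case) converts to True.
        type=lambda x: str(x).lower() == "true",
        nargs="?",     # the value after --ctags is optional
        const=True,    # used when --ctags is given with no value
        default=None,  # used when --ctags is absent: caller decides later
    )
    return parser.parse_args(argv).ctags
```

Two consequences of this design in plain `argparse` are worth keeping in mind: any value other than `true`, such as `yes` or `1`, parses as `False`, and because the value is optional, a non-option token placed directly after `--ctags` is consumed as its value.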
@ -8,14 +8,12 @@ Take requests for changes to the supplied code.
|
||||||
If the request is ambiguous, ask questions.
|
If the request is ambiguous, ask questions.
|
||||||
|
|
||||||
Once you understand the request you MUST:
|
Once you understand the request you MUST:
|
||||||
1. List the files you need to modify.
|
1. List the files you need to modify. If they are *read-only* ask the user to make them *read-write* using the file's full path name.
|
||||||
2. Think step-by-step and explain the needed changes.
|
2. Think step-by-step and explain the needed changes.
|
||||||
3. Describe each change with an *edit block* per the example below.
|
3. Describe each change with an *edit block* per the example below.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
system_reminder = """Base any edits off the files shown in the user's last msg.
|
system_reminder = """You MUST format EVERY code change with an *edit block* like this:
|
||||||
|
|
||||||
You MUST format EVERY code change with an *edit block* like this:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
some/dir/example.py
|
some/dir/example.py
|
||||||
|
@ -29,11 +27,11 @@ some/dir/example.py
|
||||||
def add(a,b):
|
def add(a,b):
|
||||||
>>>>>>> UPDATED
|
>>>>>>> UPDATED
|
||||||
|
|
||||||
|
Every *edit block* must be fenced w/triple backticks with the correct code language.
|
||||||
|
Every *edit block* must start with the full path! *NEVER* propose edit blocks for *read-only* files.
|
||||||
The ORIGINAL section must be an *exact* set of lines from the file:
|
The ORIGINAL section must be an *exact* set of lines from the file:
|
||||||
- NEVER SKIP LINES!
|
- NEVER SKIP LINES!
|
||||||
- Include all original leading spaces and indentation!
|
- Include all original leading spaces and indentation!
|
||||||
Every *edit block* must be fenced w/triple backticks with the correct code language.
|
|
||||||
Every *edit block* must start with the full path!
|
|
||||||
|
|
||||||
Edits to different parts of a file each need their own *edit block*.
|
Edits to different parts of a file each need their own *edit block*.
|
||||||
|
|
||||||
|
@ -54,11 +52,13 @@ files_content_gpt_no_edits = "I didn't see any properly formatted edits in your
|
||||||
|
|
||||||
files_content_local_edits = "I edited the files myself."
|
files_content_local_edits = "I edited the files myself."
|
||||||
|
|
||||||
files_content_prefix = "Propose changes to *only* these files (ask before editing others):\n"
|
files_content_prefix = "These are the *read-write* files:\n"
|
||||||
|
|
||||||
|
files_no_full_files = "I am not sharing any *read-write* files yet."
|
||||||
|
|
||||||
repo_content_prefix = (
|
repo_content_prefix = (
|
||||||
"Here is a map of all the {other}files{ctags_msg}. You *must* ask with the"
|
"All the files below here are *read-only* files. Notice that files in directories are indented."
|
||||||
" full path before editing these:\n\n"
|
" Use their parent dirs to build their full path.\n"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -3,6 +3,7 @@ import json
|
||||||
import sys
|
import sys
|
||||||
import subprocess
|
import subprocess
|
||||||
import tiktoken
|
import tiktoken
|
||||||
|
import tempfile
|
||||||
from collections import defaultdict
|
from collections import defaultdict
|
||||||
|
|
||||||
from aider import prompts, utils
|
from aider import prompts, utils
|
||||||
|
@ -48,14 +49,20 @@ def fname_to_components(fname, with_colon):
|
||||||
|
|
||||||
|
|
||||||
class RepoMap:
|
class RepoMap:
|
||||||
def __init__(self, use_ctags=True, root=None, main_model="gpt-4"):
|
ctags_cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json"]
|
||||||
|
|
||||||
|
def __init__(self, use_ctags=None, root=None, main_model="gpt-4"):
|
||||||
if not root:
|
if not root:
|
||||||
root = os.getcwd()
|
root = os.getcwd()
|
||||||
|
|
||||||
self.use_ctags = use_ctags
|
|
||||||
self.tokenizer = tiktoken.encoding_for_model(main_model)
|
|
||||||
self.root = root
|
self.root = root
|
||||||
|
|
||||||
|
if use_ctags is None:
|
||||||
|
self.use_ctags = self.check_for_ctags()
|
||||||
|
else:
|
||||||
|
self.use_ctags = use_ctags
|
||||||
|
|
||||||
|
self.tokenizer = tiktoken.encoding_for_model(main_model)
|
||||||
|
|
||||||
def get_repo_map(self, chat_files, other_files):
|
def get_repo_map(self, chat_files, other_files):
|
||||||
res = self.choose_files_listing(other_files)
|
res = self.choose_files_listing(other_files)
|
||||||
if not res:
|
if not res:
|
||||||
|
@ -123,7 +130,7 @@ class RepoMap:
|
||||||
|
|
||||||
def split_path(self, path):
|
def split_path(self, path):
|
||||||
path = os.path.relpath(path, self.root)
|
path = os.path.relpath(path, self.root)
|
||||||
return fname_to_components(path, True)
|
return [path + ":"]
|
||||||
|
|
||||||
def run_ctags(self, filename):
|
def run_ctags(self, filename):
|
||||||
# Check if the file is in the cache and if the modification time has not changed
|
# Check if the file is in the cache and if the modification time has not changed
|
||||||
|
@ -132,7 +139,7 @@ class RepoMap:
|
||||||
if cache_key in TAGS_CACHE and TAGS_CACHE[cache_key]["mtime"] == file_mtime:
|
if cache_key in TAGS_CACHE and TAGS_CACHE[cache_key]["mtime"] == file_mtime:
|
||||||
return TAGS_CACHE[cache_key]["data"]
|
return TAGS_CACHE[cache_key]["data"]
|
||||||
|
|
||||||
cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json", filename]
|
cmd = self.ctags_cmd + [filename]
|
||||||
output = subprocess.check_output(cmd).decode("utf-8")
|
output = subprocess.check_output(cmd).decode("utf-8")
|
||||||
output = output.splitlines()
|
output = output.splitlines()
|
||||||
|
|
||||||
|
@ -169,6 +176,17 @@ class RepoMap:
|
||||||
|
|
||||||
return tags
|
return tags
|
||||||
|
|
||||||
|
def check_for_ctags(self):
|
||||||
|
try:
|
||||||
|
with tempfile.TemporaryDirectory() as tempdir:
|
||||||
|
hello_py = os.path.join(tempdir, "hello.py")
|
||||||
|
with open(hello_py, "w") as f:
|
||||||
|
f.write("def hello():\n print('Hello, world!')\n")
|
||||||
|
self.get_tags(hello_py)
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
def find_py_files(directory):
|
def find_py_files(directory):
|
||||||
if not os.path.isdir(directory):
|
if not os.path.isdir(directory):
|
||||||
|
@ -197,6 +215,7 @@ def call_map():
|
||||||
"""
|
"""
|
||||||
|
|
||||||
rm = RepoMap()
|
rm = RepoMap()
|
||||||
|
|
||||||
# res = rm.get_tags_map(fnames)
|
# res = rm.get_tags_map(fnames)
|
||||||
# print(res)
|
# print(res)
|
||||||
|
|
||||||
|
@ -222,7 +241,7 @@ def call_map():
|
||||||
# dump("ref", fname, ident)
|
# dump("ref", fname, ident)
|
||||||
references[ident].append(show_fname)
|
references[ident].append(show_fname)
|
||||||
|
|
||||||
for ident,fname in defines.items():
|
for ident, fname in defines.items():
|
||||||
dump(fname, ident)
|
dump(fname, ident)
|
||||||
|
|
||||||
idents = set(defines.keys()).intersection(set(references.keys()))
|
idents = set(defines.keys()).intersection(set(references.keys()))
|
||||||
|
@ -256,7 +275,9 @@ def call_map():
|
||||||
ranked = nx.pagerank(G, weight="weight")
|
ranked = nx.pagerank(G, weight="weight")
|
||||||
|
|
||||||
# drop low weight edges for plotting
|
# drop low weight edges for plotting
|
||||||
edges_to_remove = [(node1, node2) for node1, node2, data in G.edges(data=True) if data['weight'] < 1]
|
edges_to_remove = [
|
||||||
|
(node1, node2) for node1, node2, data in G.edges(data=True) if data["weight"] < 1
|
||||||
|
]
|
||||||
G.remove_edges_from(edges_to_remove)
|
G.remove_edges_from(edges_to_remove)
|
||||||
# Remove isolated nodes (nodes with no edges)
|
# Remove isolated nodes (nodes with no edges)
|
||||||
dump(G.nodes())
|
dump(G.nodes())
|
||||||
|
@ -272,8 +293,8 @@ def call_map():
|
||||||
dot.node(fname, penwidth=str(pen))
|
dot.node(fname, penwidth=str(pen))
|
||||||
|
|
||||||
max_w = max(edges.values())
|
max_w = max(edges.values())
|
||||||
for refs,defs,data in G.edges(data=True):
|
for refs, defs, data in G.edges(data=True):
|
||||||
weight = data['weight']
|
weight = data["weight"]
|
||||||
|
|
||||||
r = random.randint(0, 255)
|
r = random.randint(0, 255)
|
||||||
g = random.randint(0, 255)
|
g = random.randint(0, 255)
|
||||||
|
@ -286,7 +307,7 @@ def call_map():
|
||||||
print()
|
print()
|
||||||
print(name)
|
print(name)
|
||||||
for ident in sorted(labels[name]):
|
for ident in sorted(labels[name]):
|
||||||
print('\t', ident)
|
print("\t", ident)
|
||||||
# print(f"{refs} -{weight}-> {defs}")
|
# print(f"{refs} -{weight}-> {defs}")
|
||||||
|
|
||||||
top_rank = sorted([(rank, node) for (node, rank) in ranked.items()], reverse=True)
|
top_rank = sorted([(rank, node) for (node, rank) in ranked.items()], reverse=True)
|
||||||
|
@ -296,5 +317,6 @@ def call_map():
|
||||||
|
|
||||||
dot.render("tmp", format="pdf", view=True)
|
dot.render("tmp", format="pdf", view=True)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
call_map()
|
call_map()
|
||||||
|
|
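The `check_for_ctags` method added above probes for a working ctags by running it on a throwaway Python file and treating any failure as "not available". Below is a standalone sketch of that probe. The diff calls `self.get_tags(hello_py)`, whose body is not shown, so this version assumes it is enough to invoke the command from `ctags_cmd` directly through `subprocess`. The function signature and the `ctags_cmd` parameter are illustrative assumptions.

```python
import os
import subprocess
import tempfile

# Command line taken from the ctags_cmd class attribute in the diff.
DEFAULT_CTAGS_CMD = ("ctags", "--fields=+S", "--extras=-F", "--output-format=json")


def check_for_ctags(ctags_cmd=DEFAULT_CTAGS_CMD):
    """Return True if the ctags command runs successfully on a tiny sample file."""
    try:
        with tempfile.TemporaryDirectory() as tempdir:
            hello_py = os.path.join(tempdir, "hello.py")
            with open(hello_py, "w") as f:
                f.write("def hello():\n    print('Hello, world!')\n")
            # Raises if the executable is missing or exits with a non-zero status,
            # for example when an installed ctags rejects these options.
            subprocess.check_output(
                list(ctags_cmd) + [hello_py], stderr=subprocess.DEVNULL
            )
    except Exception:
        return False
    return True
```

Running the real command rather than only checking that a `ctags` binary exists on `PATH` also catches an installed ctags that does not accept the required options.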
BIN
assets/robot-flowchart.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 700 KiB |
|
@ -1,5 +1,7 @@
|
||||||
|
|
||||||
# Improving GPT-4's codebase understanding with ctags
|
# Improving GPT-4's codebase understanding with a map
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
GPT-4 is extremely useful for "self-contained" coding tasks,
|
GPT-4 is extremely useful for "self-contained" coding tasks,
|
||||||
like generating brand new code or modifying a pure function
|
like generating brand new code or modifying a pure function
|
||||||
|
@ -8,7 +10,7 @@ that has no dependencies.
|
||||||
But it's difficult to use GPT-4 to modify or extend
|
But it's difficult to use GPT-4 to modify or extend
|
||||||
a large, complex pre-existing codebase.
|
a large, complex pre-existing codebase.
|
||||||
To modify such code, GPT needs to understand the dependencies and APIs
|
To modify such code, GPT needs to understand the dependencies and APIs
|
||||||
which interconnect all of its subsystems.
|
which interconnect its subsystems.
|
||||||
Somehow we need to provide this "code context" to GPT
|
Somehow we need to provide this "code context" to GPT
|
||||||
when we ask it to accomplish a coding task. Specifically, we need to:
|
when we ask it to accomplish a coding task. Specifically, we need to:
|
||||||
|
|
||||||
|
@ -22,8 +24,11 @@ To address these issues, `aider` now
|
||||||
sends GPT a **concise map of your whole git repository**
|
sends GPT a **concise map of your whole git repository**
|
||||||
that includes
|
that includes
|
||||||
all declared variables and functions with call signatures.
|
all declared variables and functions with call signatures.
|
||||||
This *repo map* is built using `ctags`
|
This *repo map* is built automatically using `ctags`, which
|
||||||
and enables GPT to better comprehend, navigate
|
extracts symbol definitions from source files. Historically,
|
||||||
|
ctags were generated and indexed by IDEs and editors to
|
||||||
|
help humans search and navigate large codebases.
|
||||||
|
Instead, we're going to use ctags to help GPT better comprehend, navigate
|
||||||
and edit code in larger repos.
|
and edit code in larger repos.
|
||||||
|
|
||||||
To get a sense of how effective this can be, this
|
To get a sense of how effective this can be, this
|
||||||
|
@ -35,6 +40,12 @@ Using only the meta-data in the repo map, GPT is able to figure out how to
|
||||||
call the method to be tested, as well as how to instantiate multiple
|
call the method to be tested, as well as how to instantiate multiple
|
||||||
class objects that are required to prepare for the test.
|
class objects that are required to prepare for the test.
|
||||||
|
|
||||||
|
To code with GPT-4 using the techniques discussed here:
|
||||||
|
|
||||||
|
|
||||||
|
- Install [aider](https://github.com/paul-gauthier/aider#installation).
|
||||||
|
- Install [universal ctags](https://github.com/universal-ctags/ctags).
|
||||||
|
- Run `aider --ctags` inside your repo.
|
||||||
|
|
||||||
## The problem: code context
|
## The problem: code context
|
||||||
|
|
||||||
|
@ -63,7 +74,7 @@ For the example above, you could send the file that
|
||||||
contains the Foo class
|
contains the Foo class
|
||||||
and the file that contains the BarLog logging subsystem.
|
and the file that contains the BarLog logging subsystem.
|
||||||
This works pretty well, and is supported by `aider` -- you
|
This works pretty well, and is supported by `aider` -- you
|
||||||
can manually specify which files to "add to the chat".
|
can manually specify which files to "add to the chat" you are having with GPT.
|
||||||
|
|
||||||
But it's not ideal to have to manually identify the right
|
But it's not ideal to have to manually identify the right
|
||||||
set of files to add to the chat.
|
set of files to add to the chat.
|
||||||
|
@ -139,7 +150,14 @@ map. Universal ctags can scan source code written in many
|
||||||
languages, and extract data about all the symbols defined in each
|
languages, and extract data about all the symbols defined in each
|
||||||
file.
|
file.
|
||||||
|
|
||||||
For example, here is the `ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
|
Historically, ctags were generated and indexed by IDEs or code editors
|
||||||
|
to make it easier for a human to search and navigate a
|
||||||
|
codebase, find the implementation of functions, etc.
|
||||||
|
Instead, we're going to use ctags to help GPT navigate and understand the codebase.
|
||||||
|
|
||||||
|
Here is the type of output you get when you run ctags on source code. Specifically,
|
||||||
|
this is the
|
||||||
|
`ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
|
@ -160,8 +178,8 @@ For example, here is the `ctags --fields=+S --output-format=json` output for the
|
||||||
```
|
```
|
||||||
|
|
||||||
The repo map is built using this type of `ctags` data,
|
The repo map is built using this type of `ctags` data,
|
||||||
formatted into the space
|
but formatted into the space
|
||||||
efficient hierarchical tree format shown above.
|
efficient hierarchical tree format shown earlier.
|
||||||
This is a format that GPT can easily understand
|
This is a format that GPT can easily understand
|
||||||
and which conveys the map data using a
|
and which conveys the map data using a
|
||||||
minimal number of tokens.
|
minimal number of tokens.
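
As a rough illustration of the paragraph above, the sketch below turns ctags-style JSON records into a compact, indented per-file listing. It is not aider's actual implementation or its exact map format. The sample records, the `build_map` helper, and the field names (`_type`, `name`, `path`, `kind`, `scope`, `signature`) are assumptions modeled on what universal ctags typically emits with `--fields=+S --output-format=json`.

```python
import json
from collections import defaultdict

# Hypothetical sample of ctags JSON output, one JSON object per line.
# Real output may carry more fields (pattern, scopeKind, ...).
SAMPLE = """\
{"_type": "tag", "name": "main", "path": "aider/main.py", "kind": "function", "signature": "(args=None)"}
{"_type": "tag", "name": "status", "path": "aider/main.py", "kind": "variable"}
{"_type": "tag", "name": "Coder", "path": "aider/coder.py", "kind": "class"}
{"_type": "tag", "name": "run", "path": "aider/coder.py", "kind": "member", "scope": "Coder", "signature": "(self)"}
"""


def build_map(tag_lines):
    """Group tags by file and render them as a short indented listing."""
    by_file = defaultdict(list)
    for line in tag_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        tag = json.loads(line)
        # Skip any non-tag records (assumed to be marked by "_type").
        if tag.get("_type") != "tag":
            continue
        by_file[tag["path"]].append(tag)

    out = []
    for path in sorted(by_file):
        out.append(f"{path}:")
        for tag in by_file[path]:
            # Indent symbols that live inside a class or other scope.
            indent = "    " if tag.get("scope") else "  "
            out.append(f"{indent}{tag['kind']} {tag['name']}{tag.get('signature', '')}")
    return "\n".join(out)


repo_map = build_map(SAMPLE)
print(repo_map)
```

Dropping the repeated JSON keys and quoting is where most of the token savings would come from in a listing like this; a real map would likely also need to prune or rank symbols for large repos.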
|
||||||
|
@ -198,7 +216,7 @@ Some possible approaches to reducing the amount of map data are:
|
||||||
One key goal is to prefer solutions which are language agnostic or
|
One key goal is to prefer solutions which are language agnostic or
|
||||||
which can be easily deployed against most popular code languages.
|
which can be easily deployed against most popular code languages.
|
||||||
The `ctags` solution has this benefit, since it comes pre-built
|
The `ctags` solution has this benefit, since it comes pre-built
|
||||||
with tooling for most popular languages.
|
with support for most popular languages.
|
||||||
I suspect that Language Server Protocol might be an even
|
I suspect that Language Server Protocol might be an even
|
||||||
better tool than `ctags` for this problem.
|
better tool than `ctags` for this problem.
|
||||||
But it is more cumbersome to deploy for a broad
|
But it is more cumbersome to deploy for a broad
|
||||||
|
@ -212,5 +230,5 @@ To use this experimental repo map feature:
|
||||||
|
|
||||||
- Install [aider](https://github.com/paul-gauthier/aider#installation).
|
- Install [aider](https://github.com/paul-gauthier/aider#installation).
|
||||||
- Install [universal ctags](https://github.com/universal-ctags/ctags).
|
- Install [universal ctags](https://github.com/universal-ctags/ctags).
|
||||||
- Run `aider` with the `--ctags` option inside your repo.
|
- Run `aider --ctags` inside your repo.
|
||||||
|
|
|
@ -24,3 +24,5 @@ wcwidth==0.2.6
|
||||||
yarl==1.9.2
|
yarl==1.9.2
|
||||||
pytest==7.3.1
|
pytest==7.3.1
|
||||||
tiktoken==0.4.0
|
tiktoken==0.4.0
|
||||||
|
configargparse
|
||||||
|
PyYAML
|
||||||
|
|