Merge branch 'main' into call-graph

Paul Gauthier 2023-05-26 17:07:17 -07:00
commit 1e1feeaa21
9 changed files with 222 additions and 92 deletions

View file

@ -33,10 +33,10 @@ You can find more chat transcripts on the [examples page](https://aider.chat/exa
* `aider` will apply the edits suggested by GPT-4 directly to your source files.
* `aider` will automatically commit each changeset to your local git repo with a descriptive commit message. These frequent, automatic commits provide a safety net. It's easy to undo `aider` changes or use standard git workflows to manage longer sequences of changes.
* `aider` can review multiple source files at once and make coordinated code changes across all of them in a single changeset/commit.
* `aider` gives GPT a
* `aider` can give GPT a
[map of your entire git repo](https://aider.chat/docs/ctags.html),
so it can ask for permission to review whichever files seem relevant to your requests.
* You can also edit the files using your editor while chatting with `aider`.
which helps it understand and modify large codebases.
* You can edit the files by hand using your editor while chatting with `aider`.
* `aider` will notice if you edit the files outside the chat.
* It will help you commit these out-of-band changes, if you'd like.
* It will bring the updated file contents into the chat.
@ -49,8 +49,11 @@ so it can ask for permission to review whichever files seem relevant to your req
1. Install the package:
* From GitHub: `pip install git+https://github.com/paul-gauthier/aider.git`
* From your local copy of the repo in develop mode to pick up local edits immediately: `pip install -e .`
2. Set up your OpenAI API key as an environment variable `OPENAI_API_KEY` or by including it in a `.env` file.
3. Optionally, install [universal ctags](https://github.com/universal-ctags/ctags). This is helpful if you plan to work with repositories with more than a handful of files. This allows `aider --ctags` to build a [map of your entire git repo](https://aider.chat/docs/ctags.html) and share it with GPT to help it better understand and modify large codebases.
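As a rough illustration of step 2 (not part of this commit's diff), the key is picked up at startup roughly like this: `main()` calls `load_dotenv()` (see the `main.py` changes below), so the key can live either in the process environment or in a `.env` file containing a line such as `OPENAI_API_KEY=sk-...`:

```python
import os
from dotenv import load_dotenv

# Hypothetical sketch: load a .env file from the working directory (if present),
# then read the key from the environment.
load_dotenv()
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise SystemExit("Set OPENAI_API_KEY in your environment or in a .env file")
```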
## Usage
Run the `aider` tool by executing the following command:
@ -65,19 +68,40 @@ You can also just launch `aider` anywhere in a git repo without naming files on
It will discover all the files in the repo.
You can then add and remove individual files in the chat session with the `/add` and `/drop` chat commands described below.
You can also use additional command-line options to customize the behavior of the tool. The following options are available, along with their corresponding environment variable overrides:
You can also use additional command-line options, environment variables, or a configuration file
to set many options:
- `--input-history-file INPUT_HISTORY_FILE`: Specify the chat input history file (default: .aider.input.history). Override the default with the environment variable `AIDER_INPUT_HISTORY_FILE`.
- `--chat-history-file CHAT_HISTORY_FILE`: Specify the chat history file (default: .aider.chat.history.md). Override the default with the environment variable `AIDER_CHAT_HISTORY_FILE`.
- `--model MODEL`: Specify the model to use for the main chat (default: gpt-4). Override the default with the environment variable `AIDER_MODEL`.
- `-3`: Use gpt-3.5-turbo model for the main chat (not advised). No environment variable override.
- `--ctags`: Add ctags to the chat to help GPT understand the codebase (default: False, `AIDER_CTAGS`). Requires [universal ctags](https://github.com/universal-ctags/ctags). Override the default with the environment variable `AIDER_CTAGS`.
- `--no-pretty`: Disable pretty, colorized output. Override the default with the environment variable `AIDER_PRETTY` (default: 1 for enabled, 0 for disabled).
- `--no-auto-commits`: Disable auto commit of changes. Override the default with the environment variable `AIDER_AUTO_COMMITS` (default: 1 for enabled, 0 for disabled).
- `--show-diffs`: Show diffs when committing changes (default: False). Override the default with the environment variable `AIDER_SHOW_DIFFS` (default: 0 for False, 1 for True).
- `--yes`: Always say yes to every confirmation (default: False).
For more information, run `aider --help`.
```
-h, --help show this help message and exit
-c CONFIG_FILE, --config CONFIG_FILE
Specify the config file (default: search for
.aider.conf.yml in git root or home directory)
--input-history-file INPUT_HISTORY_FILE
Specify the chat input history file (default:
.aider.input.history) [env var: AIDER_INPUT_HISTORY_FILE]
--chat-history-file CHAT_HISTORY_FILE
Specify the chat history file (default:
.aider.chat.history.md) [env var: AIDER_CHAT_HISTORY_FILE]
--model MODEL Specify the model to use for the main chat (default: gpt-4)
[env var: AIDER_MODEL]
-3 Use gpt-3.5-turbo model for the main chat (not advised)
--pretty Enable pretty, colorized output (default: True) [env var:
AIDER_PRETTY]
--no-pretty Disable pretty, colorized output
--apply FILE Apply the changes from the given file instead of running
the chat (debug)
--auto-commits Enable auto commit of changes (default: True) [env var:
AIDER_AUTO_COMMIT]
--no-auto-commits Disable auto commit of changes
--dry-run Perform a dry run without applying changes (default: False)
--show-diffs Show diffs when committing changes (default: False) [env
var: AIDER_SHOW_DIFFS]
--ctags [CTAGS] Add ctags to the chat to help GPT understand the codebase
(default: check for ctags executable) [env var:
AIDER_CTAGS]
--yes Always say yes to every confirmation
-v, --verbose Enable verbose output
```
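As a minimal sketch of how these three layers interact (mirroring the `configargparse` setup in the `main.py` diff below, but not aider's actual code), a single option can be set from a YAML config file, an `AIDER_*` environment variable, or the command line, with the command line taking highest priority:

```python
import configargparse

# Minimal sketch: one option settable via config file, env var, or CLI flag.
# Precedence in configargparse is: command line > environment > config file > default.
parser = configargparse.ArgumentParser(
    default_config_files=["~/.aider.conf.yml"],  # illustrative search path
    config_file_parser_class=configargparse.YAMLConfigFileParser,
)
parser.add_argument("-c", "--config", is_config_file=True, help="config file path")
parser.add_argument("--model", env_var="AIDER_MODEL", default="gpt-4")

print(parser.parse_args([]).model)                             # config file / env var / default
print(parser.parse_args(["--model", "gpt-3.5-turbo"]).model)   # CLI value wins
```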
## Chat commands

View file

@ -168,7 +168,10 @@ class Coder:
if self.abs_fnames:
files_content = prompts.files_content_prefix
files_content += self.get_files_content()
all_content += files_content
else:
files_content = prompts.files_no_full_files
all_content += files_content
other_files = set(self.get_all_abs_files()) - set(self.abs_fnames)
repo_content = self.repo_map.get_repo_map(self.abs_fnames, other_files)
@ -204,7 +207,7 @@ class Coder:
self.num_control_c += 1
if self.num_control_c >= 2:
break
self.io.tool_error("^C again to quit")
self.io.tool_error("^C again or /exit to quit")
except EOFError:
return

View file

@ -1,8 +1,8 @@
import sys
import os
import git
import subprocess
import shlex
from rich.prompt import Confirm
from prompt_toolkit.completion import Completion
from aider import prompts
@ -16,20 +16,8 @@ class Commands:
if inp[0] == "/":
return True
def help(self):
"Show help about all commands"
commands = self.get_commands()
for cmd in commands:
cmd_method_name = f"cmd_{cmd[1:]}"
cmd_method = getattr(self, cmd_method_name, None)
if cmd_method:
description = cmd_method.__doc__
self.io.tool(f"{cmd} {description}")
else:
self.io.tool(f"{cmd} No description available.")
def get_commands(self):
commands = ["/help"]
commands = []
for attr in dir(self):
if attr.startswith("cmd_"):
commands.append("/" + attr[4:])
@ -62,15 +50,15 @@ class Commands:
all_commands = self.get_commands()
matching_commands = [cmd for cmd in all_commands if cmd.startswith(first_word)]
if len(matching_commands) == 1:
if matching_commands[0] == "/help":
self.help()
else:
return self.do_run(matching_commands[0][1:], rest_inp)
return self.do_run(matching_commands[0][1:], rest_inp)
elif len(matching_commands) > 1:
self.io.tool_error("Ambiguous command: ', '.join(matching_commands)}")
self.io.tool_error(f"Ambiguous command: {', '.join(matching_commands)}")
else:
self.io.tool_error(f"Error: {first_word} is not a valid command.")
# any method called cmd_xxx becomes a command automatically.
# each one must take an args param.
def cmd_commit(self, args):
"Commit edits to the repo made outside the chat (commit message optional)"
@ -251,6 +239,10 @@ class Commands:
)
return msg
def cmd_exit(self, args):
"Exit the application"
sys.exit()
def cmd_ls(self, args):
"List all known files and those included in the chat session"
@ -274,3 +266,15 @@ class Commands:
self.io.tool("\nRepo files not in the chat:\n")
for file in other_files:
self.io.tool(f" {file}")
def cmd_help(self, args):
"Show help about all commands"
commands = sorted(self.get_commands())
for cmd in commands:
cmd_method_name = f"cmd_{cmd[1:]}"
cmd_method = getattr(self, cmd_method_name, None)
if cmd_method:
description = cmd_method.__doc__
self.io.tool(f"{cmd} {description}")
else:
self.io.tool(f"{cmd} No description available.")
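As a self-contained sketch of the convention noted above (any method named `cmd_xxx` automatically becomes a `/xxx` command, resolved by prefix matching) — illustrative only, not aider's actual `Commands` class:

```python
class MiniCommands:
    # Illustrative sketch of the cmd_xxx convention: commands are discovered by name,
    # dispatched via getattr, and their docstrings double as /help text.
    def get_commands(self):
        return ["/" + attr[4:] for attr in dir(self) if attr.startswith("cmd_")]

    def do_run(self, cmd_name, args):
        return getattr(self, f"cmd_{cmd_name}")(args)

    def run(self, inp):
        first_word, _, rest = inp.partition(" ")
        matches = [c for c in self.get_commands() if c.startswith(first_word)]
        if len(matches) == 1:
            return self.do_run(matches[0][1:], rest)
        if len(matches) > 1:
            print(f"Ambiguous command: {', '.join(matches)}")
        else:
            print(f"Error: {first_word} is not a valid command.")

    # every cmd_xxx method takes an args param
    def cmd_help(self, args):
        "Show help about all commands"
        for cmd in sorted(self.get_commands()):
            print(cmd, getattr(self, f"cmd_{cmd[1:]}").__doc__)

    def cmd_exit(self, args):
        "Exit the application"
        raise SystemExit


MiniCommands().run("/he")  # the prefix "/he" uniquely matches /help
```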

View file

@ -1,47 +1,85 @@
import os
import sys
import argparse
import git
import configargparse
from dotenv import load_dotenv
from aider.coder import Coder
from aider.io import InputOutput
def get_git_root():
try:
repo = git.Repo(search_parent_directories=True)
return repo.working_tree_dir
except git.InvalidGitRepositoryError:
return None
def main(args=None, input=None, output=None):
if args is None:
args = sys.argv[1:]
load_dotenv()
env_prefix = "AIDER_"
parser = argparse.ArgumentParser(description="aider - chat with GPT about your code")
default_config_files = [
os.path.expanduser("~/.aider.conf.yml"),
]
git_root = get_git_root()
if git_root:
default_config_files.insert(0, os.path.join(git_root, ".aider.conf.yml"))
parser = configargparse.ArgumentParser(
description="aider - chat with GPT about your code",
add_config_file_help=True,
default_config_files=default_config_files,
config_file_parser_class=configargparse.YAMLConfigFileParser,
)
parser.add_argument(
"-c",
"--config",
is_config_file=True,
metavar="CONFIG_FILE",
help=(
"Specify the config file (default: search for .aider.conf.yml in git root or home"
" directory)"
),
)
parser.add_argument(
"files",
metavar="FILE",
nargs="*",
help="a list of source code files (optional)",
)
default_input_history_file = (
os.path.join(git_root, ".aider.input.history") if git_root else ".aider.input.history"
)
default_chat_history_file = (
os.path.join(git_root, ".aider.chat.history.md") if git_root else ".aider.chat.history.md"
)
parser.add_argument(
"--input-history-file",
metavar="INPUT_HISTORY_FILE",
default=os.environ.get(f"{env_prefix}INPUT_HISTORY_FILE", ".aider.input.history"),
help=(
"Specify the chat input history file (default: .aider.input.history,"
f" ${env_prefix}INPUT_HISTORY_FILE)"
),
env_var=f"{env_prefix}INPUT_HISTORY_FILE",
default=default_input_history_file,
help=f"Specify the chat input history file (default: {default_input_history_file})",
)
parser.add_argument(
"--chat-history-file",
metavar="CHAT_HISTORY_FILE",
default=os.environ.get(f"{env_prefix}CHAT_HISTORY_FILE", ".aider.chat.history.md"),
help=(
"Specify the chat history file (default: .aider.chat.history.md,"
f" ${env_prefix}CHAT_HISTORY_FILE)"
),
env_var=f"{env_prefix}CHAT_HISTORY_FILE",
default=default_chat_history_file,
help=f"Specify the chat history file (default: {default_chat_history_file})",
)
parser.add_argument(
"--model",
metavar="MODEL",
default=os.environ.get(f"{env_prefix}MODEL", "gpt-4"),
help=f"Specify the model to use for the main chat (default: gpt-4, ${env_prefix}MODEL)",
env_var=f"{env_prefix}MODEL",
default="gpt-4",
help="Specify the model to use for the main chat (default: gpt-4)",
)
parser.add_argument(
"-3",
@ -50,24 +88,38 @@ def main(args=None, input=None, output=None):
const="gpt-3.5-turbo",
help="Use gpt-3.5-turbo model for the main chat (not advised)",
)
parser.add_argument(
"--pretty",
action="store_true",
env_var=f"{env_prefix}PRETTY",
default=True,
help="Enable pretty, colorized output (default: True)",
)
parser.add_argument(
"--no-pretty",
action="store_false",
dest="pretty",
help=f"Disable pretty, colorized output (${env_prefix}PRETTY)",
default=bool(int(os.environ.get(f"{env_prefix}PRETTY", 1))),
help="Disable pretty, colorized output",
)
parser.add_argument(
"--apply",
metavar="FILE",
help="Apply the changes from the given file instead of running the chat (debug)",
)
parser.add_argument(
"--auto-commits",
action="store_true",
env_var=f"{env_prefix}AUTO_COMMIT",
default=True,
help="Enable auto commit of changes (default: True)",
)
parser.add_argument(
"--no-auto-commits",
action="store_false",
dest="auto_commits",
help=f"Disable auto commit of changes (${env_prefix}AUTO_COMMITS)",
default=bool(int(os.environ.get(f"{env_prefix}AUTO_COMMITS", 1))),
dest="auto_commit",
help="Disable auto commit of changes",
)
parser.add_argument(
"--dry-run",
@ -78,17 +130,21 @@ def main(args=None, input=None, output=None):
parser.add_argument(
"--show-diffs",
action="store_true",
help=f"Show diffs when committing changes (default: False, ${env_prefix}SHOW_DIFFS)",
default=bool(int(os.environ.get(f"{env_prefix}SHOW_DIFFS", 0))),
env_var=f"{env_prefix}SHOW_DIFFS",
help="Show diffs when committing changes (default: False)",
default=False,
)
parser.add_argument(
"--ctags",
action="store_true",
type=lambda x: (str(x).lower() == "true"),
nargs="?",
const=True,
default=None,
env_var=f"{env_prefix}CTAGS",
help=(
"Add ctags to the chat to help GPT understand the codebase (default: False,"
f" ${env_prefix}CTAGS)"
"Add ctags to the chat to help GPT understand the codebase (default: check for ctags"
" executable)"
),
default=bool(int(os.environ.get(f"{env_prefix}CTAGS", 0))),
)
parser.add_argument(
"--yes",
@ -97,7 +153,8 @@ def main(args=None, input=None, output=None):
default=False,
)
parser.add_argument(
"-v", "--verbose",
"-v",
"--verbose",
action="store_true",
help="Enable verbose output",
default=False,
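A small sketch (plain `argparse`, without the env var and config-file layering) of how the tri-state `--ctags` flag above behaves: omitted, it stays `None` so `RepoMap` probes for a ctags executable; a bare `--ctags` forces it on; an explicit value such as `--ctags false` forces it off:

```python
import argparse

# Illustrative reimplementation of the tri-state flag; not aider's main().
parser = argparse.ArgumentParser()
parser.add_argument(
    "--ctags",
    type=lambda x: str(x).lower() == "true",
    nargs="?",
    const=True,
    default=None,
)

print(parser.parse_args([]).ctags)                     # None  -> auto-detect ctags
print(parser.parse_args(["--ctags"]).ctags)            # True  -> force ctags on
print(parser.parse_args(["--ctags", "false"]).ctags)   # False -> force ctags off
```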

View file

@ -8,14 +8,12 @@ Take requests for changes to the supplied code.
If the request is ambiguous, ask questions.
Once you understand the request you MUST:
1. List the files you need to modify.
1. List the files you need to modify. If they are *read-only* ask the user to make them *read-write* using the file's full path name.
2. Think step-by-step and explain the needed changes.
3. Describe each change with an *edit block* per the example below.
"""
system_reminder = """Base any edits off the files shown in the user's last msg.
You MUST format EVERY code change with an *edit block* like this:
system_reminder = """You MUST format EVERY code change with an *edit block* like this:
```python
some/dir/example.py
@ -29,11 +27,11 @@ some/dir/example.py
def add(a,b):
>>>>>>> UPDATED
Every *edit block* must be fenced w/triple backticks with the correct code language.
Every *edit block* must start with the full path! *NEVER* propose edit blocks for *read-only* files.
The ORIGINAL section must be an *exact* set of lines from the file:
- NEVER SKIP LINES!
- Include all original leading spaces and indentation!
Every *edit block* must be fenced w/triple backticks with the correct code language.
Every *edit block* must start with the full path!
Edits to different parts of a file each need their own *edit block*.
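For reference, a complete *edit block* in this format looks roughly like the following; this is an illustrative reconstruction (the middle of the prompt's own example is elided by the hunk above), assuming the `<<<<<<< ORIGINAL` / `=======` / `>>>>>>> UPDATED` markers:

```python
some/dir/example.py
<<<<<<< ORIGINAL
def add(a,b):
    return a+b
=======
def add(a, b):
    "Add two numbers"
    return a + b
>>>>>>> UPDATED
```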
@ -54,11 +52,13 @@ files_content_gpt_no_edits = "I didn't see any properly formatted edits in your
files_content_local_edits = "I edited the files myself."
files_content_prefix = "Propose changes to *only* these files (ask before editing others):\n"
files_content_prefix = "These are the *read-write* files:\n"
files_no_full_files = "I am not sharing any *read-write* files yet."
repo_content_prefix = (
"Here is a map of all the {other}files{ctags_msg}. You *must* ask with the"
" full path before editing these:\n\n"
"All the files below here are *read-only* files. Notice that files in directories are indented."
" Use their parent dirs to build their full path.\n"
)

View file

@ -3,6 +3,7 @@ import json
import sys
import subprocess
import tiktoken
import tempfile
from collections import defaultdict
from aider import prompts, utils
@ -48,14 +49,20 @@ def fname_to_components(fname, with_colon):
class RepoMap:
def __init__(self, use_ctags=True, root=None, main_model="gpt-4"):
ctags_cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json"]
def __init__(self, use_ctags=None, root=None, main_model="gpt-4"):
if not root:
root = os.getcwd()
self.use_ctags = use_ctags
self.tokenizer = tiktoken.encoding_for_model(main_model)
self.root = root
if use_ctags is None:
self.use_ctags = self.check_for_ctags()
else:
self.use_ctags = use_ctags
self.tokenizer = tiktoken.encoding_for_model(main_model)
def get_repo_map(self, chat_files, other_files):
res = self.choose_files_listing(other_files)
if not res:
@ -123,7 +130,7 @@ class RepoMap:
def split_path(self, path):
path = os.path.relpath(path, self.root)
return fname_to_components(path, True)
return [path + ":"]
def run_ctags(self, filename):
# Check if the file is in the cache and if the modification time has not changed
@ -132,7 +139,7 @@ class RepoMap:
if cache_key in TAGS_CACHE and TAGS_CACHE[cache_key]["mtime"] == file_mtime:
return TAGS_CACHE[cache_key]["data"]
cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json", filename]
cmd = self.ctags_cmd + [filename]
output = subprocess.check_output(cmd).decode("utf-8")
output = output.splitlines()
@ -169,6 +176,17 @@ class RepoMap:
return tags
def check_for_ctags(self):
try:
with tempfile.TemporaryDirectory() as tempdir:
hello_py = os.path.join(tempdir, "hello.py")
with open(hello_py, "w") as f:
f.write("def hello():\n print('Hello, world!')\n")
self.get_tags(hello_py)
except Exception:
return False
return True
def find_py_files(directory):
if not os.path.isdir(directory):
@ -197,6 +215,7 @@ def call_map():
"""
rm = RepoMap()
# res = rm.get_tags_map(fnames)
# print(res)
@ -222,7 +241,7 @@ def call_map():
# dump("ref", fname, ident)
references[ident].append(show_fname)
for ident,fname in defines.items():
for ident, fname in defines.items():
dump(fname, ident)
idents = set(defines.keys()).intersection(set(references.keys()))
@ -256,7 +275,9 @@ def call_map():
ranked = nx.pagerank(G, weight="weight")
# drop low weight edges for plotting
edges_to_remove = [(node1, node2) for node1, node2, data in G.edges(data=True) if data['weight'] < 1]
edges_to_remove = [
(node1, node2) for node1, node2, data in G.edges(data=True) if data["weight"] < 1
]
G.remove_edges_from(edges_to_remove)
# Remove isolated nodes (nodes with no edges)
dump(G.nodes())
@ -272,8 +293,8 @@ def call_map():
dot.node(fname, penwidth=str(pen))
max_w = max(edges.values())
for refs,defs,data in G.edges(data=True):
weight = data['weight']
for refs, defs, data in G.edges(data=True):
weight = data["weight"]
r = random.randint(0, 255)
g = random.randint(0, 255)
@ -286,7 +307,7 @@ def call_map():
print()
print(name)
for ident in sorted(labels[name]):
print('\t', ident)
print("\t", ident)
# print(f"{refs} -{weight}-> {defs}")
top_rank = sorted([(rank, node) for (node, rank) in ranked.items()], reverse=True)
@ -296,5 +317,6 @@ def call_map():
dot.render("tmp", format="pdf", view=True)
if __name__ == "__main__":
call_map()
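A hedged usage sketch of the new default behavior (the module path `aider/repomap.py` is an assumption): constructed with `use_ctags=None`, `RepoMap` probes for a working `ctags` executable via `check_for_ctags()` and, if it is missing, falls back to a plain listing of file paths:

```python
from aider.repomap import RepoMap  # module path is an assumption

# Illustrative paths; in aider these come from the Coder's tracked files.
rm = RepoMap(use_ctags=None, root="/path/to/repo")
chat_files = []                                  # files already added to the chat
other_files = ["/path/to/repo/aider/main.py"]    # the rest of the repo
print(rm.get_repo_map(chat_files, other_files))
```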

BIN  assets/robot-flowchart.png (new binary file, 700 KiB; binary content not shown)

View file

@ -1,5 +1,7 @@
# Improving GPT-4's codebase understanding with ctags
# Improving GPT-4's codebase understanding with a map
![robot flowchart](../assets/robot-flowchart.png)
GPT-4 is extremely useful for "self-contained" coding tasks,
like generating brand new code or modifying a pure function
@ -8,7 +10,7 @@ that has no dependencies.
But it's difficult to use GPT-4 to modify or extend
a large, complex pre-existing codebase.
To modify such code, GPT needs to understand the dependencies and APIs
which interconnect all of its subsystems.
which interconnect its subsystems.
Somehow we need to provide this "code context" to GPT
when we ask it to accomplish a coding task. Specifically, we need to:
@ -22,8 +24,11 @@ To address these issues, `aider` now
sends GPT a **concise map of your whole git repository**
that includes
all declared variables and functions with call signatures.
This *repo map* is built using `ctags`
and enables GPT to better comprehend, navigate
This *repo map* is built automatically using `ctags`, which
extracts symbol definitions from source files. Historically,
ctags were generated and indexed by IDEs and editors to
help humans search and navigate large codebases.
Instead, we're going to use ctags to help GPT better comprehend, navigate
and edit code in larger repos.
To get a sense of how effective this can be, this
@ -35,6 +40,12 @@ Using only the meta-data in the repo map, GPT is able to figure out how to
call the method to be tested, as well as how to instantiate multiple
class objects that are required to prepare for the test.
To code with GPT-4 using the techniques discussed here:
- Install [aider](https://github.com/paul-gauthier/aider#installation).
- Install [universal ctags](https://github.com/universal-ctags/ctags).
- Run `aider --ctags` inside your repo.
## The problem: code context
@ -63,7 +74,7 @@ For the example above, you could send the file that
contains the Foo class
and the file that contains the BarLog logging subsystem.
This works pretty well, and is supported by `aider` -- you
can manually specify which files to "add to the chat".
can manually specify which files to "add to the chat" you are having with GPT.
But it's not ideal to have to manually identify the right
set of files to add to the chat.
@ -139,7 +150,14 @@ map. Universal ctags can scan source code written in many
languages, and extract data about all the symbols defined in each
file.
For example, here is the `ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
Historically, ctags were generated and indexed by IDEs or code editors
to make it easier for a human to search and navigate a
codebase, find the implementation of functions, etc.
Instead, we're going to use ctags to help GPT navigate and understand the codebase.
Here is the type of output you get when you run ctags on source code. Specifically,
this is the
`ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
```json
{
@ -160,8 +178,8 @@ For example, here is the `ctags --fields=+S --output-format=json` output for the
```
The repo map is built using this type of `ctags` data,
formatted into the space
efficient hierarchical tree format shown above.
but formatted into the space
efficient hierarchical tree format shown earlier.
This is a format that GPT can easily understand
and which conveys the map data using a
minimal number of tokens.
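As a rough illustration of that idea (a sketch under assumptions, not aider's `RepoMap` implementation), the following folds the ctags JSON stream into an indented outline grouped by file, kind, and symbol; the exact grouping and indentation aider uses may differ:

```python
import json
import subprocess
from collections import defaultdict

def sketch_repo_map(filenames):
    # Same ctags invocation as aider's RepoMap, applied to a batch of files.
    cmd = ["ctags", "--fields=+S", "--extras=-F", "--output-format=json"] + filenames
    output = subprocess.check_output(cmd).decode("utf-8")

    tree = defaultdict(lambda: defaultdict(list))
    for line in output.splitlines():
        tag = json.loads(line)
        if tag.get("_type") != "tag":
            continue
        signature = tag.get("signature", "")
        tree[tag["path"]][tag["kind"]].append(tag["name"] + signature)

    lines = []
    for path, kinds in sorted(tree.items()):
        lines.append(f"{path}:")
        for kind, names in sorted(kinds.items()):
            lines.append(f"   {kind}")
            lines.extend(f"      {name}" for name in sorted(names))
    return "\n".join(lines)

print(sketch_repo_map(["aider/main.py"]))  # filename is illustrative
```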
@ -198,7 +216,7 @@ Some possible approaches to reducing the amount of map data are:
One key goal is to prefer solutions which are language agnostic or
which can be easily deployed against most popular code languages.
The `ctags` solution has this benefit, since it comes pre-built
with tooling for most popular languages.
with support for most popular languages.
I suspect that Language Server Protocol might be an even
better tool than `ctags` for this problem.
But it is more cumbersome to deploy for a broad
@ -212,5 +230,5 @@ To use this experimental repo map feature:
- Install [aider](https://github.com/paul-gauthier/aider#installation).
- Install [universal ctags](https://github.com/universal-ctags/ctags).
- Run `aider` with the `--ctags` option inside your repo.
- Run `aider --ctags` inside your repo.

View file

@ -24,3 +24,5 @@ wcwidth==0.2.6
yarl==1.9.2
pytest==7.3.1
tiktoken==0.4.0
configargparse
PyYAML