mirror of https://github.com/Aider-AI/aider.git
synced 2025-05-29 08:44:59 +00:00
parent cef990cd98
commit 93e29eda94
1 changed files with 13 additions and 20 deletions
@@ -5,28 +5,20 @@
 
 Aider is an open source command line chat tool that lets you work with GPT to edit
 code in your local git repo.
-You can use aider to have GPT add features, write tests or make other changes to your code.
 
 To do this, aider needs to be able to reliably recognize when GPT wants to edit local files,
-determine which files to modify and what edits to apply.
-This direct read/modify/write integration allows
-users to harness GPT's coding skills without
-needing to repeatedly copy & paste
-code back and forth between their files and a ChatGPT window.
-
-Successful automated
-code editing hinges on the "edit format", which specifies
+determine which files it wants to modify and what changes to save.
+Such automated
+code editing hinges on the "edit format" portion of the system prompt, which specifies
 how GPT should structure code edits in its responses.
-Aider instructs GPT to use a specific
-edit format as part of the system prompt.
 
 Aider currently uses simple text based editing formats, but
 [OpenAI's new function calling
 API](https://openai.com/blog/function-calling-and-other-api-updates)
 looks like a promising way to create more structured edit formats.
-Before making such a big change, I wanted
-a quantitative way to assess the benefits
-of function based editing.
+I wanted
+a quantitative way to assess the potential benefits
+of switching aider to function based editing.
 
 With this in mind, I developed a
 benchmark based on the [Exercism
@@ -34,14 +26,15 @@ python](https://github.com/exercism/python) coding exercises.
 This
 benchmark evaluates how effectively aider and GPT can translate a
 natural language coding request into executable code saved into
-files that pass unit tests. It's an end-to-end evaluation of not just
+files that pass unit tests.
+It provides an end-to-end evaluation of not just
 GPT's coding ability, but also its capacity to *edit existing code*
 and *format those code edits* so that aider can save the
 edits to the local source files.
 
-I ran this code editing benchmark
-on all the ChatGPT models except `gpt-4-32k`, using a variety of edit formats.
-The results were quite interesting:
+I ran the benchmark
+on all the ChatGPT models (except `gpt-4-32k`), using a variety of edit formats.
+The results were interesting:
 
 - Asking GPT to return an updated copy of the whole file in a standard markdown fenced code block proved to be the most reliable and effective edit format across all GPT-3.5 and GPT-4 models. The results for this `whole` edit format are shown in solid blue in the graph.
 - Using the new functions API for edits performed worse than the above whole file method, for all the models. GPT-3.5 especially produced inferior code and frequently mangled this output format. This was surprising, as the functions API was introduced to enhance the reliability of structured outputs. The results for these `...-func` edit methods are shown as patterned bars in the graph (both green and blue).
@@ -62,7 +55,7 @@ Or should they type up a properly escaped and
 syntactically correct json data structure
 that contains the text of the new code?
 
-Using more complex output formats with GPT seems to introduce two issues:
+Using more complex output formats with GPT seems to cause two issues:
 
 - It makes GPT write worse code. Keeping the output format simple seems to allow GPT to devote more attention to the actual coding task.
 - It reduces GPT's adherence to the output format, making it more challenging for tools like aider to accurately identify and apply the edits GPT is attempting to make.
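
To illustrate the contrast the changed text draws between a plain fenced code block and a JSON data structure, here is a minimal Python sketch. It is not code from this commit or from aider. The helper `parse_whole_file_reply` and the JSON field names `path` and `content` are hypothetical, chosen only for illustration. The same two-line file is delivered once inside a markdown fence and once as a JSON string, where every newline and double quote has to be escaped.

```python
import json
import re

# Build the fence marker programmatically so this example can itself sit
# inside a markdown code block.
FENCE = "`" * 3

# The new file contents the model is supposed to deliver.
new_code = 'def greet(name):\n    print(f"Hello, {name}!")\n'

# Style 1: the whole file inside a standard markdown fenced code block.
whole_reply = "Here is the updated file:\n\n" + FENCE + "python\n" + new_code + FENCE + "\n"


def parse_whole_file_reply(reply: str) -> str:
    """Return the text of the first fenced code block in reply.

    Hypothetical helper for illustration, not aider's actual parser.
    """
    match = re.search(FENCE + r"[^\n]*\n(.*?)" + FENCE, reply, re.DOTALL)
    if match is None:
        raise ValueError("no fenced code block found in reply")
    return match.group(1)


# Style 2: the same file as a JSON string, as function-call arguments would
# carry it. The field names "path" and "content" are assumptions.
function_arguments = json.dumps({"path": "greet.py", "content": new_code})

if __name__ == "__main__":
    print(whole_reply)
    print(function_arguments)
```

Both styles recover identical file contents when the output is well formed. The difference is in what the model has to write: the fenced version contains the code verbatim, while the JSON version is a single line in which newlines appear as `\n` and quotes as `\"`.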