mirror of https://github.com/Aider-AI/aider.git, synced 2025-05-29 08:44:59 +00:00
copy
parent 6e82c5c7a5
commit 96d60de1b5
1 changed file with 8 additions and 7 deletions
@@ -1,12 +1,17 @@
 
 # GPT code editing benchmarks
 
+
+
 Aider is an open source command line chat tool that lets you ask GPT for features, changes and
 improvements to code in your local git repos.
 I spend a lot of time trying to make aider better at this sort of chat driven AI code editing,
 so that user chat requests are more likely to result in effective changes to their codebase.
 
-Improving code editing involves tweaking and experimenting with
+Having a reliable way for GPT to read/modify/write source files is critical to
+using GPT to edit code within an existing codebase.
+Making GPT code editing reliable often
+involves tweaking and experimenting with
 the "edit format" that aider uses.
 The edit format specifies how GPT should format code edits in its replies,
 and can range from simply "return an updated copy of the whole file" to
@@ -14,15 +19,12 @@ and can range from simply "return an updated copy of the whole file" to
 [function calling API](https://openai.com/blog/function-calling-and-other-api-updates)
 to specify a bunch of specific diffs".
 
-Having a reliable way for GPT to read/modify/write source files is critical to
-using GPT to edit code within an existing codebase.
-
 To measure the impact of changes to the edit format,
 I created a code editing benchmark based on the
 [Exercism python](https://github.com/exercism/python)
 coding exercises.
 The benchmark measures how well aider & GPT can turn
-a human coding request into
+a natural language coding request into
 actual runnable code saved into files that passes unit tests.
 This is an end-to-end assessment
 of not just how well GPT can write code, but also how well it
@@ -31,7 +33,7 @@ can *edit existing code* and
 so that aider can save the edits to the
 local source files.
 
-I ran the benchmark
+I ran the code editing benchmark
 on almost all the ChatGPT models, using a variety of edit formats.
 This produced some interesting observations:
 
@@ -52,7 +54,6 @@ when working with source code.
 
 More details on the benchmark, edit formats and results are discussed below.
 
-
 
 ## The benchmark
 