mirror of https://github.com/Aider-AI/aider.git (synced 2025-05-31 09:44:59 +00:00)
parent 6b6935e56d
commit dfb921c509
2 changed files with 6 additions and 4 deletions
@@ -196,7 +196,7 @@ def show_stats(dirnames):
         arrowprops={"arrowstyle": "->", "connectionstyle": "arc3,rad=0.3"},
     )
     ax.annotate(
-        "Second attempt,\nincluding\nunit test errors",
+        "Second attempt,\nincluding unit\ntest error output",
         xy=(2.55, 56),
         xytext=(3.5, top),
         horizontalalignment="center",
@@ -35,13 +35,15 @@ With that in mind, I've been benchmarking the new models.
 ## gpt-4-1106-preview
 
 - The new `gpt-4-1106-preview` model seems **much faster** than the earlier GPT-4 models! I won't be able to properly quantify this until the rate limits loosen up. Currently I am seeing 10X faster responses.
-- **It is better at producing correct code on the first try**. It gets ~59% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
-- The new model seems to perform similarly to the old models after being given a chance to correct bugs by reviewing test suite error output.
+- **It is better at producing correct code on the first try**. It gets ~60% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
+- The new model seems to perform somewhat better (69%) than the old models (63-64%) after being given a chance to correct bugs by reviewing test suite error output.
 
 **These results are preliminiary.**
 OpenAI is enforcing very low
 rate limits on the new GPT-4 model. The limits are so low, that
-I have only been able to attempt 56 out of 133 exercism problems.
+I have only been able to attempt
+58
+out of 133 exercism problems.
 They are randomly chosen, so results should be *roughly*
 indicative of the full benchmark.
 
|
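The changed text says that 58 randomly chosen exercises out of 133 should be *roughly* indicative of the full benchmark. The sketch below puts an approximate number on "roughly". It is not part of this commit or of aider's benchmark code. It assumes the attempted exercises are a simple random sample drawn without replacement, uses a normal approximation, and the function name `pass_rate_margin` is hypothetical.

```python
import math


def pass_rate_margin(p, n, population=None, z=1.96):
    """Approximate margin of error for a pass rate p observed on n exercises.

    Assumptions (not from the commit): simple random sampling and a normal
    approximation; z=1.96 corresponds to roughly 95% confidence.
    """
    # Standard error of a sample proportion.
    se = math.sqrt(p * (1 - p) / n)
    if population is not None and population > 1:
        # Finite population correction: the sample covers a large share
        # of the benchmark, which shrinks the uncertainty.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# Figures quoted in the diff: ~60% first try, 69% second try, 58 of 133 attempted.
for label, rate in [("first try", 0.60), ("second try", 0.69)]:
    margin = pass_rate_margin(rate, 58, population=133)
    print(f"{label}: {rate:.0%} +/- {margin:.1%}")
```

Under these assumptions the margin is on the order of ten percentage points. The first-try gain over 46-47% lies outside that margin, while the second-attempt difference (69% versus 63-64%) falls within it.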