improved token limit err msgs and docs #678

parent 0fc6b9beaa
commit a30e656304

3 changed files with 150 additions and 7 deletions

website/docs/troubleshooting/token-limits.md (new file, 86 lines)

@@ -0,0 +1,86 @@
---
parent: Troubleshooting
nav_order: 25
---

# Token limits

Every LLM has limits on how many tokens it can process:

- The model's **context window** limits how many tokens of
  *input and output* it can process.
- Each model has a limit on how many **output tokens** it can
  produce.
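
To make these two limits concrete, here's a minimal sketch of the
arithmetic in Python, using illustrative constants that match the
gpt-3.5-turbo numbers in the example error below (this is not
aider's actual code):

```
# Illustrative limits for gpt-3.5-turbo; other models differ.
CONTEXT_WINDOW = 16385  # max input + output tokens combined
MAX_OUTPUT = 4096       # max output tokens per response

def exceeded_limits(input_tokens: int, output_tokens: int) -> list[str]:
    """Return which limit(s) a request hits, if any."""
    problems = []
    if output_tokens >= MAX_OUTPUT:
        # Reaching the cap means the response was truncated mid-reply.
        problems.append("output limit")
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        problems.append("context window")
    return problems

# Matches the example error below: only the output limit is hit.
print(exceeded_limits(input_tokens=768, output_tokens=4096))
```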

Aider will report an error if a model responds indicating that
it has exceeded a token limit.
The error will include suggested actions to try to
avoid hitting token limits.
Here's an example error:

```
Model gpt-3.5-turbo has hit a token limit!

Input tokens: 768 of 16385
Output tokens: 4096 of 4096 -- exceeded output limit!
Total tokens: 4864 of 16385

To reduce output tokens:
- Ask for smaller changes in each request.
- Break your code into smaller source files.
- Try using a stronger model like gpt-4o or opus that can return diffs.

For more info: https://aider.chat/docs/token-limits.html
```

## Input tokens & context window size

The most common problem is trying to send too much data to a model,
overflowing its context window.
Technically you can exhaust the context window if the input alone is
too large, or if the input plus output together are too large.

Strong models like GPT-4o and Opus have quite
large context windows, so this sort of error is
typically only an issue when working with weaker models.

The easiest solution is to reduce the input tokens
by removing files from the chat.
It's best to only add the files that aider will need to *edit*
to complete your request.

- Use `/tokens` to see token usage.
- Use `/drop` to remove unneeded files from the chat session.
- Use `/clear` to clear the chat history.
- Break your code into smaller source files.
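
The `/tokens` command reports usage from inside aider.
If you want a rough token count for a file outside of aider, here's
a minimal sketch using the tiktoken package (OpenAI's tokenizer;
other model families tokenize differently, so treat the result as
an estimate):

```
# pip install tiktoken
import tiktoken

def count_tokens(path: str, model: str = "gpt-3.5-turbo") -> int:
    """Estimate how many input tokens a file will consume in the chat."""
    enc = tiktoken.encoding_for_model(model)
    with open(path) as f:
        return len(enc.encode(f.read()))

print(count_tokens("main.py"))  # "main.py" is a hypothetical file
```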

## Output token limits

Most models have quite small output limits, often as low
as 4k tokens.
If you ask aider to make a large change that affects a lot
of code, the LLM may hit output token limits
as it tries to send back all the changes.
A 1,000-line source file can easily run to 10k or more tokens,
so even a modest whole-file rewrite may not fit in a 4k output limit.

To avoid hitting output token limits:

- Ask for smaller changes in each request.
- Break your code into smaller source files.
- Try using a stronger model like gpt-4o or opus that can return
  diffs; the sketch below shows why diffs help.
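
To see why diff-style edits matter, compare the output tokens needed
to return a whole file against those needed for a unified diff of a
small change. A minimal sketch, reusing tiktoken from above and a
hypothetical one-line edit:

```
import difflib

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Hypothetical before/after contents of a source file.
old = open("main.py").read()
new = old.replace("DEBUG = True", "DEBUG = False")  # one-line change

# A unified diff carries only the changed hunk plus a little context.
diff = "\n".join(
    difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
)

print("whole file:", len(enc.encode(new)), "tokens")
print("diff only: ", len(enc.encode(diff)), "tokens")
```

A model that can only rewrite whole files must spend output tokens on
every line, while a diff spends them only on the lines that changed.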

## Other causes

Sometimes token limit errors are caused by
non-compliant API proxy servers,
or by bugs in the API server you are using to host a local model.
Aider has been well tested when connecting directly to the major
[LLM provider cloud APIs](https://aider.chat/docs/llms.html).
For serving local models,
[Ollama](https://aider.chat/docs/llms/ollama.html) is known to work well with aider.

Try using aider without an API proxy server,
or directly with one of the recommended cloud APIs,
and see if your token limit problems resolve.