Commit graph

1688 commits

Author SHA1 Message Date
Paul Gauthier
a3285edc4f aider: Add --openai-organization-id param to set OpenAI organization ID.
# Aider chat conversation:

USER: add --openai-organization-id param that sets OpenAI(organization=...)

ASSISTANT: ```diff
--- aider/main.py
+++ aider/main.py
@@ ... @@
     model_group.add_argument(
         "--openai-api-deployment-id",
         metavar="OPENAI_API_DEPLOYMENT_ID",
         env_var="OPENAI_API_DEPLOYMENT_ID",
         help="Specify the deployment_id",
     )
+    model_group.add_argument(
+        "--openai-organization-id",
+        metavar="OPENAI_ORGANIZATION_ID",
+        env_var="OPENAI_ORGANIZATION_ID",
+        help="Specify the OpenAI organization ID",
+    )
     model_group.add_argument(
         "--openrouter",
         dest="openai_api_base",
         action="store_const",
@@ ... @@
     if args.openai_api_type == "azure":
         client = openai.AzureOpenAI(
             api_key=args.openai_api_key,
             azure_endpoint=args.openai_api_base,
             api_version=args.openai_api_version,
             azure_deployment=args.openai_api_deployment_id,
         )
     else:
         kwargs = dict()
         if args.openai_api_base:
             kwargs["base_url"] = args.openai_api_base
             if "openrouter.ai" in args.openai_api_base:
                 kwargs["default_headers"] = {
                     "HTTP-Referer": "http://aider.chat",
                     "X-Title": "Aider",
                 }
+        if args.openai_organization_id:
+            kwargs["organization"] = args.openai_organization_id

         client = openai.OpenAI(api_key=args.openai_api_key, **kwargs)

@@ ... @@
     if args.verbose:
         show = scrub_sensitive_info(parser.format_values())
         io.tool_output(show)
         io.tool_output("Option settings:")
         for arg, val in sorted(vars(args).items()):
             io.tool_output(f"  - {arg}: {scrub_sensitive_info(str(val))}")
+        if args.openai_organization_id:
+            io.tool_output(f"  - openai_organization_id: {args.openai_organization_id}")
```
This diff adds a new command-line argument `--openai-organization-id` to the `aider/main.py` file, which allows the user to specify the OpenAI organization ID. It also ensures that if this argument is provided, it is passed to the `openai.OpenAI` client initialization as the `organization` parameter. Additionally, it outputs the organization ID if verbose mode is enabled.
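For reference, here is a minimal runnable sketch of the client construction this diff produces. The `organization` keyword is accepted by `openai.OpenAI` in the v1 SDK; the key and ID values below are placeholders, not real credentials:

```python
import openai

# Placeholder values; in aider these come from --openai-api-key and
# --openai-organization-id (or their OPENAI_* env vars).
api_key = "sk-..."
organization_id = "org-example123"

kwargs = dict()
if organization_id:
    kwargs["organization"] = organization_id

# Requests made through this client are attributed to the organization.
client = openai.OpenAI(api_key=api_key, **kwargs)
```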
2024-02-24 07:11:58 -08:00
Paul Gauthier
69d1107b31 set version to 0.24.2-dev 2024-02-11 05:51:40 -08:00
Paul Gauthier
d26693e636 version bump to 0.24.1 2024-02-11 05:51:09 -08:00
Paul Gauthier
72e6cb901c Fixed bug with cost computations when --no-stream in effect #475 2024-02-11 05:48:32 -08:00
Paul Gauthier
0fa2505ac5 Delete pandoc installer 2024-02-10 08:48:22 -08:00
Paul Gauthier
3d948788dd set version to 0.24.1-dev 2024-02-10 07:35:36 -08:00
Paul Gauthier
08a22b9cf7 version bump to 0.24.0 2024-02-10 07:35:02 -08:00
Paul Gauthier
1e70b3fb05 Updated HISTORY 2024-02-10 07:34:21 -08:00
Paul Gauthier
d376767ca3 Use gpt-3.5-turbo-0125 for commit messages 2024-02-10 07:32:26 -08:00
Paul Gauthier
6ddfc894e7 Updated HISTORY 2024-02-10 07:31:04 -08:00
Paul Gauthier
bdef4308fe Simpler calls to pypandoc 2024-02-08 16:14:17 -08:00
Paul Gauthier
efff174f9a Use download_pandoc, which works everywhere including arm64 2024-02-08 15:56:00 -08:00
Paul Gauthier
2dee76378b keep hrefs 2024-02-08 15:19:00 -08:00
Paul Gauthier
05c74202ec copy 2024-02-08 14:20:49 -08:00
Paul Gauthier
cc36329691 output with pandoc, cleanup with bs and re 2024-02-08 14:14:42 -08:00
Paul Gauthier
be60b785a4 aider: Remove all data: URLs from slimdown_html.
# Aider chat conversation:

USER: in `slimdown_html` remove all data: urls

ASSISTANT: ```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def slimdown_html(soup):
     # Remove all <img> tags
     for img in soup.find_all('img'):
         img.decompose()
+    # Remove all elements with data: URLs
+    for tag in soup.find_all(href=lambda x: x and x.startswith('data:')):
+        tag.decompose()
+    for tag in soup.find_all(src=lambda x: x and x.startswith('data:')):
+        tag.decompose()
     # Remove all per-element CSS styles
     for tag in soup.find_all(True):
         tag.attrs.pop('style', None)
     # Remove all internal anchor elements
     for anchor in soup.find_all('a', href=True):
         if anchor['href'].startswith('#'):
             anchor.decompose()
     return soup
```
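As a self-contained illustration of the same attribute-filter technique, here is a small sketch that runs against an invented HTML snippet. `find_all` accepts a callable as an attribute filter, which is what makes the `data:` prefix test possible:

```python
from bs4 import BeautifulSoup

html = """
<p><a href="data:text/html;base64,AAAA">inline data link</a></p>
<p><img src="data:image/png;base64,BBBB"></p>
<p><a href="https://example.com">normal link</a></p>
"""

soup = BeautifulSoup(html, "html.parser")
# Drop any element whose href or src attribute is a data: URL.
for tag in soup.find_all(href=lambda x: x and x.startswith("data:")):
    tag.decompose()
for tag in soup.find_all(src=lambda x: x and x.startswith("data:")):
    tag.decompose()

print(soup)  # only the https://example.com link survives
```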
2024-02-08 12:12:03 -08:00
Paul Gauthier
84ae51c005 Add functionality to write scraped content to a temporary file and optimize HTML parsing. 2024-02-08 12:11:59 -08:00
Paul Gauthier
33a62c8dc6 aider: finish slimdown_html. remove all <img> tags and all per-element css-styles and all internal anchor elements
# Aider chat conversation:

USER: finish `slimdown_html`. remove all <img> tags and all per-element css-styles and all internal anchor elements

ASSISTANT: ```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
-def slimdown_html(page_source: str) -> str:
-    soup = BeautifulSoup(page_source, "html.parser")
-    # ...
+def slimdown_html(page_source: str) -> str:
+    soup = BeautifulSoup(page_source, "html.parser")
+    # Remove all <img> tags
+    for img in soup.find_all('img'):
+        img.decompose()
+    # Remove all per-element CSS styles
+    for tag in soup.find_all(True):
+        tag.attrs.pop('style', None)
+    # Remove all internal anchor elements
+    for anchor in soup.find_all('a', href=True):
+        if anchor['href'].startswith('#'):
+            anchor.decompose()
+    return str(soup)
```
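A quick usage sketch of the finished function on a toy document (the markup is invented; the expected result is shown as a comment):

```python
from bs4 import BeautifulSoup

def slimdown_html(page_source: str) -> str:
    soup = BeautifulSoup(page_source, "html.parser")
    for img in soup.find_all("img"):
        img.decompose()
    for tag in soup.find_all(True):
        tag.attrs.pop("style", None)
    for anchor in soup.find_all("a", href=True):
        if anchor["href"].startswith("#"):
            anchor.decompose()
    return str(soup)

html = '<p style="color:red">Hi <img src="x.png"/><a href="#top">top</a><a href="/docs">docs</a></p>'
print(slimdown_html(html))
# -> <p>Hi <a href="/docs">docs</a></p>
```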
2024-02-08 12:10:12 -08:00
Paul Gauthier
1520bb976a Change html_to_text to html_to_markdown and add slimdown_html function. 2024-02-08 12:10:10 -08:00
Paul Gauthier
1509e21698 aider: Implemented html_to_markdown using pypandoc.
# Aider chat conversation:

USER: use pypandoc to implement `html_to_markdown`

ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will:

1. Add an import for `pypandoc`.
2. Implement the `html_to_markdown` function using `pypandoc.convert_text`.

Here are the diffs for those changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 import sys
+import pypandoc
 import httpx
 from bs4 import BeautifulSoup
 from playwright.sync_api import sync_playwright
@@ ... @@
 def html_to_markdown(page_source: str) -> str:
-    pass
+    return pypandoc.convert_text(page_source, 'md', format='html')
```

Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.
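A runnable sketch of that call, assuming the pandoc binary itself is available (pypandoc is only a wrapper around it; the sample HTML is invented):

```python
import pypandoc

html = "<h1>Title</h1><p>Some <em>emphasized</em> text.</p>"
# The target format comes first; the source format is passed via format=.
markdown = pypandoc.convert_text(html, "md", format="html")
print(markdown)  # heading style in the output depends on the pandoc version
```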
2024-02-08 12:02:45 -08:00
Paul Gauthier
75e7f2bd4e Add html_to_markdown function to convert HTML to Markdown. 2024-02-08 12:02:42 -08:00
Paul Gauthier
681f26d010 Print playwright instructions after the content is displayed, so they are not lost 2024-02-08 12:01:18 -08:00
Paul Gauthier
e42005a78e plumbed through to /web 2024-02-08 11:47:28 -08:00
Paul Gauthier
5b78d929a6 Roughly working scraper 2024-02-08 11:44:55 -08:00
Paul Gauthier
9bf3a6e0c6 aider: Implemented HTTP request using httpx library in scrape_with_httpx method.
# Aider chat conversation:

USER: finish `scrape_with_httpx`

ASSISTANT: To finish the `scrape_with_httpx` method, I will implement the HTTP request using the `httpx` library. This will involve making a GET request to the provided URL and returning the content of the response.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
     def scrape_with_httpx(self, url):
-        pass
+        import httpx
+        headers = {
+            'User-Agent': aider_user_agent
+        }
+        try:
+            with httpx.Client(headers=headers) as client:
+                response = client.get(url)
+                response.raise_for_status()
+                return response.text
+        except httpx.HTTPError as http_err:
+            self.print_error(f'HTTP error occurred: {http_err}')
+        except Exception as err:
+            self.print_error(f'An error occurred: {err}')
+        return None
```

This implementation uses `httpx.Client` to send a GET request with a custom User-Agent header. It handles any HTTP errors by printing them out and returns `None` in case of an error. If the request is successful, it returns the text content of the response.
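The same pattern works as a standalone script; the user agent string below is an invented stand-in for `aider_user_agent`:

```python
import httpx

aider_user_agent = "Aider/0.x +https://aider.chat"  # illustrative stand-in

def scrape_with_httpx(url):
    headers = {"User-Agent": aider_user_agent}
    try:
        # The Client applies its default headers to every request it sends.
        with httpx.Client(headers=headers) as client:
            response = client.get(url)
            response.raise_for_status()  # raise on 4xx/5xx responses
            return response.text
    except httpx.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    return None

text = scrape_with_httpx("https://example.com")
if text:
    print(text[:80])
```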
2024-02-08 11:29:46 -08:00
Paul Gauthier
e04187d1ad Refactor scraper to use Playwright for web scraping and handle user agent string. 2024-02-08 11:29:44 -08:00
Paul Gauthier
f10a9d2e7b aider: Retrieve the user agent from the browser context after creating a new page in the scrape_with_playwright function.
# Aider chat conversation:

USER: determine the user_agent

ASSISTANT: To determine the `user_agent` in the `scrape_with_playwright` function, we can use the `browser.new_context()` method which allows us to specify a `user_agent` string. We will set a common user agent string for a desktop browser.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright()
```
^C KeyboardInterrupt

USER: I don't want to SET it. I want to determine what it is

ASSISTANT: To determine the `user_agent` being used by the browser in the `scrape_with_playwright` function, we can retrieve it from the browser context after creating a new page. Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright() as p:
         try:
             browser = p.chromium.launch()
         except Exception as e:
             print(repr(e))
             return
+        page = browser.new_page()
+        user_agent = page.evaluate("navigator.userAgent")
+        print(f"User Agent: {user_agent}")
-        page = browser.new_page()
         page.goto(url)
         content = page.content()
         browser.close()

     return content
```
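As a standalone sketch, the same technique looks like this (requires the playwright package and a one-time `playwright install chromium`):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    # Ask the browser itself, via JavaScript, what user agent it sends.
    user_agent = page.evaluate("navigator.userAgent")
    print(f"User Agent: {user_agent}")
    browser.close()
```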
2024-02-08 11:02:54 -08:00
Paul Gauthier
65111591d7 Add scraper module for data extraction. 2024-02-08 11:02:52 -08:00
Paul Gauthier
1758937042 wip 2024-02-08 07:18:39 -08:00
Paul Gauthier
ef12107bd7 aider: Add command /web to use headless selenium to scrape a webpage and add the content to the chat.
# Aider chat conversation:

USER: Add a command /web to use headless selenium to scrape a webpage and add the content to the chat

ASSISTANT: To implement the `/web` command, we will need to add a new method to the `Commands` class in the `aider/commands.py` file. This method will use Selenium with a headless browser to scrape the content of a given webpage and add the content to the chat.

Here are the diffs for the changes:

```diff
--- aider/commands.py
+++ aider/commands.py
@@ ... @@
 from pathlib import Path
+from selenium import webdriver
+from selenium.webdriver.chrome.options import Options
+from selenium.webdriver.chrome.service import Service
+from selenium.webdriver.common.by import By
+from webdriver_manager.chrome import ChromeDriverManager
 import re
 import subprocess
@@ ... @@
 class Commands:
     voice = None
+    web_driver = None

     def __init__(self, io, coder, voice_language=None):
         self.io = io
@@ ... @@
         self.voice_language = voice_language
         self.tokenizer = coder.main_model.tokenizer
+        self.initialize_web_driver()

+    def initialize_web_driver(self):
+        chrome_options = Options()
+        chrome_options.add_argument("--headless")
+        chrome_options.add_argument("--disable-gpu")
+        chrome_options.add_argument("--no-sandbox")
+        chrome_options.add_argument("--disable-dev-shm-usage")
+        self.web_driver = webdriver.Chrome(
+            service=Service(ChromeDriverManager().install()),
+            options=chrome_options
+        )
+
+    def cmd_web(self, args):
+        "Use headless selenium to scrape a webpage and add the content to the chat"
+        url = args.strip()
+        if not url:
+            self.io.tool_error("Please provide a URL to scrape.")
+            return
+
+        try:
+            self.web_driver.get(url)
+            page_content = self.web_driver.find_element(By.TAG_NAME, "body").text
+            self.io.tool_output(f"Content from {url}:\n{page_content}")
+            return page_content
+        except Exception as e:
+            self.io.tool_error(f"Error scraping {url}: {e}")
+
     def is_command(self, inp):
         return inp[0] in "/!"
@@ ... @@
```

This diff adds the necessary imports for Selenium and the WebDriver manager, initializes a headless Chrome WebDriver when the `Commands` class is instantiated, and adds a new command method `cmd_web` that takes a URL as an argument, scrapes the webpage, and outputs the content to the chat.
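A trimmed, standalone sketch of the same headless scraping flow, assuming the selenium and webdriver-manager packages are installed (the URL is illustrative):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=chrome_options,
)
try:
    driver.get("https://example.com")
    # Grab the rendered text of the page body, as cmd_web does.
    print(driver.find_element(By.TAG_NAME, "body").text[:200])
finally:
    driver.quit()
```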
2024-02-07 14:22:02 -08:00
Paul Gauthier
8d9c2ba60b Added links to openai docs/pricing 2024-02-07 14:17:24 -08:00
Paul Gauthier
49f81ac784 ssh 2024-02-07 12:13:32 -08:00
Paul Gauthier
35f812859e Rationalized openai model attributes #458 2024-02-07 12:07:30 -08:00
Paul Gauthier
873a388693 Fixed the ! shortcut for /run 2024-02-06 16:14:27 -08:00
Paul Gauthier
926e393529 set version to 0.23.1-dev 2024-02-03 08:40:50 -08:00
Paul Gauthier
b0fd819241 version bump to 0.23.0 2024-02-03 08:40:17 -08:00
Paul Gauthier
174702babe Updated gpt-4 turbo model switches 2024-02-03 08:39:19 -08:00
Paul Gauthier
e2e2ee83a4 Added support for gpt-4-0125-preview 2024-02-03 08:27:31 -08:00
Paul Gauthier
1317ee1ace Added --openrouter 2024-01-31 17:33:50 -08:00
Paul Gauthier
e79732dcf7 Added env_vars for all the OPENAI_ env vars 2024-01-31 15:26:53 -08:00
Paul Gauthier
065bc63e33 Merge remote-tracking branch 'origin/main' 2024-01-23 13:07:56 -08:00
Paul Gauthier
0c314144d2 Added /quit #451 2024-01-23 13:07:24 -08:00
paul-gauthier
247cab56f0
Merge pull request #453 from zackees/patch-1
fix https://github.com/paul-gauthier/aider/issues/452
2024-01-23 13:05:15 -08:00
Paul Gauthier
c31f2313ba updated docs 2024-01-23 12:58:29 -08:00
Paul Gauthier
3e96fda71a catch shutdown errors 2024-01-23 10:45:58 -08:00
Paul Gauthier
af4f224c00 cleanup 2024-01-23 10:02:07 -08:00
Paul Gauthier
580c52bd85 set mdargs on init 2024-01-23 09:58:57 -08:00
Paul Gauthier
da131da427 aider: Add __del__ method to MarkdownStream to call self.live.stop().
# Aider chat conversation:

USER: make sure self.live.stop() is called no matter how the object gets destroyed

ASSISTANT: To ensure that `self.live.stop()` is called no matter how the object gets destroyed, we can use Python's built-in `__del__` method. This method is called when an object is about to be destroyed. We can override this method in our `MarkdownStream` class to stop the `Live` instance.

Here are the steps to make this change:

1. Add a `__del__` method to the `MarkdownStream` class that calls `self.live.stop()`.

Here is the *SEARCH/REPLACE* block:

aider/mdstream.py
<source>python
<<<<<<< SEARCH
class MarkdownStream:
    live = None
    when = 0
    min_delay = 0.050
    live_window = 6

    def __init__(self):
        self.printed = []

        self.live = Live(Text(''), refresh_per_second=1./self.min_delay)
        self.live.start()
=======
class MarkdownStream:
    live = None
    when = 0
    min_delay = 0.050
    live_window = 6

    def __init__(self):
        self.printed = []

        self.live = Live(Text(''), refresh_per_second=1./self.min_delay)
        self.live.start()

    def __del__(self):
        self.live.stop()
>>>>>>> REPLACE
</source>
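Outside aider, a minimal sketch of the pattern. Note that `__del__` fires when the object is garbage collected but is not guaranteed to run at interpreter shutdown, which is why a context manager or try/finally is the more deterministic way to release resources:

```python
class Resource:
    def __init__(self):
        self.active = True
        print("started")

    def stop(self):
        # Guard against double-stop if __del__ runs after an explicit stop().
        if self.active:
            self.active = False
            print("stopped")

    def __del__(self):
        self.stop()

r = Resource()
del r  # drops the last reference, so __del__ calls stop()
```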
2024-01-23 09:50:27 -08:00
Paul Gauthier
24f1e01177 Refactored MarkdownStream to use rich.live for real-time updates and improved text rendering. 2024-01-23 09:50:23 -08:00
Paul Gauthier
b143bc56ac switch to mdstream 2024-01-23 09:20:31 -08:00