Commit graph

1520 commits

Author SHA1 Message Date
Paul Gauthier
1520bb976a Change html_to_text to html_to_markdown and add slimdown_html function. 2024-02-08 12:10:10 -08:00
Paul Gauthier
1509e21698 aider: Implemented html_to_markdown using pypandoc.
# Aider chat conversation:

USER: use pypandoc to implement `html_to_markdown`

ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will:

1. Add an import for `pypandoc`.
2. Implement the `html_to_markdown` function using `pypandoc.convert_text`.

Here are the diffs for those changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 import sys
+import pypandoc
 import httpx
 from bs4 import BeautifulSoup
 from playwright.sync_api import sync_playwright
@@ ... @@
 def html_to_markdown(page_source: str) -> str:
-    pass
+    return pypandoc.convert_text(page_source, 'md', format='html')
```

Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.
2024-02-08 12:02:45 -08:00
Paul Gauthier
75e7f2bd4e Add html_to_markdown function to convert HTML to Markdown. 2024-02-08 12:02:42 -08:00
Paul Gauthier
681f26d010 Print playwright instructions after the content is displayed, so they are not lost 2024-02-08 12:01:18 -08:00
Paul Gauthier
e42005a78e plumbed through to /web 2024-02-08 11:47:28 -08:00
Paul Gauthier
5b78d929a6 Roughly working scraper 2024-02-08 11:44:55 -08:00
Paul Gauthier
9bf3a6e0c6 aider: Implemented HTTP request using httpx library in scrape_with_httpx method.
# Aider chat conversation:

USER: finish `scrape_with_httpx`

ASSISTANT: To finish the `scrape_with_httpx` method, I will implement the HTTP request using the `httpx` library. This will involve making a GET request to the provided URL and returning the content of the response.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
     def scrape_with_httpx(self, url):
-        pass
+        import httpx
+        headers = {
+            'User-Agent': aider_user_agent
+        }
+        try:
+            with httpx.Client(headers=headers) as client:
+                response = client.get(url)
+                response.raise_for_status()
+                return response.text
+        except httpx.HTTPError as http_err:
+            self.print_error(f'HTTP error occurred: {http_err}')
+        except Exception as err:
+            self.print_error(f'An error occurred: {err}')
+        return None
```

This implementation uses `httpx.Client` to send a GET request with a custom User-Agent header. It handles any HTTP errors by printing them out and returns `None` in case of an error. If the request is successful, it returns the text content of the response.
2024-02-08 11:29:46 -08:00
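The request pattern described in commit 9bf3a6e0c6 (send a GET with a custom User-Agent, return the body text, print the error and return `None` on failure) can be sketched without third-party dependencies. This is only an illustration: it swaps in the standard library's `urllib` for `httpx`, and `fetch_text` and `USER_AGENT` are hypothetical names that do not come from aider.

```python
import urllib.error
import urllib.request

# Hypothetical value; the commit uses a variable named aider_user_agent instead.
USER_AGENT = "example-agent/0.1"


def fetch_text(url, timeout=10.0):
    """Return the decoded body of `url`, or None if the request fails."""
    try:
        # Building the Request inside the try block also catches malformed URLs.
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as http_err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error occurred: {http_err}")
    except (OSError, ValueError) as err:
        # Connection failures, timeouts, missing files, and malformed URLs.
        print(f"An error occurred: {err}")
    return None
```

As in the commit, callers have to check for `None` before using the result, since every failure path collapses to the same return value.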
Paul Gauthier
e04187d1ad Refactor scraper to use Playwright for web scraping and handle user agent string. 2024-02-08 11:29:44 -08:00
Paul Gauthier
f10a9d2e7b aider: Retrieve the user agent from the browser context after creating a new page in the scrape_with_playwright function.
# Aider chat conversation:

USER: determine the user_agent

ASSISTANT: To determine the `user_agent` in the `scrape_with_playwright` function, we can use the `browser.new_context()` method which allows us to specify a `user_agent` string. We will set a common user agent string for a desktop browser.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright()
```
^C KeyboardInterrupt

USER: I don't want to SET it. I want to determine what it is

ASSISTANT: To determine the `user_agent` being used by the browser in the `scrape_with_playwright` function, we can retrieve it from the browser context after creating a new page. Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright() as p:
         try:
             browser = p.chromium.launch()
         except Exception as e:
             print(repr(e))
             return
+        page = browser.new_page()
+        user_agent = page.evaluate("navigator.userAgent")
+        print(f"User Agent: {user_agent}")
-        page = browser.new_page()
         page.goto(url)
         content = page.content()
         browser.close()

     return content
```
2024-02-08 11:02:54 -08:00
Paul Gauthier
65111591d7 Add scraper module for data extraction. 2024-02-08 11:02:52 -08:00
Paul Gauthier
1758937042 wip 2024-02-08 07:18:39 -08:00
Paul Gauthier
ef12107bd7 aider: Add command /web to use headless selenium to scrape a webpage and add the content to the chat.
# Aider chat conversation:

USER: Add a command /web to use headless selenium to scrape a webpage and add the content to the chat

ASSISTANT: To implement the `/web` command, we will need to add a new method to the `Commands` class in the `aider/commands.py` file. This method will use Selenium with a headless browser to scrape the content of a given webpage and add the content to the chat.

Here are the diffs for the changes:

```diff
--- aider/commands.py
+++ aider/commands.py
@@ ... @@
 from pathlib import Path
+from selenium import webdriver
+from selenium.webdriver.chrome.options import Options
+from selenium.webdriver.chrome.service import Service
+from selenium.webdriver.common.by import By
+from webdriver_manager.chrome import ChromeDriverManager
 import re
 import subprocess
@@ ... @@
 class Commands:
     voice = None
+    web_driver = None

     def __init__(self, io, coder, voice_language=None):
         self.io = io
@@ ... @@
         self.voice_language = voice_language
         self.tokenizer = coder.main_model.tokenizer
+        self.initialize_web_driver()

+    def initialize_web_driver(self):
+        chrome_options = Options()
+        chrome_options.add_argument("--headless")
+        chrome_options.add_argument("--disable-gpu")
+        chrome_options.add_argument("--no-sandbox")
+        chrome_options.add_argument("--disable-dev-shm-usage")
+        self.web_driver = webdriver.Chrome(
+            service=Service(ChromeDriverManager().install()),
+            options=chrome_options
+        )
+
+    def cmd_web(self, args):
+        "Use headless selenium to scrape a webpage and add the content to the chat"
+        url = args.strip()
+        if not url:
+            self.io.tool_error("Please provide a URL to scrape.")
+            return
+
+        try:
+            self.web_driver.get(url)
+            page_content = self.web_driver.find_element(By.TAG_NAME, "body").text
+            self.io.tool_output(f"Content from {url}:\n{page_content}")
+            return page_content
+        except Exception as e:
+            self.io.tool_error(f"Error scraping {url}: {e}")
+
     def is_command(self, inp):
         return inp[0] in "/!"
@@ ... @@
```

This diff adds the necessary imports for Selenium and the WebDriver manager, initializes a headless Chrome WebDriver when the `Commands` class is instantiated, and adds a new command method `cmd_web` that takes a URL as an argument, scrapes the webpage, and outputs the content to the chat.
2024-02-07 14:22:02 -08:00
Paul Gauthier
8d9c2ba60b Added links to openai docs/pricing 2024-02-07 14:17:24 -08:00
Paul Gauthier
49f81ac784 ssh 2024-02-07 12:13:32 -08:00
Paul Gauthier
35f812859e Rationalized openai model attributes #458 2024-02-07 12:07:30 -08:00
Paul Gauthier
873a388693 Fixed the ! shortcut for /run 2024-02-06 16:14:27 -08:00
Paul Gauthier
926e393529 set version to 0.23.1-dev 2024-02-03 08:40:50 -08:00
Paul Gauthier
b0fd819241 version bump to 0.23.0 2024-02-03 08:40:17 -08:00
Paul Gauthier
174702babe Updated gpt-4 turbo model switches 2024-02-03 08:39:19 -08:00
Paul Gauthier
e2e2ee83a4 Added support for gpt-4-0125-preview 2024-02-03 08:27:31 -08:00
Paul Gauthier
1317ee1ace Added --openrouter 2024-01-31 17:33:50 -08:00
Paul Gauthier
e79732dcf7 Added env_vars for all the OPENAI_ env vars 2024-01-31 15:26:53 -08:00
Paul Gauthier
065bc63e33 Merge remote-tracking branch 'origin/main' 2024-01-23 13:07:56 -08:00
Paul Gauthier
0c314144d2 Added /quit #451 2024-01-23 13:07:24 -08:00
paul-gauthier
247cab56f0
Merge pull request #453 from zackees/patch-1
fix https://github.com/paul-gauthier/aider/issues/452
2024-01-23 13:05:15 -08:00
Paul Gauthier
c31f2313ba updated docs 2024-01-23 12:58:29 -08:00
Paul Gauthier
3e96fda71a catch shutdown errors 2024-01-23 10:45:58 -08:00
Paul Gauthier
af4f224c00 cleanup 2024-01-23 10:02:07 -08:00
Paul Gauthier
580c52bd85 set mdargs on init 2024-01-23 09:58:57 -08:00
Paul Gauthier
da131da427 aider: Add __del__ method to MarkdownStream to call self.live.stop().
# Aider chat conversation:

USER: make sure self.live.stop() is called no matter how the object gets destroyed

ASSISTANT: To ensure that `self.live.stop()` is called no matter how the object gets destroyed, we can use Python's built-in `__del__` method. This method is called when an object is about to be destroyed. We can override this method in our `MarkdownStream` class to stop the `Live` instance.

Here are the steps to make this change:

1. Add a `__del__` method to the `MarkdownStream` class that calls `self.live.stop()`.

Here is the *SEARCH/REPLACE* block:

aider/mdstream.py
<source>python
<<<<<<< SEARCH
class MarkdownStream:
    live = None
    when = 0
    min_delay = 0.050
    live_window = 6

    def __init__(self):
        self.printed = []

        self.live = Live(Text(''), refresh_per_second=1./self.min_delay)
        self.live.start()
=======
class MarkdownStream:
    live = None
    when = 0
    min_delay = 0.050
    live_window = 6

    def __init__(self):
        self.printed = []

        self.live = Live(Text(''), refresh_per_second=1./self.min_delay)
        self.live.start()

    def __del__(self):
        self.live.stop()
>>>>>>> REPLACE
</source>
2024-01-23 09:50:27 -08:00
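The `__del__` approach from commit da131da427 can be sketched in a self-contained form. Python does not guarantee that `__del__` runs for objects still alive at interpreter exit, so the sketch below pairs it with an explicit, idempotent `close()` and context-manager support. `FakeLive` and `StreamSketch` are hypothetical stand-ins for illustration, not aider's `MarkdownStream` or rich's `Live`.

```python
class FakeLive:
    """Hypothetical stand-in for a live display, so the sketch runs without rich."""

    def __init__(self):
        self.running = False
        self.stop_calls = 0

    def start(self):
        self.running = True

    def stop(self):
        self.running = False
        self.stop_calls += 1


class StreamSketch:
    """Illustrative owner of a live display that must always be stopped."""

    # Class-level default keeps __del__ safe even if __init__ fails early.
    live = None

    def __init__(self, live):
        self.live = live
        self.live.start()

    def close(self):
        # Idempotent: stop the display once, then drop the reference.
        if self.live is not None:
            self.live.stop()
            self.live = None

    def __del__(self):
        # Best-effort fallback; calling close() explicitly is the reliable path.
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions
```

With this shape, `with StreamSketch(FakeLive()) as stream: ...` stops the display deterministically, and `__del__` only covers the case where the caller forgets to close it.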
Paul Gauthier
24f1e01177 Refactored MarkdownStream to use rich.live for real-time updates and improved text rendering. 2024-01-23 09:50:23 -08:00
Paul Gauthier
b143bc56ac switch to mdstream 2024-01-23 09:20:31 -08:00
Paul Gauthier
e50a0e8b09 noop 2024-01-23 08:55:28 -08:00
Paul Gauthier
85383079f6 cleanup 2024-01-23 08:55:08 -08:00
Paul Gauthier
e6a4edf2e2 aider: Mark _text as noqa 501.
# Aider chat conversation:

USER: mark _text as noqa 501

ASSISTANT: To mark `_text` as `noqa 501`, we need to modify `aider/mdstream.py`:

1. Add `# noqa: 501` at the end of the `_text` variable declaration.

Here is the *SEARCH/REPLACE* block:

aider/mdstream.py
<source>python
<<<<<<< SEARCH
_text = 5 * _text
=======
_text = 5 * _text  # noqa: 501
>>>>>>> REPLACE
</source>
2024-01-23 08:53:34 -08:00
Paul Gauthier
c051104dc0 Add MarkdownStream class for displaying markdown content in a stream. 2024-01-23 08:53:29 -08:00
Zachary Vorhies
c6ced90e51
fix https://github.com/paul-gauthier/aider/issues/452
Adds an option to skip the update check. This better supports tooling that wants to run its own update mechanism and not have aider print the update notice unconditionally.
2024-01-22 18:52:06 -08:00
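An opt-out like the one described in commit c6ced90e51 could be wired roughly as follows, assuming an `argparse`-based command line. The flag name `--skip-check-update`, the program name, and the `maybe_check_update` helper are illustrative assumptions; this log does not show the actual implementation.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical opt-out flag for the update check."""
    parser = argparse.ArgumentParser(prog="example-tool")
    parser.add_argument(
        "--skip-check-update",
        action="store_true",
        help="Do not check for a newer version at startup",
    )
    return parser


def maybe_check_update(args, check):
    """Run `check` unless the user opted out; return True if it ran."""
    if args.skip_check_update:
        return False
    check()
    return True
```

Passing the check as a callable keeps the decision separate from the network call, so wrapper tooling can pass the flag and never trigger the check or its output.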
Paul Gauthier
009c5c3c0d set version to 0.22.1-dev 2024-01-22 17:52:04 -08:00
Paul Gauthier
28b69f381c version bump to 0.22.0 2024-01-22 17:51:33 -08:00
Paul Gauthier
4e3ace6199 Added ! as alias for /run #334 2024-01-22 13:52:59 -08:00
Paul Gauthier
96d60624cb added /test cmd #335 2024-01-22 13:50:21 -08:00
Paul Gauthier
06973d85a3 When autocompleting /add and /drop quote the filename if needed #440 2024-01-22 13:28:56 -08:00
Paul Gauthier
23cc78f36b typo 2024-01-22 13:21:16 -08:00
Paul Gauthier
a8b854fe8d fix f-string 2024-01-22 13:10:15 -08:00
Paul Gauthier
877dda2f3d On /undo, ask GPT to wait before retrying the changes #450 2024-01-22 13:05:38 -08:00
Paul Gauthier
61e68ffe9d set version to 0.21.2-dev 2024-01-10 07:50:45 -08:00
Paul Gauthier
0d11880a4a version bump to 0.21.1 2024-01-10 07:50:15 -08:00
Paul Gauthier
6a77657d95 added --4turbo and --4 aliases 2024-01-10 07:48:17 -08:00
Paul Gauthier
29a578c621 Complete the search for viable hunks all the way down to 0 space context lines #435 2024-01-09 08:34:33 -08:00
Paul Gauthier
e0e454fe82 set version to 0.21.1-dev 2024-01-08 10:44:34 -08:00