Commit graph

3475 commits

Author SHA1 Message Date
Paul Gauthier
265e71010f Make assets/ reference absolute 2024-02-11 07:58:25 -08:00
Paul Gauthier
ae3373eb89 fixed redirect 2024-02-11 07:57:50 -08:00
Paul Gauthier
e630e51668 better excerpt logic; fixed svg ref 2024-02-11 07:55:44 -08:00
Paul Gauthier
d7476bb303 aider: Update blog index to display post excerpt if available, otherwise use the first 100 words.
# Aider chat conversation:

USER: use the excerpt if one exists, otherwise keep using the first 100 words

ASSISTANT: ```diff
--- blog/index.html
+++ blog/index.html
@@ ... @@
 <div class="entry">
-    {{ post.content | strip_html | truncatewords: 100 }}...
+    {{ post.excerpt | default: post.content | strip_html | truncatewords: 100 }}...
 </div>
```
2024-02-11 07:52:29 -08:00
Paul Gauthier
a3f9f6c5c6 aider: Added an excerpt for the speed benchmarks report.
# Aider chat conversation:

USER: add an excerpt

ASSISTANT: ```diff
--- _posts/2023-11-06-benchmarks-speed-1106.md
+++ _posts/2023-11-06-benchmarks-speed-1106.md
@@ ... @@
 ---
 title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
+excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
 redirect_from:
   - /docs/benchmarks-speed-1106.md
 ---
```
2024-02-11 07:51:38 -08:00
Paul Gauthier
81ad2f18fe moved benchmarks-speed-1106 article 2024-02-11 07:50:18 -08:00
Paul Gauthier
2f8109ece1 copy 2024-02-11 06:36:47 -08:00
Paul Gauthier
a6f8d7ec6f aider: Add the redirect plugin to the _config.yml.
# Aider chat conversation:

USER: add the redirect plugin

ASSISTANT: To add the Jekyll Redirect From plugin to your `_config.yml`, you need to add the plugin to the `plugins` list. Here's the diff for the change:

```diff
--- _config.yml
+++ _config.yml
@@ ... @@
 theme: jekyll-theme-cayman
 url: "https://aider.chat"
+plugins:
+  - jekyll-redirect-from
+
 defaults:
   - scope:
       path: "README.md"
       type: "pages"
     values:
       description: "A command-line chat tool for coding with GPT"
```
2024-02-11 06:34:01 -08:00
Paul Gauthier
c8a732dde2 copy 2024-02-11 06:31:57 -08:00
Paul Gauthier
0904e8e1a0 copy 2024-02-11 06:29:40 -08:00
Paul Gauthier
c95a4ae171 copy 2024-02-11 06:28:16 -08:00
Paul Gauthier
027faa384d whitespace 2024-02-11 06:21:11 -08:00
Paul Gauthier
f948c855ac renamed 2024-02-11 06:18:35 -08:00
Paul Gauthier
ac22c314d1 use first 100 words for excerpt 2024-02-11 06:16:21 -08:00
Paul Gauthier
69c32c2903 aider: Add frontmatter for 301 redirect from old URL to new URL.
# Aider chat conversation:

USER: this post used to be live at /docs/aider-20.html. add frontmatter so that that old url does a 301 redirect to here.

ASSISTANT: ```diff
--- _posts/2024-02-11-aider-20.html
+++ _posts/2024-02-11-aider-20.html
@@ ... @@
+
+---
+redirect_from:
+  - /docs/aider-20.html
+---
+
 # Aider v0.20.0 is live

 Aider v0.20.0 is now available in pypi.
```
2024-02-11 06:15:54 -08:00
Paul Gauthier
f317915e8a Added new blog post for February 11, 2024. 2024-02-11 06:15:53 -08:00
Paul Gauthier
3fb698cdca added blog 2024-02-11 06:07:29 -08:00
Paul Gauthier
69d1107b31 set version to 0.24.2-dev 2024-02-11 05:51:40 -08:00
Paul Gauthier
d26693e636 version bump to 0.24.1 2024-02-11 05:51:09 -08:00
Paul Gauthier
19d1a57e5b Updated HISTORY 2024-02-11 05:49:30 -08:00
Paul Gauthier
72e6cb901c Fixed bug with cost computations when --no-stream in effect #475 2024-02-11 05:48:32 -08:00
Paul Gauthier
a1a2cf7ac2 Added pipx faq entry 2024-02-10 10:40:12 -08:00
Paul Gauthier
0fa2505ac5 Delete pandoc installer 2024-02-10 08:48:22 -08:00
Paul Gauthier
3d948788dd set version to 0.24.1-dev 2024-02-10 07:35:36 -08:00
Paul Gauthier
08a22b9cf7 version bump to 0.24.0 2024-02-10 07:35:02 -08:00
Paul Gauthier
1e70b3fb05 Updated HISTORY 2024-02-10 07:34:21 -08:00
Paul Gauthier
d376767ca3 Use gpt-3.5-turbo-0125 for commit messages 2024-02-10 07:32:26 -08:00
Paul Gauthier
6ddfc894e7 Updated HISTORY 2024-02-10 07:31:04 -08:00
Paul Gauthier
bdef4308fe Simpler calls to pypandoc 2024-02-08 16:14:17 -08:00
Paul Gauthier
efff174f9a Use download_pandoc, which works everywhere including arm64 2024-02-08 15:56:00 -08:00
Paul Gauthier
2dee76378b keep hrefs 2024-02-08 15:19:00 -08:00
Paul Gauthier
62c67a1e23 copy 2024-02-08 14:41:51 -08:00
Paul Gauthier
115bf10978 Added /web to commands doc 2024-02-08 14:39:18 -08:00
Paul Gauthier
50b6a00331 bs4 -> beautifulsoup4 2024-02-08 14:29:56 -08:00
Paul Gauthier
05c74202ec copy 2024-02-08 14:20:49 -08:00
Paul Gauthier
cc36329691 output with pandoc, cleanup with bs and re 2024-02-08 14:14:42 -08:00
Paul Gauthier
be60b785a4 aider: Remove all data: URLs from slimdown_html.
# Aider chat conversation:

USER: in `slimdown_html` remove all data: urls

ASSISTANT: ```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def slimdown_html(soup):
     # Remove all <img> tags
     for img in soup.find_all('img'):
         img.decompose()
+    # Remove all elements with data: URLs
+    for tag in soup.find_all(href=lambda x: x and x.startswith('data:')):
+        tag.decompose()
+    for tag in soup.find_all(src=lambda x: x and x.startswith('data:')):
+        tag.decompose()
     # Remove all per-element CSS styles
     for tag in soup.find_all(True):
         tag.attrs.pop('style', None)
     # Remove all internal anchor elements
     for anchor in soup.find_all('a', href=True):
         if anchor['href'].startswith('#'):
             anchor.decompose()
     return soup
```
2024-02-08 12:12:03 -08:00
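For reference, a minimal standalone sketch of the slimdown_html helper as it reads after this hunk is applied on top of the initial implementation in the commit further down: it takes and returns a BeautifulSoup object. The bs4 import and the sample HTML in the __main__ guard are illustrative assumptions, not repo code.

```python
# Hedged sketch of slimdown_html after both commits; sample HTML is illustrative.
from bs4 import BeautifulSoup


def slimdown_html(soup):
    # Remove all <img> tags
    for img in soup.find_all("img"):
        img.decompose()
    # Remove any element whose href or src is a data: URL
    for tag in soup.find_all(href=lambda x: x and x.startswith("data:")):
        tag.decompose()
    for tag in soup.find_all(src=lambda x: x and x.startswith("data:")):
        tag.decompose()
    # Strip per-element CSS styles
    for tag in soup.find_all(True):
        tag.attrs.pop("style", None)
    # Drop internal anchor links (href="#...")
    for anchor in soup.find_all("a", href=True):
        if anchor["href"].startswith("#"):
            anchor.decompose()
    return soup


if __name__ == "__main__":
    html = '<p style="color:red"><img src="x.png"><a href="#top">top</a>keep</p>'
    print(slimdown_html(BeautifulSoup(html, "html.parser")))
```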
Paul Gauthier
84ae51c005 Add functionality to write scraped content to a temporary file and optimize HTML parsing. 2024-02-08 12:11:59 -08:00
Paul Gauthier
33a62c8dc6 aider: finish slimdown_html. remove all <img> tags and all per-element css-styles and all internal anchor elements
# Aider chat conversation:

USER: finish `slimdown_html`. remove all <img> tags and all per-element css-styles and all internal anchor elements

ASSISTANT: ```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
-def slimdown_html(page_source: str) -> str:
-    soup = BeautifulSoup(page_source, "html.parser")
-    # ...
+def slimdown_html(page_source: str) -> str:
+    soup = BeautifulSoup(page_source, "html.parser")
+    # Remove all <img> tags
+    for img in soup.find_all('img'):
+        img.decompose()
+    # Remove all per-element CSS styles
+    for tag in soup.find_all(True):
+        tag.attrs.pop('style', None)
+    # Remove all internal anchor elements
+    for anchor in soup.find_all('a', href=True):
+        if anchor['href'].startswith('#'):
+            anchor.decompose()
+    return str(soup)
```
2024-02-08 12:10:12 -08:00
Paul Gauthier
1520bb976a Change html_to_text to html_to_markdown and add slimdown_html function. 2024-02-08 12:10:10 -08:00
Paul Gauthier
1509e21698 aider: Implemented html_to_markdown using pypandoc.
# Aider chat conversation:

USER: use pypandoc to implement `html_to_markdown`

ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will:

1. Add an import for `pypandoc`.
2. Implement the `html_to_markdown` function using `pypandoc.convert_text`.

Here are the diffs for those changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 import sys
+import pypandoc
 import httpx
 from bs4 import BeautifulSoup
 from playwright.sync_api import sync_playwright
@@ ... @@
 def html_to_markdown(page_source: str) -> str:
-    pass
+    return pypandoc.convert_text(page_source, 'md', format='html')
```

Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.
2024-02-08 12:02:45 -08:00
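A minimal, self-contained sketch of the pypandoc-based conversion described in this commit; the sample HTML is an illustrative assumption. Note that `pip install pypandoc` ships only the wrapper, so the pandoc binary still needs to be available on the system.

```python
# Hedged sketch of converting scraped HTML to markdown with pypandoc.
# Requires the pandoc binary to be installed and on the PATH.
import pypandoc


def html_to_markdown(page_source: str) -> str:
    return pypandoc.convert_text(page_source, "md", format="html")


if __name__ == "__main__":
    print(html_to_markdown("<h1>Hello</h1><p>Some <em>scraped</em> text.</p>"))
```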
Paul Gauthier
75e7f2bd4e Add html_to_markdown function to convert HTML to Markdown. 2024-02-08 12:02:42 -08:00
Paul Gauthier
681f26d010 Print playwright instructions after the content is displayed, so they are not lost 2024-02-08 12:01:18 -08:00
Paul Gauthier
e42005a78e plumbed through to /web 2024-02-08 11:47:28 -08:00
Paul Gauthier
5b78d929a6 Roughly working scraper 2024-02-08 11:44:55 -08:00
Paul Gauthier
9bf3a6e0c6 aider: Implemented HTTP request using httpx library in scrape_with_httpx method.
# Aider chat conversation:

USER: finish `scrape_with_httpx`

ASSISTANT: To finish the `scrape_with_httpx` method, I will implement the HTTP request using the `httpx` library. This will involve making a GET request to the provided URL and returning the content of the response.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
     def scrape_with_httpx(self, url):
-        pass
+        import httpx
+        headers = {
+            'User-Agent': aider_user_agent
+        }
+        try:
+            with httpx.Client(headers=headers) as client:
+                response = client.get(url)
+                response.raise_for_status()
+                return response.text
+        except httpx.HTTPError as http_err:
+            self.print_error(f'HTTP error occurred: {http_err}')
+        except Exception as err:
+            self.print_error(f'An error occurred: {err}')
+        return None
```

This implementation uses `httpx.Client` to send a GET request with a custom User-Agent header. It handles any HTTP errors by printing them out and returns `None` in case of an error. If the request is successful, it returns the text content of the response.
2024-02-08 11:29:46 -08:00
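The same fetch, written as a standalone function for easier testing. The aider_user_agent value is borrowed from the "Set the user agent" commit below, and the plain print() error reporting stands in for self.print_error; both are assumptions for illustration, since in the repo this is a method on the scraper class.

```python
# Hedged sketch: standalone version of the httpx fetch with a custom User-Agent.
import httpx

aider_user_agent = "Aider v0.24.0-dev"  # illustrative value from a later commit


def scrape_with_httpx(url):
    headers = {"User-Agent": aider_user_agent}
    try:
        with httpx.Client(headers=headers) as client:
            response = client.get(url)
            response.raise_for_status()
            return response.text
    except httpx.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")
    return None


if __name__ == "__main__":
    print(scrape_with_httpx("https://example.com"))
```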
Paul Gauthier
e04187d1ad Refactor scraper to use Playwright for web scraping and handle user agent string. 2024-02-08 11:29:44 -08:00
Paul Gauthier
f10a9d2e7b aider: Retrieve the user agent from the browser context after creating a new page in the scrape_with_playwright function.
# Aider chat conversation:

USER: determine the user_agent

ASSISTANT: To determine the `user_agent` in the `scrape_with_playwright` function, we can use the `browser.new_context()` method which allows us to specify a `user_agent` string. We will set a common user agent string for a desktop browser.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright()
```
^C KeyboardInterrupt

USER: I don't want to SET it. I want to determine what it is

ASSISTANT: To determine the `user_agent` being used by the browser in the `scrape_with_playwright` function, we can retrieve it from the browser context after creating a new page. Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright() as p:
         try:
             browser = p.chromium.launch()
         except Exception as e:
             print(repr(e))
             return
+        page = browser.new_page()
+        user_agent = page.evaluate("navigator.userAgent")
+        print(f"User Agent: {user_agent}")
-        page = browser.new_page()
         page.goto(url)
         content = page.content()
         browser.close()

     return content
```
2024-02-08 11:02:54 -08:00
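Putting the pieces of this conversation together, a minimal sketch of scrape_with_playwright that reports the browser's own user agent via page.evaluate(); the example URL is an assumption, and Chromium must already be installed (playwright install chromium).

```python
# Hedged sketch assembled from the hunks above; not verbatim repo code.
from playwright.sync_api import sync_playwright


def scrape_with_playwright(url):
    with sync_playwright() as p:
        try:
            browser = p.chromium.launch()
        except Exception as e:
            print(repr(e))
            return None
        page = browser.new_page()
        # Ask the browser itself which user agent string it is sending
        user_agent = page.evaluate("navigator.userAgent")
        print(f"User Agent: {user_agent}")
        page.goto(url)
        content = page.content()
        browser.close()
    return content


if __name__ == "__main__":
    html = scrape_with_playwright("https://example.com")
    print(html[:200] if html else "no content")
```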
Paul Gauthier
65111591d7 Add scraper module for data extraction. 2024-02-08 11:02:52 -08:00
Paul Gauthier
e2acc028d0 aider: Set the user agent to 'Aider v0.24.0-dev'.
# Aider chat conversation:

USER: set the user agent to 'Aider v0.24.0-dev'

ASSISTANT: ```diff
--- playw.py
+++ playw.py
@@ ... @@
 def main(url):
     # Check if Chromium is installed, if not, install it
     with sync_playwright() as p:
-        p.chromium.launch()
+        browser = p.chromium.launch(user_agent='Aider v0.24.0-dev')
+        page = browser.new_page()
+        page.goto(url)
+        content = page.content()
+        print(content)
+        browser.close()

 if __name__ == "__main__":
     if len(sys.argv) < 2:
```

USER: UnifiedDiffNoMatch: hunk failed to apply!

playw.py does not contain lines that match the diff you provided!
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The diff needs to apply cleanly to the lines in playw.py!

playw.py does not contain these 7 exact lines in a row:
```
def main(url):
    # Check if Chromium is installed, if not, install it
    with sync_playwright() as p:
        p.chromium.launch()

if __name__ == "__main__":
    if len(sys.argv) < 2:
```

ASSISTANT: I apologize for the oversight. Here is the corrected diff:

```diff
--- playw.py
+++ playw.py
@@ ... @@
     with sync_playwright() as p:
-        browser = p.chromium.launch()
+        browser = p.chromium.launch(user_agent='Aider v0.24.0-dev')
         page = browser.new_page()
         page.goto(url)
         #page.wait_for_load_state('networkidle')
         content = page.content()
         print(content)
         browser.close()
```
2024-02-08 07:59:12 -08:00
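Worth noting: as the first assistant reply above mentions, Playwright's Python API takes user_agent as an option on browser.new_context() / browser.new_page() rather than on launch(), so a working version of this playw.py change would look roughly like the sketch below. The usage message and URL handling are assumptions for illustration.

```python
# Hedged sketch: pass user_agent to new_page() (a context option), not launch().
import sys

from playwright.sync_api import sync_playwright


def main(url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(user_agent="Aider v0.24.0-dev")
        page.goto(url)
        print(page.content())
        browser.close()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: playw.py <url>")
        sys.exit(1)
    main(sys.argv[1])
```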