Commit graph

3626 commits

Paul Gauthier
65d43dff0f aider: Add a date to each blog entry.
# Aider chat conversation:

USER: add a date to each blog entry

ASSISTANT: ```diff
--- blog/index.html
+++ blog/index.html
@@ ... @@
 <div class="posts">
   {% for post in site.posts %}
     <article class="post">
+      <p class="post-date">{{ post.date | date: "%b %-d, %Y" }}</p>
       <h1><a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a></h1>
       <div class="entry">
           {% if post.excerpt %}
```
2024-02-11 08:02:11 -08:00
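The Liquid filter above renders each post date as, for example, `Feb 11, 2024`. A rough Python sketch of what `date: "%b %-d, %Y"` produces (the `%-d` flag, which drops the leading zero, is a glibc extension, so this builds the day portably; `post_date` is a hypothetical helper name):

```python
from datetime import date

def post_date(d: date) -> str:
    # Mimic Liquid's `date: "%b %-d, %Y"`: abbreviated month name,
    # day-of-month without a leading zero, four-digit year.
    return f"{d.strftime('%b')} {d.day}, {d.year}"

print(post_date(date(2024, 2, 11)))  # Feb 11, 2024
```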
Paul Gauthier
265e71010f Make assets/ reference absolute 2024-02-11 07:58:25 -08:00
Paul Gauthier
ae3373eb89 fixed redirect 2024-02-11 07:57:50 -08:00
Paul Gauthier
e630e51668 better excerpt logic; fixed svg ref 2024-02-11 07:55:44 -08:00
Paul Gauthier
d7476bb303 aider: Update blog index to display post excerpt if available, otherwise use the first 100 words.
# Aider chat conversation:

USER: use the excerpt if one exists, otherwise keep using the first 100 words

ASSISTANT: ```diff
--- blog/index.html
+++ blog/index.html
@@ ... @@
 <div class="entry">
-    {{ post.content | strip_html | truncatewords: 100 }}...
+    {{ post.excerpt | default: post.content | strip_html | truncatewords: 100 }}...
 </div>
```
2024-02-11 07:52:29 -08:00
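The fallback logic in that template (prefer the excerpt, otherwise cap the content at 100 words) can be sketched in plain Python; `excerpt_or_truncate` is a hypothetical helper name mirroring the Liquid `default` and `truncatewords` filters:

```python
def excerpt_or_truncate(excerpt, content, limit=100):
    # Mirror `{{ post.excerpt | default: post.content | truncatewords: 100 }}`:
    # Liquid's `default` falls through on nil/empty, then the word count is capped.
    text = excerpt if excerpt else content
    words = text.split()
    if len(words) <= limit:
        return text
    return " ".join(words[:limit]) + "..."
```

When a post defines an `excerpt` in its frontmatter (as the next commit does), it is used verbatim; otherwise the first 100 words of the body are shown with a trailing ellipsis.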
Paul Gauthier
a3f9f6c5c6 aider: Added an excerpt for speed benchmarks report.
# Aider chat conversation:

USER: add an excerpt

ASSISTANT: ```diff
--- _posts/2023-11-06-benchmarks-speed-1106.md
+++ _posts/2023-11-06-benchmarks-speed-1106.md
@@ ... @@
 ---
 title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
+excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
 redirect_from:
   - /docs/benchmarks-speed-1106.md
 ---
```
2024-02-11 07:51:38 -08:00
Paul Gauthier
81ad2f18fe moved benchmarks-speed-1106 article 2024-02-11 07:50:18 -08:00
Paul Gauthier
2f8109ece1 copy 2024-02-11 06:36:47 -08:00
Paul Gauthier
a6f8d7ec6f aider: Add the redirect plugin to the _config.yml.
# Aider chat conversation:

USER: add the redirect plugin

ASSISTANT: To add the Jekyll Redirect From plugin to your `_config.yml`, you need to add the plugin to the `plugins` list. Here's the diff for the change:

```diff
--- _config.yml
+++ _config.yml
@@ ... @@
 theme: jekyll-theme-cayman
 url: "https://aider.chat"
+plugins:
+  - jekyll-redirect-from
+
 defaults:
   - scope:
       path: "README.md"
       type: "pages"
     values:
       description: "A command-line chat tool for coding with GPT"
```
2024-02-11 06:34:01 -08:00
Paul Gauthier
c8a732dde2 copy 2024-02-11 06:31:57 -08:00
Paul Gauthier
0904e8e1a0 copy 2024-02-11 06:29:40 -08:00
Paul Gauthier
c95a4ae171 copy 2024-02-11 06:28:16 -08:00
Paul Gauthier
027faa384d whitespace 2024-02-11 06:21:11 -08:00
Paul Gauthier
f948c855ac renamed 2024-02-11 06:18:35 -08:00
Paul Gauthier
ac22c314d1 use first 100 words for excerpt 2024-02-11 06:16:21 -08:00
Paul Gauthier
69c32c2903 aider: Add frontmatter for 301 redirect from old URL to new URL.
# Aider chat conversation:

USER: this post used to be live at /docs/aider-20.html. add frontmatter so that that old url does a 301 redirect to here.

ASSISTANT: ```diff
--- _posts/2024-02-11-aider-20.html
+++ _posts/2024-02-11-aider-20.html
@@ ... @@
+
+---
+redirect_from:
+  - /docs/aider-20.html
+---
+
 # Aider v0.20.0 is live

 Aider v0.20.0 is now available in pypi.
```
2024-02-11 06:15:54 -08:00
Paul Gauthier
f317915e8a Added new blog post for February 11, 2024. 2024-02-11 06:15:53 -08:00
Paul Gauthier
3fb698cdca added blog 2024-02-11 06:07:29 -08:00
Paul Gauthier
69d1107b31 set version to 0.24.2-dev 2024-02-11 05:51:40 -08:00
Paul Gauthier
d26693e636 version bump to 0.24.1 2024-02-11 05:51:09 -08:00
Paul Gauthier
19d1a57e5b Updated HISTORY 2024-02-11 05:49:30 -08:00
Paul Gauthier
72e6cb901c Fixed bug with cost computations when --no-stream in effect #475 2024-02-11 05:48:32 -08:00
Paul Gauthier
a1a2cf7ac2 Added pipx faq entry 2024-02-10 10:40:12 -08:00
Paul Gauthier
0fa2505ac5 Delete pandoc installer 2024-02-10 08:48:22 -08:00
Paul Gauthier
3d948788dd set version to 0.24.1-dev 2024-02-10 07:35:36 -08:00
Paul Gauthier
08a22b9cf7 version bump to 0.24.0 2024-02-10 07:35:02 -08:00
Paul Gauthier
1e70b3fb05 Updated HISTORY 2024-02-10 07:34:21 -08:00
Paul Gauthier
d376767ca3 Use gpt-3.5-turbo-0125 for commit messages 2024-02-10 07:32:26 -08:00
Paul Gauthier
6ddfc894e7 Updated HISTORY 2024-02-10 07:31:04 -08:00
Paul Gauthier
bdef4308fe Simpler calls to pypandoc 2024-02-08 16:14:17 -08:00
Paul Gauthier
efff174f9a Use download_pandoc, which works everywhere including arm64 2024-02-08 15:56:00 -08:00
Paul Gauthier
2dee76378b keep hrefs 2024-02-08 15:19:00 -08:00
Paul Gauthier
62c67a1e23 copy 2024-02-08 14:41:51 -08:00
Paul Gauthier
115bf10978 Added /web to commands doc 2024-02-08 14:39:18 -08:00
Paul Gauthier
50b6a00331 bs4 -> beautifulsoup4 2024-02-08 14:29:56 -08:00
Paul Gauthier
05c74202ec copy 2024-02-08 14:20:49 -08:00
Paul Gauthier
cc36329691 output with pandoc, cleanup with bs and re 2024-02-08 14:14:42 -08:00
Paul Gauthier
be60b785a4 aider: Remove all data: URLs from slimdown_html.
# Aider chat conversation:

USER: in `slimdown_html` remove all data: urls

ASSISTANT: ```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def slimdown_html(soup):
     # Remove all <img> tags
     for img in soup.find_all('img'):
         img.decompose()
+    # Remove all elements with data: URLs
+    for tag in soup.find_all(href=lambda x: x and x.startswith('data:')):
+        tag.decompose()
+    for tag in soup.find_all(src=lambda x: x and x.startswith('data:')):
+        tag.decompose()
     # Remove all per-element CSS styles
     for tag in soup.find_all(True):
         tag.attrs.pop('style', None)
     # Remove all internal anchor elements
     for anchor in soup.find_all('a', href=True):
         if anchor['href'].startswith('#'):
             anchor.decompose()
     return soup
```
2024-02-08 12:12:03 -08:00
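The filtering rules above can be exercised without BeautifulSoup. This is only an illustrative sketch using the standard library's `xml.etree.ElementTree`, which assumes well-formed markup (real pages won't meet that assumption, which is why `aider/scrape.py` uses BeautifulSoup); the rules themselves match the diff: drop `<img>` tags, drop elements with `data:` URLs, strip `style` attributes, and drop internal `#` anchors:

```python
import xml.etree.ElementTree as ET

def slimdown(root):
    # Walk parents so offending children can be removed in place.
    for parent in root.iter():
        for child in list(parent):
            src = child.get("src", "")
            href = child.get("href", "")
            if (child.tag == "img"
                    or src.startswith("data:")
                    or href.startswith("data:")
                    or (child.tag == "a" and href.startswith("#"))):
                parent.remove(child)
    # Strip all per-element CSS styles.
    for el in root.iter():
        el.attrib.pop("style", None)
    return root

root = ET.fromstring(
    '<div><img src="x.png"/><a href="#top">up</a>'
    '<a href="data:text/plain,hi">d</a>'
    '<p style="color:red">keep me</p></div>'
)
out = ET.tostring(slimdown(root), encoding="unicode")
print(out)  # <div><p>keep me</p></div>
```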
Paul Gauthier
84ae51c005 Add functionality to write scraped content to a temporary file and optimize HTML parsing. 2024-02-08 12:11:59 -08:00
Paul Gauthier
33a62c8dc6 aider: finish slimdown_html. remove all <img> tags and all per-element css-styles and all internal anchor elements
# Aider chat conversation:

USER: finish `slimdown_html`. remove all <img> tags and all per-element css-styles and all internal anchor elements

ASSISTANT: ```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
-def slimdown_html(page_source: str) -> str:
-    soup = BeautifulSoup(page_source, "html.parser")
-    # ...
+def slimdown_html(page_source: str) -> str:
+    soup = BeautifulSoup(page_source, "html.parser")
+    # Remove all <img> tags
+    for img in soup.find_all('img'):
+        img.decompose()
+    # Remove all per-element CSS styles
+    for tag in soup.find_all(True):
+        tag.attrs.pop('style', None)
+    # Remove all internal anchor elements
+    for anchor in soup.find_all('a', href=True):
+        if anchor['href'].startswith('#'):
+            anchor.decompose()
+    return str(soup)
```
2024-02-08 12:10:12 -08:00
Paul Gauthier
1520bb976a Change html_to_text to html_to_markdown and add slimdown_html function. 2024-02-08 12:10:10 -08:00
Paul Gauthier
1509e21698 aider: Implemented html_to_markdown using pypandoc.
# Aider chat conversation:

USER: use pypandoc to implement `html_to_markdown`

ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will:

1. Add an import for `pypandoc`.
2. Implement the `html_to_markdown` function using `pypandoc.convert_text`.

Here are the diffs for those changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 import sys
+import pypandoc
 import httpx
 from bs4 import BeautifulSoup
 from playwright.sync_api import sync_playwright
@@ ... @@
 def html_to_markdown(page_source: str) -> str:
-    pass
+    return pypandoc.convert_text(page_source, 'md', format='html')
```

Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.
2024-02-08 12:02:45 -08:00
Paul Gauthier
75e7f2bd4e Add html_to_markdown function to convert HTML to Markdown. 2024-02-08 12:02:42 -08:00
Paul Gauthier
681f26d010 Print playwright instructions after the content is displayed, so they are not lost 2024-02-08 12:01:18 -08:00
Paul Gauthier
e42005a78e plumbed through to /web 2024-02-08 11:47:28 -08:00
Paul Gauthier
5b78d929a6 Roughly working scraper 2024-02-08 11:44:55 -08:00
Paul Gauthier
9bf3a6e0c6 aider: Implemented HTTP request using httpx library in scrape_with_httpx method.
# Aider chat conversation:

USER: finish `scrape_with_httpx`

ASSISTANT: To finish the `scrape_with_httpx` method, I will implement the HTTP request using the `httpx` library. This will involve making a GET request to the provided URL and returning the content of the response.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
     def scrape_with_httpx(self, url):
-        pass
+        import httpx
+        headers = {
+            'User-Agent': aider_user_agent
+        }
+        try:
+            with httpx.Client(headers=headers) as client:
+                response = client.get(url)
+                response.raise_for_status()
+                return response.text
+        except httpx.HTTPError as http_err:
+            self.print_error(f'HTTP error occurred: {http_err}')
+        except Exception as err:
+            self.print_error(f'An error occurred: {err}')
+        return None
```

This implementation uses `httpx.Client` to send a GET request with a custom User-Agent header. It handles any HTTP errors by printing them out and returns `None` in case of an error. If the request is successful, it returns the text content of the response.
2024-02-08 11:29:46 -08:00
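A standard-library analogue of that method, for readers without `httpx` installed: the sketch below uses `urllib.request` with the same shape of error handling (`AIDER_USER_AGENT` is a placeholder string here, not the real `aider_user_agent` value from the module):

```python
import urllib.error
import urllib.request

AIDER_USER_AGENT = "Aider (sketch)"  # placeholder for the real user-agent string

def fetch(url):
    # GET the page with a custom User-Agent header; print the error and
    # return None on any failure, mirroring scrape_with_httpx above.
    try:
        req = urllib.request.Request(url, headers={"User-Agent": AIDER_USER_AGENT})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as err:
        print(f"HTTP error occurred: {err}")
    except Exception as err:
        print(f"An error occurred: {err}")
    return None
```

As with the `httpx` version, callers must handle a `None` return rather than an exception.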
Paul Gauthier
e04187d1ad Refactor scraper to use Playwright for web scraping and handle user agent string. 2024-02-08 11:29:44 -08:00
Paul Gauthier
f10a9d2e7b aider: Retrieve the user agent from the browser context after creating a new page in the scrape_with_playwright function.
# Aider chat conversation:

USER: determine the user_agent

ASSISTANT: To determine the `user_agent` in the `scrape_with_playwright` function, we can use the `browser.new_context()` method which allows us to specify a `user_agent` string. We will set a common user agent string for a desktop browser.

Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright()
^C KeyboardInterrupt

USER: I don't want to SET it. I want to determine what it is

ASSISTANT: To determine the `user_agent` being used by the browser in the `scrape_with_playwright` function, we can retrieve it from the browser context after creating a new page. Here is the diff for the changes:

```diff
--- aider/scrape.py
+++ aider/scrape.py
@@ ... @@
 def scrape_with_playwright(url):
     with sync_playwright() as p:
         try:
             browser = p.chromium.launch()
         except Exception as e:
             print(repr(e))
             return
+        page = browser.new_page()
+        user_agent = page.evaluate("navigator.userAgent")
+        print(f"User Agent: {user_agent}")
-        page = browser.new_page()
         page.goto(url)
         content = page.content()
         browser.close()

     return content
```
2024-02-08 11:02:54 -08:00
Paul Gauthier
65111591d7 Add scraper module for data extraction. 2024-02-08 11:02:52 -08:00