aider

mirror of https://github.com/Aider-AI/aider.git synced 2025-05-29 00:35:00 +00:00

Author	SHA1	Message	Date
Paul Gauthier	d376767ca3	Use gpt-3.5-turbo-0125 for commit messages	2024-02-10 07:32:26 -08:00
Paul Gauthier	6ddfc894e7	Updated HISTORY	2024-02-10 07:31:04 -08:00
Paul Gauthier	bdef4308fe	Simpler calls to pypandoc	2024-02-08 16:14:17 -08:00
Paul Gauthier	efff174f9a	Use download_pandoc, which works everywhere including arm64	2024-02-08 15:56:00 -08:00
Paul Gauthier	2dee76378b	keep hrefs	2024-02-08 15:19:00 -08:00
Paul Gauthier	62c67a1e23	copy	2024-02-08 14:41:51 -08:00
Paul Gauthier	115bf10978	Added /web to commands doc	2024-02-08 14:39:18 -08:00
Paul Gauthier	50b6a00331	bs4 -> beautifulsoup4	2024-02-08 14:29:56 -08:00
Paul Gauthier	05c74202ec	copy	2024-02-08 14:20:49 -08:00
Paul Gauthier	cc36329691	output with pandoc, cleanup with bs and re	2024-02-08 14:14:42 -08:00
Paul Gauthier	be60b785a4	aider: Remove all data: URLs from slimdown_html. # Aider chat conversation: USER: in `slimdown_html` remove all data: urls ASSISTANT: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def slimdown_html(soup): # Remove all <img> tags for img in soup.find_all('img'): img.decompose() + # Remove all elements with data: URLs + for tag in soup.find_all(href=lambda x: x and x.startswith('data:')): + tag.decompose() + for tag in soup.find_all(src=lambda x: x and x.startswith('data:')): + tag.decompose() # Remove all per-element CSS styles for tag in soup.find_all(True): tag.attrs.pop('style', None) # Remove all internal anchor elements for anchor in soup.find_all('a', href=True): if anchor['href'].startswith('#'): anchor.decompose() return soup ```	2024-02-08 12:12:03 -08:00
Paul Gauthier	84ae51c005	Add functionality to write scraped content to a temporary file and optimize HTML parsing.	2024-02-08 12:11:59 -08:00
Paul Gauthier	33a62c8dc6	aider: finish `slimdown_html`. remove all <img> tags and all per-element css-styles and all internal anchor elements # Aider chat conversation: USER: finish `slimdown_html`. remove all <img> tags and all per-element css-styles and all internal anchor elements ASSISTANT: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ -def slimdown_html(page_source: str) -> str: - soup = BeautifulSoup(page_source, "html.parser") - # ... +def slimdown_html(page_source: str) -> str: + soup = BeautifulSoup(page_source, "html.parser") + # Remove all <img> tags + for img in soup.find_all('img'): + img.decompose() + # Remove all per-element CSS styles + for tag in soup.find_all(True): + tag.attrs.pop('style', None) + # Remove all internal anchor elements + for anchor in soup.find_all('a', href=True): + if anchor['href'].startswith('#'): + anchor.decompose() + return str(soup) ```	2024-02-08 12:10:12 -08:00
Paul Gauthier	1520bb976a	Change html_to_text to html_to_markdown and add slimdown_html function.	2024-02-08 12:10:10 -08:00
Paul Gauthier	1509e21698	aider: Implemented `html_to_markdown` using `pypandoc`. # Aider chat conversation: USER: use pypandoc to implement `html_to_markdown` ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will: 1. Add an import for `pypandoc`. 2. Implement the `html_to_markdown` function using `pypandoc.convert_text`. Here are the diffs for those changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ import sys +import pypandoc import httpx from bs4 import BeautifulSoup from playwright.sync_api import sync_playwright @@ ... @@ def html_to_markdown(page_source: str) -> str: - pass + return pypandoc.convert_text(page_source, 'md', format='html') ``` Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.	2024-02-08 12:02:45 -08:00
Paul Gauthier	75e7f2bd4e	Add html_to_markdown function to convert HTML to Markdown.	2024-02-08 12:02:42 -08:00
Paul Gauthier	681f26d010	Print playwright instructions after the content is displayed, so they are not lost	2024-02-08 12:01:18 -08:00
Paul Gauthier	e42005a78e	plumbed through to /web	2024-02-08 11:47:28 -08:00
Paul Gauthier	5b78d929a6	Roughly working scraper	2024-02-08 11:44:55 -08:00
Paul Gauthier	9bf3a6e0c6	aider: Implemented HTTP request using `httpx` library in `scrape_with_httpx` method. # Aider chat conversation: USER: finish `scrape_with_httpx` ASSISTANT: To finish the `scrape_with_httpx` method, I will implement the HTTP request using the `httpx` library. This will involve making a GET request to the provided URL and returning the content of the response. Here is the diff for the changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def scrape_with_httpx(self, url): - pass + import httpx + headers = { + 'User-Agent': aider_user_agent + } + try: + with httpx.Client(headers=headers) as client: + response = client.get(url) + response.raise_for_status() + return response.text + except httpx.HTTPError as http_err: + self.print_error(f'HTTP error occurred: {http_err}') + except Exception as err: + self.print_error(f'An error occurred: {err}') + return None ``` This implementation uses `httpx.Client` to send a GET request with a custom User-Agent header. It handles any HTTP errors by printing them out and returns `None` in case of an error. If the request is successful, it returns the text content of the response.	2024-02-08 11:29:46 -08:00
Paul Gauthier	e04187d1ad	Refactor scraper to use Playwright for web scraping and handle user agent string.	2024-02-08 11:29:44 -08:00
Paul Gauthier	f10a9d2e7b	aider: Retrieve the user agent from the browser context after creating a new page in the `scrape_with_playwright` function. # Aider chat conversation: USER: determine the user_agent ASSISTANT: To determine the `user_agent` in the `scrape_with_playwright` function, we can use the `browser.new_context()` method which allows us to specify a `user_agent` string. We will set a common user agent string for a desktop browser. Here is the diff for the changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def scrape_with_playwright(url): with sync_playwright() ^C KeyboardInterrupt USER: I don't want to SET it. I want to determine what it is ASSISTANT: To determine the `user_agent` being used by the browser in the `scrape_with_playwright` function, we can retrieve it from the browser context after creating a new page. Here is the diff for the changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def scrape_with_playwright(url): with sync_playwright() as p: try: browser = p.chromium.launch() except Exception as e: print(repr(e)) return + page = browser.new_page() + user_agent = page.evaluate("navigator.userAgent") + print(f"User Agent: {user_agent}") - page = browser.new_page() page.goto(url) content = page.content() browser.close() return content ```	2024-02-08 11:02:54 -08:00
Paul Gauthier	65111591d7	Add scraper module for data extraction.	2024-02-08 11:02:52 -08:00
Paul Gauthier	e2acc028d0	aider: Set the user agent to 'Aider v0.24.0-dev'. # Aider chat conversation: USER: set the user agent to 'Aider v0.24.0-dev' ASSISTANT: ```diff --- playw.py +++ playw.py @@ ... @@ def main(url): # Check if Chromium is installed, if not, install it with sync_playwright() as p: - p.chromium.launch() + browser = p.chromium.launch(user_agent='Aider v0.24.0-dev') + page = browser.new_page() + page.goto(url) + content = page.content() + print(content) + browser.close() if __name__ == "__main__": if len(sys.argv) < 2: ``` USER: UnifiedDiffNoMatch: hunk failed to apply! playw.py does not contain lines that match the diff you provided! Try again. DO NOT skip blank lines, comments, docstrings, etc! The diff needs to apply cleanly to the lines in playw.py! playw.py does not contain these 7 exact lines in a row: ``` def main(url): # Check if Chromium is installed, if not, install it with sync_playwright() as p: p.chromium.launch() if __name__ == "__main__": if len(sys.argv) < 2: ``` ASSISTANT: I apologize for the oversight. Here is the corrected diff: ```diff --- playw.py +++ playw.py @@ ... @@ with sync_playwright() as p: - browser = p.chromium.launch() + browser = p.chromium.launch(user_agent='Aider v0.24.0-dev') page = browser.new_page() page.goto(url) #page.wait_for_load_state('networkidle') content = page.content() print(content) browser.close() ```	2024-02-08 07:59:12 -08:00
Paul Gauthier	5a06508e0e	Install Chromium if not already installed before launching it.	2024-02-08 07:58:59 -08:00
Paul Gauthier	09de0f5d43	aider: Add check to see if Playwright needs to install Chromium and its dependencies. # Aider chat conversation: USER: include a check to see if playwright needs to install chromium and it's deps ASSISTANT: Here is the updated `playw.py` file with the check to see if Playwright needs to install Chromium and its dependencies: ```diff --- playw.py +++ playw.py @@ ... @@ import sys +from playwright.__main__ import main as playwright_install def main(url): + # Check if Chromium is installed, if not, install it + try: + with sync_playwright() as p: + p.chromium.launch() + except Exception as e: + print("Chromium is not installed. Installing necessary dependencies...") + playwright_install(['install', 'chromium']) + with sync_playwright() as p: browser = p.chromium.launch() page = browser.new_page() page.goto(url) content = page.content() print(content) browser.close() if __name__ == "__main__": if len(sys.argv) < 2: print("Usage: python playw.py <URL>") sys.exit(1) main(sys.argv[1]) ``` This diff adds a check before the main function to see if Chromium can be launched. If an exception occurs, it assumes Chromium is not installed and proceeds to install it using Playwright's command-line interface.	2024-02-08 07:24:46 -08:00
Paul Gauthier	d1782d1c87	aider: Remove asynchronous code and use synchronous Playwright API. # Aider chat conversation: USER: don't use async ASSISTANT: Here is the updated `playw.py` file without using async: ```diff --- playw.py +++ playw.py @@ ... @@ -import asyncio -from playwright.async_api import async_playwright +from playwright.sync_api import sync_playwright import sys -async def main(url): - async with async_playwright() as p: - browser = await p.chromium.launch() - page = await browser.new_page() - await page.goto(url) - content = await page.content() - print(content) - await browser.close() +def main(url): + with sync_playwright() as p: + browser = p.chromium.launch() + page = browser.new_page() + page.goto(url) + content = page.content() + print(content) + browser.close() if __name__ == "__main__": if len(sys.argv) < 2: print("Usage: python playw.py <URL>") sys.exit(1) - asyncio.run(main(sys.argv[1])) + main(sys.argv[1]) ``` This diff removes the asynchronous code and replaces it with the synchronous version provided by Playwright's `sync_api`. The `async` and `await` keywords are removed, and the `async_playwright` context manager is replaced with `sync_playwright`. The script should now run synchronously.	2024-02-08 07:21:23 -08:00
Paul Gauthier	a2649cabe9	aider: Added Playwright script to fetch text from a supplied URL. # Aider chat conversation: USER: write a playwright script to fetch the text from a url supplied on the cmd line ASSISTANT: To create a Playwright script that fetches text from a URL supplied on the command line, I will write the necessary code in the `playw.py` file. The script will use Playwright's Python package to launch a browser, navigate to the provided URL, and print the text content of the page to the console. Here is the diff for the new `playw.py` file: ```diff --- /dev/null +++ playw.py @@ ... @@ +import asyncio +from playwright.async_api import async_playwright +import sys + +async def main(url): + async with async_playwright() as p: + browser = await p.chromium.launch() + page = await browser.new_page() + await page.goto(url) + content = await page.content() + print(content) + await browser.close() + +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python playw.py <URL>") + sys.exit(1) + asyncio.run(main(sys.argv[1])) ``` This script will require the Playwright package to be installed. You can install it using `pip install playwright` and then run `playwright install` to install the necessary browser binaries. To run the script, use the command `python playw.py <URL>`, replacing `<URL>` with the actual URL you want to fetch text from.	2024-02-08 07:20:50 -08:00
Paul Gauthier	1758937042	wip	2024-02-08 07:18:39 -08:00
Paul Gauthier	61531b9430	aider: Add chromedriver-autoinstaller to requirements.in. # Aider chat conversation: USER: I added these read-write files: requirements.in ASSISTANT: It seems there might be a misunderstanding. You've indicated that you've added the `requirements.in` file, but you haven't specified any changes that need to be made to it or any other files. If you need any modifications or have any specific requests regarding the `requirements.in` or any other file, please let me know, and I'll be happy to assist you with the necessary changes. USER: add chromedriver-autoinstller to .in not .txt ASSISTANT: Here is the diff to add `chromedriver-autoinstaller` to the `requirements.in` file: ```diff --- requirements.in +++ requirements.in @@ ... @@ webdriver_manager selenium +chromedriver-autoinstaller ``` This change adds `chromedriver-autoinstaller` to the `requirements.in` file, which is used to generate the `requirements.txt` file with `pip-compile`. After adding this line, you would run `pip-compile requirements.in` to update the `requirements.txt` file accordingly.	2024-02-07 16:30:30 -08:00
Paul Gauthier	99be34fb61	aider: Add chromedriver-autoinstaller to requirements.txt for automatic installation of ChromeDriver. # Aider chat conversation: USER: does the user need to install anything beyond the pip requirements to have selenium drive chrome? ASSISTANT: To have Selenium drive Chrome, the user needs to have the Chrome browser installed on their system and also the ChromeDriver executable, which allows Selenium to interface with Chrome. The ChromeDriver version must be compatible with the version of Chrome installed on the system. The `requirements.txt` file already includes `selenium`, but it does not include `chromedriver-autoinstaller`, which can automatically download and install the correct version of ChromeDriver for the user's version of Chrome. To facilitate this, we can add `chromedriver-autoinstaller` to the `requirements.txt` file. Here is the diff to add `chromedriver-autoinstaller` to the `requirements.txt`: ```diff --- requirements.txt +++ requirements.txt @@ ... @@ +selenium==4.17.2 +chromedriver-autoinstaller==0.3.1 # via -r requirements.in ``` By adding `chromedriver-autoinstaller`, the user can ensure that the correct version of ChromeDriver is installed and updated as needed, without manual intervention.	2024-02-07 16:29:45 -08:00
Paul Gauthier	ffdd034b7b	Add new dependencies and update existing ones.	2024-02-07 16:29:44 -08:00
Paul Gauthier	29fa63cd3b	aider: Add bs4 to requirements. # Aider chat conversation: USER: add bs4 ASSISTANT: ```diff --- requirements.in +++ requirements.in @@ ... @@ soundfile +bs4 PyYAML Pillow ```	2024-02-07 16:13:09 -08:00
Paul Gauthier	aa54fdd5f7	aider: Add the missing requirements for selenium and related packages. # Aider chat conversation: USER: add the missing requirements for selenium and related packages ASSISTANT: ```diff --- requirements.in +++ requirements.in @@ ... @@ +webdriver_manager +selenium ``` The above diff adds the missing requirements for Selenium and the WebDriver Manager to the `requirements.in` file.	2024-02-07 14:23:24 -08:00
Paul Gauthier	ef12107bd7	aider: Add command /web to use headless selenium to scrape a webpage and add the content to the chat. # Aider chat conversation: USER: Add a command /web to use headless selenium to scrape a webpage and add the content to the chat ASSISTANT: To implement the `/web` command, we will need to add a new method to the `Commands` class in the `aider/commands.py` file. This method will use Selenium with a headless browser to scrape the content of a given webpage and add the content to the chat. Here are the diffs for the changes: ```diff --- aider/commands.py +++ aider/commands.py @@ ... @@ from pathlib import Path +from selenium import webdriver +from selenium.webdriver.chrome.options import Options +from selenium.webdriver.chrome.service import Service +from selenium.webdriver.common.by import By +from webdriver_manager.chrome import ChromeDriverManager import re import subprocess @@ ... @@ class Commands: voice = None + web_driver = None def __init__(self, io, coder, voice_language=None): self.io = io @@ ... @@ self.voice_language = voice_language self.tokenizer = coder.main_model.tokenizer + self.initialize_web_driver() + def initialize_web_driver(self): + chrome_options = Options() + chrome_options.add_argument("--headless") + chrome_options.add_argument("--disable-gpu") + chrome_options.add_argument("--no-sandbox") + chrome_options.add_argument("--disable-dev-shm-usage") + self.web_driver = webdriver.Chrome( + service=Service(ChromeDriverManager().install()), + options=chrome_options + ) + + def cmd_web(self, args): + "Use headless selenium to scrape a webpage and add the content to the chat" + url = args.strip() + if not url: + self.io.tool_error("Please provide a URL to scrape.") + return + + try: + self.web_driver.get(url) + page_content = self.web_driver.find_element(By.TAG_NAME, "body").text + self.io.tool_output(f"Content from {url}:\n{page_content}") + return page_content + except Exception as e: + self.io.tool_error(f"Error scraping {url}: {e}") + def is_command(self, inp): return inp[0] in "/!" @@ ... @@ ``` This diff adds the necessary imports for Selenium and the WebDriver manager, initializes a headless Chrome WebDriver when the `Commands` class is instantiated, and adds a new command method `cmd_web` that takes a URL as an argument, scrapes the webpage, and outputs the content to the chat.	2024-02-07 14:22:02 -08:00
Paul Gauthier	8d9c2ba60b	Added links to openai docs/pricing	2024-02-07 14:17:24 -08:00
Paul Gauthier	49f81ac784	ssh	2024-02-07 12:13:32 -08:00
Paul Gauthier	35f812859e	Rationalized openai model attributes #458	2024-02-07 12:07:30 -08:00
Paul Gauthier	873a388693	Fixed the ! shortcut for /run	2024-02-06 16:14:27 -08:00
Paul Gauthier	926e393529	set version to 0.23.1-dev	2024-02-03 08:40:50 -08:00
Paul Gauthier	b0fd819241	version bump to 0.23.0	2024-02-03 08:40:17 -08:00
Paul Gauthier	174702babe	Updated gpt-4 turbo model switches	2024-02-03 08:39:19 -08:00
Paul Gauthier	bca265abfc	Updated HISTORY	2024-02-03 08:31:28 -08:00
Paul Gauthier	e2e2ee83a4	Added support for gpt-4-0125-preview	2024-02-03 08:27:31 -08:00
Paul Gauthier	f5887a5098	tweaking graph labels	2024-02-03 08:25:19 -08:00
Paul Gauthier	da14474cbe	Updated HISTORY	2024-01-31 17:55:22 -08:00
Paul Gauthier	1317ee1ace	Added --openrouter	2024-01-31 17:33:50 -08:00
Paul Gauthier	e79732dcf7	Added env_vars for all the OPENAI_ env vars	2024-01-31 15:26:53 -08:00
Paul Gauthier	885e33c2b3	copy	2024-01-26 13:51:03 -08:00
Paul Gauthier	edcf9b146b	updated graph	2024-01-25 15:33:52 -08:00

3849 commits