aider

mirror of https://github.com/Aider-AI/aider.git synced 2025-06-11 07:04:59 +00:00

Author	SHA1	Message	Date
Paul Gauthier	1520bb976a	Change html_to_text to html_to_markdown and add slimdown_html function.	2024-02-08 12:10:10 -08:00
Paul Gauthier	1509e21698	aider: Implemented `html_to_markdown` using `pypandoc`. # Aider chat conversation: USER: use pypandoc to implement `html_to_markdown` ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will: 1. Add an import for `pypandoc`. 2. Implement the `html_to_markdown` function using `pypandoc.convert_text`. Here are the diffs for those changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ import sys +import pypandoc import httpx from bs4 import BeautifulSoup from playwright.sync_api import sync_playwright @@ ... @@ def html_to_markdown(page_source: str) -> str: - pass + return pypandoc.convert_text(page_source, 'md', format='html') ``` Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.	2024-02-08 12:02:45 -08:00
Paul Gauthier	75e7f2bd4e	Add html_to_markdown function to convert HTML to Markdown.	2024-02-08 12:02:42 -08:00
Paul Gauthier	681f26d010	Print playwright instructions after the content is displayed, so they are not lost	2024-02-08 12:01:18 -08:00
Paul Gauthier	e42005a78e	plumbed through to /web	2024-02-08 11:47:28 -08:00
Paul Gauthier	5b78d929a6	Roughly working scraper	2024-02-08 11:44:55 -08:00
Paul Gauthier	9bf3a6e0c6	aider: Implemented HTTP request using `httpx` library in `scrape_with_httpx` method. # Aider chat conversation: USER: finish `scrape_with_httpx` ASSISTANT: To finish the `scrape_with_httpx` method, I will implement the HTTP request using the `httpx` library. This will involve making a GET request to the provided URL and returning the content of the response. Here is the diff for the changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def scrape_with_httpx(self, url): - pass + import httpx + headers = { + 'User-Agent': aider_user_agent + } + try: + with httpx.Client(headers=headers) as client: + response = client.get(url) + response.raise_for_status() + return response.text + except httpx.HTTPError as http_err: + self.print_error(f'HTTP error occurred: {http_err}') + except Exception as err: + self.print_error(f'An error occurred: {err}') + return None ``` This implementation uses `httpx.Client` to send a GET request with a custom User-Agent header. It handles any HTTP errors by printing them out and returns `None` in case of an error. If the request is successful, it returns the text content of the response.	2024-02-08 11:29:46 -08:00
Paul Gauthier	e04187d1ad	Refactor scraper to use Playwright for web scraping and handle user agent string.	2024-02-08 11:29:44 -08:00
Paul Gauthier	f10a9d2e7b	aider: Retrieve the user agent from the browser context after creating a new page in the `scrape_with_playwright` function. # Aider chat conversation: USER: determine the user_agent ASSISTANT: To determine the `user_agent` in the `scrape_with_playwright` function, we can use the `browser.new_context()` method which allows us to specify a `user_agent` string. We will set a common user agent string for a desktop browser. Here is the diff for the changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def scrape_with_playwright(url): with sync_playwright() ^C KeyboardInterrupt USER: I don't want to SET it. I want to determine what it is ASSISTANT: To determine the `user_agent` being used by the browser in the `scrape_with_playwright` function, we can retrieve it from the browser context after creating a new page. Here is the diff for the changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ def scrape_with_playwright(url): with sync_playwright() as p: try: browser = p.chromium.launch() except Exception as e: print(repr(e)) return + page = browser.new_page() + user_agent = page.evaluate("navigator.userAgent") + print(f"User Agent: {user_agent}") - page = browser.new_page() page.goto(url) content = page.content() browser.close() return content ```	2024-02-08 11:02:54 -08:00
Paul Gauthier	65111591d7	Add scraper module for data extraction.	2024-02-08 11:02:52 -08:00
Paul Gauthier	1758937042	wip	2024-02-08 07:18:39 -08:00
Paul Gauthier	ef12107bd7	aider: Add command /web to use headless selenium to scrape a webpage and add the content to the chat. # Aider chat conversation: USER: Add a command /web to use headless selenium to scrape a webpage and add the content to the chat ASSISTANT: To implement the `/web` command, we will need to add a new method to the `Commands` class in the `aider/commands.py` file. This method will use Selenium with a headless browser to scrape the content of a given webpage and add the content to the chat. Here are the diffs for the changes: ```diff --- aider/commands.py +++ aider/commands.py @@ ... @@ from pathlib import Path +from selenium import webdriver +from selenium.webdriver.chrome.options import Options +from selenium.webdriver.chrome.service import Service +from selenium.webdriver.common.by import By +from webdriver_manager.chrome import ChromeDriverManager import re import subprocess @@ ... @@ class Commands: voice = None + web_driver = None def __init__(self, io, coder, voice_language=None): self.io = io @@ ... @@ self.voice_language = voice_language self.tokenizer = coder.main_model.tokenizer + self.initialize_web_driver() + def initialize_web_driver(self): + chrome_options = Options() + chrome_options.add_argument("--headless") + chrome_options.add_argument("--disable-gpu") + chrome_options.add_argument("--no-sandbox") + chrome_options.add_argument("--disable-dev-shm-usage") + self.web_driver = webdriver.Chrome( + service=Service(ChromeDriverManager().install()), + options=chrome_options + ) + + def cmd_web(self, args): + "Use headless selenium to scrape a webpage and add the content to the chat" + url = args.strip() + if not url: + self.io.tool_error("Please provide a URL to scrape.") + return + + try: + self.web_driver.get(url) + page_content = self.web_driver.find_element(By.TAG_NAME, "body").text + self.io.tool_output(f"Content from {url}:\n{page_content}") + return page_content + except Exception as e: + self.io.tool_error(f"Error scraping {url}: {e}") + def is_command(self, inp): return inp[0] in "/!" @@ ... @@ ``` This diff adds the necessary imports for Selenium and the WebDriver manager, initializes a headless Chrome WebDriver when the `Commands` class is instantiated, and adds a new command method `cmd_web` that takes a URL as an argument, scrapes the webpage, and outputs the content to the chat.	2024-02-07 14:22:02 -08:00
Paul Gauthier	8d9c2ba60b	Added links to openai docs/pricing	2024-02-07 14:17:24 -08:00
Paul Gauthier	49f81ac784	ssh	2024-02-07 12:13:32 -08:00
Paul Gauthier	35f812859e	Rationalized openai model attributes #458	2024-02-07 12:07:30 -08:00
Paul Gauthier	873a388693	Fixed the ! shortcut for /run	2024-02-06 16:14:27 -08:00
Paul Gauthier	926e393529	set version to 0.23.1-dev	2024-02-03 08:40:50 -08:00
Paul Gauthier	b0fd819241	version bump to 0.23.0	2024-02-03 08:40:17 -08:00
Paul Gauthier	174702babe	Updated gpt-4 turbo model switches	2024-02-03 08:39:19 -08:00
Paul Gauthier	e2e2ee83a4	Added support for gpt-4-0125-preview	2024-02-03 08:27:31 -08:00
Paul Gauthier	1317ee1ace	Added --openrouter	2024-01-31 17:33:50 -08:00
Paul Gauthier	e79732dcf7	Added env_vars for all the OPENAI_ env vars	2024-01-31 15:26:53 -08:00
Paul Gauthier	065bc63e33	Merge remote-tracking branch 'origin/main'	2024-01-23 13:07:56 -08:00
Paul Gauthier	0c314144d2	Added /quit #451	2024-01-23 13:07:24 -08:00
paul-gauthier	247cab56f0	Merge pull request #453 from zackees/patch-1 fix https://github.com/paul-gauthier/aider/issues/452	2024-01-23 13:05:15 -08:00
Paul Gauthier	c31f2313ba	updated docs	2024-01-23 12:58:29 -08:00
Paul Gauthier	3e96fda71a	catch shutdown errors	2024-01-23 10:45:58 -08:00
Paul Gauthier	af4f224c00	cleanup	2024-01-23 10:02:07 -08:00
Paul Gauthier	580c52bd85	set mdargs on init	2024-01-23 09:58:57 -08:00
Paul Gauthier	da131da427	aider: Add __del__ method to MarkdownStream to call self.live.stop(). # Aider chat conversation: USER: make sire self.live.stop() is called no matter how the object gets destroyed ASSISTANT: To ensure that `self.live.stop()` is called no matter how the object gets destroyed, we can use Python's built-in `__del__` method. This method is called when an object is about to be destroyed. We can override this method in our `MarkdownStream` class to stop the `Live` instance. Here are the steps to make this change: 1. Add a `__del__` method to the `MarkdownStream` class that calls `self.live.stop()`. Here is the SEARCH/REPLACE block: aider/mdstream.py <source>python <<<<<<< SEARCH class MarkdownStream: live = None when = 0 min_delay = 0.050 live_window = 6 def __init__(self): self.printed = [] self.live = Live(Text(''), refresh_per_second=1./self.min_delay) self.live.start() ======= class MarkdownStream: live = None when = 0 min_delay = 0.050 live_window = 6 def __init__(self): self.printed = [] self.live = Live(Text(''), refresh_per_second=1./self.min_delay) self.live.start() def __del__(self): self.live.stop() >>>>>>> REPLACE </source>	2024-01-23 09:50:27 -08:00
Paul Gauthier	24f1e01177	Refactored MarkdownStream to use rich.live for real-time updates and improved text rendering.	2024-01-23 09:50:23 -08:00
Paul Gauthier	b143bc56ac	switch to mdstream	2024-01-23 09:20:31 -08:00
Paul Gauthier	e50a0e8b09	noop	2024-01-23 08:55:28 -08:00
Paul Gauthier	85383079f6	cleanup	2024-01-23 08:55:08 -08:00
Paul Gauthier	e6a4edf2e2	aider: Mark _text as noqa 501. # Aider chat conversation: USER: mark _text as noqa 501 ASSISTANT: To mark `_text` as `noqa 501`, we need to modify `aider/mdstream.py`: 1. Add `# noqa: 501` at the end of the `_text` variable declaration. Here is the SEARCH/REPLACE block: aider/mdstream.py <source>python <<<<<<< SEARCH _text = 5 * _text ======= _text = 5 * _text # noqa: 501 >>>>>>> REPLACE </source>	2024-01-23 08:53:34 -08:00
Paul Gauthier	c051104dc0	Add MarkdownStream class for displaying markdown content in a stream.	2024-01-23 08:53:29 -08:00
Zachary Vorhies	c6ced90e51	fix https://github.com/paul-gauthier/aider/issues/452 Adds option to skip the check for an update, better supports tooling which wants to do its own update mechanism and not have aider print it out unconditionally.	2024-01-22 18:52:06 -08:00
Paul Gauthier	009c5c3c0d	set version to 0.22.1-dev	2024-01-22 17:52:04 -08:00
Paul Gauthier	28b69f381c	version bump to 0.22.0	2024-01-22 17:51:33 -08:00
Paul Gauthier	4e3ace6199	Added ! as alias for /run #334	2024-01-22 13:52:59 -08:00
Paul Gauthier	96d60624cb	added /test cmd #335	2024-01-22 13:50:21 -08:00
Paul Gauthier	06973d85a3	When autocompleting /add and /drop quote the filename if needed #440	2024-01-22 13:28:56 -08:00
Paul Gauthier	23cc78f36b	typo	2024-01-22 13:21:16 -08:00
Paul Gauthier	a8b854fe8d	fix f-string	2024-01-22 13:10:15 -08:00
Paul Gauthier	877dda2f3d	On /undo, ask GPT to wait before retrying the changes #450	2024-01-22 13:05:38 -08:00
Paul Gauthier	61e68ffe9d	set version to 0.21.2-dev	2024-01-10 07:50:45 -08:00
Paul Gauthier	0d11880a4a	version bump to 0.21.1	2024-01-10 07:50:15 -08:00
Paul Gauthier	6a77657d95	added --4turbo and --4 aliases	2024-01-10 07:48:17 -08:00
Paul Gauthier	29a578c621	Complete the search for viable hunks all the way down to 0 space context lines #435	2024-01-09 08:34:33 -08:00
Paul Gauthier	e0e454fe82	set version to 0.21.1-dev	2024-01-08 10:44:34 -08:00

1520 commits