mirror of
https://github.com/Aider-AI/aider.git
synced 2025-06-01 10:14:59 +00:00
aider: Implemented html_to_markdown
using pypandoc
.
# Aider chat conversation: USER: use pypandoc to implement `html_to_markdown` ASSISTANT: To implement `html_to_markdown` using `pypandoc`, I will: 1. Add an import for `pypandoc`. 2. Implement the `html_to_markdown` function using `pypandoc.convert_text`. Here are the diffs for those changes: ```diff --- aider/scrape.py +++ aider/scrape.py @@ ... @@ import sys +import pypandoc import httpx from bs4 import BeautifulSoup from playwright.sync_api import sync_playwright @@ ... @@ def html_to_markdown(page_source: str) -> str: - pass + return pypandoc.convert_text(page_source, 'md', format='html') ``` Please ensure that `pypandoc` is installed in your environment to use this implementation. If it's not installed, you can install it using `pip install pypandoc`.
This commit is contained in:
parent
75e7f2bd4e
commit
1509e21698
1 changed files with 2 additions and 1 deletions
|
@ -2,6 +2,7 @@
|
|||
|
||||
import sys
|
||||
|
||||
import pypandoc
|
||||
import httpx
|
||||
from bs4 import BeautifulSoup
|
||||
from playwright.sync_api import sync_playwright
|
||||
|
@ -118,7 +119,7 @@ def html_to_text(page_source: str) -> str:
|
|||
|
||||
|
||||
def html_to_markdown(page_source: str) -> str:
|
||||
pass
|
||||
return pypandoc.convert_text(page_source, 'md', format='html')
|
||||
|
||||
def main(url):
|
||||
scraper = Scraper()
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue