Commit graph

44 commits

Author SHA1 Message Date
Paul Gauthier
cbcbff341b style: format voice.py and suppress syntax warnings 2024-12-08 18:14:30 -08:00
Philippe de Reynal
e11faadf39 feat: ability to select audio input device 2024-11-30 11:24:34 +01:00
Paul Gauthier
3edcd71a44 fix: move pydub import to avoid warnings 2024-09-24 09:29:00 -07:00
Paul Gauthier (aider)
ab786279e6 style: Fix linter warnings 2024-09-24 09:28:31 -07:00
Paul Gauthier (aider)
e267dc13af fix: Move imports to top of file 2024-09-24 09:28:28 -07:00
Paul Gauthier (aider)
d0e6dc2c1e style: Wrap long lines in voice.py 2024-09-24 09:28:16 -07:00
Paul Gauthier (aider)
da752bb00c fix: Suppress warning about missing ffmpeg or avconv 2024-09-24 09:28:13 -07:00
Mike Bailey
1cc30a22f9 feat: Option to compress audio files by ~90%
Add option to reduce bandwidth (and potentially latency) by converting
voice recordings (wav) into a compressed audio format (webm or mp3).

Default behaviour is unchanged.

> File uploads are currently limited to 25 MB and the following input file
> types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
>
> - https://platform.openai.com/docs/guides/speech-to-text
2024-09-22 22:20:07 +10:00
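
As a rough sketch of the idea in this commit rather than its actual diff, a wav recording could be re-encoded with pydub before upload; the helper name, default format, and bitrate below are hypothetical choices, and ffmpeg or avconv must be available on the PATH.

```python
# Hedged sketch: shrink a wav voice recording into a compressed format
# (mp3 here, webm is another option) before sending it for transcription.
# Requires pydub plus an ffmpeg/avconv install.
from pydub import AudioSegment

def compress_recording(wav_path, out_format="mp3"):
    """Re-encode a wav file as mp3 (or webm) and return the new path."""
    audio = AudioSegment.from_wav(wav_path)
    out_path = wav_path.rsplit(".", 1)[0] + "." + out_format
    audio.export(out_path, format=out_format, bitrate="32k")  # bitrate is illustrative
    return out_path
```
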
Paul Gauthier
aedba59c6e feat: add error handling for transcription in Voice class 2024-09-03 08:59:58 -07:00
Paul Gauthier (aider)
130579ea3c style: Format error message in Voice class for better readability 2024-08-30 07:04:13 -07:00
Paul Gauthier (aider)
e88ec36a81 feat: improve error handling for audio device issues 2024-08-30 07:04:09 -07:00
Paul Gauthier
c598cdd576 Catch self.sd.PortAudioError gracefully #849 2024-07-13 07:57:12 +01:00
Paul Gauthier
9d02628cf8 streamlit borks sys.path, causes import("litellm") to load our litellm.py; fix 2024-07-03 21:32:50 -03:00
Paul Gauthier
ed35af44b3 defer numpy, bs4 and jsonschema 2024-07-03 13:35:33 -03:00
Paul Gauthier
73c6af8851 aider: Implemented a more flexible approach to handle different microphone sample rates in aider/voice.py.
# Aider chat conversation:

USER: https://github.com/paul-gauthier/aider/issues/612

https://github.com/paul-gauthier/aider/issues/612:

# Unsupported Audio Sample Rate #612

Open

[aaronson2012](/aaronson2012) opened this issue May 24, 2024 · 0 comments

Labels: [enhancement](/paul-gauthier/aider/labels/enhancement) New feature or request

## Comments

**[aaronson2012](/aaronson2012)** commented [May 24, 2024](#issue-2314049815)

### Issue

aider was crashing when I ran the /voice command, and I realized that my computer's microphone did not support a sample rate of 16kHz. It was an easy fix for me, I just changed the sample_rate variable to one my mic supported, but this isn't a very elegant solution. Could there be a better way so all mics are supported?

### Version and model info

Latest version

[paul-gauthier](/paul-gauthier) added the [enhancement](/paul-gauthier/aider/labels/enhancement) label [Jun 4, 2024](#event-13039727805)

ASSISTANT: Based on the issue description, I believe the following file is most likely to need changes to support different microphone sample rates:

aider/voice.py

This file contains the Voice class which handles recording and transcribing audio. The sample_rate variable is hardcoded in this file.

To make the suggested change, we would need to modify how the sample rate is determined in aider/voice.py.

Please add aider/voice.py to the chat so I can propose the specific code changes needed. Let me know if you would like me to explain my reasoning further or if you have any other questions!

USER: I added these files to the chat: aider/voice.py.

If you need to propose edits to other existing files not already added to the chat, you *MUST* tell me their full path names and ask me to *add the files to the chat*. End your reply and wait for my approval. You can keep asking if you then decide you need to edit more files.

ASSISTANT: Here are the proposed changes to aider/voice.py to support different microphone sample rates:

aider/voice.py
```python
<<<<<<< SEARCH
        sample_rate = 16000  # 16kHz
=======
        try:
            sample_rate = int(self.sd.query_devices(None, 'input')['default_samplerate'])
        except (TypeError, ValueError):
            sample_rate = 16000  # fallback to 16kHz if unable to query device
>>>>>>> REPLACE
```

This change:

1. Attempts to query the default sample rate of the input device using sounddevice's `query_devices()` function.
2. If successful, it uses that sample rate.
3. If it fails to get a valid sample rate, it falls back to the original 16kHz.

This should allow aider to automatically adapt to the microphone's supported sample rate in most cases, while still having a reasonable fallback.

Let me know if you would like me to explain or modify the changes further.
2024-06-04 10:27:22 -07:00
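
The fallback logic proposed above can be exercised on its own; this small check assumes the sounddevice package is installed and that the machine has a default input device.

```python
# Query the default input device's native sample rate, falling back to 16kHz,
# mirroring the SEARCH/REPLACE block in the conversation above.
import sounddevice as sd

try:
    device_info = sd.query_devices(None, "input")  # default input device
    sample_rate = int(device_info["default_samplerate"])
except (TypeError, ValueError):
    sample_rate = 16000  # fall back to 16kHz if the device can't be queried

print(f"Would record at {sample_rate} Hz")
```
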
Paul Gauthier
9ff6770a04 refactored litellm to avoid duplicating workarounds 2024-05-08 08:05:15 -07:00
Paul Gauthier
01cf038bb5 Quiet litellm 2024-04-23 10:37:43 -07:00
Paul Gauthier
f1c09ececf Switched voice to litellm 2024-04-18 14:13:26 -07:00
Paul Gauthier
c770fc4380 cleaned up client refs 2024-04-17 15:47:07 -07:00
Paul Gauthier
6ebc142377 roughed in openai 1.x 2023-12-05 07:37:05 -08:00
Paul Gauthier
22eb344192 disable /voice if import soundfile fails #274 2023-10-12 07:00:51 -07:00
Paul Gauthier
963bc83a9c Late-bind importing sounddevice, as it was slowing down aider startup 2023-08-19 15:18:46 -07:00
Paul Gauthier
c07013abf4 finished roughing in --voice-language 2023-08-14 14:18:37 -03:00
Paul Gauthier
bddcdd39a8 Refactor record_and_transcribe method in voice.py to accept a language parameter. 2023-08-14 14:11:31 -03:00
Paul Gauthier
23579d4741 Refactor voice recording and transcription to allow for specifying the language. 2023-08-14 14:11:03 -03:00
Paul Gauthier
fcb209c5aa refac 2023-08-11 11:16:31 -03:00
Paul Gauthier
b6b271b70c pass the input history as a prompt to whisper 2023-08-11 11:03:37 -03:00
Paul Gauthier
c38a877d81 clip low volumes to 0 2023-08-11 10:33:20 -03:00
Paul Gauthier
b59e5bf298 copy 2023-08-10 22:27:18 -03:00
Paul Gauthier
24f574a7dc better bar 2023-08-10 22:21:39 -03:00
Paul Gauthier
bfa7fedd18 catch ^c 2023-08-10 22:18:41 -03:00
Paul Gauthier
053751cb43 show loudness bar 2023-08-10 22:17:31 -03:00
Paul Gauthier
4dc704272e aider: Updated callback function to calculate the root mean square (RMS) of the input audio data.
# Aider chat conversation:

USER: what is `indata` ?

ASSISTANT: `indata` is a parameter in the callback function defined within the `record_and_transcribe` function in the `aider/voice.py` file. This parameter represents the input audio data that is being recorded.

In the context of the `sounddevice` library, which is used for audio recording in this script, `indata` is a 2D numpy array containing the recorded audio data. The rows in the array represent individual frames, while the columns represent channels.

In this specific script, the audio is being recorded in mono (single channel), so there will be only one column. The data in `indata` is being put into a queue and also being dumped for debugging purposes.

USER: how can i estimate the avg volume based on indata?

ASSISTANT: To estimate the average volume of the audio data, you can calculate the root mean square (RMS) of the `indata` array. The RMS represents the square root of the average of the squares of the values, and it's a common way to measure the magnitude of a varying quantity, such as audio signals.

Here's how you can modify the `record_and_transcribe` function to calculate the RMS:

```python
aider/voice.py
<<<<<<< HEAD
    def callback(indata, frames, time, status):
        """This is called (from a separate thread) for each audio block."""
        q.put(indata.copy())
        dump(indata)
        dump(indata.mean())
=======
    import numpy as np

    def callback(indata, frames, time, status):
        """This is called (from a separate thread) for each audio block."""
        q.put(indata.copy())
        rms = np.sqrt(np.mean(indata**2))
        dump(rms)
>>>>>>> updated
```

In this change, we import the numpy library and replace the line `dump(indata.mean())` with `rms = np.sqrt(np.mean(indata**2))` and `dump(rms)`. This calculates the RMS of the `indata` and dumps it for debugging purposes.
2023-08-10 21:57:42 -03:00
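
For reference, here is the RMS idea from this conversation applied to a made-up block of mono samples, with a crude text bar of the kind the later "show loudness bar" commit adds; the sample values and scaling factor are invented for illustration.

```python
# Compute the RMS loudness of one audio block and render a rough meter.
import numpy as np

block = np.array([[0.02], [-0.01], [0.04], [-0.03]])  # pretend mono frames
rms = float(np.sqrt(np.mean(block ** 2)))             # average block loudness
bar = "#" * int(min(rms * 10, 1.0) * 40)              # arbitrary scaling to 0..40 chars
print(f"rms={rms:.4f} |{bar}")
```
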
Paul Gauthier
aba2bb610d proper temp file 2023-08-10 21:54:04 -03:00
Paul Gauthier
4812d17372 prettier 2023-08-10 17:12:19 -03:00
Paul Gauthier
7b993a9bbb gracefully handle missing portaudio 2023-08-10 16:55:45 -03:00
Paul Gauthier
fed24aa444 lint 2023-08-10 16:53:34 -03:00
Paul Gauthier
d74a3ce123 Refactor record_and_transcribe function to remove unnecessary api_key parameter and import dump module. 2023-08-10 16:13:23 -03:00
Paul Gauthier
9b526d51e4 Refactored voice recording to use a queue and write audio to a file. 2023-08-10 16:10:02 -03:00
Paul Gauthier
1f42b0839f aider: Fixed issue with passing the callback argument to the sd.rec function in record_and_transcribe function.
# Aider chat conversation:

USER: I ran this command:

 python -m aider.voice

And got this output:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gauthier/Projects/aider/aider/voice.py", line 42, in <module>
    print(record_and_transcribe(api_key))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/aider/voice.py", line 22, in record_and_transcribe
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, callback=on_key_press)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/sounddevice.py", line 276, in rec
    ctx.start_stream(InputStream, samplerate, ctx.input_channels,
TypeError: _CallbackContext.start_stream() got multiple values for argument 'callback'
Recording started, press any key to stop...

ASSISTANT: It seems like there's an issue with the way the `callback` argument is being passed to the `sd.rec` function in the `record_and_transcribe` function in the `aider/voice.py` file.

The `sd.rec` function is a convenience function that internally creates an instance of `sd.InputStream` and starts recording. The `callback` argument should be passed to the `sd.InputStream` constructor, not the `sd.rec` function.

Here's how we can fix this:

1. Modify the `aider/voice.py` file.

```python
aider/voice.py
<<<<<<< HEAD
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, callback=on_key_press)
=======
    # Create an instance of InputStream with the callback
    stream = sd.InputStream(samplerate=sample_rate, channels=1, callback=on_key_press)
    stream.start()
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1)
>>>>>>> updated
```

This change creates an instance of `sd.InputStream` with the `on_key_press` callback and starts the stream before calling `sd.rec`. This should ensure that the callback is correctly passed to the `InputStream` and not the `rec` function.
2023-08-10 15:49:26 -03:00
Paul Gauthier
c7d23f1b5a aider: Moved the api_key setting into the main function and pulled it from os.environ.
# Aider chat conversation:

USER: mov the `api_key` setting into main, and pull it from os.environ

ASSISTANT: To move the `api_key` setting into the main function and pull it from `os.environ`, we need to modify the `aider/voice.py` file.

Here are the changes:

```python
aider/voice.py
<<<<<<< HEAD
def record_and_transcribe():
    # Set your OpenAI API key
    openai.api_key = 'your-api-key'
=======
import os

def record_and_transcribe(api_key):
>>>>>>> updated
```

```python
aider/voice.py
<<<<<<< HEAD
if __name__ == "__main__":
    print(record_and_transcribe())
=======
if __name__ == "__main__":
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        raise ValueError("Please set the OPENAI_API_KEY environment variable.")
    print(record_and_transcribe(api_key))
>>>>>>> updated
```

These changes remove the hard-coded API key from the `record_and_transcribe` function and instead pass it as an argument. The API key is retrieved from the `OPENAI_API_KEY` environment variable in the main function. If the environment variable is not set, an error is raised.
2023-08-10 15:47:18 -03:00
Paul Gauthier
c0f6cf6977 aider: Added main function to call record_and_transcribe if the script is run directly.
# Aider chat conversation:

USER: add a main func to call this if the script is run directly

ASSISTANT: To add a main function that calls `record_and_transcribe` if the script is run directly, we need to modify the `aider/voice.py` file.

Here are the changes:

```python
aider/voice.py
<<<<<<< HEAD
    # Return the transcription
    return response['choices'][0]['text']
=======
    # Return the transcription
    return response['choices'][0]['text']

if __name__ == "__main__":
    print(record_and_transcribe())
>>>>>>> updated
```

This change adds a check to see if the script is being run directly (i.e., not imported as a module). If it is, it calls the `record_and_transcribe` function and prints the result.
2023-08-10 15:46:31 -03:00
Paul Gauthier
1053d1781c aider: Added a function record_and_transcribe to aider/voice.py that records audio until the user presses a key, then submits it to the Whisper API to get the text back.
# Aider chat conversation:

USER: what file formats does the openai whisper api accept?

ASSISTANT: The OpenAI Whisper ASR API accepts audio data in the following formats:

- FLAC: Free Lossless Audio Codec. This is a lossless audio format that doesn't lose any data during compression. It's the recommended format for best performance.
- WAV: Waveform Audio File Format. This is a standard audio format used mainly in Windows PCs.
- MULAW: μ-law algorithm (or mu-law) is a companding algorithm, primarily used in the digital telecommunication systems of North America and Japan.

Please note that the audio data must be single-channel (mono), and the sample rate must be 16 kHz for FLAC and WAV formats, and 8 kHz for MULAW format.

USER: make a script that records audio until the user presses a key, then submits it to whisper to get the text back

ASSISTANT: To accomplish this, we need to create a new Python script that uses the `sounddevice` library to record audio, the `keyboard` library to listen for a key press, and the `openai.Whisper` API to transcribe the audio.

Here is the *edit block* for this new script:

```python
newfile/record_and_transcribe.py
<<<<<<< HEAD

=======
import sounddevice as sd
import numpy as np
import keyboard
import openai
import io
import os

# Set your OpenAI API key
openai.api_key = 'your-api-key'

# Set the sample rate and duration for the recording
sample_rate = 16000  # 16kHz
duration = 10  # in seconds

# Create a callback function to stop recording when a key is pressed
def on_key_press(e):
    print("Key pressed, stopping recording...")
    sd.stop()

# Start the recording
print("Recording started, press any key to stop...")
recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, callback=on_key_press)

# Wait for a key press
keyboard.wait()

# Convert the recording to bytes
recording_bytes = io.BytesIO()
np.save(recording_bytes, recording, allow_pickle=False)
recording_bytes = recording_bytes.getvalue()

# Transcribe the audio using the Whisper API
response = openai.Whisper.asr.create(audio_data=recording_bytes)

# Print the transcription
print("Transcription: ", response['choices'][0]['text'])
>>>>>>> updated
```

Please replace `'your-api-key'` with your actual OpenAI API key. This script records audio for a maximum of 10 seconds, but it stops recording as soon as any key is pressed. The recorded audio is then transcribed using the Whisper API, and the transcription is printed to the console.

Please note that you need to install the `sounddevice`, `numpy`, `keyboard`, and `openai` Python libraries to run this script. You can install them using pip:

```bash
pip install sounddevice numpy keyboard openai
```

Also, please note that the `keyboard` library requires root privileges to work, so you might need to run your script as a superuser.

USER: make a function that records audio until the user presses a key, then submits it to whisper to get the text back

ASSISTANT: To create a function that accomplishes this, we need to modify the `aider/voice.py` file. The function should be named `record_and_transcribe` and it should use the `sounddevice` and `keyboard` libraries to record audio and listen for a key press, respectively. It should also use the `openai.Whisper` API to transcribe the audio.

Here is the *edit block* for this change:

```python
aider/voice.py
<<<<<<< HEAD

=======
import sounddevice as sd
import numpy as np
import keyboard
import openai
import io

def record_and_transcribe():
    # Set your OpenAI API key
    openai.api_key = 'your-api-key'

    # Set the sample rate and duration for the recording
    sample_rate = 16000  # 16kHz
    duration = 10  # in seconds

    # Create a callback function to stop recording when a key is pressed
    def on_key_press(e):
        print("Key pressed, stopping recording...")
        sd.stop()

    # Start the recording
    print("Recording started, press any key to stop...")
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, callback=on_key_press)

    # Wait for a key press
    keyboard.wait()

    # Convert the recording to bytes
    recording_bytes = io.BytesIO()
    np.save(recording_bytes, recording, allow_pickle=False)
    recording_bytes = recording_bytes.getvalue()

    # Transcribe the audio using the Whisper API
    response = openai.Whisper.asr.create(audio_data=recording_bytes)

    # Return the transcription
    return response['choices'][0]['text']
>>>>>>> updated
```

Please replace `'your-api-key'` with your actual OpenAI API key. This function records audio for a maximum of 10 seconds, but it stops recording as soon as any key is pressed. The recorded audio is then transcribed using the Whisper API, and the transcription is returned by the function.

Please note that you need to install the `sounddevice`, `numpy`, `keyboard`, and `openai` Python libraries to use this function. You can install them using pip:

```bash
pip install sounddevice numpy keyboard openai
```

Also, please note that the `keyboard` library requires root privileges to work, so you might need to run your script as a superuser.
2023-08-10 15:45:49 -03:00
Paul Gauthier
fdb1cac20b aider: Added aider/voice.py 2023-08-10 15:41:42 -03:00