Commit graph

54 commits

Author SHA1 Message Date
Paul Gauthier (aider)
1d672616be fix: correct file export path in voice format conversion 2025-01-07 09:43:49 -08:00
Paul Gauthier
01ef2351b3 refactor: Use temporary variable for audio format to avoid state mutation 2025-01-04 12:08:05 -08:00
Paul Gauthier
de1d566e9e refactor: Simplify audio file handling and conversion logic 2025-01-04 12:06:07 -08:00
Paul Gauthier (aider)
19114a61ae style: Format print statement in voice.py for better readability 2025-01-04 12:04:13 -08:00
Paul Gauthier (aider)
36e5599ead feat: Add file size logging after audio conversion 2025-01-04 12:04:09 -08:00
Paul Gauthier (aider)
f5a82e575c style: Format print statement in voice.py for better readability 2025-01-04 12:03:32 -08:00
Paul Gauthier (aider)
1851de323d feat: add error handling and feedback for audio conversion 2025-01-04 12:03:25 -08:00
Paul Gauthier
f9408640a3 refactor: Simplify audio file size check and format conversion logic 2025-01-04 12:03:24 -08:00
Paul Gauthier (aider)
73837730fa feat: Add file size check and mp3 conversion for large audio files 2025-01-04 11:59:42 -08:00
Paul Gauthier
9b46991721 refactor: Add comment for handling large file uploads in voice.py 2025-01-04 11:59:40 -08:00
Paul Gauthier
cbcbff341b style: format voice.py and suppress syntax warnings 2024-12-08 18:14:30 -08:00
Philippe de Reynal
e11faadf39 feat: ability to select audio input device 2024-11-30 11:24:34 +01:00
Paul Gauthier
3edcd71a44 fix: move pydub import to avoid warnings 2024-09-24 09:29:00 -07:00
Paul Gauthier (aider)
ab786279e6 style: Fix linter warnings 2024-09-24 09:28:31 -07:00
Paul Gauthier (aider)
e267dc13af fix: Move imports to top of file 2024-09-24 09:28:28 -07:00
Paul Gauthier (aider)
d0e6dc2c1e style: Wrap long lines in voice.py 2024-09-24 09:28:16 -07:00
Paul Gauthier (aider)
da752bb00c fix: Suppress warning about missing ffmpeg or avconv 2024-09-24 09:28:13 -07:00
Mike Bailey
1cc30a22f9 feat: Option to compress audio files by ~90%
Add option to reduce bandwidth (and potentially latency) by converting
voice recordings (wav) into a compressed audio format (webm or mp3).

Default behaviour is unchanged.

> File uploads are currently limited to 25 MB and the following input file
> types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
>
> - https://platform.openai.com/docs/guides/speech-to-text
2024-09-22 22:20:07 +10:00
Paul Gauthier
aedba59c6e feat: add error handling for transcription in Voice class 2024-09-03 08:59:58 -07:00
Paul Gauthier (aider)
130579ea3c style: Format error message in Voice class for better readability 2024-08-30 07:04:13 -07:00
Paul Gauthier (aider)
e88ec36a81 feat: improve error handling for audio device issues 2024-08-30 07:04:09 -07:00
Paul Gauthier
c598cdd576 Catch self.sd.PortAudioError gracefully #849 2024-07-13 07:57:12 +01:00
Paul Gauthier
9d02628cf8 streamlit borks sys.path, causes import("litellm") to load our litellm.py; fix 2024-07-03 21:32:50 -03:00
Paul Gauthier
ed35af44b3 defer numpy, bs4 and jsonschema 2024-07-03 13:35:33 -03:00
Paul Gauthier
73c6af8851 aider: Implemented a more flexible approach to handle different microphone sample rates in aider/voice.py.
# Aider chat conversation:

USER: https://github.com/paul-gauthier/aider/issues/612

https://github.com/paul-gauthier/aider/issues/612:

::: {role="main"}

# Unsupported Audio Sample Rate #612

Open

[aaronson2012](/aaronson2012) opened this issue May 24, 2024 · 0
comments

### Issue

aider was crashing when I ran the /voice command, and I realized that my computer's microphone did not support a sample rate of 16kHz. It was an easy fix for me: I just changed the sample_rate variable to one my mic supported, but this isn't a very elegant solution. Could there be a better way so all mics are supported?

### Version and model info

Latest version

[paul-gauthier](/paul-gauthier) added the
[enhancement](/paul-gauthier/aider/labels/enhancement) New feature or
request label [Jun 4, 2024](#event-13039727805)

:::

ASSISTANT: Based on the issue description, I believe the following file is most likely to need changes to support different microphone sample rates:

aider/voice.py

This file contains the Voice class which handles recording and transcribing audio. The sample_rate variable is hardcoded in this file.

To make the suggested change, we would need to modify how the sample rate is determined in aider/voice.py.

Please add aider/voice.py to the chat so I can propose the specific code changes needed. Let me know if you would like me to explain my reasoning further or if you have any other questions!

USER: I added these files to the chat: aider/voice.py.

If you need to propose edits to other existing files not already added to the chat, you *MUST* tell me their full path names and ask me to *add the files to the chat*. End your reply and wait for my approval. You can keep asking if you then decide you need to edit more files.

ASSISTANT: Here are the proposed changes to aider/voice.py to support different microphone sample rates:

aider/voice.py
```python
<<<<<<< SEARCH
        sample_rate = 16000  # 16kHz
=======
        try:
            sample_rate = int(self.sd.query_devices(None, 'input')['default_samplerate'])
        except (TypeError, ValueError):
            sample_rate = 16000  # fallback to 16kHz if unable to query device
>>>>>>> REPLACE
```

This change:

1. Attempts to query the default sample rate of the input device using sounddevice's `query_devices()` function.
2. If successful, it uses that sample rate.
3. If it fails to get a valid sample rate, it falls back to the original 16kHz.

This should allow aider to automatically adapt to the microphone's supported sample rate in most cases, while still having a reasonable fallback.

Let me know if you would like me to explain or modify the changes further.
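
The fallback logic described above can be sketched in isolation. This is a minimal illustration, not the code in aider/voice.py: `pick_sample_rate` and its `query_devices` parameter are hypothetical names, and the query function is passed in so the sketch runs without audio hardware. With sounddevice installed, `sd.query_devices` could presumably be passed in directly.

```python
def pick_sample_rate(query_devices, fallback=16000):
    """Return the input device's default sample rate, or `fallback` on failure.

    `query_devices` is assumed to behave like sounddevice.query_devices:
    called as query_devices(None, "input"), it returns a dict that includes
    a "default_samplerate" entry for the default input device.
    """
    try:
        info = query_devices(None, "input")
        return int(info["default_samplerate"])
    except Exception:
        # Deliberately broad for this sketch: any failure to query the
        # device (no device, bad value, backend error) uses the fallback.
        return fallback


def fake_query(device, kind):
    # Stand-in for a real device query; 44100.0 is an illustrative value.
    return {"default_samplerate": 44100.0}


def broken_query(device, kind):
    # Stand-in for a machine where the query fails.
    raise ValueError("no input device")


print(pick_sample_rate(fake_query))    # 44100
print(pick_sample_rate(broken_query))  # 16000
```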
2024-06-04 10:27:22 -07:00
Paul Gauthier
9ff6770a04 refactored litellm to avoid duplicating workarounds 2024-05-08 08:05:15 -07:00
Paul Gauthier
01cf038bb5 Quiet litellm 2024-04-23 10:37:43 -07:00
Paul Gauthier
f1c09ececf Switched voice to litellm 2024-04-18 14:13:26 -07:00
Paul Gauthier
c770fc4380 cleaned up client refs 2024-04-17 15:47:07 -07:00
Paul Gauthier
6ebc142377 roughed in openai 1.x 2023-12-05 07:37:05 -08:00
Paul Gauthier
22eb344192 disable /voice if import soundfile fails #274 2023-10-12 07:00:51 -07:00
Paul Gauthier
963bc83a9c Late-bind importing sounddevice, as it was slowing down aider startup 2023-08-19 15:18:46 -07:00
Paul Gauthier
c07013abf4 finished roughing in --voice-language 2023-08-14 14:18:37 -03:00
Paul Gauthier
bddcdd39a8 Refactor record_and_transcribe method in voice.py to accept a language parameter. 2023-08-14 14:11:31 -03:00
Paul Gauthier
23579d4741 Refactor voice recording and transcription to allow for specifying the language. 2023-08-14 14:11:03 -03:00
Paul Gauthier
fcb209c5aa refac 2023-08-11 11:16:31 -03:00
Paul Gauthier
b6b271b70c pass the input history as a prompt to whisper 2023-08-11 11:03:37 -03:00
Paul Gauthier
c38a877d81 clip low volumes to 0 2023-08-11 10:33:20 -03:00
Paul Gauthier
b59e5bf298 copy 2023-08-10 22:27:18 -03:00
Paul Gauthier
24f574a7dc better bar 2023-08-10 22:21:39 -03:00
Paul Gauthier
bfa7fedd18 catch ^c 2023-08-10 22:18:41 -03:00
Paul Gauthier
053751cb43 show loudness bar 2023-08-10 22:17:31 -03:00
Paul Gauthier
4dc704272e aider: Updated callback function to calculate the root mean square (RMS) of the input audio data.
# Aider chat conversation:

USER: what is `indata` ?

ASSISTANT: `indata` is a parameter in the callback function defined within the `record_and_transcribe` function in the `aider/voice.py` file. This parameter represents the input audio data that is being recorded.

In the context of the `sounddevice` library, which is used for audio recording in this script, `indata` is a 2D numpy array containing the recorded audio data. The rows in the array represent individual frames, while the columns represent channels.

In this specific script, the audio is being recorded in mono (single channel), so there will be only one column. The data in `indata` is being put into a queue and also being dumped for debugging purposes.

USER: how can i estimate the avg volume based on indata?

ASSISTANT: To estimate the average volume of the audio data, you can calculate the root mean square (RMS) of the `indata` array. The RMS represents the square root of the average of the squares of the values, and it's a common way to measure the magnitude of a varying quantity, such as audio signals.

Here's how you can modify the `record_and_transcribe` function to calculate the RMS:

```python
aider/voice.py
<<<<<<< HEAD
    def callback(indata, frames, time, status):
        """This is called (from a separate thread) for each audio block."""
        q.put(indata.copy())
        dump(indata)
        dump(indata.mean())
=======
    import numpy as np

    def callback(indata, frames, time, status):
        """This is called (from a separate thread) for each audio block."""
        q.put(indata.copy())
        rms = np.sqrt(np.mean(indata**2))
        dump(rms)
>>>>>>> updated
```

In this change, we import the numpy library and replace the line `dump(indata.mean())` with `rms = np.sqrt(np.mean(indata**2))` and `dump(rms)`. This calculates the RMS of the `indata` and dumps it for debugging purposes.
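
As a worked illustration of the RMS formula, the sketch below computes the same quantity in plain Python on a small hand-made block. The `rms` helper and the sample values are hypothetical; in voice.py the block would be the NumPy array `indata`, where `np.sqrt(np.mean(indata**2))` gives the equivalent result.

```python
import math


def rms(block):
    """Root mean square of a block shaped like indata: frames x channels."""
    samples = [sample for frame in block for sample in frame]
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# A mono block of four frames; the plain mean is 0, but the RMS is not.
block = [[0.5], [-0.5], [0.5], [-0.5]]
print(rms(block))  # 0.5
```

The example block also shows why the plain mean is a poor volume estimate: positive and negative samples cancel, while squaring them first does not.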
2023-08-10 21:57:42 -03:00
Paul Gauthier
aba2bb610d proper temp file 2023-08-10 21:54:04 -03:00
Paul Gauthier
4812d17372 prettier 2023-08-10 17:12:19 -03:00
Paul Gauthier
7b993a9bbb gracefully handle missing portaudio 2023-08-10 16:55:45 -03:00
Paul Gauthier
fed24aa444 lint 2023-08-10 16:53:34 -03:00
Paul Gauthier
d74a3ce123 Refactor record_and_transcribe function to remove unnecessary api_key parameter and import dump module. 2023-08-10 16:13:23 -03:00
Paul Gauthier
9b526d51e4 Refactored voice recording to use a queue and write audio to a file. 2023-08-10 16:10:02 -03:00
Paul Gauthier
1f42b0839f aider: Fixed issue with passing the callback argument to the sd.rec function in record_and_transcribe function.
# Aider chat conversation:

USER: I ran this command:

 python -m aider.voice

And got this output:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gauthier/Projects/aider/aider/voice.py", line 42, in <module>
    print(record_and_transcribe(api_key))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/aider/voice.py", line 22, in record_and_transcribe
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, callback=on_key_press)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/sounddevice.py", line 276, in rec
    ctx.start_stream(InputStream, samplerate, ctx.input_channels,
TypeError: _CallbackContext.start_stream() got multiple values for argument 'callback'
Recording started, press any key to stop...

ASSISTANT: It seems like there's an issue with the way the `callback` argument is being passed to the `sd.rec` function in the `record_and_transcribe` function in the `aider/voice.py` file.

The `sd.rec` function is a convenience function that internally creates an instance of `sd.InputStream` and starts recording. The `callback` argument should be passed to the `sd.InputStream` constructor, not the `sd.rec` function.

Here's how we can fix this:

1. Modify the `aider/voice.py` file.

```python
aider/voice.py
<<<<<<< HEAD
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, callback=on_key_press)
=======
    # Create an instance of InputStream with the callback
    stream = sd.InputStream(samplerate=sample_rate, channels=1, callback=on_key_press)
    stream.start()
    recording = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1)
>>>>>>> updated
```

This change creates an instance of `sd.InputStream` with the `on_key_press` callback and starts the stream before calling `sd.rec`. This should ensure that the callback is correctly passed to the `InputStream` and not the `rec` function.
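
For reference, the callback that `sd.InputStream` expects has the signature `callback(indata, frames, time, status)`. The sketch below shows a callback that only queues each block. `make_callback` is a hypothetical helper, and the callback is driven by hand with fake blocks so the example runs without an audio device. With sounddevice installed, the same function could be passed as `sd.InputStream(samplerate=sample_rate, channels=1, callback=callback)`.

```python
import queue


def make_callback(q):
    """Build a stream callback that stores a copy of every audio block."""

    def callback(indata, frames, time, status):
        # The stream may reuse its buffer, so queue a copy, not the original.
        q.put(indata.copy())

    return callback


q = queue.Queue()
callback = make_callback(q)

# Simulate the stream delivering two mono blocks of two frames each.
for fake_block in ([[0.1], [0.2]], [[0.3], [0.4]]):
    callback(fake_block, len(fake_block), None, None)

blocks = []
while not q.empty():
    blocks.append(q.get())
print(len(blocks))  # 2
```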
2023-08-10 15:49:26 -03:00