@timothycarambat
Member

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #850
resolves #849

What is in this change?

Adds OpenAI Whisper as an additional configuration option alongside the local model, allowing quicker transcription of files (mp3, mp4, etc.). Note the 25MB file size limit!

The localWhisper model can still be run if needed, but it often crashes instances on underpowered devices with restricted RAM or CPU. OpenAI is a quick way to resolve this and satisfy the user need.
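
As a rough illustration of the trade-off described above, here is a minimal sketch of how a caller might pick between the two providers given the 25MB limit. It is not the code in this PR: the function name, the provider labels `"openai"` and `"local"`, the reading of 25MB as 25 MiB, and the fall-back-to-local policy are all assumptions made for the example.

```python
# Hypothetical sketch: none of these names come from the AnythingLLM codebase.

# Assumption: the "25MB" limit mentioned above is treated as 25 MiB.
OPENAI_MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def choose_transcription_provider(file_size_bytes: int, preferred: str = "openai") -> str:
    """Return "openai" or "local" for a file of the given size."""
    if preferred not in ("openai", "local"):
        raise ValueError(f"unknown provider: {preferred!r}")
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if preferred == "openai" and file_size_bytes > OPENAI_MAX_UPLOAD_BYTES:
        # Assumed policy: fall back to the local model instead of failing.
        return "local"
    return preferred


if __name__ == "__main__":
    for size_mb in (10, 25, 40):
        provider = choose_transcription_provider(size_mb * 1024 * 1024)
        print(f"{size_mb} MB -> {provider}")
```

Falling back to the local model is only one possible policy; rejecting oversized files with a clear error message would be an equally reasonable choice on devices where the local model is known to crash.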

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@timothycarambat
Member Author

This sets up the ability for us to add speech-to-text chatting with workspace 👍

@timothycarambat timothycarambat merged commit 0ada882 into master Mar 14, 2024
@timothycarambat timothycarambat deleted the 850-external-transcription-providers branch March 14, 2024 22:43
@AIbottesting

Anything-LLM is awesome! Thank you and your team for all your hard work and passion. Because I am visually impaired, my only feature request is a text-to-speech button to read AI output. I currently copy and paste every single time into eSpeak which uses Microsoft Windows 10 built-in text-to-speech engine. This is my humble wish. However, if I could dream, being able to press a button for speech-to-text and be able to have a two-way conversation would be out of this world. Furthermore, could you imagine being able to scan a paper receipt and Anything-LLM puts the data into an Excel file for you! Wow

@timothycarambat
Member Author

@AIbottesting thank you for raising this. I think we can easily support TTS using the built-in browser TTS. I am sorry we overlooked that kind of accessibility feature. Please let me know if you have any further accessibility issues and we will try to get those handled

@AIbottesting

Thank you for being kind and no need to be sorry. On a side note, the government (Department of Rehabilitation) spends money on my behalf for a software called Kurzweil 3000 to read my college texts. I think this software is used by many disabled people at their jobs. I also believe software companies go through some kind of accessibility certification which may help your business side to be able to claim you are compliant. Just a thought. Keep on kicking butt as you do.

cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
* Support External Transcription providers

* patch files

* update docs

* fix return data

Development

Successfully merging this pull request may close these issues.

  • [FEAT]: Add external transcription providers
  • [BUG]: MP3 & MP4 upload causes Document Processor to become unavailable
