
Conversation

@timothycarambat
Member

resolves #329

@review-agent-prime

collector/processSingleFile/convert/asAudio.js

It would be beneficial to add more detailed error handling and logging in the asAudio function. This would make it easier to debug issues and understand the flow of the function.

    // Declared outside the try block so the transcript stays usable afterwards.
    let content;
    try {
      // Load the Whisper client and convert the source file to WAV audio data.
      const transcriber = await whisper.client();
      const audioData = await convertToWavAudioData(fullFilePath);
      ({ text: content } = await transcriber(audioData, {
        chunk_length_s: 30,
        stride_length_s: 5,
      }));
    } catch (error) {
      // Log once, discard the source file, and report the failure to the caller.
      const reason = `Error while transcribing ${filename}: ${error.message}`;
      console.error(reason);
      trashFile(fullFilePath);
      return { success: false, reason };
    }

Checkout the fix:

    git fetch origin && git checkout -b ReviewBot/Impro-en8wive origin/ReviewBot/Impro-en8wive

Instead of reading the entire file into memory with fs.readFileSync, consider using a stream to read the file. This would reduce the memory footprint of the function and improve performance for large files.

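    // Read the file through a stream instead of fs.readFileSync.
    const readStream = fs.createReadStream(sourcePath);
    const chunks = [];
    for await (const chunk of readStream) {
      chunks.push(chunk);
    }
    // `buffer` is assumed to be declared earlier in the surrounding function.
    buffer = Buffer.concat(chunks);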

Checkout the fix:

    git fetch origin && git checkout -b ReviewBot/Impro-vxclfpz origin/ReviewBot/Impro-vxclfpz
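
One caveat on the streaming suggestion: collecting every chunk and then calling Buffer.concat still holds the whole file in memory, so the footprint only shrinks if each chunk is consumed as it arrives. The following is a minimal sketch of that pattern, not code from this PR. `consumeFileInChunks` and its `onChunk` callback are hypothetical names, and whether the audio conversion step can work chunk by chunk is an assumption that would need checking.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of the PR): hands each chunk to `onChunk`
// as it arrives and returns the total number of bytes read. No chunk is
// retained after the callback finishes, so memory use stays bounded.
async function consumeFileInChunks(sourcePath, onChunk) {
  const readStream = fs.createReadStream(sourcePath);
  let totalBytes = 0;
  for await (const chunk of readStream) {
    totalBytes += chunk.length;
    await onChunk(chunk);
  }
  return totalBytes;
}
```

If the consumer needs the complete buffer at once, streaming into an array and concatenating offers little over fs.promises.readFile apart from not blocking the event loop.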

collector/utils/WhisperProviders/localWhisper.js

Consider adding comments to the LocalWhisper class to explain what it does and how it works. This would make the code easier to understand for other developers.

    // The LocalWhisper class is responsible for ...
    class LocalWhisper {
      constructor() {
        // ...
      }

      // The client method is used to ...
      async client() {
        // ...
      }
    }

Checkout the fix:

    git fetch origin && git checkout -b ReviewBot/Impro-kok8yta origin/ReviewBot/Impro-kok8yta

Add cleanup of hotdir and tmp on collector boot to prevent hanging files
split loading of model and file conversion into concurrency
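
The two commit notes above describe a boot-time cleanup and a concurrent load. A rough sketch of both ideas follows, under stated assumptions: `purgeFolder` and `prepareConcurrently` are hypothetical names, the `keep` list stands in for whatever placeholder files the collector wants to preserve, and none of this is taken from the merged code.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: delete everything inside `dir` except the entries
// named in `keep`, and return the names that were removed.
function purgeFolder(dir, keep = []) {
  if (!fs.existsSync(dir)) return [];
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    if (keep.includes(name)) continue;
    fs.rmSync(path.join(dir, name), { recursive: true, force: true });
    removed.push(name);
  }
  return removed;
}

// Hypothetical helper: start model loading and file conversion together,
// then wait for both, instead of awaiting one before starting the other.
async function prepareConcurrently(loadModel, convertFile) {
  const [model, audioData] = await Promise.all([loadModel(), convertFile()]);
  return { model, audioData };
}
```

With Promise.all, a rejection from either task rejects the combined promise, so a single try/catch around the call covers both steps.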
@timothycarambat timothycarambat merged commit 61db981 into master Dec 15, 2023
@timothycarambat timothycarambat deleted the 329-embedded-whisper-audio branch December 15, 2023 19:20
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
…tplex-Labs#449)

* feat: Embed on-instance Whisper model for audio/mp4 transcribing
resolves Mintplex-Labs#329

* additional logging

* add placeholder for tmp folder in collector storage
Add cleanup of hotdir and tmp on collector boot to prevent hanging files
split loading of model and file conversion into concurrency

* update README

* update model size

* update supported filetypes

Development

Successfully merging this pull request may close these issues.

Support for audio file upload