feat: Embed on-instance Whisper model for audio/mp4 transcribing #449
collector/processSingleFile/convert/asAudio.js

It would be beneficial to add more detailed error handling and logging in the asAudio function. This would make it easier to debug issues and understand the flow of the function.

    try {
      const transcriber = await whisper.client();
      const audioData = await convertToWavAudioData(fullFilePath);
      const { text: content } = await transcriber(audioData, {
        chunk_length_s: 30,
        stride_length_s: 5,
      });
    } catch (error) {
      console.error(`Error while transcribing ${filename}: ${error.message}`);
      trashFile(fullFilePath);
      return { success: false, reason: `Error while transcribing ${filename}: ${error.message}` };
    }

Instead of reading the entire file into memory with fs.readFileSync, consider using a stream to read the file. This would reduce the memory footprint of the function and improve performance for large files.

    const readStream = fs.createReadStream(sourcePath);
    let chunks = [];
    for await (let chunk of readStream) {
      chunks.push(chunk);
    }
    buffer = Buffer.concat(chunks);

collector/utils/WhisperProviders/localWhisper.js

Consider adding comments to the LocalWhisper class to explain what it does and how it works. This would make the code easier to understand for other developers.

    // The LocalWhisper class is responsible for ...
    class LocalWhisper {
      constructor() {
        // ...
      }
      // The client method is used to ...
      async client() {
        // ...
      }
    }
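
On the streaming suggestion for asAudio.js: collecting every chunk and then calling Buffer.concat still holds the whole file in memory, so any saving only appears when each chunk is consumed and then discarded. The sketch below illustrates that pattern under stated assumptions. `hashFileStreaming` is a hypothetical helper, not part of this PR, and hashing merely stands in for whatever per-chunk work the caller needs; a decoder that requires the complete buffer would not benefit.

```javascript
const fs = require("fs");
const crypto = require("crypto");

// Hypothetical helper (not from this PR): reads a file chunk by chunk and
// never keeps more than the current chunk referenced.
async function hashFileStreaming(sourcePath) {
  const hash = crypto.createHash("sha256");
  let bytes = 0;
  for await (const chunk of fs.createReadStream(sourcePath)) {
    hash.update(chunk); // consume the chunk; nothing retains it afterwards
    bytes += chunk.length;
  }
  return { bytes, sha256: hash.digest("hex") };
}
```

Because no chunk is stored after it is processed, peak memory stays close to the stream's chunk size rather than the file size.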
Add cleanup of hotdir and tmp on collector boot to prevent hanging files
split loading of model and file conversion into concurrency
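
A boot-time cleanup like the one this commit describes could be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `wipeDirectory` and the `.placeholder` file name are hypothetical, and the real directory paths and placeholder names live in the collector itself.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not from this PR): deletes every entry in `dir`
// except the names listed in `keep`, and returns how many were removed.
function wipeDirectory(dir, keep = []) {
  if (!fs.existsSync(dir)) return 0; // nothing to clean on a fresh install
  let removed = 0;
  for (const name of fs.readdirSync(dir)) {
    if (keep.includes(name)) continue; // preserve placeholder files
    fs.rmSync(path.join(dir, name), { recursive: true, force: true });
    removed += 1;
  }
  return removed;
}
```

Calling something like `wipeDirectory(hotdirPath, [".placeholder"])` once at startup, before the collector accepts work, would clear files left behind by an interrupted run while keeping the placeholder that holds the folder in version control.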
…tplex-Labs#449)
* feat: Embed on-instance Whisper model for audio/mp4 transcribing
  resolves Mintplex-Labs#329
* additional logging
* add placeholder for tmp folder in collector storage
  Add cleanup of hotdir and tmp on collector boot to prevent hanging files
  split loading of model and file conversion into concurrency
* update README
* update model size
* update supported filetypes
resolves #329