Implement support for HuggingFace Inference Endpoints #680
Conversation
frontend/src/pages/GeneralSettings/LLMPreference/index.jsx

There is a lot of repetition in the code blocks for each LLM provider. This can be improved by creating a function that generates these blocks from the provided parameters:

function createLLMProvider(name, value, logo, options, description) {
  return {
    name: name,
    value: value,
    logo: logo,
    options: options,
    description: description,
  };
}

const LLMS = [
  createLLMProvider("OpenAI", "openai", OpenAiLogo, <OpenAiOptions settings={settings} />, "The standard option for most non-commercial use."),
  createLLMProvider("Azure OpenAI", "azure", AzureOpenAiLogo, <AzureAiOptions settings={settings} />, "The enterprise option of OpenAI hosted on Azure services."),
  // ... other providers
];

server/utils/chats/stream.js

Consider refactoring the function handleStreamResponses to reduce code duplication. The code blocks for handling the different stream types (geminiStream, azureStream, togetherAiStream, huggingFaceStream) have a lot of similarities. You can create a separate function that handles the common tasks and call it in each case. This will make the code more readable and easier to maintain.

function processStreamData(stream, response, uuid, sources) {
  // common tasks for processing stream data
}

function handleStreamResponses(response, stream, responseProps) {
  const { uuid = uuidv4(), sources = [] } = responseProps;
  if (stream?.type === "geminiStream") {
    processStreamData(stream, response, uuid, sources);
    // specific tasks for geminiStream
  }
  // similar for other stream types
}

Consider improving the error handling in the "huggingFaceStream" case. Currently, if there is an error parsing the JSON message, the function simply logs the error and continues. It would be better to also write the error to the response chunk so that the client can handle it appropriately:

if (stream.type === "huggingFaceStream") {
  // ...
  try {
    const json = JSON.parse(message);
    error = json?.error || null;
    // ...
  } catch (e) {
    console.error(`Failed to parse message`, e);
    writeResponseChunk(response, {
      uuid,
      sources: [],
      type: "textResponseChunk",
      textResponse: null,
      close: true,
      error: `Failed to parse message: ${e.message}`,
    });
    resolve("");
    return;
  }
  // ...
}
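For reference, one rough shape the shared helper could take is sketched below. This is only an illustration, not the project's actual implementation: the chunk fields (uuid, sources, type, textResponse, close, error) mirror the existing writeResponseChunk calls above, while the token/close interface is an assumption and deliberately drops the stream argument from the sketch, since each provider iterates its own stream and only the chunk emission is shared.

// Hypothetical sketch of the shared helper suggested above.
function processStreamData(response, uuid, sources) {
  let fullText = "";
  return {
    // Forward one partial token to the client and keep a running transcript.
    token(text) {
      fullText += text;
      writeResponseChunk(response, {
        uuid,
        sources,
        type: "textResponseChunk",
        textResponse: text,
        close: false,
        error: false,
      });
    },
    // Signal the end of the stream and hand back the accumulated reply.
    close() {
      writeResponseChunk(response, {
        uuid,
        sources,
        type: "textResponseChunk",
        textResponse: "",
        close: true,
        error: false,
      });
      return fullText;
    },
  };
}

// Illustrative usage inside handleStreamResponses:
//   const emitter = processStreamData(response, uuid, sources);
//   ...call emitter.token(chunkText) for each provider-specific chunk...
//   ...then resolve(emitter.close()) when the provider stream ends.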
|
hi guys, first I would like to thank you for all your efforts 🏆 Regarding this PR for HF support, I always get a 422 error when I try to send a request from anything-llm to HF TGI. Any hint? |
|
What model are you using? Not all of them support chat, and some may be missing the template needed to support chat via the HF TGI endpoint. |
|
Hi there, thanks for the reply. Sorry I wasn't precise, I'm trying with this one: But you made a good point, maybe I can try with another one. I will update you. PS update |
|
Update: from cURL all good, Python client all good. I'm here if I can help you, share more info, etc. |
|
Can you paste in the cURL request you are using? Please remove the key! Testing an endpoint right now with OpenHermes on an A10G. Getting 422 as well - going to open an issue as this appears to be new. |
|
here you go
ps |
|
I think this has to do with the model missing a chat_template file. It seems that for inferencing with TGI this must exist for the model. Will look into this, since we cannot just run arbitrary chat completions if the inference endpoint does not automatically templatize the input via the inference API. |
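For illustration: if a model's endpoint does not apply a chat_template itself, the input would have to be templatized on our side before it is sent. Below is a minimal sketch assuming ChatML-style tags like the OpenHermes models use; the function name and exact tags are assumptions, not what TGI does automatically, and other models expect different templates (e.g. [INST] ... [/INST] for Mistral instruct).

// Illustrative only: flatten chat messages into a ChatML-style prompt for an
// endpoint that does not apply a chat_template on its own.
function applyChatMLTemplate(messages) {
  const turns = messages
    .map(({ role, content }) => `<|im_start|>${role}\n${content}<|im_end|>`)
    .join("\n");
  // Leave an open assistant turn so the model continues from there.
  return `${turns}\n<|im_start|>assistant\n`;
}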
|
Hm, interesting. This model works out of the box: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B So this must be some kind of compatibility thing for models to support chat/message formatting? |
|
hmm interesting, I’m gonna try it out with that model, keep you updated…. btw maybe it's worth looking at this and trying to implement it for the HF part of the app: |
|
about this point: I've noticed, for example, for the Mistral 7B model, this is part of the request body: And for Op
Cheers |
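Side note for context, since the exact bodies above are not shown: a plain TGI text-generation endpoint generally expects an inputs string plus a parameters object. The snippet below is a sketch with a placeholder URL, token, function name, and parameter values, not the payloads referenced in this thread.

// Illustrative TGI-style request; the URL, token, and parameter values are
// placeholders. The { inputs, parameters } shape is the standard TGI
// text-generation schema.
async function sendTGIRequest(prompt) {
  const res = await fetch("https://<your-endpoint>.endpoints.huggingface.cloud/generate", {
    method: "POST",
    headers: {
      Authorization: "Bearer hf_xxx", // placeholder access token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      inputs: prompt,
      parameters: { max_new_tokens: 256, temperature: 0.7, return_full_text: false },
    }),
  });
  return res.json(); // generated text comes back in the response JSON
}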
Pull Request Type
Relevant Issues
resolves #679
What is in this change?
Adds support for Hugging Face's dedicated inference endpoints for LLM Selection
Additional Information
We may need to build some documentation on how to set up a dedicated endpoint so that chats can be sent, as it may not be obvious to a layperson. Anyone using HuggingFace should understand the requirements, though, as the defaults for a new endpoint do work.
Developer Validations
Ran yarn lint from the root of the repo & committed changes