Implement support for HuggingFace Inference Endpoints #680
Conversation
frontend/src/pages/GeneralSettings/LLMPreference/index.jsx

There is a lot of repetition in the code blocks for each LLM provider. This can be improved by creating a function that generates these blocks from the provided parameters:

function createLLMProvider(name, value, logo, options, description) {
  return {
    name: name,
    value: value,
    logo: logo,
    options: options,
    description: description,
  };
}

const LLMS = [
  createLLMProvider("OpenAI", "openai", OpenAiLogo, <OpenAiOptions settings={settings} />, "The standard option for most non-commercial use."),
  createLLMProvider("Azure OpenAI", "azure", AzureOpenAiLogo, <AzureAiOptions settings={settings} />, "The enterprise option of OpenAI hosted on Azure services."),
  // ... other providers
];

server/utils/chats/stream.js

Consider refactoring the function handleStreamResponses to reduce code duplication. The code blocks for handling the different stream types (geminiStream, azureStream, togetherAiStream, huggingFaceStream) have a lot of similarities. You can create a separate function that handles the common tasks and call it in each case. This will make the code more readable and easier to maintain.

function processStreamData(stream, response, uuid, sources) {
  // common tasks for processing stream data
}

function handleStreamResponses(response, stream, responseProps) {
  const { uuid = uuidv4(), sources = [] } = responseProps;
  if (stream?.type === "geminiStream") {
    processStreamData(stream, response, uuid, sources);
    // specific tasks for geminiStream
  }
  // similar for other stream types
}

Consider improving the error handling in the "huggingFaceStream" case. Currently, if there is an error parsing the JSON message, the function simply logs the error and continues. It would be better to also write the error to the response chunk so that the client can handle it appropriately:

if (stream.type === "huggingFaceStream") {
  // ...
  try {
    const json = JSON.parse(message);
    error = json?.error || null;
    // ...
  } catch (e) {
    console.error(`Failed to parse message`, e);
    writeResponseChunk(response, {
      uuid,
      sources: [],
      type: "textResponseChunk",
      textResponse: null,
      close: true,
      error: `Failed to parse message: ${e.message}`,
    });
    resolve("");
    return;
  }
  // ...
}
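For reference, one rough shape the shared helper could take is sketched below. This is only an illustration, not the project's actual implementation: the chunk fields (uuid, sources, type, textResponse, close, error) mirror the existing writeResponseChunk calls above, while the token/close interface is an assumption and deliberately drops the stream argument from the sketch, since each provider iterates its own stream and only the chunk emission is shared.

// Hypothetical sketch of the shared helper suggested above.
function processStreamData(response, uuid, sources) {
  let fullText = "";
  return {
    // Forward one partial token to the client and keep a running transcript.
    token(text) {
      fullText += text;
      writeResponseChunk(response, {
        uuid,
        sources,
        type: "textResponseChunk",
        textResponse: text,
        close: false,
        error: false,
      });
    },
    // Signal the end of the stream and hand back the accumulated reply.
    close() {
      writeResponseChunk(response, {
        uuid,
        sources,
        type: "textResponseChunk",
        textResponse: "",
        close: true,
        error: false,
      });
      return fullText;
    },
  };
}

// Illustrative usage inside handleStreamResponses:
//   const emitter = processStreamData(response, uuid, sources);
//   ...call emitter.token(chunkText) for each provider-specific chunk...
//   ...then resolve(emitter.close()) when the provider stream ends.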
|
hi guys, first I would like to thank you for all your efforts 🏆 Regarding this PR for HF support, I always get a 422 error when I try to send a request from anything-llm to HF TGI. Any hint? |
|
What model are you using? Not all of them support chat, and some may be missing the template needed to support chat via the HF TGI endpoint. |
|
Hi there, thanks for the reply. Sorry I wasn't precise, I'm trying with this one: But you made a good point, maybe I can try with another one. I will update you. PS update |
|
Update: from cURL all good, Python client all good. I'm here if I can help you, share more info, etc. |
|
Can you paste in the cURL request you are using? Please remove the key! Testing an endpoint right now with OpenHermes on an A10G. Getting 422 as well - going to open an issue as this appears to be new. |
|
here you go
ps |
|
I think this has to do with the model missing a chat_template file. It seems that for inferencing with TGI this must exist for the model. Will look into this, since we cannot just run arbitrary chat completions if the inference endpoint does not automatically templatize the input via the inference API. |
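For illustration: if a model's endpoint does not apply a chat_template itself, the input would have to be templatized on our side before it is sent. Below is a minimal sketch assuming ChatML-style tags like the OpenHermes models use; the function name and exact tags are assumptions, not what TGI does automatically, and other models expect different templates (e.g. [INST] ... [/INST] for Mistral instruct).

// Illustrative only: flatten chat messages into a ChatML-style prompt for an
// endpoint that does not apply a chat_template on its own.
function applyChatMLTemplate(messages) {
  const turns = messages
    .map(({ role, content }) => `<|im_start|>${role}\n${content}<|im_end|>`)
    .join("\n");
  // Leave an open assistant turn so the model continues from there.
  return `${turns}\n<|im_start|>assistant\n`;
}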
|
Hm, interesting. This model works out of the box: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B So this must be some kind of compatibility thing for models to support chat/message formatting? |
|
hmm interesting, I’m gonna try it out with that model, keep you updated…. btw maybe it's worth looking at this and trying to implement it for the HF part of the app: |
|
about this point: I've noticed, for example, for the Mistral 7B model, this is part of the request body: And for Op
Cheers |
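Side note for context, since the exact bodies above are not shown: a plain TGI text-generation endpoint generally expects an inputs string plus a parameters object. The snippet below is a sketch with a placeholder URL, token, function name, and parameter values, not the payloads referenced in this thread.

// Illustrative TGI-style request; the URL, token, and parameter values are
// placeholders. The { inputs, parameters } shape is the standard TGI
// text-generation schema.
async function sendTGIRequest(prompt) {
  const res = await fetch("https://<your-endpoint>.endpoints.huggingface.cloud/generate", {
    method: "POST",
    headers: {
      Authorization: "Bearer hf_xxx", // placeholder access token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      inputs: prompt,
      parameters: { max_new_tokens: 256, temperature: 0.7, return_full_text: false },
    }),
  });
  return res.json(); // generated text comes back in the response JSON
}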
Pull Request Type
Relevant Issues
resolves #679
What is in this change?
Adds support for Hugging Face's dedicated inference endpoints for LLM Selection
Additional Information
We may need to build some documentation on how to set up a dedicated endpoint so that chats can be sent, as it may not be obvious to a layperson. Anyone using HuggingFace should understand the requirements, though, as the defaults for a new endpoint do work.
Developer Validations
Ran yarn lint from the root of the repo & committed changes