
Conversation

@timothycarambat
Member

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

resolves #679

What is in this change?

Adds support for Hugging Face's dedicated inference endpoints for LLM selection.

Additional Information

We may need to write some documentation on how to set up a dedicated endpoint so that chats can be sent, as it may not be obvious to a layperson. Anyone already using HuggingFace should understand the requirements, though, and the defaults for a new endpoint do work.

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@timothycarambat timothycarambat merged commit 2bc11d3 into master Feb 6, 2024
@timothycarambat timothycarambat deleted the hugging-face-llm-support branch February 6, 2024 17:17
@review-agent-prime

frontend/src/pages/GeneralSettings/LLMPreference/index.jsx

There is a lot of repetition in the code blocks for each LLM provider. This can be improved by creating a function that generates these blocks based on the provided parameters.

    function createLLMProvider(name, value, logo, options, description) {
      // Shorthand property names keep the factory concise
      return { name, value, logo, options, description };
    }

    const LLMS = [
      createLLMProvider("OpenAI", "openai", OpenAiLogo, <OpenAiOptions settings={settings} />, "The standard option for most non-commercial use."),
      createLLMProvider("Azure OpenAI", "azure", AzureOpenAiLogo, <AzureAiOptions settings={settings} />, "The enterprise option of OpenAI hosted on Azure services."),
      // ... other providers
    ];
To check out the fix:

    git fetch origin && git checkout -b ReviewBot/Impro-4caz8o7 origin/ReviewBot/Impro-4caz8o7

server/utils/chats/stream.js

Consider refactoring the function handleStreamResponses to reduce code duplication. The code blocks for handling different stream types (geminiStream, azureStream, togetherAiStream, huggingFaceStream) have a lot of similarities. You can create a separate function to handle the common tasks and call it in each case. This will make the code more readable and easier to maintain.

    function processStreamData(stream, response, uuid, sources) {
      // common tasks for processing stream data
    }

    function handleStreamResponses(response, stream, responseProps) {
      const { uuid = uuidv4(), sources = [] } = responseProps;

      if (stream?.type === "geminiStream") {
        processStreamData(stream, response, uuid, sources);
        // specific tasks for geminiStream
      }

      // similar for other stream types
    }
To check out the fix:

    git fetch origin && git checkout -b ReviewBot/The-n-dw2c4ig origin/ReviewBot/The-n-dw2c4ig
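For illustration, one possible shape for the shared helper suggested above, inferred from the writeResponseChunk usage in the next snippet (the helper name is hypothetical):

    // Writes one text chunk to the SSE response in the shape the client expects
    function writeTextChunk(response, uuid, sources, textResponse, close = false) {
      writeResponseChunk(response, {
        uuid,
        sources,
        type: "textResponseChunk",
        textResponse,
        close,
        error: false,
      });
    }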

Consider improving the error handling in the case of "huggingFaceStream". Currently, if there is an error in the JSON message, the function simply logs the error and continues. It would be better to also write the error to the response chunk, so that the client can handle it appropriately.

    if (stream.type === "huggingFaceStream") {
      // ...
      try {
        const json = JSON.parse(message);
        error = json?.error || null;
        // ...
      } catch (e) {
        console.error(`Failed to parse message`, e);
        writeResponseChunk(response, {
          uuid,
          sources: [],
          type: "textResponseChunk",
          textResponse: null,
          close: true,
          error: `Failed to parse message: ${e.message}`,
        });
        resolve("");
        return;
      }
      // ...
    }
To check out the fix:

    git fetch origin && git checkout -b ReviewBot/The-n-2opc8pa origin/ReviewBot/The-n-2opc8pa

@ognjenAct

ognjenAct commented Feb 20, 2024

hi guys, first I would like to thank you for all your efforts 🏆

And regarding this PR: for HF support, I always get a 422 error when I try to send a request from anything-llm to HF TGI.

Any hint?

@timothycarambat
Member Author

What model are you using? Not all of them support chat; some are missing the template needed to support chat via the HF TGI endpoint.

@ognjenAct
Copy link

ognjenAct commented Feb 20, 2024

Hi there, thanks for the reply.

Sorry I wasn't precise; I'm trying with this one:
https://huggingface.co/mistralai/Mistral-7B-v0.1

But you made a good point, maybe I can try with another one

I will update you

PS update: which model do you recommend using from HF?

@ognjenAct
Copy link

ognjenAct commented Feb 20, 2024

Update:
I've tried with the Llama 2 7B model, deployed on HF's TGI. Same 422 error :/

From cURL all good, from the Python client all good.

I'm here if I can help you, share more info etc

@timothycarambat
Member Author

Can you paste in the cURL request you are using? Please remove the key!
We are using the openai library and message format, and passing tgi as the model to the endpoint, which is what their docs outline.
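For context, the request we send looks roughly like this (a sketch assuming the openai v4 client; the endpoint URL and env var name are placeholders):

    import OpenAI from "openai";

    // Placeholder endpoint URL; TGI exposes an OpenAI-compatible route under /v1
    const client = new OpenAI({
      baseURL: "https://<your-endpoint>.endpoints.huggingface.cloud/v1",
      apiKey: process.env.HF_API_KEY, // your hf_... token
    });

    const completion = await client.chat.completions.create({
      model: "tgi", // TGI ignores the model name; the HF docs use "tgi"
      messages: [{ role: "user", content: "Hello, who are you?" }],
    });
    console.log(completion.choices[0].message.content);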

Testing an endpoint right now with OpenHermes on an A10G
[Screenshots: dedicated endpoint configuration and test request, 2024-02-20]

Getting a 422 as well. Going to open an issue, as this appears to be new.

@ognjenAct

Here you go:

curl "https://jbtstsr7thraha6.eu-west-1.aws.endpoints.huggingface.cloud" \ -X POST \ -H "Accept: application/json" \ -H "Authorization: hf_***" \ -H "Content-Type: application/json" \ -d '{ "inputs": "Can you please let us know more details about your ", "parameters": {} }'

PS: I think the openai lib is messing with the format of the request (the body of the request, to be accurate), but I'm not sure...

@timothycarambat
Member Author

I think this has to do with the model missing a chat_template file. It seems that for inferencing using TGI, this must exist for the model. Will look into this, since we cannot just run arbitrary chat completion if the inference endpoint does not automatically templatize the input using the inference API.
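One way to check for that up front, a rough sketch assuming the model's tokenizer_config.json is publicly readable at the usual raw file path:

    // Returns true if the repo's tokenizer config defines a chat template.
    // Assumes Node 18+ (global fetch); the raw path is the standard HF file route.
    async function hasChatTemplate(repo) {
      const res = await fetch(
        `https://huggingface.co/${repo}/raw/main/tokenizer_config.json`
      );
      if (!res.ok) return false;
      const config = await res.json();
      return typeof config.chat_template === "string";
    }

    // e.g. hasChatTemplate("teknium/OpenHermes-2.5-Mistral-7B") should be true,
    // while the base mistralai/Mistral-7B-v0.1 defines no chat_template.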

@timothycarambat
Member Author

Hm, interesting. This model works out of the box: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B

So this must be some kind of compatibility thing for models to support chat/message formatting?

@ognjenAct

ognjenAct commented Feb 21, 2024

Hmm, interesting. I'm gonna try it out with that model and keep you updated… BTW, maybe it's worth looking at this and trying to implement it for the HF part of the app:
https://huggingface.co/docs/huggingface.js/en/inference/README
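For reference, a minimal sketch of what that would look like with @huggingface/inference (the endpoint URL is a placeholder):

    import { HfInference } from "@huggingface/inference";

    const hf = new HfInference(process.env.HF_API_KEY);
    // Point the client at a dedicated inference endpoint instead of the free API
    const endpoint = hf.endpoint(
      "https://<your-endpoint>.endpoints.huggingface.cloud"
    );

    const out = await endpoint.textGeneration({
      inputs: "Can you please let us know more details about your ",
      parameters: { max_new_tokens: 64 },
    });
    console.log(out.generated_text);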

@ognjenAct

about this point:
"So this must be some kind of compatibility thing for models to support chat/message formatting?"

I've noticed, for example, that for the Mistral 7B model this is part of the request body:

    query({ "inputs": "Can you please let us know more details about your ", "parameters": {} })
      .then((response) => { console.log(JSON.stringify(response)); });

Notice that it is using inputs as the param.

And for OpenHermes it is different: it uses messages with content as the param in the request body:

    messages = [
      { "role": "system", "content": "You are Hermes 2." },
      { "role": "user", "content": "Hello, who are you?" }
    ]
So maybe this can help you ...
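For what it's worth, a chat_template is what bridges those two formats: it renders messages down to a single inputs string. A rough sketch of what the ChatML template OpenHermes uses does (format taken from the model card; the helper name is hypothetical):

    // Renders chat messages into the ChatML prompt string OpenHermes expects
    function applyChatML(messages) {
      return (
        messages
          .map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`)
          .join("\n") + "\n<|im_start|>assistant\n"
      );
    }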

Cheers

cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025


Development

Successfully merging this pull request may close these issues.

[FEAT]: HuggingFace LLM Inference Endpoint support
