@timothycarambat
Member

resolves #525

Pass an empty array for disallowedSpecial so that special tokens in the input are treated as plain text and tokenized like any other string, which is the intended behavior.
https://github.com/openai/tiktoken/blob/9e79899bc248d5313c7dd73562b5e211d728723d/tiktoken/core.py#L91C20-L91C38
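
A minimal sketch of the behavior described above, assuming an encoder whose encode method takes (text, allowedSpecial, disallowedSpecial) and rejects disallowed special tokens by default, as the linked tiktoken source does. ToyEncoder, SPECIAL_TOKENS, and the per-character "tokenization" are hypothetical stand-ins for illustration only, not the real library:

```javascript
// Hypothetical stand-in for a tiktoken-style encoder. It only mimics the
// disallowed-special check; it does NOT perform real BPE tokenization.
const SPECIAL_TOKENS = ["<|endoftext|>"];

class ToyEncoder {
  // Same parameter order as the call in this PR: (text, allowedSpecial, disallowedSpecial).
  encode(text, allowedSpecial = [], disallowedSpecial = "all") {
    const allowed = allowedSpecial === "all" ? SPECIAL_TOKENS : allowedSpecial;
    // With "all", every special token that is not explicitly allowed is disallowed.
    const disallowed =
      disallowedSpecial === "all"
        ? SPECIAL_TOKENS.filter((token) => !allowed.includes(token))
        : disallowedSpecial;

    for (const token of disallowed) {
      if (text.includes(token)) {
        throw new Error(`Disallowed special token found: ${token}`);
      }
    }

    // Placeholder tokenization: one numeric "token" per code point.
    return Array.from(text, (ch) => ch.codePointAt(0));
  }
}

// Mirrors the fix: an empty disallowedSpecial list means nothing is rejected,
// so special-token text from a model response is encoded as ordinary text.
function tokensFromString(encoder, input = "") {
  return encoder.encode(input, undefined, []);
}

const encoder = new ToyEncoder();
const text = "model said <|endoftext|> mid-sentence";

let defaultThrows = false;
try {
  encoder.encode(text); // default arguments: special tokens are disallowed
} catch {
  defaultThrows = true;
}

const tokens = tokensFromString(encoder, text); // does not throw
```

In this sketch the default call throws on the unsanitized string, while the call with an empty disallowedSpecial array returns a token list for the whole input.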

@review-agent-prime

server/utils/helpers/tiktoken.js

countFromString repeats the encoding call that tokensFromString already makes. Calling tokensFromString from inside countFromString removes the duplication and keeps the encode arguments in one place.

    countFromString(input = "") {
      const tokens = this.tokensFromString(input);
      return tokens.length;
    }
git fetch origin && git checkout -b ReviewBot/Impro-8nr69hp origin/ReviewBot/Impro-8nr69hp

It would be helpful to add a comment explaining why the second and third parameters of the encode method are undefined and an empty array respectively. This will improve the readability of your code and make it easier for others to understand.

    // The second parameter is left undefined (use the default) and the third is an empty array
    // so that no special tokens are disallowed and they are tokenized as plain text.
    const tokens = this.encoder.encode(input, undefined, []);
git fetch origin && git checkout -b ReviewBot/Impro-3owmk7b origin/ReviewBot/Impro-3owmk7b

add clarification comment on implementation
@timothycarambat timothycarambat merged commit 92da23e into master Jan 4, 2024
@timothycarambat timothycarambat deleted the 525-tiktoken-sanitized-output branch January 4, 2024 23:47
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
* Handle special token in TikToken
resolves Mintplex-Labs#525

* remove duplicate method
add clarification comment on implementation

Development

Successfully merging this pull request may close these issues.

[BUG] Handle special LLM tokens from unsanitized model responses

2 participants