Handle special token in TikToken #528

timothycarambat · 2024-01-04T23:27:59Z

resolves #525

Pass an empty array as disallowedSpecials so that all tokens, including special tokens, are treated as text and tokenized, which is the intended behavior.
https://github.com/openai/tiktoken/blob/9e79899bc248d5313c7dd73562b5e211d728723d/tiktoken/core.py#L91C20-L91C38
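As a rough illustration of the change, the sketch below uses a hypothetical `StubEncoder` in place of the real tokenizer so it runs without dependencies. The stub only mimics the behavior described in the linked tiktoken source: by default, special tokens in the input raise an error, while an empty disallowed set lets them through as ordinary text. `StubEncoder`, `SPECIAL_TOKENS`, the `TokenManager` constructor, and the whitespace tokenization are assumptions for illustration, not the project's code.

```javascript
// Hypothetical stand-in for a tiktoken-style encoder; NOT the real library.
// It mimics one behavior only: special tokens in the input raise an error
// unless they are explicitly allowed or the disallowed set is empty.
const SPECIAL_TOKENS = ["<|endoftext|>"];

class StubEncoder {
  encode(text, allowedSpecial = [], disallowedSpecial = "all") {
    // "all" means every special token that is not explicitly allowed.
    const disallowed =
      disallowedSpecial === "all"
        ? SPECIAL_TOKENS.filter((t) => !allowedSpecial.includes(t))
        : disallowedSpecial;
    for (const token of disallowed) {
      if (text.includes(token)) {
        throw new Error(`Disallowed special token in input: ${token}`);
      }
    }
    // Toy tokenization: one "token" per whitespace-separated chunk.
    return text.split(/\s+/).filter(Boolean);
  }
}

// Illustrative wrapper; the method names follow the review comments in this PR.
class TokenManager {
  constructor(encoder) {
    this.encoder = encoder;
  }

  tokensFromString(input = "") {
    // undefined keeps the default for the second parameter; the empty array
    // disables the disallowed-special check so all input is tokenized as text.
    return this.encoder.encode(input, undefined, []);
  }

  countFromString(input = "") {
    // Reuse tokensFromString so the encode call lives in one place.
    return this.tokensFromString(input).length;
  }
}

const manager = new TokenManager(new StubEncoder());
const sample = "hello <|endoftext|> world";
console.log(manager.countFromString(sample));
```

With the stub's default arguments, `new StubEncoder().encode(sample)` throws, whereas the wrapper above counts the special token like any other chunk of text.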

review-agent-prime · 2024-01-04T23:28:46Z

server/utils/helpers/tiktoken.js

Instead of encoding the input string twice (once in tokensFromString and once in countFromString), you can call tokensFromString inside countFromString to avoid duplicate work. This will improve the performance of your code.

    countFromString(input = "") {
      const tokens = this.tokensFromString(input);
      return tokens.length;
    }

git fetch origin && git checkout -b ReviewBot/Impro-8nr69hp origin/ReviewBot/Impro-8nr69hp

It would be helpful to add a comment explaining why the second and third parameters of the encode method are undefined and an empty array respectively. This will improve the readability of your code and make it easier for others to understand.

    // The second parameter is left undefined to keep its default; the third (disallowedSpecials)
    // is an empty array so that all special tokens are treated as text and tokenized.
    const tokens = this.encoder.encode(input, undefined, []);

git fetch origin && git checkout -b ReviewBot/Impro-3owmk7b origin/ReviewBot/Impro-3owmk7b

Handle special token in TikToken

ac29747

resolves #525

review-agent-prime bot reviewed Jan 4, 2024

server/utils/helpers/tiktoken.js (resolved)

review-agent-prime bot reviewed Jan 4, 2024

server/utils/helpers/tiktoken.js (outdated, resolved)

remove duplicate method

9264d52

add clarification comment on implementation

timothycarambat merged commit 92da23e into master Jan 4, 2024

timothycarambat deleted the 525-tiktoken-sanitized-output branch January 4, 2024 23:47

cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025

Handle special token in TikToken (Mintplex-Labs#528)

b8c35e6

* Handle special token in TikToken
  resolves Mintplex-Labs#525
* remove duplicate method
  add clarification comment on implementation
