
Add option to control KoboldCPP max response tokens #3746


Conversation

shatfield4
Collaborator

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #3708

What is in this change?

After some trial and error, it was discovered that KoboldCPP uses the max_tokens param to control the maximum number of tokens in the response (this is not documented anywhere in their documentation).

KoboldCPP defaults to a maximum of 512 response tokens if max_tokens is not explicitly specified in the API call to its OpenAI-compatible server.
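Because of that 512-token fallback, the request body should always carry max_tokens explicitly. A minimal sketch of an OpenAI-compatible chat completion payload, assuming a placeholder model name (the endpoint, model, and helper name here are illustrative, not taken from this PR):

```python
import json


def build_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible request body for KoboldCPP.

    KoboldCPP falls back to 512 response tokens when max_tokens is
    omitted, so we always include it explicitly (default mirrors
    KoboldCPP's own fallback).
    """
    return {
        "model": "koboldcpp",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # caps the length of the response
    }


# Example: raise the response cap well above the 512 default.
payload = build_payload("Hello", max_tokens=2048)
print(json.dumps(payload))
```

Sending this body to the server's `/v1/chat/completions` route (e.g. via an HTTP POST) is all the change amounts to: the new setting just controls the value placed in `max_tokens`.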

  • Add an option to the KoboldCPP settings to allow setting the maximum response tokens
  • Update .env.example to allow configuring this option via .env

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo and committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@timothycarambat timothycarambat merged commit 8912d0f into master May 2, 2025
@timothycarambat timothycarambat deleted the 3708-bug-when-using-koboldcpp-the-max-response-length-is-always-512-regardless-of-context-size branch May 2, 2025 21:12
cabwds pushed a commit to cabwds/anything-llm that referenced this pull request Jul 3, 2025
add option to control koboldcpp max response tokens
Development

Successfully merging this pull request may close these issues.

[BUG]: When using KoboldCPP, the max response length is always 512, regardless of context size.