
Tags: masums/llama.cpp


b2636

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama : add Command R Plus support (ggml-org#6491)

* Add Command R Plus GGUF

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized.

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing.

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <seast@Ss-Mac-Studio.local>
Co-authored-by: S <s@example.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
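
Several bullets above (the CUDA block-index fix and the "fix overflow issues during quant" / "fix more int overflow" items) are about the same failure mode: tensors in very large models have more elements than a signed 32-bit index can address. The sketch below is an illustration of that arithmetic, not llama.cpp code; the tensor shape is a hypothetical value in the general range of large-model weight matrices.

```python
# Illustration (not llama.cpp code) of why large tensors need 64-bit indexing
# during quantization/dequantization. The shape below is an assumption chosen
# only to exceed the 32-bit range.
import ctypes

rows, cols = 256_000, 12_288      # hypothetical weight-matrix shape
n_elements = rows * cols          # total elements to (de)quantize

INT32_MAX = 2**31 - 1
print(n_elements > INT32_MAX)     # True: a signed 32-bit index cannot hold this

# What a C `int` index would actually hold after wrapping:
wrapped = ctypes.c_int32(n_elements).value
print(wrapped)                    # negative, i.e. the overflow these commits fix
```

The fix pattern in such cases is to compute flat indices in a 64-bit type (`int64_t`/`size_t`) before any multiplication, so the product never passes through a 32-bit intermediate.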

b2632

quantize : fix precedence of cli args (ggml-org#6541)

b2630

llama : save and restore kv cache for single seq id (ggml-org#6341)

* llama : save and restore kv cache for single seq id

* remove trailing whitespace

* respond error in case there's no space in the kv cache

* add kv seq save restore to test case

* add --slot-save-path arg to enable save restore and restrict save location

* Returning 0 for some cases, instead of asserting.

* cleanup error cases

* rename sequence state functions

* rename state get set functions

* add previous function names back in with DEPRECATED notice

* update doc

* adjust endpoints to preferred style

* fix restoring zero cell count

* handle seq rm return value

* unused param

* keep in the size check

* fix return types

* add server test case for slot save restore

* cleanup

* add cake

* cleanup style

* add special

* removing a whole sequence never fails

* move sequence state file functionality from server to llama to match session api and add version tags

* catch exceptions on save as well

* error log messages

* check types for stricter restore

* update server doc

* readme : update API changes date

* strict filename validation

* move include, reject bom as well

* also reject empty filename

* reject whitespace and trailing dot

---------

Co-authored-by: Martin Evans <martindevans@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
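
The commit above adds slot save/restore endpoints to the server, gated by `--slot-save-path` and with strict filename validation. The sketch below shows how a client might construct such a request; the exact endpoint path and JSON body are assumptions inferred from the commit bullets, so check the server documentation for the authoritative API.

```python
# Hedged sketch of driving the server's slot save/restore feature.
# The `/slots/{id}?action=...` path and {"filename": ...} body are
# assumptions based on this commit's description, not verified API docs.
import json

def slot_request(base_url: str, slot_id: int, action: str, filename: str):
    """Build the URL and JSON body for a hypothetical slot save/restore call."""
    assert action in ("save", "restore")
    url = f"{base_url}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename})
    return url, body

url, body = slot_request("http://localhost:8080", 0, "save", "slot0.bin")
# POST `body` to `url` with any HTTP client; per the commit, the server only
# reads/writes files under the directory given by --slot-save-path, and
# rejects empty names, whitespace, trailing dots, and BOMs.
print(url)
```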

b2629

remove row=1 cond (ggml-org#6532)

b2619

sync : ggml

b2615

gguf.py : add licence and version to gguf writer (ggml-org#6504)

b2613

bench : make n_batch and n_ubatch configurable in Batched bench (ggml-org#6500)

* bench: make n_batch and n_ubatch configurable

* bench: update doc for batched bench

b2612

[SYCL] Fixed minor bug when enabling FP16 for non-Intel targets (ggml-org#6464)

* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)

* Update ggml-sycl.cpp

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

---------

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

b2608

ci: exempt master branch workflows from getting cancelled (ggml-org#6486)

* ci: exempt master branch workflows from getting cancelled

* apply to bench.yml

b2589

Add OpenChat, Alpaca, Vicuna chat templates (ggml-org#6397)

* Add openchat chat template

* Add chat template test for openchat

* Add chat template for vicuna

* Add chat template for orca-vicuna

* Add EOS for vicuna templates

* Combine vicuna chat templates

* Add tests for openchat and vicuna chat templates

* Add chat template for alpaca

* Add separate template name for vicuna-orca

* Remove alpaca, match deepseek with jinja output

* Regenerate chat template test with add_generation_prompt

* Separate deepseek bos from system message

* Match openchat template with jinja output

* Remove BOS token from templates, unprefix openchat
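
The Vicuna entries above add templates targeting the common `USER:`/`ASSISTANT:` turn format. The sketch below is a rough approximation of that format, not the llama.cpp implementation; the exact whitespace, EOS placement, and system-message handling in the real templates may differ (the commits above tune those details against Jinja reference output).

```python
# Rough approximation (assumption, not llama.cpp code) of the Vicuna-style
# chat format: optional system line, USER:/ASSISTANT: turns, and a trailing
# assistant prefix as the generation prompt.
def format_vicuna(messages):
    parts = []
    for role, text in messages:
        if role == "system":
            parts.append(text)
        elif role == "user":
            parts.append(f"USER: {text}")
        else:  # assistant
            parts.append(f"ASSISTANT: {text}</s>")
    parts.append("ASSISTANT:")  # prompt the model to generate the next turn
    return "\n".join(parts)

print(format_vicuna([("system", "You are a helpful assistant."),
                     ("user", "Hello!")]))
```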