Tags · masums/llama.cpp

b2636

llama : add Command R Plus support (ggml-org#6491)

* Add Command R Plus GGUF

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized.

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing.

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <seast@Ss-Mac-Studio.local>
Co-authored-by: S <s@example.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Apr 9, 2024
5dc9dd7
zip
tar.gz

b2632

quantize : fix precedence of cli args (ggml-org#6541)

Apr 8, 2024
b73e564
zip
tar.gz

b2630

llama : save and restore kv cache for single seq id (ggml-org#6341)

* llama : save and restore kv cache for single seq id

* remove trailing whitespace

* respond error in case there's no space in the kv cache

* add kv seq save restore to test case

* add --slot-save-path arg to enable save restore and restrict save location

* Returning 0 for some cases, instead of asserting.

* cleanup error cases

* rename sequence state functions

* rename state get set functions

* add previous function names back in with DEPRECATED notice

* update doc

* adjust endpoints to preferred style

* fix restoring zero cell count

* handle seq rm return value

* unused param

* keep in the size check

* fix return types

* add server test case for slot save restore

* cleanup

* add cake

* cleanup style

* add special

* removing a whole sequence never fails

* move sequence state file functionality from server to llama to match session api and add version tags

* catch exceptions on save as well

* error log messages

* check types for stricter restore

* update server doc

* readme : update API changes date

* strict filename validation

* move include, reject bom as well

* also reject empty filename

* reject whitespace and trailing dot

---------

Co-authored-by: Martin Evans <martindevans@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Apr 8, 2024
beea6e1
zip
tar.gz

b2629

remove row=1 cond (ggml-org#6532)

Apr 8, 2024
87fb5b4
zip
tar.gz

b2619

sync : ggml

Apr 6, 2024
54ea069
zip
tar.gz

b2615

gguf.py : add licence and version to gguf writer (ggml-org#6504)

Apr 5, 2024
a8bd14d
zip
tar.gz

b2613

bench : make n_batch and n_ubatch configurable in Batched bench (ggml-org#6500)

* bench: make n_batch and n_ubatch configurable

* bench: update doc for batched bench

Apr 5, 2024
87e21bb
zip
tar.gz

b2612

[SYCL] Fixed minor bug when enabling FP16 for non intel targets (ggml-org#6464)

* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)

* Update ggml-sycl.cpp

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

---------

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

Apr 5, 2024
1b496a7
zip
tar.gz

b2608

ci: exempt master branch workflows from getting cancelled (ggml-org#6486)

* ci: exempt master branch workflows from getting cancelled

* apply to bench.yml

Apr 4, 2024
7dda1b7
zip
tar.gz

b2589

Add OpenChat, Alpaca, Vicuna chat templates (ggml-org#6397)

* Add openchat chat template

* Add chat template test for openchat

* Add chat template for vicuna

* Add chat template for orca-vicuna

* Add EOS for vicuna templates

* Combine vicuna chat templates

* Add tests for openchat and vicuna chat templates

* Add chat template for alpaca

* Add separate template name for vicuna-orca

* Remove alpaca, match deepseek with jinja output

* Regenerate chat template test with add_generation_prompt

* Separate deepseek bos from system message

* Match openchat template with jinja output

* Remove BOS token from templates, unprefix openchat

Apr 3, 2024
1ff4d9f
zip
tar.gz
