
Tags: masums/llama.cpp

b2586

[SYCL] Disable iqx on windows as WA (ggml-org#6435)

* Disable IQ-type (iqx) quantizations on Windows as a workaround (WA)

* Use an array instead of global_memory

b2581

ci: bench: fix Resource not accessible by integration on PR event (ggml-org#6393)

b2579

split: allow --split-max-size option (ggml-org#6343)

* split by max size

* clean up arg parse

* split: ok

* add dry run option

* error on 0 tensors

* be positive

* remove next_metadata_size
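
The new flag caps the size of each shard when splitting a GGUF file. A minimal usage sketch (binary path and file names are illustrative; --dry-run is the preview option added in this same tag):

    # Split a model into shards of at most 2G each; --dry-run previews
    # the resulting shards without writing any files.
    ./gguf-split --split --split-max-size 2G --dry-run model-f16.gguf model-out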

b2578

Vulkan k-quant mmq and ggml-backend offload functionality (ggml-org#6155)

* Fix Vulkan no kv offload incoherence

* Add k-quant mul mat mat shaders

* Rework working buffer allocation, reducing VRAM use noticeably

  Clean up CPU-assist code, replacing it with the ggml-backend offload function

* Default to all dedicated GPUs

* Add fallback for integrated GPUs if no dedicated GPUs are found

* Add debug info which device is allocating memory

* Fix Intel dequant issue

  Fix validation issue

* Fix Vulkan GGML_OP_GET_ROWS implementation

* Clean up merge artifacts

* Remove Vulkan warning

b2576

[Model] Add support for xverse (ggml-org#6301)

* Support converting xverse models to gguf format.

* 1. Convert xverse models to gguf;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add xverse item in Supported models in README.md;

* gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter

* llama.cpp: Include the changes from ggml-org#6122 to exclude the unused outputs of the last layers.

* Fix format issues
* Remove duplicate setting of kqv_out in llm_build_kv

* Update llama.cpp

---------

Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>
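
With this support in place, an xverse checkpoint can be converted with the standard HF-to-GGUF script. A sketch (model directory and output names are illustrative):

    # Convert a Hugging Face xverse checkpoint to GGUF (paths assumed)
    python convert-hf-to-gguf.py ./XVERSE-13B --outfile xverse-13b-f16.gguf --outtype f16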

b2573

cmake : add explicit metal version options (ggml-org#6370)

* cmake: add explicit metal version options

* Update CMakeLists.txt

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
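
A configure-time sketch, assuming the two options added by the PR are LLAMA_METAL_MACOSX_VERSION_MIN and LLAMA_METAL_STD (values are illustrative and assume a Metal-capable macOS toolchain):

    # Build with Metal, pinning the minimum macOS version and the Metal
    # language standard explicitly (values are examples)
    cmake -B build -DLLAMA_METAL=ON \
        -DLLAMA_METAL_MACOSX_VERSION_MIN=12.0 \
        -DLLAMA_METAL_STD=metal3.0
    cmake --build build --config Release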

b2568

convert : refactor vocab selection logic (ggml-org#6355)

b2567

llava : fix MobileVLM (ggml-org#6364)

* fix empty bug

* Update MobileVLM-README.md

  Added more results on devices

* Update MobileVLM-README.md (several follow-up revisions)

* Update examples/llava/MobileVLM-README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update MobileVLM-README.md

  Remove gguf links

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
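
The fix lands in the llava example, which is also how MobileVLM models are run. A usage sketch (model and image paths are hypothetical):

    # Run a MobileVLM model through the llava example (paths assumed)
    ./llava-cli -m MobileVLM-1.7B/ggml-model-q4_k.gguf \
        --mmproj MobileVLM-1.7B/mmproj-model-f16.gguf \
        --image demo.jpg -p "Describe the image."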

b2566

llama : fix command-r inference when omitting outputs (ggml-org#6367)

b2563

server : stop gracefully on SIGTERM (ggml-org#6348)
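
In practice this means the server example now shuts down cleanly when it receives SIGTERM instead of being killed abruptly. A quick demonstration (model path illustrative):

    # Start the server in the background, then stop it with SIGTERM;
    # with this change it exits gracefully rather than dying mid-request.
    ./server -m model.gguf --port 8080 &
    SERVER_PID=$!
    kill -TERM "$SERVER_PID"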