Tags: masums/llama.cpp
[SYCL] Disable iqx on windows as WA (ggml-org#6435)

* disable iqx on windows as WA
* array instead of global_memory

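The workaround above amounts to compiling the IQ-quant paths out of the SYCL backend on Windows. Below is a minimal sketch of that kind of platform guard; the macro, enum, and function names are hypothetical stand-ins, not the actual code from ggml-org#6435:

```cpp
// Sketch only: illustrative names, not the actual code from ggml-org#6435.
#include <cstdint>

enum class quant_type : uint8_t { Q4_0, Q8_0, IQ2_XXS, IQ3_S, IQ4_NL };

#if defined(_WIN32)
// On Windows the IQ-quant SYCL kernels are disabled as a workaround (WA).
#define GGML_SYCL_DISABLE_IQX 1
#endif

static bool is_iq_quant(quant_type t) {
    return t == quant_type::IQ2_XXS || t == quant_type::IQ3_S || t == quant_type::IQ4_NL;
}

// Report whether the SYCL backend can run this quantization type, so the
// caller can fall back to another backend when it is unsupported.
static bool sycl_supports_quant(quant_type t) {
#ifdef GGML_SYCL_DISABLE_IQX
    if (is_iq_quant(t)) {
        return false;
    }
#endif
    return true;
}
```
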
split: allow --split-max-size option (ggml-org#6343)

* split by max size
* clean up arg parse
* split: ok
* add dry run option
* error on 0 tensors
* be positive
* remove next_metadata_size

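The new flag accepts a human-readable per-shard size limit. As a rough illustration, an argument such as "500M" or "2G" can be turned into a byte count like this (a hypothetical helper, not the parser from ggml-org#6343):

```cpp
// Hypothetical sketch: parse a --split-max-size style argument ("500M",
// "2G") into a byte count. Not the actual parser from ggml-org#6343.
#include <cstddef>
#include <stdexcept>
#include <string>

static size_t parse_split_max_size(const std::string & arg) {
    size_t pos = 0;
    const unsigned long long n = std::stoull(arg, &pos); // numeric prefix
    if (pos + 1 < arg.size()) {
        throw std::invalid_argument("expected <N>, <N>M or <N>G: " + arg);
    }
    if (pos == arg.size()) {
        return (size_t) n; // plain byte count, no suffix
    }
    switch (arg[pos]) {
        case 'M': return (size_t) n * 1024ull * 1024ull;
        case 'G': return (size_t) n * 1024ull * 1024ull * 1024ull;
        default:  throw std::invalid_argument("unknown size suffix: " + arg);
    }
}
```
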
Vulkan k-quant mmq and ggml-backend offload functionality (ggml-org#6155)

* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation, reduces VRAM use noticeably
* Clean up CPU assist code, replaced with ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback for integrated GPUs if no dedicated GPUs are found
* Add debug info showing which device is allocating memory
* Fix Intel dequant issue
* Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning

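The device-selection bullets above boil down to: enumerate the available GPUs, use every dedicated one by default, and only fall back to integrated GPUs when no dedicated device exists. A minimal sketch of that policy, using simplified stand-in types rather than the real Vulkan structures from ggml-org#6155:

```cpp
// Illustrative sketch of a "prefer dedicated GPUs, fall back to integrated"
// selection policy. The types are simplified stand-ins, not Vulkan's.
#include <vector>

enum class gpu_kind { discrete, integrated, other };

struct gpu_device {
    int      id;
    gpu_kind kind;
};

static std::vector<gpu_device> select_devices(const std::vector<gpu_device> & all) {
    std::vector<gpu_device> picked;
    for (const auto & d : all) {
        if (d.kind == gpu_kind::discrete) {
            picked.push_back(d); // default: use all dedicated GPUs
        }
    }
    if (picked.empty()) {
        for (const auto & d : all) {
            if (d.kind == gpu_kind::integrated) {
                picked.push_back(d); // fallback when no dedicated GPU is found
            }
        }
    }
    return picked;
}
```
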
[Model] Add support for xverse (ggml-org#6301)

* Support converting xverse models to gguf format
* Convert xverse models to gguf; add LLM_ARCH_XVERSE inference in llama.cpp; add an xverse entry to Supported models in README.md
* gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter
* llama.cpp: include the changes from ggml-org#6122 to exclude the unused outputs of the last layers
* Fix format issues; remove duplicate set of kqv_out in llm_build_kv
* Update llama.cpp

Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>

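Registering a new architecture such as LLM_ARCH_XVERSE largely means adding it where llama.cpp maps the GGUF architecture string to an enum value. A trimmed-down sketch of that pattern follows; the real enum and name table in llama.cpp contain many more entries:

```cpp
// Simplified sketch of llama.cpp's architecture-registration pattern;
// the real tables are much larger.
#include <map>
#include <string>

enum llm_arch {
    LLM_ARCH_LLAMA,
    LLM_ARCH_XVERSE, // new entry added by ggml-org#6301
    LLM_ARCH_UNKNOWN,
};

static const std::map<llm_arch, std::string> LLM_ARCH_NAMES = {
    { LLM_ARCH_LLAMA,  "llama"  },
    { LLM_ARCH_XVERSE, "xverse" }, // matches general.architecture in the GGUF header
};

// Resolve the architecture string read from a GGUF file to an enum value.
static llm_arch llm_arch_from_string(const std::string & name) {
    for (const auto & kv : LLM_ARCH_NAMES) {
        if (kv.second == name) {
            return kv.first;
        }
    }
    return LLM_ARCH_UNKNOWN;
}
```
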
cmake : add explicit metal version options (ggml-org#6370)

* cmake: add explicit metal version options
* Update CMakeLists.txt

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

llava : fix MobileVLM (ggml-org#6364)

* fix empty bug
* Update MobileVLM-README.md: added more results on devices
* Update examples/llava/MobileVLM-README.md
* Update MobileVLM-README.md: remove gguf links

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

llama : fix command-r inference when omitting outputs (ggml-org#6367)