Tags: hearsilent/mlc-llm
[Model] Add support for OLMo architecture (mlc-ai#3046) This PR adds support for the OLMo architecture. It also adds support for clip-qkv. Test: verified on Android (Pixel 4) and on CUDA (with tensor_parallel_shards=2).
[Bench] Update API backend names (mlc-ai#2968) This PR updates the backend names, introducing one name per backend framework. These backends may refer to the same API endpoint.
Added Hermes 3 support (mlc-ai#2886) * added Hermes 3 support * modified format * fixed lint
Initial commit --------- Co-authored-by: Hongyi Jin <jinhongyi02@gmail.com> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Tianqi Chen <tqchen@cmu.edu> Co-authored-by: Junru Shao <junrushao@apache.org> Co-authored-by: Zihao Ye <zhye@cs.washington.edu>