v0.19.dev0
This PR adds support for the OLMo architecture. Additional support: clip-qkv. Test: tested on Android (Pixel 4) and CUDA (with tensor_parallel_shards=2).
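For context, clip-qkv is generally understood as clamping the query, key, and value projections to a symmetric range before attention. The snippet below is a minimal illustrative sketch only, not the code from this PR: the function name `clip_qkv` and its signature are hypothetical, and it works on plain Python lists rather than real tensors.

```python
from typing import List, Optional


def clip_qkv(values: List[float], clip: Optional[float]) -> List[float]:
    """Clamp each element to [-clip, clip].

    Hypothetical helper for illustration; `clip` stands in for a
    clip-qkv config value. When it is None, clipping is disabled.
    """
    if clip is None:
        return list(values)
    if clip <= 0:
        raise ValueError("clip must be positive")
    # Elementwise clamp, applied identically to Q, K, and V projections.
    return [max(-clip, min(clip, v)) for v in values]


# Toy projected values for one token (assumed numbers, not real data).
q = clip_qkv([-10.0, 0.5, 9.0], clip=8.0)
k = clip_qkv([3.0, -8.5, 0.0], clip=8.0)
v = clip_qkv([1.0, 2.0, 3.0], clip=None)  # clipping disabled
```

In a real model the same clamp would be applied to the tensors right after the QKV projection; the threshold is model-specific and comes from the model config.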