Intel - Shanghai
Stars
📰 Must-read papers and blogs on Speculative Decoding ⚡️
DeepEP: an efficient expert-parallel communication library
HabanaAI / vllm-fork
Forked from vllm-project/vllm: a high-throughput and memory-efficient inference and serving engine for LLMs
A Datacenter Scale Distributed Inference Serving Framework
The simplest, fastest repository for training/finetuning small-sized VLMs.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Real-time interactive streaming digital human
Open Source framework for voice and multimodal conversational AI
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
FlashInfer: Kernel Library for LLM Serving
Fast inference from large language models via speculative decoding
An Application Framework for AI Engineering
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Development repository for the Triton language and compiler
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Evaluation, benchmark, and scorecard targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination
GenAI components at the microservice level; a GenAI service composer to create mega-services
1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.