Stars
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
An efficient implementation of the NSA (Native Sparse Attention) kernel
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Puzzles for learning Triton — play them with minimal environment configuration!
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…
DeepEP: an efficient expert-parallel communication library
Modified version of PyTorch able to work with changes to GPGPU-Sim
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Distributed Compiler based on Triton for Parallel Systems
heyppen / AirPosture
Forked from allenv0/AirPosture
Turn your AirPods into a posture coach on macOS
Latest Advances on System-2 Reasoning
Paper list for Efficient Reasoning.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A lightweight design for computation-communication overlap.
Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport
Fast Hadamard transform in CUDA, with a PyTorch interface
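The repository above provides a CUDA kernel, but the underlying algorithm is the standard fast Walsh-Hadamard transform, which replaces the O(n²) matrix multiply with O(n log n) butterfly passes. As a hedged illustration (plain Python, not the repo's actual API), a minimal in-place sketch:

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized, natural order).

    len(x) must be a power of two. Each pass combines elements
    h apart with a butterfly (a + b, a - b), doubling h each time.
    """
    x = list(x)
    n = len(x)
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

print(fwht([1, 0, 1, 0]))  # → [2, 2, 0, 0]
```

The CUDA version in the repository parallelizes these butterfly passes across threads; a normalized transform would additionally divide by sqrt(n).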
An open-source GPU based on the AMD Southern Islands ISA.