Sun Yat-sen University - Guangzhou
https://gty111.github.io/info/
https://orcid.org/0009-0005-2979-4486
Lists (19)
AI
Benchmark
Compiler & DSL
CV & CG
Diffusion
Framework
Hardware
HPC
Instrumentation & Reverse & Assemble
LAB
Math
NLP
Operating Systems
Recommendation
ROCm
Simulators
Template & Theme
Tools
Tutorial & Examples
Stars
Official PyTorch implementation for "Large Language Diffusion Models"
Standardized Serverless ML Inference Platform on Kubernetes
llm-d is a Kubernetes-native high-performance distributed LLM inference framework
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Kimi K2 is the large language model series developed by Moonshot AI team
verl: Volcano Engine Reinforcement Learning for LLMs
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Supercharge Your LLM with the Fastest KV Cache Layer
Distributed Compiler based on Triton for Parallel Systems
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
Quantized Attention that achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Analyze computation-communication overlap in V3/R1.
DeepEP: an efficient expert-parallel communication library
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
My learning notes/codes for ML SYS.
A markdown version emoji cheat sheet
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
High performance Transformer implementation in C++.
Disaggregated serving system for Large Language Models (LLMs).
Graph Neural Network Library for PyTorch
Best Practices on Recommendation Systems