gty111
🎯 Focusing is all you need


Starred repositories

- Official PyTorch implementation for "Large Language Diffusion Models" (Python · 2,585 stars · 173 forks · updated Jun 17, 2025)
- Standardized Serverless ML Inference Platform on Kubernetes (Python · 4,368 stars · 1,214 forks · updated Jul 16, 2025)
- llm-d is a Kubernetes-native, high-performance distributed LLM inference framework (Makefile · 1,394 stars · 113 forks · updated Jul 16, 2025)
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers" (Python · 2,144 stars · 178 forks · updated Mar 27, 2024)
- Kimi K2 is the large language model series developed by the Moonshot AI team (5,652 stars · 317 forks · updated Jul 16, 2025)
- verl: Volcano Engine Reinforcement Learning for LLMs (Python · 11,159 stars · 1,855 forks · updated Jul 18, 2025)
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA (C++ · 1,585 stars · 99 forks · updated Jul 18, 2025)
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (Python · 476 stars · 33 forks · updated Feb 10, 2025)
- The official repo of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud (Python · 18,734 stars · 1,542 forks · updated Jun 16, 2025)
- Supercharge Your LLM with the Fastest KV Cache Layer (Python · 3,204 stars · 354 forks · updated Jul 18, 2025)
- Distributed Compiler based on Triton for Parallel Systems (Python · 891 stars · 75 forks · updated Jul 15, 2025)
- The official code for the paper "LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs" (Python · 91 stars · updated Jul 1, 2025)
- Quantized attention achieves speedups of 2-5x and 3-11x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models (Cuda · 2,014 stars · 157 forks · updated Jul 16, 2025)
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving (C++ · 48 stars · 5 forks · updated May 8, 2025)
- MMaDA: Open-Sourced Multimodal Large Diffusion Language Models (Python · 1,204 stars · 55 forks · updated Jun 13, 2025)
- A bidirectional pipeline-parallelism algorithm for computation-communication overlap in V3/R1 training (Python · 2,827 stars · 300 forks · updated Mar 10, 2025)
- Analyze computation-communication overlap in V3/R1 (1,078 stars · 144 forks · updated Mar 21, 2025)
- Expert Parallelism Load Balancer (Python · 1,236 stars · 194 forks · updated Mar 24, 2025)
- DeepEP: an efficient expert-parallel communication library (Cuda · 8,285 stars · 860 forks · updated Jul 18, 2025)
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations (Python · 14,658 stars · 1,041 forks · updated Jul 12, 2025)
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation (7,870 stars · 280 forks · updated May 15, 2025)
- My learning notes and code for ML systems (Python · 2,939 stars · 181 forks · updated Jul 17, 2025)
- A markdown-version emoji cheat sheet (TypeScript · 13,216 stars · 4,545 forks · updated Jul 18, 2025)
- Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1, and other large language models (Go · 146,829 stars · 12,431 forks · updated Jul 17, 2025)
- Materials for learning SGLang (492 stars · 37 forks · updated Jul 8, 2025)
- High-performance Transformer implementation in C++ (C++ · 126 stars · 16 forks · updated Jan 18, 2025)
- Disaggregated serving system for Large Language Models (LLMs) (Jupyter Notebook · 643 stars · 70 forks · updated Apr 6, 2025)
- Graph Neural Network Library for PyTorch (Python · 22,620 stars · 3,851 forks · updated Jul 16, 2025)
- Best Practices on Recommendation Systems (Python · 20,486 stars · 3,226 forks · updated Jul 18, 2025)