xinhaoc

🕶️

Focusing

Xinhao Cheng xinhaoc

24 followers · 14 following

Carnegie Mellon University
Pittsburgh, PA
03:41 (UTC -08:00)
https://xinhaoc.github.io

Achievements

x2 x2

Stars

30 results for source starred repositories

infinigence / FlashOverlap

A lightweight design for computation-communication overlap.

Cuda 185 8 Updated Oct 10, 2025

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,491 693 Updated Nov 18, 2025

MekkCyber / CutlassAcademy

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

244 12 Updated May 6, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 4,088 569 Updated Nov 18, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,867 903 Updated Sep 30, 2025

gavinliu6 / Makefile-Tutorial-zh-CN

Makefile tutorial

HTML 296 35 Updated Mar 4, 2024

facebookexperimental / triton

GitHub mirror of the triton-lang/triton repo.

MLIR 98 23 Updated Nov 18, 2025

flexflow / flexflow-serve

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 63 6 Updated Sep 15, 2025

GenseeAI / cognify

Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower execution latency, and lower execution cost. Also has a simple …

Python 261 31 Updated May 16, 2025

ColfaxResearch / cfx-article-src

C++ 154 32 Updated May 7, 2025

lynnboy / CppCoreGuidelines-zh-CN

Translation of C++ Core Guidelines [https://github.com/isocpp/CppCoreGuidelines] into Simplified Chinese.

2,427 334 Updated Sep 22, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,951 153 Updated Nov 18, 2025

timovv / notion-website-template

Make a personal website using Notion and GitHub Pages

Shell 143 66 Updated Oct 27, 2023

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,806 1,530 Updated Nov 15, 2025

jiazhihao / attention_superoptimizer

An Attention Superoptimizer

C++ 22 Updated Jan 20, 2025

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 22,839 1,392 Updated Nov 18, 2025

jonbarron / jonbarron.github.io

HTML 3,314 2,725 Updated Nov 5, 2025

Timothyxxx / RetrivalLMPapers

A collection of papers on retrieval-based (augmented) language models.

232 12 Updated May 24, 2024

lambda7xx / awesome-AI-system

Papers and their code for AI systems

337 23 Updated Aug 15, 2025

mlc-ai / tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece

C++ 418 99 Updated Aug 8, 2025

Tony-Tan / CUDA_Freshman

Cuda 2,614 498 Updated Jan 16, 2024

flexflow / flexflow-train

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,842 245 Updated Nov 17, 2025

facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 52,523 6,143 Updated Sep 18, 2024