TikTok · Sunnyvale, CA · https://xsxszab.github.io/ · in/yifei--wang
Stars
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
verl: Volcano Engine Reinforcement Learning for LLMs
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
Tensors and Dynamic neural networks in Python with strong GPU acceleration
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Hands-on exercises with real-life examples to study and practice Go concurrency patterns. Test cases are provided to verify your answers.
Virtual whiteboard for sketching hand-drawn-like diagrams
Ongoing research training transformer models at scale
SGLang is a fast serving framework for large language models and vision language models.
Seamless operability between C++11 and Python
Development repository for the Triton language and compiler
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
This project aims to share the technical principles behind large language models along with practical experience (LLM engineering and real-world LLM application deployment).
Fully open reproduction of DeepSeek-R1
A high-throughput and memory-efficient inference and serving engine for LLMs
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A fast, clean, responsive Hugo theme.
The world’s fastest framework for building websites.
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs