
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,785 stars · 711 forks · Updated Oct 10, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 14,154 stars · 2,523 forks · Updated Oct 10, 2025

A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…

Python · 1,437 stars · 170 forks · Updated Oct 10, 2025

Nano vLLM

Python · 7,011 stars · 892 forks · Updated Aug 31, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 93,811 stars · 25,513 forks · Updated Oct 10, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,501 stars · 238 forks · Updated Oct 8, 2025

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ · 456 stars · 65 forks · Updated Sep 25, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 1,873 stars · 133 forks · Updated Oct 10, 2025

MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)

Python · 2,543 stars · 49 forks · Updated Mar 9, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ · 11,801 stars · 1,789 forks · Updated Oct 10, 2025

Hands-on exercises with real-life examples to study and practice Go concurrency patterns. Test cases are provided to verify your answers.

Go · 1,451 stars · 496 forks · Updated Sep 23, 2024

Virtual whiteboard for sketching hand-drawn-like diagrams

TypeScript · 108,324 stars · 11,150 forks · Updated Oct 9, 2025

Ongoing research training transformer models at scale

Python · 13,779 stars · 3,147 forks · Updated Oct 10, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 18,723 stars · 3,102 forks · Updated Oct 10, 2025

Seamless operability between C++11 and Python

C++ · 17,339 stars · 2,224 forks · Updated Oct 6, 2025

Development repository for the Triton language and compiler

MLIR · 17,177 stars · 2,290 forks · Updated Oct 10, 2025

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook · 3,696 stars · 290 forks · Updated Jun 12, 2025

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python · 2,254 stars · 292 forks · Updated May 11, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 7,933 stars · 790 forks · Updated Sep 19, 2025

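This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications in practice).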

HTML · 21,211 stars · 2,493 forks · Updated Aug 3, 2025

Fully open reproduction of DeepSeek-R1

Python · 25,523 stars · 2,396 forks · Updated Sep 8, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 59,760 stars · 10,600 forks · Updated Oct 10, 2025

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…

Jupyter Notebook · 22,259 stars · 2,489 forks · Updated Oct 8, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 40,351 stars · 4,576 forks · Updated Oct 10, 2025

Material for gpu-mode lectures

Jupyter Notebook · 5,148 stars · 514 forks · Updated Sep 23, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 8,560 stars · 1,474 forks · Updated Sep 25, 2025

A fast, clean, responsive Hugo theme.

HTML · 12,463 stars · 3,241 forks · Updated Oct 9, 2025

The world’s fastest framework for building websites.

Go · 84,085 stars · 8,050 forks · Updated Oct 9, 2025

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

TypeScript · 65,694 stars · 6,909 forks · Updated Oct 10, 2025

LLM inference in C/C++

C++ · 87,430 stars · 13,266 forks · Updated Oct 10, 2025