C++ 318 29 Updated Nov 13, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,700 269 Updated Nov 6, 2025

A Quirky Assortment of CuTe Kernels

Python 657 61 Updated Oct 30, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 16,138 2,597 Updated Nov 19, 2025

Allow torch tensor memory to be released and resumed later

Python 175 28 Updated Nov 14, 2025

Examples of CUDA implementations by Cutlass CuTe

Makefile 250 33 Updated Jul 1, 2025

Implement Flash Attention using Cute.

Cuda 96 8 Updated Dec 17, 2024

Distributed Compiler based on Triton for Parallel Systems

Python 1,241 106 Updated Nov 18, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,878 305 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,899 748 Updated Nov 19, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,732 997 Updated Nov 18, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,927 286 Updated May 15, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 12,942 1,234 Updated Oct 28, 2025

Monte Carlo tree search in Python

Python 621 173 Updated Jul 2, 2022

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 95,219 25,953 Updated Nov 20, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,904 699 Updated Aug 18, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 12,077 1,068 Updated Oct 29, 2025

Modeling, training, eval, and inference code for OLMo

Python 6,129 674 Updated Oct 24, 2025

Development repository for the Triton language and compiler

MLIR 17,599 2,394 Updated Nov 20, 2025

A tutorial for CUDA & PyTorch

C++ 165 32 Updated Jan 21, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,401 450 Updated Aug 2, 2025

Data preparation code for CrystalCoder 7B LLM

Python 45 5 Updated May 10, 2024

Pre-training code for CrystalCoder 7B LLM

Python 55 8 Updated May 10, 2024

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 5,530 290 Updated Nov 20, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,998 2,664 Updated Aug 12, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,166 355 Updated Dec 9, 2023

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,745 31,179 Updated Nov 19, 2025

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 906 170 Updated Dec 30, 2024

DLRover: An Automatic Distributed Deep Learning System

Python 1,592 198 Updated Nov 18, 2025

DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted in incubation in LF AI & Data Foundation.

C++ 1,147 357 Updated Jan 21, 2025