hxdtest

28 results for source starred repositories

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,076 stars · 167 forks · Updated Jul 21, 2025

A Quirky Assortment of CuTe Kernels

Python · 365 stars · 30 forks · Updated Jul 25, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 11,481 stars · 1,912 forks · Updated Jul 27, 2025

Allow torch tensor memory to be released and resumed later

Python · 91 stars · 14 forks · Updated Jul 9, 2025

Examples of CUDA implementations using CUTLASS CuTe

Makefile · 211 stars · 29 forks · Updated Jul 1, 2025

Implements Flash Attention using CuTe.

Cuda · 89 stars · 7 forks · Updated Dec 17, 2024

Distributed Compiler based on Triton for Parallel Systems

Python · 916 stars · 80 forks · Updated Jul 25, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,833 stars · 300 forks · Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

C++ · 5,558 stars · 658 forks · Updated Jul 25, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,313 stars · 871 forks · Updated Jul 22, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,874 stars · 281 forks · Updated May 15, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB · 10,782 stars · 1,045 forks · Updated Jun 24, 2025

Monte Carlo tree search in Python

Python · 608 stars · 172 forks · Updated Jul 2, 2022

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 91,807 stars · 24,778 forks · Updated Jul 27, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog · 8,605 stars · 668 forks · Updated Aug 18, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python · 12,003 stars · 1,061 forks · Updated Jul 19, 2025

Modeling, training, eval, and inference code for OLMo

Python · 5,822 stars · 634 forks · Updated Jul 24, 2025

Development repository for the Triton language and compiler

MLIR · 16,294 stars · 2,130 forks · Updated Jul 27, 2025

A tutorial for CUDA & PyTorch

C++ · 149 stars · 28 forks · Updated Jan 21, 2025

High-speed Large Language Model Serving for Local Deployment

C++ · 8,244 stars · 435 forks · Updated Jul 27, 2025

Data preparation code for CrystalCoder 7B LLM

Python · 45 stars · 5 forks · Updated May 10, 2024

Pre-training code for CrystalCoder 7B LLM

Python · 54 stars · 7 forks · Updated May 10, 2024

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python · 4,879 stars · 252 forks · Updated Jul 26, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 23,145 stars · 2,556 forks · Updated Aug 12, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 147,512 stars · 29,795 forks · Updated Jul 26, 2025

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ · 883 stars · 167 forks · Updated Dec 30, 2024

DLRover: An Automatic Distributed Deep Learning System

Python · 1,510 stars · 189 forks · Updated Jul 25, 2025

DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted in incubation in LF AI & Data Foundation.

C++ · 1,116 stars · 361 forks · Updated Jan 21, 2025