Zhejiang University
Shanghai, China
Stars
🚀 A very efficient Texas Hold'em GTO solver
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners; 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
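As a rough illustration of what "fine-grained scaling" means here, the sketch below quantizes each 128-element block of a tensor's last dimension with its own scale, so local magnitudes fill FP8's narrow dynamic range. The function name and block size are illustrative assumptions; DeepGEMM's actual kernels fuse this with the GEMM on-device.

```python
import torch

FP8_MAX = 448.0  # largest finite value of torch.float8_e4m3fn

def quantize_per_block(x: torch.Tensor, block: int = 128):
    # Hypothetical sketch of fine-grained (per-block) FP8 scaling.
    # Assumes x is contiguous and its last dim is a multiple of `block`.
    xb = x.view(*x.shape[:-1], -1, block)             # split into blocks
    scales = xb.abs().amax(dim=-1, keepdim=True) / FP8_MAX
    q = (xb / scales).to(torch.float8_e4m3fn)         # quantize per block
    return q, scales                                  # keep scales to dequantize
```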
Shared Middle-Layer for Triton Compilation
Enables on-the-fly manipulation of the LLVM IR of CUDA sources
LLM notes covering model inference, Transformer model structure, and LLM framework code analysis.
Efficient Triton Kernels for LLM Training
Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch. Much faster than direct convolutions for large kernel sizes.
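The speed claim follows from complexity: direct convolution of an n-sample signal with a k-tap kernel costs O(n·k), while the FFT route costs O(n log n) regardless of kernel size. A minimal 1D sketch of the idea using torch.fft (not the repo's actual implementation; note it computes true convolution, whereas torch.nn.functional.conv1d is cross-correlation):

```python
import torch

def fft_conv1d(signal: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    # Full linear convolution via the convolution theorem:
    # pad both inputs to the output length, multiply spectra, invert.
    n = signal.shape[-1] + kernel.shape[-1] - 1
    s = torch.fft.rfft(signal, n=n)
    k = torch.fft.rfft(kernel, n=n)
    return torch.fft.irfft(s * k, n=n)

x = torch.randn(100_000)
w = torch.randn(4_096)
y = fft_conv1d(x, w)  # O(n log n) vs O(n*k) for the direct method
```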
A self-study tutorial for CUDA high-performance programming.
Chinese translation, slides, and labs for Professor Eijkhout's Introduction to HPC.
Make your Vim more powerful and much easier to use. The most practical Vim configuration. 🔥
CUDA Templates and Python DSLs for High-Performance Linear Algebra
How to optimize common algorithms in CUDA.
FlagGems is an operator library for large language models implemented in the Triton Language.
The latest activation crack for Typora: activate in three steps. 😊 Continuously updated / 👩🎓 a must-have for students; if you are able to support the genuine version, please don't use this. 🔞🈲 Activate Typora
Development repository for the Triton language and compiler
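For readers new to Triton, the canonical first kernel is a blocked, masked vector add: each program instance handles one BLOCK_SIZE tile, with a mask guarding the ragged tail. A minimal sketch along the lines of the official tutorial:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which tile am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per tile
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```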
AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
We want to create a repo to illustrate the usage of Transformers, in Chinese.
Code for the paper "Language Models are Unsupervised Multitask Learners"
Video+code lecture on building nanoGPT from scratch
An annotated implementation of the Transformer paper.
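The core shared by that paper, GPT-2, and nanoGPT is scaled dot-product attention: softmax(QKᵀ/√d_k)V. A minimal PyTorch sketch (the function name and masking convention here are my own, not copied from the Annotated Transformer's code):

```python
import math
import torch

def attention(q, k, v, mask=None):
    # softmax(Q K^T / sqrt(d_k)) V, from "Attention Is All You Need".
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # e.g. a causal mask for GPT-style autoregressive decoding
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```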