  • Zhejiang University
  • Shanghai, China


🚀 A very efficient Texas Hold'em GTO solver ♠️♥️♣️♦️

C++ 2,186 386 Updated Nov 5, 2024

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 75,004 10,974 Updated Oct 9, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 7,936 791 Updated Sep 19, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,785 711 Updated Oct 10, 2025

Shared Middle-Layer for Triton Compilation

MLIR 289 78 Updated Oct 8, 2025

Enabling on-the-fly manipulations with LLVM IR code of CUDA sources

C++ 112 26 Updated Apr 18, 2025

A torch compile backend for multi-targets

Python 39 18 Updated Oct 10, 2025

Hand-written CUDA operators and interview guide

Cuda 634 70 Updated Aug 23, 2025

An ML Systems Onboarding list

910 33 Updated Jan 24, 2025

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Python 826 88 Updated Sep 16, 2025

LaTeX source for a personal Chinese résumé https://hijiangtao.github.io/

TeX 2,580 630 Updated Sep 4, 2024

Efficient Triton Kernels for LLM Training

Python 5,731 413 Updated Oct 10, 2025

Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch. Much faster than direct convolutions for large kernel sizes.

Python 509 61 Updated Sep 28, 2023

A self-learning tutorial for CUDA high-performance programming.

JavaScript 749 74 Updated Jun 30, 2025

Chinese translation, slides, and labs for Professor Eijkhout's Introduction to HPC.

C 323 44 Updated Apr 11, 2022

Make your vim more powerful and much easier. The most practical vim configuration 🔥

Vim Script 1,729 258 Updated May 8, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,561 1,474 Updated Sep 25, 2025

How to optimize some algorithms in CUDA.

Cuda 2,546 228 Updated Oct 9, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 691 142 Updated Oct 10, 2025

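The latest Typora activation crack, activated in three steps. 😊 Updated in real time / 👩‍🎓 A must-have for students; if you can afford to support the official version, please do not open 🔞🈲️. Activate Typora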

1,956 190 Updated Jul 15, 2024

Hands-On Practical MLIR Tutorial

C++ 618 90 Updated Oct 20, 2023

Development repository for the Triton language and compiler

MLIR 17,179 2,291 Updated Oct 10, 2025

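AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks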

Jupyter Notebook 15,271 2,195 Updated Sep 3, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 34,815 15,212 Updated Oct 10, 2025

A self-learning guide to computer science

HTML 67,902 7,632 Updated Oct 9, 2025

We want to create a repo to illustrate the usage of transformers in Chinese

Shell 2,996 489 Updated Aug 18, 2024

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python 24,254 5,795 Updated Aug 14, 2024

Video+code lecture on building nanoGPT from scratch

Python 4,420 697 Updated Aug 13, 2024

An annotated implementation of the Transformer paper.

Jupyter Notebook 6,602 1,423 Updated Apr 7, 2024

Quantitative research: reproductions of brokerage financial-engineering research reports

Jupyter Notebook 3,945 1,027 Updated Jul 23, 2025