Stars
Tensors and Dynamic neural networks in Python with strong GPU acceleration
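A minimal sketch of what "dynamic" means here: the autograd graph is built as ordinary Python executes, so even a data-dependent branch stays differentiable (shapes and device choice below are just illustrative):

```python
import torch

# Define-by-run: the graph is recorded while Python runs, so control flow
# may depend on tensor values and autograd tracks whichever path executed.
device = "cuda" if torch.cuda.is_available() else "cpu"
w = torch.randn(3, requires_grad=True, device=device)
x = torch.randn(8, 3, device=device)

h = x @ w
y = torch.relu(h) if h.mean() > 0 else torch.tanh(h)  # data-dependent branch
y.sum().backward()
print(w.grad)  # gradient flows through the branch that actually ran
```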
Efficient GPU kernels for block-sparse matrix multiplication and convolution
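To show what block-sparse matmul computes (not this repo's fused GPU kernels), here is a naive PyTorch reference that stores only the nonzero blocks of the weight matrix and skips zero blocks entirely; all names, shapes, and the block layout are illustrative:

```python
import torch

def block_sparse_matmul(x, blocks, layout, block_size):
    """Naive block-sparse matmul reference.
    x: (batch, in_features); layout: (in_blocks, out_blocks) bool mask of nonzero blocks;
    blocks: (nnz, block_size, block_size), ordered as layout.nonzero() yields."""
    in_blocks, out_blocks = layout.shape
    out = x.new_zeros(x.shape[0], out_blocks * block_size)
    for b, (i, j) in enumerate(layout.nonzero().tolist()):
        xi = x[:, i * block_size:(i + 1) * block_size]
        out[:, j * block_size:(j + 1) * block_size] += xi @ blocks[b]
    return out

layout = torch.rand(4, 4) < 0.25                  # ~25% of blocks are nonzero
blocks = torch.randn(int(layout.sum()), 32, 32)
x = torch.randn(8, 4 * 32)
y = block_sparse_matmul(x, blocks, layout, 32)    # (8, 128)
```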
PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
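A small sketch of the kind of operation such an extension optimizes, written with PyTorch's built-in sparse support rather than the library's own API: a COO sparse matrix times a dense matrix, with gradients flowing to the dense input.

```python
import torch

# A 3x3 COO sparse matrix with three nonzeros, multiplied by a dense matrix
# that requires gradients; torch.sparse.mm is differentiable w.r.t. the dense input.
indices = torch.tensor([[0, 1, 2],    # row indices
                        [2, 0, 1]])   # column indices
values = torch.tensor([1.0, 2.0, 3.0])
sparse = torch.sparse_coo_tensor(indices, values, size=(3, 3))

dense = torch.randn(3, 4, requires_grad=True)
out = torch.sparse.mm(sparse, dense)  # (3, 4)
out.sum().backward()
print(dense.grad.shape)               # torch.Size([3, 4])
```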
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
GitHub mirror of the triton-lang/triton repo.
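For reference, a minimal Triton kernel in its Python DSL: a masked vector add launched over a 1D grid (block size chosen arbitrarily; requires a CUDA-capable device).

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                         # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                         # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```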
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑, covering 200+ CUDA kernels, Tensor Cores, HGEMM, and FA-2 MMA. 🎉
The Orchestration Engine To Deliver Self-Service Infrastructure ⚡️
AMD RAD's Triton-based framework for seamless multi-GPU programming
NanoGPT speedrun in JAX. Originally at https://nor-git.pages.dev/modded-nanogpt-jax/
KANditioned: A very fast implementation of Kolmogorov-Arnold Networks
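As a reminder of what a KAN layer computes — a learnable univariate function on every input-output edge, summed over inputs — here is a minimal, unoptimized PyTorch sketch using a fixed Gaussian RBF basis; KANditioned's actual parameterization and speed tricks will differ.

```python
import torch
import torch.nn as nn

class NaiveKANLayer(nn.Module):
    """One KAN layer: a learnable univariate function phi_{i,j} on every edge,
    parameterized as a weighted sum of fixed Gaussian radial basis functions."""
    def __init__(self, in_dim, out_dim, num_basis=8, grid_min=-2.0, grid_max=2.0):
        super().__init__()
        self.register_buffer("grid", torch.linspace(grid_min, grid_max, num_basis))
        self.inv_width = num_basis / (grid_max - grid_min)   # fixed RBF width
        # one coefficient per (input, output, basis) triple
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, num_basis) * 0.1)

    def forward(self, x):                                     # x: (batch, in_dim)
        # evaluate every basis function at every input coordinate
        basis = torch.exp(-((x.unsqueeze(-1) - self.grid) * self.inv_width) ** 2)
        # sum_i phi_{i,j}(x_i): contract input and basis dimensions
        return torch.einsum("bik,iok->bo", basis, self.coef)  # (batch, out_dim)

model = nn.Sequential(NaiveKANLayer(4, 16), NaiveKANLayer(16, 1))
y = model(torch.randn(32, 4))   # (32, 1)
```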
An NCCL communication API layer and transport layer, built from first principles.
A comprehensive collection of KAN (Kolmogorov-Arnold Network) resources, including libraries, projects, tutorials, papers, and more, for researchers and developers in the Kolmogorov-Arnold Network field.
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate"
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
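To unpack "blockwise FP8": each tile of a matrix gets its own scale so that its largest magnitude maps onto FP8 E4M3's representable range. A hedged sketch of that quantization step only — block size and layout are assumptions, not this repo's exact scheme, and it requires a PyTorch build with float8 dtypes:

```python
import torch

def quantize_blockwise_fp8(w: torch.Tensor, block: int = 128):
    """Quantize w tile-by-tile to FP8 E4M3 with one scale per (block x block) tile.
    Assumes w's dimensions are multiples of `block` to keep the sketch short."""
    FP8_MAX = 448.0  # largest finite value of float8_e4m3fn
    m, n = w.shape
    tiles = w.reshape(m // block, block, n // block, block).permute(0, 2, 1, 3)
    scales = tiles.abs().amax(dim=(-2, -1), keepdim=True).clamp_min(1e-12) / FP8_MAX
    q = (tiles / scales).to(torch.float8_e4m3fn)
    return q, scales.squeeze(-1).squeeze(-1)   # quantized tiles and per-tile scales

q, s = quantize_blockwise_fp8(torch.randn(256, 256))
print(q.shape, q.dtype, s.shape)  # (2, 2, 128, 128) float8 tiles, (2, 2) scales
```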
Efficient Triton Kernels for LLM Training