Mitchell Mosure (mosure)

👾

50 followers · 154 following

Madison, WI
02:47 (UTC -06:00)
https://mitchell.mosure.me

Achievements

x2 x2 x3

Lists (1)

todo

40 repositories

Stars

41 stars written in Cuda

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 28,171 3,290 Updated Jun 26, 2025

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,068 2,027 Updated Oct 8, 2025

HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Cuda 11,153 426 Updated Nov 21, 2024

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,722 991 Updated Nov 6, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,888 745 Updated Nov 14, 2025

nerfstudio-project / gsplat

CUDA accelerated rasterization of gaussian splatting

Cuda 3,957 607 Updated Oct 2, 2025

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,910 197 Updated Nov 15, 2025

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,692 265 Updated Nov 6, 2025

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,805 464 Updated Oct 9, 2023

mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,414 177 Updated Feb 24, 2025

graphdeco-inria / diff-gaussian-rasterization

Cuda 1,322 401 Updated Oct 21, 2024

k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda 1,283 231 Updated Nov 4, 2025

nv-tlabs / NKSR

[CVPR 2023 Highlight] Neural Kernel Surface Reconstruction

Cuda 892 61 Updated Sep 24, 2025

princeton-vl / lietorch

Cuda 804 88 Updated May 10, 2025

thu-ml / SpargeAttn

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 779 65 Updated Nov 14, 2025

19reborn / NeuS2

[ICCV 2023] Official code for NeuS2

Cuda 710 51 Updated Mar 22, 2024

clu0 / unet.cu

UNet diffusion model in pure CUDA

Cuda 653 31 Updated Jun 28, 2024

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda 642 137 Updated Oct 20, 2025

theialab / radfoam

Original implementation of "Radiant Foam: Real-Time Differentiable Ray Tracing"

Cuda 620 36 Updated May 14, 2025

bycloudai / instant-ngp-Windows

Forked from NVlabs/instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 503 73 Updated Aug 14, 2022

b0nes164 / GPUSorting

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda 401 23 Updated Dec 14, 2024