- Apple
- Seattle
- UTC -08:00
- basujindal.me
- @basujindal
Stars
Fast and memory-efficient exact attention
CUDA Templates and Python DSLs for High-Performance Linear Algebra
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
The code powering searchthearxiv.com, a simple semantic search engine for more than 300,000 ML papers on arXiv.
Intel CPU undervolting and throttling configuration tool
Guide to Linux undervolting for Haswell and newer Intel CPUs
[NeurIPS 2024] Simple and Effective Masked Diffusion Language Model
Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun
A JAX research toolkit for building, editing, and visualizing neural networks.
This repository contains integer operators on GPUs for PyTorch.
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Open-Sora: Democratizing Efficient Video Production for All
Flash Attention in ~100 lines of CUDA (forward pass only)
Optimized Stable Diffusion modified to run on lower GPU VRAM
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streamin…
Stop messing around with finicky sampling parameters and just use DRµGS!
#1 Locally hosted web application that allows you to perform various operations on PDF files
Turn (almost) any Python command line program into a full GUI application with one line
Simple, free and efficient ad-blocker and privacy guard for Windows, macOS and Linux
Distribute and run LLMs with a single file.
Fast, collaborative live terminal sharing over the web