Lists (32)
AI Efficiency
Attention
Checkpointing
Collective Communication
Compilers
CUDA
Datasets
Diffusion
Distributed Systems
Graphs
gRPC
Hardware
HPC
Infrastructure as Code
Jax
Job Orchestration
K8s
Kernel Language
Language Models
Linux
Mechanistic Interpretability
NVIDIA GDS: DMA/RDMA
PEFT
PyTorch
Quantization
Research Tools
Rust Machine Learning Ecosystem
Serving
Simulation
Storage
Vision
WASM
Starred repositories
MoE training system for research, speed-running and profit
Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
Easily compare PyTorch JSON profiles. Vibe coded.
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
Technical report of Kimina-Prover Preview.
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs.
AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
Post-training with Tinker
GitHub mirror of the triton-lang/triton repo.
Kimi K2 is the large language model series developed by the Moonshot AI team
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
Verify the precision of all Kimi K2 API vendors
A minimal implementation of DeepMind's Genie world model
Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.