Stars
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
An orchestration platform for the development, production, and observation of data assets.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
Counter-based random number generators for C, C++ and CUDA.
Command line interface for testing internet bandwidth using speedtest.net
Code for a dynamic multilevel Bayesian model to predict US presidential elections. Written in R and Stan.
COEXI(S)T - Modelling COVID-19 exit strategies for policy makers in the United Kingdom
Implementation of (overlap) local SGD in PyTorch
Simple Training and Deployment of Fast End-to-End Binary Networks
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
Relax! Flux is the ML library that doesn't make you tensor
Algorithm examples in PlusCal, the algorithm language of Lamport's TLA+
Rust mid-level IR Abstract Interpreter
A cross-platform, OpenGL terminal emulator.
A WebGL accelerated JavaScript library for training and deploying ML models.
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.
Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2