{% include 'analytics.html' %}
{% include 'navbar_without_logo.html' %}
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set. Under the hood, Tuplex is based on data-driven compilation and dual-mode processing, two key techniques that make it possible for Tuplex to provide speed comparable to a pipeline written in hand-optimized C++.
We're releasing Tuplex to the public in the coming days. If you're excited to try it and don't want to miss the release, please subscribe here to get an official release email! {% include 'mailchimp.html' %}
Because Tuplex compiles data science pipelines with inline Python to native code, it runs them 5–91x faster than systems that call into a Python interpreter.
Tuplex makes wrangling data easy: it works interactively in the Python toplevel, integrates with Jupyter Notebooks, and provides familiar APIs, all backed by its data-driven compiler. Tuplex jobs never crash on malformed inputs because Tuplex's dual-mode execution model separates the common-case inputs from exception-case inputs (e.g., malformed data, wrong types) and reports them separately.
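To make the dual-mode idea concrete, here is a minimal plain-Python sketch of separating common-case rows from exception-case rows. It does not use Tuplex and is not how Tuplex is implemented (Tuplex compiles the common path to native code); the helper name `process_dual_mode` and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Iterable, List, Tuple


def process_dual_mode(
    rows: Iterable[Any], udf: Callable[[Any], Any]
) -> Tuple[List[Any], List[Tuple[int, Any, str]]]:
    """Apply udf to every row; collect failing rows instead of aborting.

    Illustrative only: a hypothetical helper, not part of the Tuplex API.
    """
    results = []     # outputs for rows the UDF handled (the common case)
    exceptions = []  # (row index, original row, exception name) for the rest
    for index, row in enumerate(rows):
        try:
            results.append(udf(row))
        except Exception as exc:
            # A malformed row is recorded and reported, not allowed to crash the job.
            exceptions.append((index, row, type(exc).__name__))
    return results, exceptions


# Sample input with two malformed rows: a missing value and a wrongly typed value.
rows = [1, 2, None, 4, "5"]
results, exceptions = process_dual_mode(rows, lambda x: (x, x * x))

print(results)
print(exceptions)
```

The job finishes even though two rows cannot be squared, and the failing rows come back alongside the exception type so they can be inspected or fixed separately.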
Getting started with Tuplex is easy: we provide a Python package, Docker image, and instructions to build from source.
| Linux, Python 3.7-3.9: |
| $ pip install tuplex |
| macOS, Catalina or later: |
| $ docker run tuplex/tuplex |
| Leonhard F. Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska. Tuplex: Data Science in Python at Native Code Speed. Proceedings of SIGMOD 2021, June 2021. URL: https://doi.org/10.1145/3448016.3457244. |
| Leonhard F. Spiegelberg and Tim Kraska. Tuplex: robust, efficient analytics when python rules. Proc. VLDB Endow., 12(12):1958–1961, August 2019. URL: https://doi.org/10.14778/3352063.3352109. |
|
| Andrew Wei | Andy Ly | Benjamin Givertz |
| Colby Anderson | Yunzhi Shao | Raghu Nimmagadda |
| William Riley |