README.md (7 changes: 4 additions & 3 deletions)
@@ -8,7 +8,7 @@

[Website](https://tuplex.cs.brown.edu/) [Documentation](https://tuplex.cs.brown.edu/python-api.html)

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code.
Tuplex has similar Python APIs to [Apache Spark](https://spark.apache.org/) or [Dask](https://dask.org/), but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set. Under the hood, Tuplex is based on data-driven compilation and dual-mode processing, two key techniques that make it possible for Tuplex to provide speed comparable to a pipeline written in hand-optimized C++.

You can join the discussion on Tuplex on our [Gitter community](https://gitter.im/tuplex/community) or read up more on the background of Tuplex in our [SIGMOD'21 paper](https://dl.acm.org/doi/abs/10.1145/3448016.3457244).
@@ -47,15 +47,15 @@ To install Tuplex, simply install the dependencies first and then build the pack
To build Tuplex, you first need several other packages, which can easily be installed via [brew](https://brew.sh/).
```
brew install llvm@9 boost boost-python3 aws-sdk-cpp pcre2 antlr4-cpp-runtime googletest gflags yaml-cpp celero protobuf libmagic
python3 -m pip install cloudpickle numpy
python3 setup.py install
```
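Before running `python3 setup.py install`, a quick check that `cloudpickle` and `numpy` actually import can catch a skipped or failed pip step early. This is a minimal sketch: `missing_modules` is a hypothetical helper, not part of the Tuplex repository, and it assumes `python3` is on `PATH`.

```shell
# Hypothetical helper (not part of Tuplex): print each Python module that
# python3 cannot import, one per line; print nothing if all of them import.
missing_modules() {
  for m in "$@"; do
    python3 -c "import $m" 2>/dev/null || printf '%s\n' "$m"
  done
}

# Usage sketch: only start the build once the pip step has succeeded.
#   missing="$(missing_modules cloudpickle numpy)"
#   if [ -n "$missing" ]; then
#     echo "missing Python modules: $missing" >&2
#   else
#     python3 setup.py install
#   fi
```

The same check applies unchanged to the Ubuntu instructions, which install the same two packages.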

#### Ubuntu build from source
To facilitate installing the dependencies on Ubuntu, we provide two scripts (`scripts/ubuntu1804/install_reqs.sh` for Ubuntu 18.04 and `scripts/ubuntu2004/install_reqs.sh` for Ubuntu 20.04). To build an up-to-date version of Tuplex, simply run
```
./scripts/ubuntu1804/install_reqs.sh
python3 -m pip install cloudpickle numpy
python3 setup.py install
```
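Which of the two scripts applies depends on the Ubuntu release. As a sketch, assuming the release is read from `VERSION_ID` in `/etc/os-release` (present on standard Ubuntu installs), a hypothetical helper can pick the matching script and refuse releases for which no script exists:

```shell
# Hypothetical helper (not part of Tuplex): map an Ubuntu release such as
# 18.04 to the requirements script for that release.
reqs_script_for() {
  case "$1" in
    18.04) echo "scripts/ubuntu1804/install_reqs.sh" ;;
    20.04) echo "scripts/ubuntu2004/install_reqs.sh" ;;
    *)
      echo "no install_reqs.sh for Ubuntu $1" >&2
      return 1
      ;;
  esac
}

# Usage sketch, run from the repository root:
#   . /etc/os-release
#   script="$(reqs_script_for "$VERSION_ID")" && "./$script"
```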

@@ -77,6 +77,7 @@ To customize the cmake build, the following options are available to be passed v
| `CMAKE_BUILD_TYPE` | `Release` (default), `Debug`, `RelWithDebInfo`, `tsan`, `asan`, `ubsan` | select compile mode. Tsan/Asan/Ubsan correspond to Google Sanitizers. |
| `BUILD_WITH_AWS` | `ON` (default), `OFF` | build with AWS SDK or not. On Ubuntu this will build the Lambda executor. |
| `BUILD_WITH_ORC` | `ON`, `OFF` (default) | build with ORC file format support. |
| `BUILD_NATIVE` | `ON`, `OFF` (default) | build with `-march=native` to target platform architecture. |
| `SKIP_AWS_TESTS` | `ON` (default), `OFF` | skip AWS tests; helpful when no AWS credentials or AWS Tuplex chain are set up. |
| `GENERATE_PDFS` | `ON`, `OFF` (default) | in Debug mode, output PDF files of UDF ASTs, query plans, ... if graphviz is installed (e.g., `brew install graphviz`). |
| `PYTHON3_VERSION` | `3.6`, ... | select the Python 3 version to build against, specified as `major.minor`. To specify the Python executable, use the options provided by [cmake](https://cmake.org/cmake/help/git-stage/module/FindPython3.html). |
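The sketch below checks a `CMAKE_BUILD_TYPE` value against the modes listed in the table and derives the `major.minor` form that `PYTHON3_VERSION` expects. Both functions are hypothetical helpers. The usage comment assumes the options are given as ordinary `-D` cache definitions to a direct, out-of-source `cmake` call, which is not necessarily how the Tuplex build forwards them.

```shell
# Hypothetical helper: accept only the CMAKE_BUILD_TYPE values listed in the table.
is_valid_build_type() {
  case "$1" in
    Release|Debug|RelWithDebInfo|tsan|asan|ubsan) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reduce a full version such as 3.7.12 to major.minor (3.7).
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Usage sketch (assumes a plain cmake call from a build directory):
#   build_type=asan
#   is_valid_build_type "$build_type" || { echo "bad build type: $build_type" >&2; exit 1; }
#   pyver="$(python3 -c 'import platform; print(platform.python_version())')"
#   cmake -DCMAKE_BUILD_TYPE="$build_type" -DBUILD_WITH_AWS=OFF \
#         -DPYTHON3_VERSION="$(major_minor "$pyver")" ..
```

Matching the build type exactly (including case) mirrors the table, where the sanitizer modes are lowercase and the standard modes are capitalized.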
scripts/build_wheel_linux.sh (5 changes: 5 additions & 0 deletions)
@@ -16,6 +16,11 @@ export CIBW_MANYLINUX_X86_64_IMAGE='registry-1.docker.io/tuplex/ci:latest'
# Use the following line to build only the Python 3.9 wheel
export CIBW_BUILD="cp39-*"


# For Google Colab compatible wheel, use the following:
export CIBW_BUILD="cp37-*"
export CIBW_ARCHS_LINUX="x86_64"

# to test the others from 3.7-3.9, use these two lines:
#export CIBW_BUILD="cp3{7,8,9}-*"
#export CIBW_SKIP="cp3{5,6,7,8}-macosx* pp*"
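`CIBW_BUILD` is exported twice in this hunk, so the later Colab line (`cp37-*`) is the value in effect when the script runs unmodified. A hypothetical helper such as the one below derives the selector from a CPython version, following the `cp37-*`/`cp39-*` pattern used above; it is a sketch, not part of the script.

```shell
# Hypothetical helper: turn a CPython version (3.7, 3.10, or 3.7.12)
# into a cibuildwheel build selector such as cp37-*.
cibw_selector() {
  major="${1%%.*}"     # text before the first dot, e.g. 3
  rest="${1#*.}"       # text after the first dot, e.g. 7 or 7.12
  minor="${rest%%.*}"  # drop any patch component, e.g. 7
  printf 'cp%s%s-*\n' "$major" "$minor"
}

# Usage sketch (equivalent to the Colab lines above):
#   export CIBW_BUILD="$(cibw_selector 3.7)"
#   export CIBW_ARCHS_LINUX="x86_64"
```

Quoting the command substitution keeps the shell from glob-expanding the `*` before cibuildwheel sees it.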
scripts/docker/colab/Dockerfile (24 changes: 24 additions & 0 deletions)
@@ -0,0 +1,24 @@
# Docker image mimicking Google Colab setup for testing purposes
# Certain large dependencies like torch (~2G) are not installed
FROM ubuntu:18.04

# Fix timezone to US
ENV TZ=America/New_York
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Python version used is 3.7.12
RUN apt-get update && apt-get install -y wget curl build-essential zlib1g-dev libssl1.0-dev libncurses-dev libgdbm-dev libz-dev tk-dev libsqlite3-dev libreadline-dev liblzma-dev libffi-dev

WORKDIR /usr/src
RUN curl -fsSL https://www.openssl.org/source/openssl-1.0.2o.tar.gz | tar xz && cd openssl-1.0.2o && ./config shared zlib --prefix=/usr/local/ && make && make install -j8

RUN wget https://www.python.org/ftp/python/3.7.12/Python-3.7.12.tgz

# Do not use --enable-optimizations because the build gets stuck
RUN tar xf Python-3.7.12.tgz && cd Python-3.7.12 && ./configure --prefix=/usr --with-openssl=/usr/local --with-ensurepip=install && make install -j8

WORKDIR /work
ADD requirements.txt /work/requirements.txt

RUN apt-get install -y libgdal-dev libcairo2-dev libjpeg-dev libgif-dev
RUN pip3 install -r /work/requirements.txt
scripts/docker/colab/README.md (15 changes: 15 additions & 0 deletions)
@@ -0,0 +1,15 @@
## Docker image to mimic Google's Colab environment
This folder contains a Dockerfile for an image that mimics (best-effort) the environment of Google's Colab.

To build the image, simply use
```
docker build -t tuplex/colab .
```

Then, start the container via
```
docker run -it tuplex/colab bash
```
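To check that a built image really provides the Python release the Dockerfile pins (3.7.12, per its comment), a small smoke test can help. `expect_colab_python` is a hypothetical helper, and the usage comment assumes the image was tagged `tuplex/colab` as above.

```shell
# Hypothetical helper: succeed only if the given version string matches
# the Python release built by the Dockerfile.
expect_colab_python() {
  expected="3.7.12"
  if [ "$1" = "$expected" ]; then
    echo "ok: Python $1"
  else
    echo "mismatch: got '$1', expected $expected" >&2
    return 1
  fi
}

# Usage sketch:
#   v="$(docker run --rm tuplex/colab python3 -c 'import platform; print(platform.python_version())')"
#   expect_colab_python "$v"
```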

---
(c) 2021 Tuplex team