Merged
14 changes: 7 additions & 7 deletions README.md
@@ -41,10 +41,10 @@ print(res)
### Quickstart
To try out Tuplex, simply try out the following starter notebooks using Google Colab:

| Name | Link | Description |
|-------------------------|------------------|---------------------------------------------------------------------|
| (01) Intro to Tuplex | [Google Colab](https://colab.research.google.com/drive/1idqCRmvN-9_F2naJ6k1hbslbQT-2bAqa?usp=sharing) | Basic commands to manipulate columns and modify data with user code |
| (02) Working with Files | [Google Colab](https://colab.research.google.com/drive/10gOYUpxK_Bjkw11WYupuaflATsBPRgU0?usp=sharing) | Loading and saving files, detecting types. |
| Name | Link | Description |
|--------------------------------|------------------|------------------------------------------------------------|
| 1. Intro to Tuplex | [Google Colab](https://colab.research.google.com/drive/1idqCRmvN-9_F2naJ6k1hbslbQT-2bAqa?usp=sharing) | Basic commands to manipulate columns and modify data with user code. |
| 2. Working with Files | [Google Colab](https://colab.research.google.com/drive/10gOYUpxK_Bjkw11WYupuaflATsBPRgU0?usp=sharing) | Loading and saving files, detecting types. |


More examples can be found [here](https://tuplex.cs.brown.edu/gettingstarted.html).
@@ -53,7 +53,7 @@ More examples can be found [here](https://tuplex.cs.brown.edu/gettingstarted.htm
To install Tuplex, you can use a PyPi package for Linux or MacOS(Intel), or a Docker container which will launch a jupyter notebook with Tuplex preinstalled.
#### Docker
```
docker run -p 8888:8888 tuplex/tuplex
docker run -p 8888:8888 tuplex/tuplex:v0.3.5
```
#### PyPI
```
@@ -66,7 +66,7 @@ Tuplex is available for MacOS and Linux. The current version has been tested und
To install Tuplex, simply install the dependencies first and then build the package.

#### MacOS build from source
To build Tuplex, you need several other packages first which can be easily installed via [brew](https://brew.sh/). If you want to build Tuplex with AWS support, you need `macOS 10.13+`.
To build Tuplex, you need several other packages first which can be easily installed via [brew](https://brew.sh/). If you want to build Tuplex with AWS support, you need `macOS 10.13+`. Python 3.9 or earlier requires an older cloudpickle version (1.6.0) whereas Python 3.10+ requires cloudpickle 2.1.0+.
```
brew install llvm@9 boost boost-python3 aws-sdk-cpp pcre2 antlr4-cpp-runtime googletest gflags yaml-cpp celero protobuf libmagic
python3 -m pip install 'cloudpickle<2.0' numpy
@@ -102,7 +102,7 @@ To customize the cmake build, the following options are available to be passed v
| `BUILD_NATIVE` | `ON`, `OFF` (default) | build with `-march=native` to target platform architecture. |
| `SKIP_AWS_TESTS` | `ON` (default), `OFF` | skip aws tests, helpful when no AWS credentials/AWS Tuplex chain is setup. |
| `GENERATE_PDFS` | `ON`, `OFF` (default) | output in Debug mode PDF files if graphviz is installed (e.g., `brew install graphviz`) for ASTs of UDFs, query plans, ...|
| `PYTHON3_VERSION` | `3.6`, ... | when trying to select a python3 version to build against, use this by specifying `major.minor`. To specify the python executable, use the options provided by [cmake](https://cmake.org/cmake/help/git-stage/module/FindPython3.html). |
| `PYTHON3_VERSION` | `3.7`, ... | when trying to select a python3 version to build against, use this by specifying `major.minor`. To specify the python executable, use the options provided by [cmake](https://cmake.org/cmake/help/git-stage/module/FindPython3.html). |
| `LLVM_ROOT_DIR` | e.g. `/usr/lib/llvm-9` | specify which LLVM version to use |
| `BOOST_DIR` | e.g. `/opt/boost` | specify which Boost version to use. Note that the python component of boost has to be built against the python version used to build Tuplex |
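
The cloudpickle note added to the macOS build section (an older cloudpickle for Python 3.9 or earlier, 2.1.0+ for Python 3.10+) can be expressed as a small selection helper. This is only a sketch: the function name `cloudpickle_requirement` is hypothetical and not part of Tuplex, and the `<2.0` bound is taken from the `pip install 'cloudpickle<2.0'` command shown in the README, which the named version 1.6.0 satisfies.

```python
import sys


def cloudpickle_requirement(version_info=sys.version_info):
    """Return a pip requirement string for cloudpickle (hypothetical helper)."""
    # Python 3.10 and newer: the README asks for cloudpickle 2.1.0 or later.
    if tuple(version_info[:2]) >= (3, 10):
        return "cloudpickle>=2.1.0"
    # Python 3.9 and earlier: stay below 2.0, as in the README's pip command.
    return "cloudpickle<2.0"


# Print the requirement that applies to the running interpreter.
print(cloudpickle_requirement())
```

The returned string could then be passed to `python3 -m pip install`, quoted so the shell does not treat `<` or `>` as a redirection.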

4 changes: 3 additions & 1 deletion azure-pipelines.yml
@@ -24,9 +24,11 @@ jobs:
displayName: 'Install required packages'
- script: sudo apt-get install -y python3-setuptools ninja-build && sudo apt-get remove -y python-pexpect python3-pexpect && sudo python3.7 -m pip install --upgrade pip && sudo python3.7 -m pip uninstall -y pygments && sudo python3.7 -m pip install pytest pygments>=2.4.1 MarkupSafe==2.0 pexpect setuptools astor PyYAML jupyter nbformat pymongo eventlet==0.30.0 gunicorn pymongo && jupyter --version
displayName: 'Install python dependencies'
- script: cd tuplex/python && python3 -m pip install -r requirements.txt && python3 mongodb_test.py && pkill mongod || true
displayName: 'Test local MongoDB'
- script: TUPLEX_BUILD_ALL=1 CMAKE_ARGS="-DBUILD_WITH_ORC=ON -DLLVM_ROOT_DIR=/usr/lib/llvm-9 -DCMAKE_BUILD_TYPE=Release -DBUILD_FOR_CI=ON" python3 setup.py install --user
displayName: 'Build Tuplex'
- script: cd build/temp.linux-x86_64-3.7 && ctest --timeout 180 --output-on-failure
displayName: 'C++ tests'
- script: cd build/temp.linux-x86_64-3.7/dist/python && python3.7 -m pytest -x --full-trace -l --log-cli-level debug
- script: cd build/temp.linux-x86_64-3.7/dist/python && python3.7 -m pytest -x --full-trace -l --log-cli-level=DEBUG --capture=tee-sys
displayName: 'Python tests'
2 changes: 1 addition & 1 deletion doc/source/conf.py
@@ -36,7 +36,7 @@
# The short X.Y version
version="0.3"
# The full version, including alpha/beta/rc tags
release="0.3.5dev"
release="0.3.5"


# -- General configuration ---------------------------------------------------
4 changes: 2 additions & 2 deletions scripts/docker/tuplex/create-image.sh
@@ -20,10 +20,10 @@ cp -R ../../../examples/sample_data .
# build benchmark docker image
# copy from scripts to current dir because docker doesn't understand files
# outside the build context
docker build -t tuplex/tuplex:0.3.5dev -f Dockerfile . || exit 1
docker build -t tuplex/tuplex:0.3.5 -f Dockerfile . || exit 1

# is upload set?
if [[ "${UPLOAD}" == 'SET' ]]; then
docker login
docker push tuplex/tuplex:0.3.5dev
docker push tuplex/tuplex:0.3.5
fi
2 changes: 1 addition & 1 deletion scripts/set_version.py
@@ -17,7 +17,7 @@ def LooseVersion(v):


# to create a testpypi version use X.Y.devN
version = '0.3.5dev'
version = '0.3.5'

# https://pypi.org/simple/tuplex/
# or https://test.pypi.org/simple/tuplex/
2 changes: 1 addition & 1 deletion setup.py
@@ -653,7 +653,7 @@ def tplx_package_data():
# logic and declaration, and simpler if you include description/version in a file.
setup(name="tuplex",
python_requires='>=3.7.0',
version="0.3.5dev",
version="0.3.5",
author="Leonhard Spiegelberg",
author_email="tuplex@cs.brown.edu",
description="Tuplex is a novel big data analytics framework incorporating a Python UDF compiler based on LLVM "
2 changes: 1 addition & 1 deletion tuplex/historyserver/thserver/version.py
@@ -1,2 +1,2 @@
# (c) L.Spiegelberg 2017 - 2022
__version__="0.3.5dev"
__version__="0.3.5"
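
Because this release bumps the same version string in several files, a quick consistency check can catch a file that was missed. The following is a minimal sketch, not a script from the repository: `extract_versions` and `is_consistent` are hypothetical helpers, and the snippets are the post-change lines copied from this diff rather than read from disk.

```python
import re

# Matches assignments such as version="0.3.5", release="0.3.5" or __version__="0.3.5".
VERSION_RE = re.compile(r"""\b(?:__version__|version|release)\s*=\s*["']([^"']+)["']""")


def extract_versions(sources):
    """Map each file label to the first version string found in its text."""
    found = {}
    for label, text in sources.items():
        match = VERSION_RE.search(text)
        if match:
            found[label] = match.group(1)
    return found


def is_consistent(versions):
    """True when all extracted version strings are identical."""
    return len(set(versions.values())) <= 1


# Post-change lines as they appear in this diff (not read from the repository).
snippets = {
    "doc/source/conf.py": 'release="0.3.5"',
    "scripts/set_version.py": "version = '0.3.5'",
    "setup.py": 'version="0.3.5",',
    "tuplex/historyserver/thserver/version.py": '__version__="0.3.5"',
}

versions = extract_versions(snippets)
print(versions)
print("consistent:", is_consistent(versions))
```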
9 changes: 9 additions & 0 deletions tuplex/python/mongodb_test.py
@@ -0,0 +1,9 @@
import logging
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.info('testing mongodb init')

from tuplex.utils.common import find_or_start_mongodb

res = find_or_start_mongodb('localhost', 27017, './webui/data', './webui/mongod.log')
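
The new CI script relies on `find_or_start_mongodb` to locate or launch a MongoDB instance on `localhost:27017`. The diff does not show how that function works, so the sketch below does not reproduce it. It only illustrates a generic TCP probe that could be used to confirm something is listening on the port after the script ran. `port_is_open` is a hypothetical name.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. refused, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Probe the default MongoDB port used in mongodb_test.py.
print(port_is_open("localhost", 27017))
```

A successful connection only shows that some process accepts TCP connections on the port. It does not verify that the process is MongoDB.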
20 changes: 20 additions & 0 deletions tuplex/python/requirements.txt
@@ -0,0 +1,20 @@
nbconvert<7.0
jupyter<7.0
nbformat<7.0
Werkzeug<2.2.0
attrs>=19.2.0
dill>=0.2.7.1
pluggy>=0.6.0, <1.0.0
py>=1.5.2
pygments>=2.4.1
pytest>=5.3.2
six>=1.11.0
wcwidth>=0.1.7
astor
prompt_toolkit>=2.0.7
jedi>=0.13.2
cloudpickle>=0.6.1,<2.0.0
PyYAML>=3.13
psutil
pymongo
iso8601
2 changes: 1 addition & 1 deletion tuplex/python/setup.py
@@ -29,7 +29,7 @@

setup(
name="Tuplex",
version="0.3.5dev",
version="0.3.5",
packages=find_packages(),
package_data={
# include libs in libexec
13 changes: 7 additions & 6 deletions tuplex/python/tests/helper.py
@@ -10,10 +10,11 @@
#----------------------------------------------------------------------------------------------------------------------#

def test_options():
return {'tuplex.partitionSize' : "128KB",
"tuplex.executorMemory" : "4MB",
"tuplex.useLLVMOptimizer" : True,
"tuplex.allowUndefinedBehavior" : False,
"tuplex.webui.enable" : False,
return {'tuplex.partitionSize': "128KB",
"tuplex.executorMemory": "8MB",
"tuplex.useLLVMOptimizer": True,
"tuplex.allowUndefinedBehavior": False,
"tuplex.webui.enable": False,
"tuplex.optimizer.mergeExceptionsInOrder": True,
"tuplex.csv.selectionPushdown" : True}
"tuplex.csv.selectionPushdown": True,
"tuplex.scratchDir": ".cache/"}
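
Most test files in this PR switch to the pattern `conf = test_options()` followed by `conf.update({...})`. The sketch below runs that pattern with plain dictionaries, without importing Tuplex, to show what the merged dictionary contains. `test_options` is copied from the updated `helper.py`; the override values are the ones used in `test_aggregates.py`. How Tuplex itself resolves keys written with and without the `tuplex.` prefix is not shown in this diff, so the sketch makes no claim about that.

```python
def test_options():
    # Copied from the updated helper.py in this diff.
    return {'tuplex.partitionSize': "128KB",
            "tuplex.executorMemory": "8MB",
            "tuplex.useLLVMOptimizer": True,
            "tuplex.allowUndefinedBehavior": False,
            "tuplex.webui.enable": False,
            "tuplex.optimizer.mergeExceptionsInOrder": True,
            "tuplex.csv.selectionPushdown": True,
            "tuplex.scratchDir": ".cache/"}


# Pattern used in the setUp methods: start from the shared defaults,
# then layer the per-test overrides on top.
conf = test_options()
conf.update({"webui.enable": False, "driverMemory": "8MB", "partitionSize": "256KB"})

# dict.update matches keys by exact string, so the prefixed and unprefixed
# spellings end up as two separate entries in the merged dictionary.
print(conf["tuplex.partitionSize"], conf["partitionSize"])  # prints: 128KB 256KB

# Each call returns a fresh dict, so overrides in one test do not leak into another.
print(test_options() is test_options())  # prints: False
```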
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_aggregates.py
@@ -16,10 +16,12 @@
from tuplex import *
import typing
import os
from helper import test_options

class TestAggregates(unittest.TestCase):
def setUp(self):
self.conf = {"webui.enable": False, "driverMemory": "8MB", "partitionSize": "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable": False, "driverMemory": "8MB", "partitionSize": "256KB"})

def test_simple_count(self):
c = Context(self.conf)
6 changes: 4 additions & 2 deletions tuplex/python/tests/test_arithmetic.py
@@ -14,11 +14,13 @@
import random
import numpy as np
from tuplex import *

from helper import test_options

class TestArithmetic(unittest.TestCase):
def setUp(self):
self.conf = {"webui.enable": False, "driverMemory": "8MB", "partitionSize": "256KB", "tuplex.optimizer.mergeExceptionsInOrder": True}
self.conf = test_options()
self.conf.update({"webui.enable": False, "driverMemory": "8MB",
"partitionSize": "256KB", "tuplex.optimizer.mergeExceptionsInOrder": True})

def test_add(self):
c = Context(self.conf)
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_closure.py
@@ -12,11 +12,13 @@
from unittest import TestCase
import tuplex
import time
from helper import test_options

class TestClosure(TestCase):

def setUp(self):
self.c = tuplex.Context(webui=False)
self.conf = test_options()
self.c = tuplex.Context(self.conf)


def testGlobalVar(self):
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_columns.py
@@ -12,11 +12,13 @@
from unittest import TestCase

import tuplex
from helper import test_options

class TestColumns(TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})
self.c = tuplex.Context(self.conf)

def test_withColumnNew(self):
4 changes: 1 addition & 3 deletions tuplex/python/tests/test_config.py
@@ -15,9 +15,7 @@
from tuplex.utils.common import *

class TestConfig(unittest.TestCase):



# DO NOT USE test_options() here, these tests are designed to actually test options...
def testNestedDictOptions(self):

c = Context(conf={'executorMemory':'1MB', 'executorCount':3})
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_csv.py
@@ -12,6 +12,7 @@
import unittest
import os
from tuplex import *
from helper import test_options

class TestCSV(unittest.TestCase):

@@ -30,7 +31,8 @@ def setUp(self):
self._generate_csv_file('test.csv', ',')
self._generate_csv_file('test.tsv', '\t')
self._generate_csv_file('test_header.csv', ',', True)
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})

def tearDown(self):
os.remove('test.csv')
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_dictionaries.py
@@ -12,11 +12,13 @@
import unittest
from tuplex import *
from math import isclose
from helper import test_options

class TestDictionaries(unittest.TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})

# test pop(), popitem()
def test_attributes(self):
7 changes: 5 additions & 2 deletions tuplex/python/tests/test_exceptions.py
@@ -13,12 +13,15 @@
from tuplex import Context
from random import randint, sample, shuffle
from math import floor
from helper import test_options

class TestExceptions(unittest.TestCase):

def setUp(self):
self.conf = {"tuplex.webui.enable": False, "executorCount": 8, "executorMemory": "256MB", "driverMemory": "256MB", "partitionSize": "256KB", "tuplex.optimizer.mergeExceptionsInOrder": False}
self.conf_in_order = {"tuplex.webui.enable": False, "executorCount": 8, "executorMemory": "256MB", "driverMemory": "256MB", "partitionSize": "256KB", "tuplex.optimizer.mergeExceptionsInOrder": True}
self.conf = test_options()
self.conf.update({"tuplex.webui.enable": False, "executorCount": 8, "executorMemory": "256MB", "driverMemory": "256MB", "partitionSize": "256KB", "tuplex.optimizer.mergeExceptionsInOrder": False})
self.conf_in_order = test_options()
self.conf_in_order.update({"tuplex.webui.enable": False, "executorCount": 8, "executorMemory": "256MB", "driverMemory": "256MB", "partitionSize": "256KB", "tuplex.optimizer.mergeExceptionsInOrder": True})

def test_merge_with_filter(self):
c = Context(self.conf_in_order)
5 changes: 3 additions & 2 deletions tuplex/python/tests/test_fallback.py
@@ -12,13 +12,14 @@
import unittest
from tuplex import *
import numpy as np

from helper import test_options

# test fallback functionality, i.e. executing cloudpickled code
class TestFallback(unittest.TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})
self.c = Context(self.conf)

def testArbitaryObjecsts(self):
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_filter.py
@@ -11,12 +11,14 @@

import unittest
from tuplex import *
from helper import test_options

# test filter functionality
class TestFilter(unittest.TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})

def testFilter(self):
c = Context(self.conf)
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_index.py
@@ -11,11 +11,13 @@

import unittest
from tuplex import *
from helper import test_options

class TestTuples(unittest.TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})

def testIndexI(self):
c = Context(self.conf)
4 changes: 3 additions & 1 deletion tuplex/python/tests/test_inspect.py
@@ -12,12 +12,14 @@
import typing
import unittest
from tuplex import *
from helper import test_options

# test filter functionality
class TestInspection(unittest.TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "8MB", "partitionSize" : "256KB"})

def testTypes(self):
""" test .types property """
6 changes: 5 additions & 1 deletion tuplex/python/tests/test_is.py
@@ -1,12 +1,16 @@
import tuplex
from unittest import TestCase

from helper import test_options

"""
Tests functionality for `is` keyword.
"""
class TestIs(TestCase):

def setUp(self):
self.conf = {"webui.enable": False, "executorCount": "0"}
self.conf = test_options()
self.conf.update({"webui.enable": False, "executorCount": "0"})
self.c = tuplex.Context(self.conf)

def test_boolIsBool(self):
5 changes: 3 additions & 2 deletions tuplex/python/tests/test_lists.py
@@ -12,12 +12,13 @@
import unittest
from tuplex import *
from math import isclose

from helper import test_options

class TestLists(unittest.TestCase):

def setUp(self):
self.conf = {"webui.enable" : False, "driverMemory" : "16MB", "partitionSize" : "256KB"}
self.conf = test_options()
self.conf.update({"webui.enable" : False, "driverMemory" : "16MB", "partitionSize" : "256KB"})

def test_subscripts(self):
c = Context(self.conf)
6 changes: 4 additions & 2 deletions tuplex/python/tests/test_logical.py
@@ -11,12 +11,14 @@

import unittest
from tuplex import *

from helper import test_options

class TestLogical(unittest.TestCase):

def __init__(self, *args, **kwargs):
self.conf = {"webui.enable": False, "driverMemory": "64MB", "executorMemory": "2MB", "partitionSize": "128KB"}
self.conf = test_options()
self.conf.update({"webui.enable": False, "driverMemory": "64MB",
"executorMemory": "2MB", "partitionSize": "128KB"})
super(TestLogical, self).__init__(*args, **kwargs)

def testAnd(self):