@LeonhardFS
Contributor

No description provided.

LeonhardFS and others added 30 commits June 30, 2021 23:36
Website updates

Details:

* Link Github repo from front page
* Update all repo links to point to new repo
* Fix up Docker commands to include Jupyter port

authored-by: Malte Schwarzkopf (malte@cs.brown.edu)
…WS credentials (tuplex#5)

Fix Tuplex crashing when no AWS credentials are present in the environment.

Other:
- version bump to 0.3.1
- updated build-wheel

authored-by: Leonhard Spiegelberg <leonhard_spiegelberg@brown.edu>
Fix CI to create timestamp based dev versions to avoid conflicts in test.pypi.org
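
A minimal sketch of how a timestamp-based dev version could be derived so that repeated uploads to test.pypi.org do not collide. The helper name `dev_version` and the timestamp layout are assumptions for illustration; the CI script in this repository may build the string differently.

```python
from datetime import datetime, timezone


def dev_version(base: str, now: datetime) -> str:
    """Return a PEP 440 development version like '<base>.devYYYYMMDDHHMM'."""
    # Normalize to UTC so that builds from different machines sort consistently.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M")
    return f"{base}.dev{stamp}"


if __name__ == "__main__":
    # Each CI run gets a different version as long as runs are a minute apart.
    print(dev_version("0.3.1", datetime.now(timezone.utc)))
```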

authored-by: Leonhard Spiegelberg <leonhard_spiegelberg@brown.edu>
single-line fix to make GH action push to pypi.org upon tagging.

authored-by: Leonhard Spiegelberg <leonhard_spiegelberg@brown.edu>
Update Ubuntu 20.04 install requirements scripts.

Details:
- Setup now installs pip3
- Setup now installs libmagic-dev

authored-by: Malte Schwarzkopf (malte@cs.brown.edu)
Fix bug in aggregateByKey function of Dataset where an old deprecated function was used.

Details: 

- use correct get_source function
- change wide exception handlers to narrow ones
- add python end-to-end tests for aggregate and aggregateByKey

authored-by: Rahul Yesantharao (rahuly@mit.edu)
coauthored-by: Leonhard Spiegelberg (lspiegel@cs.brown.edu)
Reduce compiler warnings and replace temporary file creation.

Details:
- Replace std::tmpnam use with mkstemp because tmpnam(3) is deprecated and triggers compiler warnings.
- Add option to suppress #warning output
- The #warning directives create a lot of output spam that makes it hard to see "real" warnings.
- Adds a new option (SHOW_EXPLICIT_WARNINGS) to enable them (default off), otherwise suppresses them via -Wno-cpp.
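
The change itself is in C++, but the same idea (create and open the temporary file in one step instead of only generating a name) can be illustrated with Python's standard library, where `tempfile.mkstemp` plays the role of mkstemp(3). The helper `write_temp` is a hypothetical name used only for this sketch.

```python
import os
import tempfile


def write_temp(data: bytes, suffix: str = ".tmp") -> str:
    """Create a temporary file atomically, write data to it, return its path."""
    # mkstemp creates AND opens the file, so no other process can grab
    # the name between "pick a name" and "open the file".
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


path = write_temp(b"hello")
with open(path, "rb") as f:
    content = f.read()
os.remove(path)  # the caller is responsible for cleanup
```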

authored by: Malte Schwarzkopf <malte@cs.brown.edu>
Shipped GCC 9.3 leads to build errors, therefore force use of GCC 10 on Ubuntu 20.04.

authored by: Malte Schwarzkopf (malte@cs.brown.edu)
… functions (tuplex#25)

Add support for builtin iterator functions `iter, zip, enumerate, next, reversed`. Fix bug when calling `len` on an empty list `[]`. Add support for multiple identifiers in loop, e.g. `for a, b in (t1, t2), (t3, t4)`

Details:

- Add iterator type and related functions for typing
- Add symbol for iterator-related functions (iter, zip, enumerate, next, reversed)
- Add iterator-specific annotation
- Add iterator-related helping functions in LLVMEnvironment
- Fix getListType for list of tuples
- Refactor code for error handling for unsupported types
- Add iterator core class
- Add functions for creating iterator-related calls in FunctionRegistry
- Fix the case when expression (also called testlist) in for loop contains multiple elements, i.e. "for a, b in (t1, t2), (t3, t4)" should work now
- Update BlockGeneratorVisitor
   1. Update visit NCall, declareVariables, assignToSingleVariable to make iterator related calls work
   2. Fix codegen for list of tuples
   3. Add support for for loops with iterator as expression
- Use fallback mode for mixed AST node types in for loop exprlist for now
- Add tests about iterators
- Fix len() call on EMPTYLIST
- Fix bug in UnrollLoopsVisitor. Add tests
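
For orientation, the language features named above look as follows in plain Python. This is only a sketch of the constructs; the function names are made up for illustration and it has not been checked which of these exact UDF shapes Tuplex compiles without falling back to the interpreter.

```python
def pair_sums(xs, ys):
    out = []
    for a, b in zip(xs, ys):          # zip
        out.append(a + b)
    return out


def indexed(xs):
    out = []
    for i, x in enumerate(xs):        # enumerate
        out.append((i, x))
    return out


def first_item(xs):
    it = iter(xs)                     # iter
    return next(it)                   # next


def backwards(xs):
    out = []
    for x in reversed(xs):            # reversed
        out.append(x)
    return out


def multi_target():
    out = []
    # multiple identifiers with a multi-element expression list
    for a, b in (1, 2), (3, 4):
        out.append(a * b)
    return out


def empty_len():
    return len([])                    # len on an empty list
```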

authored by: Yunzhi Shao (yunzhi_shao@brown.edu)
Add preliminary support for ORC file format with both input and output support.

Details:
- refactor FileInputOperator
- add ORC input support
- add ORC output support
- add #ifndef to comment checks leading to slowdown when deserializing rows in Release mode.

authored by: Benjamin Givertz (benjamin_givertz@brown.edu)
Details:
- updated required packages for MacOS
- Fix for CMake when under MacOS snappy lib is installed with brew
- Fix for Google Colab

authored by: Leonhard Spiegelberg (lspiegel@cs.brown.edu)
Details:
- Add flag to disable aws tests, disabled per default
- Add ORC option which is disabled by default until ORC tests are figured out

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
Fix compile settings to produce more compatible wheel.

Details:
- fix top-level README.md, should still address some more issues from tuplex#30
- disable march=native per default
- change interactive shell to only be imported when correct toolkit is present

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
…lex#37)

Improve column renaming by allowing position based renaming (via integer index).
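
The idea of accepting either a column name or an integer position can be sketched in plain Python. `rename_column` below is a hypothetical helper that operates on a list of column names; it is not the Tuplex API and only illustrates the lookup logic.

```python
def rename_column(columns, key, new_name):
    """Return a copy of columns with one entry renamed.

    key may be the old column name (str) or a zero-based position (int).
    """
    cols = list(columns)
    if isinstance(key, int):
        if not 0 <= key < len(cols):
            raise IndexError(f"column index {key} out of range")
        idx = key
    else:
        idx = cols.index(key)  # raises ValueError if the name is unknown
    cols[idx] = new_name
    return cols
```

With `["a", "b", "c"]` as input, `rename_column(cols, 1, "price")` and `rename_column(cols, "b", "price")` produce the same result.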

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
…eLists.txt (tuplex#39)

Fix various issues in CMake to allow build on Ubuntu 20.04.

Details:
- Remove LTO for GCC on Linux because it's buggy
- Add using ucm for cmake options
- Clean up top-level cmake file
- Make sure BUILD_FOR_LAMBDA/BUILD_NATIVE do not conflict
- Add check on whether ninja is installed
- Restrict top-level setup.py to build shared object only
- better detection of runtime to skip bootstrapped tuplex_runtime.py file
- Fix shell autocompletion by removing old version dependency for jedi/prompt_toolkit

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
Add support for the `is` keyword when one operand is `bool` or `None`. This allows writing code that conforms to the linters of several IDEs.
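
A small sketch of the kind of UDF this enables. Linters commonly flag `x == None` and suggest `x is None`; the function name `clean` is only illustrative.

```python
def clean(x):
    # Identity comparisons against None and the bool singletons.
    if x is None:
        return 0
    if x is True:
        return 1
    if x is False:
        return -1
    return x
```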

authored-by: Yash Gotmare <sgotmare@cs.brown.edu>
Details:
- Update CMake file to better handle brewed versions
- enabling ORC feature in CI, still turned off per default until stability is reached

authored-by: Benjamin Givertz <benjamin_givertz@brown.edu>
Co-authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
Details:
- Add support for Join/Aggregate operator
- Fix webui documentation, i.e. `webui.enable` instead of `webui`
- Update footer
- Package webui with tuplex using pip
- Add autostart/autoshutdown of mongodb/webui on local machine when webui is enabled
- Update dependencies, newer astor required to avoid bugs
- Change `"."` to not be detected as floating point number
- Change docker file to include MongoDB in CI container
- Passing extra CMake args to top-level setup via environment variable `CMAKE_ARGS`
- Updating root level `setup.py` to build tests as well when right option is given (i.e. set `TUPLEX_BUILD_ALL=1` as environment variable)
- Update azure CI to use root-level `setup.py` script 
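
A sketch of how a `setup.py` might read the two environment variables mentioned above. Only the names `CMAKE_ARGS` and `TUPLEX_BUILD_ALL` come from the commit message; the helper name and the example flag are assumptions, and the real script may parse them differently.

```python
import os
import shlex


def cmake_configure_args(environ=None):
    """Return (extra_cmake_args, build_all) derived from the environment."""
    env = os.environ if environ is None else environ
    # shlex.split honors shell-style quoting, so a value with spaces survives.
    extra_args = shlex.split(env.get("CMAKE_ARGS", ""))
    # Build tests as well only when explicitly requested.
    build_all = env.get("TUPLEX_BUILD_ALL", "0") == "1"
    return extra_args, build_all


# Example with a made-up flag, purely for illustration.
args, build_all = cmake_configure_args(
    {"CMAKE_ARGS": "-DEXAMPLE_FLAG=ON -DNOTE='a b'", "TUPLEX_BUILD_ALL": "1"}
)
```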

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
Co-authored-by: Colby Anderson <colby_anderson@brown.edu>
Add auto-unrolling for loops to allow compiling UDFs that contain type changes during loop execution by unrolling the first iteration if type changes result in type-stability starting from the second iteration onwards.

Details:
- Add type speculation and first iteration auto-unrolling for loops (for/while)
- Add tests for continue and break in traced typing
- Add new CompileError TYPE_ERROR_UNSUPPORTED_LOOP_TESTLIST_TYPE
- Refactored TraceVisitor/BlockGeneratorVisitor/UDF + operators to use a new struct CompilePolicy to determine behavior of UDF compiler
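
To make the type-stability argument concrete, here is a plain-Python sketch of a UDF whose loop variable changes type in the first iteration only, followed by a hand-unrolled equivalent. Both functions are illustrative; they show the transformation conceptually and are not the compiler's actual output.

```python
def running_mean(values):
    acc = 0                                  # int before the loop
    for i, v in enumerate(values):
        acc = (acc * i + v) / (i + 1)        # float after the first iteration
    return acc


def running_mean_unrolled(values):
    acc = 0
    it = enumerate(values)
    first = next(it, None)
    if first is not None:
        i, v = first
        acc = (acc * i + v) / (i + 1)        # first iteration: int -> float
        for i, v in it:                      # remaining iterations: acc stays float
            acc = (acc * i + v) / (i + 1)
    return acc
```

Inside the remaining loop `acc` has a single type, which is what makes the loop body compilable with static types.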

authored by: Yunzhi Shao <yunzhi_shao@brown.edu>
Coauthored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
Add new function to retrieve a batch of row objects instead of fine-grained per row locking which made certain tests quite slow to execute.

Details:
- Add new spilling metrics
- Add new batch wise get function for result set
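
The difference between per-row and batch-wise retrieval can be sketched as follows. `ResultSet`, `next_row`, and `next_batch` are hypothetical names for this illustration, not the actual C++ interface; the counter exists only to make the number of lock acquisitions visible.

```python
import threading
from collections import deque


class ResultSet:
    def __init__(self, rows):
        self._rows = deque(rows)
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def next_row(self):
        """Fine-grained access: one lock acquisition per row."""
        with self._lock:
            self.lock_acquisitions += 1
            return self._rows.popleft() if self._rows else None

    def next_batch(self, max_rows):
        """Batch access: one lock acquisition for up to max_rows rows."""
        with self._lock:
            self.lock_acquisitions += 1
            n = min(max_rows, len(self._rows))
            return [self._rows.popleft() for _ in range(n)]


rs = ResultSet(range(10))
batch = rs.next_batch(4)   # takes the lock once for four rows
```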

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
Lambda runner now part of pip package, various bugfixes for setup. New one-line python based setup function for AWS Lambda deployment.

- Update cmake files for recent GoogleTest master -> main renaming, fix ANTLR issues, better Python3 detection
- Add docker script to build lambda
- Add warning in reflection.py for time magic
- Update PCRE2 link
- Add new python based script to package Lambda runner
- Change Github action update to build lambda runner automatically
- Exclude new musllinux platform in cibuildwheel
- Bugfix for boolean options, now correctly passed
- Force openssl backed curl because of NSS bugs
- Fix a couple contextoptions -> python conversions
- Add experimental logging to python from C++
- Speed up credential retrieval through smarter chain
- Refactor credentials for S3, using faster way to infer region. No more costly EC2 requests.
- Enable orc in lambda runner

authored-by: Leonhard Spiegelberg <lspiegel@cs.brown.edu>
… S3 (tuplex#62)

Add list functionality to both Posix and AWS S3 VirtualFileSystems. Pin CMake to 3.21 to avoid a conflict with the AWS SDK. Improve Lambda packaging.

Details:
- Fix S3 uri key/bucket extraction
- Add filesystem ls function for both Posix and AWS S3
- Fix permissions to make the zip package executable and change mode to deflated
- Fix tuplex_runtime location in runner
- Add support for temp credentials in Lambda via session token
- Restrict cmake to 3.21 because newer cmake 3.22 breaks aws sdk as reported in aws/aws-sdk-cpp#1820
- Update lambda cost calculation
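
As an illustration of the bucket/key extraction mentioned in the list above, a minimal Python sketch could look like this. `split_s3_uri` is a hypothetical helper, and the bucket and key in the comment are made-up values; the actual fix lives in the C++ S3 file system code.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket, the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return bucket, key
```

A URI without a key, such as `s3://my-bucket`, yields an empty key rather than an error in this sketch.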

authored by: Leonhard Spiegelberg (leonhard@brown.edu)
Fix bug where non-conforming rows will crash Tuplex when the majority type is a simple tuple containing var length fields due to a missing type check in parallelize.

authored by: Ben Givertz (ben_givertz@brown.edu)
… anymore (tuplex#59)

Fix bug caused by specifying directories like `.` or `..`. Instead of deleting the output directory, a check is run and an error message emitted.

Details:
- Exclude `.` and `..` from `ls` command in Posix option
- Add various Posix fs helpers in `Utils.h`
- Add tests to prevent deletion of `.` and other protected directories
- Add output validation check at action invocation (e.g., ds.tocsv) and during pipeline execution to cover most cases
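
One possible shape of such an output check, sketched in Python under the assumption that "protected" means the current working directory or any of its ancestors (which covers `.`, `..`, and `/`). The helper name is hypothetical, the sketch assumes a POSIX file system, and the real helpers are the C++ ones in `Utils.h`.

```python
import os


def is_protected_output_dir(path, cwd=None):
    """Return True if path resolves to cwd or to one of its ancestors."""
    cwd = os.path.realpath(cwd or os.getcwd())
    target = os.path.realpath(os.path.join(cwd, path))
    # target is an ancestor of (or equal to) cwd exactly when the common
    # path of both is target itself.
    return os.path.commonpath([cwd, target]) == target
```

A caller would emit an error instead of clearing the directory whenever this returns `True`.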

authored-by: Ben Givertz <ben_givertz@brown.edu>
Co-authored-by: Leonhard Spiegelberg <leonhard@brown.edu>
…plex#64)

Fix crash when invoking a pipeline involving integers and output correct result for `is` when integers are used. This is due to the weird quirk of CPython caching integers from `-5` to `256`.
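
The quirk can be observed directly in an interpreter. The sketch below builds the integers at run time via `int(...)` so that the compiler does not merge equal constants; the two identity results are implementation details of CPython and are therefore printed rather than asserted.

```python
def same_value(a, b):
    # Value comparison: well defined for all integers.
    return a == b


small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

# Identity comparison: depends on CPython's small-integer cache (-5 to 256).
print(small_a is small_b)
print(big_a is big_b)

# Value comparison is what UDFs should normally rely on.
print(same_value(big_a, big_b))
```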

authored-by: Leonhard Spiegelberg <leonhard@brown.edu>
Fix increase of ref count after calling GetItem on List.

authored-by: Leonhard Spiegelberg <leonhard@brown.edu>
… SIGMOD'21 results. (tuplex#68)

Add detailed instructions and CLI to reproduce experiments from SIGMOD'21 paper.

Details:
- Update script to reflect AWS SDK changes
- Fix spark 2.4 link
- Fix python package versions
- Add detailed AWS setup instructions
- Add NUM_RUNS env variable if available
- Update boost link
- Update sbt installation
- Add missing libmagic-dev
- Make boto3 optional
- Add top-level CLI script to fetch data, run experiments, plot figures
- Comment out output path validation because it is too buggy to ship yet.

authored-by: Leonhard Spiegelberg <leonhard@brown.edu>
@LeonhardFS LeonhardFS changed the title [HOTFOX] Fix for key with optional tuplex. prefixing [HOTFIX] Fix for key with optional tuplex. prefixing Dec 15, 2021
LeonhardFS added a commit to LeonhardFS/tuplex-public that referenced this pull request Jul 3, 2022
@LeonhardFS LeonhardFS closed this Jul 3, 2022
LeonhardFS added a commit that referenced this pull request Jul 7, 2022
)

Fix various bugs and improve cmake for more python setups.

Details:
- Fix for key with optional tuplex. prefixing #69
- Fix hashing for rows with more complex types
- Fix running tests on a machine with multiple users by prefixing the scratch dir with the user name
- Improve python interpreter detection in cmake
- Remove unused function from test to find python stdlib
- Fix for missing module, or import failure
- Add another mod import fallback
- Add support for typing.Optional[...] type hint and catch errors when extracting type annotations in CSV
- Add another code extract fallback using new ast.unparse feature for python 3.9+
- Fix bug94 by making fallback partitions immortal as well
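
A sketch of how `typing.Optional[...]` hints can be recognized and how annotation extraction can be guarded against errors, using only the standard `typing` module. The helper names `split_optional` and `safe_type_hints` are assumptions for illustration and do not mirror Tuplex's internal code.

```python
from typing import Optional, Union, get_args, get_origin, get_type_hints


def split_optional(hint):
    """Return (inner_type, True) for Optional[inner_type], else (hint, False)."""
    if get_origin(hint) is Union:
        args = get_args(hint)
        inner = [a for a in args if a is not type(None)]
        if len(args) == 2 and len(inner) == 1:
            return inner[0], True
    return hint, False


def safe_type_hints(func):
    """Extract type hints, returning {} if the annotations cannot be resolved."""
    try:
        return get_type_hints(func)
    except (NameError, TypeError):
        return {}


def parse_price(s: Optional[str]) -> Optional[float]:
    return None if s is None else float(s)


def broken(x: "NoSuchType"):
    return x


hints = safe_type_hints(parse_price)
```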

authored-by: Leonhard Spiegelberg <leonhard@brown.edu>
6 participants