Merged
71 commits
5c99143
script updates to reflect AWS SDK changes
LeonhardFS Dec 7, 2021
4f8fbb6
fix spark link
Dec 8, 2021
c3de456
fixing python package versions
Dec 8, 2021
ee866fa
fix numpy to 1.15.4
LeonhardFS Dec 8, 2021
e67f849
repro effort
LeonhardFS Dec 8, 2021
5bf274a
readme update
LeonhardFS Dec 8, 2021
5a2b2e7
AWS setup instructions
LeonhardFS Dec 8, 2021
142202e
more description text
LeonhardFS Dec 9, 2021
9ddef92
plotting updated, fig3 done
LeonhardFS Dec 9, 2021
aba9167
fig3, fig7, table3 done
LeonhardFS Dec 9, 2021
c51d19a
figure 4 done
LeonhardFS Dec 9, 2021
a91d4dc
figure 8
LeonhardFS Dec 9, 2021
797befd
added figure9
LeonhardFS Dec 9, 2021
80cfd6b
added figure6
LeonhardFS Dec 9, 2021
34578ad
results
LeonhardFS Dec 9, 2021
67f2348
cleanup, plotting now fully working
LeonhardFS Dec 9, 2021
a617955
build script
LeonhardFS Dec 9, 2021
aede534
using NUM_RUNS env variable if available
LeonhardFS Dec 9, 2021
b5629d1
update boost link
LeonhardFS Dec 9, 2021
e55cf2f
fix
LeonhardFS Dec 9, 2021
92b32b9
new sbt install script
LeonhardFS Dec 9, 2021
b5cb722
fix
LeonhardFS Dec 9, 2021
b0cb954
missing libmagic added
LeonhardFS Dec 9, 2021
e2e156b
updated dependencies
LeonhardFS Dec 9, 2021
5a63493
update build script
LeonhardFS Dec 9, 2021
4a6df1d
compile fix
LeonhardFS Dec 9, 2021
dc2abee
fix
LeonhardFS Dec 9, 2021
16ebc5d
make boto3 optional
LeonhardFS Dec 9, 2021
3960b74
Bug fix for file output
bgivertz Dec 10, 2021
1961371
Bug fix for file output
bgivertz Dec 10, 2021
8f1ba2a
force add files
LeonhardFS Dec 10, 2021
335d025
removed debug print, why is this on master?
LeonhardFS Dec 10, 2021
9bef5a5
trying to normalize pattern
LeonhardFS Dec 10, 2021
52f2f70
removing tuplex output dirs because of the output validation -.-
LeonhardFS Dec 10, 2021
a59c4ba
add logs
LeonhardFS Dec 10, 2021
3403b17
more output validation challenges
LeonhardFS Dec 10, 2021
43cb9b7
deactivating validation of output specification because it's buggy
LeonhardFS Dec 10, 2021
9e9c42c
updated help message in script
LeonhardFS Dec 10, 2021
87c35d1
container management
LeonhardFS Dec 10, 2021
11ceb99
changed uniqueFileName func
LeonhardFS Dec 10, 2021
cfe3f03
start/stop commands added
LeonhardFS Dec 10, 2021
8e891f0
added run commands
LeonhardFS Dec 10, 2021
6280a58
run
LeonhardFS Dec 10, 2021
44b0969
refactored
LeonhardFS Dec 10, 2021
2bd820e
flag fix
LeonhardFS Dec 10, 2021
e1a2274
detach container automatically
LeonhardFS Dec 10, 2021
43797e0
typo fix
LeonhardFS Dec 10, 2021
e988ceb
fix
LeonhardFS Dec 10, 2021
0713716
speculative container removal
LeonhardFS Dec 10, 2021
7d0fe78
adding missing quotes
LeonhardFS Dec 10, 2021
50c7648
lowercase
LeonhardFS Dec 10, 2021
9e9f675
more printing
LeonhardFS Dec 10, 2021
0b4f797
fix cmd
LeonhardFS Dec 10, 2021
73c7976
debug print
LeonhardFS Dec 10, 2021
2eebb5a
start container
LeonhardFS Dec 10, 2021
bfc3df4
new data download command
LeonhardFS Dec 10, 2021
6962951
link update
LeonhardFS Dec 10, 2021
d63564e
updated README
LeonhardFS Dec 10, 2021
c091188
update
LeonhardFS Dec 10, 2021
41e7942
hanging 7z
LeonhardFS Dec 10, 2021
e17647b
overwrite
LeonhardFS Dec 10, 2021
b9084dd
fix
LeonhardFS Dec 10, 2021
f314457
updating reqs
LeonhardFS Dec 10, 2021
6125b64
docker exec fix
LeonhardFS Dec 10, 2021
3c7e4fd
decode fix
LeonhardFS Dec 10, 2021
231acf7
Merge branch 'master' into sigmod-repro
LeonhardFS Dec 13, 2021
e60816c
add missing path conversion
LeonhardFS Dec 13, 2021
f297917
deactivating output validation, too buggy
LeonhardFS Dec 13, 2021
9b0b609
Trigger CI
bgivertz Dec 13, 2021
b4732aa
added changes in
LeonhardFS Dec 14, 2021
a83d47b
deactivating tests to validate output file specification
LeonhardFS Dec 14, 2021
4 changes: 3 additions & 1 deletion benchmarks/311/runbenchmark.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

# use 5 runs (3 for very long jobs) and a timeout after 180min/3h
-NUM_RUNS=11
+NUM_RUNS="${NUM_RUNS:-11}"
TIMEOUT=14400
DATA_PATH='/data/311/311_preprocessed.csv'
RESDIR=/results/311
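The replaced `NUM_RUNS` line relies on shell default-value expansion: `${NUM_RUNS:-11}` keeps a value supplied by the caller and falls back to 11 when the variable is unset or empty. A minimal sketch of that behaviour follows; `resolve_num_runs` is a hypothetical helper introduced only for illustration and does not exist in the benchmark scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): prints the run count the script would use.
resolve_num_runs() {
  # ":-" substitutes the default when NUM_RUNS is unset OR set to the empty string.
  echo "${NUM_RUNS:-11}"
}

unset NUM_RUNS
default_runs="$(resolve_num_runs)"    # variable unset: default applies

NUM_RUNS=3
override_runs="$(resolve_num_runs)"   # caller-provided value wins

NUM_RUNS=""
empty_runs="$(resolve_num_runs)"      # empty string also falls back to the default

echo "default=$default_runs override=$override_runs empty=$empty_runs"
```

Under this pattern, a shorter reproduction run could presumably be requested with something like `NUM_RUNS=3 ./runbenchmark.sh`, without editing the script.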
@@ -19,10 +19,12 @@ cp tuplex_config.json ${RESDIR}
echo "running tuplex"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-e2e-$r.txt"
+rm -rf "${OUTPUT_DIR}/tuplex_output"
timeout $TIMEOUT ${PYTHON} runtuplex.py --path $DATA_PATH --output-path "${OUTPUT_DIR}/tuplex_output" >$LOG 2>$LOG.stderr
done
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-weld-$r.txt"
+rm -rf "${OUTPUT_DIR}/tuplex_output"
timeout $TIMEOUT ${PYTHON} runtuplex.py --path $DATA_PATH --weld-mode --output-path "${OUTPUT_DIR}/tuplex_output" >$LOG 2>$LOG.stderr
done
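Both loops now delete the previous output directory before each timed run; the commit list ties this to output validation ("removing tuplex output dirs because of the output validation"). The pattern could be captured in one place roughly as sketched here. `run_clean`, its argument order, and the stand-in command are assumptions made for illustration, not code from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the benchmark scripts):
#   run_clean <output_dir> <log_file> <timeout_secs> <command> [args...]
run_clean() {
  local out_dir="$1" log="$2" limit="$3"
  shift 3
  # Refuse an empty path so a missing variable cannot turn into a stray delete.
  [ -n "$out_dir" ] || { echo "run_clean: empty output dir" >&2; return 1; }
  # Drop whatever the previous run wrote, then start the timed run.
  rm -rf -- "$out_dir"
  # Same redirection scheme as the scripts: stdout to $log, stderr to $log.stderr.
  timeout "$limit" "$@" >"$log" 2>"$log.stderr"
}

# Demo with a stand-in command instead of runtuplex.py.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/out"
touch "$demo_root/out/stale.csv"   # pretend a previous run left this behind
run_clean "$demo_root/out" "$demo_root/run-1.txt" 10 \
  sh -c 'mkdir -p "$1" && echo "wrote $1"' _ "$demo_root/out"
```

Quoting the directory and passing `--` to `rm` are defensive choices of this sketch; the scripts in the diff use the unquoted `$OUTPUT_DIR` form in `dirty_zillow`.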

18 changes: 17 additions & 1 deletion benchmarks/dirty_zillow/runbenchmark.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

# use 10 runs (3 for very long jobs) and a timeout after 180min/3h
-NUM_RUNS=11
+NUM_RUNS="${NUM_RUNS:-11}"
TIMEOUT=14400

DATA_PATH='/data/zillow/Zdirty/zillow_dirty@10G.csv'
@@ -31,36 +31,42 @@ echo "running experiments using 16x parallelism"
echo "running tuplex(synth) in resolve mode w. interpreter only"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-interpreter-synth-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode resolve --resolve-with-interpreter --path $DATA_SYNTH_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex(synth) in resolve mode w. interpreter only single-threaded"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-interpreter-st-synth-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --single-threaded --mode resolve --resolve-with-interpreter --path $DATA_SYNTH_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in resolve mode w. interpreter only"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-interpreter-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode resolve --resolve-with-interpreter --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in plain mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-plain-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode plain --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in resolve mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode resolve --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in custom mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-custom-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode custom --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

@@ -70,61 +76,71 @@ echo "running synthetic benchmark w. 16x parallelism"
echo "running tuplex in plain mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-plain-synth-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode plain --path $DATA_SYNTH_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in resolve mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-synth-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode resolve --path $DATA_SYNTH_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in custom mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-custom-synth-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode custom --path $DATA_SYNTH_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running experiments in single-thread mode"
echo "running tuplex in plain mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-plain-st-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --single-threaded --mode plain --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in resolve mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-st-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --single-threaded --mode resolve --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in custom mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-custom-st-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --single-threaded --mode custom --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in resolve(interpreter) mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-interpreter-st-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --single-threaded --mode resolve --resolve-with-interpreter --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running in-order resolve 16x parallelism experiments"
echo "running tuplex in plain (in order) mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-plain-in-order-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode plain --resolve-in-order --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in resolve (in order) mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-resolve-in-order-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode resolve --resolve-in-order --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done

echo "running tuplex in custom (in order) mode"
for ((r = 1; r <= NUM_RUNS; r++)); do
LOG="${RESDIR}/tuplex-run-custom-in-order-$r.txt"
+rm -rf $OUTPUT_DIR
timeout $TIMEOUT ${PYTHON} runtuplex.py --mode custom --resolve-in-order --path $DATA_PATH --output-path $OUTPUT_DIR >$LOG 2>$LOG.stderr
done
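The same three-line loop body (set `LOG`, clear `$OUTPUT_DIR`, run under `timeout`) recurs for every mode in this script, which is why the cleanup had to be added once per loop. One possible way to state it once is sketched below. `run_variant` is a hypothetical function, and the dry run substitutes `echo` for `${PYTHON}` so that nothing is executed; the variable names and log naming follow the diff above, while the temporary directories are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: run_variant is a hypothetical helper, not code from the PR.
NUM_RUNS="${NUM_RUNS:-11}"
TIMEOUT="${TIMEOUT:-14400}"

# run_variant <log-suffix> [runtuplex.py flags...]
run_variant() {
  local name="$1" r=1 log
  shift
  while [ "$r" -le "$NUM_RUNS" ]; do
    log="${RESDIR}/tuplex-run-${name}-${r}.txt"
    # Clear output from the previous run before every timed run.
    rm -rf -- "$OUTPUT_DIR"
    # ${PYTHON} stays unquoted, as in the original scripts.
    timeout "$TIMEOUT" ${PYTHON} runtuplex.py "$@" \
      --path "$DATA_PATH" --output-path "$OUTPUT_DIR" >"$log" 2>"$log.stderr"
    r=$((r + 1))
  done
}

# Dry run: PYTHON=echo makes each "run" just record its argument list in the log.
PYTHON=echo
NUM_RUNS=2
RESDIR="$(mktemp -d)"
OUTPUT_DIR="$RESDIR/output"
DATA_PATH='/data/zillow/Zdirty/zillow_dirty@10G.csv'

run_variant plain --mode plain
run_variant resolve --mode resolve
run_variant custom --mode custom
run_variant resolve-interpreter --mode resolve --resolve-with-interpreter
```

With a layout like this, a future change to the per-run setup would touch one function instead of sixteen loops. Whether that trade-off suits a reproducibility artifact, where explicit and copyable loops also have value, is a judgment call for the authors.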