34 changes: 23 additions & 11 deletions README.md
@@ -4,24 +4,36 @@ This repository contains the code of the open stream processing benchmark.

All documentation can be found in our [wiki](https://github.com/Klarrio/open-stream-processing-benchmark/wiki).

This code base includes:
- benchmark pipeline implementations:
- data stream generator to generate input streams locally or on a DC/OS cluster
- Kafka scripts to start a cluster and read from a topic for local development
It includes:
- [benchmark](./benchmark): benchmark pipeline implementations
- [data-stream-generator](./data-stream-generator): data stream generator to generate input streams locally or on a DC/OS cluster
- [output-consumer](./output-consumer): consumes the output of the processing job and the metrics-exporter from Kafka and stores it on S3.
- [evaluator](./evaluator): computes performance metrics on the output of the output consumer.
- [result analysis](./result-analysis): Jupyter notebooks to visualize the results.
- [deployment](./deployment): deployment scripts to run the benchmark on a DC/OS setup on AWS.
- [kafka-cluster-tools](./kafka-cluster-tools): Kafka scripts to start a cluster and read from a topic for local development
- [metrics-exporter](./metrics-exporter): exports metrics of JMX and cAdvisor and writes them to Kafka.

Currently the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink and Kafka Streams.

## Running the benchmark locally

Documentation on running the benchmark locally can be found in our [wiki](https://github.com/Klarrio/open-stream-processing-benchmark/wiki/Architecture-and-deployment).

## References, Publications and Talks
- [van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.](https://ieeexplore.ieee.org/abstract/document/9025240)
The Supplemental Material of this paper can be found [here](https://s3.amazonaws.com/ieeecs.cdn.csdl.public/trans/td/2020/08/extras/ttd202008-09025240s1-supp1-2978480.pdf).

- [van Dongen, G., & Van den Poel, D. (2021). A Performance Analysis of Fault Recovery in Stream Processing Frameworks. IEEE Access.](https://ieeexplore.ieee.org/document/9466838)

- [van Dongen, G., & Van den Poel, D. (2021). Influencing Factors in the Scalability of Distributed Stream Processing Jobs. IEEE Access.](https://ieeexplore.ieee.org/document/9507502)

- Earlier work-in-progress publication:
[van Dongen, G., Steurtewagen, B., & Van den Poel, D. (2018, July). Latency measurement of fine-grained operations in benchmarking distributed stream processing frameworks. In 2018 IEEE International Congress on Big Data (BigData Congress) (pp. 247-250). IEEE.](https://ieeexplore.ieee.org/document/8457759)

Talks related to this publication:

- Spark Summit Europe 2019: [Stream Processing: Choosing the Right Tool for the Job - Giselle van Dongen](https://www.youtube.com/watch?v=PiEQR9AXgl4&t=2s)

## Issues, questions or need help?

Are you having issues with anything related to the project? Do you wish to use this project or extend it? The fastest way to contact me is through:

LinkedIn: [giselle-van-dongen](https://www.linkedin.com/in/giselle-van-dongen/)

Email: giselle.vandongen@klarrio.com

5 changes: 5 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,5 @@
# OSPBench: Open Stream Processing Benchmark

Currently the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink and Kafka Streams.

Please consult the wiki of the repository to see details on deployment and running locally.
2 changes: 1 addition & 1 deletion benchmark/build.sbt
@@ -76,4 +76,4 @@ def frameworkSettings(framework: String, versionDocker: String) = Seq(
version := versionDocker,
fork in Test := true,
envVars in Test := Map("DEPLOYMENT_TYPE" -> "local", "MODE" -> "constant-rate", "KAFKA_BOOTSTRAP_SERVERS" -> "localhost:9092")
)
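The `envVars in Test` setting above injects a deployment profile into the test JVM. A minimal sketch of how a test could read those variables with sensible fallbacks (the `envOr` helper is a hypothetical name, not part of the benchmark's API):

```scala
// Sketch: read the environment variables that build.sbt sets for tests.
// `envOr` is a hypothetical helper, not taken from the benchmark code base.
def envOr(name: String, default: String): String =
  sys.env.getOrElse(name, default)

val deploymentType = envOr("DEPLOYMENT_TYPE", "local")
val mode           = envOr("MODE", "constant-rate")
val kafkaServers   = envOr("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
```

Because the fallbacks match the values injected by `build.sbt`, the same test code behaves identically whether it runs under sbt or standalone.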
2 changes: 1 addition & 1 deletion benchmark/common-benchmark/src/main/resources/local.conf
@@ -6,7 +6,7 @@ environment {
}

general {
last.stage = "101"
last.stage = "3"
partitions = 2
stream.source {
volume = "1"
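The hunk above lowers `last.stage` from `"101"` to `"3"`, so the local profile now runs only the first stages of the pipeline. A hypothetical sketch of how such defaults could be layered under per-run overrides, in the spirit of the `overrides: Map[String, Any]` parameter visible in the Flink settings class (all names here are assumptions):

```scala
// Sketch (assumed names): layer per-run overrides over local.conf defaults.
case class GeneralSettings(lastStage: Int, partitions: Int, volume: String)

val defaults = Map[String, Any](
  "last.stage"           -> 3,   // how far the pipeline runs
  "partitions"           -> 2,
  "stream.source.volume" -> "1"
)

def settings(overrides: Map[String, Any] = Map()): GeneralSettings = {
  val cfg = defaults ++ overrides // right-hand map wins on key collisions
  GeneralSettings(
    lastStage  = cfg("last.stage").asInstanceOf[Int],
    partitions = cfg("partitions").asInstanceOf[Int],
    volume     = cfg("stream.source.volume").asInstanceOf[String]
  )
}
```

Overrides win over defaults because `++` keeps the right-hand map's entries for duplicate keys, which lets a single benchmark run change one setting without touching the config file.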
@@ -31,7 +31,7 @@ class BenchmarkSettingsForFlink(overrides: Map[String, Any] = Map()) extends Ser
val checkpointDir = new File(general.configProperties.getString("flink.checkpoint.dir"))
Try(FileUtils.cleanDirectory(checkpointDir))
"file://" + checkpointDir.getCanonicalPath
} else general.configProperties.getString("flink.checkpoint.dir") + System.currentTimeMillis() + "/"
} else general.configProperties.getString("flink.checkpoint.dir") + general.outputTopic + "/"


val jobProfileKey: String = general.mkJobProfileKey("flink", bufferTimeout)
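The change above keys the remote checkpoint directory on the output topic instead of the wall clock, so reruns of the same job profile reuse a stable path. A simplified sketch of the selection logic (stdlib cleanup stands in for commons-io's `FileUtils.cleanDirectory`; the function name and parameters are assumptions):

```scala
import java.io.File

// Sketch of the checkpoint-directory logic: locally, clean and reuse a
// fixed directory; on a cluster, suffix the configured base path with
// the output topic so each job gets a stable, distinct location.
def checkpointPath(local: Boolean, baseDir: String, outputTopic: String): String =
  if (local) {
    val dir = new File(baseDir)
    dir.mkdirs()                                            // make sure it exists
    Option(dir.listFiles()).foreach(_.foreach(_.delete()))  // best-effort cleanup
    "file://" + dir.getCanonicalPath
  } else baseDir + outputTopic + "/"
```

Using the output topic rather than `System.currentTimeMillis()` also avoids accumulating one orphaned checkpoint directory per run on the shared filesystem.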