34 changes: 23 additions & 11 deletions README.md
@@ -4,24 +4,36 @@ This repository contains the code of the open stream processing benchmark.

All documentation can be found in our [wiki](https://github.com/Klarrio/open-stream-processing-benchmark/wiki).

This code base includes:
- benchmark pipeline implementations:
- data stream generator to generate input streams locally or on a DC/OS cluster
- Kafka scripts to start a cluster and read from a topic for local development
It includes:
- [benchmark](./benchmark): benchmark pipeline implementations
- [data-stream-generator](./data-stream-generator): data stream generator to generate input streams locally or on a DC/OS cluster
- [output-consumer](./output-consumer): consumes the output of the processing job and the metrics-exporter from Kafka and stores it on S3.
- [evaluator](./evaluator): computes performance metrics on the output of the output consumer.
- [result analysis](./result-analysis): Jupyter notebooks to visualize the results.
- [deployment](./deployment): deployment scripts to run the benchmark on a DC/OS setup on AWS.
- [kafka-cluster-tools](./kafka-cluster-tools): Kafka scripts to start a cluster and read from a topic for local development
- [metrics-exporter](./metrics-exporter): exports metrics of JMX and cAdvisor and writes them to Kafka.

Currently the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink and Kafka Streams.

## Running the benchmark locally

Documentation on running the benchmark locally can be found in our [wiki](https://github.com/Klarrio/open-stream-processing-benchmark/wiki/Architecture-and-deployment).

## References, Publications and Talks
- [van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.](https://ieeexplore.ieee.org/abstract/document/9025240)
The Supplemental Material of this paper can be found [here](https://s3.amazonaws.com/ieeecs.cdn.csdl.public/trans/td/2020/08/extras/ttd202008-09025240s1-supp1-2978480.pdf).

- [van Dongen, G., & Van den Poel, D. (2021). A Performance Analysis of Fault Recovery in Stream Processing Frameworks. IEEE Access.](https://ieeexplore.ieee.org/document/9466838)

- [van Dongen, G., & Van den Poel, D. (2021). Influencing Factors in the Scalability of Distributed Stream Processing Jobs. IEEE Access.](https://ieeexplore.ieee.org/document/9507502)

- Earlier work-in-progress publication:
[van Dongen, G., Steurtewagen, B., & Van den Poel, D. (2018, July). Latency measurement of fine-grained operations in benchmarking distributed stream processing frameworks. In 2018 IEEE International Congress on Big Data (BigData Congress) (pp. 247-250). IEEE.](https://ieeexplore.ieee.org/document/8457759)

Talks related to this publication:

- Spark Summit Europe 2019: [Stream Processing: Choosing the Right Tool for the Job - Giselle van Dongen](https://www.youtube.com/watch?v=PiEQR9AXgl4&t=2s)

## Issues, questions or need help?

Are you having issues with anything related to the project? Do you wish to use this project or extend it? The fastest way to contact me is through:

LinkedIn: [giselle-van-dongen](https://www.linkedin.com/in/giselle-van-dongen/)

Email: giselle.vandongen@klarrio.com

5 changes: 5 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,5 @@
# OSPBench: Open Stream Processing Benchmark

Currently the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink and Kafka Streams.

Please consult the wiki of the repository to see details on deployment and running locally.
2 changes: 1 addition & 1 deletion benchmark/build.sbt
@@ -76,4 +76,4 @@ def frameworkSettings(framework: String, versionDocker: String) = Seq(
version := versionDocker,
fork in Test := true,
envVars in Test := Map("DEPLOYMENT_TYPE" -> "local", "MODE" -> "constant-rate", "KAFKA_BOOTSTRAP_SERVERS" -> "localhost:9092")
)
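The `envVars in Test` setting above injects a deployment profile into the test JVM. A minimal sketch of how a test could read those variables with sensible fallbacks (the `envOr` helper is a hypothetical name, not part of the benchmark's API):

```scala
// Sketch: read the environment variables that build.sbt sets for tests.
// `envOr` is a hypothetical helper, not taken from the benchmark code base.
def envOr(name: String, default: String): String =
  sys.env.getOrElse(name, default)

val deploymentType = envOr("DEPLOYMENT_TYPE", "local")
val mode           = envOr("MODE", "constant-rate")
val kafkaServers   = envOr("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
```

Because the fallbacks match the values injected by `build.sbt`, the same test code behaves identically whether it runs under sbt or standalone.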
2 changes: 1 addition & 1 deletion benchmark/common-benchmark/src/main/resources/local.conf
@@ -6,7 +6,7 @@ environment {
}

general {
last.stage = "101"
last.stage = "3"
partitions = 2
stream.source {
volume = "1"
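The hunk above lowers `last.stage` from `"101"` to `"3"`, so the local profile now runs only the first stages of the pipeline. A hypothetical sketch of how such defaults could be layered under per-run overrides, in the spirit of the `overrides: Map[String, Any]` parameter visible in the Flink settings class (all names here are assumptions):

```scala
// Sketch (assumed names): layer per-run overrides over local.conf defaults.
case class GeneralSettings(lastStage: Int, partitions: Int, volume: String)

val defaults = Map[String, Any](
  "last.stage"           -> 3,   // how far the pipeline runs
  "partitions"           -> 2,
  "stream.source.volume" -> "1"
)

def settings(overrides: Map[String, Any] = Map()): GeneralSettings = {
  val cfg = defaults ++ overrides // right-hand map wins on key collisions
  GeneralSettings(
    lastStage  = cfg("last.stage").asInstanceOf[Int],
    partitions = cfg("partitions").asInstanceOf[Int],
    volume     = cfg("stream.source.volume").asInstanceOf[String]
  )
}
```

Overrides win over defaults because `++` keeps the right-hand map's entries for duplicate keys, which lets a single benchmark run change one setting without touching the config file.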
@@ -31,7 +31,7 @@ class BenchmarkSettingsForFlink(overrides: Map[String, Any] = Map()) extends Ser
val checkpointDir = new File(general.configProperties.getString("flink.checkpoint.dir"))
Try(FileUtils.cleanDirectory(checkpointDir))
"file://" + checkpointDir.getCanonicalPath
} else general.configProperties.getString("flink.checkpoint.dir") + System.currentTimeMillis() + "/"
} else general.configProperties.getString("flink.checkpoint.dir") + general.outputTopic + "/"


val jobProfileKey: String = general.mkJobProfileKey("flink", bufferTimeout)
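The change above keys the remote checkpoint directory on the output topic instead of the wall clock, so reruns of the same job profile reuse a stable path. A simplified sketch of the selection logic (stdlib cleanup stands in for commons-io's `FileUtils.cleanDirectory`; the function name and parameters are assumptions):

```scala
import java.io.File

// Sketch of the checkpoint-directory logic: locally, clean and reuse a
// fixed directory; on a cluster, suffix the configured base path with
// the output topic so each job gets a stable, distinct location.
def checkpointPath(local: Boolean, baseDir: String, outputTopic: String): String =
  if (local) {
    val dir = new File(baseDir)
    dir.mkdirs()                                            // make sure it exists
    Option(dir.listFiles()).foreach(_.foreach(_.delete()))  // best-effort cleanup
    "file://" + dir.getCanonicalPath
  } else baseDir + outputTopic + "/"
```

Using the output topic rather than `System.currentTimeMillis()` also avoids accumulating one orphaned checkpoint directory per run on the shared filesystem.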