Docker compose local (dev) distributed Storm cluster with full observability and network simulation by GGraziadei · Pull Request #8706 · apache/storm

GGraziadei · 2026-05-22T11:36:40Z

What is the purpose of the change

This PR introduces a repeatable, Docker-based distributed Storm dev cluster designed for realistic storm-perf benchmarking on a local machine. It provisions a complete environment, including Nimbus, ZooKeeper, and two Supervisors, and forces inter-worker traffic across the network to trigger true serialization overhead.
Backed by a full observability stack (Prometheus and Grafana), the setup provides granular, per-task tracking via Storm Metrics v2. Additionally, it includes a netsim.sh utility to inject controlled network latency and jitter, allowing developers to easily stress-test topology resilience and analyze bottlenecks under degraded network conditions.

How was the change tested

I verified the environment by executing the benchmark smoke test outlined in the README.md, running the FileReadWordCountTopo topology for 120 seconds across two workers on separate supervisors.
Smoke testing validated both the baseline performance and the reproduction of bottlenecks. Injecting typical datacenter network conditions (3 ms latency, 1 ms jitter) raised average complete latency from 390 ms to 446 ms. The resulting back-pressure reduced total tuple throughput from 40.93M to 36.25M without dropping packets.
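For reference, the relative size of those two changes can be worked out with a small helper. This is only an illustrative sketch: `pct_change` is a hypothetical name and not part of the PR, and the inputs are the four figures quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): percentage change from a
# baseline value to a new value, rounded to one decimal place.
pct_change() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_change 390 446       # average complete latency in ms, prints 14.4
pct_change 40.93 36.25   # total tuples in millions, prints -11.4
```

In other words, the injected delay corresponds to roughly 14% higher complete latency and roughly 11% fewer tuples over the same 120-second run.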

rzo1

Thanks for the PR. Useful dev tooling and the README covers the setup well. A few things to sort out before it can merge against master:

1. CI will fail on Apache RAT. The two Grafana dashboards (grafana/dashboards/storm-cluster.json, storm-metrics-v2.json) have no ASF license header, and JSON has no comment syntax to carry one. RAT does scan JSON (we already exclude package-lock.json in the root pom.xml), so please add an exclusion there, e.g.:

<exclude>**/dev-tools/cluster/grafana/dashboards/*.json</exclude>

Please run mvn apache-rat:check -Prat locally to confirm nothing else (e.g. the new extlib-daemon/.gitignore) trips it.

2. topology.tuple.compression.enable references an unmerged feature. This config key isn't on master (the existing storm.compression.zstd.* / ZstdBridgeThriftSerializationDelegate is cluster-state serialization, not tuple compression). The Dockerfile comment "so it runs your code (e.g. the zstd tuple-compression feature)" and the topology.tuple.compression.enable: false in FileReadWordCountTopo-cluster.yaml will be silently ignored. Please drop these so the harness works against current master, or land it alongside the tuple-compression PR. (The EWMA/jitter config and metrics are fine — those are already on master.)

3. Please bind published ports to localhost. docker-compose.yml publishes 6627/8080/9090/3000 on 0.0.0.0. With unauthenticated Nimbus Thrift and Grafana admin/admin, that exposes a dev cluster to the whole LAN. 127.0.0.1:8080:8080 etc. is safer.

4. Windows support is missing — fine as a follow-up, but please note the Linux/macOS (or WSL2) requirement in the README Prerequisites. The scripts are bash + mvn (not mvn.cmd), and netsim.sh is tc/netem-only by nature.
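As a rough illustration of point 3, a check along these lines could flag any published port that is not bound to loopback. It is a sketch under assumptions: both function names are hypothetical, the service names passed in depend on the actual docker-compose.yml, and it assumes `docker compose port SERVICE PORT` prints the binding as `host:port`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when a "host:port" binding is loopback-only.
is_loopback_binding() {
  case "$1" in
    127.*:*|localhost:*|"[::1]":*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: warn when a service's published port is reachable
# from outside the local machine. Service names are assumptions.
check_published_port() {
  local service="$1" port="$2" binding
  binding="$(docker compose port "$service" "$port")" || return 2
  if ! is_loopback_binding "$binding"; then
    echo "WARNING: $service port $port is published on $binding" >&2
    return 1
  fi
}
```

With a mapping such as 127.0.0.1:8080:8080 the check passes silently. With a bare 8080:8080 mapping the binding is reported as 0.0.0.0:8080 and the warning fires.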

Minor:

netsim.sh hardcodes cluster-supervisor{1,2}-1, which assumes the Compose project name is cluster; breaks under -p <name> or a renamed checkout. Consider resolving via docker compose ps -q supervisor1.
prepare-extlib.sh defaults STORM_VERSION to 3.0.0-SNAPSHOT instead of reading the pom like build-image.sh does — it'll cp a wrong-named jar after a version bump. Source .env or read the pom.
storm-metrics-v2.json is missing a trailing newline.
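A minimal sketch of the container lookup suggested for netsim.sh, assuming it runs from the directory containing docker-compose.yml and that the services are named supervisor1 and supervisor2. `resolve_container` is a hypothetical name, not the function used in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a Compose service to its container ID so the
# script does not depend on the Compose project name.
resolve_container() {
  local service="$1" id
  id="$(docker compose ps -q "$service" | head -n 1)"
  if [ -z "$id" ]; then
    echo "ERROR: no running container for service '$service'" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Example use (not executed here): apply 3 ms delay with 1 ms jitter.
#   cid="$(resolve_container supervisor1)" &&
#     docker exec "$cid" tc qdisc add dev eth0 root netem delay 3ms 1ms
```

Because the ID comes from Compose itself, the lookup keeps working under -p <name> or after the checkout directory is renamed.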

GGraziadei · 2026-05-25T08:57:33Z

Thanks for the detailed review and the helpful insights!
I have applied all the requested changes.
Added the exclusion for the Grafana JSON dashboards in the root pom.xml and verified locally with mvn apache-rat:check -Prat.
Removed the references and configurations related to the unmerged tuple compression feature (apologies for the distraction and for mixing the PR contents!).
Bound all published ports to 127.0.0.1 in docker-compose.yml to prevent LAN exposure.
Updated the README to state the Linux/macOS/WSL2 requirement explicitly.
Resolved the hardcoded container names in netsim.sh, fixed the version fallback in prepare-extlib.sh, and added the missing trailing newline to storm-metrics-v2.json.

Everything is now pushed and ready for another look!

reiabreu · 2026-05-26T23:07:44Z

+#
+# netem applies to *egress* on each supervisor's eth0, so it shapes ALL traffic
+# leaving that container (inter-worker tuples, but also heartbeats to Nimbus/ZK).
+# Keep the delay moderate (<= ~150ms) so heartbeats don't time out. With both


Can we echo a warning if the supplied delay is > 150 ms?
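One way such a warning could look, as a sketch only. How netsim.sh actually parses its arguments is not shown in this thread, so the function name, the accepted input forms, and the message text are all assumptions. The 150 ms threshold comes from the script comment quoted above.

```shell
#!/usr/bin/env bash
MAX_SAFE_DELAY_MS=150  # threshold taken from the netsim.sh comment above

# Hypothetical helper: warn on stderr when the requested delay is likely to
# make heartbeats time out. Accepts "200" or "200ms".
warn_if_delay_high() {
  local delay_ms="${1%ms}"
  case "$delay_ms" in
    ''|*[!0-9]*)
      echo "ERROR: delay must be a whole number of milliseconds" >&2
      return 2
      ;;
  esac
  if [ "$delay_ms" -gt "$MAX_SAFE_DELAY_MS" ]; then
    echo "WARNING: delay ${delay_ms}ms exceeds ${MAX_SAFE_DELAY_MS}ms; heartbeats to Nimbus/ZK may time out" >&2
  fi
}
```

The sketch only warns and still returns success for large delays, so a developer who deliberately wants to test heartbeat timeouts is not blocked.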

reiabreu · 2026-05-26T23:15:09Z

@@ -0,0 +1,57 @@
+#!/usr/bin/env bash


Can we run this script from build-image.sh? It would be one less explicit step in setting up the environment.

GGraziadei · May 27, 2026

I totally agree. I'm moving this logic into the build-image.sh script and adding a flag to disable prepare-extlib. This will be useful for developers who don't need to touch the metrics logic.

GGraziadei · 2026-05-27T14:15:58Z

Thank you for the review. I addressed all the comments, and I have just pushed the changes.

docker compose dev cluster

75ac619

GGraziadei changed the title ~~docker compose dev cluster~~ Docker compose local (dev) distributed Storm cluster with full observability and network simulation May 22, 2026

GGraziadei marked this pull request as draft May 22, 2026 12:01

GGraziadei marked this pull request as ready for review May 22, 2026 13:43

GGraziadei mentioned this pull request May 23, 2026

Add tuple compression for inter-worker communication #8707

Open

add license header

324156e

rzo1 reviewed May 24, 2026

review changes

81dfdc1

reiabreu reviewed May 26, 2026

minor changes

e524f1b

GGraziadei requested review from reiabreu and rzo1 May 27, 2026 14:16

GGraziadei wants to merge 4 commits into apache:master from GGraziadei:docker-dev-cluster
