CLI tool for deploying and benchmarking lakehouse architectures on Kubernetes.
Note: This package is published as `lakebench-k8s` on PyPI. Install with `pip install lakebench-k8s`. The CLI command is `lakebench`.
Choosing between Hive and Polaris, Iceberg and Delta, or sizing Spark for 100 GB vs 10 TB shouldn't require weeks of manual setup. Lakebench deploys a complete lakehouse stack from a single YAML file, generates realistic data at any scale, runs the pipeline, benchmarks query performance, and tears everything down — so you can focus on comparing architectures, not plumbing.
pip install lakebench-k8s

Or with pipx:

pipx install lakebench-k8s
Pre-built binaries (no Python required) are available on GitHub Releases.
- Python 3.10+
- `kubectl` and `helm` on PATH
- Kubernetes cluster (1.26+)
- S3-compatible object storage (FlashBlade, MinIO, AWS S3, etc.)
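
The local prerequisites above can be checked before installing. The following is a minimal sketch, not part of Lakebench itself: the function names `missing_tools` and `python_version_ok` are hypothetical helpers introduced here, and the script only covers what can be verified on the workstation (tools on PATH and the Python version). It does not check the cluster version or the object storage.

```shell
# Pre-flight sketch (hypothetical helpers, not shipped with Lakebench).

# Print the name of every tool given as an argument that is not on PATH.
# The function body runs in a subshell so its loop variable does not leak.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Succeed if a version string such as "3.11.4" is Python 3.10 or newer.
python_version_ok() (
  major=${1%%.*}          # text before the first dot
  rest=${1#*.}            # text after the first dot
  minor=${rest%%.*}       # second version component
  [ "$major" -gt 3 ] 2>/dev/null ||
    { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; } 2>/dev/null
)

# Example usage (commented out so that sourcing this file has no side effects):
#   missing=$(missing_tools kubectl helm)
#   [ -z "$missing" ] || echo "Not on PATH: $missing"
#   python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
#     || echo "Python 3.10+ is required"
```

Version strings are compared numerically by component rather than as text, because a plain string comparison would rank `3.9` above `3.10`.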
# 1. Generate config (interactive prompts for S3 details)
lakebench init --interactive
# 2. Deploy infrastructure
lakebench deploy lakebench.yaml
# 3. Generate test data
lakebench generate lakebench.yaml --wait
# 4. Run the pipeline + benchmark
lakebench run lakebench.yaml
# 5. View results
lakebench report
# 6. Tear down
lakebench destroy lakebench.yaml

| Command | Description |
|---|---|
| `lakebench init` | Generate a starter configuration file |
| `lakebench validate <config>` | Validate config and test connectivity |
| `lakebench info <config>` | Show configuration summary |
| `lakebench recommend` | Recommend cluster sizing for a scale factor |
| `lakebench deploy <config>` | Deploy all infrastructure |
| `lakebench generate <config>` | Generate synthetic data to bronze bucket |
| `lakebench run <config>` | Execute the medallion pipeline with metrics |
| `lakebench benchmark <config>` | Run 8-query Trino benchmark |
| `lakebench query <config>` | Execute SQL queries against Trino |
| `lakebench status [config]` | Show deployment status |
| `lakebench logs <component> [config]` | Stream logs from a component |
| `lakebench report` | Generate HTML benchmark report |
| `lakebench destroy <config>` | Tear down all resources |
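
For unattended runs, the quick-start steps can be chained so that a failure stops the sequence but the cluster is still cleaned up. The sketch below is one possible wrapper, assuming an existing config file: it uses only the subcommands listed above, adds a `validate` step before deploying, and always calls `destroy` once something has been deployed. The function name `run_quickstart` and the `LAKEBENCH` override variable are hypothetical and exist only in this example.

```shell
# Quick-start wrapper sketch (hypothetical, not shipped with Lakebench).
# Usage: run_quickstart lakebench.yaml
# Set LAKEBENCH="echo lakebench" to print the commands instead of running them.
run_quickstart() (
  config=$1
  cli=${LAKEBENCH:-lakebench}   # left unquoted below so the override can contain arguments

  # Nothing is deployed yet, so a validation failure needs no teardown.
  $cli validate "$config" || exit 1

  # Each step runs only if the previous one succeeded.
  $cli deploy "$config" &&
    $cli generate "$config" --wait &&
    $cli run "$config" &&
    $cli report
  status=$?

  # Tear down regardless of the outcome, then report the pipeline's status.
  $cli destroy "$config"
  exit "$status"
)
```

Running teardown unconditionally keeps a failed benchmark from leaving resources behind; drop the `destroy` call if you want to inspect the deployment with `lakebench status` or `lakebench logs` after a failure.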
| Component | Version |
|---|---|
| Spark Operator | 2.4.0 (Kubeflow) |
| Apache Spark | 3.5.3 |
| Apache Iceberg | 1.5.0 |
| Hive Metastore | 3.1.3 (Stackable 25.7.0) |
| Trino | 479 |
| PostgreSQL | 16 |
Full documentation is in the docs/ directory:
- Getting Started — Prerequisites, install, first deployment
- Configuration — Full YAML reference
- CLI Reference — All commands and flags
- Recipes — Supported component combinations
- Troubleshooting — Common errors and fixes
Apache 2.0