A development environment for experimenting with Apache Polaris, Apache Spark, and Apache Iceberg integration. This project provides a Docker-based setup for quick prototyping and learning.
> [!IMPORTANT]
> - This environment is intended for development and testing purposes only. It is not suitable for production use.
> - This project is under active development and subject to change.
Polaris Spark DevBox offers a pre-configured environment that combines:
- Apache Polaris - An open-source catalog for Apache Iceberg, implementing Iceberg's REST catalog API
- Apache Spark (v3.5.4) - Unified analytics engine for large-scale data processing
- Apache Iceberg - Open table format for huge analytic datasets
The environment includes:
- Polaris Server (Java 21)
- Apache Spark 3.5.4 with Hadoop 3
- Jupyter Notebook with PySpark integration
- Pre-configured networking and volume management
- Sample datasets and Iceberg table examples
- Zero-configuration setup with Apache Polaris and Spark
- Integrated Jupyter environment with PySpark
- Apache Iceberg table format support
- Containerized development
- Automated initialization
- Example notebooks demonstrating Polaris, Spark, and Iceberg integration
- Development utilities
> [!NOTE]
> Make sure you have all prerequisites installed before proceeding with the setup.
- Python 3.11+
- Docker Desktop
- Windows/macOS: Use Docker Desktop
- Linux: Docker Desktop for Linux or Docker Engine with Docker Compose
- Sufficient disk space for containers and volumes
> [!CAUTION]
> Keep your environment variables secure and never commit the `.env` file to version control.
Required variables in `.env`:

```shell
COMPOSE_PROJECT_NAME=polaris_spark_dev
POLARIS_CATALOG_NAME=my_catalog
POLARIS_DEFAULT_BASE_LOCATION=file:///data/polaris
POLARIS_PRINCIPAL_NAME=polarisuser
POLARIS_PRINCIPAL_ROLE_NAME=polarisuser_role
POLARIS_CATALOG_ROLE_NAME=my_catalog_role
POLARIS_API_HOST=localhost
POLARIS_API_PORT=8181
```

> [!TIP]
> Ensure Docker is running before starting the containers.
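For quick sanity checks, the `.env` values above can be read with a few lines of standard-library Python. This is only a sketch; the project ships `connection_config.py` as its actual configuration utility:

```python
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

For example, `load_env()["POLARIS_API_PORT"]` returns the configured port as a string once the file exists.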
1. Start the environment:

   ```shell
   docker-compose up -d
   ```

2. Verify container status:

   ```shell
   docker-compose ps
   ```

3. Setup Apache Polaris:

   ```shell
   ./setup
   ```
> [!NOTE]
> This will configure Polaris by:
> - creating a catalog
> - creating a principal and a principal role
> - creating a catalog role
> - assigning the catalog role to the principal role
> - granting privileges
> - generating a simple Jupyter Notebook to verify the setup
> [!NOTE]
> All services are configured to run on localhost by default.
- Jupyter Notebook: http://localhost:8888
- Polaris API: http://localhost:10081
- Polaris Admin: http://localhost:10082
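A minimal reachability check for these endpoints, using only the Python standard library. This is a sketch: it assumes the containers are already up and that the port numbers match your compose file:

```python
import urllib.error
import urllib.request

# Default service endpoints from the setup above; adjust if your ports differ.
SERVICES = {
    "Jupyter Notebook": "http://localhost:8888",
    "Polaris API": "http://localhost:10081",
    "Polaris Admin": "http://localhost:10082",
}


def check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with any HTTP status at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server responded (e.g. 401/404), so it is reachable
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:18} {url:28} {'UP' if check(url) else 'DOWN'}")
```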
- Uses Apache Spark 3.5.4
- PySpark for Python interface
- Spark SQL for data querying
- Built-in Spark History Server
- Apache Iceberg table format support
- Schema evolution
- Time travel queries
- Partition evolution
- Hidden partitioning
- REST Catalog
- SQL query support
- Distributed query execution
- Integration with Iceberg tables
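The REST Catalog integration above is wired up through Spark configuration. Below is a hedged sketch of the Iceberg catalog properties involved: the key names follow Iceberg's Spark runtime, while the catalog name, endpoint path, and credential placeholder are assumptions to adapt to your own setup:

```python
def polaris_catalog_conf(
    catalog: str = "my_catalog",
    uri: str = "http://localhost:8181/api/catalog",
    credential: str = "<client_id>:<client_secret>",  # placeholder, not a real secret
) -> dict:
    """Build the Spark properties that register a Polaris REST catalog for Iceberg."""
    p = f"spark.sql.catalog.{catalog}"
    return {
        # Enable Iceberg SQL extensions (time travel, schema/partition evolution DDL)
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        p: "org.apache.iceberg.spark.SparkCatalog",
        f"{p}.type": "rest",            # talk to Polaris over the Iceberg REST protocol
        f"{p}.uri": uri,
        f"{p}.credential": credential,  # OAuth2 client credentials
        f"{p}.warehouse": catalog,
    }


# These properties would typically be applied when building the SparkSession, e.g.:
# builder = SparkSession.builder
# for key, value in polaris_catalog_conf().items():
#     builder = builder.config(key, value)
```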
```
polaris-spark-devbox/
├── connection_config.py   # Configuration utility
├── requirements.txt       # Python dependencies
├── docker-compose.yml     # Container orchestration
├── conf/                  # Configuration files
├── templates/             # Template files
├── notebooks/             # Jupyter notebooks
└── http/                  # HTTP test files
```
> [!NOTE]
> All examples and documentation are automatically generated during setup.
The setup generates two types of documentation:

1. Jupyter Notebooks (`notebooks/polaris_setup_verify.ipynb`)
   - Setup verification
   - API usage examples
   - Iceberg table operations
   - Spark SQL queries

2. HTTP Files (`http/polaris.http`)
   - REST API documentation
   - Testing endpoints
> [!NOTE]
> This section is for advanced users who want to build their own custom images.
The project uses Task to manage image builds. The `Taskfile.yml` provides tasks to build:
- PySpark Notebook image with Jupyter
- Apache Polaris base and server images
- Task installed
- Docker with multi-platform build support
```shell
# Build all images
task

# Build only Spark Notebook image
task build_spark_notebook_image

# Build Polaris base image
task build_polaris_base

# Build Polaris server image
task build_polaris_image
```

You can customize the build by modifying these variables in `Taskfile.yml`:
```yaml
vars:
  JAVA_VERSION: 17       # Java version for builds
  SPARK_VERSION: 3.5.4   # Apache Spark version
  HADOOP_VERSION: 3      # Hadoop version
  POLARIS_VERSION: 0.9.x # Apache Polaris version
  SPARK_NOTEBOOK_IMAGE: ghcr.io/kameshsampath/polaris-spark-devbox/spark35notebook
  POLARIS_BASE_IMAGE: ghcr.io/kameshsampath/polaris-spark-devbox/polaris-base
  POLARIS_SERVER_IMAGE: ghcr.io/kameshsampath/polaris-spark-devbox/polaris
```

> [!TIP]
> The Polaris base image is built for both ARM64 and AMD64 architectures.
- Apache Spark Documentation
- Apache Iceberg Documentation
- Apache Polaris Documentation
- PySpark Documentation
> [!TIP]
> Before submitting a PR, make sure to test your changes thoroughly.
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Apache License 2.0. See LICENSE for details.
For questions and support:
- Open an issue in the GitHub repository
- Connect on LinkedIn
Built with ❤️ for Open Source by Kamesh Sampath, Developer Relations @ Snowflake