
Notebook

xuwenyihust edited this page Jan 8, 2024 · 8 revisions

Summary

DataPulse supports running Jupyter notebooks with PySpark integration on Kubernetes.

QuickStart

Create Notebook

The notebook is deployed as a Kubernetes Deployment and exposed through a Kubernetes Service named notebook.

The notebook runs the wenyixu101/all-spark-notebook image.

To create a notebook, access the following service:

(Screenshot, 2024-01-08: creating a notebook from the notebook service)
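One way to reach the notebook Service from a workstation is kubectl port-forwarding. This is a usage sketch, not DataPulse's documented access path; the namespace and port 8888 (Jupyter's usual default) are assumptions, only the Service name notebook comes from this page:

```shell
# Forward the notebook Service to localhost.
# 8888 is an assumption; use whatever port the Service actually exposes.
kubectl port-forward svc/notebook 8888:8888

# Then open http://localhost:8888 in a browser to create a notebook.
```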

Create Spark Session

Within the notebook, run the following code to automatically create a Spark session:

start()


Access Spark UI

The Spark UI link is also printed once the Spark session has been created.

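In PySpark, the driver reports its UI address via spark.sparkContext.uiWebUrl, but on Kubernetes that address typically points at the driver pod and is only reachable inside the cluster. A minimal sketch of rewriting it to an externally reachable host (the rewriting scheme and the example hostname are assumptions, not DataPulse's actual implementation):

```python
from urllib.parse import urlparse

def rewrite_ui_url(ui_web_url, external_host):
    """Replace the driver-pod host in a Spark UI URL with an externally
    reachable host, keeping the scheme and UI port.

    ui_web_url is what spark.sparkContext.uiWebUrl returns,
    e.g. "http://spark-driver:4040"; external_host is hypothetical.
    """
    parsed = urlparse(ui_web_url)
    port = parsed.port or 4040  # 4040 is Spark's default UI port
    return f"{parsed.scheme}://{external_host}:{port}"

# Example (no cluster needed):
print(rewrite_ui_url("http://spark-driver:4040", "notebook.example.com"))
# → http://notebook.example.com:4040
```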

Notebook Persistence

The notebook files are persisted in GCS (Google Cloud Storage).

Event Log Persistence

The event logs of the notebook's PySpark applications are persisted in GCS.
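Spark writes event logs when spark.eventLog.enabled is set and spark.eventLog.dir points at the target location (both are standard Spark configuration keys). A hedged sketch of the configuration involved; the bucket path is a placeholder, not DataPulse's actual bucket:

```python
def event_log_conf(gcs_dir):
    """Return the Spark configuration entries that persist event logs
    to the given directory. The keys are standard Spark settings."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": gcs_dir,
    }

# These entries would be passed to SparkSession.builder.config(...),
# e.g. event_log_conf("gs://my-bucket/event-logs")  # bucket is hypothetical
```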

Startup Script

The automatic creation of the Spark session is handled by the startup script startup.py.

It does the following:

  • Sync the notebook files from GCS to the local directory
  • Create a Spark session
  • Determine the Spark UI link
  • Print the Spark session information and the Spark UI link
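The steps above could be sketched as follows. This is an illustrative outline, not startup.py's actual code: the bucket path, helper names, and the gsutil-based sync are assumptions.

```python
import subprocess

def sync_notebooks(gcs_dir, local_dir):
    # Pull notebook files from GCS to the local working directory.
    # gsutil rsync is one way to do this; the real script may differ.
    subprocess.run(
        ["gsutil", "-m", "rsync", "-r", gcs_dir, local_dir], check=True
    )

def create_session(app_name):
    # Imported inside the function so the sketch stays importable
    # even where PySpark is not installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()

def session_summary(app_name, ui_url):
    # Pure helper: format the information printed at the end.
    return f"Spark session '{app_name}' created. Spark UI: {ui_url}"

def start(app_name="notebook", gcs_dir="gs://my-bucket/notebooks",
          local_dir="."):
    # gs://my-bucket/... is a placeholder, not DataPulse's actual bucket.
    sync_notebooks(gcs_dir, local_dir)
    spark = create_session(app_name)
    print(session_summary(app_name, spark.sparkContext.uiWebUrl))
    return spark
```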

Post-Save Hook

The post-save hook is implemented in gcs_save_hook.py.

It automatically syncs the notebook files to GCS after each save.
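Jupyter's FileContentsManager supports a post_save_hook that is called with (model, os_path, contents_manager) after every save. A hedged sketch of such a hook; the factory shape and injected uploader are assumptions for illustration, and gcs_save_hook.py's real code may differ:

```python
def make_post_save_hook(bucket, upload):
    """Build a Jupyter post-save hook that copies saved notebooks to GCS.

    `upload(local_path, gcs_path)` is an injected callable (e.g. a gsutil
    or google-cloud-storage wrapper); `bucket` such as "gs://my-bucket"
    is a placeholder.
    """
    def post_save(model, os_path, contents_manager):
        # Only sync notebook files, not plain text files or directories.
        if model["type"] != "notebook":
            return
        upload(os_path, f"{bucket}/{os_path.lstrip('/')}")
    return post_save

# Wiring it up in a Jupyter config file (illustrative):
# c.FileContentsManager.post_save_hook = make_post_save_hook(
#     "gs://my-bucket", my_upload_function)
```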
