In this interactive half-day tutorial, participants explore the advanced applications of the National Science Data Fabric (NSDF) services and comprehensive strategies for end-to-end scientific data analysis.
The tutorial targets a broad audience, from researchers and students to developers and scientists, and offers each group practical insight into managing and analyzing large datasets, with a particular focus on datasets exceeding 100 TB.
Attendees gain hands-on experience constructing modular workflows, leveraging public and private data storage and streaming solutions, and deploying sophisticated visualization and analysis dashboards for scientific discovery.
The tutorial highlights NSDF's role in supporting the VIS conference themes by providing scalable solutions for advances in visualization and visual analytics. It covers topics ranging from an overview of NSDF capabilities and common pain points in large-scale data analysis to hands-on exercises using NSDF services for Earth science datasets.
Advanced modules include handling and visualizing massive datasets in domains requiring high-resolution data management. Participants leave the tutorial with a deeper understanding of how NSDF services integrate into their research workflows to enhance data accessibility, sharing, and collaborative scientific discovery.
The tutorial aims to advance attendees' knowledge of data-intensive computing and to help them apply NSDF in their own research domains.
The tutorial is organized into four progressive modules that guide participants from environment setup to large-scale scientific data analysis using NSDF.
Each module builds on the previous one and introduces increasingly advanced capabilities for data-intensive scientific workflows.
| Module | Duration | Objective |
|---|---|---|
| I | 30 mins | Overview of the National Science Data Fabric (NSDF) and discussion of common challenges in large-scale scientific data analysis identified through user interviews. |
| II | 1 hour | Hands-on introduction to NSDF services, including visualization and dashboard creation using Earth science datasets. |
| III | 1 hour | Advanced NSDF capabilities for managing and analyzing datasets exceeding 100 TB, including scalable data access and processing workflows. |
| IV | 30 mins | Interactive Q&A session and discussion on how NSDF can support research across multiple scientific domains. |
Participants can run the tutorial in any of three supported environments:
- GitHub Codespaces (recommended for quick access)
- Docker containers
- ACCESS Jetstream2 cloud resources
Detailed setup instructions are provided in Module I.
This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 2138811.