diff --git a/notebook_metadata.yml b/notebook_metadata.yml index 1abf2a31..a1532e4a 100644 --- a/notebook_metadata.yml +++ b/notebook_metadata.yml @@ -43,19 +43,31 @@ section: Euclid description: Review the types of flux measurements available, load template-fit and aperture magnitudes, and plot distributions and comparisons for different object types. - title: OpenUniverse2024 Visualization - file: tutorials/simulated-data/OpenUniverse2024Preview_Firefly.md + file: tutorials/simulated-data/OpenUniverse2024/OpenUniverse2024Preview_Firefly.md section: Simulated Data description: Use Firefly to get an overview of survey structure and visualize content. - title: CosmoDC2 Data Access file: tutorials/simulated-data/cosmoDC2_TAP_access.md section: Simulated Data description: Access, query, and visualize the CosmoDC2 catalog. +- title: OpenUniverse2024 Quickstart + file: tutorials/simulated-data/OpenUniverse2024/openuniverse2024_quickstart.md + section: Simulated Data + description: Access OpenUniverse2024 Roman and Rubin images and catalogs. +- title: OpenUniverse2024 TDE Light Curve + file: tutorials/simulated-data/OpenUniverse2024/TDE_light_curve.md + section: Simulated Data + description: Locate a simulated TDE, retrieve Roman images, and build a multi-epoch light curve. +- title: OpenUniverse2024 SED Fitting + file: tutorials/simulated-data/OpenUniverse2024/SED_fit.md + section: Simulated Data + description: Fit spectral energy distributions for supernova host galaxies using Prospector. - title: OpenUniverse2024 Time Domain - file: tutorials/simulated-data/openuniverse2024_roman_simulated_timedomainsurvey.md + file: tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_timedomainsurvey.md section: Simulated Data description: Access and analyze the simulated time-domain OpenUniverse2024 survey. 
- title: OpenUniverse2024 Roman Coadds - file: tutorials/simulated-data/openuniverse2024_roman_simulated_wideareasurvey.md + file: tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_wideareasurvey.md section: Simulated Data description: Access OpenUniverse2024 wide-area simulated survey data. - title: Roman HLSS Number Density diff --git a/toc.yml b/toc.yml index 0e49c5b8..3794a078 100644 --- a/toc.yml +++ b/toc.yml @@ -64,13 +64,19 @@ project: - title: Roman HLSS Number Density file: tutorials/simulated-data/roman_hlss_number_density.md - title: OpenUniverse2024 Roman coadds - file: tutorials/simulated-data/openuniverse2024_roman_simulated_wideareasurvey.md + file: tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_wideareasurvey.md - title: OpenUniverse2024 Visualization - file: tutorials/simulated-data/OpenUniverse2024Preview_Firefly.md + file: tutorials/simulated-data/OpenUniverse2024/OpenUniverse2024Preview_Firefly.md - title: OpenUniverse2024 Time Domain - file: tutorials/simulated-data/openuniverse2024_roman_simulated_timedomainsurvey.md + file: tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_timedomainsurvey.md - title: CosmoDC2 Data Access file: tutorials/simulated-data/cosmoDC2_TAP_access.md + - title: OpenUniverse2024 Quickstart + file: tutorials/simulated-data/OpenUniverse2024/openuniverse2024_quickstart.md + - title: OpenUniverse2024 TDE Light Curve + file: tutorials/simulated-data/OpenUniverse2024/TDE_light_curve.md + - title: OpenUniverse2024 SED Fitting + file: tutorials/simulated-data/OpenUniverse2024/SED_fit.md - title: Techniques & Tools file: tutorials/techniques-and-tools/techniques.md children: diff --git a/tutorials/simulated-data/OpenUniverse2024Preview_Firefly.md b/tutorials/simulated-data/OpenUniverse2024/OpenUniverse2024Preview_Firefly.md similarity index 100% rename from tutorials/simulated-data/OpenUniverse2024Preview_Firefly.md rename to 
tutorials/simulated-data/OpenUniverse2024/OpenUniverse2024Preview_Firefly.md
diff --git a/tutorials/simulated-data/OpenUniverse2024/SED_fit.md b/tutorials/simulated-data/OpenUniverse2024/SED_fit.md
new file mode 100644
index 00000000..11b7e2f5
--- /dev/null
+++ b/tutorials/simulated-data/OpenUniverse2024/SED_fit.md
@@ -0,0 +1,1806 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+  jupytext_version: 1.19.1
+kernelspec:
+  name: python3
+  display_name: python3
+  language: python
+authors:
+  - name: Jessica Krick
+  - name: Jaladh Singhal
+  - name: Brigitta Sipőcz
+---
+
+# SED fitting on OpenUniverse2024 Roman and Rubin Photometry
+
+## Learning Goals
+
+By the end of this tutorial, you will be able to:
+
+- access OpenUniverse2024 multiband photometry stored in cloud-based parquet format
+- plot optical to IR spectral energy distributions (SEDs)
+- use open-source codes to fit the SEDs of a sample of targets
+
+## Introduction
+
+The [OpenUniverse2024](https://ui.adsabs.harvard.edu/abs/2025MNRAS.tmp.1729T/abstract) dataset provides simulated multiwavelength observations combining realistic Roman and Rubin survey data. In this tutorial, we walk through a complete technique-focused workflow for analyzing a small sample of galaxies—from accessing the data, to constructing broadband SEDs, to running a first-pass physical model using Prospector.
+
+As an astrophysical case study, we choose to focus on supernova host galaxies. By measuring the SEDs of these hosts across optical to infrared wavelengths, we can estimate key physical properties—such as stellar mass and star formation rate. Comparing these properties between Type Ia and core-collapse supernova hosts helps reveal how different progenitor channels trace galaxy environments.
We expect Type Ia events, originating from older stellar populations, to have occurred in more massive, evolved galaxies, whereas we expect core-collapse SNe to follow sites of active star formation in lower-mass systems. Exploring the distribution of supernova type as a function of host stellar mass thus provides direct insight into how stellar evolution, galaxy growth, and transient populations are linked.
+
+### Instructions
+
+We use [Prospector](https://prospect.readthedocs.io/en/stable/installation.html) in this tutorial to do the SED fitting. Running the full SED fitting step in this notebook can take a long time (~30 minutes for a sample of five). You do not need to run those fits unless you want to generate your own results for a custom sample; otherwise, you can explore and visualize the results from the output file we have provided. If you do want to run the fitting yourself, you will need to uncomment the relevant cell in section 3A.
+
+### Input
+
+- OpenUniverse 2024 simulated galaxy photometry
+- OpenUniverse 2024 SN catalogs
+
+In this tutorial, we’ve hard-coded the cloud storage paths for the data files needed to run the examples. This allows you to jump straight into the analysis without worrying about locating or downloading the data manually. If you’d like to learn more about how cloud-based data access works—including how to browse and read files in the cloud—see the IRSA [Cloud Access Introduction](https://caltech-ipac.github.io/irsa-tutorials/cloud-access-intro/) tutorial.
+
+### Output
+
+- Plots of SED fits to SN host galaxies
+- Histogram of SN type as a function of host-galaxy fitted stellar mass.
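Because the simulated catalogs provide no measurement errors, the fits in this notebook adopt a uniform 10% fractional flux uncertainty (see the Assumptions and Simplifications below). As a quick illustrative sketch, not part of the original workflow, of what that choice means in magnitude space:

```python
import numpy as np

# For m_AB = -2.5*log10(f), a small fractional flux error propagates to a
# magnitude error of sigma_m = (2.5 / ln 10) * (sigma_f / f) ~= 1.086 * (sigma_f / f).
frac_err = 0.10  # the uniform 10% flux uncertainty assumed throughout this tutorial
sigma_mag = 2.5 / np.log(10) * frac_err
print(f"A {frac_err:.0%} flux error corresponds to ~{sigma_mag:.3f} mag")  # ~0.109 mag
```

In other words, the assumed photometric precision is roughly 0.1 mag in every band, which is the uncertainty Prospector's likelihood will see.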
+
+### Assumptions and Simplifications
+
+In this tutorial we adopt several simplifying assumptions to keep the workflow fast and transparent:
+- Fixed fractional flux uncertainties (10%) — The OpenUniverse catalogs do not include measurement errors, so we assume a uniform 10% uncertainty across all bands to allow Prospector to estimate parameter uncertainties.
+- Rubin to Roman flux scaling — Rubin (LSST) fluxes are rescaled to match the Roman normalization at 0.87 µm to correct for small zero-point offsets introduced in the simulations.
+- Simulated data products — All photometry and supernova catalogs come from the OpenUniverse 2024 simulation; results therefore reflect the assumptions and systematics of the simulated survey, not real observations.
+- Model choice and priors — Fits use the default Prospector `TemplateLibrary["parametric_sfh"]` model with standard priors and an FSPS Chabrier IMF; these settings are meant for demonstration rather than physical tuning.
+- No explicit dust or attenuation corrections — Aside from what is parameterized within the Prospector model, no additional reddening corrections are applied to the photometry.
+
+## Imports
+Some dependencies (like fsps and prospector) require specialized setup, as described below.
+
+```{code-cell} ipython3
+import time
+starttime = time.time()
+```
+
+```{code-cell} ipython3
+# Uncomment to install dependencies if needed.
+# %pip install numpy pandas h5py matplotlib seaborn pyarrow gdown
+# FSPS + Prospector + sedpy + dynesty are required for SED modeling
+# %pip install astro-prospector astro-sedpy "dynesty<2.0.0" fsps
+```
+
+```{code-cell} ipython3
+import h5py
+import pandas as pd
+import pyarrow.fs
+import pyarrow.parquet as pq
+import matplotlib.pyplot as plt
+import seaborn as sns
+import os
+import copy
+from multiprocessing import Pool
+import numpy as np
+import gdown
+
+from sedpy.observate import load_filters, list_available_filters
+from prospect.utils.obsutils import fix_obs
+from prospect.models.templates import TemplateLibrary
+from prospect.sources import CSPSpecBasis
+from prospect.plotting import corner
+```
+
+## 1. Define the targets
+
+We begin by identifying our target galaxies — in this example, galaxies hosting simulated supernovae.
+The OpenUniverse2024 dataset stores different aspects of the data in separate Parquet files:
+- The SNANA file describes supernova events and their host associations.
+- The galaxy flux table holds multi-band fluxes from Roman and Rubin.
+- The galaxy info table includes derived physical parameters (redshift, stellar mass, etc.).
+
+We'll explore the structure of these Parquet files and then join them into a single table.
+
++++
+
+### 1.1 Figure out what is in the OpenUniverse parquet files
+List out some basic information, including column names.
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def inspect_parquet_columns(s3_path, max_rows=0):
+    """
+    Read a Parquet file from S3 into memory and inspect its structure.
+
+    Parameters
+    ----------
+    s3_path : str
+        Full S3 path to the Parquet file, e.g.
+        "s3://nasa-irsa-simulations/openuniverse2024/roman/full/.../snana_9921.parquet"
+    max_rows : int, optional
+        If > 0, print the first few rows of data for context. Default is 0 (just columns).
+
+    Returns
+    -------
+    pandas.DataFrame
+        The full DataFrame loaded from the Parquet file.
+ + """ + + fs = pyarrow.fs.S3FileSystem(anonymous=True) + df = pq.read_table(s3_path, filesystem=fs).to_pandas() + + print(f"Found {df.shape[1]} columns and {df.shape[0]} rows") + print("\nColumn names:") + for c in df.columns: + print(" ", c) + + # Optionally, print the first few rows of data for context + if max_rows > 0: + print(f"\nFirst {max_rows} rows:") + print(df.head(max_rows)) + + # Return the full DataFrame for further use + return df +``` + +```{code-cell} ipython3 +#inspect one of the SN parquet files + +sn_flux_file = "nasa-irsa-simulations/openuniverse2024/roman/full/roman_rubin_cats_v1.1.2_faint/snana_9921.parquet" +# These are broken up by region on the sky. +# The region refers to HEALPIX pixel index at nside=32 (order = 5) in the ring numbering scheme +# so in this case the "9921" in the filename refers to that sky region. + +df_snana = inspect_parquet_columns(sn_flux_file) +# The output lists number of columns and rows as well as all available column names +# This is useful for exploring what quantities the simulation provides before merging catalogs. +``` + +```{code-cell} ipython3 +#Now we would like to know what kinds of SN were used in the simulations +#list out the possible SN model names used by OpenUniverse 2024 +# See https://arxiv.org/pdf/2501.05632 for a description of these SN models. +df_snana["model_name"].unique() +``` + +### 1.2 Merge Catalogs +Next we merge the SN sample with the host galaxy fluxes and physical properties. This ensures that each supernova has associated photometric and physical data from its host galaxy, suitable for SED fitting. + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def assemble_SN_data(sn_flux_file, galaxy_flux_file, galaxy_info_file): + """ + Load a supernova (SNANA) sample and join it with host galaxy flux + info tables. + + Parameters + ---------- + sn_flux_file : str + S3 path to the SNANA parquet file, e.g. 
+ "s3://nasa-irsa-simulations/openuniverse2024/roman/full/.../snana_10050.parquet" + galaxy_flux_file : str + S3 path to the galaxy flux parquet (Roman+Rubin flux columns). + galaxy_info_file : str + S3 path to the corresponding galaxy info parquet (physical parameters). + + Returns + ------- + pandas.DataFrame + Joined DataFrame containing SN + host galaxy properties suitable for SED fitting. + """ + + # Initialize an anonymous (public read-only) S3 filesystem connection + fs = pyarrow.fs.S3FileSystem(anonymous=True) + + # Load SN sample + sn_df = pq.read_table(sn_flux_file, filesystem=fs).to_pandas() + print(f" Loaded {len(sn_df)} SN entries") + + # Load galaxy flux and info tables + gal_flux = pq.read_table(galaxy_flux_file, filesystem=fs).to_pandas() + gal_info = pq.read_table(galaxy_info_file, filesystem=fs).to_pandas() + + # Join host-galaxy flux and info tables on galaxy_id + gal_joined = gal_flux.merge(gal_info, on="galaxy_id", how="inner") + print(f" Joined galaxy tables → {len(gal_joined)} rows") + + # Join SN with its host galaxy using host_id + sn_joined = sn_df.merge(gal_joined, left_on="host_id", right_on="galaxy_id", how="inner") + print(f"✅ SN–host join completed: {len(sn_joined)} matched hosts") + + # print out the redshift range of the SN sample for reference + print(f" Redshift range of SN sample: {sn_joined['z_CMB'].min():.3f}–{sn_joined['z_CMB'].max():.3f}") + + return sn_joined +``` + +```{code-cell} ipython3 +region = "10050" +sn_flux_file = f"nasa-irsa-simulations/openuniverse2024/roman/full/roman_rubin_cats_v1.1.2_faint/snana_{region}.parquet" +galaxy_flux_file = f"nasa-irsa-simulations/openuniverse2024/roman/full/roman_rubin_cats_v1.1.2_faint/galaxy_flux_{region}.parquet" +galaxy_info_file = f"nasa-irsa-simulations/openuniverse2024/roman/full/roman_rubin_cats_v1.1.2_faint/galaxy_{region}.parquet" + +df_sn = assemble_SN_data(sn_flux_file, galaxy_flux_file, galaxy_info_file) +``` + +```{code-cell} ipython3 +#what does the merged SN dataframe 
look like?
+df_sn
+```
+
+```{code-cell} ipython3
+#and list out all those column names so we know what we are working with:
+print(df_sn.columns.tolist())
+```
+
+### 1.3 Verify catalog matching worked
+To confirm that the SN–host matching worked properly, we compare the supernova redshift (z_CMB) to its host galaxy redshift.
+A tight correlation along the 1:1 line indicates successful matching.
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def plot_redshift_comparison(df):
+    """
+    Plot SN vs. host redshifts to verify consistency.
+
+    Parameters
+    ----------
+    df : pandas.DataFrame
+        DataFrame returned by assemble_SN_data() containing both SN and galaxy columns.
+
+    Returns
+    -------
+    None
+        Displays a scatter plot comparing z_CMB vs redshift.
+    """
+    # column names for redshift
+    sn_col = "z_CMB"
+    gal_col = "redshift"
+
+    # Drop rows with missing or non-finite redshifts
+    df_plot = df[[sn_col, gal_col]].dropna()
+    df_plot = df_plot[np.isfinite(df_plot[sn_col]) & np.isfinite(df_plot[gal_col])]
+
+    fig, ax = plt.subplots(figsize=(6, 6))
+    ax.scatter(df_plot[gal_col], df_plot[sn_col], s=20, alpha=0.6, edgecolor="none")
+
+    # reference line
+    zmin = min(df_plot[gal_col].min(), df_plot[sn_col].min())
+    zmax = max(df_plot[gal_col].max(), df_plot[sn_col].max())
+    ax.plot([zmin, zmax], [zmin, zmax], "r--", lw=1.2, label="1:1 line")
+
+    # Axis labels and aesthetics
+    ax.set_xlabel("Galaxy redshift", fontsize=12)
+    ax.set_ylabel("SN redshift", fontsize=12)
+    ax.set_title("SN vs Host Galaxy Redshift Comparison", fontsize=13)
+    ax.legend()
+
+    plt.tight_layout()
+    plt.show()
+    return
+```
+
+```{code-cell} ipython3
+# Plot SN vs. host redshifts to verify consistency
+plot_redshift_comparison(df_sn)
+```
+
+### 1.4 Save and/or reload the targets
+
+You may want to save your sample of targets to a file and then, next time you need it, skip the
+cell above and load the sample from file.
+This can save time when your sample is large and helps ensure that the same input data is used, for reproducibility.
+
+```{code-cell} ipython3
+#filename = "galaxy_sample.csv"
+
+#Save dataframe to .csv
+#df_sn.to_csv(filename, index=False)
+
+#reload dataframe from .csv
+#df_sn = pd.read_csv(filename)
+```
+
+## 2. Clean and Visualize data
+
+With the matched sample we just built, we can visualize the host galaxies' spectral energy distributions (SEDs).
+We will:
+- Convert photon fluxes into AB [maggies](https://www.sdss3.org/dr8/algorithms/magnitudes.php) using sedpy filter definitions.
+- Scale Rubin fluxes to match Roman normalization.
+- Plot SEDs for individual and multiple galaxies.
+
+```{code-cell} ipython3
+# Central wavelengths of Rubin and Roman bands in microns
+rubin_bands = {
+    'lsst_maggies_u': 0.367,
+    'lsst_maggies_g': 0.482,
+    'lsst_maggies_r': 0.626,
+    'lsst_maggies_i': 0.754,
+    'lsst_maggies_z': 0.869,
+    'lsst_maggies_y': 0.962,
+}
+
+roman_bands = {
+    'roman_maggies_R062': 0.62,
+    'roman_maggies_Z087': 0.87,
+    'roman_maggies_Y106': 1.06,
+    'roman_maggies_J129': 1.29,
+    'roman_maggies_W146': 1.46,
+    'roman_maggies_H158': 1.58,
+    'roman_maggies_F184': 1.84,
+    'roman_maggies_K213': 2.13,
+}
+```
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def scale_rubin_to_roman(df, rubin_bands, roman_bands, match_wave=0.87, verbose=False):
+    """
+    Scale Rubin (LSST) fluxes so that they match Roman fluxes at a given wavelength.
+
+    This function identifies the Rubin and Roman bands that are closest in wavelength
+    to the chosen match point (default = 0.87 µm), computes a per-galaxy scaling factor
+    based on their flux ratio, and applies that factor to all Rubin bands so that the
+    two instruments are normalized consistently.
+
+    Parameters
+    ----------
+    df : pandas.DataFrame
+        DataFrame containing Rubin and Roman fluxes per galaxy.
+        Must include columns like 'lsst_maggies_z' and 'roman_maggies_Z087'.
rubin_bands : dict
+        Mapping of Rubin flux column names to wavelengths (in microns).
+        Example: {'lsst_maggies_u': 0.367, 'lsst_maggies_g': 0.482, ...}
+    roman_bands : dict
+        Mapping of Roman flux column names to wavelengths (in microns).
+        Example: {'roman_maggies_Z087': 0.87, 'roman_maggies_H158': 1.58, ...}
+    match_wave : float, optional
+        Wavelength (in microns) at which the two instruments are scaled to match.
+        Default = 0.87 µm (near Roman Z087 / Rubin z band).
+    verbose : bool, optional
+        If True, print diagnostic info about which bands were matched and the scale factor.
+
+    Returns
+    -------
+    pandas.DataFrame
+        Copy of the input DataFrame, with all Rubin maggie columns rescaled.
+    """
+
+    # Make a copy so we don't modify the original DataFrame
+    df_scaled = df.copy()
+
+    # --- Find Rubin band closest to the target wavelength (match_wave) -------
+    smallest_rubin_diff = float("inf")  # Start with a very large difference
+    closest_rubin_band = None           # Will hold (band_name, wavelength)
+
+    # Loop through all Rubin bands and measure distance from match_wave
+    for band_name, wavelength in rubin_bands.items():
+
+        # Compute how far this band's wavelength is from the target match wavelength
+        diff = abs(wavelength - match_wave)
+
+        # If this band is closer than any previous one, store it
+        if diff < smallest_rubin_diff:
+            smallest_rubin_diff = diff
+            closest_rubin_band = (band_name, wavelength)
+
+    # Unpack results: band name and wavelength of the Rubin band closest to match_wave
+    rubin_col, rubin_wave = closest_rubin_band
+
+    # --- Find Roman band closest to the target wavelength (match_wave) -------
+    smallest_roman_diff = float("inf")  # Start with a very large difference
+    closest_roman_band = None           # Will hold (band_name, wavelength)
+
+    # Loop through all Roman bands and measure distance from match_wave
+    for band_name, wavelength in roman_bands.items():
+
+        # Compute how far this band's wavelength is from the target match wavelength
+        diff = abs(wavelength - 
match_wave) + + # If this band is closer than any previous one, store it + if diff < smallest_roman_diff: + smallest_roman_diff = diff + closest_roman_band = (band_name, wavelength) + + # Unpack results: band name and wavelength of the Roman band closest to match_wave + roman_col, roman_wave = closest_roman_band + + # --- Compute the scaling factor (Roman / Rubin) per galaxy --------------- + scale_factors = df_scaled[roman_col] / df_scaled[rubin_col] + + # --- Optional diagnostic printout if verbose=True ------------------------ + if verbose: + # Show the matched bands and first few scale factors for verification + example_factors = scale_factors.head(3).round(3).tolist() + print(f"Scaling Rubin band {rubin_col} ({rubin_wave:.3f} µm) " + f"to match Roman band {roman_col} ({roman_wave:.3f} µm)") + print(f"Example scale factors (Roman/Rubin): {example_factors}") + + # --- Apply the scale factor to all Rubin bands --------------------------- + for col in rubin_bands.keys(): + + #Each Rubin flux is multiplied by its galaxy’s individual scale factor. + df_scaled[col] = df_scaled[col] * scale_factors + + return df_scaled +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def photons_to_maggies_with_filter(photon_flux, filt): + """ + Convert band-integrated photon flux [photons/s/cm^2] to AB maggies, + using sedpy filter definitions to get effective wavelength and width. + + Parameters + ---------- + photon_flux : float + Band-integrated photon flux [photons/s/cm^2]. + filt : sedpy.observate.Filter + Filter object from sedpy.load_filters (must include wave_effective and width). + + Returns + ------- + float + Flux in maggies (AB system). 
+ """ + h = 6.62607015e-27 # erg s + c = 2.99792458e18 # Å/s + + lam = filt.wave_effective # effective wavelength [Å] + + # Effective filter width in Å (integral of throughput) + delta_lambda = np.trapezoid(filt.transmission, filt.wavelength) + + # photon energy at central wavelength + E_photon = h * c / lam + + # photon flux integrated over the band → flux density per Å + f_lambda = (photon_flux * E_photon) / delta_lambda # erg/s/cm^2/Å + + # convert to f_nu + f_nu = f_lambda * lam**2 / c + + # AB maggies + mag_ab = -2.5 * np.log10(f_nu) - 48.6 + + #maggies + maggies = 10**(-0.4 * mag_ab) + + return maggies +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def add_maggie_columns(df): + """ + Add maggie flux columns to DataFrame, one per Rubin+Roman band. + Uses sedpy filter definitions to compute maggies correctly. + + Parameters + ---------- + df : pandas.DataFrame + Must contain flux columns (photons/s/cm^2) for Rubin+Roman bands. + + Returns + ------- + pandas.DataFrame + Same dataframe with new *_maggies columns. 
+ """ + # Map catalog flux column names to sedpy filter names + filter_map = { + "lsst_flux_u": "lsst_baseline_u", + "lsst_flux_g": "lsst_baseline_g", + "lsst_flux_r": "lsst_baseline_r", + "lsst_flux_i": "lsst_baseline_i", + "lsst_flux_z": "lsst_baseline_z", + "lsst_flux_y": "lsst_baseline_y", + "roman_flux_R062": "roman_wfi_f062", + "roman_flux_Z087": "roman_wfi_f087", + "roman_flux_Y106": "roman_wfi_f106", + "roman_flux_J129": "roman_wfi_f129", + "roman_flux_W146": "roman_wfi_f146", + "roman_flux_H158": "roman_wfi_f158", + "roman_flux_F184": "roman_wfi_f184", + "roman_flux_K213": "roman_wfi_f213", + } + + # Load all needed filters once + filter_names = list(filter_map.values()) + filters = load_filters(filter_names) + filter_dict = dict(zip(filter_names, filters)) + + # Apply conversion + for col, filt_name in filter_map.items(): + filt = filter_dict[filt_name] + new_col = col.replace("flux", "maggies") + df[new_col] = df[col].apply(lambda f: photons_to_maggies_with_filter(f, filt)) + + return df +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def plot_single_sed(df, rubin_bands, roman_bands, galaxy_index=0, loglog=False): + """ + Plot the Spectral Energy Distribution (SED) for a galaxy in the DataFrame. + + Parameters + ---------- + df : pandas.DataFrame + DataFrame containing Rubin and Roman photometric flux columns in units of maggies + (e.g., 'lsst_maggies_g', 'roman_maggies_F184') and a 'galaxy_id' column. + rubin_bands : dict + Mapping of Rubin band names to their central wavelengths in microns. + Example: {'lsst_maggies_u': 0.367, 'lsst_maggies_g': 0.482, ...} + roman_bands : dict + Mapping of Roman band names to their central wavelengths in microns. + Example: {'roman_maggies_R062': 0.62, 'roman_maggies_Z087': 0.87, ...} + galaxy_index : int, optional + Index (integer position) or label of the galaxy row to plot. + Default is 0 (the first row in the DataFrame). 
loglog : bool, optional
+        If True, use logarithmic scaling for both axes. Default is False.
+
+    Returns
+    -------
+    None
+        Displays a matplotlib plot of flux vs. wavelength for the selected galaxy.
+    """
+    # Select the target galaxy row safely by integer position
+    try:
+        row = df.iloc[galaxy_index]
+    except IndexError as e:
+        raise ValueError(
+            f"Galaxy index {galaxy_index} is out of range for DataFrame "
+            f"with {len(df)} rows."
+        ) from e
+
+    # Rubin
+    rubin_items = sorted(rubin_bands.items(), key=lambda x: x[1])
+    rubin_waves = [w for _, w in rubin_items]
+    rubin_fluxes = [row[col] for col, _ in rubin_items]
+
+    # Roman
+    roman_items = sorted(roman_bands.items(), key=lambda x: x[1])
+    roman_waves = [w for _, w in roman_items]
+    roman_fluxes = [row[col] for col, _ in roman_items]
+
+    # Plotting
+    fig, ax = plt.subplots()
+
+    if loglog:
+        ax.set_xscale("log")
+        ax.set_yscale("log")
+
+    ax.plot(rubin_waves, rubin_fluxes, marker='o', color='#377eb8', label='Rubin')
+    ax.plot(roman_waves, roman_fluxes, marker='o', color='#e41a1c', label='Roman')
+
+    ax.set_xlabel("Wavelength (μm)")
+    ax.set_ylabel("Flux (maggies)")
+    galaxy_label = row.get("galaxy_id", galaxy_index)
+    ax.set_title(f"SED for Galaxy {galaxy_label}")
+    ax.legend()
+    plt.tight_layout()
+    plt.show()
+```
+
+We now plot flux versus wavelength for one galaxy, comparing Roman and Rubin photometric points.
+The goal is to get a qualitative sense of the galaxy’s broadband SED shape.
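The flux unit on these plots is maggies, the linear flux unit Prospector expects. As a reminder of the convention (1 maggie corresponds to AB magnitude 0, the same relation used in `photons_to_maggies_with_filter` above), here is a minimal sketch:

```python
import numpy as np

# Maggies are a linear flux unit tied to the AB magnitude system:
# maggies = 10**(-0.4 * m_AB), so m_AB = 0 corresponds to exactly 1 maggie.
def abmag_to_maggies(mag_ab):
    return 10.0 ** (-0.4 * np.asarray(mag_ab))

def maggies_to_abmag(maggies):
    return -2.5 * np.log10(np.asarray(maggies))

mags = np.array([20.0, 22.5, 25.0])
round_trip = maggies_to_abmag(abmag_to_maggies(mags))
print(np.allclose(round_trip, mags))  # True
print(abmag_to_maggies(20.0))  # 1e-08: a 20th-magnitude source is 10 nanomaggies
```

This is why the y-axis values below are so small: faint galaxies sit many orders of magnitude below the 1-maggie (0 mag) zero point.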
+ +```{code-cell} ipython3 +#remove all targets which are not SN +df = df_sn[~df_sn["model_name"].isin(["FIXMAG", "NON1ASED.TDE-BBFIT"])].copy() + +# Add maggies to df +df_scaled = add_maggie_columns(df) + +# Scale Rubin to Roman +# This is required because the two different instruments do not align in flux in the simulation +df_scaled = scale_rubin_to_roman(df_scaled, rubin_bands, roman_bands) +``` + +```{code-cell} ipython3 +# Plot an SED +plot_single_sed(df_scaled, rubin_bands, roman_bands, 1, loglog=True) +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def plot_many_seds(df, rubin_bands, roman_bands, n_galaxies=10, loglog=False): + """ + Plot SEDs for multiple galaxies from the DataFrame. + + Parameters + ---------- + df : pandas.DataFrame + DataFrame containing Rubin and Roman flux columns. + rubin_bands : dict + Mapping of Rubin band names to their central wavelengths in microns. + Example: {'lsst_maggies_u': 0.367, 'lsst_maggies_g': 0.482, ...} + roman_bands : dict + Mapping of Roman band names to their central wavelengths in microns. + Example: {'roman_maggies_R062': 0.62, 'roman_maggies_Z087': 0.87, ...} + n_galaxies : int, optional + Number of galaxies (rows) to plot. Default is 10. + loglog : bool, optional + If True, use log-log scale for both axes. Default is False. + + Returns + ------- + None + Displays a matplotlib plot of flux vs. wavelength for multiple galaxies. 
+ """ + + # Rubin bands + rubin_items = list(rubin_bands.items()) + rubin_cols = [col for col, _ in rubin_items] + rubin_waves = [wave for _, wave in rubin_items] + + # Roman bands + roman_items = list(roman_bands.items()) + roman_cols = [col for col, _ in roman_items] + roman_waves = [wave for _, wave in roman_items] + + fig, ax = plt.subplots() + + if loglog: + ax.set_xscale("log") + ax.set_yscale("log") + + for i in range(min(n_galaxies, len(df))): + rubin_fluxes = df.iloc[i][rubin_cols].to_numpy(dtype=float) + roman_fluxes = df.iloc[i][roman_cols].to_numpy(dtype=float) + + ax.plot(rubin_waves, rubin_fluxes, marker='o', linestyle='-', color='#377eb8', alpha=0.6) + ax.plot(roman_waves, roman_fluxes, marker='o', linestyle='-', color='#e41a1c', alpha=0.6) + + ax.set_xlabel("Wavelength (μm)") + ax.set_ylabel("Flux (maggies)") + ax.set_title(f"SEDs for {min(n_galaxies, len(df))} Galaxies") + ax.legend(["Rubin", "Roman"], loc="best") + plt.tight_layout() + plt.show() +``` + +```{code-cell} ipython3 +plot_many_seds(df_scaled, rubin_bands, roman_bands, n_galaxies=50, loglog=True) +``` + ++++ {"jupyter": {"source_hidden": true}} + +## 3. SED fitting +In this section we will use [Prospector](https://prospect.readthedocs.io/en/stable/index.html), a Bayesian SED fitting code built on [FSPS](https://dfm.io/python-fsps/current/), to infer stellar population parameters for our galaxies by fitting their SEDs. We choose Prospector because it is an open-source, powerful, yet flexible package to infer stellar population properties that is widely used in the community. + +There are two ways to proceed, depending on whether you want to *run your own fits* or *use existing results*. + +### Option A — Run new fits (slow) +This option performs fresh Prospector fits for a small subset of galaxies (20 by default). +It is the most transparent path, showing every step of the fitting process and how the models are generated. 
However, it is computationally expensive — each fit can take several minutes, and the total run may exceed half an hour on a modest machine.
+
+Choose this option if you:
+- want to learn how Prospector actually performs the fitting, or
+- wish to modify model parameters or fitting settings.
+
+### Option B — Use existing results (fast)
+This option simply reads the saved results (`sn_fits.h5`) from a previous run.
+It is ideal for quickly exploring the fitted parameters, verifying trends, or reproducing plots without waiting for new fits to complete.
+
+Choose this option if you:
+- are mainly interested in visualizing and analyzing results, or
+- do not need to change the fitting configuration.
+
+---
+
+Both paths produce the same variables — `df_small`, `obs_list`, and `outputs` — so all of the downstream analysis and plotting cells will work identically.
+If you are following this notebook for the first time, start with **Option B** by leaving `RUN_FITS = False` to get a sense of the workflow, and come back later to experiment with **Option A** when you are ready to explore the full fitting process.
+
+---
+
+```{code-cell} ipython3
+# Leave this as False to load existing galaxy fits
+# Change this to True to run Prospector fitting
+
+RUN_FITS = False
+```
+
+### 3A. Run new SED fits (optional, slow)
+Prospector has its own built-in functions for writing and reading results, but
+  1) they don't work for multiple galaxy fits, and
+  2) they can fail with numpy-related errors when writing output, so
+  we'll build our own below, called `save_outputs_hdf5`.
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def df_to_all_obs(df, flux_err_frac=0.1):
+    """
+    Convert all galaxies in df into a list of Prospector obs dicts.
+
+    Parameters
+    ----------
+    df : pandas.DataFrame
+        Must contain *_maggies columns (from add_maggie_columns), 'redshift', and 'galaxy_id'.
+    flux_err_frac : float
+        Fractional flux error to assume.
The OpenUniverse catalogs don't give flux uncertainties + default is to assume 10% uncertainties in the fluxes. + + Returns + ------- + list of dict + Each element is an obs dictionary for one galaxy. + """ + # Define Rubin and Roman maggie column names + rubin_maggies = [f"lsst_maggies_{b}" for b in "ugrizy"] + roman_maggies = ["roman_maggies_R062","roman_maggies_Z087","roman_maggies_Y106", + "roman_maggies_J129","roman_maggies_W146","roman_maggies_H158", + "roman_maggies_F184","roman_maggies_K213"] + + # Define corresponding sedpy filter names for Rubin and Roman + rubin_filters = [f"lsst_baseline_{b}" for b in "ugrizy"] + roman_filters = ["roman_wfi_f062","roman_wfi_f087","roman_wfi_f106", + "roman_wfi_f129","roman_wfi_f146","roman_wfi_f158", + "roman_wfi_f184","roman_wfi_f213"] + # Load all sedpy Filter objects at once + filters = load_filters(rubin_filters + roman_filters) + + obs_list = [] + + # Loop over galaxies and build a Prospector obs dict for each + for _, row in df.iterrows(): + mags = row[rubin_maggies + roman_maggies].to_numpy(dtype=float) + mags_unc = mags * flux_err_frac + obs = dict( + wavelength=None, + spectrum=None, + unc=None, + redshift=row["redshift"], + maggies=mags, + maggies_unc=mags_unc, + filters=filters, + galaxy_id=int(row["galaxy_id"]) # <-- keep track of galaxy + ) + obs_list.append(fix_obs(obs)) + + return obs_list +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def process_output(output, model, obs, sps): + """ + Convert raw output from fit_model into a plotting-ready results dict. + This stuff all happens as part of prospector's read_results, + but to avoid IO, we don't use write_results followed by read_results for every galaxy + + Works with both optimization-only and dynesty sampling runs + in Prospector 1.4 (PyPI release). + + Parameters + ---------- + output : dict + Raw dictionary returned by fit_model. + model : SpecModel + Prospector model object used in the fit. 
+ obs : dict + Observed data dictionary for one galaxy. Must contain 'galaxy_id'. + sps : CSPSpecBasis + SPS object used in the fit. + + Returns + ------- + out_proc : dict + Results dictionary with at least: + - 'galaxy_id' : int + - 'chain' : ndarray + - 'weights' : ndarray + - 'theta_labels' : list + - 'bestfit' : dict with: + 'theta', 'spectrum', 'photometry', + 'restframe_wavelengths', + 'SFR_now', 'SFR_100', 'logSFR_100' + """ + out_proc = {} + + # Always copy galaxy_id from obs + out_proc["galaxy_id"] = obs["galaxy_id"] + + # --- case 1: dynesty sampler was run --- + if output.get("sampling") and output["sampling"][0] is not None: + # Unpack the dynesty sampler object + sampler, _ = output["sampling"] + + # Extract posterior samples, weights, and evidence + chain = sampler.samples + logwt = sampler.logwt + logz = sampler.logz[-1] + weights = np.exp(logwt - logz) + + # Store posterior chain, normalized weights, parameter labels, and log-likelihoods + out_proc["chain"] = chain + out_proc["weights"] = weights / np.sum(weights) + out_proc["theta_labels"] = model.free_params + out_proc["lnprobability"] = sampler.logl + + # Identify the best-fit sample = max likelihood sample + imax = np.argmax(sampler.logl) + theta_best = chain[imax] + + # Predict model spectrum and photometry for the best-fit parameters + spec, phot, mfrac = model.predict(theta_best, obs=obs, sps=sps) + + # Store best-fit results + out_proc["bestfit"] = { + "theta": theta_best, + "spectrum": spec, + "photometry": phot, + "restframe_wavelengths": sps.wavelengths, + } + + # --- case 2: optimization only --- + elif output.get("optimization") and output["optimization"][0] is not None: + + # Unpack optimization results and extract best-fit parameters + opt_result, _ = output["optimization"] + theta_best = opt_result.x + + # Predict model spectrum and photometry for the optimized parameters + spec, phot, mfrac = model.predict(theta_best, obs=obs, sps=sps) + + # Store best-fit results in same format as 
dynesty case + out_proc["chain"] = np.array([theta_best]) + out_proc["weights"] = np.array([1.0]) + out_proc["theta_labels"] = model.free_params + out_proc["bestfit"] = { + "theta": theta_best, + "spectrum": spec, + "photometry": phot, + "restframe_wavelengths": sps.wavelengths, + } + + else: + raise ValueError("Output has neither sampling nor optimization results") + + return out_proc +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def run_fit(obs, model_params, sps, lnprobfn, noise_model, fitting_kwargs): + """ + Run a single Prospector fit for one galaxy. + + Parameters + ---------- + obs : dict + Prospector obs dictionary for one galaxy. + model_params : dict + Template dictionary of model parameters (from TemplateLibrary). + A fresh model will be instantiated for each galaxy. + sps : prospect.sources.CSPSpecBasis + FSPS spectral basis (reused across galaxies). + lnprobfn : callable + Prospector log-probability function. + noise_model : optional + Noise model (usually None). + fitting_kwargs : dict + Extra fitting keyword arguments. + + Returns + ------- + dict + Output dictionary from fit_model. 
+ """ + galaxy_id = obs.get("galaxy_id", None) + print(f"[Galaxy {galaxy_id}] Starting fit...") + + # Make a fresh copy of model params + model_params = copy.deepcopy(model_params) + + # Initialize redshift per galaxy + model_params["zred"]["init"] = obs["redshift"] + + # Initialize a Prospector spectral model using the chosen parameter set + model = SpecModel(model_params) + + # Run the full Prospector fitting routine for this galaxy + output = fit_model( + obs, # observation dictionary for one galaxy + model, # Prospector spectral model (SpecModel instance) + sps, # FSPS stellar population synthesis object + optimize=False, # skip optimization-only mode + dynesty=True, # use dynesty nested sampling for Bayesian inference + lnprobfn=lnprobfn, # log-probability function (likelihood) + noise=noise_model, # optional noise model (usually None) + **fitting_kwargs # additional sampler settings (e.g., nlive, dlogz) + ) + + # post-processing + processed = process_output(output, model, obs, sps) + + print(f"[Galaxy {galaxy_id}] Finished fit and post processing.") + + return processed +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def run_fit_star(args): + """ + Wrapper function to unpack arguments for ``run_fit`` when using + ``multiprocessing.Pool.imap_unordered``. + + ``imap_unordered`` passes each element of an iterable as a *single* + positional argument to the worker function. Since ``run_fit`` expects + multiple separate arguments (``obs``, ``model_params``, ``sps``, + ``lnprobfn``, ``noise_model``, ``fitting_kwargs``), this wrapper + unpacks the input tuple and forwards its contents to ``run_fit``. + + Parameters + ---------- + args : tuple + A tuple containing the full set of arguments required by + ``run_fit`` in the following order: + + ``(obs, model_params, sps, lnprobfn, noise_model, fitting_kwargs)`` + + Returns + ------- + dict + The processed Prospector output dictionary returned by ``run_fit``. 
+ + Notes + ----- + This function exists *only* to allow ``run_fit`` to be used with + ``Pool.imap_unordered``. Unlike ``Pool.starmap``, which unpacks tuples + automatically but returns results in input order, + ``imap_unordered`` does not unpack tuples and yields results as soon + as they are completed, providing better parallel efficiency for fits + with variable runtimes. + + """ + return run_fit(*args) +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def fit_sample_parallel(obs_list, model_params, sps, lnprobfn, noise_model=(None,None), + fitting_kwargs=None, nproc=None): + """ + Fit a sample of galaxies in parallel using multiprocessing. + + Parameters + ---------- + obs_list : list of dict + List of Prospector obs dictionaries (one per galaxy). + model_params : dict + Model parameter dictionary (e.g. TemplateLibrary["parametric_sfh"]). + sps : prospect.sources.CSPSpecBasis + FSPS spectral basis (shared). + lnprobfn : callable + Prospector log-probability function. + noise_model : optional + Noise model (usually (None,None)). + fitting_kwargs : dict, optional + Extra kwargs for fit_model. + nproc : int, optional + Number of parallel processes. + + Returns + ------- + raw_results + List of Prospector output dicts, one per galaxy. 
+ """ + # determine optimal number of processes = min(#galaxies to fit, available CPU - 1) + if nproc is None: + available = os.cpu_count() or 1 + nproc = min(len(obs_list), max(1, available - 1)) + + args = [(obs, model_params, sps, lnprobfn, noise_model, fitting_kwargs) + for obs in obs_list] + + #multiprocessing around run_fit function + chunksize = max(1, len(args) // (4 * nproc)) + results = [] + with Pool(processes=nproc) as pool: + for processed in pool.imap_unordered(run_fit_star, args, chunksize=chunksize): + results.append(processed) + + return results +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def save_outputs_hdf5(outputs, obs_list, filename="all_galaxies.h5"): + """ + Save multiple galaxy fit results and corresponding obs dicts to a single HDF5 file. + + Parameters + ---------- + outputs : list of dict + Each element is a processed output (from process_output), one per galaxy. + Must contain 'galaxy_id'. + obs_list : list of dict + Each element is an obs dictionary corresponding to the outputs. + Must contain 'galaxy_id'. + filename : str, optional + Output HDF5 filename. 
+    """
+    with h5py.File(filename, "w") as f:
+        for out, obs in zip(outputs, obs_list):
+            gid = str(out["galaxy_id"])
+            # Each galaxy gets its own group named by galaxy_id
+            g = f.create_group(gid)
+
+            # ---- save results ----
+            g.create_dataset("chain", data=out["chain"])
+            g.create_dataset("weights", data=out["weights"])
+            g.create_dataset("theta_labels", data=np.array(out["theta_labels"], dtype="S"))
+
+            bestfit_grp = g.create_group("bestfit")
+            for k, v in out["bestfit"].items():
+                bestfit_grp.create_dataset(k, data=v)
+
+            # ---- save obs ----
+            obs_grp = g.create_group("obs")
+            obs_grp.create_dataset("redshift", data=obs["redshift"])
+            obs_grp.create_dataset("maggies", data=obs["maggies"])
+            obs_grp.create_dataset("maggies_unc", data=obs["maggies_unc"])
+            filt_names = [f.name for f in obs["filters"]]
+            obs_grp.create_dataset("filters", data=np.array(filt_names, dtype="S"))
+
+    print(f"[save] Wrote {len(outputs)} galaxies to {filename}")
+```
+
+### 3A. Step 1: Explore fitting options
+
++++ {"jp-MarkdownHeadingCollapsed": true}
+
+One set of options you have when using Prospector is which filters to use.
+OpenUniverse uses Rubin and Roman data, but Prospector can include many other filters.
+A list of all available `sedpy` filters is [here](https://github.com/bd-j/sedpy/blob/main/sedpy/data/filters/README.md),
+or you can list the available filters below.
+
+```{code-cell} ipython3
+print(list_available_filters()[:20])  # peek at the first 20
+```
+
+A second set of options is which star formation history to choose. Here we show a basic description of all pre-defined parameter sets.
+
+```{code-cell} ipython3
+TemplateLibrary.show_contents()
+```
+
+### 3A. Step 2: Model setup
+
+```{code-cell} ipython3
+# We choose a parametric star formation history model to use for generic galaxy fitting.
+model_params = TemplateLibrary["parametric_sfh"]
+
+# Disable any extra noise modeling and use only the measurement uncertainties provided in the data.
+noise_model = (None, None) + +fitting_kwargs = dict( + # Number of live points used by dynesty; higher = better posterior sampling but slower + nlive_init=800, #200 + # Dynesty sampling strategy; 'rwalk' is robust for multi-dimensional, correlated posteriors. + nested_method="rwalk", + # Target number of effective posterior samples before termination; controls fit thoroughness. + nested_target_n_effective=800, #300 + # Convergence threshold in log-evidence; smaller values force deeper exploration of the posterior. + nested_dlogz_init=0.05 #0.2 +) +``` + +### 3A. Step 3: Data setup +This cell prepares the data for fitting by choosing a small subset of galaxies to fit and putting the data into the format that is expected by Prospector. + +```{code-cell} ipython3 +# Select a small subset of galaxies to fit + +# --- Select 10 SN Ia and 10 Core-Collapse (CC) host galaxies for fitting --- +#increasing this number will increase fitting time + +# Identify Type Ia and Core-Collapse SNe by model_name +# OpenUniverse2024 convention: "SALT3.NIR_WAVEEXT" → Type Ia +sn1a = df_scaled[df_scaled["model_name"] == "SALT3.NIR_WAVEEXT"] +ccsn = df_scaled[df_scaled["model_name"] != "SALT3.NIR_WAVEEXT"] + +# Randomly sample up to 10 of each (or fewer if limited) +sn1a_sample = sn1a.sample(n=min(10, len(sn1a)), random_state=42) +ccsn_sample = ccsn.sample(n=min(10, len(ccsn)), random_state=42) + +# Concatenate +df_small = pd.concat([sn1a_sample, ccsn_sample]).reset_index(drop=True) + +print(f"Selected {len(df_small)} host galaxies ({len(sn1a_sample)} Ia + {len(ccsn_sample)} CC)") +print(df_small['model_name'].value_counts()) +``` + +```{code-cell} ipython3 +#Prospector expects a specific obs dictionary format containing fluxes, uncertainties, filters, and redshift. +obs_list = df_to_all_obs(df_small, flux_err_frac=0.1) + +# Peek at first galaxy +obs_list[0] +``` + +### 3A. 
Step 4: Run the fits
+
++++
+
+:::{caution}
+This step runs Prospector on a small subset of galaxies and writes a new `sn_fits.h5` file. It can take **30+ minutes** depending on the machine and number of CPUs. If you only want to explore results, **skip this step** and go to **3B** below. If you are sure you want to run this section yourself, set `RUN_FITS = True` in the code cell near the start of Section 3.
+:::
+
+```{code-cell} ipython3
+# This notebook uses [FSPS](https://github.com/cconroy20/fsps) to fit SEDs.
+# Installing FSPS includes cloning that repo to get its data files.
+# This repo ([ipac-sp-notebooks](https://github.com/IPAC-SW/ipac-sp-notebooks/)) includes FSPS
+# as a submodule to make things a little easier.
+# If you have cloned this repo, running the following cell will complete the setup.
+# If not, either clone this repo and then run the cell, or follow the full instructions at the FSPS link instead.
+
+if RUN_FITS:
+
+    # Clone FSPS if not already available (only needed once).
+    # FSPS contains stellar population synthesis libraries used by Prospector.
+    # The first time this is run it will clone the FSPS repo and
+    # download more than 1 GB of code and data files.
+    # Idempotent unless there is a new FSPS commit in this repo (expected rarely).
+    !git submodule update --init
+
+    # Set the environment variable pointing to the cloned fsps directory.
+    from pathlib import Path
+    os.environ["SPS_HOME"] = str(Path.cwd() / "fsps")
+
+    # Now import fsps cleanly
+    import fsps
+    import prospect.sources.galaxy_basis as gb
+    gb.fsps = fsps  # work around a prospector bug by injecting fsps into the module namespace
+
+    # Verify this setup worked
+    sp = fsps.StellarPopulation()
+    print("Available FSPS libraries:", sp.libraries)
+    from prospect.models import SpecModel
+    from prospect.fitting import lnprobfn, fit_model
+```
+
+```{code-cell} ipython3
+if RUN_FITS:
+
+    # Initialize spectral model
+    sps = CSPSpecBasis(zcontinuous=1)
+
+    # Run fits in parallel
+    outputs = fit_sample_parallel(
+        obs_list, model_params, sps,
+        lnprobfn, noise_model,
+        fitting_kwargs=fitting_kwargs,
+        nproc=None  # auto-adjusts based on CPU count
+    )
+
+    # Save results for later use
+    save_outputs_hdf5(outputs, obs_list, filename="sn_fits.h5")
+
+    print(f"✅ Completed {len(outputs)} fits and saved the results to sn_fits.h5")
+```
+
+## 3B. Use existing results (fast, default)
+If you already have a saved `sn_fits.h5` file from a previous run,
+use this section to quickly load the fitted results instead of re-running Prospector.
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def download_results_hdf5_from_gdrive(file_id, dest_path="./sn_fits.h5"):
+    """
+    Download the Prospector results HDF5 file from Google Drive.
+
+    Parameters
+    ----------
+    file_id : str
+        Google Drive file identifier.
+    dest_path : str, optional
+        Location where the downloaded file will be written.
+        Default is "./sn_fits.h5".
+
+    Returns
+    -------
+    str
+        The path to the downloaded file.
+    """
+    if not os.path.exists(dest_path):
+        print(f"[download] Fetching results file from Google Drive → {dest_path}")
+        url = f"https://drive.google.com/uc?id={file_id}"
+        gdown.download(url, dest_path, quiet=False)
+    else:
+        print(f"[download] File already exists → {dest_path}")
+
+    return dest_path
+```
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def load_outputs_hdf5(filename="sn_fits.h5",
+                      file_id="1BuymtEXJsxbd8PqwkIEmff-76JGGa650"):
+    """
+    Download (if necessary) and load multiple galaxy fit results
+    and corresponding obs dicts from a single HDF5 file.
+    Rebuilds sedpy.Filter objects.
+
+    Parameters
+    ----------
+    filename : str
+        HDF5 file created by save_outputs_hdf5.
+    file_id : str
+        Google Drive identifier used to fetch the file if it is not
+        already present locally.
+
+    Returns
+    -------
+    outputs : list of dict
+        List of processed output dicts.
+    obs_list : list of dict
+        List of obs dicts with sedpy.Filter objects rebuilt.
+    """
+
+    # --- Ensure file is available locally ---
+    filename = download_results_hdf5_from_gdrive(
+        file_id=file_id,
+        dest_path=filename
+    )
+
+    outputs, obs_list = [], []
+    with h5py.File(filename, "r") as f:
+        for gid in f.keys():
+            g = f[gid]
+
+            # ---- reconstruct results ----
+            out = {
+                "galaxy_id": gid,
+                "chain": g["chain"][()],
+                "weights": g["weights"][()],
+                "theta_labels": [x.decode("utf-8") for x in g["theta_labels"][()]],
+                "bestfit": {k: g["bestfit"][k][()] for k in g["bestfit"]}
+            }
+            outputs.append(out)
+
+            # ---- reconstruct obs ----
+            obs_g = g["obs"]
+            filt_names = [x.decode("utf-8") for x in obs_g["filters"][()]]
+            obs = dict(
+                redshift=obs_g["redshift"][()],
+                maggies=obs_g["maggies"][()],
+                maggies_unc=obs_g["maggies_unc"][()],
+                filters=load_filters(filt_names),  # <- rebuild sedpy.Filter objects
+            )
+            obs["galaxy_id"] = gid
+            obs_list.append(obs)
+
+    print(f"[load] Loaded {len(outputs)} galaxies from {filename}")
+    return outputs, obs_list
+```
+
+```{code-cell} ipython3
+if not RUN_FITS:
+    # Load saved outputs and observation dictionaries
+    outputs, obs_list = 
load_outputs_hdf5("sn_fits.h5") + + # Match IDs between fits and your scaled dataframe + fitted_ids = [str(o["galaxy_id"]) for o in outputs] + df_small = df_scaled[df_scaled["galaxy_id"].astype(str).isin(fitted_ids)].copy() + + print(f"✅ Loaded {len(outputs)} fits and matched {len(df_small)} galaxies in df_scaled") +``` + +## 4. Plot the results +After fitting, we can visualize the results by comparing observed and modeled SEDs and inspecting posterior parameter distributions. + ++++ + +### 4.1 Plot the SEDs with their best fit from Prospector + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def plot_SED_fit(outputs, obs_list, galaxy_id): + """ + Plot SED fit for one galaxy in outputs/obs_list and print diagnostics. + + Parameters + ---------- + outputs : list of dict + Prospector output dicts (from fit_sample_parallel or load_outputs_and_obs). + obs_list : list of dict + Observation dicts, same order as outputs. + galaxy_id : str or int + Which galaxy to plot. + + Returns + ------- + matplotlib.figure.Figure + The SED fit figure. 
+ """ + # Locate this galaxy’s output and observation + out = next((o for o in outputs if str(o.get("galaxy_id")) == str(galaxy_id)), None) + obs = next((o for o in obs_list if str(o.get("galaxy_id")) == str(galaxy_id)), None) + + if out is None or obs is None: + raise ValueError(f"Galaxy {galaxy_id} not found in outputs/obs_list") + + fig, (ax_resid, ax_sed) = plt.subplots( + 2, 1, gridspec_kw=dict(height_ratios=[1, 4]), + sharex=True, figsize=(8, 6) + ) + + # Extract data + pwave = np.array([f.wave_effective for f in obs["filters"]]) + bsed = out["bestfit"] + obs_flux = np.array(obs["maggies"]) + obs_unc = np.array(obs["maggies_unc"]) + model_flux = np.array(bsed["photometry"]) + + # Identify Rubin vs Roman filters by name + filter_names = [f.name for f in obs["filters"]] + rubin_mask = np.array(["lsst" in name.lower() for name in filter_names]) + roman_mask = np.array(["roman" in name.lower() for name in filter_names]) + + # Convert model wavelengths from Å to µm for consistency with filter centers + wavelengths_micron = bsed["restframe_wavelengths"] * (1 + obs["redshift"]) / 1e4 + pwave_micron = pwave / 1e4 # convert filter effective wavelengths too + + + # --- lower panel: SED fit --- + ax_sed.errorbar( + pwave_micron[rubin_mask], obs_flux[rubin_mask], obs_unc[rubin_mask], + fmt="o", color="#377eb8", label="Rubin" + ) + ax_sed.errorbar( + pwave_micron[roman_mask], obs_flux[roman_mask], obs_unc[roman_mask], + fmt="o", color="#e41a1c", label="Roman" + ) + + # Overlay best-fit model spectrum + ax_sed.plot( + wavelengths_micron, + bsed["spectrum"], + color="firebrick", + label="Model spectrum" + ) + + # model photometry: black boxes + ax_sed.plot( + pwave_micron, model_flux, + "s", markersize=8, mec="k", mfc="none", mew=1.8, label="Model photometry" + ) + + # Y-axis scaling: tighten around data + y_min = np.nanmin(obs_flux - 2 * obs_unc) + y_max = np.nanmax(obs_flux + 2 * obs_unc) + ax_sed.set_ylim(y_min * 0.8, y_max * 1.2) + + ax_sed.set_ylabel(r"$f_\nu$ 
(maggies)") + ax_sed.set_xlabel(r"Wavelength (µm)") + ax_sed.set_xlim(pwave_micron.min() * 0.9, pwave_micron.max() * 1.1) + ax_sed.set_yscale("log") + ax_sed.set_title(f"SED Fit for Galaxy {galaxy_id}", fontsize=12) + ax_sed.legend(loc="best") + + # --- upper panel: residuals --- + chi = (obs_flux - model_flux) / obs_unc + # Separate Rubin and Roman photometric points for clarity + ax_resid.errorbar(pwave_micron[rubin_mask], chi[rubin_mask], fmt="o", color="#377eb8") + ax_resid.errorbar(pwave_micron[roman_mask], chi[roman_mask], fmt="o", color="#e41a1c") + ax_resid.axhline(0, color="k", linestyle=":") + ax_resid.set_ylabel(r"$\chi_{\rm best}$") + + fig.tight_layout() + return +``` + +```{code-cell} ipython3 +# Extract all galaxy IDs from the outputs list +galaxy_ids = [o["galaxy_id"] for o in outputs] + +# Loop through each of the first 5 galaxy IDs and plot its SED fit +for gid in galaxy_ids[:5]: + plot_SED_fit(outputs, obs_list, gid) +``` + +### 4.2 Plot corner plots of posteriors + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def plot_corner(outputs, obs_list, galaxy_id): + """ + Plot marginalized posterior corner diagram for one galaxy using Prospector's built-in corner plotting. + + Parameters + ---------- + outputs : list of dict + Processed Prospector output dicts (from fit_sample_parallel or load_outputs_hdf5). + obs_list : list of dict + Observation dicts (not used here, included for consistency). + galaxy_id : str or int + Galaxy ID to plot. + + Returns + ------- + matplotlib.figure.Figure + The corner plot figure showing marginalized posteriors. 
+ """ + color="royalblue" + gid = str(galaxy_id) + out = next((o for o in outputs if str(o.get("galaxy_id")) == gid), None) + if out is None: + raise ValueError(f"Galaxy {galaxy_id} not found in outputs") + + chain = out.get("chain") + weights = out.get("weights") + labels = out.get("theta_labels") + + if chain is None or weights is None or labels is None: + raise ValueError(f"Output for galaxy {galaxy_id} missing chain/weights/labels") + + nsamples, ndim = chain.shape + print(f"[Galaxy {galaxy_id}] Plotting corner with {nsamples} samples across {ndim} parameters") + + # Create subplots grid + fig, axes = plt.subplots(ndim, ndim, figsize=(10, 9)) + + # Plot using Prospector's built-in corner routine + axes = corner.allcorner( + chain.T, + labels, + axes, + weights=weights, + color=color, + show_titles=True, + ) + + # Compute best sample manually if missing lnprobability + if "lnprobability" in out and np.ndim(out["lnprobability"]) == 1: + imax = np.argmax(out["lnprobability"]) + theta_best = chain[imax] + elif "lnprobability" in out and np.ndim(out["lnprobability"]) == 2: + # sometimes stored as sampler.logl per live point + flat = out["lnprobability"].ravel() + imax = np.argmax(flat) + theta_best = chain[imax % len(chain)] + else: + # fallback to weighted best sample + imax = np.argmax(weights) + theta_best = chain[imax] + + # Overlay best-fit point + from prospect.plotting import corner as pcorner + pcorner.scatter(theta_best[:, None], axes, color="firebrick", marker="o", s=40) + #pcorner.scatter(theta_best, axes,color="firebrick", marker="o", s=40) + + fig.suptitle(f"Posterior Distributions — Galaxy {galaxy_id}", fontsize=14) + plt.tight_layout() + plt.show() + + return +``` + +```{code-cell} ipython3 +gid = galaxy_ids[0] +plot_corner(outputs, obs_list, gid) +``` + +```{code-cell} ipython3 +gid = galaxy_ids[3] +plot_corner(outputs, obs_list, gid) +``` + +## 5. 
Verification
+As a sanity check, we compare the stellar masses derived by Prospector to the “true” simulated stellar masses stored in the OpenUniverse catalog.
+Agreement along the 1:1 line indicates successful recovery of physical parameters.
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def plot_stellar_mass_verification(df, outputs):
+    """
+    Verify measured stellar mass against input stellar mass for multiple galaxies.
+
+    Parameters
+    ----------
+    df : pandas.DataFrame
+        DataFrame containing the true/input stellar masses. Must include
+        a column "um_source_galaxy_obs_sm".
+    outputs : list of dict
+        List of processed Prospector results dicts (from process_output).
+
+    Returns
+    -------
+    None
+        Displays the verification plot.
+    """
+
+    # Extract input (true) stellar masses from the catalog
+    input_mass = df["um_source_galaxy_obs_sm"].values
+
+    # Initialize lists to store median and uncertainty for each galaxy
+    measured_mass, err_low, err_high = [], [], []
+
+    # Loop through all Prospector fit results
+    for out in outputs:
+
+        # Extract the MCMC chain and parameter names
+        chain = out["chain"]
+        param_names = out["theta_labels"]
+
+        # Identify the 'mass' parameter index
+        idx = param_names.index("mass")
+
+        # Taking log10 of the samples and then 10** of the result is an
+        # identity, so use the chain samples directly in their native units.
+        mass_samples = chain[:, idx]
+
+        # Compute median and 16th/84th percentiles as uncertainty bounds
+        median = np.median(mass_samples)
+        low, high = np.percentile(mass_samples, [16, 84])
+
+        # Store median and asymmetric error bars
+        measured_mass.append(median)
+        err_low.append(median - low)
+        err_high.append(high - median)
+
+    # --- plotting ---
+    fig, ax = plt.subplots(figsize=(6, 6))
+    ax.errorbar(input_mass, measured_mass,
+                yerr=[err_low, err_high],
+                fmt="o", color="royalblue", ecolor="lightgray", alpha=0.7)
+
+    # Make the 1:1 
correlation line
+    lims = [min(input_mass.min(), min(measured_mass)),
+            max(input_mass.max(), max(measured_mass))]
+    ax.plot(lims, lims, "k--")
+    ax.set_xscale("log")
+    ax.set_yscale("log")
+
+    ax.set_xlabel("Input stellar mass")
+    ax.set_ylabel("Measured stellar mass")
+    ax.set_title("Stellar Mass Verification")
+
+    return
+```
+
+```{code-cell} ipython3
+plot_stellar_mass_verification(df_small, outputs)
+```
+
+The example shown here does not yield particularly accurate stellar-mass estimates — this is understandable given the simplifying assumptions used in the tutorial.
+As noted in the Assumptions section above, these quick Prospector runs are intended to demonstrate how to use and visualize OpenUniverse data rather than to optimize astrophysical fits. Users are encouraged to experiment with model parameters, priors, and error settings to obtain more realistic results.
+
++++
+
+## 6. Host galaxy properties as a function of SN type
+Finally, we examine whether host galaxy properties differ between Type Ia and Core-Collapse supernovae.
+We compare the distribution of Prospector-derived stellar masses for the two SN classes.
+
+```{code-cell} ipython3
+---
+jupyter:
+  source_hidden: true
+---
+def plot_1a_vs_cc_mass(df_sn, outputs):
+    """
+    Plot histograms of Prospector-fitted host stellar masses for
+    Type Ia vs Core-Collapse (CC) supernova hosts.
+
+    Parameters
+    ----------
+    df_sn : pandas.DataFrame
+        Must contain columns 'model_name' and either 'id' or 'galaxy_id'.
+        'model_name' is used to classify SN type.
+    outputs : list of dict
+        Processed Prospector outputs (from process_output), one per galaxy.
+        Must contain keys: 'galaxy_id', 'chain', and 'theta_labels'.
+
+    Notes
+    -----
+    - Stellar mass in Prospector is stored as log10(M/M☉).
+    - 'SALT3.NIR_WAVEEXT' is treated as Type Ia.
+    - All other SN models are grouped as Core-Collapse.
+ """ + + # --- extract stellar masses from Prospector outputs --- + prospector_masses = [] + for out in outputs: + + # Extract the MCMC chain and parameter names + chain = out["chain"] + param_names = out["theta_labels"] + gid = str(out["galaxy_id"]) # ensure string for consistency + + if "mass" not in param_names: + print(f"[warning] Galaxy {gid} has no 'mass' parameter; skipping.") + continue + + # Identify the 'mass' parameter index (log10 of stellar mass) + idx = param_names.index("mass") + logmass_samples = chain[:, idx] + logmass_median = np.median(logmass_samples) + prospector_masses.append(dict(galaxy_id=gid, log_mass=logmass_median)) + + df_mass = pd.DataFrame(prospector_masses) + + # --- ensure consistent dtype for merge keys --- + df_sn = df_sn.copy() + df_sn["galaxy_id"] = df_sn["galaxy_id"].astype(str) + df_mass["galaxy_id"] = df_mass["galaxy_id"].astype(str) + + # --- merge --- + merged = df_sn.merge(df_mass, left_on="galaxy_id", right_on="galaxy_id", how="inner") + + if merged.empty: + raise ValueError("[error] No matches found between df_sn and Prospector outputs. 
Check ID formats.") + + # --- classify SN types --- + merged["SN_type"] = merged["model_name"].apply( + lambda m: "Type Ia" if m == "SALT3.NIR_WAVEEXT" else "Core-Collapse" + ) + + # --- plot --- + plt.figure(figsize=(8, 5)) + sn_types = merged["SN_type"].unique() + + if len(sn_types) > 1: + # Two or more SN types → overlay histograms + sns.histplot( + data=merged, + x="log_mass", + hue="SN_type", + multiple="layer", + stat="density", + common_norm=False, + palette={"Type Ia": "tab:blue", "Core-Collapse": "tab:orange"}, + bins=20 + ) + plt.title("Prospector-Derived Host Stellar Mass:\nType Ia vs Core-Collapse SNe") + # ensure legend always appears + handles, labels = plt.gca().get_legend_handles_labels() + if not labels: + plt.legend(["Type Ia", "Core-Collapse"], title="SN Type", loc="best") + else: + plt.legend(title="SN Type", loc="best") + else: + # Only one SN type → single histogram, fixed color + single_type = sn_types[0] + color = "tab:blue" if single_type == "Type Ia" else "tab:orange" + sns.histplot( + data=merged, + x="log_mass", + color=color, + element="step", + stat="density", + bins=20 + ) + plt.title(f"Prospector-Derived Host Stellar Mass ({single_type} Only)") + + plt.xlabel(r"Host Stellar Mass $\log_{10}(M_*/M_\odot)$ (Prospector fit)") + plt.ylabel("Normalized Density") + plt.tight_layout() + plt.show() +``` + +```{code-cell} ipython3 +plot_1a_vs_cc_mass(df_sn, outputs) +``` + +Distribution of Prospector-derived host galaxy stellar masses for Type Ia and Core-Collapse supernovae. The two distributions do not show a clear separation, which likely reflects the limited accuracy of the stellar-mass fits in this quick demonstration. As discussed in the stellar-mass verification figure (Section 5), the simplified assumptions used in the tutorial—such as fixed flux uncertainties and basic model priors—introduce significant scatter that can obscure any underlying physical differences between the SN host populations. 
Larger sample sizes or more attention to fit assumptions may clarify this plot.
+
++++
+
+## Acknowledgements
+
+- [IPAC-IRSA](https://irsa.ipac.caltech.edu/)
+
+## About this notebook
+
+**Authors:** Jessica Krick, Jaladh Singhal, Brigitta Sipőcz
+
+**Updated:** 2026-02-12
+
+**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions
+or problems.
+
+**Runtime:** As of the date above, this notebook takes about 70s to run to completion on
+a machine with 8GB RAM and 2 CPUs *without* running any of the SED fitting. Running this tutorial *including* fitting on 20 galaxies takes about 40 minutes on a machine with 64GB RAM and 16 CPUs.
+
+**AI Acknowledgement:** This tutorial was developed with the assistance of OpenAI’s ChatGPT (GPT-5).
+
+## References
+Johnson, B. D., Leja, J., Conroy, C., & Speagle, J. S. (2021). Stellar Population Inference with Prospector.
+The Astrophysical Journal Supplement Series, 254(2), 22.
+
+Leja, J., Johnson, B. D., Conroy, C., van Dokkum, P. G., & Byler, N. (2017). An Older, More Quiescent Universe from Panchromatic SED Fitting of the 3D-HST Survey.
+The Astrophysical Journal, 837(2), 170.
+
+Conroy, C., Gunn, J. E., & White, M. (2009). The Propagation of Uncertainties in Stellar Population Synthesis Modeling.
+The Astrophysical Journal, 699(1), 486–506.
+
+Conroy, C., & Gunn, J. E. (2010). The Propagation of Uncertainties in Stellar Population Synthesis Modeling. II. The Challenge of Comparing Galaxy Evolution Models to Observations.
+The Astrophysical Journal, 712(2), 833–857.
+
+Johnson, B. D. (2024). python-fsps: Python bindings to the Flexible Stellar Population Synthesis (FSPS) code (Version 0.4.7).
+Computer software.
+Zenodo. https://doi.org/10.5281/zenodo.12447779
+
+Speagle, J. S. (2020). DYNESTY: a dynamic nested sampling package for estimating Bayesian posteriors and evidences.
+Monthly Notices of the Royal Astronomical Society, 493(3), 3132–3158.
+
+Speagle, J. S. (2025). 
dynesty: Dynamic Nested Sampling in Python (Version 2.1.0).
+Computer software.
+Zenodo. https://doi.org/10.5281/zenodo.17268284
+
+Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. (2013). emcee: The MCMC Hammer.
+Publications of the Astronomical Society of the Pacific, 125(925), 306–312.
+
+The Roman Supernova Cosmology Project Infrastructure Team (2025). OpenUniverse2024: A shared, simulated view of the sky for the next generation of cosmological surveys.
+Monthly Notices of the Royal Astronomical Society, 1729T.
+https://doi.org/10.26131/IRSA596
+
+```{code-cell} ipython3
+print("total time", time.time() - starttime)
+```
diff --git a/tutorials/simulated-data/OpenUniverse2024/TDE_light_curve.md b/tutorials/simulated-data/OpenUniverse2024/TDE_light_curve.md
new file mode 100644
index 00000000..0b3abbd2
--- /dev/null
+++ b/tutorials/simulated-data/OpenUniverse2024/TDE_light_curve.md
@@ -0,0 +1,895 @@
+---
+authors:
+- name: Jessica Krick
+- name: Jaladh Singhal
+- name: "Brigitta Sip\u0151cz"
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+  jupytext_version: 1.19.1
+kernelspec:
+  name: python3
+  display_name: python3
+  language: python
+---
+
+# TDE Light Curve
+
+## Learning Goals
+
+By the end of this tutorial, you will be able to:
+
+- Query the OpenUniverse2024 images for a source of interest
+- Perform aperture photometry on images
+- Generate a light curve
+- Display full images and cutouts
+
+## Introduction
+
+The [OpenUniverse2024](https://arxiv.org/abs/2501.05632) simulation suite delivers ~70 deg² of matched optical/infrared imagery designed for both the LSST Wide-Fast-Deep (WFD) and the Nancy Grace Roman Space Telescope high-latitude survey, enabling joint survey planning and multi-wavelength systematics studies.
+It incorporates the updated “Diffsky” extragalactic model, extended transient modeling across optical/IR wavelengths, and realistic telescope/instrument effects, producing roughly 400 TB of publicly available synthetic imaging and catalogs. +The goal of this project is to enable cross-collaboration and maximize science return from next-generation cosmological surveys by providing a consistent simulated sky observed by multiple observatories. + +Tidal Disruption Events (TDEs) occur when a star passes close enough to a supermassive black hole to be torn apart by tidal forces, producing a luminous flare that can outshine the host galaxy for weeks to months. +Identifying and characterizing TDE host galaxies is key to understanding the demographics of supermassive black holes and the galactic environments that produce these rare events. +This notebook demonstrates how to locate a simulated TDE from the OpenUniverse2024 transient input catalog, identify its host galaxy, and extract optical and infrared photometry from Roman images to construct a multi-epoch light curve. +The OpenUniverse2024 dataset also provides matched Rubin optical coverage over the same sky. +With a few simple changes in Sections 3 and 4, this workflow can be extended to other Roman and Rubin bands to build a true multi-wavelength light curve. + +### Instructions + +This notebook is designed to be run sequentially from top to bottom. All code is self-contained and relies on publicly accessible data. + +### Input + +- A TDE from the OpenUniverse2024 transient input catalog + +### Output + +- Light curve(s) of host galaxy(s) +- Cutout gallery of host galaxy(s) + +## Imports + +```{code-cell} ipython3 +# Uncomment the next line to install dependencies if needed. 
+# !pip install numpy astropy s3fs photutils matplotlib scipy pandas fsspec pyarrow hpgeom astroquery +``` + +```{code-cell} ipython3 +from astropy.io import fits +import numpy as np +import s3fs +from matplotlib import pyplot as plt +import pandas as pd +from photutils.aperture import SkyCircularAperture, aperture_photometry +from scipy.ndimage import rotate +from astropy import units as u +from astropy.coordinates import SkyCoord +from astropy.nddata import Cutout2D +from astropy.wcs import WCS +import pyarrow.dataset as ds +import pyarrow.fs +import pyarrow.parquet as pq +import hpgeom +import json +from astroquery.ipac.irsa import Irsa + +import itertools +``` + +## 1. Explore the OpenUniverse2024 data directories + +This section demonstrates how to explore the OpenUniverse2024 data directories directly on S3 and inspect simulated Roman and Rubin images without downloading large datasets locally. It establishes a connection to the public NASA IRSA simulations bucket using s3fs, defines key directory paths for the full Roman and Rubin simulations (not the preview subsets), and illustrates how to browse image files for a selected band and pointing. The accompanying helper, show_gallery(), visualizes a small gallery of example images from the chosen directory, while astropy's hdul.info() summarizes each file's metadata (e.g., number, type, and dimensions of its extensions). + +In the prefix you will see that we choose "simple_model" simulations and not "truth" simulations because the simple_model images include noise and realistic instrument effects, while the "truth" images are noise-free, idealized images. + +Also in the prefix you will see that we choose the full simulation, not the preview simulation, for both Roman and Rubin.
Differences between the "full" and "preview" simulations are clarified in [this publication](https://arxiv.org/abs/2501.05632). + +```{code-cell} ipython3 +# Setup + +# Create a connection to the public NASA IRSA S3 storage bucket using the `s3fs` library. +# By setting `anon=True`, the connection is opened in **anonymous read-only mode**, +# allowing us to list and access public files (such as the OpenUniverse2024 Roman and +# Rubin simulation data) directly from S3 without requiring AWS credentials. + +# initialize a general interface to Amazon cloud +s3 = s3fs.S3FileSystem(anon=True) + +# general location information +BUCKET_NAME = "nasa-irsa-simulations" +OU_PREFIX = "openuniverse2024" +ROMAN_TDS_PREFIX = "roman/full/RomanTDS/images/simple_model" +CATALOG_NAME = "roman_rubin_cats_v1.1.2_faint" + +# specific location information +BAND = "J129" +POINTING = "10190" + +# the full path to the data we are interested in exploring +image_directory = f"{BUCKET_NAME}/{OU_PREFIX}/{ROMAN_TDS_PREFIX}/{BAND}/{POINTING}" + +# list the contents +s3.ls(image_directory) +``` + +```{code-cell} ipython3 +# open and explore extensions + +# how many files are in this directory? +files = [f"s3://{f}" for f in s3.ls(image_directory)] +print(f"Found {len(files)} files") + +# pick one fits file to explore +fname = files[0] + +# describe the available extensions in this fits file +with fits.open(fname, use_fsspec=True, fsspec_kwargs={"anon": True}, memmap=False) as hdul: + print(f"File: {fname}") + print(f"Number of extensions: {len(hdul)}\n") + hdul.info() +``` + +This output lists the structure and contents of one example Roman TDS FITS image from the OpenUniverse2024 dataset. It shows that 18 image files were found in the selected directory, and the examined file contains four extensions: a primary header (no data) followed by three 4088×4088 pixel image planes labeled SCI, ERR, and DQ, which store the science image, per-pixel uncertainty, and data quality mask, respectively.
For each extension, the output reports its name, type, dimensions, and number of header cards, giving you a sense of what is in the file. + ++++ + +Let's take a look at a few images to see what we are dealing with. + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def show_gallery(files, max_images=9): + """ + Display a gallery of FITS images. + + Parameters + ---------- + files : list of str + List of S3 URIs to FITS files. + max_images : int, optional + Maximum number of images to display (default: 9). + """ + # Limit the number of images to display + n_images = min(len(files), max_images) + + # Choose number of columns: up to 3, or equal to n_images if fewer than 4 + ncols = n_images if n_images < 4 else 3 + + # Compute number of rows based on total images and columns + nrows = (n_images + ncols - 1) // ncols + + # Create the subplot grid + fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 4 * nrows)) + axes = np.atleast_1d(axes).ravel() # flatten in case of 1D output + + # Loop through each file and display the image + for i, f in enumerate(files[:n_images]): + # Open FITS file directly from S3 (anonymous read) + with fits.open(f, fsspec_kwargs={"anon": True}, memmap=False) as hdul: + data = hdul[1].data # Extract image data + # Compute robust display scaling (5th–99th percentile) + vmin, vmax = np.nanpercentile(data, [5, 99]) + # Show image in grayscale with good contrast + axes[i].imshow(data, origin="lower", cmap="gray", vmin=vmin, vmax=vmax) + # Title: just the base filename + axes[i].set_title(f.split("/")[-1], fontsize=8) + # Remove tick marks and labels + axes[i].axis("off") + + # Hide any unused subplot axes + for j in range(i + 1, len(axes)): + axes[j].axis("off") + + # Adjust layout for neat display + plt.tight_layout() + plt.show() +``` + +```{code-cell} ipython3 +# Number of example images to display. Increase to see more of the available images.
+n_gallery_images = 6 +show_gallery(files, max_images=n_gallery_images) +``` + +## 2. Find a TDE target from the transient catalog + +We use the OpenUniverse2024 transient input catalog — the same SNANA parquet files described in the [SED Fitting tutorial](SED_fit.md) — to find a TDE. +The catalog is split into three types of parquet files, each indexed by HEALPix region: + + 1. snana_{region}.parquet — the transient source catalog, with one row per simulated event (supernovae, TDEs, etc.), including fields such as the event + type (model_name) and the ID of the host galaxy (host_id) + 2. galaxy_{region}.parquet — the host galaxy catalog, with sky positions and physical properties for each galaxy + 3. galaxy_flux_{region}.parquet — multi-band photometry for each galaxy (used in the SED Fitting tutorial but not needed here) + +TDEs are rare, so not every region will contain one. +We use the known center of the Roman Time-Domain Survey to target the right region directly. +We then match the TDE's host_id against the galaxy file to retrieve the host's sky coordinates for the image search that follows. + +First, we connect to S3 and list all available SNANA parquet files in the catalog. + +```{code-cell} ipython3 +fs = pyarrow.fs.S3FileSystem(anonymous=True) +catalog_prefix = f"{BUCKET_NAME}/{OU_PREFIX}/roman/full/{CATALOG_NAME}" + +# List all SNANA parquet files in the catalog directory, sorted for consistent ordering. +file_info = fs.get_file_info(pyarrow.fs.FileSelector(catalog_prefix, recursive=False)) +snana_files = sorted([ + f.path for f in file_info + if f.base_name.startswith("snana_") and f.base_name.endswith(".parquet") +]) + +print(f"Found {len(snana_files)} snana parquet files") +``` + +Since our goal is to make a light curve for a TDE, we need to pick a SNANA parquet file that is covered by the Roman Time-Domain Survey (TDS).
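The HEALPix bookkeeping behind those `{region}` file names can be sanity-checked with simple arithmetic. This is our own illustrative sketch, not part of the OpenUniverse2024 data: an order-5 map has `nside = 2**5 = 32` and `12 * nside**2` pixels covering the whole sky.

```{code-cell} ipython3
import math

nside = 32                     # HEALPix resolution used by the catalog file names
order = int(math.log2(nside))  # nside = 2**order, so order 5
npix = 12 * nside**2           # total number of HEALPix pixels on the sky

# Sky area is 4*pi steradians; convert to square degrees per pixel.
pixel_area_deg2 = 4 * math.pi * (180 / math.pi) ** 2 / npix

print(f"order={order}, npix={npix}, pixel area ~ {pixel_area_deg2:.2f} deg^2")
```

Each `snana_{region}.parquet` file therefore covers only a few square degrees of sky, which is part of why a rare event class like TDEs may be absent from any given file.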
+ +```{code-cell} ipython3 +# The Roman Time-Domain Survey (TDS) is centered on the LSST ELAIS-S1 DDF: +ra = 9.45 +dec = -44.02 + +# In snana_{region}.parquet, region is the HEALPix pixel ID at order 5 (nside=32) with RING ordering +nside = 32 +nest = False + +# Convert TDS center coordinates to the region ID used in the naming of SNANA parquet files +region = hpgeom.angle_to_pixel(nside, ra, dec, lonlat=True, nest=nest) +snana_path = [f for f in snana_files if f"snana_{region}.parquet" in f][0] +snana_path +``` + +Next, we scan this file and find a row with `model_name == "NON1ASED.TDE-BBFIT"` that represents a TDE. + +```{code-cell} ipython3 +# Read the parquet file into a pandas dataframe. +df = pq.read_table(snana_path, filesystem=fs).to_pandas() +# Look for TDE models. +mask = df["model_name"] == "NON1ASED.TDE-BBFIT" +if mask.any(): + # Choose the first TDE + tde_info = df[mask].iloc[0].squeeze() + print(f"Found a TDE in region {region} with the following info:") + print(tde_info) +else: + raise RuntimeError(f"No TDE found in region {region}.") +``` + +Once we have the TDE, we load the corresponding galaxy info parquet file for that region. Then we identify its host galaxy by matching the host ID in the TDE info with the galaxy IDs in the galaxy catalog. + +```{code-cell} ipython3 +galaxy_info_file = f"{catalog_prefix}/galaxy_{region}.parquet" +host_galaxy = pq.read_table(galaxy_info_file, filesystem=fs, + # filter while reading pq for time efficiency + filters=[("galaxy_id", "==", tde_info["host_id"])] + ).to_pandas() +host_galaxy +``` + +## 3. Image Access + +Now we have the host galaxy's position and ID in `host_galaxy`, ready for image queries. With those coordinates in hand, we query the Roman image files that overlap this position, retrieving only the files needed for subsequent photometry and light-curve analysis. + +```{code-cell} ipython3 +# Point the astroquery IRSA client to the correct locations.
+Irsa.sia_url = "https://irsa.ipac.caltech.edu/simulated/SIA" +Irsa.tap_url = "https://irsa.ipac.caltech.edu/simulated/TAP" + +Irsa.list_collections(servicetype='SIA') +``` + +```{code-cell} ipython3 +OU_ROMAN_SIA_COLLECTION = 'simulated_roman_openuniverse2024' +OU_RUBIN_SIA_COLLECTION = 'simulated_rubin_openuniverse2024' +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def get_s3_fpath(cloud_access): + cloud_info = json.loads(cloud_access) # converts str to dict + bucket_name = cloud_info['aws']['bucket_name'] + key = cloud_info['aws']['key'] + + return f's3://{bucket_name}/{key}' +``` + +First, we find the filenames of the images in the Roman TDS survey which include the TDE host galaxy. + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def Roman_TDS_image_search(host_galaxy, radius, bandname): + """ + Get OpenUniverse2024 Roman TDS images for the TDE host galaxy. + + Parameters + ---------- + host_galaxy : pandas.DataFrame + Must include 'galaxy_id', 'ra', and 'dec' columns. + radius : astropy.units.Quantity + Search radius. + bandname : string + Roman bandname for which to search images (e.g. 'J129', 'H158'). + + Returns + ------- + astropy.table.Table + SIA result table filtered to TDE images in the specified band, + with an added 's3_uri' column for the image file paths. + """ + row = host_galaxy.iloc[0] + galaxy_id = int(row["galaxy_id"]) + ra_center, dec_center = row["ra"], row["dec"] + print(f"Accessing images in band={bandname} for galaxy_id={galaxy_id} at RA={ra_center:.3f}, Dec={dec_center:.3f} ...", end="") + + coords = SkyCoord(ra_center, dec_center, unit='deg') + sia_results = Irsa.query_sia(pos=(coords, radius.to(u.deg)), collection=OU_ROMAN_SIA_COLLECTION) + filtered_results = sia_results[['TDS_simple_model' in r['obs_id'] and bandname in r['energy_bandpassname'] + for r in sia_results]] + filtered_results['s3_uri'] = [get_s3_fpath(r['cloud_access']) for r in filtered_results] + + print(f"done. 
Found {len(filtered_results)} images.") + return filtered_results +``` + +```{code-cell} ipython3 +bands = ["J129", "H158"] +image_search_radius = 1 * u.arcsec # point-like since we just need images containing the host galaxy + +all_band_images = {} +for bandname in bands: + all_band_images[bandname] = Roman_TDS_image_search(host_galaxy, image_search_radius, bandname) +``` + +Since there are > 100 TDS images per band, we will: +1. restrict to images taken during the TDE event window (`start_mjd` to `end_mjd` from the transient catalog) so the light curve focuses on the TDE itself rather than the full survey duration. +2. filter out images where the host galaxy is close to the edge so that photometry is more reliable and partial cutouts can be avoided. +3. select only an evenly time-sampled subset of images so that photometry is quicker to run. + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def select_images_by_mjd_quantiles(images, n_select=9): + """ + Select up to `n_select` images using rank-quantiles on time axis. + + Parameters + ---------- + images : astropy Table + Table of images (SIA results). + n_select : int, optional + Maximum number of images to select. Default is 9. + + Returns + ------- + astropy Table + Subset of input `images` corresponding to the selected quantiles. + """ + sorted_images = images.copy() # avoid modifying the original table + sorted_images.sort('t_min') # sort by time axis, t_min = t_max for TDS images, so we can use either + num_quantiles = min(n_select, len(sorted_images)) + quantile_indices = np.rint( # round off to nearest integer + np.linspace(0, len(sorted_images) - 1, num_quantiles) + ).astype(int) + return sorted_images[quantile_indices] +``` + +```{code-cell} ipython3 +galaxy_id = int(host_galaxy.iloc[0]["galaxy_id"]) + +for bandname in bands: + image_tbl = all_band_images[bandname] + + # 1. 
restrict to images within the TDE event window + tde_start, tde_end = tde_info["start_mjd"], tde_info["end_mjd"] + in_window = [tde_start <= r['t_min'] <= tde_end for r in image_tbl] + selected_images = image_tbl[in_window] + + # 2. filter out images where the host galaxy is close to the edge of image + # 'dist_to_point' is the distance from the center of the image to the host galaxy + # 's_fov' is the estimated diameter of the circular region covered by the image + # so we keep images where the host galaxy is within 0.45 * s_fov (= 90% from the image center) + not_on_edge = [r['dist_to_point'] < (0.45 * r['s_fov']) for r in selected_images] + selected_images = selected_images[not_on_edge] + + # 3. select an evenly time-sampled subset + selected_images = select_images_by_mjd_quantiles(selected_images) + print(f"Band {bandname}, Galaxy {galaxy_id}: Downsampled {len(image_tbl)} images to {len(selected_images)} images.") + + # store per-band image filenames as a band-specific column + host_galaxy[f"image_filenames_{bandname}"] = [selected_images['s3_uri'].tolist()] +``` + +```{code-cell} ipython3 +# check if we have a nested column of image filenames for the host galaxy +host_galaxy +``` + +## 4. Make a Light Curve +This section demonstrates how to extract and visualize a light curve for a Tidal Disruption Event using simulated Roman images in two bands (J129 and H158). +The first function, `run_aperture_photometry()`, performs simple circular aperture photometry on a set of FITS images from S3 using the astropy [photutils](https://photutils.readthedocs.io/en/stable/) package. +The two plotting functions then compile these measurements into time-ordered plots showing how the observed flux evolves across multiple visits, providing a first look at temporal variability that could signal transient activity or host-galaxy changes. 
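Before running it on real Roman images, the heart of that photometry step can be seen in miniature. The following numpy-only sketch is our illustration (the notebook itself uses photutils): it sums pixels inside a circular aperture on a synthetic image and subtracts a median-background estimate scaled by the aperture area, mirroring the background treatment used in the function that follows.

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 101x101 image: flat background of 10 counts plus Gaussian noise,
# with a bright 2D Gaussian "source" (sigma = 2 pixels) at the center.
ny, nx = 101, 101
y, x = np.mgrid[:ny, :nx]
image = 10.0 + rng.normal(0, 0.5, (ny, nx))
image += 500.0 * np.exp(-((x - 50) ** 2 + (y - 50) ** 2) / (2 * 2.0**2))

# Circular aperture of radius 9 pixels (~1 arcsec at Roman's 0.11"/pixel scale)
radius = 9.0
in_aperture = (x - 50) ** 2 + (y - 50) ** 2 <= radius**2

# Aperture sum minus (median background) * (aperture area in pixels)
flux = image[in_aperture].sum() - np.median(image) * in_aperture.sum()
print(f"background-subtracted flux: {flux:.0f} counts")
```

The injected Gaussian has a true total flux of 500 · 2πσ² ≈ 12566 counts, so the recovered value should land close to that, with scatter from the pixel noise.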
+ +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def run_aperture_photometry(host_galaxy, bandname, image_column="image_filenames", aperture_radius=1.0): + """ + Perform circular aperture photometry on a list of FITS images. + + Parameters + ---------- + host_galaxy : pandas.DataFrame + Must contain columns 'ra', 'dec', and a nested column with FITS image paths + (each a list of FITS image paths). + bandname : string + Bandname for which to do photometry. + image_column : string, optional + Name of the dataframe column containing lists of FITS image paths. + Default is "image_filenames". + aperture_radius : float, optional + Aperture radius in arcsec. Default is 1.0 arcsec, which is ~9 pixels for Roman. + + Returns + ------- + None + Results are added directly to `host_galaxy` as new columns: + f'mjd_obs_{{bandname}}', f'flux_{{bandname}}', f'flux_err_{{bandname}}', + and 'aperture_radius_pix'. + """ + + row = host_galaxy.iloc[0] + filenames = row[image_column] + print(f"Performing photometry in {bandname} for {len(filenames)} sampled images " + f"for host galaxy at RA={row['ra']:.3f}, Dec={row['dec']:.3f} ...", end="") + + mjd_list, flux_list, flux_err_list, aperture_radius_pix_list = [], [], [], [] + + for fname in filenames: + + # Open the gzipped FITS file directly from S3; we don't recommend changing the next line. + with fits.open(fname, fsspec_kwargs={"anon": True}, memmap=False) as hdul: + data = hdul[1].data + header = hdul[1].header + + # Build a WCS object so photutils can convert between sky and pixel coordinates.
+ wcs = WCS(header) + + # Simple circular aperture centered on the host galaxy position + sky_position = SkyCoord(row["ra"], row["dec"], unit="deg", frame="icrs") + aperture = SkyCircularAperture(sky_position, r=aperture_radius * u.arcsec) + pixel_aperture = aperture.to_pixel(wcs) + + # Perform aperture photometry + phot_table = aperture_photometry(data, aperture, wcs=wcs) + + # Check output (optional) + #print(phot_table) + + # Background estimate (median of finite pixels) + background = np.nanmedian(data) + + # Subtract background from aperture sum + aperture_area = pixel_aperture.area + flux = phot_table['aperture_sum'][0] - background * aperture_area + + # Approximate uncertainty from background rms + flux_err = np.nanstd(data) * np.sqrt(aperture_area) + + # Observation mid-time from MJD-OBS + mjd_obs = header.get('MJD-OBS', None) + + # store related info + mjd_list.append(mjd_obs) + flux_list.append(flux) + flux_err_list.append(flux_err) + aperture_radius_pix_list.append(float(pixel_aperture.r)) + + print("done.") + + # Add as nested columns + host_galaxy[f"mjd_obs_{bandname}"] = [mjd_list] + host_galaxy[f"flux_{bandname}"] = [flux_list] + host_galaxy[f"flux_err_{bandname}"] = [flux_err_list] + host_galaxy["aperture_radius_pix"] = [aperture_radius_pix_list] +``` + +```{code-cell} ipython3 +# We choose an aperture radius of 1.0 arcsec (~9 Roman pixels at 0.11"/pix) +for bandname in bands: + run_aperture_photometry(host_galaxy, bandname, + image_column=f"image_filenames_{bandname}", + aperture_radius=1.0) +``` + +```{code-cell} ipython3 +# take a quick look at the dataframe of aperture photometry to see what we are working with +host_galaxy +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def plot_single_band_light_curve(df, bandname, start_mjd): + """ + Plot the TDE host galaxy light curve for a single band, with error bars.
+ + Parameters + ---------- + df : pandas.DataFrame + Must contain f'mjd_obs_{{bandname}}', f'flux_{{bandname}}', + and f'flux_err_{{bandname}}' columns. + bandname : str + Photometric band name used for column labels. + start_mjd : float + MJD of the TDE start, used to set the x-axis origin. + + Returns + ------- + matplotlib.figure.Figure + The generated figure. + """ + row = df.iloc[0] + galaxy_id = row["galaxy_id"] + + # Extract nested arrays + #need to go to numpy so we can check for non-finite values + times = np.array(row[f"mjd_obs_{bandname}"], dtype=float) - start_mjd + fluxes = np.array(row[f"flux_{bandname}"], dtype=float) + flux_errs = np.array(row[f"flux_err_{bandname}"], dtype=float) + + # Drop NaNs / non-finite values + mask = np.isfinite(times) & np.isfinite(fluxes) + mask &= np.isfinite(flux_errs) + times, fluxes, flux_errs = times[mask], fluxes[mask], flux_errs[mask] + if len(times) == 0: + print(f"⚠️ No valid flux points.") + return None + + #sort on time + sort_idx = np.argsort(times) + times, fluxes, flux_errs = times[sort_idx], fluxes[sort_idx], flux_errs[sort_idx] + + # Normalize to median flux + median_flux = np.median(fluxes) + fluxes = fluxes / median_flux + flux_errs = flux_errs / median_flux + + # Plot + fig, ax = plt.subplots(figsize=(6, 4)) + ax.errorbar( + times, fluxes, yerr=flux_errs, fmt="o", capsize=3, label=f"Galaxy {galaxy_id}" + ) + ax.plot(times, fluxes, "-", alpha=0.6, color=ax.get_lines()[-1].get_color()) + + ax.set_xlabel("Days since start of TDE") + ax.set_ylabel("Normalized Flux") + ax.set_title(f"Light Curve for Galaxy {galaxy_id} ({bandname})") + ax.legend() + + # Restrict x-axis to data range with small padding + margin = 0.05 * (times.max() - times.min()) if len(times) > 1 else 0.1 + ax.set_xlim(times.min() - margin, times.max() + margin) + + plt.show() + return fig +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def plot_multiband_light_curve(df, bands, start_mjd): + """ + Plot the TDE host 
galaxy light curve for multiple bands on a single axes. + + Parameters + ---------- + df : pandas.DataFrame + Must contain f'mjd_obs_{{band}}', f'flux_{{band}}', and + f'flux_err_{{band}}' columns for each band in `bands`. + bands : list of str + Roman band names to plot (e.g. ['J129', 'H158']). + start_mjd : float + MJD of the TDE start, used to set the x-axis origin. + + Returns + ------- + matplotlib.figure.Figure + The generated figure. + """ + colors = itertools.cycle(plt.cm.tab10.colors) + + fig, ax = plt.subplots(figsize=(7, 5)) + row = df.iloc[0] + xmin, xmax = np.inf, -np.inf + + for bandname in bands: + # Extract nested arrays + #need to go to numpy so we can check for non-finite values + times = np.array(row[f"mjd_obs_{bandname}"], dtype=float) - start_mjd + fluxes = np.array(row[f"flux_{bandname}"], dtype=float) + flux_errs = np.array(row[f"flux_err_{bandname}"], dtype=float) + + # Drop invalid + mask = np.isfinite(times) & np.isfinite(fluxes) & np.isfinite(flux_errs) + times, fluxes, flux_errs = times[mask], fluxes[mask], flux_errs[mask] + + if len(times) == 0: #empty photometry + continue + + #sort on time + sort_idx = np.argsort(times) + times, fluxes, flux_errs = times[sort_idx], fluxes[sort_idx], flux_errs[sort_idx] + + # Normalize to median flux + median_flux = np.median(fluxes) + fluxes = fluxes / median_flux + flux_errs = flux_errs / median_flux + + color = next(colors) + ax.errorbar(times, fluxes, yerr=flux_errs, fmt="o", capsize=3, + color=color, label=bandname) + ax.plot(times, fluxes, "-", alpha=0.6, color=color) + + xmin = min(xmin, times.min()) + xmax = max(xmax, times.max()) + + ax.set_xlabel("Days since start of TDE") + ax.set_ylabel("Normalized Flux") + ax.set_title("TDE Host Galaxy Light Curve") + ax.legend(fontsize="small") + + if xmin == np.inf or xmax == -np.inf: + print("⚠️ No valid photometry data found for any band. 
Cannot plot light curve.") + plt.close(fig) + return fig + + # Restrict x-axis to data range with padding + margin = 0.05 * (xmax - xmin) if xmax > xmin else 0.1 + ax.set_xlim(xmin - margin, xmax + margin) + + plt.show() + return fig +``` + +```{code-cell} ipython3 +fig_single = plot_single_band_light_curve(host_galaxy, bands[0], start_mjd=tde_info["start_mjd"]) +``` + +```{code-cell} ipython3 +fig_light_curves = plot_multiband_light_curve(host_galaxy, bands, start_mjd=tde_info["start_mjd"]) +``` + +## 5. Make Cutouts +We follow the example in this [tutorial](https://caltech-ipac.github.io/irsa-tutorials/openuniverse2024-roman-simulated-timedomainsurvey/) to display cutouts of the potential host galaxies as a function of time with the time listed in the cutout title. + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def make_cutout(fname, ra, dec, size=100): + """ + Create a North-up cutout around (RA, Dec) from a Roman TDS FITS image on S3. + + Parameters + ---------- + fname : str + Full or partial S3 path to the .fits.gz image. + ra, dec : float + Target coordinates in degrees. + size : int or float, optional + Cutout size in pixels. Default = 100. + + Returns + ------- + cutout : astropy.nddata.Cutout2D or None + Cutout centered on (RA, Dec) and rotated so North points up. + Returns None if the target is outside the field. + """ + + coord = SkyCoord(ra=ra * u.deg, dec=dec * u.deg) + + with fits.open(fname, fsspec_kwargs={"anon": True}, memmap=False) as hdu: + img = hdu[1].data + header = hdu[1].header + wcs = WCS(header) + + try: + # Step 1: extract cutout centered on target using the native WCS + cutout = Cutout2D(img, coord, size, wcs=wcs, mode="partial") + except Exception as e: + print(f"Error creating cutout for {fname}: {e}") + return None + + # Step 2: determine rotation angle to place North at the top. + # Invert the CD matrix to find the North direction in pixel space. 
+ # This handles chip orientation automatically without needing SCA_NUM logic. + det = header['CD1_1'] * header['CD2_2'] - header['CD1_2'] * header['CD2_1'] + north_col = -header['CD1_2'] / det + north_row = header['CD1_1'] / det + angle = (-np.degrees(np.arctan2(north_col, north_row))) % 360 + + # Step 3: rotate only the small cutout (not the full image) to align North up + cutout.data = rotate(cutout.data, angle=angle, reshape=False, cval=np.nan) + + return cutout + +``` + +```{code-cell} ipython3 +--- +jupyter: + source_hidden: true +--- +def cutout_gallery(image_filenames, mjd_list, ra, dec, aperture_radius_pix_list, size=100, ncols=4, + galaxy_id=None): + """ + Display a gallery of cutouts centered on (RA, Dec) for a list of Roman TDS images. + + Parameters + ---------- + image_filenames : list of str + List of S3 image filenames. + mjd_list : list-like + Observation MJD values aligned with `image_filenames`. + ra, dec : float + Target coordinates in degrees. + aperture_radius_pix_list : list of float + Aperture radius values in pixels, aligned with `image_filenames`. + size : int or float, optional + Cutout size in pixels. Default = 100. + ncols : int, optional + Number of columns in the gallery grid. Default = 4. + galaxy_id : int or str, optional + Galaxy ID for labeling the plot. + + Returns + ------- + fig : matplotlib.figure.Figure + The displayed figure object. + """ + # Initialize lists to store image cutouts, observation times, and aligned pixel radii. 
+ cutouts, mjds, radii_pix = [], [], [] + + # Loop over all image filenames and generate cutouts; only keep MJDs and pixel radii for those with valid cutouts + for fname, mjd, radius_pix in zip(image_filenames, mjd_list, aperture_radius_pix_list): + cutout = make_cutout(fname, ra, dec, size=size) + if cutout is not None: + cutouts.append(cutout) + mjds.append(mjd) + radii_pix.append(radius_pix) + + # Stop if no valid cutouts were created + if not cutouts: + raise ValueError("No valid cutouts could be created.") + + # Set up grid dimensions for displaying the gallery + n_images = len(cutouts) + nrows = int(np.ceil(n_images / ncols)) + fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows)) + axes = np.atleast_1d(axes).ravel() + + # Build figure title if a galaxy ID is available + if galaxy_id is not None: + fig.suptitle( + f"Cutouts of TDE host galaxy {galaxy_id}", + fontsize=14, y=0.98 + ) + + # Display each cutout image with contrast scaling and MJD label + for ax, cutout, mjd, radius_pix in zip(axes, cutouts, mjds, radii_pix): + img = cutout.data + vmin, vmax = np.nanpercentile(img, [5, 99]) + ax.imshow(img, origin="lower", cmap="gray", vmin=vmin, vmax=vmax) + # Draw the aperture circle at the image center. + # The galaxy is centered by Cutout2D and stays centered after the North-up rotation.
+ x_center = img.shape[1] / 2.0 + y_center = img.shape[0] / 2.0 + if np.isfinite(radius_pix) and radius_pix > 0: + aperture_circle = plt.Circle((x_center, y_center), radius_pix, + edgecolor="cyan", facecolor="none", linewidth=1.3) + ax.add_patch(aperture_circle) + + ax.set_title(f"MJD {mjd:.2f}", fontsize=8) + ax.axis("off") + + # Hide any extra axes + for ax in axes[len(cutouts):]: + ax.axis("off") + + plt.tight_layout() + plt.show() + return +``` + +```{code-cell} ipython3 +# make cutout gallery of the host galaxy + +# Number of cutout images to display. Increase to show more epochs. +n_cutout_images = 6 + +single_gal = host_galaxy.iloc[0] + +selected_filenames = single_gal[f"image_filenames_{bands[0]}"][:n_cutout_images] +selected_mjds = single_gal[f"mjd_obs_{bands[0]}"][:n_cutout_images] +selected_radius_pix = single_gal["aperture_radius_pix"][:n_cutout_images] + +cutout_gallery( + image_filenames=selected_filenames, + mjd_list=selected_mjds, + ra=single_gal["ra"], + dec=single_gal["dec"], + aperture_radius_pix_list=selected_radius_pix, + size=100, + ncols=3, + galaxy_id=single_gal["galaxy_id"], +) + +# You may get a `FITSFixedWarning`; this is completely harmless and +# just means there is an extra space in the DATE-OBS keyword that astropy is fixing. +``` + +Each cutout is extracted from the larger detector image and then rotated to place North up. The blank regions in the corners are areas outside the original square cutout that become empty after rotation. + +## Acknowledgements + +- [IPAC-IRSA](https://irsa.ipac.caltech.edu/) +- This work made use of [Astropy](http://www.astropy.org), a community-developed core Python package and an ecosystem of tools and resources for astronomy. +- This research made use of Photutils, an Astropy package for +detection and photometry of astronomical sources (Bradley et al.
+ +## About this notebook + +**Authors:** Jessica Krick, Jaladh Singhal, Brigitta Sipőcz + +**Updated:** 2026-04-02 + +**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions +or problems. + +**Runtime:** As of the date above, this notebook takes about 250s to run to completion on +a machine with 8GB RAM and 4 CPU. + + +**AI Acknowledgement:** + +This tutorial was developed with the assistance of AI tools + +**References:** + +- Bradley et al., 2025; https://zenodo.org/records/19636730 + +- [Robitaille et al., 2013](https://www.aanda.org/articles/aa/full_html/2013/10/aa22068-13/aa22068-13.html) + +- [Astropy Collaboration et al., 2018](https://arxiv.org/abs/1801.02634) + +- [Astropy Collaboration et al., 2022](https://arxiv.org/abs/2206.14220) + +- [Virtanen et al., 2020](https://www.nature.com/articles/s41592-019-0686-2); DOI: 10.1038/s41592-019-0686-2. + +- [OpenUniverse et al., 2025](https://arxiv.org/abs/2501.05632) diff --git a/tutorials/simulated-data/OpenUniverse2024/openuniverse2024_quickstart.md b/tutorials/simulated-data/OpenUniverse2024/openuniverse2024_quickstart.md new file mode 100644 index 00000000..337fefd3 --- /dev/null +++ b/tutorials/simulated-data/OpenUniverse2024/openuniverse2024_quickstart.md @@ -0,0 +1,388 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.19.1 +kernelspec: + name: python3 + display_name: Python 3 (ipykernel) + language: python +authors: + - name: Jessica Krick + - name: Jaladh Singhal + - name: Brigitta Sipőcz +--- + +# Quickstart: Accessing OpenUniverse2024 Data + +## Learning Goals + +By the end of this tutorial, you will be able to: + +- Browse the OpenUniverse2024 data directories on S3 +- Explore the structure of Roman and Rubin FITS image files +- Read the OpenUniverse2024 parquet catalogs (transient, galaxy, and galaxy flux) +- Query Roman and Rubin images covering a sky position using the IRSA SIA service + 
+## Introduction + +The [OpenUniverse2024](https://arxiv.org/abs/2501.05632) simulation suite delivers ~70 deg² of matched optical/infrared imagery for both the LSST Wide-Fast-Deep (WFD) survey and the Nancy Grace Roman Space Telescope high-latitude survey, producing roughly 400 TB of publicly available synthetic imaging and catalogs. All data are stored in the cloud (AWS S3) and can be accessed anonymously without any credentials. + +This tutorial is a focused introduction to data access only. It covers three main topics: + +1. **Directory structure for FITS images** — Roman and Rubin simulated science images stored in S3 +2. **Parquet catalogs** — transient (SNANA), galaxy, and galaxy-flux tables, indexed by HEALPix sky region +3. **Image search via SIA** — querying which images cover a given sky position using astroquery and the IRSA Simple Image Access service + +No astrophysical analysis is performed here. For science workflows that build on these access patterns, see the [TDE Light Curve](TDE_light_curve) and [SED Fitting](SED_fit) tutorials in this repository. + +### Instructions + +This notebook is designed to be run sequentially from top to bottom. All code is self-contained and relies on publicly accessible data. + +### Input + +- OpenUniverse2024 Roman and Rubin images and catalogs on AWS S3 (`s3://nasa-irsa-simulations/`) + +### Output + +- A gallery of example Roman FITS images +- Summary of parquet catalog structure and contents +- A table of image files overlapping a chosen sky position + +## Imports + +```{code-cell} ipython3 +# Uncomment the next line to install dependencies if needed. 
+# !pip install numpy astropy s3fs photutils matplotlib pyarrow hpgeom astroquery +``` + +```{code-cell} ipython3 +import numpy as np +import s3fs +from matplotlib import pyplot as plt +import pyarrow.fs +import pyarrow.parquet as pq +import hpgeom +import json +from astroquery.ipac.irsa import Irsa +from astropy.coordinates import SkyCoord +from astropy import units as u +from astropy.io import fits +``` + +## 1. Explore Directory Structure for FITS images + +The OpenUniverse2024 data live in the cloud in a public AWS S3 bucket and can be accessed anonymously using `s3fs`. This section shows how to establish that connection, navigate the directory tree, and inspect the contents of a FITS image file. + +In the path below, `simple_model` refers to the simulated images with noise and realistic instrument effects, as opposed to `truth` images which are noise-free. The `full` simulation covers the complete survey footprint; a smaller `preview` subset is also available. See the [OpenUniverse2024 paper](https://arxiv.org/abs/2501.05632) for details on the differences. A `pointing` is a unique Roman observation visit — each pointing corresponds to one placement of the 18-detector focal plane on the sky, producing up to 18 individual FITS files (one per detector). + +```{code-cell} ipython3 +# Create an anonymous (public read-only) connection to the NASA IRSA S3 bucket. +s3 = s3fs.S3FileSystem(anon=True) + +# Top-level path components +BUCKET_NAME = "nasa-irsa-simulations" +OU_PREFIX = "openuniverse2024" +ROMAN_TDS_PREFIX = "roman/full/RomanTDS/images/simple_model" + +# Pick one band to explore +BAND = "J129" +band_directory = f"{BUCKET_NAME}/{OU_PREFIX}/{ROMAN_TDS_PREFIX}/{BAND}" +``` + +The pointings available for a given band can be listed by calling `s3.ls` on the band directory. 
+ +```{code-cell} ipython3 +# List all pointings available for the chosen band +all_pointings = [p.split("/")[-1] for p in s3.ls(band_directory)] +print(f"Found {len(all_pointings)} pointings in band {BAND}:") +print(all_pointings[:10], "...") +``` + +We pick one of these pointings to explore further. + +```{code-cell} ipython3 +# Select one pointing and list the files it contains +POINTING = "10190" +image_directory = f"{band_directory}/{POINTING}" + +files = [f"s3://{f}" for f in s3.ls(image_directory)] +print(f"Found {len(files)} files in pointing {POINTING}") +``` + +```{code-cell} ipython3 +# Open one FITS file and inspect its extensions +fname = files[0] +with fits.open(fname, use_fsspec=True, fsspec_kwargs={"anon": True}, memmap=False) as hdul: + print(f"File: {fname}") + print(f"Number of extensions: {len(hdul)}\n") + hdul.info() +``` + +Each Roman TDS FITS file contains four extensions: a `primary` header with no data, followed by three 4088×4088 pixel planes — `SCI` (science image), `ERR` (per-pixel uncertainty), and `DQ` (data quality mask). + ++++ + +Let's display a gallery of example images to get a sense of the data. +Note this gallery can take about a minute to build. + +```{code-cell} ipython3 +def show_gallery(files, max_images=9): + """ + Display a gallery of FITS images. + + Parameters + ---------- + files : list of str + List of S3 URIs to FITS files. + max_images : int, optional + Maximum number of images to display (default: 9). 
+ """ + n_images = min(len(files), max_images) + ncols = n_images if n_images < 4 else 3 + nrows = (n_images + ncols - 1) // ncols + + fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 4 * nrows)) + axes = np.atleast_1d(axes).ravel() + + for i, f in enumerate(files[:n_images]): + with fits.open(f, fsspec_kwargs={"anon": True}, memmap=False) as hdul: + data = hdul[1].data + vmin, vmax = np.nanpercentile(data, [5, 99]) + axes[i].imshow(data, origin="lower", cmap="gray", vmin=vmin, vmax=vmax) + axes[i].set_title(f.split("/")[-1], fontsize=8) + axes[i].axis("off") + + for j in range(i + 1, len(axes)): + axes[j].axis("off") + + plt.tight_layout() + plt.show() +``` + +```{code-cell} ipython3 +# Display up to 3 images from the selected directory. +show_gallery(files, max_images=3) +``` + +## 2. Access the Parquet Catalogs + +The OpenUniverse2024 catalogs are stored as [Apache Parquet](https://parquet.apache.org/) files, partitioned by [HEALPix](https://healpix.sourceforge.io/) sky region (nside=32, RING ordering). Each region has three file types: + +1. `snana_{region}.parquet` — one row per simulated transient event (supernovae, TDEs, etc.), with event type (`model_name`) and host galaxy ID (`host_id`) +2. `galaxy_{region}.parquet` — host galaxy positions and physical properties +3. `galaxy_flux_{region}.parquet` — multi-band Roman and Rubin photometry for each galaxy + +We first look for the correct catalog for the center of the Roman Time-Domain Survey(TDS). The `region` number in each filename is the HEALPix pixel index. Because we know that the catalogs were built with nside=32 and RING ordering, we can convert sky coordinates to a region index using [`hpgeom`](https://hpgeom.readthedocs.io/en/latest/). + +```{code-cell} ipython3 +# The Roman Time-Domain Survey is centered near the LSST ELAIS-S1 Deep Drilling Field. 
+ra = 9.45 +dec = -44.02 + +# Convert sky coordinates to a HEALPix region index (nside=32, RING ordering) +nside = 32 +region = hpgeom.angle_to_pixel(nside, ra, dec, lonlat=True, nest=False) +print(f"HEALPix region for RA={ra}, Dec={dec}: {region}") +``` + +```{code-cell} ipython3 +# Build the S3 paths for this region's catalog files +CATALOG_NAME = "roman_rubin_cats_v1.1.2_faint" +catalog_prefix = f"{BUCKET_NAME}/{OU_PREFIX}/roman/full/{CATALOG_NAME}" + +snana_path = f"{catalog_prefix}/snana_{region}.parquet" +galaxy_path = f"{catalog_prefix}/galaxy_{region}.parquet" +gal_flux_path = f"{catalog_prefix}/galaxy_flux_{region}.parquet" + +print("SNANA file: ", snana_path) +print("Galaxy info file: ", galaxy_path) +print("Galaxy flux file: ", gal_flux_path) +``` + +### 2.1 Inspect the SNANA Transient Catalog + +`inspect_parquet_files()` reads only the Parquet metadata footer to print the row count and column names — no data is loaded into memory. +We use it here for the SNANA catalog and repeat it for the galaxy info and flux catalogs below. + +```{code-cell} ipython3 +def inspect_parquet_files(s3_path): + """ + Print the structure of a Parquet file on S3 without reading its data. + + Reads only the Parquet metadata footer (row count, column names and types), + which is fast regardless of file size. + + Parameters + ---------- + s3_path : str + S3 path to the Parquet file (without the s3:// prefix). 
+ """ + fs = pyarrow.fs.S3FileSystem(anonymous=True) + meta = pq.read_metadata(s3_path, filesystem=fs) + schema = pq.read_schema(s3_path, filesystem=fs) + + print(f"Rows: {meta.num_rows} | Columns: {len(schema.names)}") + print("\nColumn names:") + for name in schema.names: + print(" ", name) +``` + +```{code-cell} ipython3 +inspect_parquet_files(snana_path) +``` + +```{code-cell} ipython3 +# Read just the model_name column to see what transient types are in this region +fs = pyarrow.fs.S3FileSystem(anonymous=True) +model_names = pq.read_table(snana_path, filesystem=fs, columns=["model_name"]).to_pandas() +model_names["model_name"].unique() +``` + +### 2.2 Inspect the Galaxy Info Catalog + +```{code-cell} ipython3 +inspect_parquet_files(galaxy_path) +``` + +### 2.3 Inspect the Galaxy Flux Catalog + +```{code-cell} ipython3 +inspect_parquet_files(gal_flux_path) +``` + +### 2.4 Join Transient Events to Their Host Galaxies + +A common operation is to take a transient from the SNANA file and look up its host galaxy's sky position from the galaxy info file. +The two files share a common key: `host_id` in the SNANA file corresponds to `galaxy_id` in the galaxy info file. +We read the full SNANA catalog here, then use a filter to fetch only the matching row from the galaxy file without loading the entire galaxy catalog. 
+ +Note: The next cell takes ~45s to run. + +```{code-cell} ipython3 +fs = pyarrow.fs.S3FileSystem(anonymous=True) + +# Read the full SNANA catalog for this region +df_snana = pq.read_table(snana_path, filesystem=fs).to_pandas() + +# Pick one transient — here we grab the first row as an example +example_transient = df_snana.iloc[0] +print("Example transient:") +print(example_transient[["model_name", "host_id", "start_mjd", "end_mjd"]]) + +# Look up its host galaxy by matching host_id (SNANA) to galaxy_id (galaxy info file) +host = pq.read_table( + galaxy_path, + filesystem=fs, + filters=[("galaxy_id", "==", example_transient["host_id"])] +).to_pandas() + +print("\nHost galaxy info:") +host +``` + +## 3. Image Search + +Given a sky position (e.g., the host galaxy coordinates from Section 2), we can search for all Roman or Rubin images that cover that position using the IRSA Simple Image Access (SIA) service via `astroquery`. +First we set up the connection to the SIA service and list the available image collections, then we query by position to get a table of matching images, and finally we extract the cloud locations (S3 URIs) so the files can be opened directly. + +```{code-cell} ipython3 +# Point the astroquery IRSA client to the correct locations. 
+Irsa.sia_url = "https://irsa.ipac.caltech.edu/simulated/SIA" +Irsa.tap_url = "https://irsa.ipac.caltech.edu/simulated/TAP" + +# List all available simulated image collections +Irsa.list_collections(servicetype='SIA') +``` + +```{code-cell} ipython3 +# Collection names for OpenUniverse2024 +OU_ROMAN_SIA_COLLECTION = 'simulated_roman_openuniverse2024' +OU_RUBIN_SIA_COLLECTION = 'simulated_rubin_openuniverse2024' +``` + +```{code-cell} ipython3 +def get_s3_fpath(cloud_access): + """Extract the S3 URI from the cloud_access JSON string in an SIA result.""" + cloud_info = json.loads(cloud_access) + bucket_name = cloud_info['aws']['bucket_name'] + key = cloud_info['aws']['key'] + return f's3://{bucket_name}/{key}' +``` + +```{code-cell} ipython3 +# Use the host galaxy position from Section 2 (or set any RA/Dec you want to query). +host_ra = float(host.iloc[0]["ra"]) +host_dec = float(host.iloc[0]["dec"]) +search_radius = 1 * u.arcsec # small radius: we just need images that contain this point + +# Convert RA and Dec to a SkyCoord for ease of use +coords = SkyCoord(host_ra, host_dec, unit='deg') + +# Query all Roman images covering this position +sia_results = Irsa.query_sia(pos=(coords, search_radius.to(u.deg)), + collection=OU_ROMAN_SIA_COLLECTION) + +# Filter to the J129 band and the simple_model image type +bandname = "J129" +roman_images = sia_results[ + ['TDS_simple_model' in r['obs_id'] and bandname in r['energy_bandpassname'] + for r in sia_results] +] +roman_images['s3_uri'] = [get_s3_fpath(r['cloud_access']) for r in roman_images] + +print(f"Found {len(roman_images)} Roman {bandname} images at RA={host_ra:.4f}, Dec={host_dec:.4f}") +roman_images['obs_id', 't_min', 't_max', 's3_uri'] +``` + +```{code-cell} ipython3 +# The same search works for Rubin images — just swap the collection name and band filter. +# Unlike the Roman collection, the Rubin collection contains only one image type (calexp), +# so no obs_id filter is needed beyond selecting the desired band. 
+rubin_band = "r" +rubin_results = Irsa.query_sia(pos=(coords, search_radius.to(u.deg)), + collection=OU_RUBIN_SIA_COLLECTION) + +rubin_images = rubin_results[ + [rubin_band in r['energy_bandpassname'] for r in rubin_results] +] +rubin_images['s3_uri'] = [get_s3_fpath(r['cloud_access']) for r in rubin_images] + +print(f"Found {len(rubin_images)} Rubin {rubin_band}-band images at RA={host_ra:.4f}, Dec={host_dec:.4f}") +rubin_images['obs_id', 't_min', 't_max', 's3_uri'] +``` + +You now have S3 URIs for all Roman and Rubin images covering your target position. To open any of these images, pass the URI to `astropy.io.fits.open` with `fsspec_kwargs={"anon": True}` as shown in Section 1. + +## Acknowledgements + +- [IPAC-IRSA](https://irsa.ipac.caltech.edu/) +- This work made use of [Astropy](http://www.astropy.org), a community-developed core Python package and an ecosystem of tools and resources for astronomy. + +## About this notebook + +**Authors:** Jessica Krick, Jaladh Singhal, Brigitta Sipőcz + +**Updated:** 2026-04-22 + +**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions +or problems. + +**Runtime:** As of the date above, this notebook takes about 2 minutes to run to completion on a machine with 8GB RAM and 4 CPU. 
+ +**AI Acknowledgement:** + +This tutorial was developed with the assistance of AI tools. + +**References:** + +- [Robitaille et al., 2013](https://www.aanda.org/articles/aa/full_html/2013/10/aa22068-13/aa22068-13.html) + +- [Astropy Collaboration et al., 2018](https://arxiv.org/abs/1801.02634) + +- [Astropy Collaboration et al., 2022](https://arxiv.org/abs/2206.14220) + +- [OpenUniverse et al., 2025](https://arxiv.org/abs/2501.05632) diff --git a/tutorials/simulated-data/openuniverse2024_roman_simulated_timedomainsurvey.md b/tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_timedomainsurvey.md similarity index 100% rename from tutorials/simulated-data/openuniverse2024_roman_simulated_timedomainsurvey.md rename to tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_timedomainsurvey.md diff --git a/tutorials/simulated-data/openuniverse2024_roman_simulated_wideareasurvey.md b/tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_wideareasurvey.md similarity index 100% rename from tutorials/simulated-data/openuniverse2024_roman_simulated_wideareasurvey.md rename to tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_wideareasurvey.md diff --git a/tutorials/simulated-data/simulated.md b/tutorials/simulated-data/simulated.md index 9e8e4c3f..ce53bd70 100644 --- a/tutorials/simulated-data/simulated.md +++ b/tutorials/simulated-data/simulated.md @@ -7,8 +7,11 @@ These tutorials are designed to help users get started with accessing, visualizi ```{notebook-gallery} notebook_metadata.yml tutorials/simulated-data/roman_hlss_number_density.md -tutorials/simulated-data/openuniverse2024_roman_simulated_wideareasurvey.md -tutorials/simulated-data/OpenUniverse2024Preview_Firefly.md -tutorials/simulated-data/openuniverse2024_roman_simulated_timedomainsurvey.md tutorials/simulated-data/cosmoDC2_TAP_access.md +tutorials/simulated-data/OpenUniverse2024/openuniverse2024_quickstart.md 
+tutorials/simulated-data/OpenUniverse2024/OpenUniverse2024Preview_Firefly.md +tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_wideareasurvey.md +tutorials/simulated-data/OpenUniverse2024/TDE_light_curve.md +tutorials/simulated-data/OpenUniverse2024/SED_fit.md +tutorials/simulated-data/OpenUniverse2024/openuniverse2024_roman_simulated_timedomainsurvey.md ```