Merged
4 changes: 3 additions & 1 deletion toc.yml
@@ -16,7 +16,9 @@ project:
- file: tutorials/euclid_access/2_Euclid_intro_MER_catalog.md
- file: tutorials/euclid_access/4_Euclid_intro_PHZ_catalog.md
- file: tutorials/euclid_access/5_Euclid_intro_SPE_catalog.md
-- file: tutorials/parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md
+- title: Merged Objects HATS Catalog
+  children:
+    - pattern: tutorials/parquet-catalog-demos/euclid-q1-hats/*-euclid-q1-hats-*.md
- file: tutorials/cloud_access/euclid-cloud-access.md
- file: tutorials/euclid_access/Euclid_ERO.md
- title: WISE
4 changes: 3 additions & 1 deletion tutorials/euclid_access/euclid.md
@@ -24,7 +24,9 @@ Data products include MERged mosaics of calibrated and stacked frames; combined
- [PHZ Catalogs](4_Euclid_intro_PHZ_catalog.md) — Join the PHZ and MER catalogs and do a box search for galaxies with quality redshifts, load a MER mosaic cutout of the box, and plot the cutout with the catalog results overlaid.
Then plot the SIR spectrum of the brightest galaxy and look at a MER mosaic cutout of the galaxy in Firefly.
- [SPE Catalogs](5_Euclid_intro_SPE_catalog.md) — Join the SPE and MER catalogs and query for galaxies with H-alpha line detections, then plot the SIR spectrum of a galaxy with a high SNR H-alpha line measurement.
-- [Merged Objects HATS Catalog](../parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md) — Understand the content and format of the Euclid Q1 Merged Objects HATS Catalog, then perform a basic query.
+- **Merged Objects HATS Catalog** — This product was created by IRSA and contains the Euclid MER, PHZ, and SPE catalogs in a single [HATS](https://hats.readthedocs.io/en/latest/) catalog.
+  - [Introduction](../parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md) — Understand the content and format of the Euclid Q1 Merged Objects HATS Catalog, then perform a basic query.
+  - [Magnitudes](../parquet-catalog-demos/euclid-q1-hats/4-euclid-q1-hats-magnitudes.md) — Review the types of flux measurements available, load template-fit and aperture magnitudes, and plot distributions and comparisons for different object types.

## Special Topics

@@ -1,11 +1,12 @@
---
-short_title: "Merged Objects HATS Catalog"
+short_title: Introduction
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.18.1
+root_level_metadata_filter: -short_title
kernelspec:
display_name: Python 3 (ipykernel)
language: python
@@ -18,16 +19,17 @@ kernelspec:

This tutorial is an introduction to the content and format of the Euclid Q1 Merged Objects HATS Catalog.
Later tutorials in this series will show how to load quality samples.
See [Euclid Tutorial Notebooks: Catalogs](../../euclid_access/euclid.md#catalogs) for a list of tutorials in this series.

+++

## Learning Goals

In this tutorial, we will:

-- Learn about the Euclid Merged Objects catalog that IRSA created by combining information from multiple Euclid Quick Release 1 catalogs
+- Learn about the Euclid Merged Objects catalog that IRSA created by combining information from multiple Euclid Quick Release 1 (Q1) catalogs.
- Find columns of interest.
-- Perform a basic spatial query in each of the Euclid Deep Fields using the Python library PyArrow.
+- Perform a basic query using the Python library PyArrow.

+++

@@ -51,12 +53,12 @@ Access is free and no credentials are required.

## 2. Imports

-```{code-cell} python3
+```{code-cell} ipython3
# # Uncomment the next line to install dependencies if needed.
# %pip install hpgeom pandas pyarrow
```

-```{code-cell} python3
+```{code-cell} ipython3
import hpgeom # Find HEALPix indexes from RA and Dec
import pyarrow.compute as pc # Filter the catalog
import pyarrow.dataset # Load the catalog
@@ -70,7 +72,7 @@ First we'll load the Parquet schema (column information) of the Merged Objects c
The Parquet schema is accessible from a few locations, all of which include the column names and types.
Here, we load it from the `_common_metadata` file because it also includes the column units and descriptions.

-```{code-cell} python3
+```{code-cell} ipython3
# AWS S3 paths.
s3_bucket = "nasa-irsa-euclid-q1"
dataset_prefix = "contributed/q1/merged_objects/hats/euclid_q1_merged_objects-hats/dataset"
@@ -82,7 +84,7 @@ schema_path = f"{dataset_path}/_common_metadata"
s3 = pyarrow.fs.S3FileSystem(anonymous=True)
```

-```{code-cell} python3
+```{code-cell} ipython3
# Load the Parquet schema.
schema = pyarrow.parquet.read_schema(schema_path, filesystem=s3)

@@ -136,7 +138,7 @@ The tables are:

Find all columns from these tables in the Parquet schema:

-```{code-cell} python3
+```{code-cell} ipython3
mer_prefixes = ["mer_", "morph_", "cutouts_"]
mer_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in mer_prefixes}

@@ -193,7 +195,7 @@ The tables are:

Find all columns from these tables in the Parquet schema:

-```{code-cell} python3
+```{code-cell} ipython3
phz_prefixes = ["phz_", "class_", "physparam_", "galaxysed_", "physparamqso_",
"starclass_", "starsed_", "physparamnir_"]
phz_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in phz_prefixes}
@@ -240,7 +242,7 @@ The tables are:

Find all columns from these tables in the Parquet schema:

-```{code-cell} python3
+```{code-cell} ipython3
spe_prefixes = ["z_", "lines_", "models_"]
spe_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in spe_prefixes}

@@ -272,7 +274,7 @@ They are useful for spatial queries, as demonstrated in the Euclid Deep Fields s

The HEALPix, Euclid object ID, and Euclid tile ID columns appear first:

-```{code-cell} python3
+```{code-cell} ipython3
schema.names[:5]
```

@@ -288,7 +290,7 @@ However, PyArrow automatically makes them available as regular columns when the

The HATS columns appear at the end:

-```{code-cell} python3
+```{code-cell} ipython3
schema.names[-3:]
```

@@ -297,12 +299,12 @@
The subsections above show how to find all columns from a given Euclid table as well as the additional columns.
Here we show some additional techniques for finding columns.

-```{code-cell} python3
+```{code-cell} ipython3
# Access the data type using the `field` method.
schema.field("mer_flux_y_2fwhm_aper")
```

-```{code-cell} python3
+```{code-cell} ipython3
# The column metadata includes unit and description.
# Parquet metadata is always stored as bytestrings, which are denoted by a leading 'b'.
schema.field("mer_flux_y_2fwhm_aper").metadata
@@ -311,7 +313,7 @@ schema.field("mer_flux_y_2fwhm_aper").metadata
Euclid Q1 offers many flux measurements, both from Euclid detections and from external ground-based surveys.
They are given in microjanskys, so all flux columns can be found by searching the metadata for this unit.

-```{code-cell} python3
+```{code-cell} ipython3
# Find all flux columns.
flux_columns = [field.name for field in schema if field.metadata[b"unit"] == b"uJy"]

@@ -321,7 +323,7 @@ flux_columns[:4]

Columns associated with external surveys are identified by the inclusion of "ext" in the name.

-```{code-cell} python3
+```{code-cell} ipython3
external_flux_columns = [name for name in flux_columns if "ext" in name]
print(f"{len(external_flux_columns)} flux columns from external surveys. First four are:")
external_flux_columns[:4]
@@ -332,14 +334,14 @@ external_flux_columns[:4]
+++

Euclid Q1 includes data from three Euclid Deep Fields: EDF-N (North), EDF-S (South), EDF-F (Fornax; also in the southern hemisphere).
-There is also a small amount of data from a fourth field: LDN1641 (Lynds' Dark Nebula 1641), which was observed for technical reasons during Euclid's verification phase and mostly ignored here.
+There is also a small amount of data from a fourth field: LDN1641 (Lynds' Dark Nebula 1641), which was observed for technical reasons during Euclid's verification phase.
The fields are described in [Euclid Collaboration: Aussel et al., 2025](https://arxiv.org/pdf/2503.15302) and can be seen on this [skymap](https://irsa.ipac.caltech.edu/data/download/parquet/euclid/q1/merged_objects/hats/euclid_q1_merged_objects-hats/skymap.png).

The regions are well separated, so we can distinguish them using a simple cone search without having to be too picky about the radius.
We can load data more efficiently using the HEALPix order 9 pixels that cover each area rather than using RA and Dec values directly.
These will be used in later tutorials.
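As a back-of-the-envelope check on why order 9 is fine-grained enough, the pixel counts follow from pure arithmetic (no HEALPix library required; the full-sky area is the standard 4π steradians converted to square degrees):

```python
import math

order = 9
nside = 2 ** order                # equivalent to hpgeom.order_to_nside(9) -> 512
npix = 12 * nside ** 2            # total HEALPix pixels on the sphere at this order

full_sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2  # ~41253 sq deg
pixel_area_deg2 = full_sky_deg2 / npix              # ~0.013 sq deg per pixel

# A ~20 sq deg deep field is covered by roughly 1500 order-9 pixels,
# fine-grained enough to outline each field without much overreach.
print(round(pixel_area_deg2, 4), round(20 / pixel_area_deg2))
```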

-```{code-cell} python3
+```{code-cell} ipython3
# EDF-N (Euclid Deep Field - North)
ra, dec, radius = 269.733, 66.018, 4 # 20 sq deg
edfn_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, inclusive=True)
@@ -360,9 +362,10 @@ ldn_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, i
## 6. Basic Query

To demonstrate a basic query, we'll search for objects with a galaxy photometric redshift estimate of 6.0 (largest possible).
-Other tutorials in this series will show more complex queries and describe the redshifts and other data in more detail.
+Other tutorials in this series will show more complex queries, and describe the redshifts and other data in more detail.
PyArrow dataset filters are described at [Filtering by Expressions](https://arrow.apache.org/docs/python/compute.html#filtering-by-expressions), and the list of available functions is at [Compute Functions](https://arrow.apache.org/docs/python/api/compute.html).

-```{code-cell} python3
+```{code-cell} ipython3
dataset = pyarrow.dataset.dataset(dataset_path, partitioning="hive", filesystem=s3, schema=schema)

highz_objects = dataset.to_table(
@@ -375,6 +378,6 @@

**Authors:** Troy Raen, Vandana Desai, Andreas Faisst, Shoubaneh Hemmati, Jaladh Singhal, Brigitta Sipőcz, Jessica Krick, the IRSA Data Science Team, and the Euclid NASA Science Center at IPAC (ENSCI).

-**Updated:** 2025-12-22
+**Updated:** 2025-12-23

**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html)