CHANGELOG.md (1 addition, 0 deletions)

@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
## [Unreleased]

### Added
- Added configuration-based support for extending Elasticsearch/OpenSearch index mappings via environment variables, allowing users to customize field mappings without code changes through the `STAC_FASTAPI_ES_CUSTOM_MAPPINGS` environment variable. Also added the `STAC_FASTAPI_ES_DYNAMIC_MAPPING` variable to control dynamic mapping behavior. [#546](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/546)

### Changed

README.md (153 additions, 0 deletions)

@@ -121,6 +121,7 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
- [Ingesting Sample Data CLI Tool](#ingesting-sample-data-cli-tool)
- [Redis for navigation](#redis-for-navigation)
- [Elasticsearch Mappings](#elasticsearch-mappings)
- [Custom Index Mappings](#custom-index-mappings)
- [Managing Elasticsearch Indices](#managing-elasticsearch-indices)
- [Snapshots](#snapshots)
- [Reindexing](#reindexing)
@@ -369,6 +370,8 @@ You can customize additional settings in your `.env` file:
| `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional |
| `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional |
| `EXCLUDED_FROM_ITEMS` | Specifies fields to exclude from STAC item responses. Supports comma-separated field names and dot notation for nested fields (e.g., `private_data,properties.confidential,assets.internal`). | `None` | Optional |
| `STAC_FASTAPI_ES_CUSTOM_MAPPINGS` | JSON string of custom Elasticsearch/OpenSearch property mappings to merge with defaults. See [Custom Index Mappings](#custom-index-mappings). | `None` | Optional |
| `STAC_FASTAPI_ES_DYNAMIC_MAPPING` | Controls dynamic mapping behavior for item indices. Values: `true` (default), `false`, or `strict`. See [Custom Index Mappings](#custom-index-mappings). | `true` | Optional |


> [!NOTE]
@@ -693,6 +696,156 @@ pip install stac-fastapi-elasticsearch[redis]
- The `sfeos_helpers` package contains shared mapping definitions used by both Elasticsearch and OpenSearch backends
- **Customization**: Custom mappings can be defined by extending the base mapping templates.

## Custom Index Mappings

SFEOS provides environment variables to customize Elasticsearch/OpenSearch index mappings without modifying source code. This is useful for:

- Adding STAC extension fields (SAR, Cube, etc.) with proper types
- Optimizing performance by controlling which fields are indexed
- Ensuring correct field types instead of relying on dynamic mapping inference

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `STAC_FASTAPI_ES_CUSTOM_MAPPINGS` | JSON string of property mappings to merge with defaults | None |
| `STAC_FASTAPI_ES_DYNAMIC_MAPPING` | Controls dynamic mapping: `true`, `false`, or `strict` | `true` |

### Custom Mappings (`STAC_FASTAPI_ES_CUSTOM_MAPPINGS`)

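Accepts a JSON string with the same structure as the default item index mappings (`ES_ITEMS_MAPPINGS`). The custom mappings are **recursively merged** with the defaults at the root level and apply to item indices only. If the value is not valid JSON, an error is logged and the default mappings are used unchanged.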

#### Merge Behavior

The merge follows these rules:

| Scenario | Result |
|----------|--------|
| Key only in defaults | Preserved |
| Key only in custom | Added |
| Key in both, both are dicts | Recursively merged |
| Key in both, values are not both dicts (including lists such as `dynamic_templates`) | **Custom overwrites default** |

**Example - Adding a key to an existing field (merged):**

```jsonc
// Default has: {"geometry": {"type": "geo_shape"}}
// Custom has: {"geometry": {"ignore_malformed": true}}
// Result: {"geometry": {"type": "geo_shape", "ignore_malformed": true}}
```

**Example - Overriding a value (replaced):**

```jsonc
// Default has: {"properties": {"datetime": {"type": "date_nanos"}}}
// Custom has: {"properties": {"datetime": {"type": "date"}}}
// Result: {"properties": {"datetime": {"type": "date"}}}
```
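To check the merge rules above locally, the sketch below copies `merge_mappings` from this PR and runs it on simplified mappings. The field values and the `dynamic_templates` entries are illustrative only, not the real SFEOS defaults. It also shows that a list such as `dynamic_templates` is replaced wholesale rather than appended to:

```python
import copy
from typing import Any, Dict


def merge_mappings(base: Dict[str, Any], custom: Dict[str, Any]) -> None:
    """Recursively merge custom into base in place (logic copied from this PR)."""
    for key, value in custom.items():
        if key in base and isinstance(base[key], dict) and isinstance(value, dict):
            merge_mappings(base[key], value)  # both dicts: merge key by key
        else:
            base[key] = value  # new key or non-dict value: custom wins


# Illustrative defaults, not the real SFEOS mappings.
defaults = {
    "dynamic_templates": [
        {"strings": {"match_mapping_type": "string", "mapping": {"type": "keyword"}}}
    ],
    "properties": {
        "geometry": {"type": "geo_shape"},
        "properties": {"properties": {"datetime": {"type": "date_nanos"}}},
    },
}
custom = {
    "dynamic_templates": [
        {"my_ext": {"match": "my:*", "mapping": {"type": "keyword"}}}
    ],
    "properties": {
        "geometry": {"ignore_malformed": True},
        "properties": {"properties": {"datetime": {"type": "date"}}},
    },
}

merged = copy.deepcopy(defaults)  # leave the defaults untouched
merge_mappings(merged, custom)

print(merged["properties"]["geometry"])
# {'type': 'geo_shape', 'ignore_malformed': True}
print(merged["properties"]["properties"]["properties"]["datetime"])
# {'type': 'date'}
print(len(merged["dynamic_templates"]))
# 1 (the custom list replaced the default one)
```

A consequence of this design: to keep the default dynamic templates while adding your own, the custom JSON would have to repeat the default entries in full.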

#### JSON Structure

The custom JSON should mirror the structure of the default mappings. For STAC item properties, the path is `properties.properties.properties`:

```
{
  "numeric_detection": false,
  "dynamic_templates": [...],
  "properties": {                    # Top-level ES mapping properties
    "id": {...},
    "geometry": {...},
    "properties": {                  # STAC item "properties" field
      "type": "object",
      "properties": {                # Nested properties within STAC properties
        "datetime": {...},
        "sar:frequency_band": {...}  # <-- Custom extension fields go here
      }
    }
  }
}
```
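Writing the three levels of `properties` nesting by hand is easy to get wrong. A minimal sketch, using a hypothetical helper `build_item_property_mappings` (not part of SFEOS), that wraps a flat dict of field mappings in that nesting and serializes it for the environment variable:

```python
import json
from typing import Any, Dict


def build_item_property_mappings(fields: Dict[str, Dict[str, Any]]) -> str:
    """Hypothetical helper: nest STAC item field mappings and return JSON.

    The result places `fields` under properties -> properties -> properties,
    matching the structure expected by STAC_FASTAPI_ES_CUSTOM_MAPPINGS.
    """
    payload = {"properties": {"properties": {"properties": fields}}}
    return json.dumps(payload)


# Example input: a subset of the SAR extension fields.
sar_fields = {
    "sar:frequency_band": {"type": "keyword"},
    "sar:center_frequency": {"type": "float"},
}
print(build_item_property_mappings(sar_fields))
```

If the script is saved as, for example, `build_mappings.py` (a hypothetical filename), its output can be exported with `export STAC_FASTAPI_ES_CUSTOM_MAPPINGS="$(python build_mappings.py)"`.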

**Example - Adding SAR Extension Fields:**

```bash
export STAC_FASTAPI_ES_CUSTOM_MAPPINGS='{
  "properties": {
    "properties": {
      "properties": {
        "sar:frequency_band": {"type": "keyword"},
        "sar:center_frequency": {"type": "float"},
        "sar:polarizations": {"type": "keyword"},
        "sar:product_type": {"type": "keyword"}
      }
    }
  }
}'
```
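In this PR, `apply_custom_mappings` only logs a JSON parse error and then continues with the default mappings, so a typo in the exported string can go unnoticed. A minimal pre-flight check, assuming it runs in the same shell before SFEOS starts; `check_custom_mappings` is a hypothetical helper, not part of the package:

```python
import json
import os
from typing import Any, Dict, Optional


def check_custom_mappings(raw: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: parse the value strictly and return STAC item fields.

    Unlike SFEOS, this raises on bad input so mistakes surface before startup.
    """
    if not raw:
        return {}
    parsed = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(parsed, dict):
        raise ValueError("STAC_FASTAPI_ES_CUSTOM_MAPPINGS must be a JSON object")
    # Follow properties -> properties -> properties down to the STAC item fields.
    node: Any = parsed
    for _ in range(3):
        node = node.get("properties", {}) if isinstance(node, dict) else {}
    return node if isinstance(node, dict) else {}


if __name__ == "__main__":
    fields = check_custom_mappings(os.getenv("STAC_FASTAPI_ES_CUSTOM_MAPPINGS"))
    for name, mapping in fields.items():
        print(f"{name}: {mapping}")
```

Run after the `export` above, this would list each custom item field with its mapping, or stop with a parse error if the JSON is malformed.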

**Example - Adding Cube Extension Fields:**

```bash
export STAC_FASTAPI_ES_CUSTOM_MAPPINGS='{
  "properties": {
    "properties": {
      "properties": {
        "cube:dimensions": {"type": "object", "enabled": false},
        "cube:variables": {"type": "object", "enabled": false}
      }
    }
  }
}'
```

**Example - Adding geometry options:**

```bash
export STAC_FASTAPI_ES_CUSTOM_MAPPINGS='{
  "properties": {
    "geometry": {"ignore_malformed": true}
  }
}'
```

### Dynamic Mapping Control (`STAC_FASTAPI_ES_DYNAMIC_MAPPING`)

Controls how Elasticsearch/OpenSearch handles fields not defined in the mapping. The value is case-insensitive:

| Value | Behavior |
|-------|----------|
| `true` (default) | New fields are automatically added to the mapping. Maintains backward compatibility. |
| `false` | New fields are not added to the mapping and are not indexed. They are still stored in the document `_source`, but they won't be searchable. |
| `strict` | Documents with unmapped fields are rejected. |

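The setting is parsed by `parse_dynamic_mapping_config` in this PR. The sketch below copies that function to show how raw strings map to the `dynamic` value: matching is case-insensitive, and anything other than `true`/`false` is passed through lowercased but otherwise unvalidated (whitespace is not stripped), so a typo would reach the cluster as-is.

```python
from typing import Optional, Union


def parse_dynamic_mapping_config(config_value: Optional[str]) -> Union[bool, str]:
    """Map the env var string to the `dynamic` setting (logic copied from this PR)."""
    if config_value is None:
        return True  # unset: keep the backward-compatible default
    config_lower = config_value.lower()
    if config_lower == "true":
        return True
    if config_lower == "false":
        return False
    # Any other value (e.g. "strict") is passed through lowercased, without validation.
    return config_lower


for raw in [None, "TRUE", "False", "Strict", " true"]:
    print(repr(raw), "->", repr(parse_dynamic_mapping_config(raw)))
# None -> True
# 'TRUE' -> True
# 'False' -> False
# 'Strict' -> 'strict'
# ' true' -> ' true'
```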
### Combining Both Variables for Performance Optimization

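For large datasets with extensive metadata that isn't queried, you can disable dynamic mapping and explicitly map only the additional fields you need to search on. The default mappings for core fields such as `id`, `geometry`, and `datetime` still apply, because custom mappings are merged into them: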

```bash
# Disable dynamic mapping
export STAC_FASTAPI_ES_DYNAMIC_MAPPING=false

# Define only queryable fields
export STAC_FASTAPI_ES_CUSTOM_MAPPINGS='{
  "properties": {
    "properties": {
      "properties": {
        "platform": {"type": "keyword"},
        "eo:cloud_cover": {"type": "float"},
        "view:sun_elevation": {"type": "float"}
      }
    }
  }
}'
```

This prevents Elasticsearch from creating mappings for unused metadata fields, reducing index size and improving ingestion performance.

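The sketch below mirrors the order used by `get_items_mappings` in this PR: the `dynamic` setting is applied first and the custom JSON is merged last. Two consequences follow: the default field mappings survive alongside the custom ones, and a top-level `"dynamic"` key inside `STAC_FASTAPI_ES_CUSTOM_MAPPINGS` takes precedence over `STAC_FASTAPI_ES_DYNAMIC_MAPPING`. `BASE_ITEMS_MAPPINGS` here is a simplified, hypothetical stand-in for the real defaults.

```python
import copy
import json
from typing import Any, Dict, Optional

# Simplified, hypothetical stand-in for _BASE_ITEMS_MAPPINGS; the real
# defaults contain many more fields and dynamic templates.
BASE_ITEMS_MAPPINGS: Dict[str, Any] = {
    "numeric_detection": False,
    "properties": {
        "id": {"type": "keyword"},
        "geometry": {"type": "geo_shape"},
        "properties": {
            "type": "object",
            "properties": {"datetime": {"type": "date_nanos"}},
        },
    },
}


def merge_mappings(base: Dict[str, Any], custom: Dict[str, Any]) -> None:
    # Same recursive rule as the PR: dicts merge, anything else is replaced.
    for key, value in custom.items():
        if key in base and isinstance(base[key], dict) and isinstance(value, dict):
            merge_mappings(base[key], value)
        else:
            base[key] = value


def effective_items_mappings(
    dynamic: Optional[str], custom_json: Optional[str]
) -> Dict[str, Any]:
    """Build mappings in the same order as get_items_mappings: dynamic, then custom."""
    mappings = copy.deepcopy(BASE_ITEMS_MAPPINGS)
    value = "true" if dynamic is None else dynamic.lower()
    mappings["dynamic"] = {"true": True, "false": False}.get(value, value)
    if custom_json:
        merge_mappings(mappings, json.loads(custom_json))  # applied last, so it wins
    return mappings


custom = json.dumps(
    {"properties": {"properties": {"properties": {"eo:cloud_cover": {"type": "float"}}}}}
)
result = effective_items_mappings("false", custom)
fields = result["properties"]["properties"]["properties"]
print(result["dynamic"], sorted(fields))  # False ['datetime', 'eo:cloud_cover']

override = effective_items_mappings("false", '{"dynamic": "strict"}')
print(override["dynamic"])  # strict
```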
> [!NOTE]
> These environment variables apply to both Elasticsearch and OpenSearch backends. They are read when the item mappings are built at module load time, so set them before starting the service. Changes only affect newly created indices; for existing indices, you'll need to reindex using [SFEOS-tools](https://github.com/Healy-Hyperspatial/sfeos-tools).

> [!WARNING]
> Use caution when overriding core fields like `geometry`, `datetime`, or `id`. Incorrect types may cause search failures or data loss.

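Before exporting a mapping that touches core fields, a quick check can flag which of them would receive a new `type`. This is a hypothetical helper, not part of SFEOS; the list of protected paths is an assumption based on the JSON structure described in this section. It only detects that a `type` is set, not whether the new type is compatible with existing data.

```python
import json
from typing import Any, List, Tuple

# Hypothetical list of core field paths to guard; extend it for your deployment.
CORE_FIELD_PATHS: List[Tuple[str, ...]] = [
    ("properties", "id"),
    ("properties", "geometry"),
    ("properties", "properties", "properties", "datetime"),
]


def core_type_overrides(custom_json: str) -> List[str]:
    """Return dotted paths of core fields whose `type` the custom JSON sets."""
    custom: Any = json.loads(custom_json)
    hits: List[str] = []
    for path in CORE_FIELD_PATHS:
        node: Any = custom
        for key in path:
            # Walk one level down; a missing key or non-dict stops the walk.
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict) and "type" in node:
            hits.append(".".join(path))
    return hits


print(core_type_overrides('{"properties": {"geometry": {"ignore_malformed": true}}}'))
# []
print(core_type_overrides(
    '{"properties": {"properties": {"properties": {"datetime": {"type": "date"}}}}}'
))
# ['properties.properties.properties.datetime']
```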
## Managing Elasticsearch Indices

### Snapshots
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/mappings.py (118 additions, 2 deletions)

@@ -25,11 +25,123 @@
- Parameter names should be consistent across similar functions
"""

import copy
import json
import logging
import os
from typing import Any, Dict, Literal, Optional, Protocol, Union

from stac_fastapi.core.utilities import get_bool_env

logger = logging.getLogger(__name__)


def merge_mappings(base: Dict[str, Any], custom: Dict[str, Any]) -> None:
    """Recursively merge custom mappings into base mappings.

    When a key exists in both and both values are dictionaries, they are
    merged recursively. Otherwise the custom value replaces the base value
    (lists such as dynamic_templates are replaced, not concatenated).

    Args:
        base: The base mapping dictionary to merge into (modified in place).
        custom: The custom mapping dictionary to merge from.
    """
    for key, value in custom.items():
        if key in base and isinstance(base[key], dict) and isinstance(value, dict):
            merge_mappings(base[key], value)
        else:
            base[key] = value


def parse_dynamic_mapping_config(
    config_value: Optional[str],
) -> Union[bool, str]:
    """Parse the dynamic mapping configuration value.

    Args:
        config_value: The configuration value from the environment variable.
            Can be "true", "false", "strict" (case-insensitive), or None.

    Returns:
        True for "true" or None (the default), False for "false", or the
        lowercased string for other settings like "strict".
    """
    if config_value is None:
        return True
    config_lower = config_value.lower()
    if config_lower == "true":
        return True
    elif config_lower == "false":
        return False
    else:
        return config_lower


def apply_custom_mappings(
    mappings: Dict[str, Any], custom_mappings_json: Optional[str]
) -> None:
    """Apply custom mappings from a JSON string to the mappings dictionary.

    The custom mappings JSON should have the same structure as ES_ITEMS_MAPPINGS.
    It will be recursively merged at the root level, allowing users to override
    any part of the mapping including properties, dynamic_templates, etc.

    Errors (invalid JSON, or JSON that is not an object) are logged rather
    than raised, and the mappings are left unchanged.

    Args:
        mappings: The mappings dictionary to modify (modified in place).
        custom_mappings_json: JSON string containing custom mappings.
    """
    if not custom_mappings_json:
        return

    try:
        custom_mappings = json.loads(custom_mappings_json)
        merge_mappings(mappings, custom_mappings)
    except json.JSONDecodeError as e:
        logger.error(f"Failed to parse STAC_FASTAPI_ES_CUSTOM_MAPPINGS JSON: {e}")
    except Exception as e:
        logger.error(f"Failed to merge STAC_FASTAPI_ES_CUSTOM_MAPPINGS: {e}")


def get_items_mappings(
    dynamic_mapping: Optional[str] = None, custom_mappings: Optional[str] = None
) -> Dict[str, Any]:
    """Get the ES_ITEMS_MAPPINGS with optional dynamic mapping and custom mappings applied.

    This function creates a fresh copy of the base mappings and applies the
    specified configuration. Useful for testing or programmatic configuration.

    Args:
        dynamic_mapping: Override for STAC_FASTAPI_ES_DYNAMIC_MAPPING.
            If None, reads from the environment variable.
        custom_mappings: Override for STAC_FASTAPI_ES_CUSTOM_MAPPINGS.
            If None, reads from the environment variable.

    Returns:
        A new dictionary containing the configured mappings.
    """
    mappings = copy.deepcopy(_BASE_ITEMS_MAPPINGS)

    # Apply dynamic mapping configuration
    dynamic_config = (
        dynamic_mapping
        if dynamic_mapping is not None
        else os.getenv("STAC_FASTAPI_ES_DYNAMIC_MAPPING", "true")
    )
    mappings["dynamic"] = parse_dynamic_mapping_config(dynamic_config)

    # Apply custom mappings
    custom_config = (
        custom_mappings
        if custom_mappings is not None
        else os.getenv("STAC_FASTAPI_ES_CUSTOM_MAPPINGS")
    )
    apply_custom_mappings(mappings, custom_config)

    return mappings


# stac_pydantic classes extend _GeometryBase, which doesn't have a type field,
# so create our own Protocol for typing
@@ -129,7 +241,8 @@ class Geometry(Protocol): # noqa
    },
]

# Base items mappings without dynamic configuration applied
_BASE_ITEMS_MAPPINGS = {
    "numeric_detection": False,
    "dynamic_templates": ES_MAPPINGS_DYNAMIC_TEMPLATES,
    "properties": {
@@ -155,6 +268,9 @@ class Geometry(Protocol): # noqa
    },
}

# ES_ITEMS_MAPPINGS with environment-based configuration applied at module load time
ES_ITEMS_MAPPINGS = get_items_mappings()

ES_COLLECTIONS_MAPPINGS = {
    "numeric_detection": False,
    "dynamic_templates": ES_MAPPINGS_DYNAMIC_TEMPLATES,
stac_fastapi/tests/sfeos_helpers/__init__.py (1 addition, 0 deletions)

@@ -0,0 +1 @@
"""Tests for sfeos_helpers module."""