
Conversation

Collaborator

@Gomez324 Gomez324 commented Nov 23, 2025

Related Issue(s):

Description:

Until now, only the datetime field had aliases. This change adds aliases for start_datetime and end_datetime when USE_DATETIME=false, enabling optimized filtering when searching by these fields. Performance improves because Elasticsearch/OpenSearch can route queries to only the indices that overlap the requested range instead of scanning a larger set of them.

When USE_DATETIME=true, the system works as before with datetime-based aliases only.

Example with USE_DATETIME=false:
Index A with aliases:
{
"start_datetime": "items_start_datetime_new-collection_2020-02-08",
"end_datetime": "items_end_datetime_new-collection_2020-02-16"
}
Index B with aliases:
{
"start_datetime": "items_start_datetime_new-collection_2020-02-12",
"end_datetime": "items_end_datetime_new-collection_2020-02-17"
}
Index C with aliases:
{
"start_datetime": "items_start_datetime_new-collection_2020-02-18",
"end_datetime": "items_end_datetime_new-collection_2020-02-20"
}

When a user searches in the range start_datetime/end_datetime = 2020-02-10 / 2020-02-16, Index A and Index B will be queried because both indices overlap with the requested range. Index C will be excluded because it does not intersect with that time window.

Previously, all indices could have been selected, but the new aliases allow the query engine to efficiently identify which indices overlap with the target range and avoid scanning unrelated ones, such as Index C.
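
For illustration, here is a minimal sketch of the overlap test described above (the index data and names below are assumptions for this example, not the PR's actual code):

from datetime import date

# Hypothetical illustration of the intersection test used for index routing.
indices = {
    "Index A": (date(2020, 2, 8), date(2020, 2, 16)),
    "Index B": (date(2020, 2, 12), date(2020, 2, 17)),
    "Index C": (date(2020, 2, 18), date(2020, 2, 20)),
}

search_start, search_end = date(2020, 2, 10), date(2020, 2, 16)

# Two ranges intersect when each one starts no later than the other ends.
selected = [
    name
    for name, (idx_start, idx_end) in indices.items()
    if idx_start <= search_end and idx_end >= search_start
]
print(selected)  # ['Index A', 'Index B']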

To enable this feature, set USE_DATETIME=false in your configuration. If you want to keep the previous behavior with datetime aliases, set USE_DATETIME=true.

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

@Gomez324 Gomez324 requested a review from jonhealy1 November 23, 2025 04:53
@Gomez324
Collaborator Author

Hi @jonhealy1, will you have time soon to do a code review?

@jonhealy1
Collaborator

@Gomez324 I will make time this weekend. Can you fix the conflicts? Thanks

"gte": None,
"lte": datetime_search.get("lte") if not USE_DATETIME else None,
},
}
Collaborator


This added code complicates the core database logic by tightly coupling it to a specific indexing strategy. Please move this calculation into the IndexSelector (the actual consumer) to keep the core method focused solely on query construction.
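
A rough sketch of what that refactor could look like (class and method names here are hypothetical, not the repository's actual API):

# Hypothetical sketch: the index selector owns the USE_DATETIME-specific
# bound calculation, so the core query builder stays strategy-agnostic.
class DatetimeBasedIndexSelector:
    def __init__(self, use_datetime: bool):
        self.use_datetime = use_datetime

    def routing_bounds(self, datetime_search: dict) -> dict:
        """Return the gte/lte bounds the selector uses to pick indices."""
        if self.use_datetime:
            return {"gte": datetime_search.get("gte"), "lte": datetime_search.get("lte")}
        # start/end-datetime mode, mirroring the snippet above (assumption).
        return {"gte": None, "lte": datetime_search.get("lte")}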

"opensearch-py[async]~=2.8.0",
"uvicorn~=0.23.0",
"starlette>=0.35.0,<0.36.0",
"redis==6.4.0",
Collaborator

@jonhealy1 jonhealy1 Nov 29, 2025


Redis should not be installed in the core package, as most users probably won't use Redis. It can be installed with pip install stac-fastapi-elasticsearch[redis] or with the dev extras.
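
A minimal packaging sketch of that suggestion, assuming a setuptools-style setup(); the package name is illustrative and the repository's actual packaging layout and pins may differ:

# Hypothetical sketch: redis declared as an optional extra instead of a core dependency.
from setuptools import setup

setup(
    name="stac-fastapi-opensearch",  # illustrative
    install_requires=[
        "opensearch-py[async]~=2.8.0",
        "uvicorn~=0.23.0",
        "starlette>=0.35.0,<0.36.0",
        # redis intentionally not listed here
    ],
    extras_require={
        "redis": ["redis==6.4.0"],
        "dev": ["redis==6.4.0"],
    },
)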


if not datetime_search:
return search, result_metadata

Collaborator


See other comment on Elasticsearch version of this code.

raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Product datetime is required for indexing",
detail="Product 'start_datetime', 'datetime' and 'end_datetime' is required for indexing",
Collaborator


This validation logic violates the STAC specification in two ways:

  1. It creates a mandatory requirement for start_datetime and end_datetime, which are optional fields in the spec.

  2. It rejects items where datetime is null (but start/end are present), which is explicitly allowed for interval data.

Please refactor this to handle standard STAC items (single datetime) and interval items (null datetime) correctly.

Collaborator Author


@jonhealy1 I agree with you. However, if indexes are to be created based on start_datetime, then that field must always be required.

Collaborator


What if we tie this validation to the existing USE_DATETIME setting?

If USE_DATETIME=true (Default): We allow items that only have a datetime field. In these cases, we can derive the index partition name from the datetime field instead of raising a 400 error.

If USE_DATETIME=false: Then strict enforcement of start_datetime is appropriate.

This ensures we support standard STAC items (point-in-time) without forcing users to reconfigure or reformat their data.
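
A rough sketch of that conditional validation (the function and error wording are illustrative, not the PR's final code):

from fastapi import HTTPException, status

# Hypothetical sketch of the proposed validation, keyed off USE_DATETIME.
def validate_item_datetimes(properties: dict, use_datetime: bool) -> None:
    if use_datetime:
        # Default mode: a plain datetime is enough; the index partition
        # name can be derived from it instead of raising a 400.
        if not properties.get("datetime"):
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Product 'datetime' is required for indexing",
            )
    else:
        # start/end-datetime mode: strict enforcement of the interval fields.
        if not (properties.get("start_datetime") and properties.get("end_datetime")):
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Product 'start_datetime' and 'end_datetime' are required for indexing",
            )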

Collaborator Author


@jonhealy1

Good idea. I'll need some more time to implement it, but it is doable.

If USE_DATETIME is true, then datetime is required, and the aliases will work as they do now using only datetime, so the migration tool will not be needed? And if it is false, then start_datetime and end_datetime are required, while datetime becomes optional?

Collaborator


@Gomez324 Sounds good! Yes, I think migration scripts would not be needed.

datetime_alias = index_dict.get("datetime")

if not start_datetime_alias:
    continue
Collaborator


This line effectively makes all existing production indexes invisible to the API. Current indexes do not have start_datetime aliases.

  1. Where is the migration plan to backfill aliases on historical data?

  2. Without a migration, this change breaks backwards compatibility and will return 0 results for existing datasets.
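
Purely as an illustration of the concern (not code from this PR or a proposal from this thread), a fallback to the legacy datetime alias would keep pre-existing indexes visible:

# Hypothetical sketch: prefer the new start_datetime alias, but fall back to the
# legacy datetime alias so indexes created before this change are not skipped.
index_alias_dicts = [
    {"start_datetime": "items_start_datetime_c1_2020-02-08", "end_datetime": "items_end_datetime_c1_2020-02-16"},
    {"datetime": "items_c1_2019-01-01"},  # older index with a datetime alias only
]

selectable = []
for index_dict in index_alias_dicts:
    alias = index_dict.get("start_datetime") or index_dict.get("datetime")
    if alias is None:
        continue  # no usable alias at all
    selectable.append(alias)

print(selectable)  # both new-style and legacy indexes remain selectable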

"elasticsearch[async]~=8.19.1",
"uvicorn~=0.23.0",
"starlette>=0.35.0,<0.36.0",
"redis==6.4.0",
Collaborator


Same here - let's not install redis here. It's an optional feature.

@jonhealy1
Collaborator

@Gomez324 In the description for this PR, you state that Index B (12th-17th) lies outside the requested range (10th-16th) and would be skipped.

This description implies incorrect behavior. STAC API searches rely on Intersection, not Containment. Since Index B overlaps with the search window, it must be queried; otherwise, valid items from the 12th to the 16th would be hidden from the user.

Looking at the code in check_criteria, it appears you are correctly implementing intersection logic (which contradicts your description). Please update the PR description to avoid confusion, as the current example implies the feature is broken.

@Gomez324
Collaborator Author

Gomez324 commented Dec 2, 2025

Hey @jonhealy1, I've fixed the code according to the suggestions; it's ready for a CR.
