Skip to content

fix(entities): treat $like as a literal substring, not a SQL wildcard#264

Open
maxdubrinsky wants to merge 1 commit into
mainfrom
aircore-749-like-literal-escaping/md
Open

fix(entities): treat $like as a literal substring, not a SQL wildcard#264
maxdubrinsky wants to merge 1 commit into
mainfrom
aircore-749-like-literal-escaping/md

Conversation

@maxdubrinsky

@maxdubrinsky maxdubrinsky commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

SQLAlchemyFilterRepository.like() interpolated the user filter value directly into an ILIKE pattern (f"%{value}%") with no escaping, so % and _ in a $like value were silently treated as SQL wildcards on both SQLite and PostgreSQL. The shipped in-memory backend (InMemoryFilterRepository.like) — the documented, test-pinned canonical contract — treats $like as a case-insensitive literal substring where % and _ are ordinary characters. The two backends therefore returned different result sets for any value containing % or _ (e.g. name~"a_b" matched "axb"; "100%" over-matched).

This is item (1) of AIRCORE-749.

Fix

Escape the LIKE metacharacters (\, %, _) and pass an explicit ESCAPE clause, which behaves identically on SQLite and PostgreSQL. The SQL backend now matches the in-memory literal-substring contract. The in-memory backend is unchanged (it was already correct).

pattern = f"%{_escape_like(value)}%"
return column.ilike(pattern, escape="\\")

Testing

Extended the SQL↔in-memory parity suite (test_filter_matches_sql_parity.py) with rows whose name/data.tier contain _ and %, each paired with a near-identical decoy a wildcard would wrongly match — covering both the plain-column and JSON cast-to-text paths. The three new cases fail before this change (SQL=[6,7] vs in-memory=[6]) and pass after. Full entities unit suite: 289 passed, and all existing filter tests still pass (none relied on wildcard $like).

Scope note — remaining AIRCORE-749 work

This PR is the decision-free, dialect-agnostic slice. The other foot-guns in the ticket are deliberately not included because they need cross-backend behavior decisions (and surface coupled failures):

  • (2) Numeric cast raises a 500 on Postgres for $gt/$lt against non-numeric JSON text (SQLite silently coerces to 0.0). Needs a dialect-aware numeric guard + a decision on non-numeric → no-match.
  • (3) JSON boolean rendering differs across backends for like/in/nin/ordered (only eq normalizes).
  • (4) Absent-key semantics: SQLite renders absent as literal 'null'; Postgres/in-memory treat it as no-match.
  • (5) Extend the parity suite to run against Postgres via testcontainers (the SQLite-only leg is what let these hide).

These remain open on the ticket; (2)/(4) in particular change current query results and warrant their own review.

Summary by CodeRabbit

  • Bug Fixes
    • Search and filter now treat SQL wildcard characters (%, _) and backslashes as literal characters, yielding consistent literal-substring matches across JSON and non-JSON fields.
  • Tests
    • Extended unit and end-to-end tests to cover literal handling of wildcard characters and ensure parity between in-memory and SQL backends.

@maxdubrinsky maxdubrinsky requested review from a team as code owners June 10, 2026 19:05
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a66bc2f3-6a02-40f2-a00a-1ecaa3cf4280

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0d860 and 4be74d2.

📒 Files selected for processing (3)
  • e2e/test_entities.py
  • services/core/entities/src/nmp/core/entities/app/repository/sqlalchemy/filter.py
  • services/core/entities/tests/test_filter_matches_sql_parity.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/core/entities/tests/test_filter_matches_sql_parity.py

📝 Walkthrough

Walkthrough

This PR fixes SQL LIKE wildcard injection by escaping metacharacters (%, _, backslashes) in user-provided values. A new _escape_likehelper escapes these characters, andSQLAlchemyFilterRepository.likeapplies explicitESCAPE` clauses for both regular and JSON columns. Tests and an e2e test were updated to assert literal-substring matching.

Changes

SQL LIKE Metacharacter Escaping

Layer / File(s) Summary
Escape helper and LIKE operator implementation
services/core/entities/src/nmp/core/entities/app/repository/sqlalchemy/filter.py
New _escape_like function escapes backslashes and %/_. SQLAlchemyFilterRepository.like builds ilike/contains patterns from escaped values and applies escape="\" for both JSON (cast/trimmed) and non-JSON columns.
Test and e2e updates for literal LIKE semantics
services/core/entities/tests/test_filter_matches_sql_parity.py, e2e/test_entities.py
Extended SEED with rows containing literal _/% and added $like cases asserting these characters match literally; updated e2e test_entity_search_filter to pass the raw prefix to $like (literal semantics).
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title accurately summarizes the main change: escaping SQL LIKE metacharacters to treat $like as literal substring matching instead of wildcard patterns.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aircore-749-like-literal-escaping/md

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
Suite Lines Covered Line Rate Branch Rate
Unit Tests 19124/25199 75.9% 62.4%
Integration Tests 12197/23971 50.9% 26.3%

@maxdubrinsky maxdubrinsky changed the title fix(entities): treat $like as a literal substring, not a SQL wildcard (AIRCORE-749) fix(entities): treat $like as a literal substring, not a SQL wildcard Jun 10, 2026
@github-actions github-actions Bot added the fix label Jun 10, 2026
… (AIRCORE-749)

SQLAlchemyFilterRepository.like() interpolated the user filter value into an
ILIKE pattern without escaping, so % and _ in a $like value were silently
treated as SQL wildcards on both SQLite and PostgreSQL. The shipped in-memory
backend (InMemoryFilterRepository.like) — the documented, test-pinned canonical
contract — treats $like as a case-insensitive literal substring where % and _
are ordinary characters. The two backends therefore returned different result
sets for any value containing % or _ (e.g. name~"a_b" matched "axb").

Escape the LIKE metacharacters (\, %, _) and pass an explicit ESCAPE clause,
which behaves identically on SQLite and PostgreSQL, so the SQL backend matches
the in-memory literal-substring contract. The in-memory backend is unchanged.

Extends the SQL/in-memory parity suite with rows whose name/data.tier contain
_ and %, each paired with a near-identical decoy a wildcard would wrongly match
(covering both the plain-column and JSON cast-to-text paths). The new cases
fail before this change (SQL over-matches the decoy) and pass after.

Addresses item (1) of AIRCORE-749. The cross-backend JSON-coercion foot-guns
(numeric-cast Postgres 500, boolean rendering, absent-key semantics) and the
Postgres-backed parity leg remain open on the ticket — they require cross-backend
behavior decisions.

Signed-off-by: Max Dubrinsky <mdubrinsky@nvidia.com>
@maxdubrinsky maxdubrinsky force-pushed the aircore-749-like-literal-escaping/md branch from 2c0d860 to 4be74d2 Compare June 11, 2026 03:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant