Conversation
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
There was a problem hiding this comment.
Pull request overview
Adds client-side support in RedisVL for the new COSINE_SIMILARITY vector distance metric, aligning schema generation, query behavior, and result post-processing with similarity-style semantics (higher is better).
Changes:
- Extend
VectorDistanceMetric/schema field support to includeCOSINE_SIMILARITYand ensure it’s passed through to Redis asDISTANCE_METRIC COSINE_SIMILARITY. - Adjust vector query validation to default-sort
COSINE_SIMILARITYsearches byvector_distancedescending when using the default sort. - Add unit tests covering schema export, default sort behavior, and ensuring similarity scores aren’t re-normalized.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_query_types.py | Adds tests for cosine-similarity score handling and default DESC sorting behavior. |
| tests/unit/test_fields.py | Verifies FlatVectorField exports COSINE_SIMILARITY in Redis field args. |
| redisvl/schema/fields.py | Introduces VectorDistanceMetric.COSINE_SIMILARITY and disables normalization for it in VECTOR_NORM_MAP. |
| redisvl/query/query.py | Tracks whether vector-distance sorting was defaulted and resets that state when sort_by() is called/cleared. |
| redisvl/index/index.py | Applies default DESC sort for COSINE_SIMILARITY vector queries; validates batched queries before execution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -36,6 +41,108 @@ def test_count_query(): | |||
| assert process_results(fake_result, count_query, "json") == 2 | |||
There was a problem hiding this comment.
process_results now takes an IndexSchema, but this test still passes the string "json". It works only because the CountQuery early-return skips schema usage; updating the test to pass a real schema (or a minimal IndexSchema fixture) will keep it aligned with the public signature and avoid future breakage if the implementation changes.
| if isinstance(query, BaseVectorQuery): | ||
| field = self.schema.fields[query._vector_field_name] | ||
| dist_metric = VectorDistanceMetric(field.attrs.distance_metric.upper()) # type: ignore |
There was a problem hiding this comment.
Accessing self.schema.fields[query._vector_field_name] will raise a raw KeyError if the query references a vector field name not present in the schema. Since this is part of user input validation, it would be better to handle the missing-field case explicitly and raise QueryValidationError with a clear message (instead of leaking KeyError).
| for query in queries: | ||
| self._validate_query(query) |
There was a problem hiding this comment.
_query() wraps QueryValidationError to add context ("Invalid query: ..."), but batch_query() now calls _validate_query() without similar handling. Consider catching QueryValidationError here as well (and indicating which query in the batch failed) so batch and non-batch APIs report validation errors consistently.
| for query in queries: | |
| self._validate_query(query) | |
| for i, query in enumerate(queries): | |
| try: | |
| self._validate_query(query) | |
| except QueryValidationError as e: | |
| raise QueryValidationError( | |
| f"Invalid query at batch index {i}: {str(e)}" | |
| ) from e |
| for query in queries: | ||
| self._validate_query(query) |
There was a problem hiding this comment.
Same as sync batch_query(): now that _validate_query() is called here, consider catching QueryValidationError and adding context about which query failed validation so async batch and single-query APIs have consistent error reporting.
| for query in queries: | |
| self._validate_query(query) | |
| for i, query in enumerate(queries): | |
| try: | |
| self._validate_query(query) | |
| except QueryValidationError as e: | |
| raise QueryValidationError( | |
| f"Invalid query at batch index {i}: {str(e)}" | |
| ) from e |
See: rbs333/RediSearch#1
If this were merged into core Redis than we could remove our conversion logic and have a cleaner implementation of direct cosine_similarity.
PR Summary: Add
COSINE_SIMILARITYsupport in RediSearchSummary
This change adds public RediSearch support for
DISTANCE_METRIC COSINE_SIMILARITYwhile preserving the existing internal cosine execution path.The implementation is intentionally non-breaking:
L2,IP, andCOSINEbehavior remains unchanged.COSINE_SIMILARITYis exposed as a new public metric name.Why
cosine_similaritywith range [-1, 1].RediSearch module changes
Metric parsing and metadata
RediSearch now accepts
COSINE_SIMILARITYin schema creation and reports it back through metric stringification.Touched areas:
src/spec.csrc/vector_index.hsrc/vector_index.cInternal cosine-path reuse
COSINE_SIMILARITYfollows the same internal path asCOSINEfor query execution and vector normalization.Touched areas:
src/vector_normalization.hsrc/iterators/hybrid_reader.cReturned score semantics
For fields defined with
COSINE_SIMILARITY, RediSearch converts exposed vector scores from cosine distance to cosine similarity at the output boundary:similarity = 1 - distanceThis keeps internal ranking unchanged while presenting similarity-style results to users.
Touched areas:
src/vector_index.csrc/iterators/hybrid_reader.cRange query semantics
For
VECTOR_RANGEonCOSINE_SIMILARITYfields, RediSearch interprets the provided threshold as a similarity threshold and translates it before calling the existing range query path:1 - similarity_thresholdThe public input is validated against the similarity range
[-1, 1].Touched area:
src/vector_index.cValidation / tests
This PR adds focused RediSearch-side coverage for:
FT.CREATEacceptingDISTANCE_METRIC COSINE_SIMILARITYDesign constraints preserved
COSINE,IP, orL2semanticsNotes
This PR is designed as a thin RediSearch-layer adaptation:
If paired with the corresponding VectorSimilarity changes, this gives users a clean public
COSINE_SIMILARITYmetric without expanding the internal algorithm surface area.