Direct COSINE_SIMILARITY metric by rbs333 · Pull Request #582 · redis/redis-vl-python

rbs333 · 2026-04-13T17:01:12Z

If this were merged into core Redis, then we could remove our conversion logic and have a cleaner implementation of direct cosine_similarity.

PR Summary: Add `COSINE_SIMILARITY` support in RediSearch

Summary

This change adds public RediSearch support for DISTANCE_METRIC COSINE_SIMILARITY while preserving the existing internal cosine execution path.

The implementation is intentionally non-breaking:

Existing L2, IP, and COSINE behavior remains unchanged.
COSINE_SIMILARITY is exposed as a new public metric name.
Internally, search/index execution continues to reuse cosine-distance behavior.
No new VecSim ordering, heap, or comparator logic is introduced in RediSearch.

Why

Cosine similarity, with range [-1, 1], is the industry-standard convention among vector databases.
Many customers' existing downstream apps assume cosine similarity, so the lack of support adds friction when Redis is used as a replacement.
Many ecosystem integrations also assume this convention, which requires us to reverse-engineer the score in order to support them.
A vector distance metric doesn't express exactly opposite vectors as intuitively as a negative similarity does.

RediSearch module changes

Metric parsing and metadata

RediSearch now accepts COSINE_SIMILARITY in schema creation and reports it back through metric stringification.

Touched areas:

src/spec.c
src/vector_index.h
src/vector_index.c

Internal cosine-path reuse

COSINE_SIMILARITY follows the same internal path as COSINE for query execution and vector normalization.

Touched areas:

src/vector_normalization.h
src/iterators/hybrid_reader.c

Returned score semantics

For fields defined with COSINE_SIMILARITY, RediSearch converts exposed vector scores from cosine distance to cosine similarity at the output boundary:

similarity = 1 - distance

This keeps internal ranking unchanged while presenting similarity-style results to users.
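
As a rough illustration of the conversion above, here is a minimal pure-Python sketch. It assumes cosine distance is defined as 1 minus cosine similarity (so distances fall in [0, 2]); the function names are hypothetical and are not taken from the RediSearch or RedisVL code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity of two non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance as assumed here: 1 - similarity, in [0, 2]."""
    return 1.0 - cosine_similarity(a, b)


def distance_to_similarity(distance: float) -> float:
    """Hypothetical output-boundary conversion: similarity = 1 - distance."""
    return 1.0 - distance


# Identical, orthogonal, and opposite unit vectors.
same = distance_to_similarity(cosine_distance([1.0, 0.0], [1.0, 0.0]))
orthogonal = distance_to_similarity(cosine_distance([1.0, 0.0], [0.0, 1.0]))
opposite = distance_to_similarity(cosine_distance([1.0, 0.0], [-1.0, 0.0]))
```

Under these assumptions, opposite vectors come back with a negative score rather than a distance of 2, which is the readability argument made in the "Why" section.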

Touched areas:

src/vector_index.c
src/iterators/hybrid_reader.c

Range query semantics

For VECTOR_RANGE on COSINE_SIMILARITY fields, RediSearch interprets the provided threshold as a similarity threshold and translates it before calling the existing range query path:

internal radius = 1 - similarity_threshold

The public input is validated against the similarity range [-1, 1].
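
The threshold translation and validation described above could look roughly like the sketch below. It is written in Python for illustration only (the actual change lives in C in src/vector_index.c), and the function name and error message are assumptions.

```python
def similarity_threshold_to_radius(similarity_threshold: float) -> float:
    """Translate a public similarity threshold into an internal cosine-distance radius.

    Hypothetical helper: validates the input against [-1, 1], then applies
    internal radius = 1 - similarity_threshold.
    """
    # The chained comparison is False for NaN, so NaN is rejected too.
    if not -1.0 <= similarity_threshold <= 1.0:
        raise ValueError(
            f"similarity threshold must be in [-1, 1], got {similarity_threshold}"
        )
    return 1.0 - similarity_threshold


# A stricter similarity threshold maps to a smaller internal radius.
strict_radius = similarity_threshold_to_radius(0.75)
loose_radius = similarity_threshold_to_radius(-1.0)
```

Note that the direction flips: asking for results with similarity of at least 0.75 corresponds to an internal distance of at most 0.25.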

Touched area:

src/vector_index.c

Validation / tests

This PR adds focused RediSearch-side coverage for:

FT.CREATE accepting DISTANCE_METRIC COSINE_SIMILARITY
KNN result ordering matching cosine behavior
returned scores being exposed as cosine similarity values
range query thresholds being interpreted as similarity thresholds
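
The ordering property in the list above can be checked without a Redis server. The sketch below is not one of the PR's tests; it only illustrates, with made-up document keys and distances, that ranking by ascending cosine distance and ranking by descending similarity give the same order.

```python
from typing import Dict, List


def rank_by_distance(distances: Dict[str, float]) -> List[str]:
    """Closest first: ascending cosine distance."""
    return sorted(distances, key=lambda key: distances[key])


def rank_by_similarity(similarities: Dict[str, float]) -> List[str]:
    """Most similar first: descending cosine similarity."""
    return sorted(similarities, key=lambda key: similarities[key], reverse=True)


# Illustrative values only; keys and distances are invented for this example.
distances = {"doc:b": 0.75, "doc:a": 0.25, "doc:c": 1.5}
similarities = {key: 1.0 - dist for key, dist in distances.items()}

distance_order = rank_by_distance(distances)
similarity_order = rank_by_similarity(similarities)
```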

Design constraints preserved

No changes to existing COSINE, IP, or L2 semantics
No new search-time cosine-similarity math in the RediSearch module
No new ordering/comparator model
No changes to the core cosine ranking behavior

Notes

This PR is designed as a thin RediSearch-layer adaptation:

keep cosine-based execution internally
translate only the public metric name and exposed score/range semantics

If paired with the corresponding VectorSimilarity changes, this gives users a clean public COSINE_SIMILARITY metric without expanding the internal algorithm surface area.

jit-ci · 2026-04-13T17:05:11Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

Security scan by Jit

Copilot

Pull request overview

Adds client-side support in RedisVL for the new COSINE_SIMILARITY vector distance metric, aligning schema generation, query behavior, and result post-processing with similarity-style semantics (higher is better).

Changes:

Extend VectorDistanceMetric/schema field support to include COSINE_SIMILARITY and ensure it’s passed through to Redis as DISTANCE_METRIC COSINE_SIMILARITY.
Adjust vector query validation to default-sort COSINE_SIMILARITY searches by vector_distance descending when using the default sort.
Add unit tests covering schema export, default sort behavior, and ensuring similarity scores aren’t re-normalized.
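
To make the default-sort change concrete, here is a small standalone sketch. The `Metric` enum is a local stand-in for RedisVL's `VectorDistanceMetric`, and `resolve_sort` is a hypothetical helper, not a function from the PR; it only illustrates the rule that a defaulted sort on `vector_distance` becomes descending for `COSINE_SIMILARITY` while an explicit user sort is left alone.

```python
from enum import Enum
from typing import Optional, Tuple


class Metric(str, Enum):
    """Local stand-in for RedisVL's VectorDistanceMetric (assumption)."""

    COSINE = "COSINE"
    L2 = "L2"
    IP = "IP"
    COSINE_SIMILARITY = "COSINE_SIMILARITY"


def resolve_sort(
    metric: Metric, user_sort: Optional[Tuple[str, bool]] = None
) -> Tuple[str, bool]:
    """Return (field, ascending) for a vector query.

    An explicit user sort always wins. Otherwise sort by vector_distance:
    ascending for distance metrics, descending for similarity scores.
    """
    if user_sort is not None:
        return user_sort
    ascending = metric is not Metric.COSINE_SIMILARITY
    return ("vector_distance", ascending)


default_cosine = resolve_sort(Metric.COSINE)
default_similarity = resolve_sort(Metric.COSINE_SIMILARITY)
explicit = resolve_sort(Metric.COSINE_SIMILARITY, user_sort=("price", True))
```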

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| tests/unit/test_query_types.py | Adds tests for cosine-similarity score handling and default DESC sorting behavior. |
| tests/unit/test_fields.py | Verifies `FlatVectorField` exports `COSINE_SIMILARITY` in Redis field args. |
| redisvl/schema/fields.py | Introduces `VectorDistanceMetric.COSINE_SIMILARITY` and disables normalization for it in `VECTOR_NORM_MAP`. |
| redisvl/query/query.py | Tracks whether vector-distance sorting was defaulted and resets that state when `sort_by()` is called/cleared. |
| redisvl/index/index.py | Applies default DESC sort for `COSINE_SIMILARITY` vector queries; validates batched queries before execution. |


Copilot · 2026-04-13T17:08:08Z

tests/unit/test_query_types.py

@@ -36,6 +41,108 @@ def test_count_query():
    assert process_results(fake_result, count_query, "json") == 2


process_results now takes an IndexSchema, but this test still passes the string "json". It works only because the CountQuery early-return skips schema usage; updating the test to pass a real schema (or a minimal IndexSchema fixture) will keep it aligned with the public signature and avoid future breakage if the implementation changes.

Copilot · 2026-04-13T17:08:08Z

redisvl/index/index.py

+        if isinstance(query, BaseVectorQuery):
+            field = self.schema.fields[query._vector_field_name]
+            dist_metric = VectorDistanceMetric(field.attrs.distance_metric.upper())  # type: ignore


Accessing self.schema.fields[query._vector_field_name] will raise a raw KeyError if the query references a vector field name not present in the schema. Since this is part of user input validation, it would be better to handle the missing-field case explicitly and raise QueryValidationError with a clear message (instead of leaking KeyError).
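
One way to address this is sketched below. It is self-contained for illustration: `QueryValidationError` is a local stand-in for RedisVL's exception of the same name, and `lookup_vector_field` plus the dict-based schema are hypothetical simplifications of `self.schema.fields`.

```python
from typing import Any, Dict


class QueryValidationError(ValueError):
    """Local stand-in for RedisVL's QueryValidationError (assumption)."""


def lookup_vector_field(fields: Dict[str, Any], field_name: str) -> Any:
    """Fetch a field definition, turning a missing name into a validation error."""
    try:
        return fields[field_name]
    except KeyError:
        # Surface a clear, user-facing message instead of leaking KeyError.
        raise QueryValidationError(
            f"Vector field '{field_name}' is not defined in the index schema"
        ) from None


# Simplified stand-in for a schema's field mapping.
schema_fields = {"embedding": {"distance_metric": "cosine_similarity"}}

found = lookup_vector_field(schema_fields, "embedding")

try:
    lookup_vector_field(schema_fields, "missing_field")
    error_message = None
except QueryValidationError as exc:
    error_message = str(exc)
```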

Copilot · 2026-04-13T17:08:08Z

redisvl/index/index.py

+        for query in queries:
+            self._validate_query(query)


_query() wraps QueryValidationError to add context ("Invalid query: ..."), but batch_query() now calls _validate_query() without similar handling. Consider catching QueryValidationError here as well (and indicating which query in the batch failed) so batch and non-batch APIs report validation errors consistently.

Suggested change

-        for query in queries:
-            self._validate_query(query)
+        for i, query in enumerate(queries):
+            try:
+                self._validate_query(query)
+            except QueryValidationError as e:
+                raise QueryValidationError(
+                    f"Invalid query at batch index {i}: {str(e)}"
+                ) from e

Copilot · 2026-04-13T17:08:09Z

redisvl/index/index.py

+        for query in queries:
+            self._validate_query(query)


Same as sync batch_query(): now that _validate_query() is called here, consider catching QueryValidationError and adding context about which query failed validation so async batch and single-query APIs have consistent error reporting.

Suggested change

-        for query in queries:
-            self._validate_query(query)
+        for i, query in enumerate(queries):
+            try:
+                self._validate_query(query)
+            except QueryValidationError as e:
+                raise QueryValidationError(
+                    f"Invalid query at batch index {i}: {str(e)}"
+                ) from e

Commit 3676cd2: first working

rbs333 wants to merge 1 commit into main from cosine_similarity_metric
