@@ -28,10 +28,12 @@ similarity of an embedding generated from some query text with embeddings stored
2828or JSON fields, Redis can retrieve documents that closely match the query in terms
2929of their meaning.
3030
31- In the example below, we use the
31+ The example below uses the
3232[`sentence-transformers`](https://pypi.org/project/sentence-transformers/)
3333library to generate vector embeddings to store and index with
34- Redis Query Engine.
34+ Redis Query Engine. The code is first demonstrated for hash documents with a
35+ separate section to explain the
36+ [differences with JSON documents](#differences-with-json-documents).
3537
3638## Initialize
3739
@@ -50,6 +52,7 @@ from sentence_transformers import SentenceTransformer
5052from redis.commands.search.query import Query
5153from redis.commands.search.field import TextField, TagField, VectorField
5254from redis.commands.search.indexDefinition import IndexDefinition, IndexType
55+ from redis.commands.json.path import Path
5356
5457import numpy as np
5558import redis
@@ -86,7 +89,7 @@ except redis.exceptions.ResponseError:
8689 pass
8790```
8891
89- Next, we create the index.
92+ Next, create the index.
9093The schema in the example below specifies hash objects for storage and includes
9194three fields: the text content to index, a
9295[tag]({{< relref "/develop/interact/search-and-query/advanced-concepts/tags" >}})
@@ -127,10 +130,10 @@ Use the `model.encode()` method of `SentenceTransformer`
127130as shown below to create the embedding that represents the `content` field.
128131The `astype()` option that follows the `model.encode()` call specifies that
129132we want a vector of `float32` values. The `tobytes()` option encodes the
130- vector components together as a single binary string rather than the
131- default Python list of ` float ` values.
132- Use the binary string representation when you are indexing hash objects
133- (as we are here), but use the default list of ` float ` for JSON objects .
133+ vector components together as a single binary string.
134+ Use the binary string representation when you are indexing hashes
135+ or running a query (but use a list of `float` for
136+ [JSON documents](#differences-with-json-documents)).
134137
135138```python
136139content = "That is a very happy person"
@@ -226,6 +229,116 @@ As you would expect, the result for `doc:0` with the content text *"That is a ve
226229is the result that is most similar in meaning to the query text
227230*"That is a happy person"*.
228231
232+ ## Differences with JSON documents
233+
234+ Indexing JSON documents is similar to hash indexing, but there are some
235+ important differences. JSON allows much richer data modelling with nested fields, so
236+ you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
237+ to identify each field you want to index. However, you can declare a short alias for each
238+ of these paths (using the `as_name` keyword argument) to avoid typing it in full for
239+ every query. Also, you must specify `IndexType.JSON` when you create the index.
240+
241+ The code below shows these differences, but the index is otherwise very similar to
242+ the one created previously for hashes:
243+
244+ ```py
245+ schema = (
246+     TextField("$.content", as_name="content"),
247+     TagField("$.genre", as_name="genre"),
248+     VectorField(
249+         "$.embedding", "HNSW", {
250+             "TYPE": "FLOAT32",
251+             "DIM": 384,
252+             "DISTANCE_METRIC": "L2"
253+         },
254+         as_name="embedding"
255+     )
256+ )
257+
258+ r.ft("vector_json_idx").create_index(
259+     schema,
260+     definition=IndexDefinition(
261+         prefix=["jdoc:"], index_type=IndexType.JSON
262+     )
263+ )
264+ ```
265+
266+ Use [`json().set()`]({{< relref "/commands/json.set" >}}) to add the data
267+ instead of [`hset()`]({{< relref "/commands/hset" >}}). The dictionaries
268+ that specify the fields have the same structure as the ones used for `hset()`
269+ but `json().set()` receives them in a positional argument instead of
270+ the `mapping` keyword argument.
271+
272+ An important difference with JSON indexing is that the vectors are
273+ specified using lists instead of binary strings. Generate the list
274+ using the `tolist()` method instead of `tobytes()` as you would with a
275+ hash.
276+
277+ ```py
278+ content = "That is a very happy person"
279+
280+ r.json().set("jdoc:0", Path.root_path(), {
281+     "content": content,
282+     "genre": "persons",
283+     "embedding": model.encode(content).astype(np.float32).tolist(),
284+ })
285+
286+ content = "That is a happy dog"
287+
288+ r.json().set("jdoc:1", Path.root_path(), {
289+     "content": content,
290+     "genre": "pets",
291+     "embedding": model.encode(content).astype(np.float32).tolist(),
292+ })
293+
294+ content = "Today is a sunny day"
295+
296+ r.json().set("jdoc:2", Path.root_path(), {
297+     "content": content,
298+     "genre": "weather",
299+     "embedding": model.encode(content).astype(np.float32).tolist(),
300+ })
301+ ```
302+
303+ The query is almost identical to the one for the hash documents. This
304+ demonstrates how the right choice of aliases for the JSON paths can
305+ save you having to write complex queries. An important thing to notice
306+ is that the vector parameter for the query is still specified as a
307+ binary string (using the `tobytes()` method), even though the data for
308+ the `embedding` field of the JSON was specified as a list.
309+
310+ ```py
311+ q = Query(
312+     "*=>[KNN 3 @embedding $vec AS vector_distance]"
313+ ).return_field("vector_distance").return_field("content").dialect(2)
314+
315+ query_text = "That is a happy person"
316+
317+ res = r.ft("vector_json_idx").search(
318+     q, query_params={
319+         "vec": model.encode(query_text).astype(np.float32).tobytes()
320+     }
321+ )
322+ ```
323+
324+ Apart from the `jdoc:` prefixes for the keys, the result from the JSON
325+ query is the same as for hash:
326+
327+ ```
328+ Result{
329+     3 total,
330+     docs: [
331+         Document {
332+             'id': 'jdoc:0',
333+             'payload': None,
334+             'vector_distance': '0.114169985056',
335+             'content': 'That is a very happy person'
336+         },
337+         .
338+         .
339+         .
340+ ```
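
The difference between the list stored in a JSON document and the binary string passed as the query parameter can be checked without a Redis server. The following is a minimal sketch, assuming NumPy is installed. It uses a random 384-component vector as a stand-in for the output of `model.encode()`, and the variable names are illustrative rather than part of the original example.

```python
import json

import numpy as np

# Stand-in for model.encode(text): a random 384-component vector.
# (Assumption: a real embedding has the same shape; these values are arbitrary.)
rng = np.random.default_rng(seed=0)
vec = rng.random(384).astype(np.float32)

# Form stored in the JSON document: a plain Python list of floats,
# which serializes to JSON without any extra work.
as_list = vec.tolist()
json_text = json.dumps({"embedding": as_list})

# Form passed as the query parameter: the raw float32 bytes.
as_bytes = vec.tobytes()

# Each float32 component occupies 4 bytes.
print(len(as_list), len(as_bytes))  # 384 1536

# Decoding either form gives back the same float32 vector.
decoded = np.frombuffer(as_bytes, dtype=np.float32)
restored = np.array(json.loads(json_text)["embedding"], dtype=np.float32)
print(np.array_equal(decoded, restored))  # True
```

The two forms carry the same values, so the choice between them depends only on where the vector is going: a list inside a JSON document, or a binary string for a hash field or a query parameter.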
341+
229342## Learn more
230343
231344See