Commit 8a55e35

Finalize spec for index schema and fields (#103)
Advanced search operations require the ability to define rules and configurations about what data should be indexed, how it should be indexed, and how the database should handle this data on read/write. Redis handles this declaratively, giving developers explicit control over these components of the search pattern. RedisVL uses an `IndexSchema` to simplify this approach for users. The goal is not to replace the core search API, but to create an easier on-ramp for developers less familiar with Redis, e.g. AI-native developers.

This PR significantly overhauls the v0 implementation of the schema in RedisVL, with the additional goal of incorporating feedback from across the company, product, and users for a better long-term approach in the ecosystem.

Below is a quick example from the class docstring:

```python
from redisvl.schema import IndexSchema

# From YAML
schema = IndexSchema.from_yaml("schema.yaml")

# From Dict
schema = IndexSchema.from_dict({
    "index": {
        "name": "user-index",
        "prefix": "user",
        "storage_type": "json",
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "algorithm": "flat",
                "dims": 3,
                "distance_metric": "cosine",
                "datatype": "float32"
            }
        }
    ]
})
```

```
Note: The `fields` attribute in the schema must contain unique field names to ensure correct and unambiguous field references.
```

## `IndexSchema` Components

### Index Info

Each index has a set of attributes such as the index `name`, the chosen key `prefix`, and the underlying `storage_type`. RedisVL also introduces a `key_separator` setting, which lets you customize the character used to separate the prefix from the identifier in the Redis key.

```python
class IndexInfo(BaseModel):
    """
    Represents the basic configuration information for an index in Redis.

    This class includes the essential details required to define an index,
    such as its name, prefix, key separator, and storage type.
    """

    name: str
    """The unique name of the index."""
    prefix: str = "rvl"
    """The prefix used for Redis keys associated with this index."""
    key_separator: str = ":"
    """The separator character used in Redis keys."""
    storage_type: StorageType = StorageType.HASH
    """The storage type used in Redis (e.g., 'hash' or 'json')."""
```

Index info can be parsed from a YAML or dict-like representation:

```yaml
index:
  name: user-index-v1
  prefix: user
  key_separator: ':'
  storage_type: json
```

### Fields

Fields are a list of field definitions to include in the Redis index described by the index info above. A field has a `name`, a `type`, an optional `path` (JSON storage only), and optional per-field `attrs`:

```python
class BaseField(BaseModel):
    """Base field"""
    name: str
    """Field name"""
    type: str
    """Field type"""
    path: Optional[str] = None
    """Field path (within JSON object)"""
    attrs: Optional[Union[BaseFieldAttributes, BaseVectorFieldAttributes]] = None
    """Specified field attributes"""
```

Fields can be listed in either a dictionary or YAML representation like the following:

```yaml
fields:
  - name: user
    type: tag
    path: '$.user'
  - name: credit_score
    type: tag
    path: '$.credit_score'
  - name: embedding
    type: vector
    path: '$.embedding'
    attrs:
      algorithm: flat
      dims: 3
      distance_metric: cosine
      datatype: float32
```

### Version

The schema version is locked at `0.1.0`. The pydantic model for `IndexSchema` prevents users from mistyping the version or using a schema version that does not match this version of the library; it is a fixed value.

```yaml
version: '0.1.0'
```
1 parent c3e036b commit 8a55e35

33 files changed, +1311 -905 lines changed

docs/_static/js/sidebar.js

Lines changed: 3 additions & 2 deletions
@@ -4,15 +4,16 @@ const toc = [
 { title: "Install", path: "/overview/installation.html" },
 { title: "CLI", path: "/user_guide/cli.html" },
 ]},
-{ header: "User Guide", toc: [
+{ header: "User Guides", toc: [
 { title: "Getting Started", path: "/user_guide/getting_started_01.html" },
 { title: "Query and Filter", path: "/user_guide/hybrid_queries_02.html" },
+{ title: "JSON vs Hash Storage", path: "/user_guide/hash_vs_json_05.html" },
 { title: "Vectorizers", path: "/user_guide/vectorizers_04.html" },
 { title: "Semantic Caching", path: "/user_guide/llmcache_03.html" },
-{ title: "JSON Storage", path: "/user_guide/hash_vs_json_05.html" }
 
 ]},
 { header: "API", toc: [
+{ title: "Schema", path: "/api/indexschema.html"},
 { title: "Index", path: "/api/searchindex.html" },
 { title: "Query", path: "/api/query.html" },
 { title: "Filter", path: "/api/filter.html" },

docs/api/indexschema.rst

Lines changed: 23 additions & 7 deletions
@@ -1,6 +1,6 @@
 
 ***********
-IndexSchema
+Schema
 ***********
 
 IndexSchema
@@ -12,17 +12,14 @@ IndexSchema
 
 .. autosummary::
 
-IndexSchema.__init__
-IndexSchema.name
-IndexSchema.prefix
-IndexSchema.key_separator
-IndexSchema.storage_type
+IndexSchema.index
 IndexSchema.fields
+IndexSchema.version
+IndexSchema.field_names
 IndexSchema.redis_fields
 IndexSchema.add_field
 IndexSchema.add_fields
 IndexSchema.remove_field
-IndexSchema.generate_fields
 IndexSchema.from_yaml
 IndexSchema.to_yaml
 IndexSchema.from_dict
@@ -32,3 +29,22 @@ IndexSchema
 :show-inheritance:
 :inherited-members:
 :members:
+
+
+IndexInfo
+=========
+
+.. currentmodule:: redisvl.schema
+
+.. autosummary::
+
+IndexInfo.name
+IndexInfo.prefix
+IndexInfo.key_separator
+IndexInfo.storage_type
+
+
+.. autoclass:: IndexInfo
+:show-inheritance:
+:inherited-members:
+:members:

docs/api/searchindex.rst

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,6 @@ SearchIndex
 
 .. autosummary::
 
-SearchIndex.__init__
 SearchIndex.from_yaml
 SearchIndex.from_dict
 SearchIndex.client
@@ -32,6 +31,8 @@ SearchIndex
 SearchIndex.asearch
 SearchIndex.query
 SearchIndex.aquery
+SearchIndex.query_batch
+SearchIndex.aquery_batch
 SearchIndex.delete
 SearchIndex.adelete
 SearchIndex.info

docs/examples/openai_qna.ipynb

Lines changed: 19 additions & 20 deletions
@@ -81,15 +81,6 @@
 "%pip install pandas wget tenacity tiktoken openai==0.28.1"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": 2,
-"metadata": {},
-"outputs": [],
-"source": [
-"import redisvl"
-]
-},
 {
 "cell_type": "code",
 "execution_count": 3,
@@ -653,7 +644,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 2,
 "metadata": {},
 "outputs": [
 {
@@ -667,18 +658,22 @@
 "source": [
 "%%writefile wiki_schema.yaml\n",
 "\n",
+"version: '0.1.0'\n",
+"\n",
 "index:\n",
-" name: wiki\n",
-" prefix: oaiWiki\n",
+" name: wikipedia\n",
+" prefix: chunk\n",
 "\n",
 "fields:\n",
-" text:\n",
-" - name: content\n",
-" - name: title\n",
-" tag:\n",
-" - name: id\n",
-" vector:\n",
-" - name: embedding\n",
+" - name: content\n",
+" type: text\n",
+" - name: title\n",
+" type: text\n",
+" - name: id\n",
+" type: tag\n",
+" - name: embedding\n",
+" type: vector\n",
+" attrs:\n",
 " dims: 1536\n",
 " distance_metric: cosine\n",
 " algorithm: flat"
@@ -690,10 +685,14 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"import redis.asyncio as redis\n",
+"\n",
 "from redisvl.index import SearchIndex\n",
 "\n",
+"client = redis.Redis.from_url(\"redis://localhost:6379\")\n",
+"\n",
 "index = SearchIndex.from_yaml(\"wiki_schema.yaml\")\n",
-"index.connect(\"redis://localhost:6379\", use_async=True)\n",
+"index.set_client(client)\n",
 "\n",
 "await index.acreate()"
 ]
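The notebook diffs above move from the v0 layout, where fields were grouped under their type (`text:`, `tag:`, `vector:`), to the `0.1.0` layout: a flat list in which each field carries its own `type` and vector settings move under `attrs`. A minimal migration sketch follows; `migrate_v0_fields` is a hypothetical helper, not part of RedisVL, and it assumes every key other than `name` and `path` belongs under `attrs` in the new layout.

```python
from typing import Any, Dict, List


def migrate_v0_fields(v0_fields: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Convert a v0 type-grouped `fields` mapping into the 0.1.0 list layout.

    Hypothetical helper: assumes every key other than `name` and `path`
    belongs under `attrs`.
    """
    migrated: List[Dict[str, Any]] = []
    for field_type, definitions in v0_fields.items():
        for definition in definitions:
            # Everything that is not the name or path becomes a field attribute.
            extra = {k: v for k, v in definition.items() if k not in ("name", "path")}
            field: Dict[str, Any] = {"name": definition["name"], "type": field_type}
            if "path" in definition:
                field["path"] = definition["path"]
            if extra:
                field["attrs"] = extra
            migrated.append(field)
    return migrated


# The v0 wiki schema from the openai_qna notebook, expressed as a dict.
v0 = {
    "text": [{"name": "content"}, {"name": "title"}],
    "tag": [{"name": "id"}],
    "vector": [{"name": "embedding", "dims": 1536, "distance_metric": "cosine", "algorithm": "flat"}],
}
new_schema = {
    "version": "0.1.0",
    "index": {"name": "wikipedia", "prefix": "chunk"},
    "fields": migrate_v0_fields(v0),
}
print(new_schema["fields"][3])
# {'name': 'embedding', 'type': 'vector', 'attrs': {'dims': 1536, 'distance_metric': 'cosine', 'algorithm': 'flat'}}
```

Putting the type on each field, instead of grouping by type, is what lets the new schema enforce unique field names across all types in a single list.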

docs/user_guide/cli.ipynb

Lines changed: 42 additions & 22 deletions
@@ -24,7 +24,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:13:52\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m RedisVL version 0.0.5\n"
+"\u001b[32m20:09:02\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m RedisVL version 0.0.7\n"
 ]
 }
 ],
@@ -44,19 +44,22 @@
 "first, we will create an index from a yaml schema that looks like the following\n",
 "\n",
 "```yaml\n",
+"version: '0.1.0'\n",
+"\n",
 "index:\n",
-" name: providers\n",
-" prefix: rvl\n",
+" name: vectorizers\n",
+" prefix: doc\n",
 " storage_type: hash\n",
 "\n",
 "fields:\n",
-" text:\n",
-" - name: sentence\n",
-" vector:\n",
-" - name: embedding\n",
-" dims: 768\n",
-" algorithm: flat\n",
-" distance_metric: cosine\n",
+" - name: sentence\n",
+" type: text\n",
+" - name: embedding\n",
+" type: vector\n",
+" attrs:\n",
+" dims: 768\n",
+" algorithm: flat\n",
+" distance_metric: cosine\n",
 "```"
 ]
 },
@@ -69,7 +72,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:13:54\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index created successfully\n"
+"\u001b[32m20:09:04\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index created successfully\n"
 ]
 }
 ],
@@ -87,8 +90,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:13:56\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
-"\u001b[32m11:13:56\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. providers\n"
+"\u001b[32m20:09:05\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
+"\u001b[32m20:09:05\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. vectorizers\n"
 ]
 }
 ],
@@ -112,7 +115,7 @@
 "╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮\n",
 "│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │\n",
 "├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤\n",
-"providers │ HASH │ ['rvl'] │ [] │ 0 │\n",
+"vectorizers │ HASH │ ['doc'] │ [] │ 0 │\n",
 "╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯\n",
 "Index Fields:\n",
 "╭───────────┬─────────────┬────────┬────────────────┬────────────────╮\n",
@@ -126,7 +129,7 @@
 ],
 "source": [
 "# inspect the index fields\n",
-"!rvl index info -i providers"
+"!rvl index info -i vectorizers"
 ]
 },
 {
@@ -138,13 +141,13 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:13:59\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index deleted successfully\n"
+"\u001b[32m20:09:09\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index deleted successfully\n"
 ]
 }
 ],
 "source": [
 "# delete an index without deleting the data within it\n",
-"!rvl index delete -i providers"
+"!rvl index delete -i vectorizers"
 ]
 },
 {
@@ -156,7 +159,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:14:00\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n"
+"\u001b[32m20:09:11\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n"
 ]
 }
 ],
@@ -183,7 +186,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:14:02\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index created successfully\n"
+"\u001b[32m20:09:12\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index created successfully\n"
 ]
 }
 ],
@@ -201,8 +204,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u001b[32m11:14:03\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
-"\u001b[32m11:14:03\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. providers\n"
+"\u001b[32m20:09:14\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
+"\u001b[32m20:09:14\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. vectorizers\n"
 ]
 }
 ],
@@ -250,7 +253,24 @@
 ],
 "source": [
 "# see all the stats for the index\n",
-"!rvl stats -i providers"
+"!rvl stats -i vectorizers"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 11,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"\u001b[32m20:09:33\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Index deleted successfully\n"
+]
+}
+],
+"source": [
+"!rvl index destroy -i vectorizers"
 ]
 }
 ],
