|
1 | | -# RedisVL |
| 1 | +# RedisVL: Python Client Library for Redis as a Vector Database |
2 | 2 |
|
3 | | -> DISCLAIMER: This project is still under signifigant development and should not be used in any production settings. We would love input/contributions as we finalize what the CLI and library interfaces should look like. |
4 | 3 |
|
5 | | -A CLI and Library to help with loading data into Redis specifically for |
6 | | -usage with RediSearch and Redis Vector Search capabilities |
| 4 | +[Codecov](https://codecov.io/gh/RedisVentures/RedisVL) |
| 5 | +[License: MIT](https://opensource.org/licenses/mit/) |
7 | 6 |
|
8 | | -### Usage |
9 | 7 |
|
10 | | -``` |
11 | | -usage: redisvl <command> [<args>] |
| 8 | +RedisVL provides a powerful Python client library for using Redis as a Vector Database. Leverage the speed and reliability of Redis along with vector-based semantic search capabilities to supercharge your application! |
12 | 9 |
|
13 | | -Commands: |
14 | | - load Load vector data into redis |
15 | | - index Index manipulation (create, delete, etc.) |
16 | | - query Query an existing index |
| 10 | +**Note:** This project is rapidly evolving, and the API may change frequently. Always refer to the most recent [documentation](https://redisvl.com/docs). |
| 11 | +## 🚀 What is RedisVL? |
17 | 12 |
|
18 | | -Redis Vector load CLI |
| 13 | +Vector databases have become increasingly popular in recent years due to their ability to store and retrieve vectors efficiently. However, most vector databases are complex to use and require a lot of time and effort to set up. RedisVL aims to solve this problem by providing a simple and intuitive interface for using Redis as a vector database. |
19 | 14 |
|
20 | | -positional arguments: |
21 | | - command Subcommand to run |
| 15 | +RedisVL provides a client library that enables you to harness the power of Redis as a vector database. This library simplifies the process of storing, retrieving, and performing semantic searches on vectors in Redis. It also provides a robust index management system that allows you to create, update, and delete indices with ease. |
22 | 16 |
|
23 | | -optional arguments: |
24 | | - -h, --help show this help message and exit |
25 | 17 |
|
26 | | -``` |
| 18 | +### Capabilities |
| 19 | + |
| 20 | +RedisVL has a host of powerful features designed to streamline your vector database operations. |
| 21 | + |
| 22 | +1. **Index Management**: RedisVL allows for indices to be created, updated, and deleted with ease. A schema for each index can be defined in YAML or directly in Python code and used throughout the lifetime of the index. |
| 23 | + |
| 24 | +2. **Vector Creation**: RedisVL integrates with OpenAI and other embedding providers to make the process of creating vectors straightforward. |
| 25 | + |
| 26 | +3. **Vector Search**: RedisVL provides robust search capabilities that enable you to query vectors synchronously and asynchronously. Hybrid queries that utilize tag, geographic, numeric, and other filters like full-text search are also supported. |
| 27 | + |
| 28 | +4. **Semantic Caching**: ``LLMCache`` is a semantic caching interface built directly into RedisVL. It caches generated output from LLMs such as GPT-3. Because the cache is checked with semantic search, a threshold can be set to decide whether a cached result is relevant enough to return. If it is not, the model is called and the result is cached for future use. This can increase queries per second (QPS) and reduce the cost of using LLMs. |
| 29 | + |
| 30 | + |
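To make the hybrid query idea in the list above concrete, here is a minimal pure-Python sketch. It is not RedisVL code: the `hybrid_search` helper, the toy records, and the embedding values are illustrative assumptions. It only shows the concept of filtering on metadata and then ranking the remaining records by cosine distance.

```python
import math

# Toy records; field names follow the README's example dataset,
# the embedding values are made up for illustration.
USERS = [
    {"users": "john", "age": 1, "job": "engineer", "user_embedding": [0.1, 0.1, 0.5]},
    {"users": "mary", "age": 2, "job": "doctor", "user_embedding": [0.1, 0.1, 0.5]},
    {"users": "joe", "age": 3, "job": "dentist", "user_embedding": [0.9, 0.9, 0.1]},
]

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def hybrid_search(records, query_vector, k, job=None, min_age=None):
    """Filter on metadata first, then rank the survivors by cosine distance."""
    candidates = [
        r for r in records
        if (job is None or r["job"] == job)
        and (min_age is None or r["age"] >= min_age)
    ]
    candidates.sort(key=lambda r: cosine_distance(r["user_embedding"], query_vector))
    return candidates[:k]

# Numeric filter (age >= 2) combined with a vector ranking.
top = hybrid_search(USERS, [0.1, 0.1, 0.5], k=2, min_age=2)
print([r["users"] for r in top])
```

In Redis the filter and the vector ranking are evaluated by the search engine itself; the brute-force loop here is only meant to show what the combined query expresses.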
| 31 | +## 😊 Quick Start |
27 | 32 |
|
28 | | -For any of the above commands, you will need to have an index schema written |
29 | | -into a yaml file for the cli to read. The format of the schema is as follows |
| 33 | +This library is still under heavy development. You can try RedisVL quickly and deploy it in a production environment, but the API may change at any time. |
| 34 | + |
| 35 | +`pip install redisvl` |
| 36 | + |
| 37 | +## Example Usage |
| 38 | + |
| 39 | +### Index Management |
| 40 | + |
| 41 | +Indices can be defined through a YAML specification that corresponds directly to the RediSearch field names and arguments in redis-py: |
30 | 42 |
|
31 | 43 | ```yaml |
32 | 44 | index: |
33 | | -  name: sample # index name used for querying |
| 45 | +  name: users |
34 | 46 |   storage_type: hash |
35 | | -  key_field: "id" # column name to use for key in redis |
36 | | -  prefix: vector # prefix used for all loaded docs |
| 47 | +  prefix: "user:" |
| 48 | +  key_field: "id" |
37 | 49 |
|
38 | | -# all fields to create index with |
39 | | -# sub-items correspond to redis-py Field arguments |
40 | 50 | fields: |
| 51 | +  # define tag fields |
41 | 52 |   tag: |
42 | | -    categories: # name of a tag field used for queries |
43 | | -      separator: "|" |
44 | | -    year: # name of a tag field used for queries |
45 | | -      separator: "|" |
| 53 | +    - name: users |
| 54 | +    - name: job |
| 55 | +    - name: credit_score |
| 56 | +  # define numeric fields |
| 57 | +  numeric: |
| 58 | +    - name: age |
| 59 | +  # define vector fields |
46 | 60 |   vector: |
47 | | -    vector: # name of the vector field used for queries |
48 | | -      datatype: "float32" |
49 | | -      algorithm: "flat" # flat or HSNW |
50 | | -      dims: 768 |
51 | | -      distance_metric: "cosine" # ip, L2, cosine |
| 61 | +    - name: user_embedding |
| 62 | +      algorithm: hnsw |
| 63 | +      distance_metric: cosine |
52 | 64 | ``` |
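The capabilities list notes that a schema can also be defined directly in Python code. A minimal sketch of what that might look like, assuming a plain dict that mirrors the YAML layout key for key and uses the field names from the example dataset. The dict layout and the helpers `field_names` and `check_columns` are illustrative assumptions, not confirmed RedisVL API.

```python
# Hypothetical dict form of the schema; the key layout mirrors the YAML
# one-to-one and is not confirmed as RedisVL's in-code schema format.
schema = {
    "index": {
        "name": "users",
        "storage_type": "hash",
        "prefix": "user:",
        "key_field": "id",
    },
    "fields": {
        "tag": [{"name": "users"}, {"name": "job"}, {"name": "credit_score"}],
        "numeric": [{"name": "age"}],
        "vector": [
            {"name": "user_embedding", "algorithm": "hnsw", "distance_metric": "cosine"}
        ],
    },
}

def field_names(schema):
    """Return every declared field name, grouped by field type."""
    return {ftype: [f["name"] for f in fields] for ftype, fields in schema["fields"].items()}

def check_columns(schema, columns):
    """List schema fields that have no matching column in a dataset."""
    declared = [name for names in field_names(schema).values() for name in names]
    return [name for name in declared if name not in columns]

missing = check_columns(schema, ["users", "age", "job", "credit_score", "user_embedding"])
print(missing)  # an empty list means every schema field has a matching column
```

A check like this, run before loading, catches a field name in the schema that does not match any column in the data.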
53 | 65 |
|
54 | | -#### Example Usage |
55 | | -These examples reference [provided sample data](sample-data/). |
| 66 | +This schema corresponds to a dataset that looks something like: |
56 | 67 |
|
57 | | -```bash |
58 | | -# load in a pickled dataframe with |
59 | | -redisvl load -s sample-data/sample.yml -d sample-data/pandas-sample.pkl |
60 | | -``` |
| 68 | +| users | age | job | credit_score | user_embedding | |
| 69 | +|-------|-----|------------|--------------|-----------------------------------| |
| 70 | +| john | 1 | engineer | high | \x3f\x8c\xcc\x3f\x8c\xcc?@ | |
| 71 | +| mary | 2 | doctor | low | \x3f\x8c\xcc\x3f\x8c\xcc?@ | |
| 72 | +| joe | 3 | dentist | medium | \x3f\xab\xcc?\xab\xcc?@ | |
61 | 73 |
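The `user_embedding` column holds raw bytes rather than a list of numbers. As a rough sketch of how such a value can be produced and read back, the standard-library `struct` module can pack floats into 4-byte float32 values. The helper names and the little-endian byte order are assumptions made for illustration; the README does not state the datatype or byte order the index expects.

```python
import struct

def floats_to_bytes(vector):
    """Pack a list of floats into little-endian float32 bytes (assumed layout)."""
    return struct.pack(f"<{len(vector)}f", *vector)

def bytes_to_floats(blob):
    """Unpack float32 bytes back into a list of Python floats."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))

blob = floats_to_bytes([0.1, 0.1, 0.5])
print(len(blob))              # 3 floats * 4 bytes each
print(bytes_to_floats(blob))  # values come back rounded to float32 precision
```

The same care applies when using NumPy: arrays created from Python floats default to float64, so `tobytes()` yields 8 bytes per value unless a float32 dtype is requested explicitly.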
|
62 | | -```bash |
63 | | -# load in a pickled dataframe to a specific address and port |
64 | | -redisvl load -s sample-data/sample.yml -d sample-data/pandas-sample.pkl -h 127.0.0.1 -p 6379 |
65 | | -``` |
66 | 74 |
|
67 | | -```bash |
68 | | -# load in a pickled dataframe to a specific |
69 | | -# address and port and with password |
70 | | -redisvl load -s sample-data/sample.yml -d sample-data/pandas-sample.pkl -h 127.0.0.1 -p 6379 -p supersecret |
71 | | -``` |
| 75 | +With the schema, the RedisVL library can be used to create the index, load vectors, and perform vector searches: |
| 76 | +```python |
| | +import numpy as np |
| | +import pandas as pd |
| 77 | +from redisvl.index import SearchIndex |
| 78 | +from redisvl.query import create_vector_query |
72 | 79 |
|
73 | | -### Support |
| 80 | +# define and create the index |
| 81 | +index = SearchIndex.from_yaml("./users_schema.yml") |
| 82 | +index.connect("redis://localhost:6379") |
| 83 | +index.create() |
74 | 84 |
|
75 | | -#### Supported Index Fields |
| 85 | +index.load(pd.read_csv("./users.csv").to_records()) |
76 | 86 |
|
77 | | - - ``geo`` |
78 | | - - ``tag`` |
79 | | - - ``numeric`` |
80 | | - - ``vector`` |
81 | | - - ``text`` |
82 | | -#### Supported Data Types |
83 | | - - Pandas DataFrame (pickled) |
84 | | -#### Supported Redis Data Types |
85 | | - - Hash |
86 | | - - JSON (soon) |
| 87 | +query = create_vector_query( |
| 88 | +    ["users", "age", "job", "credit_score"], |
| 89 | +    number_of_results=2, |
| 90 | +    vector_field_name="user_embedding", |
| 91 | +) |
87 | 92 |
|
88 | | -### Install |
89 | | -Install the Python requirements listed in `requirements.txt`. |
| 93 | +query_vector = np.array([0.1, 0.1, 0.5]).tobytes() |
| 94 | +results = index.search(query, query_params={"vector": query_vector}) |
| 95 | + |
| 96 | +``` |
90 | 97 |
|
91 | | -```bash |
92 | | -git clone https://github.com/RedisVentures/data-loader.git |
93 | | -cd redisvl |
94 | | -pip install . |
| 98 | +### Semantic Cache |
| 99 | +
|
| 100 | +The ``LLMCache`` interface in RedisVL, shown here through the ``SemanticCache`` class, can be used as follows. |
| 101 | +
|
| 102 | +```python |
| 103 | +# init open ai client |
| 104 | +import openai |
| 105 | +openai.api_key = "sk-xxx" |
| 106 | + |
| 107 | +from redisvl.llmcache.semantic import SemanticCache |
| 108 | +cache = SemanticCache(redis_host="localhost", redis_port=6379, redis_password=None) |
| 109 | + |
| 110 | +def ask_gpt3(question): |
| 111 | +    response = openai.Completion.create( |
| 112 | +        engine="text-davinci-003", |
| 113 | +        prompt=question, |
| 114 | +        max_tokens=100 |
| 115 | +    ) |
| 116 | +    return response.choices[0].text.strip() |
| 117 | + |
| 118 | +def answer_question(question: str): |
| 119 | +    results = cache.check(question) |
| 120 | +    if results: |
| 121 | +        return results[0] |
| 122 | +    else: |
| 123 | +        answer = ask_gpt3(question) |
| 124 | +        cache.store(question, answer) |
| 125 | +        return answer |
95 | 126 | ``` |
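To illustrate how a similarity threshold decides between a cache hit and a model call, here is a small in-memory sketch. It is not RedisVL's implementation: `ToySemanticCache` is a hypothetical class that borrows only the `check` and `store` method names from the example above, and `embed` is a toy letter-frequency stand-in for a real embedding model.

```python
import math

def embed(text):
    """Toy embedding: a 26-dimensional letter-frequency vector (stand-in for a real model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToySemanticCache:
    """Hypothetical in-memory cache; not RedisVL's SemanticCache implementation."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (prompt embedding, cached response) pairs

    def check(self, prompt):
        """Return cached responses whose stored prompts are similar enough."""
        query = embed(prompt)
        return [resp for vec, resp in self.entries
                if cosine_similarity(vec, query) >= self.threshold]

    def store(self, prompt, response):
        """Remember a response under the embedding of its prompt."""
        self.entries.append((embed(prompt), response))

cache = ToySemanticCache(threshold=0.9)
cache.store("What is the capital of France?", "Paris")
print(cache.check("what is the capital of france"))   # near-identical prompt
print(cache.check("How do I bake sourdough bread?"))  # unrelated prompt
```

Raising the threshold makes hits rarer but more likely to be relevant; lowering it saves more model calls at the risk of returning an answer to a different question.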
96 | 127 |
|
97 | | -### Creating Input Data |
98 | | -#### Pandas DataFrame |
99 | 128 |
|
100 | | - more to come, see tests and sample-data for usage |
|