|
9 | 9 | "\n", |
10 | 10 | "Hybrid search is all about combining lexical search with semantic vector search to improve result relevancy. This notebook will cover 3 different hybrid search strategies with Redis:\n", |
11 | 11 | "\n", |
12 | | - "1. Linear combination of scores from lexical search (BM25) and vector search (Cosine Distance) with the HybridQuery class\n", |
| 12 | + "1. Linear combination of scores from lexical search (BM25) and vector search (Cosine Distance) with the AggregateHybridQuery class\n", |
13 | 13 | "2. Client-Side Reciprocal Rank Fusion (RRF)\n", |
14 | 14 | "3. Client-Side Reranking with a cross encoder model\n", |
15 | 15 | "\n", |
|
32 | 32 | "metadata": {}, |
33 | 33 | "outputs": [], |
34 | 34 | "source": [ |
35 | | - "%pip install sentence-transformers pandas nltk \"redisvl>=0.6.0\"" |
| 35 | + "%pip install sentence-transformers pandas nltk \"redisvl>=0.11.0\"" |
36 | 36 | ] |
37 | 37 | }, |
38 | 38 | { |
|
653 | 653 | "\n", |
654 | 654 | "Now that our search index is populated and ready, we will build out a few different hybrid search techniques in Redis.\n", |
655 | 655 | "\n", |
656 | | - "To start, we will use our `HybridQuery` class that accepts a text string and vector to automatically combine text similarity and vector similarity scores." |
| 656 | + "To start, we will use our `AggregateHybridQuery` class that accepts a text string and vector to automatically combine text similarity and vector similarity scores." |
657 | 657 | ] |
658 | 658 | }, |
659 | 659 | { |
660 | 660 | "cell_type": "markdown", |
661 | 661 | "metadata": {}, |
662 | 662 | "source": [ |
663 | | - "## 1. Linear Combination using HybridQuery\n", |
| 663 | + "## 1. Linear Combination using AggregateHybridQuery\n", |
664 | 664 | "\n", |
665 | 665 | "The goal of this technique is to calculate a weighted sum of the text similarity score for our provided text search and the cosine distance between vectors calculated via a KNN vector query. Under the hood this is possible in Redis using the [aggregations API](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/aggregations/), as of `Redis 7.4.x` (search version `2.10.5`), within a single database call.\n", |
666 | 666 | "\n", |
667 | | - "As of RedisVl 0.5.0 all of this is nicely encapsulated in your `HybridQuery` class, which behaves much like our other query classes." |
| 667 | + "As of RedisVL 0.5.0 all of this is nicely encapsulated in our `AggregateHybridQuery` class, which behaves much like our other query classes."
668 | 668 | ] |
669 | 669 | }, |
670 | 670 | { |
|
681 | 681 | "cell_type": "markdown", |
682 | 682 | "metadata": {}, |
683 | 683 | "source": [ |
684 | | - "First, we will import our `HybridQuery` and understand its parameters.\n", |
685 | | - "At a minimum, the `HybridQuery` needs 4 arguments:\n", |
| 684 | + "First, we will import our `AggregateHybridQuery` and understand its parameters.\n", |
| 685 | + "At a minimum, the `AggregateHybridQuery` needs 4 arguments:\n", |
686 | 686 | "```python\n", |
687 | | - "query = HybridQuery(\n", |
| 687 | + "query = AggregateHybridQuery(\n", |
688 | 688 | " text = \"your query string here\",\n", |
689 | 689 | " text_field_name = \"<name of the text field in the index to do text search in>\",\n", |
690 | 690 | " vector = <bytes or numeric array, ex: [0.1, 0.2, 0.3]>,\n", |
|
738 | 738 | } |
739 | 739 | ], |
740 | 740 | "source": [ |
741 | | - "from redisvl.query import HybridQuery\n", |
| 741 | + "from redisvl.query import AggregateHybridQuery\n", |
742 | 742 | "\n", |
743 | 743 | "vector = model.embed(user_query, as_buffer=True)\n", |
744 | 744 | "\n", |
745 | | - "query = HybridQuery(\n", |
| 745 | + "query = AggregateHybridQuery(\n", |
746 | 746 | " text=user_query,\n", |
747 | 747 | " text_field_name=\"description\",\n", |
748 | 748 | " vector=vector,\n", |
|
760 | 760 | "metadata": {}, |
761 | 761 | "source": [ |
762 | 762 | "That's it! That is all it takes to perform a hybrid text matching and vector query with RedisVL.\n", |
763 | | - "Of course there are many more configurations and things we can do with the `HybridQuery` class. Let's investigate.\n", |
| 763 | + "Of course there are many more configurations and things we can do with the `AggregateHybridQuery` class. Let's investigate.\n", |
764 | 764 | "\n", |
765 | 765 | "First, let's look at just the text query part that is being run:" |
766 | 766 | ] |
|
828 | 828 | "# translate our user query to French and use nltk french stopwords\n", |
829 | 829 | "french_query_text = \"Film d'action et d'aventure avec de superbes scènes de combat, des enquêtes criminelles, des super-héros et de la magie\"\n", |
830 | 830 | "\n", |
831 | | - "french_film_query = HybridQuery(\n", |
| 831 | + "french_film_query = AggregateHybridQuery(\n", |
832 | 832 | " text=french_query_text,\n", |
833 | 833 | " text_field_name=\"description\",\n", |
834 | 834 | " vector=model.embed(french_query_text, as_buffer=True),\n", |
|
845 | 845 | " \"then\", \"there\", \"these\", \"they\", \"this\", \"to\", \"was\", \"will\", \"with\"\n", |
846 | 846 | "])\n", |
847 | 847 | "\n", |
848 | | - "stopwords_query = HybridQuery(\n", |
| 848 | + "stopwords_query = AggregateHybridQuery(\n", |
849 | 849 | " text=user_query,\n", |
850 | 850 | " text_field_name=\"description\",\n", |
851 | 851 | " vector=vector,\n", |
|
856 | 856 | "print(stopwords_query._build_query_string())\n", |
857 | 857 | "\n", |
858 | 858 | "# don't use any stopwords\n", |
859 | | - "no_stopwords_query = HybridQuery(\n", |
| 859 | + "no_stopwords_query = AggregateHybridQuery(\n", |
860 | 860 | " text=user_query,\n", |
861 | 861 | " text_field_name=\"description\",\n", |
862 | 862 | " vector=vector,\n", |
|
919 | 919 | } |
920 | 920 | ], |
921 | 921 | "source": [ |
922 | | - "tfidf_query = HybridQuery(\n", |
| 922 | + "tfidf_query = AggregateHybridQuery(\n", |
923 | 923 | " text=user_query,\n", |
924 | 924 | " text_field_name=\"description\",\n", |
925 | 925 | " vector=vector,\n", |
|
1333 | 1333 | "source": [ |
1334 | 1334 | "def hybrid_query(text, alpha, num_results) -> List[Dict[str, Any]]:\n", |
1335 | 1335 | "\n", |
1336 | | - " query = HybridQuery(\n", |
| 1336 | + " query = AggregateHybridQuery(\n", |
1337 | 1337 | " text,\n", |
1338 | 1338 | " text_field_name=\"description\",\n", |
1339 | 1339 | " vector=model.embed(text, as_buffer=True),\n", |
|
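For readers of this diff, the weighted sum described in the "Linear Combination" cell can be sketched in plain Python. This is a minimal illustration, not the RedisVL implementation: the function name `linear_combination`, the default `alpha`, and the conversion of cosine distance to similarity as `(2 - distance) / 2` are assumptions made for the example. In the notebook itself the scoring runs inside Redis via the aggregations API.

```python
from typing import Dict, List, Tuple


def linear_combination(
    text_scores: Dict[str, float],
    cosine_distances: Dict[str, float],
    alpha: float = 0.7,
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * vector_similarity + (1 - alpha) * text_score.

    Hypothetical helper for illustration only; only documents present in
    both inputs are scored.
    """
    ranked = []
    for doc_id in text_scores.keys() & cosine_distances.keys():
        # Assumption: cosine distance lies in [0, 2]; map it to a similarity in [0, 1].
        vector_similarity = (2 - cosine_distances[doc_id]) / 2
        hybrid_score = alpha * vector_similarity + (1 - alpha) * text_scores[doc_id]
        ranked.append((doc_id, hybrid_score))
    # Highest combined score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With `alpha` close to 1 the ranking follows vector similarity; with `alpha` close to 0 it follows the text score. Text scores such as BM25 are not bounded to [0, 1], so in practice the two terms may need to be brought onto comparable scales before weighting.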