Adds LangCacheSemanticCache notebook example #116
Conversation
Pull Request Overview
This PR adds a comprehensive Jupyter notebook tutorial demonstrating semantic caching using LangCache, a managed Redis Cloud service accessed through the RedisVL library. The notebook walks through setting up LangCache, generating FAQs from PDF documents using the Doc2Cache technique, and integrating semantic caching into a RAG pipeline.
Key Changes:
- End-to-end tutorial for LangCache semantic caching with Redis Cloud
- Demonstrates FAQ generation from PDF documents using OpenAI's LLM
- Shows cache population, retrieval testing, and RAG pipeline integration
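For readers skimming the review, the cache-then-RAG control flow the notebook demonstrates can be sketched generically. Everything below is a hypothetical stand-in: a dict-backed exact-match store replaces the semantic lookup, and a stub replaces the OpenAI call; the real notebook goes through LangCacheSemanticCache via RedisVL.

```python
# Hypothetical sketch of the cache-wrapped RAG flow the notebook builds.
# A dict-backed exact-match store stands in for the semantic cache.
_store = {}

def cache_check(question):
    # Real code would call something like cache.check(prompt=question).
    return _store.get(question)

def cache_store(question, answer):
    _store[question] = answer

def answer_with_llm(question):
    # Stub replacing the OpenAI call.
    return f"LLM answer for: {question}"

def rag_with_cache(question):
    cached = cache_check(question)
    if cached is not None:
        return cached, True   # cache hit: skip the LLM entirely
    answer = answer_with_llm(question)
    cache_store(question, answer)
    return answer, False      # cache miss: answer with the LLM, then cache

answer1, hit1 = rag_with_cache("What is NVIDIA's revenue?")
answer2, hit2 = rag_with_cache("What is NVIDIA's revenue?")
print(hit1, hit2)  # → False True
```

The second call returns the cached answer without touching the LLM, which is the latency and cost win the notebook measures.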
" print(f\"Query: {query}\")\n",
" print(f\" Similarity: {similarity:.4f}\")\n",
" print(f\" Distance: {distance:.4f}\")\n",
" print(f\" Would cache hit (threshold={cache.distance_threshold:.4f})? {distance < cache.distance_threshold}\\n\")\n"
Copilot AI (Nov 18, 2025)
This code references cache.distance_threshold which may not be an attribute of LangCacheSemanticCache. This is likely to cause an AttributeError when executed. The threshold configuration for LangCache might be managed differently (e.g., server-side configuration).
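If the notebook keeps the cell, a defensive variant could probe for the attribute rather than assume it exists. A minimal sketch, assuming nothing about the real class (`FakeCache` below is a stand-in for a LangCacheSemanticCache instance with no local threshold):

```python
# Minimal sketch: guard against a missing distance_threshold attribute.
# FakeCache stands in for a LangCacheSemanticCache instance whose
# threshold may live in server-side configuration only.
class FakeCache:
    pass

cache = FakeCache()

threshold = getattr(cache, "distance_threshold", None)
if threshold is None:
    print("Distance threshold is managed server-side; skipping local hit check.")
else:
    print(f"Would cache hit when distance < {threshold:.4f}")
```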
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show information about the langcache-embed model\n",
"print(\"LangCache-Embed Model Information:\")\n",
"print(\"=\"*60)\n",
"print(f\"Model: redis/langcache-embed-v1\")\n",
"print(f\"Purpose: Optimized for semantic caching tasks\")\n",
"print(f\"Dimension: 768\")\n",
"print(f\"Distance Metric: cosine\")\n",
"print(\"\\nKey Features:\")\n",
"print(\" • Trained specifically on query-response pairs\")\n",
"print(\" • Balanced precision/recall for optimal cache hit rates\")\n",
"print(\" • Fast inference time suitable for production\")\n",
"print(\" • Optimized for short-form text (queries/prompts)\")\n",
"print(\"\\nAdvantages for Caching:\")\n",
"print(\" • Better semantic understanding of query intent\")\n",
"print(\" • More robust to paraphrasing and rewording\")\n",
"print(\" • Lower false positive rate compared to general embeddings\")\n",
"print(\" • Optimized threshold ranges for cache decisions\")\n"
]
Copilot AI (Nov 18, 2025)
The section title mentions "LangCache-Embed Model Deep Dive" and discusses the redis/langcache-embed-v1 model, but the code cells that attempt to demonstrate the model (lines 1188-1204) have bugs and will not execute successfully. This entire section (starting from line 1147) appears to be aspirational content that doesn't match the actual capabilities of the LangCache API shown in the notebook. Consider either removing this section or clearly marking it as informational only (not executable).
}
],
"source": [
"# Check your cache is workign\n",
Copilot AI (Nov 18, 2025)
Typo in comment: "workign" should be "working"
Suggested change:
- "# Check your cache is workign\n",
+ "# Check your cache is working\n",
"print(\"\\nOptimizing threshold...\\n\")\n",
"\n",
"optimizer = CacheThresholdOptimizer(\n",
" cache,\n",
" optimization_test_data,\n",
" eval_metric=\"f1\" # Can also use \"precision\" or \"recall\"\n",
")\n",
"\n",
"results = optimizer.optimize()\n",
"\n",
"print(f\"\\nOptimization complete!\")\n",
"if results and 'f1' in results:\n",
" print(f\" F1 Score: {results['f1']:.4f}\")\n"
Copilot AI (Nov 18, 2025)
The notebook contains an error cell (lines 1086-1097) showing that LangCacheSemanticCache object has no attribute _vectorizer. This indicates that the CacheThresholdOptimizer is incompatible with LangCacheSemanticCache. The optimizer appears to expect a different cache interface that has a _vectorizer attribute. This section should either be fixed to work with LangCacheSemanticCache or removed/commented out to avoid confusing users.
Suggested change:
- "print(\"\\nOptimizing threshold...\\n\")\n",
- "\n",
- "optimizer = CacheThresholdOptimizer(\n",
- " cache,\n",
- " optimization_test_data,\n",
- " eval_metric=\"f1\" # Can also use \"precision\" or \"recall\"\n",
- ")\n",
- "\n",
- "results = optimizer.optimize()\n",
- "\n",
- "print(f\"\\nOptimization complete!\")\n",
- "if results and 'f1' in results:\n",
- " print(f\" F1 Score: {results['f1']:.4f}\")\n"
+ "# The following code is commented out because CacheThresholdOptimizer is incompatible with LangCacheSemanticCache.\n",
+ "# print(\"\\nOptimizing threshold...\\n\")\n",
+ "# optimizer = CacheThresholdOptimizer(\n",
+ "#     cache,\n",
+ "#     optimization_test_data,\n",
+ "#     eval_metric=\"f1\"  # Can also use \"precision\" or \"recall\"\n",
+ "# )\n",
+ "# \n",
+ "# results = optimizer.optimize()\n",
+ "# \n",
+ "# print(f\"\\nOptimization complete!\")\n",
+ "# if results and 'f1' in results:\n",
+ "#     print(f\" F1 Score: {results['f1']:.4f}\")\n"
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare semantic similarities between related and unrelated questions\n",
"test_questions = [\n",
" \"What is NVIDIA's revenue?\",\n",
" \"Tell me about NVIDIA's earnings\", # Semantically similar\n",
" \"How much money does NVIDIA make?\", # Semantically similar\n",
" \"What is the weather today?\", # Unrelated\n",
"]\n",
"\n",
"print(\"\\nComparing semantic similarities:\\n\")\n",
"print(f\"Base question: {test_questions[0]}\\n\")\n",
"\n",
"base_embedding = vectorizer.embed(test_questions[0])\n",
"\n",
"import numpy as np\n",
"\n",
"for query in test_questions[1:]:\n",
" query_embedding = vectorizer.embed(query)\n",
" \n",
" # Calculate cosine similarity\n",
" similarity = np.dot(base_embedding, query_embedding) / (\n",
" np.linalg.norm(base_embedding) * np.linalg.norm(query_embedding)\n",
" )\n",
" distance = 1 - similarity\n",
" \n",
" print(f\"Query: {query}\")\n",
" print(f\" Similarity: {similarity:.4f}\")\n",
" print(f\" Distance: {distance:.4f}\")\n",
" print(f\" Would cache hit (threshold={cache.distance_threshold:.4f})? {distance < cache.distance_threshold}\\n\")\n"
Copilot AI (Nov 18, 2025)
The code references an undefined vectorizer variable. This variable is never imported or initialized in the notebook, which will cause a NameError when this cell is executed. The langcache-embed model is managed by the LangCache service, so local embedding operations may not be possible. This section should either define the vectorizer or be removed if it's not applicable to LangCache.
Suggested change:
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Compare semantic similarities between related and unrelated questions\n",
- "test_questions = [\n",
- " \"What is NVIDIA's revenue?\",\n",
- " \"Tell me about NVIDIA's earnings\", # Semantically similar\n",
- " \"How much money does NVIDIA make?\", # Semantically similar\n",
- " \"What is the weather today?\", # Unrelated\n",
- "]\n",
- "\n",
- "print(\"\\nComparing semantic similarities:\\n\")\n",
- "print(f\"Base question: {test_questions[0]}\\n\")\n",
- "\n",
- "base_embedding = vectorizer.embed(test_questions[0])\n",
- "\n",
- "import numpy as np\n",
- "\n",
- "for query in test_questions[1:]:\n",
- " query_embedding = vectorizer.embed(query)\n",
- " \n",
- " # Calculate cosine similarity\n",
- " similarity = np.dot(base_embedding, query_embedding) / (\n",
- " np.linalg.norm(base_embedding) * np.linalg.norm(query_embedding)\n",
- " )\n",
- " distance = 1 - similarity\n",
- " \n",
- " print(f\"Query: {query}\")\n",
- " print(f\" Similarity: {similarity:.4f}\")\n",
- " print(f\" Distance: {distance:.4f}\")\n",
- " print(f\" Would cache hit (threshold={cache.distance_threshold:.4f})? {distance < cache.distance_threshold}\\n\")\n"
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Semantic Similarity Comparison (Omitted)\n",
+ "\n",
+ "The previous section attempted to compare semantic similarities using a local `vectorizer` object, which is not defined in this notebook. Since LangCache manages embeddings via its service, local embedding operations are not applicable. To compare semantic similarities, use the embeddings and API provided by LangCache instead."
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.
" {\"query\": \"What's the location of the manufacturing plant for NVIDIA?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What date does NVIDIA use as it's year end for acounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",
" {\"query\": \"Where are the locations of each office of NVIDIA Corporation?\", \"expected_match\": False, \"category\": \"off-topic\"},\n",
" {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
Copilot AI (Nov 20, 2025)
Spelling error: "symbold" should be "symbol".
Suggested change:
- " {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
+ " {\"query\": \"What is the trading symbol of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
" MISS MATCHED: Where is the principal executive office of NVIDIA Corporation located?...\n",
"10:19:34 httpx INFO HTTP Request: POST https://aws-us-east-1.langcache.redis.io/v1/caches/50eb6a09acf5415d8b68619b1ccffd9a/entries/search \"HTTP/1.1 200 OK\"\n",
"14. Cache HIT (distance: 0.1295)\n",
" Original query: What is the trading symbold of NVIDIA Corporation on the Japan exchange?\n",
Copilot AI (Nov 20, 2025)
Spelling error: "symbold" should be "symbol".
Suggested change:
- " Original query: What is the trading symbold of NVIDIA Corporation on the Japan exchange?\n",
+ " Original query: What is the trading symbol of NVIDIA Corporation on the Japan exchange?\n",
"outputs": [],
"source": [
"if \"OPENAI_API_KEY\" not in os.environ:\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n",
Copilot AI (Nov 20, 2025)
Potential bug: The function uses getpass but imports it as a function call without parentheses. The correct usage should be getpass.getpass("Enter your OpenAI API key: ") with the module imported, or use from getpass import getpass to import the function directly.
Suggested change:
- " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n",
+ " os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")\n",
"cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
"cache_hit_rate = cache_hit_count / len(similar_questions)\n",
"print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"
Copilot AI (Nov 20, 2025)
Performance issue: The cache hit rate calculation is incorrect and inefficient. The calculation sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1) uses an arbitrary threshold (0.1 seconds) to determine cache hits instead of tracking actual cache hit status. This doesn't accurately reflect cache performance since cache lookup times can vary. Consider tracking the actual cache_hit boolean values returned by rag_with_cache() function instead.
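One hedged sketch of that fix, assuming `rag_with_cache()` returns an `(answer, cache_hit, response_time)` tuple as its docstring elsewhere in the notebook states (the stub below is hypothetical):

```python
# Hypothetical stub for the notebook's rag_with_cache(), which returns
# (answer, cache_hit, response_time).
def rag_with_cache(question):
    return f"cached answer to: {question}", True, 0.05

similar_questions = [
    "Tell me about NVIDIA's earnings",
    "How much money does NVIDIA make?",
]

# Count the actual cache_hit booleans instead of inferring hits from
# an arbitrary 0.1-second latency threshold.
results = [rag_with_cache(q) for q in similar_questions]
cache_hit_count = sum(1 for _, hit, _ in results if hit)
cache_hit_rate = cache_hit_count / len(similar_questions)
print(f" Cache hit rate: {cache_hit_rate * 100:.0f}%")  # → " Cache hit rate: 100%"
```

This stays correct even when a cache hit happens to be slow (for example, under network jitter to the LangCache endpoint).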
" \n",
" chunks: list of document chunks\n",
" max_chunks: maximum number of chunks to process\n",
" \n",
" Returns: A list of question-answer pairs\n",
Copilot AI (Nov 20, 2025)
Documentation issue: The docstring is missing proper formatting and parameter descriptions. The docstring should follow standard Python conventions with proper parameter and return type documentation. For example:
"""Extract FAQs from document chunks using LLM.

Args:
    chunks: List of document chunks to process
    max_chunks: Maximum number of chunks to process (default: 25)

Returns:
    List[Dict]: A list of question-answer pairs with category metadata
"""

Suggested change:
- " \n",
- " chunks: list of document chunks\n",
- " max_chunks: maximum number of chunks to process\n",
- " \n",
- " Returns: A list of question-answer pairs\n",
+ " \n",
+ " Args:\n",
+ "     chunks (List[Any]): List of document chunks to process.\n",
+ "     max_chunks (int, optional): Maximum number of chunks to process. Defaults to 25.\n",
+ " \n",
+ " Returns:\n",
+ "     List[Dict]: A list of question-answer pairs with category metadata.\n",
" \"\"\"\n",
" Process a question through RAG pipeline with optional semantic caching.\n",
"\n",
" Returns: A tuple of (answer, cache_hit, response_time)\n",
Copilot AI (Nov 20, 2025)
Documentation issue: The docstring formatting is inconsistent with Python conventions. It should use proper Args/Returns sections:
"""
Process a question through RAG pipeline with optional semantic caching.

Args:
    question: The user's question to process
    use_cache: Whether to use semantic caching (default: True)

Returns:
    tuple: A tuple of (answer, cache_hit, response_time)
"""

Suggested change:
- " Returns: A tuple of (answer, cache_hit, response_time)\n",
+ " Args:\n",
+ "     question (str): The user's question to process.\n",
+ "     use_cache (bool, optional): Whether to use semantic caching. Defaults to True.\n",
+ "\n",
+ " Returns:\n",
+ "     tuple: A tuple of (answer, cache_hit, response_time)\n",
"print(langcache_api_key)\n",
"print(langcache_id)\n",
"print(server_url)\n",
"\n",
Copilot AI (Nov 20, 2025)
Security issue: The API key and cache ID are being printed to the output, which exposes sensitive credentials. These print statements should be removed to prevent accidental credential leakage in shared notebooks or logs.
Suggested change:
- "print(langcache_api_key)\n",
- "print(langcache_id)\n",
- "print(server_url)\n",
- "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
"edge_cases = [\n",
" {\"query\": \"What's the fiscal year end for Microsoft Corporation?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What's the location of the manufacturing plant for NVIDIA?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What date does NVIDIA use as it's year end for acounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",
Copilot AI (Nov 20, 2025)
Spelling error: "acounting" should be "accounting".
Suggested change:
- " {\"query\": \"What date does NVIDIA use as it's year end for acounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",
+ " {\"query\": \"What date does NVIDIA use as it's year end for accounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.
" {\"query\": \"What is the company total revenue for the last 5 years?\", \"expected_match\": False, \"category\": \"financial\"},\n",
" {\"query\": \"What's the location of the manufacturing plant for NVIDIA?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"Where are the locations of each office of NVIDIA Corporation?\", \"expected_match\": False, \"category\": \"off-topic\"},\n",
" {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
Copilot AI (Nov 20, 2025)
Spelling error: "symbold" should be "symbol"
Suggested change:
- " {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
+ " {\"query\": \"What is the trading symbol of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
"print(langcache_api_key)\n",
"print(langcache_id)\n",
"print(server_url)\n",
Copilot AI (Nov 20, 2025)
Security concern: Sensitive API keys and credentials are being printed to stdout. This should be avoided to prevent accidental exposure in logs or shared notebooks. Consider removing these print statements or replacing them with masked output (e.g., print("API Key: " + langcache_api_key[:4] + "..." + langcache_api_key[-4:])).
Suggested change:
- "print(langcache_api_key)\n",
- "print(langcache_id)\n",
- "print(server_url)\n",
+ "print(\"API Key: \" + (langcache_api_key[:4] + \"...\" + langcache_api_key[-4:] if langcache_api_key else \"API Key not set\"))\n",
+ "print(\"Cache ID: \" + (langcache_id[:4] + \"...\" + langcache_id[-4:] if langcache_id else \"Cache ID not set\"))\n",
+ "print(\"Server URL:\", server_url)\n",
"source": [
"### Configure Environment Variables\n",
"You'll need the LangCache API Key, Cache ID, URL\n",
"You will also need access to an LLM. In this notebook we'll be using OpenAI"
Copilot AI (Nov 20, 2025)
Documentation issue: Missing period at the end of the sentence. The comment should end with proper punctuation: "You will also need access to an LLM. In this notebook we'll be using OpenAI."
Suggested change:
- "You will also need access to an LLM. In this notebook we'll be using OpenAI"
+ "You will also need access to an LLM. In this notebook we'll be using OpenAI."
"\n",
"cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
"cache_hit_rate = cache_hit_count / len(similar_questions)\n",
"print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"
Copilot AI (Nov 20, 2025)
Bug: Division by zero error when there are no cache hits. The code on line 1332 calculates speedup = avg_first / avg_second, but if avg_second is 0, this will raise a ZeroDivisionError. The condition on line 1331 checks if avg_second > 0:, which should protect against this, but the indentation of the cache hit rate calculation (lines 1335-1337) suggests it's outside this conditional block. Consider ensuring all calculations dependent on non-zero values are properly indented within the conditional block.
Suggested change:
- "\n",
- "cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
- "cache_hit_rate = cache_hit_count / len(similar_questions)\n",
- "print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"
+ "    cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
+ "    cache_hit_rate = cache_hit_count / len(similar_questions)\n",
+ "    print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"
"precision = true_positives / (true_positives + false_positives)\n",
"recall = true_positives / (true_positives + false_negatives)\n",
"f1_score = 2 * (precision * recall) / (precision + recall)\n",
Copilot AI (Nov 20, 2025)
Bug: Potential division by zero errors in metrics calculation. Lines 1032-1034 calculate precision, recall, and F1 score with conditional checks, but line 909 calculates precision without checking if the denominator is zero. If there are no true positives or false positives, this will cause a ZeroDivisionError. The same issue exists for recall (line 910) and F1 score (line 911). Consider adding proper zero checks or using the same pattern as lines 1032-1034.
Suggested change:
- "precision = true_positives / (true_positives + false_positives)\n",
- "recall = true_positives / (true_positives + false_negatives)\n",
- "f1_score = 2 * (precision * recall) / (precision + recall)\n",
+ "precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) != 0 else 0\n",
+ "recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) != 0 else 0\n",
+ "f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0\n",
" elif not is_cache_hit and not should_match:\n",
" status = \"✓ TN (True Negative)\"\n",
" elif not is_cache_hit and should_match:\n",
" status = \"✗ FN (False Negative)\"\n"
Copilot AI (Nov 20, 2025)
Maintainability issue: Incomplete code block. Lines 1078-1086 define status strings but don't use them. The code appears to be cut off and doesn't print or return the status, making this calculation unused. Either complete the logic to display the detailed breakdown or remove this incomplete section.
Suggested change:
- " status = \"✗ FN (False Negative)\"\n"
+ " status = \"✗ FN (False Negative)\"\n",
+ " print(f\"Query: {query_data['query']}\")\n",
+ " print(f\" Cache Hit: {is_cache_hit} | Expected Match: {should_match} | Distance: {query_data['distance']:.3f} | Status: {status}\")\n",
+ " print(\"-\"*80)\n"
" MISS MATCHED: Where is NVIDIA headquartered?...\n",
"10:41:07 httpx INFO HTTP Request: POST https://aws-us-east-1.langcache.redis.io/v1/caches/50eb6a09acf5415d8b68619b1ccffd9a/entries/search \"HTTP/1.1 200 OK\"\n",
"12. Cache HIT (distance: 0.0973)\n",
" Original query: What date does NVIDIA use as it's year end for acounting purposes?\n",
Copilot AI (Nov 20, 2025)
Spelling error: "acounting" should be "accounting"
Suggested change:
- " Original query: What date does NVIDIA use as it's year end for acounting purposes?\n",
+ " Original query: What date does NVIDIA use as it's year end for accounting purposes?\n",
" MISS MATCHED: Where is the principal executive office of NVIDIA Corporation located?...\n",
"10:41:07 httpx INFO HTTP Request: POST https://aws-us-east-1.langcache.redis.io/v1/caches/50eb6a09acf5415d8b68619b1ccffd9a/entries/search \"HTTP/1.1 200 OK\"\n",
"14. Cache HIT (distance: 0.0870)\n",
" Original query: What is the trading symbold of NVIDIA Corporation on the NASDAQ exchange?\n",
Copilot AI (Nov 20, 2025)
Spelling error: "symbold" should be "symbol"
Suggested change:
- " Original query: What is the trading symbold of NVIDIA Corporation on the NASDAQ exchange?\n",
+ " Original query: What is the trading symbol of NVIDIA Corporation on the NASDAQ exchange?\n",
" \n",
" chunks: list of document chunks\n",
" max_chunks: maximum number of chunks to process\n",
" \n",
" Returns: A list of question-answer pairs\n",
Copilot AI (Nov 20, 2025)
[nitpick] Documentation issue: The docstring for extract_faqs_from_chunks function has inconsistent formatting. The parameter descriptions should follow a consistent style. Currently, it mixes inline descriptions with separate lines. Consider using a standard format like:
"""Extract FAQs from document chunks using LLM.

Args:
    chunks: List of document chunks
    max_chunks: Maximum number of chunks to process

Returns:
    A list of question-answer pairs
"""

Suggested change:
- " \n",
- " chunks: list of document chunks\n",
- " max_chunks: maximum number of chunks to process\n",
- " \n",
- " Returns: A list of question-answer pairs\n",
+ " \n",
+ " Args:\n",
+ "     chunks: List of document chunks\n",
+ "     max_chunks: Maximum number of chunks to process\n",
+ " \n",
+ " Returns:\n",
+ "     A list of question-answer pairs\n",
No description provided.