Conversation

@justin-cechmanek
Contributor

No description provided.

Copilot AI review requested due to automatic review settings November 18, 2025 05:05
Copilot finished reviewing on behalf of justin-cechmanek November 18, 2025 05:07

Copilot AI left a comment

Pull Request Overview

This PR adds a comprehensive Jupyter notebook tutorial demonstrating semantic caching using LangCache, a managed Redis Cloud service accessed through the RedisVL library. The notebook walks through setting up LangCache, generating FAQs from PDF documents using the Doc2Cache technique, and integrating semantic caching into a RAG pipeline.

Key Changes:

  • End-to-end tutorial for LangCache semantic caching with Redis Cloud
  • Demonstrates FAQ generation from PDF documents using OpenAI's LLM
  • Shows cache population, retrieval testing, and RAG pipeline integration
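
For orientation, the caching pattern the notebook wires into the RAG pipeline is a check-then-store wrapper around the LLM call. Below is a minimal sketch of that flow; the cache.check()/cache.store() calls follow the RedisVL-style cache interface and the answer_with_llm helper is hypothetical, so the exact LangCache API used in the notebook may differ.

import time

def rag_with_cache(question, cache, answer_with_llm, use_cache=True):
    """Sketch: consult the semantic cache before running the full RAG pipeline.

    Assumes `cache` exposes RedisVL-style check()/store() methods and that
    `answer_with_llm` (hypothetical) performs retrieval plus the LLM call.
    Returns (answer, cache_hit, response_time).
    """
    start = time.time()

    if use_cache:
        hits = cache.check(prompt=question)          # semantic lookup of similar prompts
        if hits:
            return hits[0]["response"], True, time.time() - start  # cache hit

    answer = answer_with_llm(question)               # cache miss: run retrieval + LLM
    if use_cache:
        cache.store(prompt=question, response=answer)  # populate the cache for next time
    return answer, False, time.time() - start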


" print(f\"Query: {query}\")\n",
" print(f\" Similarity: {similarity:.4f}\")\n",
" print(f\" Distance: {distance:.4f}\")\n",
" print(f\" Would cache hit (threshold={cache.distance_threshold:.4f})? {distance < cache.distance_threshold}\\n\")\n"

Copilot AI Nov 18, 2025

This code references cache.distance_threshold, which may not be an attribute of LangCacheSemanticCache and is likely to raise an AttributeError when executed. The threshold configuration for LangCache may be managed differently (e.g., as server-side configuration).
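
If the threshold is indeed managed server-side, the quoted cell can be made defensive instead of failing. A minimal sketch, keeping the rest of the loop as quoted above; the 0.1 fallback is an assumed value, not a documented LangCache default:

# Sketch: avoid AttributeError if the cache object has no local distance_threshold.
threshold = getattr(cache, "distance_threshold", 0.1)  # 0.1 is an assumed fallback
print(f"  Would cache hit (threshold={threshold:.4f})? {distance < threshold}\n")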

Comment on lines 1147 to 1169
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show information about the langcache-embed model\n",
"print(\"LangCache-Embed Model Information:\")\n",
"print(\"=\"*60)\n",
"print(f\"Model: redis/langcache-embed-v1\")\n",
"print(f\"Purpose: Optimized for semantic caching tasks\")\n",
"print(f\"Dimension: 768\")\n",
"print(f\"Distance Metric: cosine\")\n",
"print(\"\\nKey Features:\")\n",
"print(\" • Trained specifically on query-response pairs\")\n",
"print(\" • Balanced precision/recall for optimal cache hit rates\")\n",
"print(\" • Fast inference time suitable for production\")\n",
"print(\" • Optimized for short-form text (queries/prompts)\")\n",
"print(\"\\nAdvantages for Caching:\")\n",
"print(\" • Better semantic understanding of query intent\")\n",
"print(\" • More robust to paraphrasing and rewording\")\n",
"print(\" • Lower false positive rate compared to general embeddings\")\n",
"print(\" • Optimized threshold ranges for cache decisions\")\n"
]

Copilot AI Nov 18, 2025

The section title mentions "LangCache-Embed Model Deep Dive" and discusses the redis/langcache-embed-v1 model, but the code cells that attempt to demonstrate the model (lines 1188-1204) have bugs and will not execute successfully. This entire section (starting from line 1147) appears to be aspirational content that doesn't match the actual capabilities of the LangCache API shown in the notebook. Consider either removing this section or clearly marking it as informational only (not executable).

}
],
"source": [
"# Check your cache is workign\n",

Copilot AI Nov 18, 2025

Typo in comment: "workign" should be "working"

Suggested change
"# Check your cache is workign\n",
"# Check your cache is working\n",

Comment on lines 1101 to 1113
"print(\"\\nOptimizing threshold...\\n\")\n",
"\n",
"optimizer = CacheThresholdOptimizer(\n",
" cache,\n",
" optimization_test_data,\n",
" eval_metric=\"f1\" # Can also use \"precision\" or \"recall\"\n",
")\n",
"\n",
"results = optimizer.optimize()\n",
"\n",
"print(f\"\\nOptimization complete!\")\n",
"if results and 'f1' in results:\n",
" print(f\" F1 Score: {results['f1']:.4f}\")\n"

Copilot AI Nov 18, 2025

The notebook contains an error cell (lines 1086-1097) showing that LangCacheSemanticCache object has no attribute _vectorizer. This indicates that the CacheThresholdOptimizer is incompatible with LangCacheSemanticCache. The optimizer appears to expect a different cache interface that has a _vectorizer attribute. This section should either be fixed to work with LangCacheSemanticCache or removed/commented out to avoid confusing users.

Suggested change
"print(\"\\nOptimizing threshold...\\n\")\n",
"\n",
"optimizer = CacheThresholdOptimizer(\n",
" cache,\n",
" optimization_test_data,\n",
" eval_metric=\"f1\" # Can also use \"precision\" or \"recall\"\n",
")\n",
"\n",
"results = optimizer.optimize()\n",
"\n",
"print(f\"\\nOptimization complete!\")\n",
"if results and 'f1' in results:\n",
" print(f\" F1 Score: {results['f1']:.4f}\")\n"
"# The following code is commented out because CacheThresholdOptimizer is incompatible with LangCacheSemanticCache.\n",
"# print(\"\\nOptimizing threshold...\\n\")\n",
"# optimizer = CacheThresholdOptimizer(\n",
"# cache,\n",
"# optimization_test_data,\n",
"# eval_metric=\"f1\" # Can also use \"precision\" or \"recall\"\n",
"# )\n",
"# \n",
"# results = optimizer.optimize()\n",
"# \n",
"# print(f\"\\nOptimization complete!\")\n",
"# if results and 'f1' in results:\n",
"# print(f\" F1 Score: {results['f1']:.4f}\")\n"

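If the optimizer remains incompatible, a small manual sweep over candidate thresholds is one possible replacement. The sketch below assumes each item in optimization_test_data carries "query" and "expected_match" fields and that cache.check() returns hits with a "vector_distance" value; both are assumptions about the notebook's data and the LangCache response shape.

def sweep_thresholds(cache, test_data, candidates=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """Sketch: pick the candidate threshold with the best F1 on labeled queries."""
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        tp = fp = fn = 0
        for item in test_data:
            hits = cache.check(prompt=item["query"])                 # assumed check() API
            hit = bool(hits) and hits[0]["vector_distance"] < threshold
            if hit and item["expected_match"]:
                tp += 1
            elif hit and not item["expected_match"]:
                fp += 1
            elif not hit and item["expected_match"]:
                fn += 1
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1
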
Comment on lines 1172 to 1204
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare semantic similarities between related and unrelated questions\n",
"test_questions = [\n",
" \"What is NVIDIA's revenue?\",\n",
" \"Tell me about NVIDIA's earnings\", # Semantically similar\n",
" \"How much money does NVIDIA make?\", # Semantically similar\n",
" \"What is the weather today?\", # Unrelated\n",
"]\n",
"\n",
"print(\"\\nComparing semantic similarities:\\n\")\n",
"print(f\"Base question: {test_questions[0]}\\n\")\n",
"\n",
"base_embedding = vectorizer.embed(test_questions[0])\n",
"\n",
"import numpy as np\n",
"\n",
"for query in test_questions[1:]:\n",
" query_embedding = vectorizer.embed(query)\n",
" \n",
" # Calculate cosine similarity\n",
" similarity = np.dot(base_embedding, query_embedding) / (\n",
" np.linalg.norm(base_embedding) * np.linalg.norm(query_embedding)\n",
" )\n",
" distance = 1 - similarity\n",
" \n",
" print(f\"Query: {query}\")\n",
" print(f\" Similarity: {similarity:.4f}\")\n",
" print(f\" Distance: {distance:.4f}\")\n",
" print(f\" Would cache hit (threshold={cache.distance_threshold:.4f})? {distance < cache.distance_threshold}\\n\")\n"

Copilot AI Nov 18, 2025

The code references an undefined vectorizer variable. This variable is never imported or initialized in the notebook, which will cause a NameError when this cell is executed. The langcache-embed model is managed by the LangCache service, so local embedding operations may not be possible. This section should either define the vectorizer or be removed if it's not applicable to LangCache.

Suggested change
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare semantic similarities between related and unrelated questions\n",
"test_questions = [\n",
" \"What is NVIDIA's revenue?\",\n",
" \"Tell me about NVIDIA's earnings\", # Semantically similar\n",
" \"How much money does NVIDIA make?\", # Semantically similar\n",
" \"What is the weather today?\", # Unrelated\n",
"]\n",
"\n",
"print(\"\\nComparing semantic similarities:\\n\")\n",
"print(f\"Base question: {test_questions[0]}\\n\")\n",
"\n",
"base_embedding = vectorizer.embed(test_questions[0])\n",
"\n",
"import numpy as np\n",
"\n",
"for query in test_questions[1:]:\n",
" query_embedding = vectorizer.embed(query)\n",
" \n",
" # Calculate cosine similarity\n",
" similarity = np.dot(base_embedding, query_embedding) / (\n",
" np.linalg.norm(base_embedding) * np.linalg.norm(query_embedding)\n",
" )\n",
" distance = 1 - similarity\n",
" \n",
" print(f\"Query: {query}\")\n",
" print(f\" Similarity: {similarity:.4f}\")\n",
" print(f\" Distance: {distance:.4f}\")\n",
" print(f\" Would cache hit (threshold={cache.distance_threshold:.4f})? {distance < cache.distance_threshold}\\n\")\n"
"cell_type": "markdown",
"metadata": {},
"source": [
"### Semantic Similarity Comparison (Omitted)\n",
"\n",
"The previous section attempted to compare semantic similarities using a local `vectorizer` object, which is not defined in this notebook. Since LangCache manages embeddings via its service, local embedding operations are not applicable. To compare semantic similarities, use the embeddings and API provided by LangCache instead."

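If a local similarity comparison is still wanted, one option is to load the public embedding model directly rather than relying on an undefined vectorizer. A minimal sketch, assuming the sentence-transformers package is installed and that redis/langcache-embed-v1 is published on Hugging Face; this runs independently of the LangCache service:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("redis/langcache-embed-v1")  # assumed to be available on Hugging Face

base = model.encode("What is NVIDIA's revenue?")
other = model.encode("Tell me about NVIDIA's earnings")

# Cosine similarity between the two query embeddings
similarity = float(np.dot(base, other) / (np.linalg.norm(base) * np.linalg.norm(other)))
print(f"Similarity: {similarity:.4f}  Distance: {1 - similarity:.4f}")
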
Copilot AI review requested due to automatic review settings November 20, 2025 18:21
Copilot finished reviewing on behalf of justin-cechmanek November 20, 2025 18:23

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.


" {\"query\": \"What's the location of the manufacturing plant for NVIDIA?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What date does NVIDIA use as it's year end for acounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",
" {\"query\": \"Where are the locations of each office of NVIDIA Corporation?\", \"expected_match\": False, \"category\": \"off-topic\"},\n",
" {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",

Copilot AI Nov 20, 2025

Spelling error: "symbold" should be "symbol".

Suggested change
" {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What is the trading symbol of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",

" MISS MATCHED: Where is the principal executive office of NVIDIA Corporation located?...\n",
"10:19:34 httpx INFO HTTP Request: POST https://aws-us-east-1.langcache.redis.io/v1/caches/50eb6a09acf5415d8b68619b1ccffd9a/entries/search \"HTTP/1.1 200 OK\"\n",
"14. Cache HIT (distance: 0.1295)\n",
" Original query: What is the trading symbold of NVIDIA Corporation on the Japan exchange?\n",

Copilot AI Nov 20, 2025

Spelling error: "symbold" should be "symbol".

Suggested change
" Original query: What is the trading symbold of NVIDIA Corporation on the Japan exchange?\n",
" Original query: What is the trading symbol of NVIDIA Corporation on the Japan exchange?\n",

"outputs": [],
"source": [
"if \"OPENAI_API_KEY\" not in os.environ:\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n",

Copilot AI Nov 20, 2025

Potential bug: The code calls getpass("Enter your OpenAI API key: ") directly, but if only the getpass module is imported, the module object is not callable and this line will fail. The correct usage is getpass.getpass("Enter your OpenAI API key: ") with the module imported, or from getpass import getpass to import the function directly.

Suggested change
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")\n",

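The alternative mentioned in the comment, importing the function directly, would look like this sketch:

import os
from getpass import getpass  # import the function, not just the module

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
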
Comment on lines +1334 to +1336
"cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
"cache_hit_rate = cache_hit_count / len(similar_questions)\n",
"print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"

Copilot AI Nov 20, 2025

Performance issue: The cache hit rate calculation is incorrect and inefficient. The calculation sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1) uses an arbitrary threshold (0.1 seconds) to determine cache hits instead of tracking actual cache hit status. This doesn't accurately reflect cache performance since cache lookup times can vary. Consider tracking the actual cache_hit boolean values returned by rag_with_cache() function instead.
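
A sketch of that fix, assuming rag_with_cache() returns (answer, cache_hit, response_time) as its docstring states:

# Track the cache_hit flags returned by rag_with_cache() instead of a timing heuristic.
second_pass_hits = []
second_pass_times = []
for question in similar_questions:
    answer, cache_hit, elapsed = rag_with_cache(question)
    second_pass_hits.append(cache_hit)
    second_pass_times.append(elapsed)

cache_hit_rate = sum(second_pass_hits) / len(similar_questions)
print(f"  Cache hit rate: {cache_hit_rate*100:.0f}%")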

Comment on lines +489 to +493
" \n",
" chunks: list of document chunks\n",
" max_chunks: maximum number of chunks to process\n",
" \n",
" Returns: A list of question-answer pairs\n",

Copilot AI Nov 20, 2025

Documentation issue: The docstring is missing proper formatting and parameter descriptions. The docstring should follow standard Python conventions with proper parameter and return type documentation. For example:

"""Extract FAQs from document chunks using LLM.

Args:
    chunks: List of document chunks to process
    max_chunks: Maximum number of chunks to process (default: 25)

Returns:
    List[Dict]: A list of question-answer pairs with category metadata
"""
Suggested change
" \n",
" chunks: list of document chunks\n",
" max_chunks: maximum number of chunks to process\n",
" \n",
" Returns: A list of question-answer pairs\n",
" \n",
" Args:\n",
" chunks (List[Any]): List of document chunks to process.\n",
" max_chunks (int, optional): Maximum number of chunks to process. Defaults to 25.\n",
" \n",
" Returns:\n",
" List[Dict]: A list of question-answer pairs with category metadata.\n",

" \"\"\"\n",
" Process a question through RAG pipeline with optional semantic caching.\n",
"\n",
" Returns: A tuple of (answer, cache_hit, response_time)\n",

Copilot AI Nov 20, 2025

Documentation issue: The docstring formatting is inconsistent with Python conventions. It should use proper Args/Returns sections:

"""
Process a question through RAG pipeline with optional semantic caching.

Args:
    question: The user's question to process
    use_cache: Whether to use semantic caching (default: True)

Returns:
    tuple: A tuple of (answer, cache_hit, response_time)
"""
Suggested change
" Returns: A tuple of (answer, cache_hit, response_time)\n",
" Args:\n",
" question (str): The user's question to process.\n",
" use_cache (bool, optional): Whether to use semantic caching. Defaults to True.\n",
"\n",
" Returns:\n",
" tuple: A tuple of (answer, cache_hit, response_time)\n",

Comment on lines +159 to +162
"print(langcache_api_key)\n",
"print(langcache_id)\n",
"print(server_url)\n",
"\n",

Copilot AI Nov 20, 2025

Security issue: The API key and cache ID are being printed to the output, which exposes sensitive credentials. These print statements should be removed to prevent accidental credential leakage in shared notebooks or logs.

Suggested change
"print(langcache_api_key)\n",
"print(langcache_id)\n",
"print(server_url)\n",
"\n",
"\n",
"\n",
"\n",
"\n",

"edge_cases = [\n",
" {\"query\": \"What's the fiscal year end for Microsoft Corporation?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What's the location of the manufacturing plant for NVIDIA?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What date does NVIDIA use as it's year end for acounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",

Copilot AI Nov 20, 2025

Spelling error: "acounting" should be "accounting".

Suggested change
" {\"query\": \"What date does NVIDIA use as it's year end for acounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",
" {\"query\": \"What date does NVIDIA use as it's year end for accounting purposes?\", \"expected_match\": True, \"category\":\"general\"},\n",

@justin-cechmanek justin-cechmanek self-assigned this Nov 20, 2025
@justin-cechmanek justin-cechmanek marked this pull request as ready for review November 20, 2025 19:05
Copilot AI review requested due to automatic review settings November 20, 2025 19:05
@justin-cechmanek justin-cechmanek merged commit 386619c into main Nov 20, 2025
5 of 6 checks passed
Copilot finished reviewing on behalf of justin-cechmanek November 20, 2025 19:07

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.


" {\"query\": \"What is the company total revenue for the last 5 years?\", \"expected_match\": False, \"category\": \"financial\"},\n",
" {\"query\": \"What's the location of the manufacturing plant for NVIDIA?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"Where are the locations of each office of NVIDIA Corporation?\", \"expected_match\": False, \"category\": \"off-topic\"},\n",
" {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",

Copilot AI Nov 20, 2025

Spelling error: "symbold" should be "symbol"

Suggested change
" {\"query\": \"What is the trading symbold of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",
" {\"query\": \"What is the trading symbol of NVIDIA Corporation on the Japan exchange?\", \"expected_match\": False, \"category\": \"general\"},\n",

Comment on lines +163 to +165
"print(langcache_api_key)\n",
"print(langcache_id)\n",
"print(server_url)\n",

Copilot AI Nov 20, 2025

Security concern: Sensitive API keys and credentials are being printed to stdout. This should be avoided to prevent accidental exposure in logs or shared notebooks. Consider removing these print statements or replacing them with masked output (e.g., print("API Key: " + langcache_api_key[:4] + "..." + langcache_api_key[-4:])).

Suggested change
"print(langcache_api_key)\n",
"print(langcache_id)\n",
"print(server_url)\n",
"print(\"API Key: \" + (langcache_api_key[:4] + \"...\" + langcache_api_key[-4:] if langcache_api_key else \"API Key not set\"))\n",
"print(\"Cache ID: \" + (langcache_id[:4] + \"...\" + langcache_id[-4:] if langcache_id else \"Cache ID not set\"))\n",
"print(\"Server URL:\", server_url)\n",

"source": [
"### Configure Environment Variables\n",
"You'll need the LangCache API Key, Cache ID, URL\n",
"You will also need access to an LLM. In this notebook we'll be using OpenAI"

Copilot AI Nov 20, 2025

Documentation issue: Missing period at the end of the sentence. The comment should end with proper punctuation: "You will also need access to an LLM. In this notebook we'll be using OpenAI."

Suggested change
"You will also need access to an LLM. In this notebook we'll be using OpenAI"
"You will also need access to an LLM. In this notebook we'll be using OpenAI."

Comment on lines +1334 to +1337
"\n",
"cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
"cache_hit_rate = cache_hit_count / len(similar_questions)\n",
"print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"

Copilot AI Nov 20, 2025

Bug: Division by zero error when there are no cache hits. The code on line 1332 calculates speedup = avg_first / avg_second, but if avg_second is 0, this will raise a ZeroDivisionError. The condition on line 1331 checks if avg_second > 0:, which should protect against this, but the indentation of the cache hit rate calculation (lines 1335-1337) suggests it's outside this conditional block. Consider ensuring all calculations dependent on non-zero values are properly indented within the conditional block.

Suggested change
"\n",
"cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
"cache_hit_rate = cache_hit_count / len(similar_questions)\n",
"print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"
" cache_hit_count = sum(1 for i, _ in enumerate(similar_questions) if second_pass_times[i] < 0.1)\n",
" cache_hit_rate = cache_hit_count / len(similar_questions)\n",
" print(f\" Cache hit rate: {cache_hit_rate*100:.0f}%\")\n"

Comment on lines +909 to +911
"precision = true_positives / (true_positives + false_positives)\n",
"recall = true_positives / (true_positives + false_negatives)\n",
"f1_score = 2 * (precision * recall) / (precision + recall)\n",

Copilot AI Nov 20, 2025

Bug: Potential division by zero errors in metrics calculation. Lines 1032-1034 calculate precision, recall, and F1 score with conditional checks, but line 909 calculates precision without checking if the denominator is zero. If there are no true positives or false positives, this will cause a ZeroDivisionError. The same issue exists for recall (line 910) and F1 score (line 911). Consider adding proper zero checks or using the same pattern as lines 1032-1034.

Suggested change
"precision = true_positives / (true_positives + false_positives)\n",
"recall = true_positives / (true_positives + false_negatives)\n",
"f1_score = 2 * (precision * recall) / (precision + recall)\n",
"precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) != 0 else 0\n",
"recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) != 0 else 0\n",
"f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0\n",

" elif not is_cache_hit and not should_match:\n",
" status = \"✓ TN (True Negative)\"\n",
" elif not is_cache_hit and should_match:\n",
" status = \"✗ FN (False Negative)\"\n"

Copilot AI Nov 20, 2025

Maintainability issue: Incomplete code block. Lines 1078-1086 define status strings but don't use them. The code appears to be cut off and doesn't print or return the status, making this calculation unused. Either complete the logic to display the detailed breakdown or remove this incomplete section.

Suggested change
" status = \"✗ FN (False Negative)\"\n"
" status = \"✗ FN (False Negative)\"\n",
" print(f\"Query: {query_data['query']}\")\n",
" print(f\" Cache Hit: {is_cache_hit} | Expected Match: {should_match} | Distance: {query_data['distance']:.3f} | Status: {status}\")\n",
" print(\"-\"*80)\n"

" MISS MATCHED: Where is NVIDIA headquartered?...\n",
"10:41:07 httpx INFO HTTP Request: POST https://aws-us-east-1.langcache.redis.io/v1/caches/50eb6a09acf5415d8b68619b1ccffd9a/entries/search \"HTTP/1.1 200 OK\"\n",
"12. Cache HIT (distance: 0.0973)\n",
" Original query: What date does NVIDIA use as it's year end for acounting purposes?\n",

Copilot AI Nov 20, 2025

Spelling error: "acounting" should be "accounting"

Suggested change
" Original query: What date does NVIDIA use as it's year end for acounting purposes?\n",
" Original query: What date does NVIDIA use as it's year end for accounting purposes?\n",

" MISS MATCHED: Where is the principal executive office of NVIDIA Corporation located?...\n",
"10:41:07 httpx INFO HTTP Request: POST https://aws-us-east-1.langcache.redis.io/v1/caches/50eb6a09acf5415d8b68619b1ccffd9a/entries/search \"HTTP/1.1 200 OK\"\n",
"14. Cache HIT (distance: 0.0870)\n",
" Original query: What is the trading symbold of NVIDIA Corporation on the NASDAQ exchange?\n",

Copilot AI Nov 20, 2025

Spelling error: "symbold" should be "symbol"

Suggested change
" Original query: What is the trading symbold of NVIDIA Corporation on the NASDAQ exchange?\n",
" Original query: What is the trading symbol of NVIDIA Corporation on the NASDAQ exchange?\n",

Comment on lines +493 to +497
" \n",
" chunks: list of document chunks\n",
" max_chunks: maximum number of chunks to process\n",
" \n",
" Returns: A list of question-answer pairs\n",

Copilot AI Nov 20, 2025

[nitpick] Documentation issue: The docstring for extract_faqs_from_chunks function has inconsistent formatting. The parameter descriptions should follow a consistent style. Currently, it mixes inline descriptions with separate lines. Consider using a standard format like:

"""Extract FAQs from document chunks using LLM.

Args:
    chunks: List of document chunks
    max_chunks: Maximum number of chunks to process

Returns:
    A list of question-answer pairs
"""
Suggested change
" \n",
" chunks: list of document chunks\n",
" max_chunks: maximum number of chunks to process\n",
" \n",
" Returns: A list of question-answer pairs\n",
" \n",
" Args:\n",
" chunks: List of document chunks\n",
" max_chunks: Maximum number of chunks to process\n",
" \n",
" Returns:\n",
" A list of question-answer pairs\n",

@justin-cechmanek justin-cechmanek deleted the feat/langcache-example branch November 21, 2025 00:21