Conversation
**Note: Other AI code review bot(s) detected.** CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

**Walkthrough**

Introduces a snippet feedback validation feature that orchestrates user feedback validation on flagged snippets. Adds Gemini AI integration, database functions for snippet retrieval, data models for validation outputs, processing pipeline orchestration with cron scheduling, and comprehensive prompt/schema documentation for consistent validation workflows.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant Scheduler as Scheduler<br/>(Cron 0 6 UTC)
    participant Pipeline as Pipeline Orchestrator
    participant Supabase as Supabase DB
    participant Prompt as Prompt Builder
    participant Gemini as Gemini API
    participant ParseRes as Response Parser
    participant SaveRes as Result Persistence
    Scheduler->>Pipeline: trigger snippet_feedback_validation()
    Pipeline->>Pipeline: initialize SupabaseClient, GeminiClient
    Pipeline->>Supabase: get_snippets_with_recent_dislikes(lookback_days=1)
    Supabase-->>Pipeline: snippets[] with dislike counts
    loop for each snippet
        Pipeline->>Prompt: build_validation_prompt(snippet, comments)
        Prompt-->>Pipeline: formatted user_prompt
        Pipeline->>Gemini: generate_content(model, user_prompt, system_instruction)
        rect rgb(220, 240, 255)
            note over Gemini: Gemini validates claims,<br/>assesses feedback,<br/>determines decision
        end
        Gemini-->>Pipeline: response (text + thought_summaries)
        Pipeline->>ParseRes: parse_validation_response(response_text)
        ParseRes-->>Pipeline: FeedbackValidationOutput (dict)
        Pipeline->>SaveRes: save_validation_result(snippet_id, parsed_response, ...)
        SaveRes->>Supabase: insert_feedback_validation_result()
        Supabase-->>SaveRes: ✓ result saved
        SaveRes-->>Pipeline: ✓ logged
    end
    Pipeline-->>Scheduler: summary (success_count, error_count)
```
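The per-snippet loop and final summary in the diagram above can be sketched as a minimal Python orchestration skeleton. This is an illustrative stand-in, not the repository's actual flow: the function name `run_validation_pipeline` and the stubbed `process_snippet` callable are assumptions, while the summary keys follow the diagram.

```python
import time

def run_validation_pipeline(snippets, process_snippet, rate_limit_seconds=2.0):
    """Process each snippet, tally outcomes, and return the summary
    (success_count, error_count) reported back to the scheduler."""
    success_count = 0
    error_count = 0
    for snippet in snippets:
        if process_snippet(snippet):
            success_count += 1
        else:
            error_count += 1
        time.sleep(rate_limit_seconds)  # simple rate limit between Gemini calls
    return {"success_count": success_count, "error_count": error_count}

# Example with a stubbed processor that succeeds only for snippet id 1:
summary = run_validation_pipeline(
    [{"id": 1}, {"id": 2}], lambda s: s["id"] == 1, rate_limit_seconds=0
)
print(summary)
```

The sleep between iterations mirrors the review discussion below about making the rate limit configurable rather than hardcoding it.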
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
**Summary of Changes**

Hello @quancao-ea, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant new feature: an automated system designed to validate user feedback on snippets previously flagged as misinformation. By integrating with the Gemini API, the system processes user dislikes and comments, performs claim verification using web search, assesses the quality and intent of user feedback, and ultimately determines the accuracy of the initial misinformation detection. This structured validation process aims to enhance the overall precision of the misinformation detection pipeline by learning from user input and identifying potential false positives and true positives, with ambiguous cases flagged for human review.
Important
Looks good to me! 👍
Reviewed everything up to 3228929 in 2 minutes and 3 seconds. Click for details.
- Reviewed 1222 lines of code in 11 files
- Skipped 0 files when reviewing
- Skipped posting 10 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `prompts/snippet_feedback_validation/output_schema.json:1`
   - Draft comment: The JSON schema is comprehensive and well-documented for feedback validation. No changes needed.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
2. `prompts/snippet_feedback_validation/system_instruction.md:1`
   - Draft comment: System instruction is clear and succinct, outlining roles and expected validation outcomes. Looks good.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
3. `prompts/snippet_feedback_validation/user_prompt.md:1`
   - Draft comment: The user prompt is detailed with clear instructions and formatting examples. It ensures proper context is provided.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
4. `src/processing_pipeline/constants.py:98`
   - Draft comment: Consider specifying an encoding (e.g., 'utf-8') when opening files in get_system_instruction_for_feedback_validation and get_user_prompt_for_feedback_validation.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
5. `src/processing_pipeline/main.py:93`
   - Draft comment: The deployment block for snippet_feedback_validation uses concurrency_limit=1 and limit=1. Confirm if this is intentional for low-volume processing or consider revisiting if expecting higher throughput.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
6. `src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py:286`
   - Draft comment: The JSON extraction in parse_validation_response using find() and rfind() might be fragile if extra text precedes the JSON object. Consider a more robust extraction method (e.g., regex matching).
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The comment suggests a refactor for "robustness" but doesn't identify a specific bug or failure case. The current implementation: 1) finds the first `{` and last `}`, 2) checks that they exist, 3) validates with Pydantic, which will catch malformed JSON. "Might be fragile" is speculative language, and the current code already handles text before/after the JSON by using find/rfind. The implementation could fail on stray braces — e.g. in `Here's the result: {valid json} but also {another object}`, rfind would grab the closing brace of the second object — but there is no evidence this actually happens with Gemini responses in practice, and Pydantic validation would catch the malformed slice anyway. Per the rules, speculative comments without strong evidence of an issue should be removed.
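The brace-delimited extraction pattern debated in draft comment 6 can be sketched in a few lines. This is a simplified stand-in for `parse_validation_response`, not the repository's actual code; the second call demonstrates the stray-brace case where `rfind()` widens the slice into invalid JSON.

```python
import json

def extract_json(text: str) -> dict:
    """Slice from the first '{' to the last '}' and parse; downstream
    Pydantic validation would catch anything malformed."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start:end + 1])

print(extract_json('Model output: {"status": "false_positive"} Done.'))
try:
    # A stray closing brace after the real object widens the slice:
    extract_json('{"status": "ok"} trailing text}')
except json.JSONDecodeError as exc:
    print("parse failed:", exc)
```

As the draft-comment reasoning notes, the failure mode exists in theory but surfaces as a parse error rather than silent corruption.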
7. `src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py:232`
   - Draft comment: The use of template.format() in build_validation_prompt relies on exact placeholder matching. Ensure that input data does not include unexpected curly braces that could break string formatting.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
8. `src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py:412`
   - Draft comment: Overall, the processing flow in snippet_feedback_validation is well-structured. Consider adding more granular error handling for external API calls (e.g., Gemini) to handle transient errors more gracefully.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
9. `supabase/database/sql/get_snippets_with_recent_dislikes.sql:1`
   - Draft comment: The SQL function for fetching snippets with recent dislikes is well-structured, with clear use of CTEs and proper ordering. Good job.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
10. `src/services/gemini_client.py:1`
    - Draft comment: The GeminiClient implementation is solid with appropriate safety settings. No issues found.
    - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
Workflow ID: wflow_CGhIaArTLv3qgmIb
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Code Review
This pull request introduces a comprehensive new system for validating user feedback on flagged snippets, which is a great addition. The implementation is well-structured, leveraging Prefect for workflow orchestration, Pydantic for data modeling, and a new Gemini client for AI-powered validation. The database functions and utility code are also well-written. My review focuses on improving robustness by handling potential missing data from the database more safely and ensuring consistency across prompt and schema files to avoid future confusion. Overall, this is a solid contribution.
```
"claim_verifications": [
    {{
        "claim": "The specific claim being verified",
        "original_assessment": "What Stage 3 concluded about this claim",
        "verification_finding": "What your web search reveals about this claim",
        "is_claim_actually_false": true/false,
        "confidence": 0-100
    }}
],
```
The JSON example in this section has a couple of minor formatting issues that could be confusing for the model:

- The object inside the `claim_verifications` array is wrapped in `{{...}}` (lines 182 and 188). It should be a plain JSON object `{...}`.
- The value for `is_claim_actually_false` is `true/false` (line 186), which is not valid JSON. It should be an example boolean like `true` or `false`.
- The value for `confidence` is `0-100` (line 187), which is also not a valid JSON number. It should be an example integer like `95`.

Correcting these would make the prompt clearer and more robust for the model.
Implement user feedback validation system to review dislikes on misinformation detections. Validates whether user feedback correctly identifies false positives using web search verification.

- Add validation prompts and JSON output schema
- Create models for structured validation output
- Implement Gemini client with web search capability
- Add Supabase methods for fetching/storing validation results
- Schedule daily validation runs via Prefect
Enhance the feedback validation system to classify Stage 3 error types (knowledge cutoff, temporal confusion, insufficient search, etc.) and provide prompt improvement suggestions for false positives. Also optimize database queries to fetch comments directly with snippets.
3228929 to 880728b (Compare)
Add error_pattern field to classify Stage 3 errors and prompt_improvement_suggestion for false positive cases
Important
Looks good to me! 👍
Reviewed 7fe43b7 in 1 minute and 28 seconds. Click for details.
- Reviewed 46 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `prompts/snippet_feedback_validation/output_schema.json:9`
   - Draft comment: Added 'error_pattern' to the required fields. Consider if this field should always be required, especially when Stage 3 made no error. If using 'correct_detection' to indicate no error, the required 'explanation' may not always be necessary.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 15% vs. threshold = 50%). The comment asks the author to "consider" whether the field should be required, which is a soft suggestion rather than a definite problem. The schema already includes "correct_detection" as an enum value, showing the author anticipated the no-error case, and the rules explicitly say not to ask the author to confirm their intention or double-check design decisions.
2. `prompts/snippet_feedback_validation/output_schema.json:127`
   - Draft comment: For 'prompt_improvement_suggestion', consider enforcing it conditionally (only when validation status is 'false_positive') or clarifying its description, to ensure consistency with its intended use.
   - Reason this comment was not posted: Comment looked like it was already resolved.
Workflow ID: wflow_EPVE6tlePcRSy6sK
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 3
♻️ Duplicate comments (1)
`prompts/snippet_feedback_validation/user_prompt.md` (1)

**177-207:** The JSON example contains invalid syntax that may confuse the model. Several values in the example show options instead of valid JSON values:

- Line 186: `true/false` should be a boolean like `true`
- Line 187: `0-100` should be an integer like `85`
- Line 191: `"high/medium/low"` should be a single value like `"high"`
- Lines 193, 196, 201: similar issues

Note: The `{{` and `}}` are correct if this template is used with Python string formatting (they escape to literal `{` and `}`).

🔎 Proposed fix

```diff
 "claim_verifications": [
     {{
         "claim": "The specific claim being verified",
         "original_assessment": "What Stage 3 concluded about this claim",
         "verification_finding": "What your web search reveals about this claim",
-        "is_claim_actually_false": true/false,
-        "confidence": 0-100
+        "is_claim_actually_false": true,
+        "confidence": 85
     }}
 ],
 "user_feedback_assessment": {{
-    "feedback_quality": "high/medium/low",
+    "feedback_quality": "high",
     "feedback_reasoning": "Assessment of why user disliked/labeled the snippet",
-    "appears_adversarial": true/false
+    "appears_adversarial": false
 }},
 "validation_decision": {{
-    "status": "false_positive/true_positive/needs_review",
-    "confidence": 0-100,
+    "status": "false_positive",
+    "confidence": 85,
     "primary_reason": "Main reason for this decision"
 }},
 "error_pattern": {{
-    "error_type": "knowledge_cutoff/temporal_confusion/insufficient_search/misinterpretation/correct_detection/ambiguous",
+    "error_type": "knowledge_cutoff",
     "explanation": "Brief explanation of why this error type was identified"
 }},
```
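The brace-escaping note above can be demonstrated with a short `str.format()` sketch. The template text here is illustrative, not the repository's actual prompt: doubled braces survive formatting as literal braces, so the rendered example is valid JSON even though the raw template is not.

```python
import json

# '{{' and '}}' render as literal '{' and '}' after str.format(),
# while single-brace fields are substituted.
template = '{{"snippet_id": "{snippet_id}", "confidence": {confidence}}}'
rendered = template.format(snippet_id="abc-123", confidence=85)
print(rendered)
print(json.loads(rendered))
```

This is also why draft comment 7 above warns about unexpected curly braces in the input data: a stray single brace in a substituted value is harmless, but one in the template itself would raise a formatting error.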
🧹 Nitpick comments (5)

`supabase/database/sql/get_snippets_with_recent_dislikes.sql` (1)

**1-2:** Minor: `DROP FUNCTION` before `CREATE OR REPLACE` is typically redundant. `CREATE OR REPLACE FUNCTION` already handles replacing existing functions with the same signature. The `DROP FUNCTION` is only necessary if the function signature (parameter types/count) changes. Consider removing line 1 unless signature changes are expected, or add a comment explaining the intent.

`src/processing_pipeline/snippet_feedback_validation/__init__.py` (1)

**4-4:** Consider sorting `__all__` alphabetically. Static analysis (Ruff RUF022) suggests applying isort-style sorting to `__all__`. This is a minor style consistency issue.

🔎 Proposed fix

```diff
-__all__ = ["snippet_feedback_validation", "FeedbackValidationOutput"]
+__all__ = ["FeedbackValidationOutput", "snippet_feedback_validation"]
```

`src/processing_pipeline/supabase_utils.py` (1)

**454-491:** Consider a parameter object for maintainability. The method accepts 14 parameters, which can make it harder to maintain and call correctly. While this is acceptable for a database insertion method, consider refactoring to accept a data model or dictionary parameter if the signature grows further.

Example refactor approach:

```python
def insert_feedback_validation_result(
    self,
    validation_data: dict,
):
    response = (
        self.client.table("snippet_feedback_validation_results")
        .insert(validation_data)
        .execute()
    )
    return response.data[0] if response.data else None
```

This approach would require the caller to construct the dictionary, but reduces the method's parameter count and makes it easier to extend.

`src/services/gemini_client.py` (1)

**67-72:** Handle a None value for the `parsed` field. When `response_schema` is None, `response.parsed` will also be None, but the returned dictionary always includes a `"parsed"` key. Consider documenting this behavior or handling the None case explicitly.

Optional improvement:

```diff
 return {
     "text": text,
-    "parsed": response.parsed,
+    "parsed": response.parsed if response_schema else None,
     "grounding_metadata": grounding_metadata,
     "thought_summaries": thought_summaries,
 }
```

Or add a docstring clarifying that `parsed` can be None.

`src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py` (1)

**412-425:** Consider making the rate limit configurable. The hardcoded 2-second sleep (line 425) between snippet processing could be moved to a parameter or configuration to allow tuning based on API quota and urgency.

Optional improvement:

```diff
 @optional_flow(
     name="Snippet Feedback Validation",
     log_prints=True,
     timeout_seconds=3600,
 )
-def snippet_feedback_validation(lookback_days: int, limit: int | None):
+def snippet_feedback_validation(
+    lookback_days: int,
+    limit: int | None,
+    rate_limit_seconds: float = 2.0,
+):
     ...
     for snippet in snippets:
         result = process_snippet(...)
         if result:
             success_count += 1
         else:
             error_count += 1
-        time.sleep(2)
+        time.sleep(rate_limit_seconds)
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (11)

- `prompts/snippet_feedback_validation/output_schema.json`
- `prompts/snippet_feedback_validation/system_instruction.md`
- `prompts/snippet_feedback_validation/user_prompt.md`
- `src/processing_pipeline/constants.py`
- `src/processing_pipeline/main.py`
- `src/processing_pipeline/snippet_feedback_validation/__init__.py`
- `src/processing_pipeline/snippet_feedback_validation/models.py`
- `src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py`
- `src/processing_pipeline/supabase_utils.py`
- `src/services/gemini_client.py`
- `supabase/database/sql/get_snippets_with_recent_dislikes.sql`
🧰 Additional context used
🧬 Code graph analysis (3)

`src/processing_pipeline/main.py` (1)
- `src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py` (1)
  - `snippet_feedback_validation` (388-427)

`src/services/gemini_client.py` (1)
- `src/processing_pipeline/constants.py` (1)
  - `GeminiModel` (5-11)

`src/processing_pipeline/snippet_feedback_validation/__init__.py` (2)
- `src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py` (1)
  - `snippet_feedback_validation` (388-427)
- `src/processing_pipeline/snippet_feedback_validation/models.py` (1)
  - `FeedbackValidationOutput` (53-89)
🪛 Ruff (0.14.10)
src/services/gemini_client.py
50-50: Avoid specifying long messages outside the exception class
(TRY003)
56-56: Avoid specifying long messages outside the exception class
(TRY003)
src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py
290-290: Avoid specifying long messages outside the exception class
(TRY003)
376-376: Consider moving this statement to an else block
(TRY300)
378-378: Do not catch blind exception: Exception
(BLE001)
391-391: Avoid specifying long messages outside the exception class
(TRY003)
src/processing_pipeline/snippet_feedback_validation/__init__.py
4-4: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🔇 Additional comments (15)

`prompts/snippet_feedback_validation/output_schema.json` (1)

**1-135:** LGTM! The JSON schema is well-structured and properly aligned with the `FeedbackValidationOutput` Pydantic model in `models.py`. The `prompt_improvement_suggestion` field is correctly defined as nullable and excluded from required fields, matching the `Optional` type in the model.

`supabase/database/sql/get_snippets_with_recent_dislikes.sql` (1)

**11-100:** LGTM! The function is well-structured with clear CTE layering. The filtering logic for validated snippets, ordering by dislike count, and aggregation of labels/comments is correctly implemented. The use of `COALESCE` for empty arrays is a good defensive pattern.

`src/processing_pipeline/snippet_feedback_validation/models.py` (1)

**1-89:** LGTM! The Pydantic models are well-designed with appropriate type constraints using `Literal` for enum-like fields and `Field(ge=0, le=100)` for confidence bounds. The inline comments on `ErrorPatternDetected.error_type` values provide helpful documentation. The models align with the JSON schema and will provide robust validation.

`prompts/snippet_feedback_validation/system_instruction.md` (1)

**1-12:** LGTM! The system instruction is appropriately concise, establishing role context without duplicating the detailed format specification in the user prompt. The three validation outcomes align with the `ValidationDecision.status` values in the Pydantic model.

`src/processing_pipeline/constants.py` (1)

**88-95:** LGTM! Good use of context managers (`with open(...) as f`) for file handling, which ensures proper resource cleanup. This is a better pattern than some of the existing functions in this file.

`src/processing_pipeline/main.py` (1)
**93-100:** LGTM! The deployment configuration is appropriate for a daily validation job:

- Low concurrency (`limit=1`) prevents resource contention
- Daily 6 AM UTC schedule aligns with commit message intent
- `lookback_days=1` focuses on recent feedback

Note that `limit=None` means all matching snippets will be processed. If snippet volume grows significantly, consider adding a reasonable limit to prevent very long runs.

`prompts/snippet_feedback_validation/user_prompt.md` (1)

**1-171:** LGTM! The prompt is comprehensive and well-structured with clear sections for:

- Original snippet context with translation and metadata
- Stage 3 assessment details
- User feedback
- Verification guidelines with specific protocols
- Error pattern reference table
- Confidence thresholds that align with the `needs_review` outcome

The placeholder structure (`{snippet_id}`, `{transcription}`, etc.) is appropriate for template substitution.

`src/processing_pipeline/supabase_utils.py` (1)
**438-452:** LGTM! Clean RPC wrapper with good defaults. The method properly handles optional date conversion and provides a safe fallback with `response.data or []`.

`src/services/gemini_client.py` (2)

**19-30:** LGTM! Well-structured method signature with keyword-only arguments. Using keyword-only arguments (after `*`) prevents positional argument errors and makes calls more readable.

**74-96:** Review safety settings for production readiness. All harm categories are set to `BLOCK_NONE`, which disables content safety filters entirely. While this may be necessary for analyzing disinformation content, ensure this aligns with your content moderation policy and consider whether any categories should have stricter thresholds. Does the use case require completely disabling all safety filters, or should certain categories (like HARM_CATEGORY_DANGEROUS_CONTENT) maintain some level of filtering?
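The keyword-only pattern praised above can be shown in a few lines. The signature is a simplified stand-in, not the actual GeminiClient method: everything after the bare `*` must be passed by name, so a caller cannot silently swap the prompt and the system instruction.

```python
def generate_content(*, model: str, user_prompt: str, system_instruction: str = "") -> str:
    """All parameters after the bare * are keyword-only."""
    return f"[{model}] {user_prompt}"

print(generate_content(model="gemini", user_prompt="validate snippet"))
# Positional calls fail fast instead of silently swapping arguments:
try:
    generate_content("gemini", "validate snippet")
except TypeError as exc:
    print("rejected:", exc)
```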
`src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py` (5)

**92-147:** LGTM! Clean liveblocks content parser. The nested helper functions handle various content types (bold, italic, links, mentions) and properly convert to markdown-like syntax.

**150-158:** LGTM! Simple and effective task wrapper. The retry decorator with logging is appropriate for network calls to Supabase.

**161-265:** LGTM! Comprehensive prompt construction. The function extracts and formats all relevant snippet data, handling missing values gracefully and building a detailed validation prompt.

**268-280:** LGTM! Clean Gemini validation wrapper. Properly configures the client with appropriate token budget and Google Search tool integration.

**283-296:** LGTM! Robust JSON parsing with validation. The function finds JSON boundaries and validates against the Pydantic model, with proper error propagation.
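The Literal/Field constraints described in these comments can be sketched with a minimal Pydantic v2 model. The field set below is an illustrative reduction of `ValidationDecision` (status values are taken from the review; `primary_reason` follows the schema excerpt shown earlier), not the repository's full model.

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class ValidationDecision(BaseModel):
    status: Literal["false_positive", "true_positive", "needs_review"]
    confidence: int = Field(ge=0, le=100)  # bounded like Field(ge=0, le=100)
    primary_reason: str

decision = ValidationDecision(
    status="needs_review", confidence=55, primary_reason="Conflicting sources"
)
print(decision.status, decision.confidence)

# An out-of-range confidence is rejected at parse time:
try:
    ValidationDecision(status="needs_review", confidence=150, primary_reason="x")
except ValidationError:
    print("rejected out-of-bounds confidence")
```

This is the "robust validation" the review refers to: malformed or out-of-range model output fails loudly at the parsing boundary instead of propagating into the database.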
```python
def format_grounding_metadata(grounding_metadata) -> str:
    """Format grounding metadata into a readable summary of searches performed."""
    if not grounding_metadata:
        return "No search evidence available from Stage 3."

    if isinstance(grounding_metadata, str):
        try:
            grounding_metadata = json.loads(grounding_metadata)
        except json.JSONDecodeError:
            return grounding_metadata

    # Handle different formats of grounding metadata
    lines = []

    # If it's a list of tool calls (CLI method format)
    if isinstance(grounding_metadata, list):
        for i, item in enumerate(grounding_metadata, 1):
            if isinstance(item, dict):
                params = item.get("parameters", item.get("input", {}))
                output = item.get("output", item.get("result", ""))

                # Extract search query if present
                query = None
                if isinstance(params, dict):
                    query = params.get("query", params.get("q", params.get("search_query")))
                elif isinstance(params, str):
                    query = params

                if query:
                    lines.append(f"**Search {i}:** {query}")
                    if output and len(str(output)) < 500:
                        lines.append(f"  Result: {output[:500]}...")
                    elif output:
                        lines.append(f"  Result: [truncated - {len(str(output))} chars]")

    # If it's a dict with search_queries or similar structure (SDK method format)
    elif isinstance(grounding_metadata, dict):
        # Handle Google Search grounding format
        if "search_entry_point" in grounding_metadata:
            rendered_content = grounding_metadata.get("search_entry_point", {}).get("rendered_content", "")
            if rendered_content:
                lines.append(f"**Search context:** {rendered_content[:500]}")

        if "grounding_chunks" in grounding_metadata:
            chunks = grounding_metadata["grounding_chunks"]
            for i, chunk in enumerate(chunks[:10], 1):  # Limit to first 10 chunks
                web = chunk.get("web", {})
                uri = web.get("uri", "")
                title = web.get("title", "")
                if uri or title:
                    lines.append(f"**Source {i}:** [{title}]({uri})" if title else f"**Source {i}:** {uri}")

        if "grounding_supports" in grounding_metadata:
            supports = grounding_metadata["grounding_supports"]
            for support in supports[:5]:  # Limit to first 5
                segment = support.get("segment", {})
                text = segment.get("text", "")
                if text:
                    lines.append(f"- Supported claim: \"{text[:200]}...\"" if len(text) > 200 else f"- Supported claim: \"{text}\"")

        # Fallback: just dump key info
        if not lines:
            for key, value in grounding_metadata.items():
                if value and key not in ["search_entry_point"]:
                    lines.append(f"**{key}:** {str(value)[:300]}")

    if not lines:
        return "Stage 3 search metadata format not recognized. Raw data available but could not be parsed."

    return "\n".join(lines)
```
**Fix truncation logic in output formatting.**

Line 50 checks `if output and len(str(output)) < 500` but then displays `output[:500]...`, which would never truncate since the condition already ensures the length is below 500. The condition should likely be `>= 500` or `> 500`.
🔎 Proposed fix

```diff
 if query:
     lines.append(f"**Search {i}:** {query}")
-    if output and len(str(output)) < 500:
-        lines.append(f"  Result: {output[:500]}...")
-    elif output:
+    if output:
+        output_str = str(output)
+        if len(output_str) <= 500:
+            lines.append(f"  Result: {output_str}")
+        else:
+            lines.append(f"  Result: {output_str[:500]}...")
+    else:
         lines.append(f"  Result: [truncated - {len(str(output))} chars]")
```

Note: The final `else` branch in the original code (lines 52-53) also has inverted logic: it claims truncation but the condition ensures the string is short.
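A tiny standalone version of the corrected truncation logic makes the intended behavior explicit. The helper name and the `[truncated from N chars]` marker are illustrative, not the proposed fix verbatim.

```python
def format_search_result(output, limit: int = 500) -> str:
    """Return the full result when short; truncate with a marker when long."""
    output_str = str(output)
    if len(output_str) <= limit:
        return f"Result: {output_str}"
    return f"Result: {output_str[:limit]}... [truncated from {len(output_str)} chars]"

print(format_search_result("short answer"))
print(format_search_result("x" * 600, limit=50))
```

Unlike the original code, the branch that truncates is the one guarded by the over-limit condition, so short outputs are printed whole and only long ones are cut.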
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def format_grounding_metadata(grounding_metadata) -> str:
    """Format grounding metadata into a readable summary of searches performed."""
    if not grounding_metadata:
        return "No search evidence available from Stage 3."

    if isinstance(grounding_metadata, str):
        try:
            grounding_metadata = json.loads(grounding_metadata)
        except json.JSONDecodeError:
            return grounding_metadata

    # Handle different formats of grounding metadata
    lines = []

    # If it's a list of tool calls (CLI method format)
    if isinstance(grounding_metadata, list):
        for i, item in enumerate(grounding_metadata, 1):
            if isinstance(item, dict):
                params = item.get("parameters", item.get("input", {}))
                output = item.get("output", item.get("result", ""))

                # Extract search query if present
                query = None
                if isinstance(params, dict):
                    query = params.get("query", params.get("q", params.get("search_query")))
                elif isinstance(params, str):
                    query = params

                if query:
                    lines.append(f"**Search {i}:** {query}")
                    if output:
                        output_str = str(output)
                        if len(output_str) <= 500:
                            lines.append(f" Result: {output_str}")
                        else:
                            lines.append(f" Result: {output_str[:500]}...")

    # If it's a dict with search_queries or similar structure (SDK method format)
    elif isinstance(grounding_metadata, dict):
        # Handle Google Search grounding format
        if "search_entry_point" in grounding_metadata:
            rendered_content = grounding_metadata.get("search_entry_point", {}).get("rendered_content", "")
            if rendered_content:
                lines.append(f"**Search context:** {rendered_content[:500]}")

        if "grounding_chunks" in grounding_metadata:
            chunks = grounding_metadata["grounding_chunks"]
            for i, chunk in enumerate(chunks[:10], 1):  # Limit to first 10 chunks
                web = chunk.get("web", {})
                uri = web.get("uri", "")
                title = web.get("title", "")
                if uri or title:
                    lines.append(f"**Source {i}:** [{title}]({uri})" if title else f"**Source {i}:** {uri}")

        if "grounding_supports" in grounding_metadata:
            supports = grounding_metadata["grounding_supports"]
            for support in supports[:5]:  # Limit to first 5
                segment = support.get("segment", {})
                text = segment.get("text", "")
                if text:
                    lines.append(f"- Supported claim: \"{text[:200]}...\"" if len(text) > 200 else f"- Supported claim: \"{text}\"")

        # Fallback: just dump key info
        if not lines:
            for key, value in grounding_metadata.items():
                if value and key not in ["search_entry_point"]:
                    lines.append(f"**{key}:** {str(value)[:300]}")

    if not lines:
        return "Stage 3 search metadata format not recognized. Raw data available but could not be parsed."

    return "\n".join(lines)
```
🤖 Prompt for AI Agents

```
In src/processing_pipeline/snippet_feedback_validation/snippet_feedback_validation.py
around lines 20-89, the truncation logic is inverted: the code checks `if output
and len(str(output)) < 500` but then slices to 500 and adds "..." (and the else
branch similarly claims truncation for short strings). Change the branching so
that short outputs are shown unmodified (e.g., if output and len(str(output)) <=
500: append full output) and long outputs are truncated (else: append
output[:500] with a truncation note), ensuring the condition uses `>` (or `> 500`)
for the truncation branch.
```
```python
@optional_task(log_prints=True)
def process_snippet(
    supabase_client: SupabaseClient,
    gemini_client: GeminiClient,
    model_name: GeminiModel,
    snippet,
):
    snippet_id = snippet["id"]
    print(f"Processing snippet: {snippet_id}")

    try:
        user_prompt = build_validation_prompt(snippet, snippet["comments"])
        gemini_response = validate_with_gemini(gemini_client, model_name, user_prompt)
        parsed_response = parse_validation_response(gemini_response["text"])

        # Prepare input data for audit
        input_snippet_data = {
            k: v
            for k, v in snippet.items()
            if k not in ["grounding_metadata", "thought_summaries", "labels", "comments"]
        }

        # Save result
        save_validation_result(
            supabase_client=supabase_client,
            snippet_id=snippet_id,
            parsed_response=parsed_response,
            grounding_metadata=gemini_response["grounding_metadata"],
            thought_summaries=gemini_response["thought_summaries"],
            model_name=model_name,
            input_snippet_data=input_snippet_data,
            input_user_feedback={
                "labels": snippet["labels"],
                "comments": snippet["comments"],
            },
            dislike_count=snippet["dislike_count"],
        )

        print(
            f"Validation complete for snippet {snippet_id}: {parsed_response['validation_decision']['status']}\n\n"
            f"Error pattern: {parsed_response['error_pattern']['error_type']}"
        )
        return True

    except Exception as e:
        print(f"Error processing snippet {snippet_id}: {e}")
        return False
```
🛠️ Refactor suggestion | 🟠 Major
Improve exception handling specificity.
The broad `except Exception` catch on line 378 can mask unexpected errors and make debugging harder. Consider catching specific exceptions (e.g., `ValidationError`, `ValueError`, `ConnectionError`) and re-raising truly unexpected ones.
🔎 Proposed improvement

```diff
     try:
         user_prompt = build_validation_prompt(snippet, snippet["comments"])
         gemini_response = validate_with_gemini(gemini_client, model_name, user_prompt)
         parsed_response = parse_validation_response(gemini_response["text"])

         # Prepare input data for audit
         input_snippet_data = {
             k: v
             for k, v in snippet.items()
             if k not in ["grounding_metadata", "thought_summaries", "labels", "comments"]
         }

         # Save result
         save_validation_result(
             supabase_client=supabase_client,
             snippet_id=snippet_id,
             parsed_response=parsed_response,
             grounding_metadata=gemini_response["grounding_metadata"],
             thought_summaries=gemini_response["thought_summaries"],
             model_name=model_name,
             input_snippet_data=input_snippet_data,
             input_user_feedback={
                 "labels": snippet["labels"],
                 "comments": snippet["comments"],
             },
             dislike_count=snippet["dislike_count"],
         )

         print(
             f"Validation complete for snippet {snippet_id}: {parsed_response['validation_decision']['status']}\n\n"
             f"Error pattern: {parsed_response['error_pattern']['error_type']}"
         )
         return True
-    except Exception as e:
+    except (ValidationError, ValueError, KeyError, ConnectionError) as e:
         print(f"Error processing snippet {snippet_id}: {e}")
         return False
+    except Exception as e:
+        print(f"Unexpected error processing snippet {snippet_id}: {e}")
+        raise
```

This allows known, recoverable errors to be logged and skipped, while re-raising unexpected errors for investigation.
🧰 Tools
🪛 Ruff (0.14.10)
376-376: Consider moving this statement to an else block
(TRY300)
378-378: Do not catch blind exception: Exception
(BLE001)
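Independently of the pipeline code, the specific-then-re-raise shape proposed above can be sketched as a small, testable helper (the `run_step` name and the exact exception tuple are illustrative, not taken from the PR):

```python
def run_step(fn):
    """Run one pipeline step: log and skip known recoverable errors, re-raise the rest."""
    try:
        return fn()
    except (ValueError, KeyError, ConnectionError) as e:
        # Known failure modes: record and let the loop continue with the next item
        print(f"Recoverable error, skipping: {e}")
        return None
    except Exception as e:
        # Anything else is a genuine bug: surface it to the caller/scheduler
        print(f"Unexpected error, re-raising: {e}")
        raise

def known_failure():
    raise KeyError("missing field")  # recoverable: logged, step skipped

def unknown_failure():
    raise RuntimeError("genuine bug")  # unexpected: propagated

print(run_step(lambda: "ok"))  # prints: ok
```

The batch loop keeps its skip-and-continue behavior for anticipated failures, while programming errors still fail loudly instead of being silently counted as "error" rows.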
```python
thought_summaries = ""
for part in response.candidates[0].content.parts:
    if part.thought and part.text:
        thought_summaries += part.text
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n src/services/gemini_client.py | head -100
```

Repository: PublicDataWorks/verdad
Length of output: 3984

🏁 Script executed:

```shell
rg -nP -C5 'thought_summaries' --type=py
```

Repository: PublicDataWorks/verdad
Length of output: 22214

🏁 Script executed:

```shell
rg -nP -C5 'part\.thought' --type=py
```

Repository: PublicDataWorks/verdad
Length of output: 1740

🏁 Script executed:

```shell
git log --oneline src/services/gemini_client.py | head -10
```

Repository: PublicDataWorks/verdad
Length of output: 127

🏁 Script executed:

```shell
rg -nP 'thinking|thought' --type=py -A2 -B2 | grep -A3 -B3 'part\.'
```

Repository: PublicDataWorks/verdad
Length of output: 987

🏁 Script executed:

```shell
rg -nP 'Gemini.*API|google.*genai' --type=py -A2 | head -50
```

Repository: PublicDataWorks/verdad
Length of output: 3898
Append part.thought instead of part.text in thought extraction logic.
The condition checks `if part.thought and part.text:` but only appends `part.text` to `thought_summaries`. Based on the variable name and the field description ("reasoning process"), this should append `part.thought` instead.
This same pattern appears in src/processing_pipeline/stage_3.py:596-597 and should be corrected there as well.
🤖 Prompt for AI Agents

```
In src/services/gemini_client.py around lines 58-61, the loop checks "if
part.thought and part.text" but appends part.text to thought_summaries; change
the append to part.thought (or simplify the condition to check part.thought
only) so the reasoning content is captured; do the same fix in
src/processing_pipeline/stage_3.py around lines 596-597 where the identical
pattern appears.
```
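Whichever field ultimately carries the reasoning text, the extraction under discussion is easier to verify when isolated into a helper over a stand-in part type; a sketch (the `Part` dataclass below is a hypothetical stand-in for the SDK's response parts, not the actual class):

```python
from dataclasses import dataclass

@dataclass
class Part:
    thought: bool  # marks this part as model reasoning rather than final answer
    text: str

def collect_thought_summaries(parts) -> str:
    """Concatenate the text of all parts flagged as thoughts."""
    return "".join(p.text for p in parts if p.thought and p.text)

parts = [
    Part(thought=True, text="step 1; "),
    Part(thought=False, text="final answer"),
]
print(collect_thought_summaries(parts))  # prints: step 1;
```

Pulling the loop into a pure function like this lets the disputed semantics of `part.thought` be pinned down with a unit test against real SDK responses.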
Important
Adds a new validation system for user feedback on flagged snippets, integrating it into the processing pipeline with schema, models, and database support.
- Adds `snippet_feedback_validation` to validate user feedback on flagged snippets.
- Uses `GeminiClient` to generate validation results based on user feedback and snippet data.
- Adds `output_schema.json` for the validation output format.
- Adds `system_instruction.md` and `user_prompt.md` for validation task instructions.
- Adds `FeedbackValidationOutput`, `VerificationResult`, `UserFeedbackAssessment`, `ValidationDecision`, and `ErrorPatternDetected` in `models.py`.
- Adds `get_snippets_with_recent_dislikes.sql` to fetch snippets with dislikes.
- Extends `SupabaseClient` with methods to fetch and insert validation results.
- Updates `main.py` to include `snippet_feedback_validation` in the processing pipeline with a scheduled task.

This description was created by for 7fe43b7. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Release Notes
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.