Run the Q&A pipeline in-process (remove Render Workflows) by Ho1yShif · Pull Request #2 · render-examples/pydantic-agents

Ho1yShif · 2026-06-28T03:15:11Z

Problem

Clicking a question in the deployed UI returned 503 → frontend "API error". POST /ask delegated the pipeline to a separate Render Workflows service via render.workflows.start_task(...), which requires the sync:false Blueprint var WORKFLOW_SLUG. When that's unset, the endpoint raised 503 WORKFLOW_SLUG is not configured. This stack should not depend on Render Workflows and should stay deployable on free Render tiers.

Fix — run the pipeline in-process

backend/pipeline/orchestrator.py (new): run_qa_pipeline ported from the deleted workflows/app.py, calling the stage functions directly (no @app.task wrappers, no JSON serialization round-trips). The accuracy + dual-model quality checks still overlap via a single asyncio.gather.
backend/database.py: new pipeline_runs table + set_run_status/get_run_status so GET /ask/{run_id} reads terminal status without polling Workflows (uses updated_at, consistent with the sibling pipeline_progress table).
backend/main.py: POST /ask launches the orchestrator as a background task; GET /ask/{run_id} reads run status + live progress. Response shape unchanged → no frontend changes.
Dropped the render-sdk dependency, render_api_key/workflow_slug config, and the entire workflows/ directory.
Ingest: cron repointed to the in-process data/scripts/ingest_pages.py, which now loads the core corpus then the live sources on a no-arg run.
render.yaml: removed the pipeline-trigger env group; renamed services pydantic-agents-workflows-* → pydantic-agents-*. Stays on free tiers (starter web + static + basic-1gb Postgres).
Updated README.md and .env.example off the Workflows model.
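The in-process flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `RUNS` dict is a hypothetical in-memory stand-in for the `pipeline_runs` table, `start_run`/`get_run` stand in for the `POST /ask` and `GET /ask/{run_id}` handlers, and the two check functions are placeholders for the real model calls.

```python
import asyncio
import uuid
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the pipeline_runs table.
RUNS: Dict[str, dict] = {}


async def check_accuracy(answer: str) -> float:
    await asyncio.sleep(0.01)  # placeholder for a model call
    return 97.0


async def evaluate_quality(answer: str) -> float:
    await asyncio.sleep(0.01)  # placeholder for the dual-model evaluation
    return 89.5


async def run_qa_pipeline(run_id: str, question: str) -> None:
    """Run the stages directly in this process and record a terminal status."""
    RUNS[run_id] = {"status": "running", "result": None}
    try:
        answer = f"answer to: {question}"  # placeholder for retrieval + generation
        # The accuracy and quality checks overlap in a single gather.
        accuracy, quality = await asyncio.gather(
            check_accuracy(answer), evaluate_quality(answer)
        )
        RUNS[run_id] = {
            "status": "done",
            "result": {"answer": answer, "accuracy": accuracy, "quality": quality},
        }
    except Exception as exc:
        RUNS[run_id] = {"status": "failed", "result": {"error": str(exc)}}


def start_run(question: str) -> Tuple[str, asyncio.Task]:
    """POST /ask analogue: schedule the pipeline and return immediately."""
    run_id = uuid.uuid4().hex
    RUNS[run_id] = {"status": "queued", "result": None}
    task = asyncio.create_task(run_qa_pipeline(run_id, question))
    return run_id, task


def get_run(run_id: str) -> Optional[dict]:
    """GET /ask/{run_id} analogue: None maps to a 404 for unknown ids."""
    return RUNS.get(run_id)
```

`start_run` must be called from inside a running event loop, which is the case in an async web handler. A caller would return the `run_id` with a 202 and let the client poll `get_run` until the status is terminal.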

All "positive changes" from the workflows lineage (HNSW index, per-client history, query-expansion fix, consolidated ingest) were already on main and are preserved.

Verification (local, 1339 docs)

POST /ask → 202 (no 503); polled to done across all 7 stages with live progress updates.
Final answer: 12 sources, 31 claims, quality 89.5 / accuracy 97; session persisted.
/history correctly scoped per client_id (owner sees it; other clients don't).
Unknown run_id → 404.
Single-source in-process ingest works (ingest_pages.py pricing).
uv sync (render-sdk removed), compileall clean, ruff shows only the pre-existing E402 (deliberate load_dotenv()-first import order).

🤖 Generated with Claude Code

- workflows/app.py: add _stage_result helper to replace 8x repeated PipelineStageResult construction; consolidate the 6 page-injection tasks behind _run_ingest_script; tighten docstrings and fix the stale execute_pipeline / evaluation.py line references
- workflows/serialization.py: remove unused evaluations_*/stages_to_json helpers
- backend/pipeline/quality_gate.py: delete commented-out accuracy-gate block, condense the rationale note
- README.md: dedupe the local-dev/env-group blockquotes, drop the repeated Blueprints note, trim generic capability bullets
- remove the stale duplicate env.example (.env.example is canonical)

No behavior changes. Verified: all modules import, dev server registers all 15 tasks, _stage_result builds correct stage shapes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lities

- Revised project description to include "Render Workflows" for better context.
- Added a new section detailing Render capabilities, including durable workflow tasks and PostgreSQL integration.
- Removed outdated example questions and streamlined the architecture explanation.
- Clarified local development requirements for asking questions and running the stack.

These changes improve the documentation's accuracy and usability for developers.

…setup instructions

- Changed the deployment link to point to the updated repository for Pydantic Agents Workflows.
- Removed the manual setup section to streamline the README and focus on essential information for developers.

These updates enhance the clarity and relevance of the documentation.

…ification

Address audit feedback on the retrieval/generation layer and reorganize the workflow tasks into distinct, non-duplicative verification capabilities.

retrieval.py
- Replace the five near-identical detect_*/inject_* pairs (~350 lines of duplicated fetchrow + metadata-parse + prepend) with a declarative INJECTION_RULES table iterated by one inject_curated_docs helper.
- Pin injected docs at a named INJECTED_DOC_SCORE constant (was a bare 0.95 literal) so the editorial pinning is explicit, not disguised as a real match.
- Fix the multi-query boost off-by-one: boost i == 0 (the original question, per query_expansion.py) instead of i == 1 (first expanded variation).

generation.py
- Rewrite ANSWER_GENERATION_INSTRUCTIONS from an 87-line all-caps prompt to a lean, neutral set of grounding rules: drop the marketing steering and per-topic carve-outs, and remove the hardcoded "20" so it can't desync from rag_top_k.
- Delete the unused stream_answer (and its export / AsyncGenerator import), which also removes the copy-pasted context + feedback block.

accuracy.py / evaluation.py
- Sharpen the two Claude judges into distinct roles: Accuracy owns factual grounding (errors/corrections -> feedback loop); Quality owns developer experience (clarity/completeness/usefulness -> gate score).

workflows/app.py + docs
- Re-group the orchestrator into Generate / Grounding / Accuracy+Quality phases and update README/PIPELINE.md to the 3-capability framing and neutral answers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…low-showcase refactor: data-driven RAG injection, neutral prompt, 3-capability verification

The pipeline reported a near-constant "~21 relevant documents" for every question because retrieval was a fixed top-k quota, not a relevance filter, plus a forced +1 injection. This makes the count reflect actual relevance and puts all scores on one interpretable scale.

hybrid_search (database.py)
- Stop overwriting the cosine similarity with the raw RRF score. Return cosine (0-1) as similarity_score; use RRF only to order the fused set.
- Apply the similarity_threshold as a real gate on the FINAL returned set (not just the semantic candidate pool), so the result count is dynamic and <= k.
- Remove pre-existing unused imports (numpy, DocumentChunk).

retrieve_documents (retrieval.py)
- Multi-query merge: dedup by max cosine, drop the off-scale 1.15 boost (it could exceed 1.0 and was the off-by-one), prefer original-question hits only as an ordering tiebreak, sort by cosine, cap at rag_top_k.

inject_curated_docs (retrieval.py)
- Replace-weakest policy: if a topic's curated doc was already retrieved, leave it; otherwise insert at top (score 1.0) and, only if over the rag_top_k ceiling, drop the lowest-ranked retrieved doc. Never pads the count, never duplicates. INJECTED_DOC_SCORE 0.95 -> 1.0 (top of the cosine scale).

config.py
- Document rag_top_k as a ceiling and similarity_threshold as a real relevance gate (with tuning guidance); drop unused Optional import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
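As a rough illustration of the hybrid_search change, the sketch below orders candidates by reciprocal rank fusion but reports and gates on cosine similarity. The function name, the default `threshold`, `k`, and `rrf_k` values, and the choice to treat keyword-only hits as cosine 0.0 are all assumptions made for this example, not values taken from the repository.

```python
from typing import Dict, List, Tuple


def fuse_and_gate(
    semantic: List[str],
    keyword: List[str],
    cosine: Dict[str, float],
    threshold: float = 0.3,  # assumed similarity_threshold
    k: int = 20,             # assumed rag_top_k ceiling
    rrf_k: int = 60,         # conventional RRF smoothing constant
) -> List[Tuple[str, float]]:
    """semantic/keyword are doc ids ranked best-first; cosine maps id -> similarity."""
    # RRF is used only to decide the ORDER of the fused candidate set.
    rrf: Dict[str, float] = {}
    for ranking in (semantic, keyword):
        for rank, doc_id in enumerate(ranking, start=1):
            rrf[doc_id] = rrf.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    ordered = sorted(rrf, key=lambda doc_id: rrf[doc_id], reverse=True)

    # The reported score stays on the cosine scale, and the threshold gates
    # the final set, so the result count is dynamic and never exceeds k.
    gated = [
        (doc_id, cosine.get(doc_id, 0.0))
        for doc_id in ordered
        if cosine.get(doc_id, 0.0) >= threshold
    ]
    return gated[:k]
```

With this shape, a question with few relevant documents returns a short list instead of always filling the top-k quota.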

…etrieval fix: relevance-gated retrieval + replace-weakest injection

Updated the RAG configuration to adjust the `rag_top_k` value and introduced a new `relevance_cutoff_fraction` for improved document selection. Implemented an adaptive relevance cutoff in the retrieval pipeline to dynamically filter documents based on their similarity scores relative to the best match. This change aims to enhance the quality of retrieved documents by reducing the inclusion of marginally relevant results. Additionally, modified the ProgressTracker component to conditionally display status messages for better user feedback.
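One plausible reading of the adaptive cutoff is sketched below: keep only documents whose score is at least a fraction of the best match's score. The function name and the 0.75 default are hypothetical; the commit message names `relevance_cutoff_fraction` but does not state its value or the exact rule.

```python
from typing import List, Tuple


def apply_relevance_cutoff(
    scored_docs: List[Tuple[str, float]],
    cutoff_fraction: float = 0.75,  # hypothetical relevance_cutoff_fraction
) -> List[Tuple[str, float]]:
    """Drop docs scoring below cutoff_fraction of the best similarity score."""
    if not scored_docs:
        return []
    best = max(score for _, score in scored_docs)
    floor = best * cutoff_fraction
    return [(doc_id, score) for doc_id, score in scored_docs if score >= floor]
```

Because the floor moves with the best match, a strong top hit prunes marginal results more aggressively than a fixed absolute threshold would.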

The ingestion side fanned out 6 near-identical @app.task wrappers (one per page), each spinning up an instance to do a single embed + insert, backed by ~1,280 lines of ~90% copy-paste across 6 scripts. Replace that with a source-oriented, data-driven design.

Ingestion (8 tasks -> 3):
- backend/ingestion.py: shared embed_documents() + replace_source() helpers (the delete-by-source + insert block was copy-pasted across all 7 scripts).
- data/sources.py: SOURCES registry with curated-page / pricing-table / tutorials-index build strategies; curated content inlined as constants.
- workflows/app.py: ingest_core (unchanged), ingest_source(name) (one parameterized, retried task), ingest_all (fans ingest_source over the registry). Removed _run_ingest_script + the 6 add_* wrappers.
- data/scripts/ingest_pages.py + Makefile: local-dev parity without the deleted scripts.

Backend pipeline dedup (light helpers):
- backend/pipeline/_agents.py: anthropic_agent()/openai_agent() builders used by all 6 agents (drops a deprecated OpenAIModel usage).
- observability.usage_and_cost(): replaces ~6 copy-pasted token/cost blocks.
- evaluation.agreement_level(): extracted and reused in app.py.

Docs/Makefile updated; deleted the 6 data/scripts/add_*_page.py files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Added a new table `pipeline_progress` to the database for tracking live progress of in-flight pipeline runs, keyed by a unique token. Updated the `ask_question` endpoint to generate and return this token, allowing the UI to display real-time stage updates. Enhanced the `get_answer` endpoint to return cumulative progress updates when the token is provided. Modified the frontend to display progress messages and updated API calls to support the new progress tracking feature.

…tion-workflow-tasks Live pipeline progress tracking + ingestion consolidation

Modified the `_fetch_curated_docs` function to fetch multiple rows for source-based lookups, allowing for chunked document retrieval. Updated the `_curated_build` function to create chunked documents from curated markdown files, enhancing the semantic retrieval process. Adjusted the logic in `inject_curated_docs` to deduplicate based on content rather than title, ensuring more efficient document injection. This refactor improves the handling of curated documents and optimizes the retrieval pipeline.

…verage

Claim verification embeds each claim and does a top-k similarity search over the corpus; a claim is only verified if a retrieved passage substantiates it. Curated docs were ingested as one whole-page chunk (e.g. the 2.6KB workflows_docs page → a single chunk in prod), so a narrow claim like "billed prorated by the second" had low cosine against the diluted multi-topic embedding and couldn't surface its supporting passage, verifying at 0% despite the fact being present in our content.

Add chunk_markdown_by_heading: split curated markdown into one focused chunk per ##/### section, keeping the heading vocabulary ("Pricing", "Beta Limitations") with the body so claims match the section that states them. Oversized sections fall back to paragraph chunking; tiny fragments merge into the previous chunk; heading-less files fall back to chunk_document. workflows_docs now yields 8 fact-level chunks instead of 1.

Requires re-ingesting the curated sources (delete-by-source replace) for the live corpus to pick up the finer chunks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
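A simplified sketch of heading-based chunking is shown below. It reuses the name `chunk_markdown_by_heading` from the commit message, but the body is an assumption: it splits on `##`/`###` headings and merges tiny fragments into the previous chunk, and it deliberately omits the oversized-section and `chunk_document` fallbacks. The `min_chars` parameter is hypothetical.

```python
import re
from typing import List

# Matches level-2 and level-3 markdown headings only ("## " or "### ").
HEADING = re.compile(r"^#{2,3} ")


def chunk_markdown_by_heading(text: str, min_chars: int = 40) -> List[str]:
    """Split markdown into one chunk per ##/### section, heading kept with body."""
    sections: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A new heading closes the section collected so far.
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks: List[str] = []
    for section in sections:
        if not section:
            continue
        # Tiny fragments are merged into the previous chunk.
        if chunks and len(section) < min_chars:
            chunks[-1] += "\n\n" + section
        else:
            chunks.append(section)
    return chunks
```

A file without any `##`/`###` heading comes back as a single chunk here, which is where the real implementation is described as falling back to `chunk_document`.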

Section-aware chunking for curated docs (raise verification coverage)

…race

ingest_all fans out an ingest_source task per source, each on a fresh instance calling vector_store.initialize(). The full schema DDL re-ran on every instance, so concurrent CREATE OR REPLACE FUNCTION / DROP+CREATE TRIGGER against the same Postgres catalog rows intermittently raised "tuple concurrently updated" (retried and eventually succeeded, but flaky).

Wrap the DDL block in a transaction guarded by a transaction-scoped advisory lock (pg_advisory_xact_lock) on a stable key, so only one instance runs schema init at a time; the rest wait and re-run the now-idempotent statements.

Fixes the race for every concurrent caller (ingest fan-out, gateway startup + QA run, parallel ingest_source triggers), not just ingest_all. Pool creation stays outside the lock; all statements are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
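The locking pattern can be sketched as below, assuming an asyncpg-style connection (an object with `transaction()` as an async context manager and an awaitable `execute()`). The function name `init_schema` and the lock key value are hypothetical; only `pg_advisory_xact_lock` comes from the commit message.

```python
from typing import Iterable

# Hypothetical stable key; any fixed bigint shared by all instances works.
SCHEMA_LOCK_KEY = 727_001


async def init_schema(conn, ddl_statements: Iterable[str]) -> None:
    """Run idempotent DDL while holding a transaction-scoped advisory lock."""
    async with conn.transaction():
        # Blocks until no other transaction holds the same key. The lock is
        # released automatically when this transaction commits or rolls back.
        await conn.execute("SELECT pg_advisory_xact_lock($1)", SCHEMA_LOCK_KEY)
        for statement in ddl_statements:
            await conn.execute(statement)
```

Because the lock is transaction-scoped, there is no explicit unlock call to forget, and a crashed instance cannot leave the key held.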

Serialize schema init with advisory lock (fix concurrent ingest race)

The ivfflat(lists=100) index had catastrophic recall on the ~1.3k-row corpus: with pgvector's default ivfflat.probes=1, each query scanned only ~1 of 100 lists (~1% of rows), so the true #1 nearest neighbor was frequently never retrieved. Claims whose supporting chunk matched at cosine >0.7 (exact rank #1) still verified at 0% confidence because the approximate index never returned the chunk. Switch to HNSW (pgvector 0.8.0), which gives effectively exact recall at this scale with no probe tuning, fixing both similarity_search (verification) and hybrid_search (answer retrieval). Drop the old index first so existing deployments migrate on their next initialize(); the build runs inside the existing schema-init advisory lock. Also raise the verifier's candidate pool from 5 to 10 as defense-in-depth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
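The recall problem follows from simple arithmetic, sketched here under the assumption that rows are spread evenly across the ivfflat lists (real list sizes vary). The helper name is hypothetical.

```python
def ivfflat_rows_scanned(n_rows: int, lists: int, probes: int) -> float:
    """Approximate rows examined per query, assuming evenly sized lists."""
    return n_rows * min(probes, lists) / lists


# With lists=100 and the default probes=1, a ~1.3k-row corpus scans
# roughly 13 rows per query, about 1% of the table.
rows = ivfflat_rows_scanned(1300, lists=100, probes=1)
```

Any true nearest neighbor that lands in one of the 99 unprobed lists is simply never considered, which matches the 0% verification symptom described above.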

fix: replace ivfflat vector index with HNSW (fixes 0% claim verification)

Update the documents schema snippet in HYBRID_SEARCH.md to use the HNSW index (with a note on why ivfflat's default probes=1 tanked recall) and note the pgvector >= 0.5.0 requirement in the README prerequisites. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs: reflect HNSW vector index in schema + prerequisites

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Refactor the iteration tracking in the run_qa_pipeline function to use a dedicated variable, iterations_run, for clarity. This change ensures that the correct iteration count is logged and reported, improving the accuracy of the QA pipeline's performance metrics.

The quality gate (`quality_gate_decision`) and the iterative refinement loop never functioned in practice: with MAX_ITERATIONS=1 the gate short-circuits on its first check (`current_iteration >= max_iterations` is `1 >= 1`), so the quality-threshold and evaluator-disagreement branches and the feedback loop were unreachable. All 23 historical sessions in the live DB ran exactly one iteration, and the stored stage data shows the gate only ever emitting "Maximum iterations (1) reached".

Collapse `run_qa_pipeline` to a single linear 7-stage pass and remove the now-dead surface area:
- delete `backend/pipeline/quality_gate.py` and the `while` loop
- drop the unused `feedback` param from answer generation
- remove the `iterations` field/metric everywhere (AnswerResponse, track_pipeline_metrics, DB column + idempotent drop migration, frontend)
- remove quality_threshold / accuracy_threshold / agreement_threshold / max_iterations from config.py, the /stats endpoint, and render.yaml
- update README + docs to the 7-stage single-pass pipeline

Grounding (claims extraction + verification), the accuracy check, and the dual-model OpenAI+Anthropic evaluation with cross-provider agreement are all unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
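The short-circuit is easy to reproduce with a hypothetical reconstruction of the gate. Only the function name, the first check, and the "Maximum iterations" message come from the commit message; the remaining branches, parameters, and return strings are assumptions for illustration.

```python
def quality_gate_decision(
    current_iteration: int,
    max_iterations: int,
    quality: float,
    quality_threshold: float,
    evaluators_agree: bool,
) -> str:
    # First check: with max_iterations=1 this is `1 >= 1`, so it always returns.
    if current_iteration >= max_iterations:
        return f"Maximum iterations ({max_iterations}) reached"
    # Everything below is unreachable on the first (and only) iteration.
    if quality < quality_threshold:
        return "refine: quality below threshold"
    if not evaluators_agree:
        return "refine: evaluators disagree"
    return "accept"
```

Whatever the quality score or evaluator agreement, a first iteration with `max_iterations=1` yields the same decision, which is consistent with the stored stage data.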

Eliminate dead configuration parameters related to the quality gate and refinement loop, including QUALITY_THRESHOLD, ACCURACY_THRESHOLD, AGREEMENT_THRESHOLD, and MAX_ITERATIONS, as they are no longer applicable following recent refactoring of the QA pipeline.

The hand-rolled fetch+parse in backend/prices.py crashed on the real genai-prices schema: tiered prices (e.g. gpt-5.4, where input_mtok is a {base, tiers} dict) and constraint-based prices (e.g. o3, where prices is a list). The first such model aborted the entire provider parse, and because the try block wrapped both the HTTP call and the parse, a successful 200 + failed parse was mislabeled "GitHub price fetch failed" -- silently falling back to bundled files frozen at 2026-03-27. This also ran on every task, making wasted GitHub calls in the hot path.

Replace the custom fetch+parse with the official genai-prices package, which ships bundled, auto-updated price data and a lookup API -- immune to schema drift and fully offline (no network in the cost-tracking path).

- backend/prices.py: thin model_cost() wrapper over calc_price, with a graceful fallback rate so cost tracking never crashes
- backend/observability.py: repoint the 3 cost wrappers at model_cost
- workflows/app.py: drop the per-task load_prices() call + import
- deps: add genai-prices, remove now-unused pyyaml
- remove stale bundled backend/prices/*.yml files

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix: use official genai-prices package for model pricing

Remove dead quality gate and refinement loop

The Sources list showed the same document several times because retrieval returns chunks and the chunk list was passed straight through as `sources` with no document-level grouping (amplified by curated injection fetching every chunk of a matched page). This collapses chunks sharing (source, title) into one source entry for display while still feeding the full chunk list to generation.

Also closes three high-severity observability gaps surfaced by an audit:
- Add `logfire.force_flush()` (via a `flush_on_exit` decorator) to every workflow task. Each task runs on its own short-lived instance, so without an explicit flush buffered spans were lost and the Logs tab (`/sessions/{id}/logs`) returned an empty trace.
- Report `tokens_used` for the claims_verification and quality_evaluation stages, which computed token counts but dropped them, leaving 2 of 7 stages with null token attribution.

Sources are collapsed at response assembly, so generation still sees every chunk and History inherits the collapsed view via the persisted sources.

Verified: py_compile + import smoke on changed Python, a direct unit test of collapse_sources, and `npm run build` (type-check + static export).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
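A minimal sketch of the collapse step follows. The function name `collapse_sources` and the `(source, title)` grouping key come from the commit message; the dict field names and the choice to keep the highest-scoring chunk as the representative entry are assumptions.

```python
from typing import Dict, List, Tuple


def collapse_sources(chunks: List[dict]) -> List[dict]:
    """Collapse chunks sharing (source, title) into one display entry each."""
    collapsed: Dict[Tuple[str, str], dict] = {}
    for chunk in chunks:
        key = (chunk["source"], chunk["title"])
        best = collapsed.get(key)
        # Assumed policy: the best-scoring chunk represents its document.
        if best is None or chunk["similarity_score"] > best["similarity_score"]:
            collapsed[key] = chunk
    # Dicts preserve insertion order, so documents keep first-seen order.
    return list(collapsed.values())
```

Only the list used for display is collapsed; the full chunk list would still be passed to generation unchanged.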

…observability Collapse duplicate source chunks + close observability gaps

Follow-up to PR #11. Closes the remaining low/medium audit items in one reviewable pass, with no behavior change to the happy-path pipeline.

- Remove broken `make test` references (no test suite in this example repo): drop the Makefile target/help/.PHONY entries and pytest dev deps, relock.
- retrieve_documents now returns a real `queries_count` (the rag_retrieval stage metadata previously always reported queries_expanded=1).
- Remove dead `evaluate_quality` (orchestrator uses the granular raters) and its now-unused imports; remove the dead `iteration` param from _stage_result.
- Harden backend/api/logs.py: validate trace_id against ^[0-9a-f]{32}$ (->400) before SQL interpolation; make the query window a Settings field (7->30 days) so logs stay fetchable for older sessions.
- De-duplicate the instrument_stage async/sync wrappers via shared _record_stage_success / _record_stage_failure helpers.
- Cross-validate embedding_model vs embedding_dimensions at startup.
- Drop the post-hoc docs[0].source citation fallback in verification.py: a verified claim with no judge-cited passage keeps supporting_docs empty rather than fabricating an attribution (frontend already guards empty lists).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
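The trace_id hardening can be illustrated with a small validator. The regular expression is the one quoted in the commit message; the function name is hypothetical, and in the real endpoint a failed check would map to a 400 response.

```python
import re

# Exactly 32 lowercase hex characters, as quoted in the commit message.
TRACE_ID_RE = re.compile(r"^[0-9a-f]{32}$")


def is_valid_trace_id(trace_id: str) -> bool:
    """Return True only for a 32-character lowercase hex trace id."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    return TRACE_ID_RE.fullmatch(trace_id) is not None
```

Validating before the value reaches any interpolated SQL means injection payloads are rejected on shape alone.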

fix: close deferred audit items (cleanup)

The "Recent questions" panel showed every question ever asked across the whole app, because qa_sessions had no owner and GET /history had no WHERE clause. Scope history per user via an anonymous browser client ID (the app has no auth): a localStorage UUID sent with each request, stamped onto each saved session, and used to filter and scope all history operations.

- database.py: add client_id column + idempotent migration + composite (client_id, created_at DESC) index; thread client_id through save_session, get_recent_sessions (WHERE client_id), delete_session (scoped), and delete_all_sessions (scoped). Legacy NULL-owner rows are excluded by the equality filter, so they stay hidden from everyone.
- models.py / main.py: add client_id to QuestionRequest and pass it to the workflow; require client_id on GET/DELETE /history (400 otherwise) so we never fall back to a global query.
- workflows/app.py: thread client_id through run_qa_pipeline -> _persist_session.
- frontend/lib/api.ts: getClientId() (localStorage UUID, SSR-guarded) wired into askQuestion and all history calls. HistoryPanel needs no changes.

Verified against local Postgres: migration applies, lists are per-client, legacy rows hidden, cross-client delete rejected, clear-all scoped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
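The scoping behavior, including why legacy NULL-owner rows stay hidden, can be demonstrated with a small sketch. It uses an in-memory sqlite3 database purely as a stand-in for Postgres, and the simplified `qa_sessions` schema and function signatures are assumptions; the same equality-filter semantics apply in both databases.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE qa_sessions (id INTEGER PRIMARY KEY, client_id TEXT, question TEXT)"
    )
    conn.executemany(
        "INSERT INTO qa_sessions (client_id, question) VALUES (?, ?)",
        [("client-a", "q1"), ("client-b", "q2"), (None, "legacy")],
    )
    return conn


def get_recent_sessions(conn: sqlite3.Connection, client_id: str) -> List[str]:
    # Never fall back to a global query when the client id is missing.
    if not client_id:
        raise ValueError("client_id is required")
    rows = conn.execute(
        "SELECT question FROM qa_sessions WHERE client_id = ? ORDER BY id DESC",
        (client_id,),
    ).fetchall()
    # `client_id = ?` is never true for NULL, so legacy rows match no one.
    return [row[0] for row in rows]


def delete_session(conn: sqlite3.Connection, session_id: int, client_id: str) -> bool:
    cur = conn.execute(
        "DELETE FROM qa_sessions WHERE id = ? AND client_id = ?",
        (session_id, client_id),
    )
    return cur.rowcount == 1  # False when the row belongs to another client
```

Putting the owner check in the `WHERE` clause of the delete means a cross-client delete affects zero rows rather than relying on a separate lookup.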

…ient Scope "Recent questions" history to the anonymous browser client

Raise the query expansion agent's max_tokens from 300 to 1000 so gpt-4.1-nano can finish its structured QueryExpansionOutput payload instead of aborting with UnexpectedModelBehavior before any output. Also make expand_query() failures non-fatal: on any error, log a warning and fall back to the original question alone so a retrieval enhancement hiccup no longer takes down the whole Q&A pipeline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
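The non-fatal fallback might look like the sketch below. The wrapper name and the injected `expand` callable are hypothetical stand-ins for the real `expand_query()` agent call; placing the original question first follows the note elsewhere in this PR that index 0 is the original question.

```python
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger(__name__)


async def expand_query_safely(
    question: str,
    expand: Callable[[str], Awaitable[List[str]]],
) -> List[str]:
    """Return [question, *variations], or just [question] if expansion fails."""
    try:
        variations = await expand(question)
    except Exception as exc:
        # Expansion is an enhancement, so a failure must not abort the pipeline.
        logger.warning("query expansion failed, using original question only: %s", exc)
        return [question]
    return [question, *variations]
```

Retrieval then proceeds with a single query in the failure case, trading some recall for availability.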

…-limit fix: prevent query expansion token-limit crash

…trypoint

The Dashboard pre-deploy chained deleted per-page scripts (add_pricing_page.py && add_ai_agent_template_page.py), failing on the latter which never existed. Codify the data-driven replacement in the Blueprint so the setting is version-controlled and can't drift: preDeployCommand runs `ingest_pages.py`, which ingests all live sources in data/sources.py directly (no Render Workflows).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

POST /ask returned 503 because it delegated the pipeline to a separate Render Workflows service via render.workflows.start_task(...), which needs the sync:false Blueprint var WORKFLOW_SLUG. With it unset the endpoint raised 503. Per project direction this stack must not use Render Workflows and must stay deployable on free Render tiers.

Run the pipeline in-process instead:
- Add backend/pipeline/orchestrator.py: run_qa_pipeline ported from the deleted workflows/app.py, calling the stage functions directly (no @app.task wrappers, no JSON serialization round-trips). The accuracy + dual-model quality checks still overlap via a single asyncio.gather.
- Add a pipeline_runs table + set_run_status/get_run_status so GET /ask/{run_id} can read terminal status without polling Workflows.
- Rewrite POST /ask to launch the orchestrator as a background task and GET /ask/{run_id} to read run status + live progress. The response shape is unchanged, so the frontend needs no changes.
- Drop the render-sdk dependency, render_api_key/workflow_slug config, and the workflows/ directory.
- Repoint the ingest cron to the in-process data/scripts/ingest_pages.py, which now loads the core corpus then the live sources on a no-arg run.
- render.yaml: drop the pipeline-trigger env group and rename services from pydantic-agents-workflows-* to pydantic-agents-*.
- Update README and .env.example off the Workflows model.

Verified locally (1339 docs): POST /ask -> 202 (no 503); polled to done across all 7 stages with live progress; answer with 12 sources, 31 claims, quality 89.5 / accuracy 97; session persisted and scoped per client; unknown run_id -> 404; single-source in-process ingest works.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ho1yShif and others added 30 commits June 25, 2026 17:00

Merge pull request #2 from render-examples/refactor/rag-cleanup-workf…

e0cfa7c

…low-showcase refactor: data-driven RAG injection, neutral prompt, 3-capability verification

Merge pull request #3 from render-examples/refactor/relevance-gated-r…

434b7b1

…etrieval fix: relevance-gated retrieval + replace-weakest injection

Merge pull request #4 from render-examples/refactor/consolidate-inges…

37ad270

…tion-workflow-tasks Live pipeline progress tracking + ingestion consolidation

Merge pull request #5 from render-examples/fix/curated-section-chunking

b339b1f

Section-aware chunking for curated docs (raise verification coverage)

Merge pull request #6 from render-examples/fix/schema-init-advisory-lock

6f5c6ae

Serialize schema init with advisory lock (fix concurrent ingest race)

Merge pull request #7 from render-examples/fix/hnsw-vector-index

1ee24c7

fix: replace ivfflat vector index with HNSW (fixes 0% claim verification)

Merge pull request #8 from render-examples/docs/hnsw-index

bc7d48e

docs: reflect HNSW vector index in schema + prerequisites

chore: gitignore .claude/skills and untrack from remote

10d678e

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge pull request #10 from render-examples/fix/genai-prices-package

15b873c

fix: use official genai-prices package for model pricing

Merge pull request #9 from render-examples/refactor/remove-quality-gate

aa5e024

Remove dead quality gate and refinement loop

Merge pull request #11 from render-examples/fix/collapse-sources-and-…

60fafa5

…observability Collapse duplicate source chunks + close observability gaps

Ho1yShif and others added 7 commits June 27, 2026 16:29

Merge pull request #12 from render-examples/cleanup/deferred-audit-items

3913be6

fix: close deferred audit items (cleanup)

Merge pull request #13 from render-examples/feat/scope-history-per-cl…

268e772

…ient Scope "Recent questions" history to the anonymous browser client

Merge pull request #14 from render-examples/fix/query-expansion-token…

ed4471d

…-limit fix: prevent query expansion token-limit crash

Ho1yShif deployed to fix/ask-in-process-no-workflows - pydantic-agents-workflows-frontend PR #2 June 28, 2026 03:15, with Render

Ho1yShif deployed to fix/ask-in-process-no-workflows - pydantic-agents-frontend PR #2 June 28, 2026 03:15, with Render

Ho1yShif wants to merge 37 commits into main from fix/ask-in-process-no-workflows
