From fe32e7bea0dda4d3a49849233bbdbab8311479f8 Mon Sep 17 00:00:00 2001
From: Gabor Szabo
Date: Fri, 12 Jun 2026 18:38:54 +0200
Subject: [PATCH 01/32] docs(docs): add showcase workspace runbook and domain model entries (#401)

---
 docs/_base/DOMAIN_MODEL.md | 13 +++++++++++++
 docs/_base/RUNBOOKS.md     | 10 ++++++++++
 2 files changed, 23 insertions(+)

diff --git a/docs/_base/DOMAIN_MODEL.md b/docs/_base/DOMAIN_MODEL.md
index 25e2927b..06ba2d30 100644
--- a/docs/_base/DOMAIN_MODEL.md
+++ b/docs/_base/DOMAIN_MODEL.md
@@ -55,6 +55,16 @@
 - JSONB columns are persisted via `model_dump(mode="json")` so `date`/`datetime` serialise to ISO strings.
 - An agent-saved plan (`source='agent'`) is persisted ONLY after the human approves it through the HITL gate — it always carries the approval audit trail.
 
+### `showcase_workspace` (Demo)
+- **Root:** `ShowcaseWorkspace(workspace_id: str, status: str)` — one row = one preserved (`preservation="keep"`) showcase run.
+- **Status state machine:** `running` → `completed` | `failed` (CHECK-constrained; the finalize hook settles the row even on mid-run failure).
+- **JSONB fields:** `created_objects` (sparse soft-reference keys — `winning_run_id`, `v2_run_id`, `v2_model_path`, `alias`, `agent_session_id`, `batch_id`, `scenario_plan_ids`, `scenario_artifact_key`, `train_model_types`, `stale_alias_run_id`) and `result_summary` (winner / WAPE / wall-clock display payload).
+- **Invariants:**
+  - The config columns (`seed`, `scenario`, `reset`, `skip_seed`) are sufficient for a verbatim Replay through the normal run path — replay never mutates the original row; it creates a NEW row.
+  - `name` is deliberately NON-unique; `workspace_id` (UUID hex) is the unique handle.
+  - `created_objects` carries SOFT references only — **no ForeignKeys by design**. The workspace row is an audit record, not an ownership root: the referenced runs/plans/aliases are independently operator-deletable, and a workspace must never block (or cascade) their deletion.
+  - Persistence is warn-and-continue: a workspace write failure must never break the demo pipeline (the run completes with `workspace_id: null`).
+
 ## Key Invariants — NEVER violate
 
 1. **Time safety in features.** `app/features/featuresets/` uses only data at or before `cutoff_date`. Lags via `shift(positive)`, rolling via `shift(1).rolling(...)`, all `groupby` entity-aware. The test `app/features/featuresets/tests/test_leakage.py` is the spec — it MUST keep passing.
@@ -89,6 +99,7 @@
 | `model_exogenous` | The scenario `method` where a regression baseline genuinely re-forecasts through the assumptions — as opposed to the `heuristic` post-forecast multiplier | re-trained model (the baseline is not re-trained, only re-run) |
 | `future feature frame` | The leakage-safe `X_future` matrix `feature_frame.py` builds — long-lag, calendar, and exogenous columns the regression model consumes to re-forecast a scenario | feature matrix (that is the training-time term) |
 | `scenario tag` | A free-text label on a saved `scenario_plan` (its own queryable JSONB-array column) for filtering and grouping the library | seeder `scenario` preset, registry `alias` |
+| `workspace` (showcase) | A saved showcase-run record (`showcase_workspace` row) — replay config + soft references to everything the run created | seeder `scenario` (a preset), `scenario plan` (a saved what-if), agent `session` |
 
 ## Event Taxonomy
 
@@ -113,6 +124,8 @@ agent_session ──owns──► message_history (JSONB) ──may-contain─
 job ──may-reference──► model_run (for train/backtest jobs)
 scenario_plan ──built-from──► model artifact (a baseline run_id)
 ──embeds──► comparison snapshot (JSONB)
+
+showcase_workspace ──soft-references──► model_run / scenario_plan / run_alias / agent_session / batch (JSONB ids, NO FK)
 ```
 
 ## Glossary (cross-cutting)
diff --git a/docs/_base/RUNBOOKS.md b/docs/_base/RUNBOOKS.md
index df636648..b54bf7e1 100644
--- a/docs/_base/RUNBOOKS.md
+++ b/docs/_base/RUNBOOKS.md
@@ -144,6 +144,16 @@ uv run python scripts/run_demo.py --seed 42 --quiet 2>&1 | tee demo.log
 
 **Notes:** the `POST /demo/run` body and `WS /demo/stream` events are documented in `docs/_base/API_CONTRACTS.md`. The pipeline mirrors `scripts/run_demo.py`; the per-step diagnosis for `make demo` above applies to the same steps. PRP-38 added the `scenario` field on `DemoRunRequest` (defaults to `demo_minimal`) and the additive `phase_name` / `phase_index` / `phase_total` fields on every `StepEvent`. PRP-39 added four new steps (`champion_compat_compare`, `stale_alias_trigger`, `safer_promote_flow`, `batch_preset`) and a new `portfolio` phase between `decision` and `verify`. PRP-40 added the `planning` + `knowledge` phases (5 steps inserted after `portfolio`, before `verify`) and the additive `IndexProjectDocsRequest.path_prefix` field on the RAG slice. PRP-41 — design Z renames the legacy `agent` phase to `agents`, swaps the legacy `step_agent` for `agent_hitl_flow` (HITL approval round-trip), and appends a new `ops` phase carrying `ops_snapshot` immediately before `cleanup`. Total: 24 rows / 10 phases on `showcase_rich`; demo_minimal / sparse keep the 11-row layout under the unified `agents` phase id. The frontend's `DemoPhasePanel.tsx` now carries `onValueChange` (issue #311) and the Showcase page adds a KPI strip + Run-history strip + Stop button + Inspect-Artifacts panel + one-click Approve button on the HITL step card. E2 (#391) — the Scenario control is a card grid exposing all 8 `ScenarioPreset` values with per-preset demo seed profiles (`_SCENARIO_SEED_PROFILE` is exhaustive over the enum; `holiday_rush` seeds a pinned Oct–Dec 2024 window); the 5 newly exposed presets keep the legacy 11-row layout.
 
+### Showcase workspace — preserve/restore/replay semantics (E1–E4, umbrella #389)
+**Surface:** the `/showcase` "Save as workspace" controls + **Saved workspaces** panel; `GET /demo/workspaces(/{id})`; `showcase_workspace` table. Endpoint contracts live in `docs/_base/API_CONTRACTS.md` — this entry covers the operational traps only.
+
+1. **Replay is verbatim — replaying a `reset=true` workspace WIPES the database.** Replay re-submits the recorded config exactly (`seed`/`scenario`/`reset`/`skip_seed`) with `preservation="keep"`. A workspace saved from a Reset-database run therefore wipes + reseeds on every Replay; the panel styles such rows with a `DESTRUCTIVE` marker. This is designed E4 semantics (#393), not a bug — there is deliberately no confirm dialog (consistency with the Reset checkbox's severity styling).
+2. **Names are non-unique by design.** Every Replay creates a NEW `showcase_workspace` row; same-named rows accumulate (the replay regression test itself leaves two `replay-regression` rows). Disambiguate by `workspace_id` or `created_at` (panel lists newest first).
+3. **Rows accumulate — there is no DELETE endpoint yet** (a future epic; deletion was out of #393's scope). Rows are harmless audit records. `created_objects` ids are SOFT references (deliberately no FKs): an operator-issued `DELETE /registry/runs/{id}` or scenario-plan delete leaves dangling deep links on a loaded workspace's artifact cards — expected; the workspace row records what WAS created, not what still exists.
+4. **`holiday_rush` workspaces replay the pinned 2024 window.** The preset seeds a fixed Oct–Dec 2024 window (incident 28 above); a Replay with `reset=false` ADDS those rows to a today-anchored dataset, so `/seeder/status` reports the union range afterwards. For a clean pinned window, save the workspace from a run with **Reset database** ticked — its (destructive) Replay then reproduces the pinned window exactly.
+
+**Notes:** keep-runs are recorded by warn-and-continue hooks — a DB hiccup during `create_workspace` yields a green pipeline with `workspace_id: null` and no row (check uvicorn logs for `demo.workspace_create_failed`). Ephemeral runs write no workspace rows and stay in the localStorage Run-history strip; kept runs appear ONLY in the server-backed panel. On `showcase_rich` keep-runs, the planning-phase scenario plans carry the `workspace:` tag (E3 #392) — retrieve them via `GET /scenarios?tags=workspace: