fix(earnings): venue-rule compound-word under-count — compound_type axis + fail-loud boundaries (D-30) by helloiamvu · Pull Request #93 · mostlyrightmd/mostlyright-sdk

helloiamvu · 2026-07-03T14:40:49Z

Summary

Settlement-correctness fix for the earnings-mention venue rules (d30 / quick-260702-d30). An inverted guard dropped hyphenated compounds (e.g. supply-chain for chain) from BOTH venue counts — under-counting Kalshi KXEARNINGSMENTION and Polymarket mention markets.

Counting fix + compound axis

Hyphenated compounds now count on both venues; hyphenated-token components are scanned for fused compounds.
New per-occurrence compound_type axis on schema.earnings_fact.v1 (standalone/open/hyphenated/closed/affix_derivation — additive, nullable; pre-fix rows default to standalone semantics). Closed compounds (wildfire for fire) are Kalshi-No + Polymarket human-review candidates; affix derivations (joyful for joy) count for neither venue.
Occurrence classifier wired into the live counting path; fact rows split per (term, compound_type); acronyms scanned case-sensitively and excluded from the closed-candidate pass; firefighter-class ambiguity locked as closed.
Schema export + EXPORT_MANIFEST regenerated through the byte-deterministic drift gate; compound_type is now projected through the catalog read boundary (previously the boundary stripped it, which laundered closed/affix occurrences into auto-counts).

Fail-loud boundaries (no fail-open path for the new axis)

Unknown compound_type raises in every venue tally and at the FactLedger durable-write boundary (round 4); count_mentions and classify_mentions share the same fail-loud form validation.
FactDelta.spoken_at (an engine-relative float) is mapped to offset_seconds in to_stt_count() (round 2) and now also on the SSE wire in _fact_payload (round 4); it is never emitted as the schema's wallclock spoken_at.
The SDK stream consumer validates every PRESENT temporal field independently as tz-aware (round 4): a lone naive or numeric temporal field fails loud instead of being coerced into a 1970-epoch wallclock.

Review history

Rounds 1–3 (pre-push, Codex adversarial loop): 14 findings fixed across the counting path, classifier, catalog boundary, and schema export — see commit trail (review round 1/2/3).
Round 4 (post-merge gate, Codex 0.142.5 high): 2 P2 + 1 P3 on the merged diff. Both P2 fixed + regression-tested (SSE offset_seconds wire mapping; ledger compound_type enum guard). P3 accepted per the severity gate: the suffix chain in joyfully classifies as closed, which surfaces as a conservative disputed rather than a silent mis-settle.
Round 4 verify pass: Codex confirms both P2 closed; TS consumer tolerant of the wire change; pubsub bridge unaffected.

Known pre-existing issue (flagged, out of scope)

The transcript_segment SSE path emits Segment.spoken_at/knowledge_time as raw floats (main code, untouched here; the consumer rejected those frames before this branch too). It sits in the deliberately-unwired live path (/stream 404s live calls until the Phase 28 deploy follow-up). Proper fix = wallclock anchoring at capture time, coordinated with the 27-09/27-10 live validation — tracked as a follow-up task.

Test evidence

Full fast suite green (uv run pytest -m "not live", exit 0) on the merged branch; pre-push hook test + TS typecheck green.
New regression tests: SSE fact payload carries offset_seconds and never spoken_at; consumer raises LiveStreamError on float and on naive single-sided spoken_at; ledger rejects out-of-enum compound_type before the parquet write (nothing persisted), accepts None + all five canonical values.
Clean merge of origin/main (post-PR Phase 28: Hosted GCE Data Platform — serving apps + IaC + SDK hosted surface (platform layer) #92): zero file overlap.

TS Parity

Wire-level fix. The TS stream consumer (earnings_stream.ts) reads spoken_at only as a string and tolerates its absence — no public TS API change; no TS code change required. compound_type already crossed to TS via the regenerated schema export.

Settlement firewall

No changes to research(), merge/*, live/_sources.py, or CWOP registration. Changes are confined to the earnings engine/serving/consumer + schema.earnings_fact.v1.

🤖 Generated with Claude Code

python_only: true — wire-level + Python-internal fix. The compound_type axis lives on schema.earnings_fact.v1 (already crossed to TS via the PR #89 schema export; not a hand-edited TS generated type). The SSE wire changes (offset_seconds in place of an engine-relative spoken_at float) are already forward-compatible with the TS stream consumer packages-ts/weather/src/_fetchers/earnings_stream.ts, which reads spoken_at as optional (pickString(...) ?? null) and already maps offset_seconds → offsetSeconds. No TS source change required.

…rted guard)
- Remove the inverted (?<!\w-)/(?!-\w) guard in _form_to_pattern; every form now uses plain (?<!\w)/(?!\w) boundaries so pre-tariff/tariff-based/non-fat/pro-Palestine/New York-based count for the bare term (both venues' PDFs).
- Rewrite the Hyphenated-compound docstring with verbatim Kalshi + Polymarket PDF URLs so the rule cannot be re-inverted.
- Correct test_hyphenated_compound_is_not_the_bare_word -> test_hyphenated_compound_counts_as_bare_word (locks the fixed behavior).
- GLP-1 slash-term + plural/possessive regressions stay green.
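A minimal sketch of the boundary change, using only Python's `re` module. The names `form_pattern` and `count_form` are hypothetical, and the `legacy=True` pattern is an assumed reconstruction of the removed `(?<!\w-)`/`(?!-\w)` guard, not the PR's actual `_form_to_pattern`.

```python
import re


def form_pattern(form: str, legacy: bool = False) -> re.Pattern:
    """Compile a case-blind matcher for one surface form of a term."""
    escaped = re.escape(form)
    if legacy:
        # Assumed reconstruction of the removed inverted guard: it also
        # refused any match touching a hyphen joined to a word character.
        return re.compile(rf"(?<!\w-)(?<!\w){escaped}(?!\w)(?!-\w)", re.IGNORECASE)
    # Fixed behavior: plain word boundaries only. '-' is not a word
    # character, so the bare term matches inside hyphenated compounds.
    return re.compile(rf"(?<!\w){escaped}(?!\w)", re.IGNORECASE)


def count_form(text: str, form: str, legacy: bool = False) -> int:
    return len(form_pattern(form, legacy).findall(text))


sample = "pre-tariff costs, tariff-based pricing, and tariffs"
print(count_form(sample, "tariff"))               # 2
print(count_form(sample, "tariff", legacy=True))  # 0
```

Since `-` is not a `\w` character, the plain lookarounds accept a match on either side of a hyphen, while `tariffs` still needs its own form from plural expansion.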

- earnings_fact: COMPOUND_TYPE_VALUES (standalone|open|hyphenated|closed|affix_derivation) + KALSHI/POLYMARKET_AUTOCOUNT_COMPOUND_TYPES frozensets + nullable compound_type ColumnSpec (SEPARATE axis from MATCH_RULE_VALUES); all three exported in __all__.
- stt: classify_mentions() SIBLING to count_mentions: one record per occurrence {surface,start,compound_type}, reusing the same form-expansion machinery. Word-boundary pass tags standalone/open/hyphenated; substring pass tags closed candidates vs affix_derivation via a curated stdlib prefix/suffix heuristic (no dictionary dep, D-30 decision 4). Ambiguous -> conservative closed candidate, never a silent drop.
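A minimal sketch of the word-boundary pass emitting one `{surface, start, compound_type}` record per occurrence. `Occurrence` and `word_boundary_pass` are hypothetical names, and the sketch assumes a hyphen "joins" a match only when a letter or digit sits on its far side. It never emits `open`, because telling an open compound from a standalone word would need a lexicon this sketch does not have.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    surface: str
    start: int
    compound_type: str


def word_boundary_pass(text: str, form: str) -> list[Occurrence]:
    """Tag each boundary match of `form` as hyphenated or standalone."""
    pattern = re.compile(rf"(?<!\w){re.escape(form)}(?!\w)", re.IGNORECASE)
    found = []
    for m in pattern.finditer(text):
        s, e = m.start(), m.end()
        # A hyphen counts only when it joins the match to another word.
        left = s >= 2 and text[s - 1] == "-" and text[s - 2].isalnum()
        right = e + 1 < len(text) and text[e] == "-" and text[e + 1].isalnum()
        kind = "hyphenated" if (left or right) else "standalone"
        found.append(Occurrence(m.group(0), s, kind))
    return found


occurrences = word_boundary_pass("Fire-related costs and fire", "fire")
```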

…andidate review
- build_fact_rows emits compound_type per row (default standalone for pre-fix / PR #89 rows); each classify_mentions occurrence is one row, never an aggregate mixing compound_types.
- kalshi_boolean_settles + polymarket_count restrict to KALSHI/POLYMARKET_AUTOCOUNT_COMPOUND_TYPES (standalone|open|hyphenated); closed is Kalshi-No + Polymarket candidate-only; affix_derivation counts for neither.
- New closed_candidate_count + resolve_polymarket_status: fail-loud 'disputed' when auto < threshold but auto+closed >= threshold (closed candidates could flip the outcome), never a silent resolved_no.
- Back-compat: a row without compound_type settles exactly as before.
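A minimal sketch of the venue tallies, assuming each fact row is a plain dict and one row is one occurrence. `kalshi_settles_yes`, `polymarket_status` and the `resolved_yes` status string are hypothetical stand-ins; only `disputed` and `resolved_no` are named in the commit above.

```python
from collections.abc import Iterable, Mapping

COMPOUND_TYPE_VALUES = frozenset(
    {"standalone", "open", "hyphenated", "closed", "affix_derivation"}
)
# Both venues auto-count the same three types; closed and affix never do.
KALSHI_AUTOCOUNT = frozenset({"standalone", "open", "hyphenated"})
POLYMARKET_AUTOCOUNT = frozenset({"standalone", "open", "hyphenated"})


def compound_type(row: Mapping) -> str:
    value = row.get("compound_type")
    if value is None:
        return "standalone"  # pre-fix rows settle exactly as before
    if value not in COMPOUND_TYPE_VALUES:
        raise ValueError(f"unknown compound_type {value!r}")
    return value


def kalshi_settles_yes(rows: Iterable[Mapping]) -> bool:
    types = [compound_type(r) for r in rows]  # validate every row first
    return any(t in KALSHI_AUTOCOUNT for t in types)


def polymarket_status(rows: Iterable[Mapping], threshold: int) -> str:
    types = [compound_type(r) for r in rows]
    auto = sum(t in POLYMARKET_AUTOCOUNT for t in types)
    closed = types.count("closed")
    if auto >= threshold:
        return "resolved_yes"
    if auto + closed >= threshold:
        return "disputed"  # closed candidates could flip the outcome
    return "resolved_no"
```

Validating the whole row list before `any()` keeps the tally fail-loud: short-circuiting on an early match would otherwise skip an out-of-enum value later in the list.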

…type
- scripts/export_schemas.py output: add nullable compound_type enum column to json/schema.earnings_fact.v1.json + update EXPORT_MANIFEST sha256/size, keeping the schema-drift gate (test_schemas_codegen.py) green after the Task 2 axis add.

… (review round 1) F2: classify_mentions silently fell through to exact semantics on an unknown match_rule and returned [] for empty/degenerate terms — the silent-settle hazards count_mentions guards against. Extract the four guards into _validated_forms_for_term; both count_mentions and classify_mentions now share the identical fail-loud form-prep path.

…ndidate scan (review round 1) F3: pass 2 scanned acronym and case-sensitive forms case-blind, contradicting its own comment — OCI tagged a candidate inside 'social', Block inside 'blockchain', flooding human review and bypassing the case-sensitive over-count guard pass 1 enforces. Acronym forms are now excluded from pass 2 entirely; case-sensitive forms scan case-sensitively (documented: capitalized 'Blockchain' surfaces as a conservative closed candidate for 'Block' — review-only, never auto-counted).

…review round 1) F4: _DERIVATIONAL_PREFIXES silently dropped genuine closed compounds — oversupply/prepayment/underdog classified affix_derivation, excluded from closed_candidate_count, so a Polymarket straddle resolved resolved_no with no human review (the silent drop the locked design forbids). Remove the prefix branch entirely; only the grammatical-suffix branch (joyful, running) may return affix_derivation. All prefix-attached cases fall through to the conservative closed candidate.
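A minimal sketch of a suffix-only classifier for fused occurrences, consistent with the rule above: any prefix makes it a closed candidate, and only a recognized grammatical suffix yields `affix_derivation`. `closed_or_affix` and `GRAMMATICAL_SUFFIXES` are hypothetical; the suffix list is illustrative, not the curated list the PR ships, and the consonant-doubling branch is an assumption added so that `running` behaves as described.

```python
# Illustrative only: NOT the curated list the PR ships.
GRAMMATICAL_SUFFIXES = frozenset(
    {"ed", "er", "est", "ful", "ing", "less", "ly", "ment", "ness"}
)


def closed_or_affix(word: str, term: str) -> str:
    """Classify a fused (non-boundary) occurrence of `term` inside `word`."""
    w, t = word.lower(), term.lower()
    start = w.find(t)
    if start < 0 or w == t:
        raise ValueError(f"{term!r} is not fused inside {word!r}")
    head, tail = w[:start], w[start + len(t):]
    if head:
        # No prefix branch: anything attached before the term is a
        # conservative closed candidate (oversupply, wildfire, killjoy).
        return "closed"
    if tail in GRAMMATICAL_SUFFIXES:
        return "affix_derivation"  # joyful
    if tail[:1] == t[-1:] and tail[1:] in GRAMMATICAL_SUFFIXES:
        return "affix_derivation"  # consonant doubling: run -> running
    # Everything else, including suffix chains like joyfully, stays closed.
    return "closed"
```

The fall-through to `closed` is the conservative choice: a wrong `closed` only sends an occurrence to human review, while a wrong `affix_derivation` drops it from every tally.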

…test (review round 1) F6: the set-membership assertion (in {closed, affix_derivation}) stayed green if firefighter regressed to affix_derivation — the exact silent drop the test claims to prevent. Assert == 'closed' exactly (firefighter is a named closed-compound example in the research doc / Polymarket PDF); killjoy and wildfire already assert == 'closed' in test_classify_closed_candidate.

…eview round 1) F5: an out-of-enum compound_type (e.g. 'Standalone') vanished from ALL tallies — not auto-counted, not a closed candidate — so a true count of 5 resolved resolved_no at threshold 5 silently. _compound_type now validates against COMPOUND_TYPE_VALUES and raises ValueError on any unknown non-null value (applied in kalshi_boolean_settles, polymarket_count, polymarket_threshold_met, closed_candidate_count, resolve_status, and at build_fact_rows write time). None/missing still defaults to standalone (back-compat with pre-fix rows).

…view round 1) F1 (CRITICAL): the closed-candidate fail-loud path had no production producer: _count_final still used count_mentions, FactDelta carried no compound_type, and build_fact_rows defaulted absent keys to standalone, so 'firefighter' for term 'fire' produced NO closed candidate anywhere and a Polymarket closed-straddle still resolved resolved_no silently.
- _count_final now runs classify_mentions and aggregates ONE delta per (term, compound_type); word-boundary occurrences tally identically to the old path, just split per type; closed compounds now produce candidate deltas.
- FactDelta gains compound_type (additive, default 'standalone'; existing SSE consumers and persisted deltas unaffected) + to_stt_count() as the propagation seam into build_fact_rows occurrence records.
- End-to-end test: streaming closed compound -> closed delta -> fact rows -> polymarket 'disputed' at threshold 1 (never silent resolved_no); Kalshi still resolves no.
No core schema change -> no export regen needed (earnings_fact.v1 already carries the nullable compound_type column).
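A minimal sketch of the one-delta-per-(term, compound_type) aggregation. This `FactDelta` is a hypothetical three-field stand-in (the real one carries more fields), and `aggregate_deltas` is an illustrative name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FactDelta:
    term: str
    count: int
    compound_type: str = "standalone"  # additive default for old consumers


WORD_BOUNDARY_TYPES = frozenset({"standalone", "open", "hyphenated"})


def aggregate_deltas(term: str, occurrence_types: list[str]) -> list[FactDelta]:
    """Emit one delta per (term, compound_type), never a mixed aggregate."""
    counts = Counter(occurrence_types)
    return [FactDelta(term, n, ct) for ct, n in sorted(counts.items())]


deltas = aggregate_deltas("fire", ["standalone", "hyphenated", "closed", "closed"])
# The word-boundary deltas sum to what a single pre-split count reported.
legacy_total = sum(d.count for d in deltas if d.compound_type in WORD_BOUNDARY_TYPES)
```

Keeping closed occurrences in their own delta is what gives the downstream Polymarket tally something to flag; merged into a single count, they would be indistinguishable from auto-countable mentions.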

…y (review round 2) R2-F1 (CRITICAL): _PASSTHROUGH_FIELDS stripped compound_type at the SDK's canonical read boundary: both _project_row (hosted /facts) and _project_stream_row (live SSE) laundered closed/affix occurrences into standalone auto-counts on BOTH venues, inverting Kalshi's closed-compound No and bypassing the fail-loud Polymarket review for every downstream consumer.
- Add compound_type to _PASSTHROUGH_FIELDS (projected only-when-present: transcript rows + pre-fix payloads unaffected).
- _compound_type treats float NaN as missing (default standalone): a MIXED old/new frame round-tripped through from_rows -> to_dict fills NaN for pre-fix rows, which is missing, not an out-of-enum value.
- Tests: closed row survives BOTH projection paths and still resolves Kalshi no / Polymarket disputed; pre-fix payloads project cleanly; mixed frames resolve NaN-safe.
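A minimal sketch of only-when-present projection plus NaN-as-missing handling, using plain dicts instead of a dataframe. `PASSTHROUGH_FIELDS` here is a hypothetical three-field subset, and `project_row` / `compound_type_of` are illustrative names rather than the SDK's `_project_row` / `_compound_type`.

```python
import math
from collections.abc import Mapping

# Hypothetical subset of the projected fields, for illustration only.
PASSTHROUGH_FIELDS = ("term", "count", "compound_type")
COMPOUND_TYPE_VALUES = frozenset(
    {"standalone", "open", "hyphenated", "closed", "affix_derivation"}
)


def project_row(raw: Mapping) -> dict:
    """Copy passthrough fields only when present; pre-fix rows keep their shape."""
    return {k: raw[k] for k in PASSTHROUGH_FIELDS if k in raw}


def compound_type_of(row: Mapping) -> str:
    value = row.get("compound_type")
    # A mixed old/new frame fills NaN for pre-fix rows: that is missing data.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "standalone"
    if value not in COMPOUND_TYPE_VALUES:
        raise ValueError(f"unknown compound_type {value!r}")
    return value


closed_row = project_row({"term": "fire", "count": 1, "compound_type": "closed", "extra": 1})
```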

…s (review round 2) R2-F2: round-1 F3 overcorrected — excluding acronym forms from pass 2 entirely silently dropped plausible acronym compounds: 'GenAI'/'OpenAI' for term 'AI' produced zero rows, so a Polymarket threshold-1 market resolved resolved_no instead of surfacing a closed-candidate review. Acronym forms (already case_sensitive=True) now flow through the round-1 case-sensitive substring path: the case-preserved 'AI' inside GenAI/OpenAI is a closed candidate (review-only, never auto-counts); lowercase 'social'/'said' never match.
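A minimal sketch of the case rule for the substring pass. `fused_offsets` is a hypothetical helper: with `case_sensitive=True` (as for acronym forms), `AI` is found inside `GenAI`/`OpenAI` but not inside lowercase `said`, whereas a case-blind scan would flood review with hits like the `OCI`-in-`social` case from round 1.

```python
import re


def fused_offsets(word: str, form: str, case_sensitive: bool) -> list[int]:
    """Offsets where `form` sits fused inside `word` (whole-word hits excluded)."""
    same = word == form if case_sensitive else word.lower() == form.lower()
    if same:
        return []  # an exact word is the boundary pass's job, not pass 2's
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.start() for m in re.finditer(re.escape(form), word, flags)]
```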

… spoken_at (review round 2) R2-F3: to_stt_count copied the engine-relative float spoken_at (12.5 s into stream) verbatim; build_fact_rows wrote it to the timestamp_utc spoken_at column and pyarrow silently coerced float->us-after-epoch, persisting 1970-01-01 00:00:00.000012+00:00 as the temporal audit marker on every seam-built row.
- to_stt_count maps the float into offset_seconds (the engine-relative int audit field build_fact_rows already accepts) and never emits spoken_at.
- build_fact_rows gains _validated_spoken_at: None passes (nullable column); tz-aware datetime passes through; float/int/string/naive datetime raises. This is the fail-loud write seam, consistent with the SSE projection path's _assert_live_temporal_contract tz-aware enforcement (documented).
- Tests: to_stt_count emits offset_seconds, not spoken_at; float + naive datetime raise; tz-aware wallclock passes; FactLedger round-trip of a seam-built row has NULL spoken_at, no epoch-1970 timestamps.
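A minimal sketch of the two seams, with deltas as plain dicts. `validated_spoken_at` and this `to_stt_count` are hypothetical stand-ins; in particular, converting the float offset with `int()` (truncation) is an assumption, since the commit only says the field is an int.

```python
from datetime import datetime, timezone


def validated_spoken_at(value):
    """None or a tz-aware datetime passes; everything else fails loud."""
    if value is None:
        return None  # nullable column
    if isinstance(value, datetime) and value.utcoffset() is not None:
        return value
    # Floats/ints would be coerced into a 1970-epoch timestamp; strings and
    # naive datetimes carry no reliable timezone.
    raise ValueError(f"spoken_at must be a tz-aware datetime or None, got {value!r}")


def to_stt_count(delta: dict) -> dict:
    """Move the engine-relative float into offset_seconds; never emit spoken_at."""
    record = {k: v for k, v in delta.items() if k != "spoken_at"}
    if delta.get("spoken_at") is not None:
        record["offset_seconds"] = int(delta["spoken_at"])  # truncation: assumption
    return record


wallclock = validated_spoken_at(datetime(2026, 7, 3, 14, 40, tzinfo=timezone.utc))
```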

…review round 3) R3-F1 (CRITICAL): pass 2 skipped any token containing '-' ('hyphenated handled in pass 1'), but pass 1 only matches the term as a distinct hyphen-separated ELEMENT (fire-related); it cannot match 'fire' FUSED inside the component 'wildfire' of 'wildfire-related'. 'wildfire-related costs' for term 'fire' at Polymarket threshold 1 emitted nothing -> auto=0, closed=0 -> resolved_no instead of disputed, the exact silent under-count class this fix targets. Pass 2 now splits a hyphenated token into components and scans EACH exactly like an unhyphenated word (same case rules: case-blind normal forms, case-preserved acronym/case-sensitive forms):
- a component exactly equal to the surface (under the form's case rule) is pass 1's boundary-match territory and is skipped explicitly (the covered-span overlap check also guards; both exercised by tests);
- fused matches classify via _closed_or_affix on the COMPONENT, not the whole token (joyful-sounding for joy -> affix_derivation);
- absolute offsets account for the component's position inside the token so covered-span bookkeeping stays correct.
Tests: wildfire-related/firefighter-led -> closed; OpenAI-based -> closed (case-preserved); pre-tariff + state-of-the-art -> exactly one hyphenated occurrence, no pass-2 duplicate; joyful-sounding -> affix_derivation; count conservation across fire/fire-related/wildfire/wildfire-related (2 auto + 2 closed, no overlapping spans). Pass-2 comment + docstrings updated.
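A minimal sketch of the component split with absolute offsets, for case-blind forms only. `component_hits` is a hypothetical name; it returns the first fused hit per component together with the component itself, which is what a classifier like `_closed_or_affix` would then examine.

```python
def component_hits(token: str, token_start: int, surface: str) -> list[tuple[int, str]]:
    """Pass-2 scan of a hyphenated token, one component at a time.

    Returns (absolute_offset, component) per fused hit; case-blind, first
    hit per component only.
    """
    hits = []
    offset = 0
    needle = surface.lower()
    for component in token.split("-"):
        absolute = token_start + offset
        offset += len(component) + 1  # step past the component and its hyphen
        lowered = component.lower()
        if lowered == needle:
            continue  # exact element: pass 1 already counted it as hyphenated
        idx = lowered.find(needle)
        if idx >= 0:
            hits.append((absolute + idx, component))
    return hits
```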

…e-count-fix

… SSE wire (review round 4) _fact_payload used asdict(delta) verbatim, leaking spoken_at — an ENGINE-RELATIVE float — labeled as the schema's tz-aware wallclock column. That reopened on the live wire the 1970-epoch coercion path R2-F3 closed in to_stt_count(). The payload now carries offset_seconds (int) and omits spoken_at entirely. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
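A minimal sketch of the payload change, assuming a hypothetical four-field `FactDelta`. Instead of serializing `asdict(delta)` verbatim, the engine-relative `spoken_at` is popped and re-emitted as an int `offset_seconds`; the `int()` truncation and omitting the key when there is no offset are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class FactDelta:
    term: str
    count: int
    compound_type: str = "standalone"
    spoken_at: Optional[float] = None  # engine-relative seconds, NOT wallclock


def fact_payload(delta: FactDelta) -> dict:
    payload = asdict(delta)
    offset = payload.pop("spoken_at")  # never put on the wire as spoken_at
    if offset is not None:
        payload["offset_seconds"] = int(offset)
    return payload


payload = fact_payload(FactDelta("fire", 1, "closed", 12.5))
```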

… (review round 4) _assert_live_temporal_contract no-opped when either temporal field was absent, so a fact_delta carrying only a malformed spoken_at (engine-relative float or naive string) passed silently into the row. Each PRESENT field is now independently required to parse tz-aware; the ordering check still runs when both are present. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
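A minimal sketch of per-field temporal validation on the consumer side. `LiveStreamError` is a local stand-in for the SDK's error type, `None` is treated as absent, and the ordering rule (spoken_at must not be later than knowledge_time) is an assumption, since the commit does not state which order is enforced.

```python
from datetime import datetime


class LiveStreamError(Exception):
    """Local stand-in for the SDK's stream error type."""


TEMPORAL_FIELDS = ("spoken_at", "knowledge_time")


def _parse_aware(name: str, value: object) -> datetime:
    if not isinstance(value, str):
        raise LiveStreamError(f"{name}: expected ISO-8601 string, got {type(value).__name__}")
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError as exc:
        raise LiveStreamError(f"{name}: not ISO-8601: {value!r}") from exc
    if parsed.utcoffset() is None:
        raise LiveStreamError(f"{name}: naive timestamp {value!r}")
    return parsed


def assert_live_temporal_contract(frame: dict) -> None:
    # Each PRESENT field is checked on its own; a missing partner field no
    # longer short-circuits validation.
    present = {
        name: _parse_aware(name, frame[name])
        for name in TEMPORAL_FIELDS
        if frame.get(name) is not None
    }
    if len(present) == 2 and present["spoken_at"] > present["knowledge_time"]:
        raise LiveStreamError("spoken_at is later than knowledge_time")
```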

… boundary (review round 4) The ledger re-derived kalshi_counted but persisted compound_type verbatim — an out-of-enum value would be durably written, served by /facts, and vanish from every venue tally downstream. The durable write now refuses out-of-enum compound_type (None passes: pre-fix rows omit the nullable column). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
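A minimal sketch of a validate-then-persist write boundary. `write_fact_rows` is hypothetical and the `persist` callable stands in for the parquet write; the point it illustrates is that every row is checked before anything is handed to storage, so a single bad row leaves nothing persisted.

```python
from collections.abc import Callable, Sequence

COMPOUND_TYPE_VALUES = frozenset(
    {"standalone", "open", "hyphenated", "closed", "affix_derivation"}
)


def write_fact_rows(
    rows: Sequence[dict], persist: Callable[[Sequence[dict]], None]
) -> None:
    """Validate every row before anything is persisted."""
    for i, row in enumerate(rows):
        value = row.get("compound_type")
        # None passes: pre-fix rows omit the nullable column.
        if value is not None and value not in COMPOUND_TYPE_VALUES:
            raise ValueError(f"row {i}: out-of-enum compound_type {value!r}")
    persist(rows)  # stands in for the durable parquet write
```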

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-03T14:40:58Z

✅ Docs-required check: PASS

API-surface change includes docs updates — no reminder needed.

API-surface files changed:

packages/core/src/mostlyright/core/schemas/earnings_fact.py
packages/weather/src/mostlyright/weather/catalog/earnings.py
packages/weather/src/mostlyright/weather/earnings/fact_builder.py
packages/weather/src/mostlyright/weather/earnings/ledger.py
packages/weather/src/mostlyright/weather/earnings/streaming_transcriber.py
packages/weather/src/mostlyright/weather/earnings/stt.py

Docs files changed:

CHANGELOG.md

github-actions · 2026-07-03T14:41:03Z

Parity ticket gate: PASSED

parity-ticket-check: Python-side trigger surface touched; opt-out satisfied (parity ticket, python_only flag, or label).

See CROSS-SDK-SYNC.md §2 for the workflow.

helloiamvu and others added 19 commits July 2, 2026 09:34

Merge remote-tracking branch 'origin/main' into phase27/earnings-venue-count-fix

e33c19b

docs(27-quick): changelog entry for the venue-count fix

ff94d45

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

helloiamvu requested a review from Tarabcak July 3, 2026 14:40

helloiamvu merged commit 865f633 into main Jul 3, 2026
23 of 24 checks passed

helloiamvu deleted the phase27/earnings-venue-count-fix branch July 3, 2026 14:58
