
fix(earnings): venue-rule compound-word under-count — compound_type axis + fail-loud boundaries (D-30)#93

Merged
helloiamvu merged 19 commits into main from phase27/earnings-venue-count-fix
Jul 3, 2026

@helloiamvu helloiamvu commented Jul 3, 2026

Summary

Settlement-correctness fix for the earnings-mention venue rules (d30 / quick-260702-d30). An inverted guard dropped hyphenated compounds (e.g. supply-chain for chain) from BOTH venue counts — under-counting Kalshi KXEARNINGSMENTION and Polymarket mention markets.

Counting fix + compound axis

  • Hyphenated compounds now count on both venues; hyphenated-token components are scanned for fused compounds.
  • New per-occurrence compound_type axis on schema.earnings_fact.v1 (standalone/open/hyphenated/closed/affix_derivation — additive, nullable; pre-fix rows default to standalone semantics). Closed compounds (wildfire for fire) are Kalshi-No + Polymarket human-review candidates; affix derivations (joyful for joy) count for neither venue.
  • Occurrence classifier wired into the live counting path; fact rows split per (term, compound_type); acronyms scanned case-sensitively and excluded from the closed-candidate pass; firefighter-class ambiguity locked as closed.
  • Schema export + EXPORT_MANIFEST regenerated through the byte-deterministic drift gate; compound_type projected through the catalog read boundary (stripping it laundered closed/affix occurrences into auto-counts).
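
The inverted guard and its replacement can be illustrated with a minimal sketch. The pattern strings are reconstructed from the commit message and are an assumption, not the actual `_form_to_pattern` source; both helper names are hypothetical.

```python
from __future__ import annotations

import re


def inverted_guard_pattern(form: str) -> re.Pattern[str]:
    # Assumed shape of the buggy pattern: the extra lookarounds reject any
    # match adjacent to a hyphen, so "supply-chain" never counts for "chain".
    return re.compile(
        rf"(?<!\w)(?<!\w-){re.escape(form)}(?!\w)(?!-\w)", re.IGNORECASE
    )


def plain_boundary_pattern(form: str) -> re.Pattern[str]:
    # Plain word boundaries: a hyphen is a non-word character, so the
    # hyphenated compound counts, while the fused "blockchain" still does not.
    return re.compile(rf"(?<!\w){re.escape(form)}(?!\w)", re.IGNORECASE)


text = "Supply-chain costs rose; the chain held, but blockchain did not."
old_hits = len(inverted_guard_pattern("chain").findall(text))
new_hits = len(plain_boundary_pattern("chain").findall(text))
```

Under these assumptions the old pattern sees only the standalone occurrence, while the plain boundaries also count the hyphenated one.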

Fail-loud boundaries (no fail-open path for the new axis)

  • Unknown compound_type raises in every venue tally and at the FactLedger durable-write boundary (round 4); classify_mentions shares count_mentions' fail-loud form validation.
  • FactDelta.spoken_at (engine-relative float) maps to offset_seconds — in to_stt_count() (round 2) and now also on the SSE wire in _fact_payload (round 4), never as the schema's wallclock spoken_at.
  • The SDK stream consumer validates every PRESENT temporal field independently as tz-aware (round 4) — a single-sided naive/numeric temporal fails loud instead of coercing into a 1970-epoch wallclock.

Review history

  • Rounds 1–3 (pre-push, Codex adversarial loop): 14 findings fixed across the counting path, classifier, catalog boundary, and schema export — see commit trail (review round 1/2/3).
  • Round 4 (post-merge gate, Codex 0.142.5 high): 2 P2 + 1 P3 on the merged diff. Both P2 fixed + regression-tested (SSE offset_seconds wire mapping; ledger compound_type enum guard). P3 (suffix-chain joyfully classifies closed → conservative disputed, not a silent mis-settle) accepted per severity gate.
  • Round 4 verify pass: Codex confirms both P2 closed; TS consumer tolerant of the wire change; pubsub bridge unaffected.

Known pre-existing issue (flagged, out of scope)

The transcript_segment SSE path emits Segment.spoken_at/knowledge_time as raw floats (main code, untouched here; the consumer rejected those frames before this branch too). It sits in the deliberately-unwired live path (/stream 404s live calls until the Phase 28 deploy follow-up). Proper fix = wallclock anchoring at capture time, coordinated with the 27-09/27-10 live validation — tracked as a follow-up task.

Test evidence

  • Full fast suite green (uv run pytest -m "not live", exit 0) on the merged branch; pre-push hook test + TS typecheck green.
  • New regression tests: SSE fact payload carries offset_seconds and never spoken_at; consumer raises LiveStreamError on float and on naive single-sided spoken_at; ledger rejects out-of-enum compound_type before the parquet write (nothing persisted), accepts None + all five canonical values.
  • Clean merge of origin/main (after PR #92, "Phase 28: Hosted GCE Data Platform — serving apps + IaC + SDK hosted surface (platform layer)"): zero file overlap.

TS Parity

Wire-level fix. The TS stream consumer (earnings_stream.ts) reads spoken_at only as a string and tolerates its absence — no public TS API change; no TS code change required. compound_type already crossed to TS via the regenerated schema export.

Settlement firewall

No changes to research(), merge/*, live/_sources.py, or CWOP registration. Changes are confined to the earnings engine/serving/consumer + schema.earnings_fact.v1.

🤖 Generated with Claude Code


python_only: true — wire-level + Python-internal fix. The compound_type axis lives on schema.earnings_fact.v1 (already crossed to TS via the PR #89 schema export; not a hand-edited TS generated type). The SSE wire changes (offset_seconds in place of an engine-relative spoken_at float) are already forward-compatible with the TS stream consumer packages-ts/weather/src/_fetchers/earnings_stream.ts, which reads spoken_at as optional (pickString(...) ?? null) and already maps offset_seconds → offsetSeconds. No TS source change required.

helloiamvu and others added 19 commits July 2, 2026 09:34
…rted guard)

- Remove the inverted (?<!\w-)/(?!-\w) guard in _form_to_pattern; every form
  now uses plain (?<!\w)/(?!\w) boundaries so pre-tariff/tariff-based/non-fat/
  pro-Palestine/New York-based count for the bare term (both venues' PDFs).
- Rewrite the Hyphenated-compound docstring with verbatim Kalshi + Polymarket
  PDF URLs so the rule cannot be re-inverted.
- Correct test_hyphenated_compound_is_not_the_bare_word ->
  test_hyphenated_compound_counts_as_bare_word (locks the fixed behavior).
- GLP-1 slash-term + plural/possessive regressions stay green.
- earnings_fact: COMPOUND_TYPE_VALUES (standalone|open|hyphenated|closed|
  affix_derivation) + KALSHI/POLYMARKET_AUTOCOUNT_COMPOUND_TYPES frozensets +
  nullable compound_type ColumnSpec (SEPARATE axis from MATCH_RULE_VALUES);
  all three exported in __all__.
- stt: classify_mentions() SIBLING to count_mentions — one record per
  occurrence {surface,start,compound_type}, reusing the same form-expansion
  machinery. Word-boundary pass tags standalone/open/hyphenated; substring pass
  tags closed candidates vs affix_derivation via a curated stdlib prefix/suffix
  heuristic (no dictionary dep, D-30 decision 4). Ambiguous -> conservative
  closed candidate, never a silent drop.
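
A rough sketch of the two-pass idea described above, under heavy simplification: no form expansion (plurals, possessives), no `open` tagging, and a tiny made-up suffix list. `classify_sketch` and `_GRAMMATICAL_SUFFIXES` are hypothetical names, not the real `classify_mentions` internals.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny suffix list for illustration only.
_GRAMMATICAL_SUFFIXES = ("ful", "ing", "ly", "ed")


def classify_sketch(term: str, text: str) -> list[dict]:
    """Return one record per occurrence: surface, start, compound_type."""
    needle = term.lower()
    records = []

    # Pass 1: word-boundary matches; a neighbouring hyphen marks "hyphenated".
    pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        neighbours = (text[max(m.start() - 1, 0):m.start()], text[m.end():m.end() + 1])
        kind = "hyphenated" if "-" in neighbours else "standalone"
        records.append({"surface": m.group(), "start": m.start(), "compound_type": kind})

    # Pass 2: the term fused inside a longer word. Anything that is not a
    # bare grammatical suffix falls through to the conservative "closed".
    for tok in re.finditer(r"\w+", text):
        word = tok.group().lower()
        idx = word.find(needle)
        if idx == -1 or word == needle:
            continue
        prefix, rest = word[:idx], word[idx + len(needle):]
        is_affix = prefix == "" and rest in _GRAMMATICAL_SUFFIXES
        kind = "affix_derivation" if is_affix else "closed"
        records.append({"surface": tok.group(), "start": tok.start() + idx, "compound_type": kind})

    return sorted(records, key=lambda r: r["start"])
```

The point of the design is the fallthrough: an ambiguous fused match becomes a review candidate rather than disappearing.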
…andidate review

- build_fact_rows emits compound_type per row (default standalone for pre-fix /
  PR #89 rows); each classify_mentions occurrence is one row, never an aggregate
  mixing compound_types.
- kalshi_boolean_settles + polymarket_count restrict to
  KALSHI/POLYMARKET_AUTOCOUNT_COMPOUND_TYPES (standalone|open|hyphenated); closed
  is Kalshi-No + Polymarket candidate-only; affix_derivation counts for neither.
- New closed_candidate_count + resolve_polymarket_status: fail-loud 'disputed'
  when auto < threshold but auto+closed >= threshold (closed candidates could
  flip the outcome) — never a silent resolved_no.
- Back-compat: a row without compound_type settles exactly as before.
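
The threshold logic above can be sketched as follows. The function name and the `resolved_yes` label are assumptions for illustration (the commit only names `disputed` and `resolved_no`), and rows are modeled as plain dicts.

```python
AUTOCOUNT_TYPES = frozenset({"standalone", "open", "hyphenated"})


def resolve_status_sketch(rows: list, threshold: int) -> str:
    # Rows without compound_type settle as standalone (back-compat).
    types = [row.get("compound_type") or "standalone" for row in rows]
    auto = sum(1 for t in types if t in AUTOCOUNT_TYPES)
    closed = sum(1 for t in types if t == "closed")

    if auto >= threshold:
        return "resolved_yes"  # assumed label
    if auto + closed >= threshold:
        # Closed candidates could flip the outcome, so a human must review.
        return "disputed"
    return "resolved_no"
```

Affix derivations contribute to neither sum, so they can never push a market into `disputed`.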
…type

- scripts/export_schemas.py output: add nullable compound_type enum column to
  json/schema.earnings_fact.v1.json + update EXPORT_MANIFEST sha256/size, keeping
  the schema-drift gate (test_schemas_codegen.py) green after the Task 2 axis add.
… (review round 1)

F2: classify_mentions silently fell through to exact semantics on an unknown
match_rule and returned [] for empty/degenerate terms — the silent-settle
hazards count_mentions guards against. Extract the four guards into
_validated_forms_for_term; both count_mentions and classify_mentions now share
the identical fail-loud form-prep path.
…ndidate scan (review round 1)

F3: pass 2 scanned acronym and case-sensitive forms case-blind, contradicting
its own comment — OCI tagged a candidate inside 'social', Block inside
'blockchain', flooding human review and bypassing the case-sensitive over-count
guard pass 1 enforces. Acronym forms are now excluded from pass 2 entirely;
case-sensitive forms scan case-sensitively (documented: capitalized
'Blockchain' surfaces as a conservative closed candidate for 'Block' —
review-only, never auto-counted).
…review round 1)

F4: _DERIVATIONAL_PREFIXES silently dropped genuine closed compounds —
oversupply/prepayment/underdog classified affix_derivation, excluded from
closed_candidate_count, so a Polymarket straddle resolved resolved_no with no
human review (the silent drop the locked design forbids). Remove the prefix
branch entirely; only the grammatical-suffix branch (joyful, running) may
return affix_derivation. All prefix-attached cases fall through to the
conservative closed candidate.
…test (review round 1)

F6: the set-membership assertion (in {closed, affix_derivation}) stayed green
if firefighter regressed to affix_derivation — the exact silent drop the test
claims to prevent. Assert == 'closed' exactly (firefighter is a named
closed-compound example in the research doc / Polymarket PDF); killjoy and
wildfire already assert == 'closed' in test_classify_closed_candidate.
…eview round 1)

F5: an out-of-enum compound_type (e.g. 'Standalone') vanished from ALL tallies
— not auto-counted, not a closed candidate — so a true count of 5 resolved
resolved_no at threshold 5 silently. _compound_type now validates against
COMPOUND_TYPE_VALUES and raises ValueError on any unknown non-null value
(applied in kalshi_boolean_settles, polymarket_count, polymarket_threshold_met,
closed_candidate_count, resolve_status, and at build_fact_rows write time).
None/missing still defaults to standalone (back-compat with pre-fix rows).
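
A minimal sketch of the enum guard this commit describes, assuming rows are plain dicts. `compound_type_of` is a hypothetical stand-in for `_compound_type`; only the five enum values come from the PR.

```python
COMPOUND_TYPE_VALUES = frozenset(
    {"standalone", "open", "hyphenated", "closed", "affix_derivation"}
)


def compound_type_of(row: dict) -> str:
    value = row.get("compound_type")
    if value is None:
        # Missing or null: pre-fix rows keep standalone semantics.
        return "standalone"
    if not isinstance(value, str) or value not in COMPOUND_TYPE_VALUES:
        # Fail loud: an unknown value must not silently vanish from tallies.
        raise ValueError(f"unknown compound_type: {value!r}")
    return value
```

With this shape, a miscased `'Standalone'` raises instead of dropping a counted mention.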
…view round 1)

F1 (CRITICAL): the closed-candidate fail-loud path had no production producer —
_count_final still used count_mentions, FactDelta carried no compound_type, and
build_fact_rows defaulted absent keys to standalone, so 'firefighter' for term
'fire' produced NO closed candidate anywhere and a Polymarket closed-straddle
still resolved resolved_no silently.

- _count_final now runs classify_mentions and aggregates ONE delta per
  (term, compound_type) — word-boundary occurrences tally identically to the
  old path, just split per type; closed compounds now produce candidate deltas.
- FactDelta gains compound_type (additive, default 'standalone' — existing SSE
  consumers and persisted deltas unaffected) + to_stt_count() as the
  propagation seam into build_fact_rows occurrence records.
- End-to-end test: streaming closed compound -> closed delta -> fact rows ->
  polymarket 'disputed' at threshold 1 (never silent resolved_no); Kalshi
  still resolves no. No core schema change -> no export regen needed
  (earnings_fact.v1 already carries the nullable compound_type column).
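
The per-type aggregation can be sketched with a `Counter`. The record and delta shapes here are assumptions; the real `FactDelta` carries more fields.

```python
from collections import Counter


def aggregate_deltas(term: str, occurrences: list) -> list:
    # One delta per (term, compound_type); types are never mixed in one row.
    counts = Counter(occ["compound_type"] for occ in occurrences)
    return [
        {"term": term, "compound_type": kind, "count": n}
        for kind, n in sorted(counts.items())
    ]
```

Summing the `standalone`, `open`, and `hyphenated` deltas gives the auto-count, while `closed` deltas stay visible as review candidates.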
…y (review round 2)

R2-F1 (CRITICAL): _PASSTHROUGH_FIELDS stripped compound_type at the SDK's
canonical read boundary — both _project_row (hosted /facts) and
_project_stream_row (live SSE) laundered closed/affix occurrences into
standalone auto-counts on BOTH venues, inverting Kalshi's closed-compound No
and bypassing the fail-loud Polymarket review for every downstream consumer.

- Add compound_type to _PASSTHROUGH_FIELDS (projected only-when-present:
  transcript rows + pre-fix payloads unaffected).
- _compound_type treats float NaN as missing (default standalone): a MIXED
  old/new frame round-tripped through from_rows -> to_dict fills NaN for
  pre-fix rows — that is missing, not an out-of-enum value.
- Tests: closed row survives BOTH projection paths and still resolves Kalshi
  no / Polymarket disputed; pre-fix payloads project cleanly; mixed frames
  resolve NaN-safe.
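
A sketch of the two behaviors in this commit: projecting a field only when it is present, and treating float NaN as missing. The field tuple is a hypothetical subset of `_PASSTHROUGH_FIELDS`, and both function names are illustrative.

```python
import math

# Hypothetical subset; the real _PASSTHROUGH_FIELDS is not shown in this PR.
PASSTHROUGH_FIELDS = ("term", "count", "compound_type")


def project_row(raw: dict) -> dict:
    # Only-when-present: pre-fix payloads without compound_type stay unchanged.
    return {key: raw[key] for key in PASSTHROUGH_FIELDS if key in raw}


def compound_type_or_default(row: dict) -> str:
    value = row.get("compound_type")
    # A mixed old/new frame fills NaN for pre-fix rows: missing, not invalid.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "standalone"
    return value
```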
…s (review round 2)

R2-F2: round-1 F3 overcorrected — excluding acronym forms from pass 2 entirely
silently dropped plausible acronym compounds: 'GenAI'/'OpenAI' for term 'AI'
produced zero rows, so a Polymarket threshold-1 market resolved resolved_no
instead of surfacing a closed-candidate review. Acronym forms (already
case_sensitive=True) now flow through the round-1 case-sensitive substring
path: the case-preserved 'AI' inside GenAI/OpenAI is a closed candidate
(review-only, never auto-counts); lowercase 'social'/'said' never match.
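
The case-preserved substring scan can be sketched like this. `acronym_closed_candidates` is a hypothetical helper, and tokenization by `\w+` is a simplifying assumption.

```python
import re


def acronym_closed_candidates(acronym: str, text: str) -> list:
    hits = []
    for tok in re.finditer(r"\w+", text):
        word = tok.group()
        if word == acronym:
            continue  # exact token: word-boundary (pass 1) territory
        if acronym in word:  # case-sensitive on purpose
            hits.append(word)
    return hits
```

Because the comparison is case-sensitive, `AI` surfaces inside `OpenAI` and `GenAI` but never inside `said`, and `OCI` never inside `social`.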
… spoken_at (review round 2)

R2-F3: to_stt_count copied the engine-relative float spoken_at (12.5 s into
stream) verbatim; build_fact_rows wrote it to the timestamp_utc spoken_at
column and pyarrow silently coerced float->us-after-epoch, persisting
1970-01-01 00:00:00.000012+00:00 as the temporal audit marker on every
seam-built row.

- to_stt_count maps the float into offset_seconds (the engine-relative int
  audit field build_fact_rows already accepts) and never emits spoken_at.
- build_fact_rows gains _validated_spoken_at: None passes (nullable column);
  tz-aware datetime passes through; float/int/string/naive datetime raises —
  the fail-loud write seam, consistent with the SSE projection path's
  _assert_live_temporal_contract tz-aware enforcement (documented).
- Tests: to_stt_count emits offset_seconds not spoken_at; float + naive
  datetime raise; tz-aware wallclock passes; FactLedger round-trip of a
  seam-built row has NULL spoken_at, no epoch-1970 timestamps.
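
A sketch of the write-seam check, plus the arithmetic behind the 1970 timestamp. `validated_spoken_at` is a hypothetical stand-in for `_validated_spoken_at`; the epoch line only reproduces the microsecond interpretation with the standard library and does not call pyarrow.

```python
from datetime import datetime, timedelta, timezone


def validated_spoken_at(value):
    if value is None:
        return None  # nullable column
    if isinstance(value, datetime) and value.utcoffset() is not None:
        return value  # tz-aware wallclock passes through
    # float, int, str, and naive datetime all fail loud.
    raise ValueError(f"spoken_at must be a tz-aware datetime, got {value!r}")


# What the silent coercion amounted to: 12.5 read as microseconds after epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
coerced = epoch + timedelta(microseconds=int(12.5))
coerced_text = str(coerced)
```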
…review round 3)

R3-F1 (CRITICAL): pass 2 skipped any token containing '-' ('hyphenated handled
in pass 1') — but pass 1 only matches the term as a distinct hyphen-separated
ELEMENT (fire-related); it cannot match 'fire' FUSED inside the component
'wildfire' of 'wildfire-related'. 'wildfire-related costs' for term 'fire' at
Polymarket threshold 1 emitted nothing -> auto=0, closed=0 -> resolved_no
instead of disputed — the exact silent under-count class this fix targets.

Pass 2 now splits a hyphenated token into components and scans EACH exactly
like an unhyphenated word (same case rules: case-blind normal forms,
case-preserved acronym/case-sensitive forms):
- a component exactly equal to the surface (under the form's case rule) is
  pass 1's boundary-match territory — skipped explicitly (covered-span
  overlap check also guards; both exercised by tests);
- fused matches classify via _closed_or_affix on the COMPONENT, not the whole
  token (joyful-sounding for joy -> affix_derivation);
- absolute offsets account for the component's position inside the token so
  covered-span bookkeeping stays correct.

Tests: wildfire-related/firefighter-led -> closed; OpenAI-based -> closed
(case-preserved); pre-tariff + state-of-the-art -> exactly one hyphenated
occurrence, no pass-2 duplicate; joyful-sounding -> affix_derivation; count
conservation across fire/fire-related/wildfire/wildfire-related (2 auto +
2 closed, no overlapping spans). Pass-2 comment + docstrings updated.
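
The component split with absolute offsets can be sketched as below. Both helpers are hypothetical and case-blind only; the real pass 2 also applies the case-sensitive rules and the covered-span check described above.

```python
def components_with_offsets(token: str, token_start: int) -> list:
    # Split on hyphens, tracking each component's absolute start offset.
    parts, pos = [], 0
    for part in token.split("-"):
        parts.append((part, token_start + pos))
        pos += len(part) + 1  # +1 skips the hyphen
    return parts


def fused_matches(term: str, token: str, token_start: int) -> list:
    needle = term.lower()
    hits = []
    for part, start in components_with_offsets(token, token_start):
        lowered = part.lower()
        if lowered == needle:
            continue  # exact component: pass 1's boundary match, skip here
        idx = lowered.find(needle)
        if idx != -1:
            hits.append((part, start + idx))
    return hits
```

For `wildfire-related` starting at offset 10, the fused `fire` is reported at offset 14, while `fire-related` and `pre-tariff` yield no pass-2 hit.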
… SSE wire (review round 4)

_fact_payload used asdict(delta) verbatim, leaking spoken_at — an
ENGINE-RELATIVE float — labeled as the schema's tz-aware wallclock column.
That reopened on the live wire the 1970-epoch coercion path R2-F3 closed in
to_stt_count(). The payload now carries offset_seconds (int) and omits
spoken_at entirely.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
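
A sketch of the payload mapping, assuming a reduced delta shape. `FactDeltaSketch` is hypothetical, and truncating with `int()` is an assumption about how the float becomes the integer `offset_seconds`.

```python
from dataclasses import asdict, dataclass


@dataclass
class FactDeltaSketch:
    term: str
    count: int
    spoken_at: float  # engine-relative seconds, NOT a wallclock
    compound_type: str = "standalone"


def fact_payload(delta: FactDeltaSketch) -> dict:
    payload = asdict(delta)
    # Never ship the engine-relative float under the wallclock column name.
    relative = payload.pop("spoken_at")
    payload["offset_seconds"] = int(relative)  # assumed conversion
    return payload
```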
… (review round 4)

_assert_live_temporal_contract no-opped when either temporal field was
absent, so a fact_delta carrying only a malformed spoken_at (engine-relative
float or naive string) passed silently into the row. Each PRESENT field is
now independently required to parse tz-aware; the ordering check still runs
when both are present.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
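
The per-field check can be sketched as follows. `LiveStreamError` here is a local stand-in for the SDK exception, the ISO-8601 string wire format is assumed, and the ordering direction (`spoken_at` not after `knowledge_time`) is also an assumption.

```python
from datetime import datetime


class LiveStreamError(ValueError):
    """Local stand-in for the SDK's exception type."""


def _parse_tz_aware(name: str, value) -> datetime:
    if not isinstance(value, str):
        raise LiveStreamError(f"{name} must be an ISO-8601 string, got {value!r}")
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError as exc:
        raise LiveStreamError(f"{name} is not ISO-8601: {value!r}") from exc
    if parsed.utcoffset() is None:
        raise LiveStreamError(f"{name} is naive (no UTC offset): {value!r}")
    return parsed


def assert_temporal_contract(row: dict) -> None:
    # Validate every PRESENT field on its own, even if the other is absent.
    parsed = {
        name: _parse_tz_aware(name, row[name])
        for name in ("spoken_at", "knowledge_time")
        if row.get(name) is not None
    }
    if len(parsed) == 2 and parsed["spoken_at"] > parsed["knowledge_time"]:
        raise LiveStreamError("spoken_at is after knowledge_time")
```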
… boundary (review round 4)

The ledger re-derived kalshi_counted but persisted compound_type verbatim —
an out-of-enum value would be durably written, served by /facts, and vanish
from every venue tally downstream. The durable write now refuses out-of-enum
compound_type (None passes: pre-fix rows omit the nullable column).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@helloiamvu helloiamvu requested a review from Tarabcak July 3, 2026 14:40
github-actions Bot commented Jul 3, 2026

Docs-required check: PASS

API-surface change includes docs updates — no reminder needed.

API-surface files changed:

packages/core/src/mostlyright/core/schemas/earnings_fact.py
packages/weather/src/mostlyright/weather/catalog/earnings.py
packages/weather/src/mostlyright/weather/earnings/fact_builder.py
packages/weather/src/mostlyright/weather/earnings/ledger.py
packages/weather/src/mostlyright/weather/earnings/streaming_transcriber.py
packages/weather/src/mostlyright/weather/earnings/stt.py

Docs files changed:

CHANGELOG.md

github-actions Bot commented Jul 3, 2026

Parity ticket gate: PASSED

parity-ticket-check: Python-side trigger surface touched; opt-out satisfied (parity ticket, python_only flag, or label).

See CROSS-SDK-SYNC.md §2 for the workflow.

@helloiamvu helloiamvu merged commit 865f633 into main Jul 3, 2026
23 of 24 checks passed
@helloiamvu helloiamvu deleted the phase27/earnings-venue-count-fix branch July 3, 2026 14:58