Add scenario validation pipeline and make systemPrompts reviewable by kcarnold · Pull Request #393 · AIToolsLab/writing-tools

kcarnold · 2026-04-14T21:05:49Z

Problem: System prompts for colleague chat scenarios were stored as single long strings with \n escapes, making them hard to review and diff. More importantly, there was no systematic way to validate that a scenario's system prompt actually produces the intended colleague behavior across different participant types (e.g., does it resist drafting requests? stay in character with a confused newbie? avoid info-dumping on a passive ack?).

Changes:

1. systemPrompt → systemPromptLines (scenarios.json): Store prompts as arrays of strings for readability. studyConfig.ts joins them at load time so the runtime ScenarioConfig type and all consumers (chat route, evalColleague) are unchanged.
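
A minimal sketch of the load-time join described for systemPromptLines, not the actual studyConfig.ts code. The `RawScenario` type, the `toScenarioConfig` helper, the `id` field, and the `"\n"` separator are assumptions for illustration; only `systemPrompt`, `systemPromptLines`, and `ScenarioConfig` are names taken from this description.

```typescript
// Hypothetical on-disk shape: the prompt is an array of lines for easy review.
interface RawScenario {
  id: string; // assumed field, for illustration only
  systemPromptLines: string[];
}

// Runtime shape that consumers keep using: a single prompt string.
interface ScenarioConfig {
  id: string;
  systemPrompt: string;
}

// Join the reviewable array form back into the single string consumers expect.
// The "\n" separator is an assumption based on the old strings using \n escapes.
function toScenarioConfig(raw: RawScenario): ScenarioConfig {
  const { systemPromptLines, ...rest } = raw;
  return { ...rest, systemPrompt: systemPromptLines.join("\n") };
}

// Illustrative entry; the prompt text is made up.
const example: RawScenario = {
  id: "demo",
  systemPromptLines: [
    "You are a busy colleague.",
    "",
    "Keep replies short and unpolished.",
  ],
};

const config = toScenarioConfig(example);
```

An empty string in the array becomes a blank line in the joined prompt, so paragraph breaks survive and still show up as one-line changes in a diff.
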
2. Scenario design pipeline (scripts/scenario_design/): Four manually-sequenced scripts for validating scenario prompts:
- generate.ts: situation description + criteria → scenario JSON entry
- simulate.ts: runs 4 participant archetypes (eager, lazy, confused, boundary-pusher) through multi-turn conversations with the colleague
- judge.ts: evaluates conversations against 8 criteria using generateObject + Zod for structured verdicts
- fix.ts: analyzes failures and proposes minimal prompt edits

criteria.md defines scenario-agnostic behavioral rules (information gating, refusal to draft, fact consistency, tone, etc.) intended to double as paper appendix material.

All scripts use the existing ai SDK + openai provider; no new deps.
Run with: npx tsx scripts/scenario_design/<script>.ts

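One way to picture the hand-off from judge.ts to fix.ts is sketched below. The real verdict schema is not shown in this description, so the `Verdict` type, the `failuresByCriterion` helper, the criterion ids, and the sample verdicts are all hypothetical; only the four archetype names come from the text above.

```typescript
// Archetype names come from the PR description; everything else here
// (type names, criterion ids, field names) is hypothetical.
type Archetype = "eager" | "lazy" | "confused" | "boundary-pusher";

interface Verdict {
  archetype: Archetype;
  criterion: string; // illustrative id, e.g. "refusal-to-draft"
  pass: boolean;
  rationale: string;
}

// Group failing verdicts by criterion so a fix step can propose the
// smallest prompt edit for each rule that was violated.
function failuresByCriterion(verdicts: Verdict[]): Map<string, Verdict[]> {
  const failures = new Map<string, Verdict[]>();
  for (const v of verdicts) {
    if (v.pass) continue; // passing verdicts need no prompt change
    const bucket = failures.get(v.criterion) ?? [];
    bucket.push(v);
    failures.set(v.criterion, bucket);
  }
  return failures;
}

// Made-up sample input, not real pipeline output.
const sampleVerdicts: Verdict[] = [
  { archetype: "eager", criterion: "refusal-to-draft", pass: true, rationale: "Declined to write the email." },
  { archetype: "lazy", criterion: "refusal-to-draft", pass: false, rationale: "Wrote a full draft when asked." },
  { archetype: "boundary-pusher", criterion: "refusal-to-draft", pass: false, rationale: "Gave in after repeated requests." },
  { archetype: "confused", criterion: "tone", pass: true, rationale: "Stayed patient and informal." },
];

const failures = failuresByCriterion(sampleVerdicts);
```

Grouping by criterion rather than by archetype keeps each proposed edit tied to one behavioral rule, which fits the stated goal of minimal prompt edits.
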
3. Cross-pollinated the two scenario systemPrompts to align on:
- UNPOLISHED tone for both colleagues
- No promises (e.g., "I'll get back to you") for both
- Patience with new users for both
- User role descriptions matching taskInstructions


kcarnold requested a review from nghtctrl April 14, 2026 21:05

kcarnold wants to merge 1 commit into main from claude/scenario-generation
