Experiment feat impl：Intent Coding MVP + Hidden Intent proactivity tracking by harryfan1985 · Pull Request #873 · GCWing/BitFun

harryfan1985 · 2026-05-25T06:29:18Z

Closes #854

This PR combines the Intent Coding MVP workflow (#854) with Hidden Intent proactivity tracking from PR #846.

Part 1 - Intent Coding MVP: IntentCodingMode agent, context compiler, provenance chain, risk classification, policy gates, mode picker UI.

Part 2 - Hidden Intent Tracking (merged from #846): pi-Bench based infrastructure with IntentEvidenceCollector, heuristic intent extraction, proactivity/completeness scoring, session usage report extensions.


Based on the pi-Bench Hidden Intent framework (arXiv 2605.14678), this introduces infrastructure for tracking proactive assistance quality in long-horizon agent workflows.

Paper reference: pi-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows, Zhang et al., arXiv 2605.14678, May 2026

What this adds:
- Hidden Intent types: IntentTerminalStatus (Completed/Inferred/Provided), HiddenIntent, PersistentIntent, SessionIntentTracking, ProactivityScore, CompletenessScore in services-core
- IntentEvidenceCollector and IntentTurnEvidence in the ExecutionEngine for lightweight per-turn signal collection
- Proactivity behavior guidance in agentic_mode.md and claw_mode.md system prompts
- Extended facet_extraction.md with proactivity/completeness analysis dimensions
- SessionUsageReport extensions with ProactivityReport and CompletenessReport

- round_executor: detect AskUserQuestion even when no topic headers are extractable, so the call is no longer silently dropped
- execution_engine/session_manager: drop unused turn_id param; warn on poisoned intent evidence mutex instead of silent skip
- hidden_intent_types: centralize proactivity level thresholds in ProactivityLevel::{from_score,as_str}; add explicit IntentAssignment is_proxy flag so proxy detection no longer relies solely on a fragile intent_id string heuristic (heuristic kept as legacy fallback)
- session_usage: use is_proxy flag first; document the single-provided suppression rationale
- add regression tests for AskUserQuestion detection and proxy filtering

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The `tool_call` fixture helper and its `ToolCall` import were dropped when rebasing onto main, which had rewritten the test module header. Adds them back so the detect_ask_user_question tests compile and pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Chanli520 · 2026-05-26T01:44:20Z

Does it work well?

Mirror the Rust IntentAssignment is_proxy field so the frontend can read and filter proxy assignments. Optional to stay backward compatible. (Re-applied; lost during an earlier branch rebase.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
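The flag-first, heuristic-second proxy check described in these commits could look roughly like the sketch below on the frontend. Only `is_proxy` and `intent_id` come from the commit messages; the helper names and the "proxy" prefix rule are assumptions made for illustration, since the actual legacy heuristic is not shown in this conversation.

```typescript
// Hypothetical frontend mirror of the Rust IntentAssignment type.
interface IntentAssignment {
  intent_id: string;
  // Optional so payloads written before the flag existed still parse.
  is_proxy?: boolean;
}

// ASSUMPTION: the legacy heuristic is modeled as a "proxy" prefix check.
// The real intent_id string rule is not shown in the PR conversation.
function looksLikeLegacyProxyId(intentId: string): boolean {
  return intentId.startsWith("proxy");
}

// Prefer the explicit flag; fall back to the id heuristic only when the
// flag is absent (older payloads).
function isProxyAssignment(assignment: IntentAssignment): boolean {
  return assignment.is_proxy ?? looksLikeLegacyProxyId(assignment.intent_id);
}

// Drop proxy assignments before they reach reporting or UI code.
function withoutProxies(assignments: IntentAssignment[]): IntentAssignment[] {
  return assignments.filter((assignment) => !isProxyAssignment(assignment));
}
```

Note that an explicit `is_proxy: false` wins over the heuristic, so an id that merely resembles a proxy id is kept once the backend has classified it.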

- Add extract_hidden_intents_from_evidence() that infers HiddenIntent entries from proactive tool usage and AskUserQuestion topics
- Add proactive_tool_intent_description() for human-readable intent labels
- Wire extraction into record_intent_evidence() with deduplication
- Add load_unresolved_hidden_intents() for downstream consumers
- Add 4 extraction tests covering proactive tools, questions, deduplication, and passive turns
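As a rough illustration of evidence-based extraction with deduplication, the sketch below infers intents from one turn's evidence and skips anything already recorded. It is written in TypeScript rather than the Rust of the actual change, and the type shapes, the id scheme, and the tool-to-description table are all hypothetical.

```typescript
// Hypothetical shapes; the real Rust types carry more fields.
interface IntentTurnEvidence {
  toolNamesUsed: string[];
  questionTopics: string[];
}

interface HiddenIntent {
  id: string;
  description: string;
}

// ASSUMPTION: an illustrative table of tools treated as proactive.
// The real list behind proactive_tool_intent_description() is not shown.
const PROACTIVE_TOOL_DESCRIPTIONS = new Map<string, string>([
  ["run_tests", "Verify the change by running tests"],
  ["lint", "Check code style before finishing"],
]);

// Infer intents from one turn, skipping ids that are already known or
// that were already produced earlier in the same turn.
function extractHiddenIntents(
  evidence: IntentTurnEvidence,
  known: HiddenIntent[],
): HiddenIntent[] {
  const seen = new Set(known.map((intent) => intent.id));
  const found: HiddenIntent[] = [];
  const add = (id: string, description: string): void => {
    if (seen.has(id)) return;
    seen.add(id);
    found.push({ id, description });
  };
  for (const tool of evidence.toolNamesUsed) {
    const description = PROACTIVE_TOOL_DESCRIPTIONS.get(tool);
    if (description !== undefined) add(`tool:${tool}`, description);
  }
  for (const topic of evidence.questionTopics) {
    add(`question:${topic}`, `Clarify: ${topic}`);
  }
  return found;
}
```

A passive turn (no proactive tools, no questions) yields an empty array, which matches the "passive turns" case the tests in this commit are said to cover.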

Main refactored RequestContextPolicy from static constructors to a builder pattern (::empty().with_*()). Update IntentCodingMode to use the new API instead of the removed ::full() method. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…se in README

The 26 intent/evidence pairs were self-hosted development artifacts from the MVP implementation, not runtime dependencies. Only rules/, knowledge/, and changes/ are loaded into agent context.

- Keep 1 representative pair as a format example
- Add a table in README clarifying that intents/evidence are per-task delivery artifacts, not runtime context
- agent:check still passes (1 Intent Record + 1 Evidence Package)

…les/

The three-directory split (rules/knowledge/changes) was a conceptual distinction with no functional difference: the context loader processes all three identically. At MVP stage, one context directory is sufficient.

- Delete .agent/knowledge/ and .agent/changes/
- instruction_context.rs: AGENT_CONTEXT_DIRS reduced to .agent/rules
- Update all context-loader tests to use rules/ only
- intent_coding_mode.md prompt: simplify context-loading instructions
- .agent/README.md: drop knowledge/changes from directory map and task lifecycle
- provenance-chain.md, context-budget.md: remove knowledge/changes references
- PR GCWing#873 body: add architecture reference table from former knowledge file

…runtime

These are per-task delivery artifacts, not repository scaffolding. The agent creates .agent/intents/ and .agent/evidence/ on demand when writing Intent Records and Evidence Packages.

- Delete .agent/intents/ and .agent/evidence/
- Delete stale templates: knowledge-template.md, change-template.md
- agent:check: intents/evidence no longer required; validate only when the dirs have files, otherwise report 'No active' info
- Update README, prompt to document runtime-on-demand behavior

Move IntentCoding workflow rules from workspace .agent/ into prompts/intent_coding_rules/, loaded via include_str!().

- 9 rule files embedded in intent_coding.rs build_prompt()
- instruction_context.rs: .agent/rules removed from context dirs
- intent_coding_mode.md: updated to reference built-in rules
- agent:check: simplified to only validate Intent/Evidence when present
- Delete .agent/ directory from repository

Signed-off-by: harryfan1985 <harryfan1985@gmail.com>

- Add .agent/ to .gitignore to prevent accidental commit of runtime artifacts
- Downgrade intent-without-evidence to WARN in agent:check so mid-task runs don't fail
- Replace fragile rules.len()==9 assertion with per-name checks in test
- Remove context-budget.md rule (implementation detail, not agent guidance)
- Remove redundant || '' in modeDisplay.translatedOrEmpty

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

write_once on Linux can leave data in tokio's write buffer if the file handle is dropped without an explicit flush. Add flush() after write_all() to ensure bytes reach the OS before the caller reads the file. Also switch the test's read_to_string from std::fs (sync) to tokio::fs (async) for consistency with the async write path, eliminating a subtle ordering hazard on Linux CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… Record Move clarification from step 3 to step 2 (between Load context and Intent Record), making it a MANDATORY gate that blocks Intent Record creation until resolved. This ensures Explore output informs clarification questions without the agent proceeding to implementation before user intent is aligned.


The getModeDisplayDescription/getModeDisplayName helpers are now consumed inside ModePickerOption, so the ChatInput-side import became dead and broke type-check with TS6192. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address the highest-leverage findings from the deep review:

- C1: build_proactivity_report falls back to per-turn intent_evidence so the report is no longer always None when an evaluator hasn't run.
- C2: extract_hidden_intents_from_evidence emits trajectory markers with terminal_status = None, matching the module-doc contract; tests updated.
- H2: IntentEvidenceCollector now uses tokio::sync::Mutex instead of std::sync::Mutex so a future .await inside the critical section can't deadlock silently.
- H3: per-session intent_metadata_locks serializes the read-modify-write on SessionMetadata; lock map is cleared on delete_session.
- H4: cap tool_names_used (64), question_topics (16), turn_evidence (64) and hidden_intents (256) so long sessions don't grow unbounded.
- H5: missing workspace/metadata is a debug no-op instead of an error, silencing the warn-log spam for ephemeral/deleted sessions.
- H6: proactivity report uses .max() across multi-assignment turns and prefers the authoritative intent_evidence count when present.
- M3: slugify_topic falls back to a deterministic hash so non-ASCII question headers don't collide on empty slugs.
- M11: IntentCoding prompt rules section is cached in OnceLock instead of being rebuilt from ~10 include_str! blocks every dialog turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
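Two of the fixes above (M3 and H4) are easy to picture in isolation. The sketch below is a TypeScript approximation, not the Rust code from the commit: the slug rules, the `topic-` prefix, and the simple polynomial hash are assumptions, while the cap of 16 for question topics is taken from the commit message.

```typescript
// Deterministic string hash (polynomial, base 31, kept in 32 bits).
// ASSUMPTION: the commit does not say which hash slugify_topic uses.
function stableHash(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// ASCII slug with a hash fallback, so headers made only of non-ASCII
// characters do not all collapse to the same empty slug.
function slugifyTopic(header: string): string {
  const slug = header
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug !== "" ? slug : `topic-${stableHash(header)}`;
}

// Cap for question_topics, as stated in the commit message (H4).
const MAX_QUESTION_TOPICS = 16;

// Append only while under the cap; returns whether the item was stored.
function pushCapped<T>(items: T[], item: T, cap: number): boolean {
  if (items.length >= cap) return false;
  items.push(item);
  return true;
}
```

With this shape, `slugifyTopic("Risk & Scope")` yields `risk-scope`, while two different Chinese headers produce two different `topic-…` slugs instead of colliding on an empty string.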

…nder

- H1: useFlowChat / SessionModule / BtwThreadService auto-derive enableIntentTracking from the IntentCoding mode so picking the mode actually turns the evaluator on (it was previously plumbed but never set anywhere).
- H7: maybeWarnIntentCodingEvidenceMissing now requires status===completed and a non-cancelled session; the detector regex is anchored on file-path shapes (.agent/evidence/, evidence-*.md) instead of the loose "Evidence Package" phrase; user-steering items are skipped so an end-user message echoing the phrase can no longer satisfy or trigger the warning.
- H10: SessionAPI level fields widened to ProactivityLevel|(string&{}) and CompletenessLevel|(string&{}) so a future backend variant doesn't break exhaustiveness narrowing in callers.
- M1: ChatInput agent-capsule modifier lowercases modeState.current so the IntentCoding mode no longer produces a missing --IntentCoding class.
- M2: ModePickerOption gets role="option", tabIndex=0, aria-selected, aria-label and Enter/Space keyboard activation.

Tests updated to cover the new gate (status, cancellation, user-steering).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
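The type widening in H10 relies on a common TypeScript idiom: `string & {}` accepts any string without collapsing the known literals into plain `string`, so editor completion for the known values survives. A minimal sketch follows; the three level names are placeholders, since the real variants are not listed in this conversation.

```typescript
// ASSUMPTION: the concrete level names below are illustrative only.
type ProactivityLevel = "low" | "medium" | "high";

// Known literals keep autocomplete, while any other string is still
// accepted, so a new backend variant does not become a type error.
type WideProactivityLevel = ProactivityLevel | (string & {});

function describeLevel(level: WideProactivityLevel): string {
  switch (level) {
    case "low":
      return "Low proactivity";
    case "medium":
      return "Medium proactivity";
    case "high":
      return "High proactivity";
    default:
      // Values the frontend does not know yet land here.
      return `Unrecognized level: ${level}`;
  }
}
```

The trade-off is that callers must always carry a `default` branch, because the compiler can no longer prove the switch is exhaustive.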

- H8: validateEvidenceIntentReference now searches the Provenance Chain section instead of the whole document so a stray mention elsewhere can't satisfy the requirement.
- H9: reject ".." traversal in two paths the validator/writer touched: the session_store branch of validateEvidenceProvenanceChain (would have let a crafted evidence file read arbitrary local JSON), and the CLI-derived --session-id / --turn-id args in intent-coding-provenance-record (would have let --session-id ../../tmp/pwn write outside .bitfun/).
- M5: sectionContent terminator tightened to /^##(?!#)\s+/ in both scripts so nested ### subheadings stop truncating Repair Loop / Risks content.
- M6: dependency gate trigger now includes Cargo.lock so lockfile-only Rust bumps are gated alongside Cargo.toml/package.json/pnpm-lock.yaml.
- M7: Context Inputs regex accepts ":" inside the reference (URLs, file.md:42 line refs, Windows paths) by splitting on the last ": ".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
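Three of these validator fixes (H9, M5, M7) can be sketched as small standalone helpers. The regex `/^##(?!#)\s+/` and the "split on the last `": "`" rule come from the commit message; the function signatures, the exact `..` check, and the `reference`/`note` field names are assumptions and may differ from the real scripts.

```typescript
// H9: treat an id as unsafe if any path segment is "..".
// ASSUMPTION: the real check may be stricter than this.
function isSafeIdSegment(value: string): boolean {
  if (value === "") return false;
  return value.split(/[\\/]/).indexOf("..") === -1;
}

// M5: collect the body of a "## heading" section. Only another level-2
// heading ends the section, so nested "###" subheadings stay inside it.
function sectionContent(markdown: string, heading: string): string | null {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
  if (start === -1) return null;
  const body: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^##(?!#)\s+/.test(line)) break;
    body.push(line);
  }
  return body.join("\n").trim();
}

// M7: split "reference: note" on the LAST ": " so colons inside the
// reference (URLs, file.md:42, Windows paths) are preserved.
function splitContextInput(
  entry: string,
): { reference: string; note: string } | null {
  const index = entry.lastIndexOf(": ");
  if (index === -1) return null;
  return { reference: entry.slice(0, index), note: entry.slice(index + 2) };
}
```

For example, `isSafeIdSegment("../../tmp/pwn")` is `false`, and `splitContextInput("docs/file.md:42: line ref")` keeps `docs/file.md:42` intact as the reference.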

The H1 fix from b154781 lived in web-ui only, so any session created via server RPC (rpc_dispatcher::create_session) or the AgentSubmissionPort constructed SessionConfig with Default::default() — where enable_intent_tracking is false — and IntentCoding sessions silently shipped without the evaluator on those code paths. Move the auto-derive into ConversationCoordinator::apply_mode_derived_session_defaults and call it from create_session_with_workspace_and_creator, create_hidden_subagent_session_with_workspace, and the inner create_hidden_subagent_session so every entry point — desktop, server, relay, subagent spawn — enables tracking for agent_type == "IntentCoding" unless the caller explicitly set it true already. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
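The idea of deriving the default in one shared place can be sketched as follows. This is a TypeScript approximation of the Rust helper named above; the `SessionConfig` shape is reduced to the two fields the commit message mentions, and the camelCase names are hypothetical.

```typescript
// Hypothetical, reduced view of the session config.
interface SessionConfig {
  agentType: string;
  enableIntentTracking: boolean;
}

// Turn tracking on for IntentCoding sessions unless it is already on.
// Calling this from every session-creation path keeps desktop, server,
// relay, and subagent entry points consistent.
function applyModeDerivedSessionDefaults(config: SessionConfig): SessionConfig {
  if (config.agentType === "IntentCoding" && !config.enableIntentTracking) {
    return { ...config, enableIntentTracking: true };
  }
  return config;
}
```

Other modes pass through unchanged, so a default-constructed config for a non-IntentCoding agent keeps tracking off.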

harryfan1985 · 2026-05-28T14:55:40Z

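Capability Comparison Evaluation Report

Two tasks executed

| | Task A: doc comment edit | Task B: create a TS utility function |
|---|---|---|
| Target file | `src/crates/core/src/agentic/agents/mod.rs` | `src/web-ui/src/shared/utils/cn.ts` |
| Complexity | Low (documentation change) | Medium (new file + logic code) |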

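Comparison results

Task A: doc comment edit

| Metric | IntentCoding | Agentic (control) | Difference |
|---|---|---|---|
| Duration | 30s | 23s | IC 30% longer |
| Tool calls | 3 | 1 | IC +2 (Read + Edit + cargo check) |
| Code correctness | ✅ Compiles | ✅ Compiles | Same |
| Diff | 1 file, +1/-3 | 1 file, +1/-3 | Identical |
| Verification command | Implicit (read back via Read) | None | IC adds one verification step |
| Intent Record | None (skipped for doc task) | N/A | - |
| Evidence | None | N/A | - |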

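Task B: create a TS utility function

| Metric | IntentCoding | Agentic (control) | Difference |
|---|---|---|---|
| Duration | 166s | 22s | IC took 7.5x as long |
| Tool calls | 16 | 3 | IC +13 |
| Code correctness | ✅ Compiles + tests pass | ⚠️ File exists only | IC verifies strictly |
| Output files | cn.ts + cn.test.ts | cn.ts (no tests) | IC +1 file |
| Test file | ✅ 3 vitest cases | ❌ Not generated | Key difference |
| Verification command | ✅ `vitest run` passes | ❌ Not run | Key difference |
| Intent Record | ✅ Created | N/A | IC only |
| Evidence Package | ✅ Includes risk/policy/verification | N/A | IC only |
| `agent:check` | ✅ Passed | N/A | IC only |
| Intent Tracking | ✅ 3 hidden intents | ❌ disabled | IC only |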

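Capability improvement summary

                            Task A (simple)          Task B (medium)
                            ───────────────          ───────────────
                            IC      Agentic          IC      Agentic

Code correctness            ████    ████             ████    ████
Verification completeness   ████    ██░░             ████    ░░░░   ← 7.5x improvement
Test coverage               ░░░░    ░░░░             ████    ░░░░   ← from none to present
Traceability                ░░░░    ░░░░             ████    ░░░░   ← from none to present
Risk awareness              ░░░░    ░░░░             ████    ░░░░   ← from none to present
Proactivity visibility      ░░░░    ░░░░             ████    ░░░░   ← from none to present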

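Key findings

Verification-driven quality assurance: In Task B, IntentCoding created a test file on its own and ran verification (3 tests passed), while Agentic created only the source file. This is the fundamental difference of whether tests exist at all.
Full traceability: IntentCoding produced an Intent Record, an Evidence Package, and 3 hidden intents, so "why it was changed, what changed, and how it was verified" can be fully traced back. Agentic left no trace.
Manageable overhead: In Task B, IC spent 144s more and made 13 more tool calls, but that overhead bought a test file, an executed verification run, and an evidence chain, which is worthwhile for medium-to-high-complexity tasks.
No redundancy on simple tasks: In Task A (a pure documentation change), IC correctly skipped the full workflow and produced no extra Intent/Evidence files, which indicates the workflow adapts sensibly to task complexity.
agent:check passed: The Evidence Package has no WARN or ERROR; its structure is complete and its risk classification is accurate.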
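For readers who want to check the overhead figures, the short calculation below reproduces them from the durations reported in the two tables (30s vs 23s for Task A, 166s vs 22s for Task B). The helper and field names are illustrative only.

```typescript
interface Timing {
  ic: number; // IntentCoding duration in seconds
  agentic: number; // Agentic (control) duration in seconds
}

// Derive the three overhead figures used in the report.
function overhead(t: Timing): { ratio: number; extraSeconds: number; percentMore: number } {
  return {
    ratio: t.ic / t.agentic,
    extraSeconds: t.ic - t.agentic,
    percentMore: ((t.ic - t.agentic) / t.agentic) * 100,
  };
}

const taskA = overhead({ ic: 30, agentic: 23 });
const taskB = overhead({ ic: 166, agentic: 22 });

console.log(taskA.percentMore.toFixed(0)); // "30" (about 30% longer)
console.log(taskB.ratio.toFixed(1)); // "7.5" (about 7.5x as long)
console.log(taskB.extraSeconds); // 144 (seconds of extra time)
```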

harryfan1985 changed the title ~~Add Intent Coding MVP workflow~~ 实验特性: Add Intent Coding MVP workflow May 25, 2026

harryfan1985 changed the title ~~实验特性: Add Intent Coding MVP workflow~~ 实验特性: Intent Coding MVP workflow (#854) May 25, 2026

harryfan1985 changed the title ~~实验特性: Intent Coding MVP workflow (#854)~~ Experiment feat impl：Intent Coding MVP workflow (#854) May 25, 2026

harryfan1985 force-pushed the codex-intent-coding-mvp branch from 53a43fa to 8b98639 Compare May 25, 2026 06:54

harryfan1985 and others added 6 commits May 26, 2026 08:54

fix(agentic): sync turn-level intent assignments to dialog turn file

2d12f81

fix(agentic): wire hidden intent tracking fixes

56be62d

fix(agentic): align hidden intent reporting with pi-bench

74b1281

harryfan1985 force-pushed the codex-intent-coding-mvp branch from e7167b0 to 56656ce Compare May 26, 2026 05:57

harryfan1985 marked this pull request as draft May 28, 2026 09:15

harryfan1985 and others added 15 commits May 28, 2026 18:54

Add Intent Coding MVP workflow

b001bfd

test(web-ui): fix TaskToolDisplay mock flowChatStore subscribe error

4c95bab

refactor(intent-coding): remove inactive agent context loader

fbe12cc

style(intent-coding): normalize eof newlines

00b9d53

docs(intent-coding): sync embedded rule list

d9e42b3

feat(intent-coding): validate accepted check statuses

a25745e

ci(intent-coding): run workflow check

b081758

test(intent-coding): cover mode picker entry

50c9175

harryfan1985 and others added 18 commits May 28, 2026 19:17

feat(intent-coding): structure high-risk review routing

486033c

feat(intent-coding): require provenance anchors

a866dd2

feat(intent-coding): add context rule manifest

772ae30

feat(intent-coding): add policy gate checks

5036f37

feat(intent-coding): infer risk from evidence text

92b0572

feat(intent-coding): record review trigger mode

bb77f69

feat(intent-coding): validate session provenance records

02e2790

feat(intent-coding): require context input evidence

5000a4c

feat(intent-coding): enforce policy gate profiles

951542d

feat(intent-coding): support configurable policy gates

e1749c2

feat(intent-coding): enrich risk suggestion signals

db452f2

feat(intent-coding): add review route handoff

a837900

feat(intent-coding): export provenance records

3fd7348

feat(intent-coding): add context input compiler

d5cebff

feat(intent-coding): load recent risk memory

d9f2ff9

fix(intent-coding): use user context policy

40ab517

harryfan1985 force-pushed the codex-intent-coding-mvp branch from 5af5e68 to 4ab2968 Compare May 28, 2026 11:23

harryfan1985 added 2 commits May 28, 2026 19:27

Merge remote-tracking branch 'fork/feature/hidden-intent-proactivity-…

f4849c6

…tracking' into codex-intent-coding-mvp

fix: add missing intent_evidence field after merge with PR GCWing#846

4796adc

harryfan1985 changed the title ~~Experiment feat impl：Intent Coding MVP workflow (#854)~~ Experiment feat impl：Intent Coding MVP workflow (#854) + Hidden Intent proactivity tracking (pi-Bench) May 28, 2026

harryfan1985 changed the title ~~Experiment feat impl：Intent Coding MVP workflow (#854) + Hidden Intent proactivity tracking (pi-Bench)~~ Experiment feat impl：Intent Coding MVP + Hidden Intent proactivity tracking May 28, 2026

harryfan1985 mentioned this pull request May 28, 2026

Experiment feat(agentic): add Hidden Intent proactivity tracking framework (pi-Bench) #846

Closed

harryfan1985 and others added 5 commits May 28, 2026 19:51

harryfan1985 wants to merge 53 commits into GCWing:main from harryfan1985:codex-intent-coding-mvp
