Experiment feat impl:Intent Coding MVP + Hidden Intent proactivity tracking #873

Draft
harryfan1985 wants to merge 53 commits into GCWing:main from harryfan1985:codex-intent-coding-mvp

@harryfan1985
Contributor

@harryfan1985 harryfan1985 commented May 25, 2026

Closes #854

This PR combines the Intent Coding MVP workflow (#854) with Hidden Intent proactivity tracking from PR #846.

Part 1 - Intent Coding MVP: IntentCodingMode agent, context compiler, provenance chain, risk classification, policy gates, mode picker UI.

Part 2 - Hidden Intent Tracking (merged from #846): pi-Bench-based infrastructure with IntentEvidenceCollector, heuristic intent extraction, proactivity/completeness scoring, and session usage report extensions.

@harryfan1985 harryfan1985 changed the title from "Add Intent Coding MVP workflow" to "Experimental feature: Add Intent Coding MVP workflow" May 25, 2026
@harryfan1985 harryfan1985 changed the title from "Experimental feature: Add Intent Coding MVP workflow" to "Experimental feature: Intent Coding MVP workflow (#854)" May 25, 2026
@harryfan1985 harryfan1985 changed the title from "Experimental feature: Intent Coding MVP workflow (#854)" to "Experiment feat impl:Intent Coding MVP workflow (#854)" May 25, 2026
@harryfan1985 harryfan1985 force-pushed the codex-intent-coding-mvp branch from 53a43fa to 8b98639 May 25, 2026 06:54
harryfan1985 added a commit to harryfan1985/BitFun that referenced this pull request May 25, 2026
…les/

The three-directory split (rules/knowledge/changes) was a conceptual
distinction with no functional difference — the context loader
processes all three identically.  At MVP stage, one context directory
is sufficient.

- Delete .agent/knowledge/ and .agent/changes/
- instruction_context.rs: AGENT_CONTEXT_DIRS reduced to .agent/rules
- Update all context-loader tests to use rules/ only
- intent_coding_mode.md prompt: simplify context-loading instructions
- .agent/README.md: drop knowledge/changes from directory map and task lifecycle
- provenance-chain.md, context-budget.md: remove knowledge/changes references
- PR GCWing#873 body: add architecture reference table from former knowledge file
harryfan1985 and others added 6 commits May 26, 2026 08:54
Based on the pi-Bench Hidden Intent framework (arXiv 2605.14678), this
introduces infrastructure for tracking proactive assistance quality in
long-horizon agent workflows.

Paper reference:
  pi-Bench: Evaluating Proactive Personal Assistant Agents in
  Long-Horizon Workflows
  Zhang et al., arXiv 2605.14678, May 2026

What this adds:
  - Hidden Intent types: IntentTerminalStatus (Completed/Inferred/Provided),
    HiddenIntent, PersistentIntent, SessionIntentTracking,
    ProactivityScore, CompletenessScore in services-core
  - IntentEvidenceCollector and IntentTurnEvidence in the ExecutionEngine
    for lightweight per-turn signal collection
  - Proactivity behavior guidance in agentic_mode.md and claw_mode.md
    system prompts
  - Extended facet_extraction.md with proactivity/completeness
    analysis dimensions
  - SessionUsageReport extensions with ProactivityReport and
    CompletenessRepor
- round_executor: detect AskUserQuestion even when no topic headers are
  extractable, so the call is no longer silently dropped
- execution_engine/session_manager: drop unused turn_id param; warn on
  poisoned intent evidence mutex instead of silent skip
- hidden_intent_types: centralize proactivity level thresholds in
  ProactivityLevel::{from_score,as_str}; add explicit IntentAssignment
  is_proxy flag so proxy detection no longer relies solely on a fragile
  intent_id string heuristic (heuristic kept as legacy fallback)
- session_usage: use is_proxy flag first; document the single-provided
  suppression rationale
- add regression tests for AskUserQuestion detection and proxy filtering

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
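
The threshold centralization and the flag-first proxy check described in this commit might look roughly like the sketch below. The variant names, the 0.4/0.7 cut-offs, the `Option<bool>` representation, and the `proxy-` id prefix are all assumptions made for illustration; the commit message does not state them.

```rust
/// Coarse proactivity bucket derived from a numeric score.
/// Variant names are illustrative, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProactivityLevel {
    Low,
    Medium,
    High,
}

impl ProactivityLevel {
    // Hypothetical thresholds; the real values are not given in the PR.
    const MEDIUM_MIN: f64 = 0.4;
    const HIGH_MIN: f64 = 0.7;

    /// The single place where a score is mapped to a level.
    pub fn from_score(score: f64) -> Self {
        if score >= Self::HIGH_MIN {
            Self::High
        } else if score >= Self::MEDIUM_MIN {
            Self::Medium
        } else {
            Self::Low
        }
    }

    /// Stable string form for reports and serialization.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Low => "low",
            Self::Medium => "medium",
            Self::High => "high",
        }
    }
}

/// Assignment of a turn to an intent. `is_proxy` is modeled as optional
/// here (an assumption) so records written before the flag existed can
/// still be represented.
pub struct IntentAssignment {
    pub intent_id: String,
    pub is_proxy: Option<bool>,
}

/// Prefer the explicit flag; fall back to an id heuristic for legacy data.
/// The "proxy-" prefix stands in for whatever the real heuristic checks.
pub fn is_proxy_assignment(a: &IntentAssignment) -> bool {
    a.is_proxy
        .unwrap_or_else(|| a.intent_id.starts_with("proxy-"))
}
```

With this shape, callers never compare raw scores against literals, and an explicit `is_proxy: Some(false)` overrides an id that happens to look like a proxy.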
The `tool_call` fixture helper and its `ToolCall` import were dropped when
rebasing onto main, which had rewritten the test module header. Adds them
back so the detect_ask_user_question tests compile and pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Chanli520

Does it work well?

Mirror the Rust IntentAssignment is_proxy field so the frontend can read
and filter proxy assignments. Optional to stay backward compatible.
(Re-applied; lost during an earlier branch rebase.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@harryfan1985 harryfan1985 force-pushed the codex-intent-coding-mvp branch from e7167b0 to 56656ce May 26, 2026 05:57
@harryfan1985 harryfan1985 marked this pull request as draft May 28, 2026 09:15
harryfan1985 and others added 15 commits May 28, 2026 18:54
- Add extract_hidden_intents_from_evidence() that infers HiddenIntent
  entries from proactive tool usage and AskUserQuestion topics
- Add proactive_tool_intent_description() for human-readable intent labels
- Wire extraction into record_intent_evidence() with deduplication
- Add load_unresolved_hidden_intents() for downstream consumers
- Add 4 extraction tests covering proactive tools, questions,
  deduplication, and passive turns
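
A minimal sketch of the extraction step listed above, assuming simplified types. The struct fields, the id scheme (`tool:` / `question:` prefixes), and the tool names `Grep` and `RunTests` are hypothetical; only the function names come from the commit message.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the real HiddenIntent type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HiddenIntent {
    pub intent_id: String,
    pub description: String,
    /// Left unset here; a later evaluator would decide the final status.
    pub terminal_status: Option<String>,
}

/// Simplified per-turn evidence: which tools ran and which topics were asked.
pub struct TurnEvidence {
    pub tool_names_used: Vec<String>,
    pub question_topics: Vec<String>,
}

/// Human-readable label for tools treated as proactive.
/// The tool names are placeholders, not the PR's actual list.
pub fn proactive_tool_intent_description(tool: &str) -> Option<&'static str> {
    match tool {
        "Grep" => Some("Searched the codebase before acting"),
        "RunTests" => Some("Verified the change by running tests"),
        _ => None,
    }
}

/// Infer new intents from one turn, skipping ids already recorded.
pub fn extract_hidden_intents_from_evidence(
    evidence: &TurnEvidence,
    existing: &[HiddenIntent],
) -> Vec<HiddenIntent> {
    let mut seen: HashSet<String> =
        existing.iter().map(|i| i.intent_id.clone()).collect();
    let mut out = Vec::new();
    let mut push = |id: String, description: String| {
        // `insert` returns false when the id was already present.
        if seen.insert(id.clone()) {
            out.push(HiddenIntent {
                intent_id: id,
                description,
                terminal_status: None,
            });
        }
    };
    for tool in &evidence.tool_names_used {
        if let Some(desc) = proactive_tool_intent_description(tool) {
            push(format!("tool:{tool}"), desc.to_string());
        }
    }
    for topic in &evidence.question_topics {
        push(format!("question:{topic}"), format!("Asked the user about {topic}"));
    }
    out
}
```

A turn that uses only non-proactive tools and asks no questions yields an empty list, which corresponds to the "passive turns" case in the tests mentioned above.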
Main refactored RequestContextPolicy from static constructors to a
builder pattern (::empty().with_*()). Update IntentCodingMode to
use the new API instead of the removed ::full() method.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
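
For readers unfamiliar with the `::empty().with_*()` shape mentioned above, here is a minimal sketch of such a builder. The struct below is a hypothetical stand-in: its field names and `with_*` methods are invented for illustration and are not the real RequestContextPolicy API.

```rust
/// Hypothetical stand-in for RequestContextPolicy; fields are illustrative.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RequestContextPolicy {
    pub include_workspace_rules: bool,
    pub include_memory: bool,
    pub max_context_files: Option<usize>,
}

impl RequestContextPolicy {
    /// Start with nothing enabled; callers opt in piece by piece.
    pub fn empty() -> Self {
        Self::default()
    }

    pub fn with_workspace_rules(mut self) -> Self {
        self.include_workspace_rules = true;
        self
    }

    pub fn with_memory(mut self) -> Self {
        self.include_memory = true;
        self
    }

    pub fn with_max_context_files(mut self, n: usize) -> Self {
        self.max_context_files = Some(n);
        self
    }
}

/// What a mode that previously called a `::full()` constructor might
/// write instead: every option it needs is spelled out explicitly.
pub fn intent_coding_policy() -> RequestContextPolicy {
    RequestContextPolicy::empty()
        .with_workspace_rules()
        .with_memory()
}
```

The practical effect of such a refactor is that each mode lists the context sources it wants, so adding a new source does not silently change every caller of a catch-all constructor.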
…se in README

The 26 intent/evidence pairs were self-hosted development artifacts from
the MVP implementation, not runtime dependencies. Only rules/, knowledge/,
and changes/ are loaded into agent context.

- Keep 1 representative pair as a format example
- Add a table in README clarifying that intents/evidence are per-task
  delivery artifacts, not runtime context
- agent:check still passes (1 Intent Record + 1 Evidence Package)
…runtime

These are per-task delivery artifacts, not repository scaffolding.
The agent creates .agent/intents/ and .agent/evidence/ on demand when
writing Intent Records and Evidence Packages.

- Delete .agent/intents/ and .agent/evidence/
- Delete stale templates: knowledge-template.md, change-template.md
- agent:check: intents/evidence no longer required; validate
  only when the dirs have files, otherwise report 'No active' info
- Update README, prompt to document runtime-on-demand behavior
Move IntentCoding workflow rules from workspace .agent/ into
prompts/intent_coding_rules/, loaded via include_str!().

- 9 rule files embedded in intent_coding.rs build_prompt()
- instruction_context.rs: .agent/rules removed from context dirs
- intent_coding_mode.md: updated to reference built-in rules
- agent:check: simplified to only validate Intent/Evidence when present
- Delete .agent/ directory from repository

Signed-off-by: harryfan1985 <harryfan1985@gmail.com>
- Add .agent/ to .gitignore to prevent accidental commit of runtime artifacts
- Downgrade intent-without-evidence to WARN in agent:check so mid-task runs don't fail
- Replace fragile rules.len()==9 assertion with per-name checks in test
- Remove context-budget.md rule (implementation detail, not agent guidance)
- Remove redundant || '' in modeDisplay.translatedOrEmpty

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
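
The embedded-rules approach and the per-name test check from the two commits above can be sketched as follows, together with the `OnceLock` caching that a later commit (M11) describes. String constants stand in for the `include_str!()` results so the sketch compiles on its own; the rule names and bodies are assumptions.

```rust
use std::sync::OnceLock;

/// Stand-ins for `include_str!("...")` results. Names and bodies are
/// illustrative; the real rule files are not shown in the PR text.
const RULES: &[(&str, &str)] = &[
    ("provenance-chain", "Record where every change came from."),
    ("risk-classification", "Classify each change by risk level."),
    ("policy-gates", "Stop for approval when a gate triggers."),
];

/// Build the rules section once and hand out the cached string afterwards.
pub fn rules_section() -> &'static str {
    static SECTION: OnceLock<String> = OnceLock::new();
    SECTION
        .get_or_init(|| {
            let mut out = String::from("# Intent Coding Rules\n");
            for (name, body) in RULES {
                out.push_str(&format!("\n## {name}\n{body}\n"));
            }
            out
        })
        .as_str()
}

/// Per-name lookup, so a test can assert on specific rules instead of
/// on the total count.
pub fn has_rule(name: &str) -> bool {
    RULES.iter().any(|(n, _)| *n == name)
}
```

Checking names rather than `RULES.len()` means removing or adding a rule only breaks the assertions that actually refer to it.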
harryfan1985 and others added 18 commits May 28, 2026 19:17
write_once on Linux can leave data in tokio's write buffer if the file
handle is dropped without an explicit flush. Add flush() after write_all()
to ensure bytes reach the OS before the caller reads the file.

Also switch the test's read_to_string from std::fs (sync) to tokio::fs
(async) for consistency with the async write path, eliminating a subtle
ordering hazard on Linux CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… Record

Move clarification from step 3 to step 2 (between Load context and
Intent Record), making it a MANDATORY gate that blocks Intent Record
creation until resolved. This ensures Explore output informs
clarification questions without the agent proceeding to implementation
before user intent is aligned.
@harryfan1985 harryfan1985 force-pushed the codex-intent-coding-mvp branch from 5af5e68 to 4ab2968 May 28, 2026 11:23
@harryfan1985 harryfan1985 changed the title from "Experiment feat impl:Intent Coding MVP workflow (#854)" to "Experiment feat impl:Intent Coding MVP workflow (#854) + Hidden Intent proactivity tracking (pi-Bench)" May 28, 2026
@harryfan1985 harryfan1985 changed the title from "Experiment feat impl:Intent Coding MVP workflow (#854) + Hidden Intent proactivity tracking (pi-Bench)" to "Experiment feat impl:Intent Coding MVP + Hidden Intent proactivity tracking" May 28, 2026
harryfan1985 and others added 5 commits May 28, 2026 19:51
The getModeDisplayDescription/getModeDisplayName helpers are now
consumed inside ModePickerOption, so the ChatInput-side import
became dead and broke type-check with TS6192.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address the highest-leverage findings from the deep review:

- C1: build_proactivity_report falls back to per-turn intent_evidence so
  the report is no longer always None when an evaluator hasn't run.
- C2: extract_hidden_intents_from_evidence emits trajectory markers with
  terminal_status = None, matching the module-doc contract; tests updated.
- H2: IntentEvidenceCollector now uses tokio::sync::Mutex instead of
  std::sync::Mutex so a future .await inside the critical section can't
  deadlock silently.
- H3: per-session intent_metadata_locks serializes the read-modify-write
  on SessionMetadata; lock map is cleared on delete_session.
- H4: cap tool_names_used (64), question_topics (16), turn_evidence (64)
  and hidden_intents (256) so long sessions don't grow unbounded.
- H5: missing workspace/metadata is a debug no-op instead of an error,
  silencing the warn-log spam for ephemeral/deleted sessions.
- H6: proactivity report uses .max() across multi-assignment turns and
  prefers the authoritative intent_evidence count when present.
- M3: slugify_topic falls back to a deterministic hash so non-ASCII
  question headers don't collide on empty slugs.
- M11: IntentCoding prompt rules section is cached in OnceLock instead
  of being rebuilt from ~10 include_str! blocks every dialog turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
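
Two of the items above, the size caps (H4) and the hash fallback in slugify_topic (M3), can be sketched as below. The cap values are the ones quoted in the commit message; the drop-newest policy, the `topic-` prefix, and the choice of FNV-1a as the deterministic hash are assumptions, since the commit does not say which were used.

```rust
/// Cap values quoted in the commit message.
pub const MAX_TOOL_NAMES: usize = 64;
pub const MAX_QUESTION_TOPICS: usize = 16;

/// Store `item` only if the list is below `cap` and does not already
/// contain it. Returns whether the item was stored. Dropping the newest
/// entry once the cap is hit is an assumed policy.
pub fn push_capped(list: &mut Vec<String>, item: &str, cap: usize) -> bool {
    if list.len() >= cap || list.iter().any(|x| x == item) {
        return false;
    }
    list.push(item.to_string());
    true
}

/// 64-bit FNV-1a: a small hash whose output depends only on the input bytes.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// ASCII slug for a question topic. When nothing ASCII-alphanumeric
/// survives (for example a fully non-ASCII header), fall back to a hash
/// of the original text so distinct topics get distinct ids.
pub fn slugify_topic(topic: &str) -> String {
    let mut slug = String::new();
    for ch in topic.chars() {
        if ch.is_ascii_alphanumeric() {
            slug.push(ch.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    let slug = slug.trim_end_matches('-');
    if slug.is_empty() {
        format!("topic-{:016x}", fnv1a_64(topic.as_bytes()))
    } else {
        slug.to_string()
    }
}
```

Without the fallback, every non-ASCII header would slugify to the same empty string and the corresponding intents would collide during deduplication.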
…nder

- H1: useFlowChat / SessionModule / BtwThreadService auto-derive
  enableIntentTracking from the IntentCoding mode so picking the mode
  actually turns the evaluator on (it was previously plumbed but never
  set anywhere).
- H7: maybeWarnIntentCodingEvidenceMissing now requires status===completed
  and a non-cancelled session; the detector regex is anchored on file-path
  shapes (.agent/evidence/, evidence-*.md) instead of the loose "Evidence
  Package" phrase; user-steering items are skipped so an end-user message
  echoing the phrase can no longer satisfy or trigger the warning.
- H10: SessionAPI level fields widened to ProactivityLevel|(string&{}) and
  CompletenessLevel|(string&{}) so a future backend variant doesn't break
  exhaustiveness narrowing in callers.
- M1: ChatInput agent-capsule modifier lowercases modeState.current so
  the IntentCoding mode no longer produces a missing --IntentCoding class.
- M2: ModePickerOption gets role="option", tabIndex=0, aria-selected,
  aria-label and Enter/Space keyboard activation.

Tests updated to cover the new gate (status, cancellation, user-steering).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- H8: validateEvidenceIntentReference now searches the Provenance Chain
  section instead of the whole document so a stray mention elsewhere can't
  satisfy the requirement.
- H9: reject ".." traversal in two paths the validator/writer touched:
  the session_store branch of validateEvidenceProvenanceChain (would have
  let a crafted evidence file read arbitrary local JSON), and the CLI-
  derived --session-id / --turn-id args in intent-coding-provenance-record
  (would have let --session-id ../../tmp/pwn write outside .bitfun/).
- M5: sectionContent terminator tightened to /^##(?!#)\s+/ in both scripts
  so nested ### subheadings stop truncating Repair Loop / Risks content.
- M6: dependency gate trigger now includes Cargo.lock so lockfile-only
  Rust bumps are gated alongside Cargo.toml/package.json/pnpm-lock.yaml.
- M7: Context Inputs regex accepts ":" inside the reference (URLs,
  file.md:42 line refs, Windows paths) by splitting on the last ": ".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
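
The traversal check (H9) and the split on the last ": " (M7) are small enough to illustrate directly. This is a sketch of the ideas only: the scripts in the PR appear to be regex-based, while the version below uses plain string operations, and the function names and the `.bitfun/sessions/` layout are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Accept an id only if it can be used as a single file-name component.
/// Rejects empty ids, "." and "..", and anything containing a separator.
pub fn is_safe_id(id: &str) -> bool {
    !id.is_empty()
        && id != "."
        && id != ".."
        && !id.chars().any(|c| c == '/' || c == '\\')
}

/// Resolve a session file under `<root>/.bitfun/sessions/`, refusing ids
/// that could escape that directory. The directory layout is assumed.
pub fn session_file(root: &Path, session_id: &str) -> Option<PathBuf> {
    if !is_safe_id(session_id) {
        return None;
    }
    Some(
        root.join(".bitfun")
            .join("sessions")
            .join(format!("{session_id}.json")),
    )
}

/// Split a Context Inputs entry into (reference, note) at the LAST ": ",
/// so colons inside the reference (URLs, `file.md:42`, drive letters)
/// stay part of the reference.
pub fn split_context_input(line: &str) -> Option<(&str, &str)> {
    let (reference, note) = line.rsplit_once(": ")?;
    let (reference, note) = (reference.trim(), note.trim());
    if reference.is_empty() || note.is_empty() {
        None
    } else {
        Some((reference, note))
    }
}
```

Splitting on the last separator trades one failure mode for another: a note that itself contains ": " would be cut in the wrong place, so this only works if notes are kept free of that sequence.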
The H1 fix from b154781 lived in web-ui only, so any session created
via server RPC (rpc_dispatcher::create_session) or the AgentSubmissionPort
constructed SessionConfig with Default::default() — where
enable_intent_tracking is false — and IntentCoding sessions silently
shipped without the evaluator on those code paths.

Move the auto-derive into ConversationCoordinator::apply_mode_derived_session_defaults
and call it from create_session_with_workspace_and_creator,
create_hidden_subagent_session_with_workspace, and the inner
create_hidden_subagent_session so every entry point — desktop, server,
relay, subagent spawn — enables tracking for agent_type == "IntentCoding"
unless the caller explicitly set it true already.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
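
The idea of deriving the default in one shared place can be sketched as below. `SessionConfig` is reduced to the single field discussed above, and the free functions are simplified stand-ins for the ConversationCoordinator methods; their signatures are assumptions.

```rust
/// Reduced stand-in for SessionConfig; only the relevant field is kept.
/// `Default` leaves tracking off, as described in the commit message.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct SessionConfig {
    pub enable_intent_tracking: bool,
}

/// Turn tracking on for IntentCoding sessions. A value the caller already
/// set to true is left alone for every mode.
pub fn apply_mode_derived_session_defaults(
    agent_type: &str,
    mut config: SessionConfig,
) -> SessionConfig {
    if agent_type == "IntentCoding" {
        config.enable_intent_tracking = true;
    }
    config
}

/// Each entry point funnels through the same helper, so no path can
/// forget to derive the default. Both functions are simplified.
pub fn create_session(agent_type: &str, config: SessionConfig) -> SessionConfig {
    apply_mode_derived_session_defaults(agent_type, config)
}

pub fn create_hidden_subagent_session(agent_type: &str, config: SessionConfig) -> SessionConfig {
    apply_mode_derived_session_defaults(agent_type, config)
}
```

Placing the rule in the backend rather than in each client means a session created with `SessionConfig::default()` over RPC behaves the same as one created from the web UI.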
@harryfan1985
Contributor Author

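Capability Comparison Evaluation Report

The Two Tasks Executed

| | Task A: doc-comment edit | Task B: create a TS utility function |
|---|---|---|
| Target file | src/crates/core/src/agentic/agents/mod.rs | src/web-ui/src/shared/utils/cn.ts |
| Complexity | Low (documentation change) | Medium (new file + logic code) |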

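Comparison Results

Task A: doc-comment edit

| Metric | IntentCoding | Agentic (control) | Difference |
|---|---|---|---|
| Elapsed time | 30s | 23s | IC took 30% longer |
| Tool calls | 3 | 1 | IC made 2 more (Read + Edit + cargo check) |
| Code correctness | ✅ Compiles | ✅ Compiles | Same |
| Diff | 1 file, +1/-3 | 1 file, +1/-3 | Identical |
| Verification command | Implicit (Read read-back) | | IC has one extra verification step |
| Intent Record | None (skipped for documentation tasks) | N/A | |
| Evidence | | N/A | |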

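Task B: create a TS utility function

| Metric | IntentCoding | Agentic (control) | Difference |
|---|---|---|---|
| Elapsed time | 166s | 22s | IC took about 7.5x as long |
| Tool calls | 16 | 3 | IC made 13 more |
| Code correctness | ✅ Compiles + tests pass | ⚠️ File exists only | IC verifies strictly |
| Output files | cn.ts + cn.test.ts | cn.ts (no tests) | IC produced 1 more file |
| Test file | ✅ 3 vitest cases | ❌ Not generated | Key difference |
| Verification command | vitest run passed | ❌ Not run | Key difference |
| Intent Record | ✅ Created | N/A | IC only |
| Evidence Package | ✅ Includes risk/policy/verification | N/A | IC only |
| agent:check | ✅ Passed | N/A | IC only |
| Intent Tracking | ✅ 3 hidden intents | ❌ Disabled | IC only |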

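Capability Improvement Summary

                           Task A (simple)         Task B (medium)
                           ───────────────         ───────────────
                           IC      Agentic         IC      Agentic

Code correctness           ████    ████            ████    ████
Verification completeness  ████    ██░░            ████    ░░░░   ← 7.5x improvement
Test coverage              ░░░░    ░░░░            ████    ░░░░   ← from none to present
Traceability               ░░░░    ░░░░            ████    ░░░░   ← from none to present
Risk awareness             ░░░░    ░░░░            ████    ░░░░   ← from none to present
Proactivity visibility     ░░░░    ░░░░            ████    ░░░░   ← from none to present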

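Key Findings

  1. Verification-driven quality assurance: in Task B, IntentCoding created a test file on its own and ran verification (3 tests passed), whereas Agentic only created the source file. The fundamental difference is whether tests exist at all.

  2. Full traceability: IntentCoding produced an Intent Record, an Evidence Package, and 3 hidden intents, so "why the change was made, what changed, and how it was verified" can be traced end to end. Agentic left no trace.

  3. Manageable overhead: in Task B, IC spent 144s more and made 13 more tool calls, but that overhead bought a test file, an executed verification run, and an evidence chain. For medium-to-high complexity tasks this is worth it.

  4. No redundancy on simple tasks: in Task A (a pure documentation change), IC correctly skipped the full workflow and produced no superfluous Intent/Evidence files, which suggests the workflow adapts sensibly to task complexity.

  5. agent:check passed: the Evidence Package has no WARN or ERROR; its structure is complete and its risk classification is accurate.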

Development

Successfully merging this pull request may close these issues.

[feat-plan] 面向意图对齐的 Coding Agent 工作流 (Intent-Aligned Coding Agent Workflow)

2 participants