Experiment feat impl: Intent Coding MVP + Hidden Intent proactivity tracking #873
Draft
harryfan1985 wants to merge 53 commits into
53a43fa to
8b98639
Compare
harryfan1985 added a commit to harryfan1985/BitFun that referenced this pull request May 25, 2026

…les/

The three-directory split (rules/knowledge/changes) was a conceptual distinction with no functional difference: the context loader processes all three identically. At MVP stage, one context directory is sufficient.

- Delete .agent/knowledge/ and .agent/changes/
- instruction_context.rs: AGENT_CONTEXT_DIRS reduced to .agent/rules
- Update all context-loader tests to use rules/ only
- intent_coding_mode.md prompt: simplify context-loading instructions
- .agent/README.md: drop knowledge/changes from directory map and task lifecycle
- provenance-chain.md, context-budget.md: remove knowledge/changes references
- PR GCWing#873 body: add architecture reference table from former knowledge file
Based on the pi-Bench Hidden Intent framework (arXiv 2605.14678), this
introduces infrastructure for tracking proactive assistance quality in
long-horizon agent workflows.
Paper reference:
pi-Bench: Evaluating Proactive Personal Assistant Agents in
Long-Horizon Workflows
Zhang et al., arXiv 2605.14678, May 2026
What this adds:
- Hidden Intent types: IntentTerminalStatus (Completed/Inferred/Provided),
HiddenIntent, PersistentIntent, SessionIntentTracking,
ProactivityScore, CompletenessScore in services-core
- IntentEvidenceCollector and IntentTurnEvidence in the ExecutionEngine
for lightweight per-turn signal collection
- Proactivity behavior guidance in agentic_mode.md and claw_mode.md
system prompts
- Extended facet_extraction.md with proactivity/completeness
analysis dimensions
- SessionUsageReport extensions with ProactivityReport and
CompletenessRepor
- round_executor: detect AskUserQuestion even when no topic headers are
extractable, so the call is no longer silently dropped
- execution_engine/session_manager: drop unused turn_id param; warn on
poisoned intent evidence mutex instead of silent skip
- hidden_intent_types: centralize proactivity level thresholds in
ProactivityLevel::{from_score,as_str}; add explicit IntentAssignment
is_proxy flag so proxy detection no longer relies solely on a fragile
intent_id string heuristic (heuristic kept as legacy fallback)
- session_usage: use is_proxy flag first; document the single-provided
suppression rationale
- add regression tests for AskUserQuestion detection and proxy filtering
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
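The two hidden_intent_types changes above can be pictured with a minimal sketch. The variant names, the numeric cut-offs, the `Option<bool>` representation of `is_proxy`, and the `proxy-` prefix heuristic are all assumptions made for illustration; the commit message does not state them.

```rust
/// Coarse proactivity bucket. Variant names and cut-offs are illustrative
/// assumptions, not values taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProactivityLevel {
    Low,
    Medium,
    High,
}

impl ProactivityLevel {
    // Hypothetical thresholds, kept in one place so callers never hard-code them.
    const MEDIUM_MIN: f64 = 0.4;
    const HIGH_MIN: f64 = 0.7;

    /// Maps a score to a level. NaN fails both comparisons and lands in `Low`.
    pub fn from_score(score: f64) -> Self {
        if score >= Self::HIGH_MIN {
            Self::High
        } else if score >= Self::MEDIUM_MIN {
            Self::Medium
        } else {
            Self::Low
        }
    }

    /// Stable string form, e.g. for reports or serialized payloads.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Low => "low",
            Self::Medium => "medium",
            Self::High => "high",
        }
    }
}

/// Hypothetical shape of an assignment record; only `is_proxy` and
/// `intent_id` are named in the commit message.
pub struct IntentAssignment {
    pub intent_id: String,
    /// `None` models records written before the explicit flag existed.
    pub is_proxy: Option<bool>,
}

/// Prefers the explicit flag and falls back to the legacy id heuristic
/// only when the flag is absent. The "proxy-" prefix is an assumption.
pub fn is_proxy_assignment(assignment: &IntentAssignment) -> bool {
    assignment
        .is_proxy
        .unwrap_or_else(|| assignment.intent_id.starts_with("proxy-"))
}
```

With this layout, a record carrying `is_proxy: Some(false)` is never treated as a proxy, even if its id happens to match the legacy pattern.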
The `tool_call` fixture helper and its `ToolCall` import were dropped when rebasing onto main, which had rewritten the test module header. Adds them back so the detect_ask_user_question tests compile and pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Does it work well?
Mirror the Rust IntentAssignment is_proxy field so the frontend can read and filter proxy assignments. Optional to stay backward compatible. (Re-applied; lost during an earlier branch rebase.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
e7167b0 to
56656ce
Compare
- Add extract_hidden_intents_from_evidence() that infers HiddenIntent entries from proactive tool usage and AskUserQuestion topics
- Add proactive_tool_intent_description() for human-readable intent labels
- Wire extraction into record_intent_evidence() with deduplication
- Add load_unresolved_hidden_intents() for downstream consumers
- Add 4 extraction tests covering proactive tools, questions, deduplication, and passive turns
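As a rough illustration of extraction with deduplication, the sketch below derives intent entries from one turn of evidence and skips any id that is already recorded. The struct fields, the `tool:` / `question:` id scheme, and the description strings are hypothetical; the real extract_hidden_intents_from_evidence() signature is not shown in this PR page.

```rust
use std::collections::HashSet;

/// Hypothetical per-turn evidence: which proactive tools ran and which
/// topics the agent asked the user about.
pub struct TurnEvidence {
    pub proactive_tools: Vec<String>,
    pub question_topics: Vec<String>,
}

/// Hypothetical, minimal hidden-intent entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HiddenIntent {
    pub intent_id: String,
    pub description: String,
}

/// Appends intents inferred from `evidence` to `existing`, skipping ids
/// that are already present. Returns how many entries were added.
pub fn extract_hidden_intents(evidence: &TurnEvidence, existing: &mut Vec<HiddenIntent>) -> usize {
    // Seed the seen-set with ids recorded on earlier turns.
    let mut seen: HashSet<String> = existing.iter().map(|i| i.intent_id.clone()).collect();

    let from_tools = evidence
        .proactive_tools
        .iter()
        .map(|t| (format!("tool:{t}"), format!("Proactively used {t}")));
    let from_questions = evidence
        .question_topics
        .iter()
        .map(|q| (format!("question:{q}"), format!("Asked the user about {q}")));

    let mut added = 0;
    for (intent_id, description) in from_tools.chain(from_questions) {
        // `insert` returns false when the id was already seen.
        if seen.insert(intent_id.clone()) {
            existing.push(HiddenIntent { intent_id, description });
            added += 1;
        }
    }
    added
}
```

A passive turn (no proactive tools, no questions) adds nothing, which matches the "passive turns" case the tests are said to cover.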
Main refactored RequestContextPolicy from static constructors to a builder pattern (::empty().with_*()). Update IntentCodingMode to use the new API instead of the removed ::full() method. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…se in README

The 26 intent/evidence pairs were self-hosted development artifacts from the MVP implementation, not runtime dependencies. Only rules/, knowledge/, and changes/ are loaded into agent context.

- Keep 1 representative pair as a format example
- Add a table in README clarifying that intents/evidence are per-task delivery artifacts, not runtime context
- agent:check still passes (1 Intent Record + 1 Evidence Package)
…runtime

These are per-task delivery artifacts, not repository scaffolding. The agent creates .agent/intents/ and .agent/evidence/ on demand when writing Intent Records and Evidence Packages.

- Delete .agent/intents/ and .agent/evidence/
- Delete stale templates: knowledge-template.md, change-template.md
- agent:check: intents/evidence no longer required; validate only when the dirs have files, otherwise report 'No active' info
- Update README, prompt to document runtime-on-demand behavior
Move IntentCoding workflow rules from workspace .agent/ into prompts/intent_coding_rules/, loaded via include_str!().

- 9 rule files embedded in intent_coding.rs build_prompt()
- instruction_context.rs: .agent/rules removed from context dirs
- intent_coding_mode.md: updated to reference built-in rules
- agent:check: simplified to only validate Intent/Evidence when present
- Delete .agent/ directory from repository

Signed-off-by: harryfan1985 <harryfan1985@gmail.com>
- Add .agent/ to .gitignore to prevent accidental commit of runtime artifacts
- Downgrade intent-without-evidence to WARN in agent:check so mid-task runs don't fail
- Replace fragile rules.len()==9 assertion with per-name checks in test
- Remove context-budget.md rule (implementation detail, not agent guidance)
- Remove redundant || '' in modeDisplay.translatedOrEmpty

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
write_once on Linux can leave data in tokio's write buffer if the file handle is dropped without an explicit flush. Add flush() after write_all() to ensure bytes reach the OS before the caller reads the file. Also switch the test's read_to_string from std::fs (sync) to tokio::fs (async) for consistency with the async write path, eliminating a subtle ordering hazard on Linux CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
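The underlying hazard, bytes accepted by a buffered writer but not yet handed to the destination, can be shown with a synchronous analogy. This sketch deliberately uses `std::io::BufWriter` over an in-memory `Vec<u8>` rather than tokio's file type, so it is not the code path the commit fixes; the function name is hypothetical.

```rust
use std::io::{BufWriter, Write};

/// Writes `payload` through a buffered writer and reports how many bytes
/// the underlying sink holds before and after an explicit flush.
/// Analogy only: the commit concerns tokio's async file writes, while this
/// uses the synchronous std BufWriter to make the buffering visible.
pub fn buffered_write_visibility(payload: &[u8]) -> std::io::Result<(usize, usize)> {
    let mut writer = BufWriter::new(Vec::new());

    // write_all succeeds, but a small payload only lands in the buffer.
    writer.write_all(payload)?;
    let before_flush = writer.get_ref().len();

    // flush pushes the buffered bytes down to the underlying sink.
    writer.flush()?;
    let after_flush = writer.get_ref().len();

    Ok((before_flush, after_flush))
}
```

For a payload smaller than the writer's internal buffer, nothing reaches the sink until `flush()` runs, which is why a reader that inspects the destination immediately after `write_all()` can see missing data.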
… Record Move clarification from step 3 to step 2 (between Load context and Intent Record), making it a MANDATORY gate that blocks Intent Record creation until resolved. This ensures Explore output informs clarification questions without the agent proceeding to implementation before user intent is aligned.
5af5e68 to
4ab2968
Compare
…tracking' into codex-intent-coding-mvp
The getModeDisplayDescription/getModeDisplayName helpers are now consumed inside ModePickerOption, so the ChatInput-side import became dead and broke type-check with TS6192. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address the highest-leverage findings from the deep review:

- C1: build_proactivity_report falls back to per-turn intent_evidence so the report is no longer always None when an evaluator hasn't run.
- C2: extract_hidden_intents_from_evidence emits trajectory markers with terminal_status = None, matching the module-doc contract; tests updated.
- H2: IntentEvidenceCollector now uses tokio::sync::Mutex instead of std::sync::Mutex so a future .await inside the critical section can't deadlock silently.
- H3: per-session intent_metadata_locks serializes the read-modify-write on SessionMetadata; lock map is cleared on delete_session.
- H4: cap tool_names_used (64), question_topics (16), turn_evidence (64) and hidden_intents (256) so long sessions don't grow unbounded.
- H5: missing workspace/metadata is a debug no-op instead of an error, silencing the warn-log spam for ephemeral/deleted sessions.
- H6: proactivity report uses .max() across multi-assignment turns and prefers the authoritative intent_evidence count when present.
- M3: slugify_topic falls back to a deterministic hash so non-ASCII question headers don't collide on empty slugs.
- M11: IntentCoding prompt rules section is cached in OnceLock instead of being rebuilt from ~10 include_str! blocks every dialog turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
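The M3 fix can be sketched as follows. The slug rules, the `topic-` prefix, and the choice of FNV-1a as the deterministic hash are assumptions for illustration; the commit only states that slugify_topic falls back to a deterministic hash when the ASCII slug would be empty.

```rust
/// 64-bit FNV-1a. Chosen here because it is tiny and gives the same value
/// on every run and platform; the PR does not say which hash it uses.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Lowercases ASCII alphanumerics and collapses everything else into single
/// hyphens. If nothing survives (e.g. a purely non-ASCII header), derives
/// the slug from a hash of the original text instead of returning "".
pub fn slugify_topic(topic: &str) -> String {
    let mut slug = String::new();
    for ch in topic.chars() {
        if ch.is_ascii_alphanumeric() {
            slug.push(ch.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    let slug = slug.trim_end_matches('-');

    if slug.is_empty() {
        // Hypothetical fallback format: distinct inputs get distinct ids
        // (up to hash collisions) rather than all mapping to "".
        format!("topic-{:016x}", fnv1a_64(topic.as_bytes()))
    } else {
        slug.to_string()
    }
}
```

Without the fallback, every header written entirely in a non-Latin script would slugify to the empty string and all such topics would share one id.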
…nder
- H1: useFlowChat / SessionModule / BtwThreadService auto-derive
enableIntentTracking from the IntentCoding mode so picking the mode
actually turns the evaluator on (it was previously plumbed but never
set anywhere).
- H7: maybeWarnIntentCodingEvidenceMissing now requires status===completed
and a non-cancelled session; the detector regex is anchored on file-path
shapes (.agent/evidence/, evidence-*.md) instead of the loose "Evidence
Package" phrase; user-steering items are skipped so an end-user message
echoing the phrase can no longer satisfy or trigger the warning.
- H10: SessionAPI level fields widened to ProactivityLevel|(string&{}) and
CompletenessLevel|(string&{}) so a future backend variant doesn't break
exhaustiveness narrowing in callers.
- M1: ChatInput agent-capsule modifier lowercases modeState.current so
the IntentCoding mode no longer produces a missing --IntentCoding class.
- M2: ModePickerOption gets role="option", tabIndex=0, aria-selected,
aria-label and Enter/Space keyboard activation.
Tests updated to cover the new gate (status, cancellation, user-steering).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- H8: validateEvidenceIntentReference now searches the Provenance Chain section instead of the whole document so a stray mention elsewhere can't satisfy the requirement.
- H9: reject ".." traversal in two paths the validator/writer touched: the session_store branch of validateEvidenceProvenanceChain (would have let a crafted evidence file read arbitrary local JSON), and the CLI-derived --session-id / --turn-id args in intent-coding-provenance-record (would have let --session-id ../../tmp/pwn write outside .bitfun/).
- M5: sectionContent terminator tightened to /^##(?!#)\s+/ in both scripts so nested ### subheadings stop truncating Repair Loop / Risks content.
- M6: dependency gate trigger now includes Cargo.lock so lockfile-only Rust bumps are gated alongside Cargo.toml/package.json/pnpm-lock.yaml.
- M7: Context Inputs regex accepts ":" inside the reference (URLs, file.md:42 line refs, Windows paths) by splitting on the last ": ".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
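The H9 class of bug, an untrusted id such as `../../tmp/pwn` escaping the intended directory, can be guarded against with a component check before joining. This is a Rust sketch of the general technique, not the validator scripts touched by the commit; `safe_join` and the `.bitfun/sessions` base path are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `untrusted` onto `base` only if every component is a plain name.
/// Rejects empty input, "..", a leading ".", and absolute paths, so the
/// result cannot point outside `base`.
pub fn safe_join(base: &Path, untrusted: &str) -> Option<PathBuf> {
    if untrusted.is_empty() {
        return None;
    }
    let candidate = Path::new(untrusted);

    // Component::Normal is an ordinary file or directory name; anything
    // else (ParentDir, RootDir, CurDir, Prefix) is refused outright.
    let only_plain_names = candidate
        .components()
        .all(|component| matches!(component, Component::Normal(_)));

    if only_plain_names {
        Some(base.join(candidate))
    } else {
        None
    }
}
```

Checking components rather than searching the raw string for ".." avoids false positives on names such as `notes..md` while still refusing real parent-directory segments.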
The H1 fix from b154781 lived in web-ui only, so any session created via server RPC (rpc_dispatcher::create_session) or the AgentSubmissionPort constructed SessionConfig with Default::default() — where enable_intent_tracking is false — and IntentCoding sessions silently shipped without the evaluator on those code paths. Move the auto-derive into ConversationCoordinator::apply_mode_derived_session_defaults and call it from create_session_with_workspace_and_creator, create_hidden_subagent_session_with_workspace, and the inner create_hidden_subagent_session so every entry point — desktop, server, relay, subagent spawn — enables tracking for agent_type == "IntentCoding" unless the caller explicitly set it true already. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
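A minimal sketch of the idea, deriving the tracking flag from the agent type in one shared function that every session-creation path calls, might look like this. The single-field `SessionConfig` and the free-function form are simplifications; in the PR the logic is described as a method on ConversationCoordinator.

```rust
/// Simplified stand-in for the real SessionConfig; only the field named
/// in the commit message is modelled. Default leaves tracking off.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct SessionConfig {
    pub enable_intent_tracking: bool,
}

/// Applies defaults implied by the selected mode. IntentCoding sessions
/// always end up with tracking on; other modes keep the caller's value.
pub fn apply_mode_derived_session_defaults(agent_type: &str, mut config: SessionConfig) -> SessionConfig {
    if agent_type == "IntentCoding" {
        // Idempotent: a caller that already set the flag is unaffected.
        config.enable_intent_tracking = true;
    }
    config
}
```

Placing the rule at the point where sessions are created means a config built with `Default::default()` by a server or subagent caller still gets the evaluator for IntentCoding.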
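Capability comparison evaluation report
The two tasks executed
Comparison results
Task A: Modifying documentation comments
Task B: Creating a TS utility function
Capability improvement summary
Key findings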
Closes #854
This PR combines the Intent Coding MVP workflow (#854) with Hidden Intent proactivity tracking from PR #846.
Part 1 - Intent Coding MVP: IntentCodingMode agent, context compiler, provenance chain, risk classification, policy gates, mode picker UI.
Part 2 - Hidden Intent Tracking (merged from #846): pi-Bench based infrastructure with IntentEvidenceCollector, heuristic intent extraction, proactivity/completeness scoring, session usage report extensions.