feat(memos-local-plugin): v2 Reflect2Evolve plugin with mid-turn reasoning capture #1515
Merged
hijzy merged 1 commit into MemTensor:main on Apr 22, 2026
Conversation
…rovements

Capture path (thinking-between-tool-calls):
- Adapter extractTurn now flushes thinking + assistant-text that appears between consecutive tool calls into the next ToolCallDTO's `thinkingBefore`, preserving the model's natural-language bridge (e.g. "nproc failed, let me try sysctl") in the trace.
- Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with legacy top-level `tool_calls`, skip the legacy path so each call is emitted once (the prior double-push clobbered the first stub's `thinkingBefore` via pendingCalls.set, making the field silently go missing and doubling tool-call rows in the DB).
- Orchestrator: tool turns now persist `thinkingBefore` in EpisodeTurn.meta so the capture step-extractor can re-attach it.
- Step extractor: only the first tool sub-step carries `userText`; subsequent sub-steps leave it empty so the viewer's flattenChat doesn't render the same user bubble N times.
- Step extractor: `toolCallFromTurn` + `coerceToolCall` now read `thinkingBefore` back from meta.
- Normalizer: sub-step candidates skip the generic dedup path — their intentionally-identical empty userText/agentText plus one-tool shape used to collapse two distinct tools into one whenever their input prefixes matched under 200 chars.
- Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no schema migration needed (stored inside `tool_calls_json`).
- Web flattenChat: renders per-tool `thinkingBefore` bubbles before each tool call in the user↔agent timeline; retains the legacy `agentThinking` single-bubble fallback for pure-reply traces.

Retrieval path:
- LLM-filter: refactored prompt templates and schema shape.
- Ranker: reworked scoring with new blend knobs.
- Retrieve + pipeline wiring updated to match the new types.
- Config defaults + schema expose the new retrieval knobs.
- Viewer LogsView surfaces new filter fields; i18n updated.

Tests:
- New regression tests for extractTurn interleaved thinking, pi-ai + OpenAI-legacy double-push avoidance, and flattenChat sub-step rendering.
- Retrieval / llm-filter / ranker tests updated for the new shape.
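The flush behaviour in the capture path can be sketched as follows. The `Block` and `ToolCallDTO` shapes here are minimal stand-ins (the real adapter types are richer), and `extractToolCalls` is a hypothetical helper name, not the actual `extractTurn` signature:

```typescript
// Hypothetical minimal shapes; the real ToolCallDTO and content-block types differ.
interface ToolCallDTO {
  name: string;
  input: string;
  thinkingBefore?: string;
}

type Block =
  | { kind: "thinking" | "text"; text: string }
  | { kind: "toolCall"; name: string; input: string };

// Flush any thinking/text seen since the previous tool call into the
// NEXT call's `thinkingBefore`, preserving the model's bridge sentences.
function extractToolCalls(blocks: Block[]): ToolCallDTO[] {
  const calls: ToolCallDTO[] = [];
  let pending: string[] = [];
  for (const b of blocks) {
    if (b.kind === "toolCall") {
      calls.push({
        name: b.name,
        input: b.input,
        thinkingBefore: pending.length ? pending.join("\n") : undefined,
      });
      pending = []; // consumed by this call
    } else {
      pending.push(b.text); // bridge text between consecutive calls
    }
  }
  return calls;
}
```

The key design point is that interleaved text attaches forward (to the call it motivates), so "nproc failed, let me try sysctl" lands on the sysctl call rather than being dropped.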
hijzy added a commit that referenced this pull request on Apr 22, 2026
…ledge + L2/L3 boundary prompts
hijzy added a commit that referenced this pull request on Apr 22, 2026
…ledge + L2/L3 boundary prompts (#1516)

## Summary

This PR continues the v2 Reflect2Evolve plugin work merged in #1515 with three orthogonal improvements that landed together because they share the same trace fixtures and tests:

1. **UI: one user turn = one memory card** — the frontend collapses sub-step rows by `(episodeId, turnId)`. The algorithm layer (V/α backprop, L2 induction, Tier-2 retrieval, Decision Repair) keeps step-level granularity per V7 §0.1.
2. **Knowledge generation in the user's language** — every L1/L2/L3/Skill/reflection generation site now detects the dominant language of its evidence and emits a `languageSteeringLine`, so a Chinese user no longer gets half-English memos.
3. **L2 / L3 prompts: hard boundary against drift** — `L2_INDUCTION_PROMPT` and `L3_ABSTRACTION_PROMPT` bumped v1 → v2 with explicit "what NOT to write" guards plus same-fact-two-framings examples to keep procedural ↔ declarative knowledge cleanly separated.

Plus two infrastructure fixes the v2 plugin needed to actually run on better-sqlite3 ≥ v11 (defensive-mode block on `sqlite_master`), and an alignment doc explaining the small-step/turn/task + experience/environment-knowledge/skill mental model so future contributors stop conflating UI, storage, and algorithm granularities.

## What changed

### `traces.turn_id` + per-turn UI grouping

- New migration `013-trace-turn-id.sql`: adds `turn_id INTEGER` + an `idx_traces_episode_turn` index.
- `step-extractor.ts` stamps every sub-step from the same user message with the user turn's `ts` as `meta.turnId`; `capture.ts::pickTurnId` threads it into `traces.turn_id`.
- `MemoriesView.tsx` introduces `MemoryGroup` aggregation + a `<StepList>` drawer, so a 5-tool turn renders as one card with five collapsible step blocks (each carrying its own V / α / reflection / toolCalls) instead of five sibling cards. Bulk select / delete / share / export operate at card level.
- DB rows from before this migration get a NULL `turn_id` and fall back to per-row rendering.

### Language-aware knowledge generation

- `core/llm/prompts/index.ts`: new `detectDominantLanguage(samples, {minSignal})` — counts CJK ideographs vs ASCII letters, returns `"zh" | "en" | "auto"`. Allocation-free; runs on every gen call.
- All five gen sites inject `languageSteeringLine`:
  - `capture/alpha-scorer.ts` — reflection-quality reason
  - `capture/batch-scorer.ts` — per-step batch reflections
  - `memory/l2/induce.ts` — L2 policy fields
  - `memory/l3/abstract.ts` — L3 (ℰ, ℐ, C) bullets
  - `skill/crystallize.ts` — skill body + scope

### L2 / L3 boundary prompts (v1 → v2)

- `L2_INDUCTION_PROMPT`: new "Boundaries — what NOT to write" section explicitly rejects environment topology / declarative behavioural rules / generic taboos. Includes a same-fact-two-framings example (procedural vs declarative for the same underlying truth).
- `L3_ABSTRACTION_PROMPT`: bans imperative verbs (do / should / use / install / run) under any of ℰ/ℐ/C. All three example sets rewritten as pure declarative ("loading a glibc-linked binary wheel inside Alpine raises a dynamic-link error" instead of "if pip fails, install dev libs and retry").
- Test mock keys updated v1 → v2; historical `inducedBy` audit strings intentionally left at v1 (they record the prompt version a row was generated under, not a call-time match key).

### Retrieval injector heading hierarchy

- `# User's conversation history (from memory system)` is H1; `## Memories` / `## Skills` / `## Environment Knowledge` are H2 — restores the visual outline the LLM consumes.

### Migration runner: better-sqlite3 ≥ v11 compatibility

- `runMigrations` now flips `db.raw.unsafeMode(true)` at the outer boundary if any pending migration uses `PRAGMA writable_schema` (reset in `finally`). Migration 012 (status unification) needs this to swap CHECK constraints in-place; defensive mode otherwise blocked it at runtime.
- Migration 012 SQL uses single-quote literals with doubled inner quotes (it was double-quoted, which strict mode treats as identifiers).

### Documentation

- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` (~365 lines, zh-CN) — the foundational mental-model doc that should be read before any other algorithm doc:
  - the three interaction granularities (small step / turn / task) and how they map onto the code layers
  - scoring granularity (per-step α/V, per-task R_human; a "turn" has no independent score)
  - retrieval granularity (skill / single step / sub-task sequence / environment knowledge; there is no "per-turn" recall), plus the three-layer discrimination
  - a "structural uncertainty" vs "operational question" discrimination table
  - the relationships among experience / environment knowledge / skills
  - §6 "experience vs environment-knowledge boundary trimming": 7 reasons against merging + a comparison of three compromise schemes + a 7-dimension quick-reference + a same-fact multi-framing contrast table + counterexamples
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at the top.
- `docs/README.md` index updated accordingly.

## Algorithm alignment

Per V7 §0.1, the L1 trace is the minimum learning unit and stays step-level — one tool call → one trace, one final reply → one trace. The "one round = one memory" view is purely a frontend display concern using `turn_id` as a stable group key. Reflection-weighted backprop, cross-task L2 association, error-signature retrieval, and Decision Repair all continue to operate per-step. Documented end-to-end in the new GRANULARITY doc §6.

## Test plan

- [x] `npx vitest run tests/unit/capture/step-extractor.test.ts` — turnId stamped on every sub-step; a multi-tool turn shares one turnId (11/11 pass)
- [x] `npx vitest run tests/unit/memory/l2/ tests/unit/memory/l3/ tests/unit/llm/prompts.test.ts` — prompt v2 mock keys + L2/L3 induction (74/74 pass)
- [x] `npx vitest run tests/unit/storage/` — migration 013 applies cleanly (106/106 pass)
- [x] `npx vitest run tests/unit/` — full unit sweep: 802/806 pass; the 4 failures are pre-existing on `main` (mock LLM behavior in reward integration + an outdated `capture.lite.done` event-list assertion), unchanged by this PR
- [x] Local install via `bash install.sh --version ./memtensor-memos-local-plugin-2.0.0-beta.1.tgz`: gateway + viewer come up clean, `traces.turn_id` column present, migration 013 logged as applied
- [x] Manual end-to-end: ran a 3-tool query in OpenClaw and verified the memory page shows ONE card with a `工具 · 4 步` ("tool · 4 steps") chip; the drawer expands into 4 collapsible step sections with per-step V/α/thinking/tool I/O

## Notes

- No backward compat for the schema change is required — fresh installs run all 13 migrations on first open. Existing local DBs auto-pick up 013 the next time the gateway opens them.
- Only `apps/memos-local-plugin/` is touched. No changes to other packages.
CaralHsi added a commit that referenced this pull request on Apr 23, 2026
* feat: add .env.example-full and fix .env.example
* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture
Complete end-to-end rewrite of the memos-local-plugin package into a
layered, agent-agnostic memory runtime with support for both OpenClaw
and Hermes adapters.
Highlights:
- New `core/` package (agent-agnostic): capture, embedding, feedback,
hub, LLM client, logger, memory (L1/L2/L3), pipeline, reward, recall,
retrieval, skill, session, storage, config modules — each with its
own README + ALGORITHMS notes.
- New `adapters/` layer with `openclaw/` and `hermes/` integrations
isolated from core. Agent-specific concepts (turns, installers,
bridge clients) live only here.
- New `agent-contract/` — single shared contract (dto, errors, events,
jsonrpc, log-record, memory-core) between core and adapters.
- New `bridge/` — JSON-RPC stdio bridge (methods.ts, stdio.ts).
- New `server/` — HTTP/SSE server for the viewer.
- New `site/` — Vite-built public product site + release notes index.
- New `web/` — Vite-built viewer app with memory/skill/timeline/world
model views.
- New `docs/` — ALGORITHM, DATA-MODEL, LOGGING, MANUAL_E2E_TESTING,
Reflect2Skill design core, multi-agent viewer, etc.
- New `tests/` — vitest unit/integration + python bridge tests.
- Tooling: TypeScript multi-project build (tsconfig.{json,site,web}),
Vite + Vitest, cross-platform install.sh / install.ps1, npm release
checker, package-lock committed.
- Removes legacy `src/` and `www/` structure from main branch; the new
layout replaces it entirely.
This change is fully scoped to apps/memos-local-plugin/ and does not
touch any other package.
* style(memos-local-plugin): apply ruff check + format to Python files
- Remove unused imports (Iterable, Dict, List) in memos_provider/__init__.py
- Move Callable into TYPE_CHECKING block in bridge_client.py
- Replace try/except/pass with contextlib.suppress in bridge_client.py
- Combine nested if in test_bridge_client.py
- Apply ruff format to 4 Python files (hermes adapter + tests)
All files now pass `ruff check` and `ruff format --check`.
* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements
Capture path (thinking-between-tool-calls):
- Adapter extractTurn now flushes thinking + assistant-text that appears
between consecutive tool calls into the next ToolCallDTO's
`thinkingBefore`, preserving the model's natural-language bridge
(e.g. "nproc failed, let me try sysctl") in the trace.
- Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with
legacy top-level `tool_calls`, skip the legacy path so each call is
emitted once (prior double-push clobbered the first stub's
`thinkingBefore` via pendingCalls.set, making the field silently go
missing and doubling tool-call rows in the DB).
- Orchestrator: tool turns now persist `thinkingBefore` in
EpisodeTurn.meta so the capture step-extractor can re-attach it.
- Step extractor: only the first tool sub-step carries `userText`;
subsequent sub-steps leave it empty so the viewer's flattenChat
doesn't render the same user bubble N times.
- Step extractor: `toolCallFromTurn` + `coerceToolCall` now read
`thinkingBefore` back from meta.
- Normalizer: sub-step candidates skip the generic dedup path — their
intentionally-identical empty userText/agentText plus 1-tool shape
used to collapse two distinct tools into one whenever their input
prefixes matched under 200 chars.
- Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no
schema migration needed (stored inside `tool_calls_json`).
- Web flattenChat: renders per-tool `thinkingBefore` bubbles before
each tool call for the user↔agent timeline; retains legacy
`agentThinking` single-bubble fallback for pure-reply traces.
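The double-push fix above can be sketched with hypothetical message shapes (the real pi-ai and adapter types differ; `collectToolCallIds` is an illustrative helper, not the actual flattenMessages code):

```typescript
// Hypothetical pi-ai-style assistant message: content blocks plus a legacy
// OpenAI-style top-level tool_calls array that may duplicate them.
interface AssistantMsg {
  content: Array<{ kind: "toolCall"; id: string } | { kind: "text"; text: string }>;
  tool_calls?: Array<{ id: string }>;
}

// Emit each tool call exactly once: if the content blocks already carry
// toolCall entries, skip the legacy top-level path entirely. The old code
// walked both paths, double-pushing calls and clobbering per-call metadata.
function collectToolCallIds(msg: AssistantMsg): string[] {
  const fromContent = msg.content
    .filter((b): b is { kind: "toolCall"; id: string } => b.kind === "toolCall")
    .map((b) => b.id);
  if (fromContent.length > 0) return fromContent;
  return (msg.tool_calls ?? []).map((c) => c.id);
}
```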
Retrieval path:
- LLM-filter: refactored prompt templates and schema shape.
- Ranker: reworked scoring with new blend knobs.
- Retrieve + pipeline wiring updated to match the new types.
- Config defaults + schema expose the new retrieval knobs.
- Viewer LogsView surfaces new filter fields; i18n updated.
Tests:
- New regression tests for extractTurn interleaved thinking, pi-ai +
OpenAI-legacy double-push avoidance, and flattenChat sub-step
rendering.
- Retrieval / llm-filter / ranker tests updated for the new shape.
* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts
UI: one user turn = one memory card
- New `traces.turn_id INTEGER` column (migration 013) stamped by
`step-extractor` with the user turn's ts; every sub-step of the same
user message shares the same turnId.
- `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses
rows by (episodeId, turnId): one card per turn, role pill chosen by
group-level rule (any tool → "tool"), aggregate V/α displayed as the
member-row mean.
- Drawer rewritten as `<StepList>`: every member step renders as a
collapsible <details> block with its own ts / V / α / agentThinking /
toolCalls / reflection. First step expanded, rest collapsed so a
10-tool turn doesn't drown the user.
- Bulk actions (select / delete / share / export) operate on whole
cards: card checkbox toggles the full set of member ids; delete /
share / export bulk over `g.ids` so a card never half-disappears.
- Algorithm layer untouched — every L1 trace stays step-level so V/α
reflection-weighted backprop, L2 incremental association, Tier-2
error-signature retrieval, and Decision Repair keep their per-step
granularity (V7 §0.1).
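The (episodeId, turnId) aggregation can be sketched as below. Row and group shapes are simplified stand-ins for the real trace rows and `MemoryGroup` in MemoriesView.tsx; the NULL-turnId fallback mirrors the pre-migration behaviour described above:

```typescript
// Hypothetical row shape; real trace rows carry more fields.
interface TraceRow {
  id: string;
  episodeId: string;
  turnId: number | null;
  role: "reply" | "tool";
  v: number;
}

interface MemoryGroup {
  key: string;
  ids: string[];          // all member ids, so bulk actions never split a card
  role: "reply" | "tool"; // group-level rule: any tool member -> "tool"
  meanV: number;          // member-row mean shown on the card
}

// Collapse step-level rows into one card per (episodeId, turnId).
// Rows with a NULL turnId (pre-migration data) become singleton groups.
function groupByTurn(rows: TraceRow[]): MemoryGroup[] {
  const groups = new Map<string, TraceRow[]>();
  for (const r of rows) {
    const key =
      r.turnId === null ? `${r.episodeId}:row:${r.id}` : `${r.episodeId}:${r.turnId}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(r);
    groups.set(key, bucket);
  }
  return [...groups.entries()].map(([key, members]) => ({
    key,
    ids: members.map((m) => m.id),
    role: members.some((m) => m.role === "tool") ? "tool" : "reply",
    meanV: members.reduce((s, m) => s + m.v, 0) / members.length,
  }));
}
```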
Per-tool reasoning capture (carryover, see PR #1515)
- ToolCallDTO carries `value` / `reflection` / `thinkingBefore` so the
drawer's per-step section can show the per-tool intermediate
thinking and any LLM-assigned per-tool score without a schema change.
- StepCandidate.meta.turnId / subStep / subStepIdx / subStepTotal
threaded through capture.ts → traces.turn_id; `pickTurnId` falls
back to the trace's own ts so old fixtures still produce singleton
groups instead of crashing.
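The fallback is tiny but load-bearing; a sketch with a hypothetical trace shape (the real `pickTurnId` in capture.ts may differ):

```typescript
// Hypothetical trace shape: meta.turnId is stamped by the step extractor;
// older fixtures predate it and only carry their own ts.
interface Trace {
  ts: number;
  meta?: { turnId?: number };
}

// Prefer the stamped turn id; fall back to the trace's own ts so legacy
// rows form singleton groups instead of crashing downstream grouping.
function pickTurnId(trace: Trace): number {
  return trace.meta?.turnId ?? trace.ts;
}
```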
Knowledge generation in user's language
- `core/llm/prompts/index.ts` adds `detectDominantLanguage(samples,
{minSignal})` — counts CJK ideographs + ASCII letters and returns
"zh" / "en" / "auto" (allocation-free, runs on every gen call).
- All five knowledge-generation sites now emit a `languageSteeringLine`
system message keyed off their evidence:
* core/capture/alpha-scorer.ts ← reflection-quality reason
* core/capture/batch-scorer.ts ← per-step batch reflections
* core/memory/l2/induce.ts ← L2 policy fields
* core/memory/l3/abstract.ts ← L3 (ℰ, ℐ, C) bullets
* core/skill/crystallize.ts ← skill body + scope
- Effect: a Chinese-speaking user no longer gets a half-English skill
  card; an English user no longer gets a Chinese-mixed reflection.
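The detection heuristic can be sketched as below. The counting approach (CJK ideographs vs ASCII letters) is from the description above; the exact signature, default `minSignal` of 20, and thresholds are assumptions:

```typescript
// Sketch of detectDominantLanguage: count CJK ideographs vs ASCII letters
// across evidence samples; return "auto" when neither side has enough signal.
// The minSignal default (20 chars) is an assumed value for illustration.
function detectDominantLanguage(
  samples: string[],
  { minSignal = 20 }: { minSignal?: number } = {},
): "zh" | "en" | "auto" {
  let cjk = 0;
  let ascii = 0;
  for (const s of samples) {
    for (let i = 0; i < s.length; i++) {
      const c = s.charCodeAt(i);
      if (c >= 0x4e00 && c <= 0x9fff) cjk++; // CJK Unified Ideographs block
      else if ((c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a)) ascii++;
    }
  }
  if (cjk + ascii < minSignal) return "auto"; // too little evidence to steer
  return cjk > ascii ? "zh" : "en";
}
```

Charcode comparison keeps the scan allocation-free, matching the "runs on every gen call" constraint.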
L2 / L3 prompts: hard boundary against drift
- `L2_INDUCTION_PROMPT` v1 → v2: explicit "what NOT to write" guard
rejects environment topology, declarative behavioural rules, and
generic taboos. New same-fact-two-framings example shows how to
re-fold an env fact into a state-level trigger or step-level caveat.
- `L3_ABSTRACTION_PROMPT` v1 → v2: bans imperative verbs (do/should/use/
install/run) under any of ℰ/ℐ/C; reworked all three example sets to
pure declarative ("loading a glibc-linked binary wheel inside Alpine
raises a dynamic-link error" instead of "if pip fails, install dev
libs and retry"). Same-fact contrast example included.
- Test mock keys updated v1 → v2 in induce.test.ts /
l2.integration.test.ts / openclaw-full-chain.test.ts /
v7-full-chain.e2e.test.ts. Historical `inducedBy` audit strings
intentionally left at v1 — they're metadata recording the prompt
version a row was generated under, not call-time keys.
Retrieval injector: heading hierarchy
- `# User's conversation history (from memory system)` is now H1, with
`## Memories` / `## Skills` / `## Environment Knowledge` as H2 so the
injected block has a clean outline in the LLM's context (previously
the inner sections used H1 too, breaking the visual hierarchy).
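The heading layout can be sketched as follows; the heading strings are from the change above, while the render helper, its name, and the section parameter shape are hypothetical:

```typescript
// Sketch of the injected block's heading hierarchy: one H1 wrapper with
// H2 subsections so the outline nests cleanly in the LLM's context.
function renderInjectedBlock(sections: {
  memories: string;
  skills: string;
  env: string;
}): string {
  return [
    "# User's conversation history (from memory system)",
    "",
    "## Memories",
    sections.memories,
    "",
    "## Skills",
    sections.skills,
    "",
    "## Environment Knowledge",
    sections.env,
  ].join("\n");
}
```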
Migration runner: SQLite defensive mode
- better-sqlite3 ≥ v11 enables `SQLITE_DBCONFIG_DEFENSIVE` which blocks
writes to `sqlite_master` even with `PRAGMA writable_schema=ON`.
Migration 012 (status unification) needs that pragma to swap CHECK
constraints in-place. `runMigrations` now flips `db.raw.unsafeMode`
on at the outer boundary if any pending migration uses
`writable_schema`, then off again in `finally`. Migrations are
shipped with the plugin (never user input) so this is safe.
- Migration 012 SQL itself rewritten to use single-quote string
literals with doubled inner quotes (instead of double quotes that
better-sqlite3 strict mode treats as identifiers).
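The unsafe-mode bracket can be sketched with a stubbed DB handle (better-sqlite3's real `Database` does expose `unsafeMode(boolean)`, but the `RawDb` interface, the `pending` shape, and this `runMigrations` signature are simplified assumptions):

```typescript
// Stubbed handle; better-sqlite3 exposes unsafeMode(boolean) on Database.
interface RawDb {
  unsafeMode(on: boolean): void;
  exec(sql: string): void;
}

// If any pending migration touches writable_schema, lift defensive mode
// for the whole batch and ALWAYS restore it in `finally`. Migrations ship
// with the plugin (never user input), so the temporary lift is safe.
function runMigrations(raw: RawDb, pending: string[]): void {
  const needsUnsafe = pending.some((sql) => /writable_schema/i.test(sql));
  if (needsUnsafe) raw.unsafeMode(true);
  try {
    for (const sql of pending) raw.exec(sql);
  } finally {
    if (needsUnsafe) raw.unsafeMode(false); // never leave defensive mode disabled
  }
}
```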
Documentation
- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` — mental-model alignment
  doc explaining: how the three granularities (small step / turn / task)
  relate; scoring granularity (per-step α/V, per-task R_human; a "turn"
  has no independent score); retrieval granularity (skill / single step /
  sub-task sequence / environment knowledge — there is no "per-turn"
  recall); the generation chain (small step → experience → environment
  knowledge → skill); and a §6 "experience vs environment-knowledge
  boundary trimming" section answering the "should they be merged"
  question: 7 reasons against merging + a comparison of three compromise
  schemes + a same-fact multi-framing discrimination table.
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at the
  top, pointing newcomers to the granularity-alignment doc above first.
- `docs/README.md` index updated, with GRANULARITY-AND-MEMORY-LAYERS
  bolded.
Tests
- `tests/unit/capture/step-extractor.test.ts`: turnId stability
assertions across sub-steps; multi-tool turn shares one turnId.
- All other test fixtures' LLM mock keys synchronized with new prompt
versions; non-mock `inducedBy` audit fields kept at v1 by design.
---------
Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
CaralHsi added a commit that referenced this pull request on Apr 23, 2026
…sing (#1533)

* feat: add .env.example-full and fix .env.example
* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture
* style(memos-local-plugin): apply ruff check + format to Python files
* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements
* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts
* feat: port plugin system from enterprise branch

  Bring the plugin runtime, API bootstrap integration, and related tests from aliyun-ee/dev-20260423-v2.0.14.post onto upstream/main so the upstream branch can review the plugin architecture independently of the broader enterprise history.

  Made-with: Cursor

* fix: working-binding-related bug

---------
Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
CaralHsi added a commit that referenced this pull request on Apr 23, 2026
…1526)

* feat: add .env.example-full and fix .env.example

* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture

  Complete end-to-end rewrite of the memos-local-plugin package into a layered, agent-agnostic memory runtime with support for both OpenClaw and Hermes adapters. Highlights:

  - New `core/` package (agent-agnostic): capture, embedding, feedback, hub, LLM client, logger, memory (L1/L2/L3), pipeline, reward, recall, retrieval, skill, session, storage, config modules — each with its own README + ALGORITHMS notes.
  - New `adapters/` layer with `openclaw/` and `hermes/` integrations isolated from core. Agent-specific concepts (turns, installers, bridge clients) live only here.
  - New `agent-contract/` — a single shared contract (dto, errors, events, jsonrpc, log-record, memory-core) between core and adapters.
  - New `bridge/` — JSON-RPC stdio bridge (methods.ts, stdio.ts).
  - New `server/` — HTTP/SSE server for the viewer.
  - New `site/` — Vite-built public product site + release notes index.
  - New `web/` — Vite-built viewer app with memory/skill/timeline/world model views.
  - New `docs/` — ALGORITHM, DATA-MODEL, LOGGING, MANUAL_E2E_TESTING, Reflect2Skill design core, multi-agent viewer, etc.
  - New `tests/` — vitest unit/integration + Python bridge tests.
  - Tooling: TypeScript multi-project build (tsconfig.{json,site,web}), Vite + Vitest, cross-platform install.sh / install.ps1, npm release checker, package-lock committed.
  - Removes the legacy `src/` and `www/` structure from the main branch; the new layout replaces it entirely.

  This change is fully scoped to apps/memos-local-plugin/ and does not touch any other package.
* style(memos-local-plugin): apply ruff check + format to Python files

  - Remove unused imports (Iterable, Dict, List) in memos_provider/__init__.py
  - Move Callable into the TYPE_CHECKING block in bridge_client.py
  - Replace try/except/pass with contextlib.suppress in bridge_client.py
  - Combine a nested if in test_bridge_client.py
  - Apply ruff format to 4 Python files (Hermes adapter + tests)

  All files now pass `ruff check` and `ruff format --check`.

* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements

  Capture path (thinking-between-tool-calls):
  - Adapter extractTurn now flushes thinking + assistant text that appears between consecutive tool calls into the next ToolCallDTO's `thinkingBefore`, preserving the model's natural-language bridge (e.g. "nproc failed, let me try sysctl") in the trace.
  - Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with legacy top-level `tool_calls`, skip the legacy path so each call is emitted once (the prior double-push clobbered the first stub's `thinkingBefore` via pendingCalls.set, making the field silently go missing and doubling tool-call rows in the DB).
  - Orchestrator: tool turns now persist `thinkingBefore` in EpisodeTurn.meta so the capture step-extractor can re-attach it.
  - Step extractor: only the first tool sub-step carries `userText`; subsequent sub-steps leave it empty so the viewer's flattenChat doesn't render the same user bubble N times.
  - Step extractor: `toolCallFromTurn` + `coerceToolCall` now read `thinkingBefore` back from meta.
  - Normalizer: sub-step candidates skip the generic dedup path — their intentionally identical empty userText/agentText plus 1-tool shape used to collapse two distinct tools into one whenever their input prefixes matched under 200 chars.
  - Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no schema migration needed (stored inside `tool_calls_json`).
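The interleaved-thinking flush described above can be sketched like this. The message-part shape, function name, and DTO fields here are simplified assumptions for illustration, not the adapter's real types; only the `thinkingBefore` field name comes from the change itself.

```typescript
// Sketch: fold thinking/text that appears BETWEEN tool calls into the NEXT
// call's thinkingBefore, so the bridge narration survives in the trace.
type Part =
  | { kind: "thinking" | "text"; text: string }
  | { kind: "toolCall"; name: string; input: unknown };

interface ToolCallDTO {
  name: string;
  input: unknown;
  thinkingBefore?: string; // new optional field; serialized into tool_calls_json
}

function extractToolCalls(parts: Part[]): { toolCalls: ToolCallDTO[]; finalText: string } {
  const toolCalls: ToolCallDTO[] = [];
  let pending: string[] = []; // thinking/text accumulated since the last tool call
  for (const p of parts) {
    if (p.kind === "toolCall") {
      toolCalls.push({
        name: p.name,
        input: p.input,
        ...(pending.length ? { thinkingBefore: pending.join("\n") } : {}),
      });
      pending = [];
    } else {
      pending.push(p.text);
    }
  }
  // Whatever remains after the last tool call is the final reply text.
  return { toolCalls, finalText: pending.join("\n") };
}
```

The key property is that the accumulator is reset at every tool call, so each call owns exactly the narration that preceded it, rather than everything being smashed into the final reply.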
  - Web flattenChat: renders per-tool `thinkingBefore` bubbles before each tool call for the user↔agent timeline; retains the legacy `agentThinking` single-bubble fallback for pure-reply traces.

  Retrieval path:
  - LLM-filter: refactored prompt templates and schema shape.
  - Ranker: reworked scoring with new blend knobs.
  - Retrieve + pipeline wiring updated to match the new types.
  - Config defaults + schema expose the new retrieval knobs.
  - Viewer LogsView surfaces new filter fields; i18n updated.

  Tests:
  - New regression tests for extractTurn interleaved thinking, pi-ai + OpenAI-legacy double-push avoidance, and flattenChat sub-step rendering.
  - Retrieval / llm-filter / ranker tests updated for the new shape.

* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts

  UI: one user turn = one memory card
  - New `traces.turn_id INTEGER` column (migration 013) stamped by `step-extractor` with the user turn's ts; every sub-step of the same user message shares the same turnId.
  - `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses rows by (episodeId, turnId): one card per turn, with the role pill chosen by a group-level rule (any tool → "tool") and aggregate V/α displayed as the member-row mean.
  - Drawer rewritten as `<StepList>`: every member step renders as a collapsible <details> block with its own ts / V / α / agentThinking / toolCalls / reflection. The first step is expanded and the rest collapsed so a 10-tool turn doesn't drown the user.
  - Bulk actions (select / delete / share / export) operate on whole cards: the card checkbox toggles the full set of member ids; delete / share / export act in bulk over `g.ids` so a card never half-disappears.
  - Algorithm layer untouched — every L1 trace stays step-level, so V/α reflection-weighted backprop, L2 incremental association, Tier-2 error-signature retrieval, and Decision Repair keep their per-step granularity (V7 §0.1).
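The one-card-per-turn grouping and the `pickTurnId` fallback described above can be sketched as follows. The row and group shapes are simplified assumptions; the real aggregation lives in `web/src/views/MemoriesView.tsx`, and only `pickTurnId` and the (episodeId, turnId) key come from the change itself.

```typescript
// Sketch: collapse step-level trace rows into one card per user turn.
interface TraceRow {
  id: number;
  episodeId: string;
  turnId?: number; // absent on rows written before migration 013
  ts: number;
  role: "reply" | "tool";
}

interface MemoryGroup {
  key: string;
  ids: number[];
  role: "reply" | "tool"; // group-level rule: any tool member makes it "tool"
}

// Old fixtures have no turn_id; fall back to the row's own ts so each such
// row forms a singleton group instead of crashing the viewer.
const pickTurnId = (r: TraceRow): number => r.turnId ?? r.ts;

function groupByTurn(rows: TraceRow[]): MemoryGroup[] {
  const groups = new Map<string, MemoryGroup>();
  for (const r of rows) {
    const key = `${r.episodeId}:${pickTurnId(r)}`;
    const g: MemoryGroup = groups.get(key) ?? { key, ids: [], role: "reply" };
    g.ids.push(r.id);
    if (r.role === "tool") g.role = "tool";
    groups.set(key, g);
  }
  return [...groups.values()];
}
```

Because bulk actions operate over a group's full `ids` set, deleting or exporting a card always takes every member step with it.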
* fix(format_utils): raise ValueError on None in clean_json_response

  clean_json_response treats its argument as a str and unconditionally calls .replace(). When an upstream LLM helper returns None (e.g. due to the silent-fail pattern in timed_with_status), the resulting AttributeError points to format_utils.py rather than to the failed LLM call, which is hard to diagnose.

  Add an explicit None check that raises a descriptive ValueError. This turns the symptom "NoneType has no attribute replace" into a message that names the actual root cause.

---------
Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: CaralHsi <caralhsi@gmail.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
Co-authored-by: auctor <auctor@xinfty.space>
Summary
This PR lands the memos-local-plugin v2 rewrite and iterates on top of it with multi-tool-call reasoning capture + retrieval improvements.
Scope (only apps/memos-local-plugin/)

Three commits on top of main:
1. feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture — the base v2 rewrite (layered L1/L2/L3 memory, Reflect2Evolve capture/reward/skill pipeline, tier-1/2/3 retrieval, OpenClaw + Hermes adapters).
2. style(memos-local-plugin): apply ruff check + format to Python files — the Python adapter pass (Hermes provider).
3. feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements — the capture + retrieval work described below.

Capture path — mid-turn reasoning is no longer lost
A multi-tool-call turn looks like `user → [thinking?, text?, tool_1] → result_1 → [thinking?, text?, tool_2] → result_2 → [text final]`. The plugin used to collapse every assistant text into one `agentText` and drop the bridge narration ("tool_1 failed, let me try tool_2") entirely. This PR fixes that end-to-end:
- `extractTurn` flushes `thinking` + inline assistant text into the next tool call's new `thinkingBefore` field, so the bridge reasoning is preserved per-tool instead of being smashed into the final reply.
- `flattenMessages` no longer double-emits tool calls when pi-ai's `content[toolCall]` and OpenAI-legacy top-level `tool_calls` coexist. The prior double-push clobbered the first stub via `pendingCalls.set`, making `thinkingBefore` silently go missing and doubling tool-call rows in the DB.
- The orchestrator persists `thinkingBefore` in the tool `EpisodeTurn.meta` so capture can read it back.
- Only the first tool sub-step carries `userText`; subsequent sub-steps leave it empty so `flattenChat` doesn't render the same user bubble N times.
- `toolCallFromTurn` + `coerceToolCall` now hydrate `thinkingBefore` from meta.
- `ToolCallDTO.thinkingBefore?: string` added; no SQL migration (stored inside `tool_calls_json`).
- `flattenChat` renders per-tool `thinkingBefore` bubbles before each tool call; retains the `agentThinking` single-bubble fallback for pure-reply traces.

Retrieval path
- LLM-filter prompt templates and schema refactored; ranker scoring reworked with new blend knobs; retrieve + pipeline wiring, config defaults/schema, and the viewer LogsView updated to match.
Tests
- `extractTurn` interleaved thinking per tool call
- `flattenMessages` pi-ai + OpenAI-legacy double-push avoidance
- `flattenChat` sub-step rendering (per-tool thinking bubble, no duplicate user)

Test plan
- apps/memos-local-plugin/ unit tests: the `flattenChat` suite passes (12/12).
- A multi-tool trace renders one `user` bubble (no N-fold repetition), a `thinking` bubble before each tool call when the model produced one (models may also emit tool calls without any mid-step text — that's model-level behavior and cannot be back-filled), and the final `assistant` reply at the end.
- Each tool call appears exactly once in `tool_calls_json` after the double-push fix.

Non-scope
- Nothing outside apps/memos-local-plugin/.
- No pnpm-lock.yaml / uv.lock bumps.
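For reference, the single-emit guard behind the double-push fix exercised above can be sketched like this. The message shape is a simplified assumption blending pi-ai content parts with the OpenAI-legacy field; the real adapter code in `flattenMessages` is richer than this.

```typescript
// Sketch: emit each tool call once. If the pi-ai content[] already carries
// toolCall parts, skip the legacy top-level tool_calls array entirely, so a
// second pass can never clobber the first stub's thinkingBefore.
interface AssistantMsg {
  content: Array<{ type: "text" | "thinking" | "toolCall"; text?: string; id?: string; name?: string }>;
  tool_calls?: Array<{ id: string; name: string }>; // OpenAI-legacy duplicate
}

function collectToolCalls(msg: AssistantMsg): Array<{ id: string; name: string }> {
  const fromContent = msg.content
    .filter((p) => p.type === "toolCall")
    .map((p) => ({ id: p.id!, name: p.name! }));
  if (fromContent.length > 0) return fromContent; // pi-ai path wins; skip legacy
  return msg.tool_calls ?? []; // legacy-only messages still work
}
```

Preferring one representation outright (rather than merging the two arrays) is what makes the invariant easy to test: each call appears exactly once regardless of which shapes the provider emitted.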