feat(memos-local-plugin): v2 Reflect2Evolve plugin with mid-turn reasoning capture #1515
Merged
hijzy merged 1 commit into MemTensor:main on Apr 22, 2026
Conversation
…rovements

Capture path (thinking-between-tool-calls):
- Adapter extractTurn now flushes thinking + assistant-text that appears between consecutive tool calls into the next ToolCallDTO's `thinkingBefore`, preserving the model's natural-language bridge (e.g. "nproc failed, let me try sysctl") in the trace.
- Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with legacy top-level `tool_calls`, skip the legacy path so each call is emitted once (the prior double-push clobbered the first stub's `thinkingBefore` via pendingCalls.set, making the field silently go missing and doubling tool-call rows in the DB).
- Orchestrator: tool turns now persist `thinkingBefore` in EpisodeTurn.meta so the capture step-extractor can re-attach it.
- Step extractor: only the first tool sub-step carries `userText`; subsequent sub-steps leave it empty so the viewer's flattenChat doesn't render the same user bubble N times.
- Step extractor: `toolCallFromTurn` + `coerceToolCall` now read `thinkingBefore` back from meta.
- Normalizer: sub-step candidates skip the generic dedup path — their intentionally-identical empty userText/agentText plus one-tool shape used to collapse two distinct tools into one whenever their input prefixes matched under 200 chars.
- Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no schema migration needed (stored inside `tool_calls_json`).
- Web flattenChat: renders per-tool `thinkingBefore` bubbles before each tool call in the user↔agent timeline; retains the legacy `agentThinking` single-bubble fallback for pure-reply traces.

Retrieval path:
- LLM-filter: refactored prompt templates and schema shape.
- Ranker: reworked scoring with new blend knobs.
- Retrieve + pipeline wiring updated to match the new types.
- Config defaults + schema expose the new retrieval knobs.
- Viewer LogsView surfaces new filter fields; i18n updated.

Tests:
- New regression tests for extractTurn interleaved thinking, pi-ai + OpenAI-legacy double-push avoidance, and flattenChat sub-step rendering.
- Retrieval / llm-filter / ranker tests updated for the new shape.
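The flush behaviour in the capture path can be sketched as follows. The `Block` and `ToolCallDTO` shapes here are minimal stand-ins (the real adapter types are richer), and `extractToolCalls` is a hypothetical helper name, not the actual `extractTurn` signature:

```typescript
// Hypothetical minimal shapes; the real ToolCallDTO and content-block types differ.
interface ToolCallDTO {
  name: string;
  input: string;
  thinkingBefore?: string;
}

type Block =
  | { kind: "thinking" | "text"; text: string }
  | { kind: "toolCall"; name: string; input: string };

// Flush any thinking/text seen since the previous tool call into the
// NEXT call's `thinkingBefore`, preserving the model's bridge sentences.
function extractToolCalls(blocks: Block[]): ToolCallDTO[] {
  const calls: ToolCallDTO[] = [];
  let pending: string[] = [];
  for (const b of blocks) {
    if (b.kind === "toolCall") {
      calls.push({
        name: b.name,
        input: b.input,
        thinkingBefore: pending.length ? pending.join("\n") : undefined,
      });
      pending = []; // consumed by this call
    } else {
      pending.push(b.text); // bridge text between consecutive calls
    }
  }
  return calls;
}
```

The key design point is that interleaved text attaches forward (to the call it motivates), so "nproc failed, let me try sysctl" lands on the sysctl call rather than being dropped.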
hijzy added a commit that referenced this pull request on Apr 22, 2026
…ledge + L2/L3 boundary prompts
hijzy added a commit that referenced this pull request on Apr 22, 2026
…ledge + L2/L3 boundary prompts (#1516)

## Summary

This PR continues the v2 Reflect2Evolve plugin work merged in #1515 with three orthogonal improvements that landed together because they share the same trace fixtures and tests:

1. **UI: one user turn = one memory card** — the frontend collapses sub-step rows by `(episodeId, turnId)`. The algorithm layer (V/α backprop, L2 induction, Tier-2 retrieval, Decision Repair) keeps step-level granularity per V7 §0.1.
2. **Knowledge generation in the user's language** — every L1/L2/L3/Skill/reflection generation site now detects the dominant language of its evidence and emits a `languageSteeringLine`, so a Chinese user no longer gets half-English memos.
3. **L2 / L3 prompts: hard boundary against drift** — `L2_INDUCTION_PROMPT` and `L3_ABSTRACTION_PROMPT` bumped v1 → v2 with explicit "what NOT to write" guards plus same-fact-two-framings examples to keep procedural ↔ declarative knowledge cleanly separated.

Plus two infrastructure fixes the v2 plugin needed to actually run on better-sqlite3 ≥ v11 (defensive-mode block on `sqlite_master`), and an alignment doc explaining the small-step/turn/task + experience/environment-knowledge/skill mental model so future contributors stop conflating UI, storage, and algorithm granularities.

## What changed

### `traces.turn_id` + per-turn UI grouping

- New migration `013-trace-turn-id.sql`: adds `turn_id INTEGER` + an `idx_traces_episode_turn` index.
- `step-extractor.ts` stamps every sub-step from the same user message with the user turn's `ts` as `meta.turnId`; `capture.ts::pickTurnId` threads it into `traces.turn_id`.
- `MemoriesView.tsx` introduces `MemoryGroup` aggregation + a `<StepList>` drawer, so a 5-tool turn renders as one card with five collapsible step blocks (each carrying its own V / α / reflection / toolCalls) instead of five sibling cards. Bulk select / delete / share / export operate at card level.
- DB rows from before this migration get a NULL `turn_id` and fall back to per-row rendering.

### Language-aware knowledge generation

- `core/llm/prompts/index.ts`: new `detectDominantLanguage(samples, {minSignal})` — counts CJK ideographs vs ASCII letters, returns `"zh" | "en" | "auto"`. Allocation-free; runs on every gen call.
- All five gen sites inject `languageSteeringLine`:
  - `capture/alpha-scorer.ts` — reflection-quality reason
  - `capture/batch-scorer.ts` — per-step batch reflections
  - `memory/l2/induce.ts` — L2 policy fields
  - `memory/l3/abstract.ts` — L3 (ℰ, ℐ, C) bullets
  - `skill/crystallize.ts` — skill body + scope

### L2 / L3 boundary prompts (v1 → v2)

- `L2_INDUCTION_PROMPT`: new "Boundaries — what NOT to write" section explicitly rejects environment topology / declarative behavioural rules / generic taboos. Includes a same-fact-two-framings example (procedural vs declarative for the same underlying truth).
- `L3_ABSTRACTION_PROMPT`: bans imperative verbs (do / should / use / install / run) under any of ℰ/ℐ/C. All three example sets rewritten as pure declarative ("loading a glibc-linked binary wheel inside Alpine raises a dynamic-link error" instead of "if pip fails, install dev libs and retry").
- Test mock keys updated v1 → v2; historical `inducedBy` audit strings intentionally left at v1 (they record the prompt version a row was generated under, not a call-time match key).

### Retrieval injector heading hierarchy

- `# User's conversation history (from memory system)` is H1; `## Memories` / `## Skills` / `## Environment Knowledge` are H2 — restores the visual outline the LLM consumes.

### Migration runner: better-sqlite3 ≥ v11 compatibility

- `runMigrations` now flips `db.raw.unsafeMode(true)` at the outer boundary if any pending migration uses `PRAGMA writable_schema` (reset in `finally`). Migration 012 (status unification) needs this to swap CHECK constraints in-place; defensive mode otherwise blocked it at runtime.
- Migration 012 SQL uses single-quote literals with doubled inner quotes (it was double-quoted, which strict mode treats as identifiers).

### Documentation

- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` (~365 lines, zh-CN) — the foundational mental-model doc that should be read before any other algorithm doc:
  - the three interaction granularities (small step / turn / task) and how they map onto the code layers
  - scoring granularity (per-step α/V, per-task R_human; a "turn" has no independent score)
  - retrieval granularity (skill / single step / sub-task sequence / environment knowledge; there is no "per-turn" recall), plus the three-layer discrimination
  - a "structural uncertainty" vs "operational question" discrimination table
  - the relationships among experience / environment knowledge / skills
  - §6 "experience vs environment-knowledge boundary trimming": 7 reasons against merging + a comparison of three compromise schemes + a 7-dimension quick-reference + a same-fact multi-framing contrast table + counterexamples
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at the top.
- `docs/README.md` index updated accordingly.

## Algorithm alignment

Per V7 §0.1, the L1 trace is the minimum learning unit and stays step-level — one tool call → one trace, one final reply → one trace. The "one round = one memory" view is purely a frontend display concern using `turn_id` as a stable group key. Reflection-weighted backprop, cross-task L2 association, error-signature retrieval, and Decision Repair all continue to operate per-step. Documented end-to-end in the new GRANULARITY doc §6.

## Test plan

- [x] `npx vitest run tests/unit/capture/step-extractor.test.ts` — turnId stamped on every sub-step; a multi-tool turn shares one turnId (11/11 pass)
- [x] `npx vitest run tests/unit/memory/l2/ tests/unit/memory/l3/ tests/unit/llm/prompts.test.ts` — prompt v2 mock keys + L2/L3 induction (74/74 pass)
- [x] `npx vitest run tests/unit/storage/` — migration 013 applies cleanly (106/106 pass)
- [x] `npx vitest run tests/unit/` — full unit sweep: 802/806 pass; the 4 failures are pre-existing on `main` (mock LLM behavior in reward integration + an outdated `capture.lite.done` event-list assertion), unchanged by this PR
- [x] Local install via `bash install.sh --version ./memtensor-memos-local-plugin-2.0.0-beta.1.tgz`: gateway + viewer come up clean, `traces.turn_id` column present, migration 013 logged as applied
- [x] Manual end-to-end: ran a 3-tool query in OpenClaw and verified the memory page shows ONE card with a `工具 · 4 步` ("tool · 4 steps") chip; the drawer expands into 4 collapsible step sections with per-step V/α/thinking/tool I/O

## Notes

- No backward compat for the schema change is required — fresh installs run all 13 migrations on first open. Existing local DBs auto-pick up 013 the next time the gateway opens them.
- Only `apps/memos-local-plugin/` is touched. No changes to other packages.
CaralHsi added a commit that referenced this pull request on Apr 23, 2026
* feat: add .env.example-full and fix .env.example
* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture
Complete end-to-end rewrite of the memos-local-plugin package into a
layered, agent-agnostic memory runtime with support for both OpenClaw
and Hermes adapters.
Highlights:
- New `core/` package (agent-agnostic): capture, embedding, feedback,
hub, LLM client, logger, memory (L1/L2/L3), pipeline, reward, recall,
retrieval, skill, session, storage, config modules — each with its
own README + ALGORITHMS notes.
- New `adapters/` layer with `openclaw/` and `hermes/` integrations
isolated from core. Agent-specific concepts (turns, installers,
bridge clients) live only here.
- New `agent-contract/` — single shared contract (dto, errors, events,
jsonrpc, log-record, memory-core) between core and adapters.
- New `bridge/` — JSON-RPC stdio bridge (methods.ts, stdio.ts).
- New `server/` — HTTP/SSE server for the viewer.
- New `site/` — Vite-built public product site + release notes index.
- New `web/` — Vite-built viewer app with memory/skill/timeline/world
model views.
- New `docs/` — ALGORITHM, DATA-MODEL, LOGGING, MANUAL_E2E_TESTING,
Reflect2Skill design core, multi-agent viewer, etc.
- New `tests/` — vitest unit/integration + python bridge tests.
- Tooling: TypeScript multi-project build (tsconfig.{json,site,web}),
Vite + Vitest, cross-platform install.sh / install.ps1, npm release
checker, package-lock committed.
- Removes legacy `src/` and `www/` structure from main branch; the new
layout replaces it entirely.
This change is fully scoped to apps/memos-local-plugin/ and does not
touch any other package.
* style(memos-local-plugin): apply ruff check + format to Python files
- Remove unused imports (Iterable, Dict, List) in memos_provider/__init__.py
- Move Callable into TYPE_CHECKING block in bridge_client.py
- Replace try/except/pass with contextlib.suppress in bridge_client.py
- Combine nested if in test_bridge_client.py
- Apply ruff format to 4 Python files (hermes adapter + tests)
All files now pass `ruff check` and `ruff format --check`.
* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements
Capture path (thinking-between-tool-calls):
- Adapter extractTurn now flushes thinking + assistant-text that appears
between consecutive tool calls into the next ToolCallDTO's
`thinkingBefore`, preserving the model's natural-language bridge
(e.g. "nproc failed, let me try sysctl") in the trace.
- Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with
legacy top-level `tool_calls`, skip the legacy path so each call is
emitted once (prior double-push clobbered the first stub's
`thinkingBefore` via pendingCalls.set, making the field silently go
missing and doubling tool-call rows in the DB).
- Orchestrator: tool turns now persist `thinkingBefore` in
EpisodeTurn.meta so the capture step-extractor can re-attach it.
- Step extractor: only the first tool sub-step carries `userText`;
subsequent sub-steps leave it empty so the viewer's flattenChat
doesn't render the same user bubble N times.
- Step extractor: `toolCallFromTurn` + `coerceToolCall` now read
`thinkingBefore` back from meta.
- Normalizer: sub-step candidates skip the generic dedup path — their
intentionally-identical empty userText/agentText plus 1-tool shape
used to collapse two distinct tools into one whenever their input
prefixes matched under 200 chars.
- Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no
schema migration needed (stored inside `tool_calls_json`).
- Web flattenChat: renders per-tool `thinkingBefore` bubbles before
each tool call for the user↔agent timeline; retains legacy
`agentThinking` single-bubble fallback for pure-reply traces.
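The double-push fix above can be sketched with hypothetical message shapes (the real pi-ai and adapter types differ; `collectToolCallIds` is an illustrative helper, not the actual flattenMessages code):

```typescript
// Hypothetical pi-ai-style assistant message: content blocks plus a legacy
// OpenAI-style top-level tool_calls array that may duplicate them.
interface AssistantMsg {
  content: Array<{ kind: "toolCall"; id: string } | { kind: "text"; text: string }>;
  tool_calls?: Array<{ id: string }>;
}

// Emit each tool call exactly once: if the content blocks already carry
// toolCall entries, skip the legacy top-level path entirely. The old code
// walked both paths, double-pushing calls and clobbering per-call metadata.
function collectToolCallIds(msg: AssistantMsg): string[] {
  const fromContent = msg.content
    .filter((b): b is { kind: "toolCall"; id: string } => b.kind === "toolCall")
    .map((b) => b.id);
  if (fromContent.length > 0) return fromContent;
  return (msg.tool_calls ?? []).map((c) => c.id);
}
```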
Retrieval path:
- LLM-filter: refactored prompt templates and schema shape.
- Ranker: reworked scoring with new blend knobs.
- Retrieve + pipeline wiring updated to match the new types.
- Config defaults + schema expose the new retrieval knobs.
- Viewer LogsView surfaces new filter fields; i18n updated.
Tests:
- New regression tests for extractTurn interleaved thinking, pi-ai +
OpenAI-legacy double-push avoidance, and flattenChat sub-step
rendering.
- Retrieval / llm-filter / ranker tests updated for the new shape.
* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts
UI: one user turn = one memory card
- New `traces.turn_id INTEGER` column (migration 013) stamped by
`step-extractor` with the user turn's ts; every sub-step of the same
user message shares the same turnId.
- `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses
rows by (episodeId, turnId): one card per turn, role pill chosen by
group-level rule (any tool → "tool"), aggregate V/α displayed as the
member-row mean.
- Drawer rewritten as `<StepList>`: every member step renders as a
collapsible <details> block with its own ts / V / α / agentThinking /
toolCalls / reflection. First step expanded, rest collapsed so a
10-tool turn doesn't drown the user.
- Bulk actions (select / delete / share / export) operate on whole
cards: card checkbox toggles the full set of member ids; delete /
share / export bulk over `g.ids` so a card never half-disappears.
- Algorithm layer untouched — every L1 trace stays step-level so V/α
reflection-weighted backprop, L2 incremental association, Tier-2
error-signature retrieval, and Decision Repair keep their per-step
granularity (V7 §0.1).
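The (episodeId, turnId) aggregation can be sketched as below. Row and group shapes are simplified stand-ins for the real trace rows and `MemoryGroup` in MemoriesView.tsx; the NULL-turnId fallback mirrors the pre-migration behaviour described above:

```typescript
// Hypothetical row shape; real trace rows carry more fields.
interface TraceRow {
  id: string;
  episodeId: string;
  turnId: number | null;
  role: "reply" | "tool";
  v: number;
}

interface MemoryGroup {
  key: string;
  ids: string[];          // all member ids, so bulk actions never split a card
  role: "reply" | "tool"; // group-level rule: any tool member -> "tool"
  meanV: number;          // member-row mean shown on the card
}

// Collapse step-level rows into one card per (episodeId, turnId).
// Rows with a NULL turnId (pre-migration data) become singleton groups.
function groupByTurn(rows: TraceRow[]): MemoryGroup[] {
  const groups = new Map<string, TraceRow[]>();
  for (const r of rows) {
    const key =
      r.turnId === null ? `${r.episodeId}:row:${r.id}` : `${r.episodeId}:${r.turnId}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(r);
    groups.set(key, bucket);
  }
  return [...groups.entries()].map(([key, members]) => ({
    key,
    ids: members.map((m) => m.id),
    role: members.some((m) => m.role === "tool") ? "tool" : "reply",
    meanV: members.reduce((s, m) => s + m.v, 0) / members.length,
  }));
}
```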
Per-tool reasoning capture (carryover, see PR #1515)
- ToolCallDTO carries `value` / `reflection` / `thinkingBefore` so the
drawer's per-step section can show the per-tool intermediate
thinking and any LLM-assigned per-tool score without a schema change.
- StepCandidate.meta.turnId / subStep / subStepIdx / subStepTotal
threaded through capture.ts → traces.turn_id; `pickTurnId` falls
back to the trace's own ts so old fixtures still produce singleton
groups instead of crashing.
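The fallback is tiny but load-bearing; a sketch with a hypothetical trace shape (the real `pickTurnId` in capture.ts may differ):

```typescript
// Hypothetical trace shape: meta.turnId is stamped by the step extractor;
// older fixtures predate it and only carry their own ts.
interface Trace {
  ts: number;
  meta?: { turnId?: number };
}

// Prefer the stamped turn id; fall back to the trace's own ts so legacy
// rows form singleton groups instead of crashing downstream grouping.
function pickTurnId(trace: Trace): number {
  return trace.meta?.turnId ?? trace.ts;
}
```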
Knowledge generation in user's language
- `core/llm/prompts/index.ts` adds `detectDominantLanguage(samples,
{minSignal})` — counts CJK ideographs + ASCII letters and returns
"zh" / "en" / "auto" (allocation-free, runs on every gen call).
- All five knowledge-generation sites now emit a `languageSteeringLine`
system message keyed off their evidence:
* core/capture/alpha-scorer.ts ← reflection-quality reason
* core/capture/batch-scorer.ts ← per-step batch reflections
* core/memory/l2/induce.ts ← L2 policy fields
* core/memory/l3/abstract.ts ← L3 (ℰ, ℐ, C) bullets
* core/skill/crystallize.ts ← skill body + scope
- Effect: a Chinese-speaking user no longer gets a half-English skill
  card; an English user no longer gets a Chinese-mixed reflection.
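The detection heuristic can be sketched as below. The counting approach (CJK ideographs vs ASCII letters) is from the description above; the exact signature, default `minSignal` of 20, and thresholds are assumptions:

```typescript
// Sketch of detectDominantLanguage: count CJK ideographs vs ASCII letters
// across evidence samples; return "auto" when neither side has enough signal.
// The minSignal default (20 chars) is an assumed value for illustration.
function detectDominantLanguage(
  samples: string[],
  { minSignal = 20 }: { minSignal?: number } = {},
): "zh" | "en" | "auto" {
  let cjk = 0;
  let ascii = 0;
  for (const s of samples) {
    for (let i = 0; i < s.length; i++) {
      const c = s.charCodeAt(i);
      if (c >= 0x4e00 && c <= 0x9fff) cjk++; // CJK Unified Ideographs block
      else if ((c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a)) ascii++;
    }
  }
  if (cjk + ascii < minSignal) return "auto"; // too little evidence to steer
  return cjk > ascii ? "zh" : "en";
}
```

Charcode comparison keeps the scan allocation-free, matching the "runs on every gen call" constraint.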
L2 / L3 prompts: hard boundary against drift
- `L2_INDUCTION_PROMPT` v1 → v2: explicit "what NOT to write" guard
rejects environment topology, declarative behavioural rules, and
generic taboos. New same-fact-two-framings example shows how to
re-fold an env fact into a state-level trigger or step-level caveat.
- `L3_ABSTRACTION_PROMPT` v1 → v2: bans imperative verbs (do/should/use/
install/run) under any of ℰ/ℐ/C; reworked all three example sets to
pure declarative ("loading a glibc-linked binary wheel inside Alpine
raises a dynamic-link error" instead of "if pip fails, install dev
libs and retry"). Same-fact contrast example included.
- Test mock keys updated v1 → v2 in induce.test.ts /
l2.integration.test.ts / openclaw-full-chain.test.ts /
v7-full-chain.e2e.test.ts. Historical `inducedBy` audit strings
intentionally left at v1 — they're metadata recording the prompt
version a row was generated under, not call-time keys.
Retrieval injector: heading hierarchy
- `# User's conversation history (from memory system)` is now H1, with
`## Memories` / `## Skills` / `## Environment Knowledge` as H2 so the
injected block has a clean outline in the LLM's context (previously
the inner sections used H1 too, breaking the visual hierarchy).
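The heading layout can be sketched as follows; the heading strings are from the change above, while the render helper, its name, and the section parameter shape are hypothetical:

```typescript
// Sketch of the injected block's heading hierarchy: one H1 wrapper with
// H2 subsections so the outline nests cleanly in the LLM's context.
function renderInjectedBlock(sections: {
  memories: string;
  skills: string;
  env: string;
}): string {
  return [
    "# User's conversation history (from memory system)",
    "",
    "## Memories",
    sections.memories,
    "",
    "## Skills",
    sections.skills,
    "",
    "## Environment Knowledge",
    sections.env,
  ].join("\n");
}
```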
Migration runner: SQLite defensive mode
- better-sqlite3 ≥ v11 enables `SQLITE_DBCONFIG_DEFENSIVE` which blocks
writes to `sqlite_master` even with `PRAGMA writable_schema=ON`.
Migration 012 (status unification) needs that pragma to swap CHECK
constraints in-place. `runMigrations` now flips `db.raw.unsafeMode`
on at the outer boundary if any pending migration uses
`writable_schema`, then off again in `finally`. Migrations are
shipped with the plugin (never user input) so this is safe.
- Migration 012 SQL itself rewritten to use single-quote string
literals with doubled inner quotes (instead of double quotes that
better-sqlite3 strict mode treats as identifiers).
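The unsafe-mode bracket can be sketched with a stubbed DB handle (better-sqlite3's real `Database` does expose `unsafeMode(boolean)`, but the `RawDb` interface, the `pending` shape, and this `runMigrations` signature are simplified assumptions):

```typescript
// Stubbed handle; better-sqlite3 exposes unsafeMode(boolean) on Database.
interface RawDb {
  unsafeMode(on: boolean): void;
  exec(sql: string): void;
}

// If any pending migration touches writable_schema, lift defensive mode
// for the whole batch and ALWAYS restore it in `finally`. Migrations ship
// with the plugin (never user input), so the temporary lift is safe.
function runMigrations(raw: RawDb, pending: string[]): void {
  const needsUnsafe = pending.some((sql) => /writable_schema/i.test(sql));
  if (needsUnsafe) raw.unsafeMode(true);
  try {
    for (const sql of pending) raw.exec(sql);
  } finally {
    if (needsUnsafe) raw.unsafeMode(false); // never leave defensive mode disabled
  }
}
```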
Documentation
- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` — mental-model alignment
  doc explaining: how the three granularities (small step / turn / task)
  relate; scoring granularity (per-step α/V, per-task R_human; a "turn"
  has no independent score); retrieval granularity (skill / single step /
  sub-task sequence / environment knowledge — there is no "per-turn"
  recall); the generation chain (small step → experience → environment
  knowledge → skill); and a §6 "experience vs environment-knowledge
  boundary trimming" section answering the "should they be merged"
  question: 7 reasons against merging + a comparison of three compromise
  schemes + a same-fact multi-framing discrimination table.
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at the
  top, pointing newcomers to the granularity-alignment doc above first.
- `docs/README.md` index updated, with GRANULARITY-AND-MEMORY-LAYERS
  bolded.
Tests
- `tests/unit/capture/step-extractor.test.ts`: turnId stability
assertions across sub-steps; multi-tool turn shares one turnId.
- All other test fixtures' LLM mock keys synchronized with new prompt
versions; non-mock `inducedBy` audit fields kept at v1 by design.
---------
Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
CaralHsi added a commit that referenced this pull request on Apr 23, 2026
…sing (#1533)

* feat: add .env.example-full and fix .env.example
* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture
* style(memos-local-plugin): apply ruff check + format to Python files
* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements
* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts
* feat: port plugin system from enterprise branch

  Bring the plugin runtime, API bootstrap integration, and related tests from aliyun-ee/dev-20260423-v2.0.14.post onto upstream/main so the upstream branch can review the plugin architecture independently of the broader enterprise history.

  Made-with: Cursor

* fix: working-binding-related bug

---------
Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
CaralHsi added a commit that referenced this pull request on Apr 23, 2026
…1526)

* feat: add .env.example-full and fix .env.example

* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture

  Complete end-to-end rewrite of the memos-local-plugin package into a layered, agent-agnostic memory runtime with support for both OpenClaw and Hermes adapters. Highlights:

  - New `core/` package (agent-agnostic): capture, embedding, feedback, hub, LLM client, logger, memory (L1/L2/L3), pipeline, reward, recall, retrieval, skill, session, storage, config modules — each with its own README + ALGORITHMS notes.
  - New `adapters/` layer with `openclaw/` and `hermes/` integrations isolated from core. Agent-specific concepts (turns, installers, bridge clients) live only here.
  - New `agent-contract/` — a single shared contract (dto, errors, events, jsonrpc, log-record, memory-core) between core and adapters.
  - New `bridge/` — JSON-RPC stdio bridge (methods.ts, stdio.ts).
  - New `server/` — HTTP/SSE server for the viewer.
  - New `site/` — Vite-built public product site + release notes index.
  - New `web/` — Vite-built viewer app with memory/skill/timeline/world model views.
  - New `docs/` — ALGORITHM, DATA-MODEL, LOGGING, MANUAL_E2E_TESTING, Reflect2Skill design core, multi-agent viewer, etc.
  - New `tests/` — vitest unit/integration + Python bridge tests.
  - Tooling: TypeScript multi-project build (tsconfig.{json,site,web}), Vite + Vitest, cross-platform install.sh / install.ps1, npm release checker, package-lock committed.
  - Removes the legacy `src/` and `www/` structure from the main branch; the new layout replaces it entirely.

  This change is fully scoped to apps/memos-local-plugin/ and does not touch any other package.
* style(memos-local-plugin): apply ruff check + format to Python files

  - Remove unused imports (Iterable, Dict, List) in memos_provider/__init__.py
  - Move Callable into the TYPE_CHECKING block in bridge_client.py
  - Replace try/except/pass with contextlib.suppress in bridge_client.py
  - Combine a nested if in test_bridge_client.py
  - Apply ruff format to 4 Python files (Hermes adapter + tests)

  All files now pass `ruff check` and `ruff format --check`.

* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements

  Capture path (thinking-between-tool-calls):
  - Adapter extractTurn now flushes thinking + assistant text that appears between consecutive tool calls into the next ToolCallDTO's `thinkingBefore`, preserving the model's natural-language bridge (e.g. "nproc failed, let me try sysctl") in the trace.
  - Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with legacy top-level `tool_calls`, skip the legacy path so each call is emitted once (the prior double-push clobbered the first stub's `thinkingBefore` via pendingCalls.set, making the field silently go missing and doubling tool-call rows in the DB).
  - Orchestrator: tool turns now persist `thinkingBefore` in EpisodeTurn.meta so the capture step-extractor can re-attach it.
  - Step extractor: only the first tool sub-step carries `userText`; subsequent sub-steps leave it empty so the viewer's flattenChat doesn't render the same user bubble N times.
  - Step extractor: `toolCallFromTurn` + `coerceToolCall` now read `thinkingBefore` back from meta.
  - Normalizer: sub-step candidates skip the generic dedup path — their intentionally identical empty userText/agentText plus 1-tool shape used to collapse two distinct tools into one whenever their input prefixes matched under 200 chars.
  - Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no schema migration needed (stored inside `tool_calls_json`).
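The interleaved-thinking flush described above can be sketched like this. The message-part shape, function name, and DTO fields here are simplified assumptions for illustration, not the adapter's real types; only the `thinkingBefore` field name comes from the change itself.

```typescript
// Sketch: fold thinking/text that appears BETWEEN tool calls into the NEXT
// call's thinkingBefore, so the bridge narration survives in the trace.
type Part =
  | { kind: "thinking" | "text"; text: string }
  | { kind: "toolCall"; name: string; input: unknown };

interface ToolCallDTO {
  name: string;
  input: unknown;
  thinkingBefore?: string; // new optional field; serialized into tool_calls_json
}

function extractToolCalls(parts: Part[]): { toolCalls: ToolCallDTO[]; finalText: string } {
  const toolCalls: ToolCallDTO[] = [];
  let pending: string[] = []; // thinking/text accumulated since the last tool call
  for (const p of parts) {
    if (p.kind === "toolCall") {
      toolCalls.push({
        name: p.name,
        input: p.input,
        ...(pending.length ? { thinkingBefore: pending.join("\n") } : {}),
      });
      pending = [];
    } else {
      pending.push(p.text);
    }
  }
  // Whatever remains after the last tool call is the final reply text.
  return { toolCalls, finalText: pending.join("\n") };
}
```

The key property is that the accumulator is reset at every tool call, so each call owns exactly the narration that preceded it, rather than everything being smashed into the final reply.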
  - Web flattenChat: renders per-tool `thinkingBefore` bubbles before each tool call for the user↔agent timeline; retains the legacy `agentThinking` single-bubble fallback for pure-reply traces.

  Retrieval path:
  - LLM-filter: refactored prompt templates and schema shape.
  - Ranker: reworked scoring with new blend knobs.
  - Retrieve + pipeline wiring updated to match the new types.
  - Config defaults + schema expose the new retrieval knobs.
  - Viewer LogsView surfaces new filter fields; i18n updated.

  Tests:
  - New regression tests for extractTurn interleaved thinking, pi-ai + OpenAI-legacy double-push avoidance, and flattenChat sub-step rendering.
  - Retrieval / llm-filter / ranker tests updated for the new shape.

* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts

  UI: one user turn = one memory card
  - New `traces.turn_id INTEGER` column (migration 013) stamped by `step-extractor` with the user turn's ts; every sub-step of the same user message shares the same turnId.
  - `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses rows by (episodeId, turnId): one card per turn, with the role pill chosen by a group-level rule (any tool → "tool") and aggregate V/α displayed as the member-row mean.
  - Drawer rewritten as `<StepList>`: every member step renders as a collapsible <details> block with its own ts / V / α / agentThinking / toolCalls / reflection. The first step is expanded and the rest collapsed so a 10-tool turn doesn't drown the user.
  - Bulk actions (select / delete / share / export) operate on whole cards: the card checkbox toggles the full set of member ids; delete / share / export act in bulk over `g.ids` so a card never half-disappears.
  - Algorithm layer untouched — every L1 trace stays step-level, so V/α reflection-weighted backprop, L2 incremental association, Tier-2 error-signature retrieval, and Decision Repair keep their per-step granularity (V7 §0.1).
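The one-card-per-turn grouping and the `pickTurnId` fallback described above can be sketched as follows. The row and group shapes are simplified assumptions; the real aggregation lives in `web/src/views/MemoriesView.tsx`, and only `pickTurnId` and the (episodeId, turnId) key come from the change itself.

```typescript
// Sketch: collapse step-level trace rows into one card per user turn.
interface TraceRow {
  id: number;
  episodeId: string;
  turnId?: number; // absent on rows written before migration 013
  ts: number;
  role: "reply" | "tool";
}

interface MemoryGroup {
  key: string;
  ids: number[];
  role: "reply" | "tool"; // group-level rule: any tool member makes it "tool"
}

// Old fixtures have no turn_id; fall back to the row's own ts so each such
// row forms a singleton group instead of crashing the viewer.
const pickTurnId = (r: TraceRow): number => r.turnId ?? r.ts;

function groupByTurn(rows: TraceRow[]): MemoryGroup[] {
  const groups = new Map<string, MemoryGroup>();
  for (const r of rows) {
    const key = `${r.episodeId}:${pickTurnId(r)}`;
    const g: MemoryGroup = groups.get(key) ?? { key, ids: [], role: "reply" };
    g.ids.push(r.id);
    if (r.role === "tool") g.role = "tool";
    groups.set(key, g);
  }
  return [...groups.values()];
}
```

Because bulk actions operate over a group's full `ids` set, deleting or exporting a card always takes every member step with it.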
* fix(format_utils): raise ValueError on None in clean_json_response

  clean_json_response treats its argument as a str and unconditionally calls .replace(). When an upstream LLM helper returns None (e.g. due to the silent-fail pattern in timed_with_status), the resulting AttributeError points to format_utils.py rather than to the failed LLM call, which is hard to diagnose.

  Add an explicit None check that raises a descriptive ValueError. This turns the symptom "NoneType has no attribute replace" into a message that names the actual root cause.

---------
Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: CaralHsi <caralhsi@gmail.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
Co-authored-by: auctor <auctor@xinfty.space>
Summary
This PR lands the memos-local-plugin v2 rewrite and iterates on top of it with multi-tool-call reasoning capture + retrieval improvements.
Scope (only apps/memos-local-plugin/)

Three commits on top of main:
1. feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture — the base v2 rewrite (layered L1/L2/L3 memory, Reflect2Evolve capture/reward/skill pipeline, tier-1/2/3 retrieval, OpenClaw + Hermes adapters).
2. style(memos-local-plugin): apply ruff check + format to Python files — the Python adapter pass (Hermes provider).
3. feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements — the capture + retrieval work described below.

Capture path — mid-turn reasoning is no longer lost
A multi-tool-call turn looks like `user → [thinking?, text?, tool_1] → result_1 → [thinking?, text?, tool_2] → result_2 → [text final]`. The plugin used to collapse every assistant text into one `agentText` and drop the bridge narration ("tool_1 failed, let me try tool_2") entirely. This PR fixes that end-to-end:
- `extractTurn` flushes `thinking` + inline assistant text into the next tool call's new `thinkingBefore` field, so the bridge reasoning is preserved per-tool instead of being smashed into the final reply.
- `flattenMessages` no longer double-emits tool calls when pi-ai's `content[toolCall]` and OpenAI-legacy top-level `tool_calls` coexist. The prior double-push clobbered the first stub via `pendingCalls.set`, making `thinkingBefore` silently go missing and doubling tool-call rows in the DB.
- The orchestrator persists `thinkingBefore` in the tool `EpisodeTurn.meta` so capture can read it back.
- Only the first tool sub-step carries `userText`; subsequent sub-steps leave it empty so `flattenChat` doesn't render the same user bubble N times.
- `toolCallFromTurn` + `coerceToolCall` now hydrate `thinkingBefore` from meta.
- `ToolCallDTO.thinkingBefore?: string` added; no SQL migration (stored inside `tool_calls_json`).
- `flattenChat` renders per-tool `thinkingBefore` bubbles before each tool call; retains the `agentThinking` single-bubble fallback for pure-reply traces.

Retrieval path
- LLM-filter prompt templates and schema refactored; ranker scoring reworked with new blend knobs; retrieve + pipeline wiring, config defaults/schema, and the viewer LogsView updated to match.
Tests
- `extractTurn` interleaved thinking per tool call
- `flattenMessages` pi-ai + OpenAI-legacy double-push avoidance
- `flattenChat` sub-step rendering (per-tool thinking bubble, no duplicate user)

Test plan
- apps/memos-local-plugin/ unit tests: the `flattenChat` suite passes (12/12).
- A multi-tool trace renders one `user` bubble (no N-fold repetition), a `thinking` bubble before each tool call when the model produced one (models may also emit tool calls without any mid-step text — that's model-level behavior and cannot be back-filled), and the final `assistant` reply at the end.
- Each tool call appears exactly once in `tool_calls_json` after the double-push fix.

Non-scope
- Nothing outside apps/memos-local-plugin/.
- No pnpm-lock.yaml / uv.lock bumps.
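For reference, the single-emit guard behind the double-push fix exercised above can be sketched like this. The message shape is a simplified assumption blending pi-ai content parts with the OpenAI-legacy field; the real adapter code in `flattenMessages` is richer than this.

```typescript
// Sketch: emit each tool call once. If the pi-ai content[] already carries
// toolCall parts, skip the legacy top-level tool_calls array entirely, so a
// second pass can never clobber the first stub's thinkingBefore.
interface AssistantMsg {
  content: Array<{ type: "text" | "thinking" | "toolCall"; text?: string; id?: string; name?: string }>;
  tool_calls?: Array<{ id: string; name: string }>; // OpenAI-legacy duplicate
}

function collectToolCalls(msg: AssistantMsg): Array<{ id: string; name: string }> {
  const fromContent = msg.content
    .filter((p) => p.type === "toolCall")
    .map((p) => ({ id: p.id!, name: p.name! }));
  if (fromContent.length > 0) return fromContent; // pi-ai path wins; skip legacy
  return msg.tool_calls ?? []; // legacy-only messages still work
}
```

Preferring one representation outright (rather than merging the two arrays) is what makes the invariant easy to test: each call appears exactly once regardless of which shapes the provider emitted.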