
Dev v2.0.14 #1537

Merged
CarltonXiang merged 10 commits into main from dev-v2.0.14
Apr 23, 2026

Conversation

@CarltonXiang
Collaborator

Description

Please include a summary of the change, the problem it solves, the implementation approach, and relevant context. List any dependencies required for this change.

Related Issue (Required): Fixes #issue_number

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so that we can reproduce them. Please also list any relevant details of your test configuration.

  • Unit Test
  • Test Script Or Test Steps (please provide)
  • Pipeline Automated API Test (please provide)

Checklist

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created a related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

Reviewer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed
  • Tests have been provided

CaralHsi and others added 10 commits April 17, 2026 16:17
* fix: lint

* feat: add addMessage Stage log

* feat: add addMessage Stage log

* feat: optimized embedding item

---------

Co-authored-by: harvey_xiang <harvey_xiang22@163.com>
* fix: polardb metadata bug fix

* fix: fix chunking bug in memreader

* fix: also fix get_node for nested metadata
* feat: optimize dispatcher task

* feat: format redis_queue.py
Co-authored-by: yuan.wang <yuan.wang@yuanwangdebijibendiannao.local>
Revert "feat: add upload skill logic (#1507)"

This reverts commit 87cb6e5.
fix(format_utils): raise ValueError on None in clean_json_response

clean_json_response treats its argument as a str and unconditionally
calls .replace(). When an upstream LLM helper returns None (e.g. due to
the silent-fail pattern in timed_with_status), the resulting
AttributeError points to format_utils.py rather than to the failed LLM
call, which is hard to diagnose.

Add an explicit None check that raises a descriptive ValueError. This
turns the symptom 'NoneType has no attribute replace' into a message
that names the actual root cause.
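The guard described above might look like the following (a minimal Python sketch; the actual signature and fence-stripping logic in `format_utils.py` may differ):

```python
def clean_json_response(response: str) -> str:
    """Strip Markdown code fences from an LLM JSON response."""
    if response is None:
        # Surface the real root cause instead of letting .replace() fail
        # downstream with "'NoneType' object has no attribute 'replace'".
        raise ValueError(
            "clean_json_response received None: the upstream LLM call "
            "likely failed silently (check timed_with_status)."
        )
    return response.replace("```json", "").replace("```", "").strip()
```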
* feat: add .env.example-full and fix .env.example

* feat(memos-local-plugin): v2.0 full rewrite with Reflect2Evolve architecture

Complete end-to-end rewrite of the memos-local-plugin package into a
layered, agent-agnostic memory runtime with support for both OpenClaw
and Hermes adapters.

Highlights:
- New `core/` package (agent-agnostic): capture, embedding, feedback,
  hub, LLM client, logger, memory (L1/L2/L3), pipeline, reward, recall,
  retrieval, skill, session, storage, config modules — each with its
  own README + ALGORITHMS notes.
- New `adapters/` layer with `openclaw/` and `hermes/` integrations
  isolated from core. Agent-specific concepts (turns, installers,
  bridge clients) live only here.
- New `agent-contract/` — single shared contract (dto, errors, events,
  jsonrpc, log-record, memory-core) between core and adapters.
- New `bridge/` — JSON-RPC stdio bridge (methods.ts, stdio.ts).
- New `server/` — HTTP/SSE server for the viewer.
- New `site/` — Vite-built public product site + release notes index.
- New `web/` — Vite-built viewer app with memory/skill/timeline/world
  model views.
- New `docs/` — ALGORITHM, DATA-MODEL, LOGGING, MANUAL_E2E_TESTING,
  Reflect2Skill design core, multi-agent viewer, etc.
- New `tests/` — vitest unit/integration + python bridge tests.
- Tooling: TypeScript multi-project build (tsconfig.{json,site,web}),
  Vite + Vitest, cross-platform install.sh / install.ps1, npm release
  checker, package-lock committed.
- Removes legacy `src/` and `www/` structure from main branch; the new
  layout replaces it entirely.

This change is fully scoped to apps/memos-local-plugin/ and does not
touch any other package.

* style(memos-local-plugin): apply ruff check + format to Python files

- Remove unused imports (Iterable, Dict, List) in memos_provider/__init__.py
- Move Callable into TYPE_CHECKING block in bridge_client.py
- Replace try/except/pass with contextlib.suppress in bridge_client.py
- Combine nested if in test_bridge_client.py
- Apply ruff format to 4 Python files (hermes adapter + tests)

All files now pass `ruff check` and `ruff format --check`.

* feat(memos-local-plugin): preserve mid-turn reasoning + retrieval improvements

Capture path (thinking-between-tool-calls):
- Adapter extractTurn now flushes thinking + assistant-text that appears
  between consecutive tool calls into the next ToolCallDTO's
  `thinkingBefore`, preserving the model's natural-language bridge
  (e.g. "nproc failed, let me try sysctl") in the trace.
- Adapter flattenMessages: when pi-ai `content[toolCall]` coexists with
  legacy top-level `tool_calls`, skip the legacy path so each call is
  emitted once (prior double-push clobbered the first stub's
  `thinkingBefore` via pendingCalls.set, making the field silently go
  missing and doubling tool-call rows in the DB).
- Orchestrator: tool turns now persist `thinkingBefore` in
  EpisodeTurn.meta so the capture step-extractor can re-attach it.
- Step extractor: only the first tool sub-step carries `userText`;
  subsequent sub-steps leave it empty so the viewer's flattenChat
  doesn't render the same user bubble N times.
- Step extractor: `toolCallFromTurn` + `coerceToolCall` now read
  `thinkingBefore` back from meta.
- Normalizer: sub-step candidates skip the generic dedup path — their
  intentionally-identical empty userText/agentText plus 1-tool shape
  used to collapse two distinct tools into one whenever their input
  prefixes matched under 200 chars.
- Agent contract DTO: `ToolCallDTO.thinkingBefore?: string` added; no
  schema migration needed (stored inside `tool_calls_json`).
- Web flattenChat: renders per-tool `thinkingBefore` bubbles before
  each tool call for the user↔agent timeline; retains legacy
  `agentThinking` single-bubble fallback for pure-reply traces.
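The flush-into-next-call rule from the first two bullets can be modeled in a few lines. This is a simplified Python sketch; the real code is the TypeScript adapter's extractTurn/flattenMessages, and the part shapes and field names (other than `thinkingBefore`) are illustrative:

```python
def extract_tool_calls(parts):
    """Attach thinking/text that appears between tool calls to the NEXT call.

    `parts` is a simplified message-content list; each item is a dict with
    a 'type' of 'thinking', 'text', or 'toolCall' (hypothetical shape).
    """
    pending: list[str] = []  # thinking/text seen since the last tool call
    calls = []
    for part in parts:
        if part["type"] in ("thinking", "text"):
            pending.append(part["content"])
        elif part["type"] == "toolCall":
            calls.append({
                "name": part["name"],
                # the natural-language bridge, e.g. "nproc failed, try sysctl"
                "thinkingBefore": "\n".join(pending) or None,
            })
            pending = []  # flushed into this call, never reused
    return calls
```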

Retrieval path:
- LLM-filter: refactored prompt templates and schema shape.
- Ranker: reworked scoring with new blend knobs.
- Retrieve + pipeline wiring updated to match the new types.
- Config defaults + schema expose the new retrieval knobs.
- Viewer LogsView surfaces new filter fields; i18n updated.

Tests:
- New regression tests for extractTurn interleaved thinking, pi-ai +
  OpenAI-legacy double-push avoidance, and flattenChat sub-step
  rendering.
- Retrieval / llm-filter / ranker tests updated for the new shape.

* feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts

UI: one user turn = one memory card
- New `traces.turn_id INTEGER` column (migration 013) stamped by
  `step-extractor` with the user turn's ts; every sub-step of the same
  user message shares the same turnId.
- `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses
  rows by (episodeId, turnId): one card per turn, role pill chosen by
  group-level rule (any tool → "tool"), aggregate V/α displayed as the
  member-row mean.
- Drawer rewritten as `<StepList>`: every member step renders as a
  collapsible <details> block with its own ts / V / α / agentThinking /
  toolCalls / reflection. First step expanded, rest collapsed so a
  10-tool turn doesn't drown the user.
- Bulk actions (select / delete / share / export) operate on whole
  cards: card checkbox toggles the full set of member ids; delete /
  share / export bulk over `g.ids` so a card never half-disappears.
- Algorithm layer untouched — every L1 trace stays step-level so V/α
  reflection-weighted backprop, L2 incremental association, Tier-2
  error-signature retrieval, and Decision Repair keep their per-step
  granularity (V7 §0.1).
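The card-grouping rule above can be sketched as follows (a simplified Python model of the TypeScript `MemoryGroup` aggregation in MemoriesView; row fields are illustrative):

```python
from collections import defaultdict

def group_traces(rows):
    """Collapse step-level trace rows into one card per (episodeId, turnId).

    Role is "tool" if any member step used a tool; V is the mean over
    member rows; bulk actions operate on the full `ids` set.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["episodeId"], row["turnId"])].append(row)
    cards = []
    for (ep, turn), members in buckets.items():
        cards.append({
            "episodeId": ep,
            "turnId": turn,
            "role": "tool" if any(m.get("tool") for m in members) else "chat",
            "v": sum(m["v"] for m in members) / len(members),
            "ids": [m["id"] for m in members],  # delete/share/export all-or-nothing
        })
    return cards
```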

Per-tool reasoning capture (carryover, see PR #1515)
- ToolCallDTO carries `value` / `reflection` / `thinkingBefore` so the
  drawer's per-step section can show the per-tool intermediate
  thinking and any LLM-assigned per-tool score without a schema change.
- StepCandidate.meta.turnId / subStep / subStepIdx / subStepTotal
  threaded through capture.ts → traces.turn_id; `pickTurnId` falls
  back to the trace's own ts so old fixtures still produce singleton
  groups instead of crashing.

Knowledge generation in user's language
- `core/llm/prompts/index.ts` adds `detectDominantLanguage(samples,
  {minSignal})` — counts CJK ideographs + ASCII letters and returns
  "zh" / "en" / "auto" (allocation-free, runs on every gen call).
- All five knowledge-generation sites now emit a `languageSteeringLine`
  system message keyed off their evidence:
    * core/capture/alpha-scorer.ts          ← reflection-quality reason
    * core/capture/batch-scorer.ts          ← per-step batch reflections
    * core/memory/l2/induce.ts              ← L2 policy fields
    * core/memory/l3/abstract.ts            ← L3 (ℰ, ℐ, C) bullets
    * core/skill/crystallize.ts             ← skill body + scope
- Effect: a Chinese-speaking user no longer gets a half-English skill
  card. An English user no longer gets a Chinese-mixed reflection.
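The counting heuristic can be sketched like so (a Python model of the `detectDominantLanguage` idea; the threshold and exact character ranges are illustrative, not the shipped TypeScript values):

```python
def detect_dominant_language(samples, min_signal=20):
    """Return "zh", "en", or "auto" by counting CJK ideographs vs ASCII letters."""
    cjk = ascii_letters = 0
    for text in samples:
        for ch in text:
            if "\u4e00" <= ch <= "\u9fff":   # CJK Unified Ideographs block
                cjk += 1
            elif ch.isascii() and ch.isalpha():
                ascii_letters += 1
    if cjk + ascii_letters < min_signal:
        return "auto"  # not enough evidence to steer the prompt language
    return "zh" if cjk > ascii_letters else "en"
```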

L2 / L3 prompts: hard boundary against drift
- `L2_INDUCTION_PROMPT` v1 → v2: explicit "what NOT to write" guard
  rejects environment topology, declarative behavioural rules, and
  generic taboos. New same-fact-two-framings example shows how to
  re-fold an env fact into a state-level trigger or step-level caveat.
- `L3_ABSTRACTION_PROMPT` v1 → v2: bans imperative verbs (do/should/use/
  install/run) under any of ℰ/ℐ/C; reworked all three example sets to
  pure declarative ("loading a glibc-linked binary wheel inside Alpine
  raises a dynamic-link error" instead of "if pip fails, install dev
  libs and retry"). Same-fact contrast example included.
- Test mock keys updated v1 → v2 in induce.test.ts /
  l2.integration.test.ts / openclaw-full-chain.test.ts /
  v7-full-chain.e2e.test.ts. Historical `inducedBy` audit strings
  intentionally left at v1 — they're metadata recording the prompt
  version a row was generated under, not call-time keys.

Retrieval injector: heading hierarchy
- `# User's conversation history (from memory system)` is now H1, with
  `## Memories` / `## Skills` / `## Environment Knowledge` as H2 so the
  injected block has a clean outline in the LLM's context (previously
  the inner sections used H1 too, breaking the visual hierarchy).
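The one-H1-plus-H2 outline can be sketched as a small renderer (a hypothetical Python sketch of the injector's output shape; only the heading strings come from the description above):

```python
def render_injected_block(memories, skills, env):
    """Build the injected context block: one H1 root, H2 subsections."""
    lines = ["# User's conversation history (from memory system)"]
    for title, items in (("Memories", memories),
                         ("Skills", skills),
                         ("Environment Knowledge", env)):
        if items:  # empty sections are omitted entirely
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```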

Migration runner: SQLite defensive mode
- better-sqlite3 ≥ v11 enables `SQLITE_DBCONFIG_DEFENSIVE` which blocks
  writes to `sqlite_master` even with `PRAGMA writable_schema=ON`.
  Migration 012 (status unification) needs that pragma to swap CHECK
  constraints in-place. `runMigrations` now flips `db.raw.unsafeMode`
  on at the outer boundary if any pending migration uses
  `writable_schema`, then off again in `finally`. Migrations are
  shipped with the plugin (never user input) so this is safe.
- Migration 012 SQL itself rewritten to use single-quote string
  literals with doubled inner quotes (instead of double quotes that
  better-sqlite3 strict mode treats as identifiers).
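The quoting fix in the last bullet is plain SQL: a literal apostrophe inside a single-quoted string is written as two single quotes, whereas double quotes are identifier quoting (which better-sqlite3 strict mode enforces). Python's stdlib sqlite3 is used here purely for illustration:

```python
import sqlite3

# Single-quoted SQL string literals with a doubled inner quote ('it''s-broken')
# stay valid everywhere; "double quotes" would be parsed as an identifier
# under strict quoting rules.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (status TEXT CHECK (status IN ('ok', 'it''s-broken')))"
)
conn.execute("INSERT INTO t VALUES ('it''s-broken')")
row = conn.execute("SELECT status FROM t").fetchone()
```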

Documentation
- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` — mental-model alignment
  doc explaining: how the three granularities (sub-step / turn / task)
  relate; scoring granularity (per-step α/V, per-task R_human, turns
  carry no independent score); retrieval granularity (skills / single
  steps / sub-task sequences / environment knowledge, with no per-turn
  recall); the generation chain (sub-step → experience → environment
  knowledge → skill); and a §6 "experience vs. environment knowledge
  boundary trimming" section answering the "should they be merged"
  question: 7 arguments against merging, a comparison of three
  compromise options, and a same-fact multi-framing discrimination table.
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at the
  top, pointing newcomers to the granularity-alignment doc above first.
- `docs/README.md` index updated in step, with
  GRANULARITY-AND-MEMORY-LAYERS bolded.

Tests
- `tests/unit/capture/step-extractor.test.ts`: turnId stability
  assertions across sub-steps; multi-tool turn shares one turnId.
- All other test fixtures' LLM mock keys synchronized with new prompt
  versions; non-mock `inducedBy` audit fields kept at v1 by design.

---------

Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
…sing (#1533)


* feat: port plugin system from enterprise branch

Bring the plugin runtime, API bootstrap integration, and related tests from aliyun-ee/dev-20260423-v2.0.14.post onto upstream/main so the upstream branch can review the plugin architecture independently of the broader enterprise history.

Made-with: Cursor

* fix: working binding related bug

---------

Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
…1526)


* fix(format_utils): raise ValueError on None in clean_json_response

clean_json_response treats its argument as a str and unconditionally
calls .replace(). When an upstream LLM helper returns None (e.g. due to
the silent-fail pattern in timed_with_status), the resulting
AttributeError points to format_utils.py rather than to the failed LLM
call, which is hard to diagnose.

Add an explicit None check that raises a descriptive ValueError. This
turns the symptom 'NoneType has no attribute replace' into a message
that names the actual root cause.

---------

Co-authored-by: tyh <3211345556@qq.com>
Co-authored-by: CaralHsi <caralhsi@gmail.com>
Co-authored-by: jiang <fdjzy@qq.com>
Co-authored-by: Jiang <33757498+hijzy@users.noreply.github.com>
Co-authored-by: auctor <auctor@xinfty.space>
@CarltonXiang CarltonXiang merged commit 3364f60 into main Apr 23, 2026
32 checks passed