security: add maxLength to all string MCP inputs (cyberMaster H2) #254

Open
EtanHey wants to merge 2 commits into main from security/mcp-string-maxlength

Conversation

EtanHey (Owner) commented Apr 24, 2026

Summary

Adds maxLength to all string MCP inputs in the Python server and the BrainBar router, plus schema-level regression tests and BrainBar-side request validation for oversized string payloads.

Python MCP Field Limits

Field                 maxLength
action                64
agent_id              128
chunk_id              128
content               200000
content_type          64
content_type_filter   64
context               4096
correction_category   128
date_from             32
date_to               32
detail                16
entity_id             128
entity_type           64
file_path             1024
function_name         256
intent                32
mode                  32
name                  256
new_chunk_id          128
old_chunk_id          128
outcome               32
pattern               256
phase                 256
plan_name             256
project               256
query                 4096
reason                1024
reversibility         32
safety_check          32
sentiment             32
sentiment_filter      32
session_id            128
severity              16
source                32
source_filter         512
status                32
supersedes            128
tag                   128
title                 512
type                  32

Python MCP String Array Limits

Field              maxItems   item maxLength
chunk_ids[]        500        128
files_changed[]    200        1024
merge_chunk_ids[]  100        128
participants[]     100        256
tags[]             100        128

BrainBar Field Limits

Field        maxLength
action       64
agent_id     128
chunk_id     128
content      200000
detail       16
mode         32
project      256
query        4096
session_id   128
tag          128

BrainBar String Array Limits

Field    maxItems   item maxLength
tags[]   100        128
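
To make the caps concrete, here is a minimal sketch of the Python helper in action. The sample schema is invented for illustration; _bounded_input_schema is the private helper this PR adds in src/brainlayer/mcp/__init__.py, and the resulting caps follow the tables above.

# Sketch only: an invented minimal schema run through the PR's helper.
from brainlayer.mcp import _bounded_input_schema

raw = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["query"],
}

bounded = _bounded_input_schema(raw)
assert bounded["properties"]["query"]["maxLength"] == 4096
assert bounded["properties"]["tags"]["maxItems"] == 100
assert bounded["properties"]["tags"]["items"]["maxLength"] == 128
assert "maxLength" not in raw["properties"]["query"]  # deep copy: caller's schema untouched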

Verification

Focused checks

$ pytest tests/test_mcp_input_schema_limits.py
============================== 2 passed in 0.37s ==============================

$ swift test --package-path brain-bar --filter MCPRouterTests
Executed 19 tests, with 0 failures (0 unexpected)

Repo-wide gate

$ pytest
7 failed, 1865 passed, 8 skipped, 6 xfailed, 2 xpassed in 397.25s

Those 7 failures are pre-existing and reproduce on a clean main checkout as well:

$ PYTHONPATH=/Users/etanheyman/Gits/brainlayer/src pytest tests/test_vector_store.py::TestSchema::test_created_at_coverage
FAILED tests/test_vector_store.py::TestSchema::test_created_at_coverage

$ PYTHONPATH=/Users/etanheyman/Gits/brainlayer/src pytest tests/test_eval_baselines.py::TestTemporalQueries::test_recent_week_chunks_exist
FAILED tests/test_eval_baselines.py::TestTemporalQueries::test_recent_week_chunks_exist

$ PYTHONPATH=/Users/etanheyman/Gits/brainlayer/src pytest tests/test_eval_baselines.py::TestPromptHookEntityInjection::test_no_entity_in_generic_query
FAILED tests/test_eval_baselines.py::TestPromptHookEntityInjection::test_no_entity_in_generic_query

Review

cr review --plain surfaced one test-hardening issue in tests/test_mcp_input_schema_limits.py; that guard was fixed. A second rerun hit the org review rate limit after the finding had already been addressed.


Note

Medium Risk
Introduces new input validation that can reject previously accepted oversized requests, potentially impacting clients if they exceed the new caps. Changes touch request handling paths in both Swift and Python MCP servers but are localized to schema/validation logic with targeted tests.

Overview
Adds uniform input size limits to MCP tool schemas and enforces them at runtime. The Swift MCPRouter now wraps every tool inputSchema with default/field-specific maxLength (and maxItems + item maxLength for string arrays) and validates tools/call arguments against that schema before dispatch, returning a Schema validation error result on violations.

On the Python MCP server, tool inputSchemas are similarly wrapped via _bounded_input_schema (deep-copied) to apply the same style of caps across all tools. New Swift and Python tests assert that all advertised schemas include bounds for every string and string-array field, and add regression coverage that oversized brain_digest.content is rejected at the schema layer.

Reviewed by Cursor Bugbot for commit d2bbb17. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

  • New Features

    • Input schema validation is now enforced for all tool calls, rejecting requests with oversized strings and array values
    • Automatic size constraints are applied to string and array input fields across all tools
    • Schema validation errors are now properly reported in tool call responses
  • Tests

    • Added comprehensive test coverage for input schema validation and constraint enforcement

Note

Add maxLength/maxItems bounds and schema validation to all MCP tool string inputs

  • Adds per-field maxLength caps for string inputs and maxItems/per-item maxLength caps for string-array inputs to all MCP tool schemas in both the Python layer (__init__.py) and Swift layer (MCPRouter.swift).
  • Introduces a validation step in the tools/call dispatch path that enforces these schema constraints (type, enum membership, maxLength, maxItems) before the tool handler executes, returning a JSON-RPC error on failure.
  • Adds a new ToolError.schemaValidation case in Swift to surface validation failures with a descriptive message.
  • Tests verify that every exposed tool schema declares bounds on string/string-array fields and that brain_digest rejects content exceeding 200,000 characters.
  • Behavioral Change: tools/call now returns a schema validation error for inputs that exceed declared limits, where previously oversized inputs would reach the handler unchecked.

Macroscope summarized d2bbb17.



coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

This change implements per-tool JSON input constraints across Swift and Python implementations. Tool input schemas are augmented with default maxLength and maxItems limits, and incoming tool arguments are validated against these schema constraints before dispatch. Schema validation failures return dedicated error responses.
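As a rough Python sketch of that flow (handler wiring and the result shape here are assumptions, not the actual router code), the ordering is: validate first, dispatch only on success.

from jsonschema import Draft202012Validator

def handle_tools_call(tool: dict, arguments: dict) -> dict:
    # Check arguments against the bounded inputSchema before dispatching.
    error = next(Draft202012Validator(tool["inputSchema"]).iter_errors(arguments), None)
    if error is not None:
        # Violations return a dedicated error result; the tool handler never runs.
        return {
            "isError": True,
            "content": [{"type": "text", "text": f"Schema validation error: {error.message}"}],
        }
    return tool["handler"](arguments)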

Changes

  • Swift Router & Validation (brain-bar/Sources/BrainBar/MCPRouter.swift): adds schema validation logic to enforce JSON input constraints. Introduces limitedInputSchema, applyInputLimits, and recursive validation methods that check object structure, required fields, string length limits, and array size constraints. A new ToolError.schemaValidation error type surfaces validation failures.
  • Swift Tests (brain-bar/Tests/BrainBarTests/MCPRouterTests.swift): adds recursive schema-walking test helpers to collect string and array field schemas, tests asserting maxLength bounds on all string fields and maxItems/item maxLength on string arrays, and a check that oversized content payloads are rejected at the schema validation layer.
  • Python Schema Bounding (src/brainlayer/mcp/__init__.py): wraps each tool's inputSchema with a _bounded_input_schema() function that recursively applies uniform length caps: maxLength for strings and maxItems/per-item maxLength for string arrays. Affects schema constraints for search, store, recall, entity, digest, and lifecycle tools.
  • Python Tests (tests/test_mcp_input_schema_limits.py): new test suite that validates MCP tool input schema constraints, with a schema enumeration helper, assertions that every string field has maxLength and every array-of-strings has maxItems with item bounds, and an integration test confirming brain_digest rejects oversized content payloads (a sketch of that test's shape follows this list).
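
That integration test roughly takes the following shape (a hedged sketch: the request-handler dispatch matches what the review comments below describe, but fixture setup and exact assertions are assumed):

import mcp.types as types

async def _call_brain_digest(server, content: str):
    # Dispatch through the server's CallToolRequest handler so the mcp
    # library's own jsonschema validation runs, as the real test does.
    request = types.CallToolRequest(
        method="tools/call",
        params=types.CallToolRequestParams(name="brain_digest", arguments={"content": content}),
    )
    result = await server.request_handlers[types.CallToolRequest](request)
    return result.root

# One character over the 200,000 cap should come back as a schema-level
# rejection rather than reaching the handler:
#   result = await _call_brain_digest(server, "x" * 200_001)
#   assert result.isError and "is too long" in str(result.content[0].text)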

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router
    participant SchemaValidator
    participant Tool
    participant ErrorHandler

    Client->>Router: tools/call (with arguments)
    Router->>SchemaValidator: Validate arguments against inputSchema
    alt Validation succeeds
        SchemaValidator-->>Router: Valid
        Router->>Tool: Dispatch to tool
        Tool-->>Router: Tool result
        Router-->>Client: Success response
    else Validation fails
        SchemaValidator-->>ErrorHandler: Schema validation error
        ErrorHandler-->>Client: Error response with validation details
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit's rejoicing verse:

Long inputs once ran wild and free,
But now they're bounded carefully,
With maxLength guards and careful size,
Schema validation's the new sunrise! 🌅
Input constraints make tools more wise,
Safety checks before we try. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 27.78%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check ✅ Passed: the title accurately describes the primary security-focused change, adding maxLength constraints to all string MCP inputs across both the Python and BrainBar implementations.
  • Linked Issues check ✅ Passed: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: skipped because no linked issues were found for this pull request.
  • Description check ✅ Passed: skipped because CodeRabbit's high-level summary is enabled.




chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 031e81eed7


Comment on lines +523 to +525
!(propertyValue is NSNull),
let propertySchema = properties[propertyName] as? [String: Any]
else { continue }

P2: Enforce schema limits on alias arguments before dispatch

The new validator skips any argument that is not declared in inputSchema (else { continue }), but handlers still consume alias keys like subscriber_id (for example in handleBrainSearch). That means a caller can send an arbitrarily long subscriber_id string and bypass the new maxLength caps entirely, then pass it through to search filtering/DB calls. This undercuts the security hardening goal of bounding all string MCP inputs; either declare these aliases in schema (with limits) or reject unknown fields so aliases cannot bypass validation.
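
One way to close that gap, sketched here in Python with jsonschema purely for illustration (the Swift validator would need the analogous unknown-key check):

from jsonschema import Draft202012Validator

schema = {
    "type": "object",
    "properties": {"query": {"type": "string", "maxLength": 4096}},
    # Rejecting undeclared keys means alias arguments such as subscriber_id
    # cannot slip past the per-field caps.
    "additionalProperties": False,
}

errors = list(Draft202012Validator(schema).iter_errors(
    {"query": "ok", "subscriber_id": "x" * 10_000}
))
assert errors and "subscriber_id" in errors[0].message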


coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_mcp_input_schema_limits.py (1)

47-71: ⚠️ Potential issue | 🟡 Minor

mcp dependency is unpinned; pin a version range rather than loosening the error-string assertions.

The schema-bounds test is solid and parallels the Swift test well. The oversized-content test correctly exercises the server's jsonschema validation by dispatching through server.request_handlers[types.CallToolRequest].

However, pyproject.toml specifies mcp>=1.0.0 with no upper bound. This loose constraint means the test's error-string assertions ("Input validation error:" and "is too long") could theoretically break if a future mcp release changes validation error formatting. Consider pinning to a known-good mcp version range (e.g., mcp>=1.0.0,<2.0.0) rather than loosening the test assertions. Loosening to check only presence of "content" and str(200_000) weakens test precision without addressing the underlying version coupling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_mcp_input_schema_limits.py` around lines 47 - 71, The pyproject
currently allows an unpinned mcp (mcp>=1.0.0) which can break the exact
error-message assertions in test_brain_digest_schema_rejects_oversized_content
in tests/test_mcp_input_schema_limits.py; update pyproject.toml to constrain mcp
to a known-compatible range (for example mcp>=1.0.0,<2.0.0) so the jsonschema
error format remains stable, and keep the test in
test_brain_digest_schema_rejects_oversized_content (which calls
_call_brain_digest and asserts result.isError and the error content includes the
validation text) as-is rather than loosening its assertions. Ensure the package
version change is reflected in dependency lock/update commands so CI uses the
pinned range.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75dfd5e7-7b71-4c5b-8aee-88a9ed094c0f

📥 Commits

Reviewing files that changed from the base of the PR and between 05b2eaf and 031e81e.

📒 Files selected for processing (4)
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • src/brainlayer/mcp/__init__.py
  • tests/test_mcp_input_schema_limits.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_mcp_input_schema_limits.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/mcp/__init__.py
🧠 Learnings (11)
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.

Applied to files:

  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • tests/test_mcp_input_schema_limits.py
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-18T00:12:15.607Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.

Applied to files:

  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
📚 Learning: 2026-03-18T00:12:36.931Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/MCPRouter.swift:0-0
Timestamp: 2026-03-18T00:12:36.931Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar MCP daemon), the notification guard `let isNotification = (rawID == nil || rawID is NSNull)` is the single and only point where a no-response decision is made. Any message that passes this guard has a non-nil, non-NSNull id and MUST return a proper JSON-RPC response. Returning `[:]` (empty dict = no response) anywhere after the notification guard is always a bug — it creates a silent client hang. Flag any `return [:]` that appears after the guard in future reviews.

Applied to files:

  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)

Applied to files:

  • tests/test_mcp_input_schema_limits.py
  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
📚 Learning: 2026-04-04T15:21:39.570Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:39.570Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-01T22:38:15.693Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T22:38:15.693Z
Learning: Applies to src/brainlayer/mcp/brain_supersede.py : `brain_supersede` tool must include safety gate for personal data (journals, notes, health/finance)

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Pipeline architecture: Extract → Classify → Chunk → Embed → Index, with post-processing for enrichment, brain graph, and Obsidian export

Applied to files:

  • src/brainlayer/mcp/__init__.py
🔇 Additional comments (3)
src/brainlayer/mcp/__init__.py (1)

384-1156: LGTM — consistent wrapping of every tool's inputSchema.

All Tool(...) registrations route through _bounded_input_schema(...), which keeps the bounding logic centralized. I spot-checked each tool against _INPUT_STRING_MAX_LENGTHS/_INPUT_STRING_ARRAY_LIMITS: every string and string-array field has either an explicit entry or a sensible implicit fallback, and all enum values fit within their configured caps.

brain-bar/Tests/BrainBarTests/MCPRouterTests.swift (1)

11-236: LGTM — good coverage of the new schema-bounding contract.

collectStringFields/collectStringArrays correctly recurse through the schema, including string-array item schemas (path suffixed with []), so testEachToolSchemaBoundsStringInputs will catch any future tool that forgets to route through limitedInputSchema. The oversized-content test also anchors the validation to a specific error shape ("Schema validation error" + exact length/limit), which will catch accidental bypasses of the validate(arguments:for:) call site in handleToolsCall.

brain-bar/Sources/BrainBar/MCPRouter.swift (1)

25-41: LGTM — limits table and per-tool wrapping.

The Swift stringMaxLengths / stringArrayLimits entries match the Python values for every overlapping field (content 200_000, query 4_096, tags 100/128, etc.), so the two surfaces agree on the fields BrainBar actually exposes. Each toolDefinitions entry goes through MCPRouter.limitedInputSchema(...) uniformly.

Also applies to: 582-712

Comment on lines +451 to +556
private static func limitedInputSchema(_ schema: [String: Any]) -> [String: Any] {
    return applyInputLimits(to: schema, fieldName: nil)
}

private static func applyInputLimits(to schema: [String: Any], fieldName: String?) -> [String: Any] {
    var schema = schema
    guard let type = schema["type"] as? String else {
        return schema
    }

    switch type {
    case "string":
        if schema["maxLength"] == nil {
            schema["maxLength"] = stringMaxLengths[fieldName ?? ""] ?? defaultStringMaxLength
        }
    case "array":
        if var items = schema["items"] as? [String: Any] {
            if (items["type"] as? String) == "string" {
                let limits = stringArrayLimits[fieldName ?? ""] ??
                    (maxItems: defaultStringArrayMaxItems, itemMaxLength: defaultStringMaxLength)
                if schema["maxItems"] == nil {
                    schema["maxItems"] = limits.maxItems
                }
                if items["maxLength"] == nil {
                    items["maxLength"] = limits.itemMaxLength
                }
            }
            schema["items"] = applyInputLimits(to: items, fieldName: fieldName)
        }
    case "object":
        if var properties = schema["properties"] as? [String: Any] {
            for (propertyName, value) in properties {
                guard let propertySchema = value as? [String: Any] else { continue }
                properties[propertyName] = applyInputLimits(to: propertySchema, fieldName: propertyName)
            }
            schema["properties"] = properties
        }
    default:
        break
    }

    return schema
}

private static func validate(arguments: [String: Any], for toolName: String) throws {
    guard
        let tool = toolDefinitions.first(where: { ($0["name"] as? String) == toolName }),
        let schema = tool["inputSchema"] as? [String: Any]
    else {
        return
    }

    try validate(value: arguments, against: schema, fieldPath: "arguments")
}

private static func validate(value: Any, against schema: [String: Any], fieldPath: String) throws {
    guard let type = schema["type"] as? String else {
        return
    }

    switch type {
    case "object":
        guard let object = value as? [String: Any] else {
            throw ToolError.schemaValidation("\(fieldPath) must be an object")
        }
        let required = schema["required"] as? [String] ?? []
        for key in required where object[key] == nil || object[key] is NSNull {
            throw ToolError.schemaValidation("\(key) is required")
        }
        if let properties = schema["properties"] as? [String: Any] {
            for (propertyName, propertyValue) in object {
                guard
                    !(propertyValue is NSNull),
                    let propertySchema = properties[propertyName] as? [String: Any]
                else { continue }
                try validate(value: propertyValue, against: propertySchema, fieldPath: propertyName)
            }
        }
    case "string":
        guard let stringValue = value as? String else {
            throw ToolError.schemaValidation("\(fieldPath) must be a string")
        }
        if let enumValues = schema["enum"] as? [String], !enumValues.contains(stringValue) {
            throw ToolError.schemaValidation("\(fieldPath) must be one of \(enumValues.joined(separator: ", "))")
        }
        if let maxLength = schema["maxLength"] as? Int, stringValue.count > maxLength {
            throw ToolError.schemaValidation(
                "\(fieldPath) length \(stringValue.count) exceeds maxLength \(maxLength)"
            )
        }
    case "array":
        guard let arrayValue = value as? [Any] else {
            throw ToolError.schemaValidation("\(fieldPath) must be an array")
        }
        if let maxItems = schema["maxItems"] as? Int, arrayValue.count > maxItems {
            throw ToolError.schemaValidation("\(fieldPath) item count \(arrayValue.count) exceeds maxItems \(maxItems)")
        }
        if let itemSchema = schema["items"] as? [String: Any] {
            for (index, item) in arrayValue.enumerated() {
                try validate(value: item, against: itemSchema, fieldPath: "\(fieldPath)[\(index)]")
            }
        }
    default:
        break
    }
}

🧹 Nitpick | 🔵 Trivial

Validation logic is sound; flat-schema assumption baked into fieldPath.

applyInputLimits and validate mirror the Python implementation closely and the guards (if schema["maxLength"] == nil, schema["maxItems"] == nil) correctly preserve any hand-set bounds in a tool definition. Routing ToolError.schemaValidation through the existing catch in handleToolsCall (so it surfaces as isError: true in the tool result rather than a JSON-RPC error) is consistent with how other tool errors are reported and is exactly what the new test pins down.

One minor observation for future maintainers: inside the object branch you recurse with fieldPath: propertyName rather than "\(fieldPath).\(propertyName)". Every input schema today is flat, so error messages like "content length 200001 exceeds maxLength 200000" read fine and match the test, but the moment someone adds a nested-object input the parent context will silently disappear from error messages. Worth a // flat-schema assumption comment or a trivial concat now.

🛡️ Optional: preserve parent context in nested-object errors
             if let properties = schema["properties"] as? [String: Any] {
                 for (propertyName, propertyValue) in object {
                     guard
                         !(propertyValue is NSNull),
                         let propertySchema = properties[propertyName] as? [String: Any]
                     else { continue }
-                    try validate(value: propertyValue, against: propertySchema, fieldPath: propertyName)
+                    let nextPath = fieldPath == "arguments" ? propertyName : "\(fieldPath).\(propertyName)"
+                    try validate(value: propertyValue, against: propertySchema, fieldPath: nextPath)
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@brain-bar/Sources/BrainBar/MCPRouter.swift` around lines 451 - 556, The
object-branch of validate currently recurses with fieldPath: propertyName which
loses parent context for nested objects; update the recursive calls in
validate(value:against:fieldPath:) to pass the full path (e.g.
"\(fieldPath).\(propertyName)" or handle empty root appropriately) so errors
include parent context, and add a brief comment in applyInputLimits/validate
noting the current flat-schema assumption if you want to keep compatibility;
refer to the validate function and the recursive calls that currently use
fieldPath: propertyName to locate the change.

Comment on lines +131 to +212
_DEFAULT_STRING_MAX_LENGTH = 256
_DEFAULT_STRING_ARRAY_MAX_ITEMS = 100
_INPUT_STRING_MAX_LENGTHS = {
    "action": 64,
    "agent_id": 128,
    "chunk_id": 128,
    "content": 200_000,
    "content_type": 64,
    "content_type_filter": 64,
    "context": 4_096,
    "correction_category": 128,
    "date_from": 32,
    "date_to": 32,
    "detail": 16,
    "entity_id": 128,
    "entity_type": 64,
    "file_path": 1_024,
    "function_name": 256,
    "intent": 32,
    "mode": 32,
    "name": 256,
    "new_chunk_id": 128,
    "old_chunk_id": 128,
    "outcome": 32,
    "pattern": 256,
    "plan_name": 256,
    "project": 256,
    "query": 4_096,
    "reason": 1_024,
    "reversibility": 32,
    "safety_check": 32,
    "sentiment": 32,
    "sentiment_filter": 32,
    "session_id": 128,
    "severity": 16,
    "source": 32,
    "source_filter": 512,
    "status": 32,
    "supersedes": 128,
    "tag": 128,
    "title": 512,
    "type": 32,
}
_INPUT_STRING_ARRAY_LIMITS = {
    "chunk_ids": {"max_items": 500, "item_max_length": 128},
    "files_changed": {"max_items": 200, "item_max_length": 1_024},
    "merge_chunk_ids": {"max_items": 100, "item_max_length": 128},
    "participants": {"max_items": 100, "item_max_length": 256},
    "tags": {"max_items": 100, "item_max_length": 128},
}


def _bounded_input_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Apply uniform input caps to string fields before exposing MCP schemas."""
    bounded = deepcopy(schema)
    _apply_input_limits(bounded)
    return bounded


def _apply_input_limits(node: Any, field_name: str | None = None) -> None:
    if not isinstance(node, dict):
        return

    node_type = node.get("type")
    if node_type == "string":
        node.setdefault("maxLength", _INPUT_STRING_MAX_LENGTHS.get(field_name, _DEFAULT_STRING_MAX_LENGTH))
        return

    if node_type == "array":
        items = node.get("items")
        if isinstance(items, dict) and items.get("type") == "string":
            limits = _INPUT_STRING_ARRAY_LIMITS.get(
                field_name,
                {"max_items": _DEFAULT_STRING_ARRAY_MAX_ITEMS, "item_max_length": _DEFAULT_STRING_MAX_LENGTH},
            )
            node.setdefault("maxItems", limits["max_items"])
            items.setdefault("maxLength", limits["item_max_length"])
        _apply_input_limits(items, field_name)
        return

    for prop_name, prop_schema in node.get("properties", {}).items():
        _apply_input_limits(prop_schema, prop_name)

🧹 Nitpick | 🔵 Trivial

Bounded-schema helpers look correct; silent 256-char fallback is the main footgun.

The recursion handles strings, string-arrays, and nested objects correctly, and the deepcopy in _bounded_input_schema avoids mutating caller-owned schemas. Two low-priority observations:

  1. Any future string input added without a matching _INPUT_STRING_MAX_LENGTHS entry will silently be capped at 256 chars. All current fields are mapped, but a long-form field added later (e.g., another content/context-style field) could be truncated in production before anyone notices. Consider either logging at import time for unmapped string field names seen in list_tools() schemas, or requiring an explicit opt-in for the default.
  2. _apply_input_limits only sets maxItems when items.type == "string". Arrays of non-string items (not currently present in any input schema) would therefore bypass the cardinality cap. Worth a comment or a defensive setdefault("maxItems", _DEFAULT_STRING_ARRAY_MAX_ITEMS) at the array level if you want uniform DoS protection regardless of item type.
🛡️ Optional: always cap array cardinality
     if node_type == "array":
         items = node.get("items")
+        node.setdefault("maxItems", _DEFAULT_STRING_ARRAY_MAX_ITEMS)
         if isinstance(items, dict) and items.get("type") == "string":
             limits = _INPUT_STRING_ARRAY_LIMITS.get(
                 field_name,
                 {"max_items": _DEFAULT_STRING_ARRAY_MAX_ITEMS, "item_max_length": _DEFAULT_STRING_MAX_LENGTH},
             )
-            node.setdefault("maxItems", limits["max_items"])
+            node["maxItems"] = limits["max_items"] if "maxItems" not in node else node["maxItems"]
             items.setdefault("maxLength", limits["item_max_length"])
         _apply_input_limits(items, field_name)
         return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/mcp/__init__.py` around lines 131 - 212, The code silently
falls back to _DEFAULT_STRING_MAX_LENGTH for unknown string field names and only
sets array maxItems for string-item arrays; update _apply_input_limits to (1)
detect when a string node uses the default (i.e., when field_name not in
_INPUT_STRING_MAX_LENGTHS) and emit a warning or raise/configure an explicit
opt-in (use the module logger and reference _INPUT_STRING_MAX_LENGTHS,
_DEFAULT_STRING_MAX_LENGTH, and the function
_apply_input_limits/_bounded_input_schema so reviewers can find it), and (2)
always set a cardinality cap on arrays by calling node.setdefault("maxItems",
<cap>) regardless of items.type (use _INPUT_STRING_ARRAY_LIMITS and
_DEFAULT_STRING_ARRAY_MAX_ITEMS as the source of truth) so arrays of non-string
items are also protected; keep the existing per-item maxLength logic for string
items.

EtanHey added a commit that referenced this pull request Apr 24, 2026
CI lint job required ruff format on the two files touched by PR #254.
No logic change — only whitespace/quote normalization.


EtanHey added 2 commits April 24, 2026 16:24
CI lint job required ruff format on the two files touched by PR #254.
No logic change — only whitespace/quote normalization.
EtanHey force-pushed the security/mcp-string-maxlength branch from 82a1cb8 to d2bbb17 on April 24, 2026 at 13:24


cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d2bbb17.

XCTAssertNotNil(arraySchema["maxItems"] as? Int, "\(name).\(fieldPath) must declare maxItems")
XCTAssertNotNil(itemSchema["maxLength"] as? Int, "\(name).\(fieldPath)[] must declare maxLength")
}
func testEachToolSchemaBoundsStringInputs() throws {

Garbled edit destroys two Swift test functions

High Severity

The diff botched the edit of testEachToolHasExpectedAnnotations and the insertion of the new testEachToolSchemaBoundsStringInputs. Lines from both functions are interleaved — dictionary entries like "brain_entity" float as bare statements, func declarations appear nested inside other function bodies, and brace matching is broken. Beyond the syntax issues, the expected dictionary in the annotations test was reduced from 11 tools to only "brain_search", so XCTAssertEqual(toolsByName.count, expected.count) would compare 11 to 1. The openWorldHint assertion was also removed. The entire Swift test suite cannot build, meaning the new schema-bounds validation has zero test coverage on the BrainBar side.

Additional Locations (1)

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2bbb177ad


Comment on lines +158 to +160
"brain_entity": (true, false, true, false),
let router = MCPRouter()
"brain_digest": (false, false, false, false),

P1: Remove stray statements that break MCPRouterTests compilation

This commit introduces loose dictionary entries/statements between test methods, so MCPRouterTests.swift is no longer valid Swift syntax and the BrainBar test target cannot compile (e.g., swiftc -typecheck fails at this block). That blocks all router regression tests from running, which is especially risky here because the same commit changes request-schema validation behavior.
