security: add maxLength to all string MCP inputs (cyberMaster H2) #254

Open
EtanHey wants to merge 2 commits into main from security/mcp-string-maxlength

Conversation

EtanHey (Owner) commented Apr 24, 2026

Summary

Adds maxLength to all string MCP inputs in the Python server and the BrainBar router, plus schema-level regression tests and BrainBar-side request validation for oversized string payloads.

Python MCP Field Limits

Field                 maxLength
action                64
agent_id              128
chunk_id              128
content               200000
content_type          64
content_type_filter   64
context               4096
correction_category   128
date_from             32
date_to               32
detail                16
entity_id             128
entity_type           64
file_path             1024
function_name         256
intent                32
mode                  32
name                  256
new_chunk_id          128
old_chunk_id          128
outcome               32
pattern               256
phase                 256
plan_name             256
project               256
query                 4096
reason                1024
reversibility         32
safety_check          32
sentiment             32
sentiment_filter      32
session_id            128
severity              16
source                32
source_filter         512
status                32
supersedes            128
tag                   128
title                 512
type                  32

Python MCP String Array Limits

Field              maxItems   item maxLength
chunk_ids[]        500        128
files_changed[]    200        1024
merge_chunk_ids[]  100        128
participants[]     100        256
tags[]             100        128

BrainBar Field Limits

Field        maxLength
action       64
agent_id     128
chunk_id     128
content      200000
detail       16
mode         32
project      256
query        4096
session_id   128
tag          128

BrainBar String Array Limits

Field    maxItems   item maxLength
tags[]   100        128
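
To make the caps concrete, here is a minimal sketch of the Python helper in action. The sample schema is invented for illustration; _bounded_input_schema is the private helper this PR adds in src/brainlayer/mcp/__init__.py, and the resulting caps follow the tables above.

# Sketch only: an invented minimal schema run through the PR's helper.
from brainlayer.mcp import _bounded_input_schema

raw = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["query"],
}

bounded = _bounded_input_schema(raw)
assert bounded["properties"]["query"]["maxLength"] == 4096
assert bounded["properties"]["tags"]["maxItems"] == 100
assert bounded["properties"]["tags"]["items"]["maxLength"] == 128
assert "maxLength" not in raw["properties"]["query"]  # deep copy: caller's schema untouched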

Verification

Focused checks

$ pytest tests/test_mcp_input_schema_limits.py
============================== 2 passed in 0.37s ==============================

$ swift test --package-path brain-bar --filter MCPRouterTests
Executed 19 tests, with 0 failures (0 unexpected)

Repo-wide gate

$ pytest
7 failed, 1865 passed, 8 skipped, 6 xfailed, 2 xpassed in 397.25s

Those 7 failures are pre-existing and reproduce on a clean main checkout as well:

$ PYTHONPATH=/Users/etanheyman/Gits/brainlayer/src pytest tests/test_vector_store.py::TestSchema::test_created_at_coverage
FAILED tests/test_vector_store.py::TestSchema::test_created_at_coverage

$ PYTHONPATH=/Users/etanheyman/Gits/brainlayer/src pytest tests/test_eval_baselines.py::TestTemporalQueries::test_recent_week_chunks_exist
FAILED tests/test_eval_baselines.py::TestTemporalQueries::test_recent_week_chunks_exist

$ PYTHONPATH=/Users/etanheyman/Gits/brainlayer/src pytest tests/test_eval_baselines.py::TestPromptHookEntityInjection::test_no_entity_in_generic_query
FAILED tests/test_eval_baselines.py::TestPromptHookEntityInjection::test_no_entity_in_generic_query

Review

cr review --plain surfaced one test-hardening issue in tests/test_mcp_input_schema_limits.py; that guard was fixed. A second rerun hit the org review rate limit after the finding had already been addressed.


Note

Medium Risk
Introduces new input validation that can reject previously accepted oversized requests, potentially impacting clients if they exceed the new caps. Changes touch request handling paths in both Swift and Python MCP servers but are localized to schema/validation logic with targeted tests.

Overview
Adds uniform input size limits to MCP tool schemas and enforces them at runtime. The Swift MCPRouter now wraps every tool inputSchema with default/field-specific maxLength (and maxItems + item maxLength for string arrays) and validates tools/call arguments against that schema before dispatch, returning a Schema validation error result on violations.

On the Python MCP server, tool inputSchemas are similarly wrapped via _bounded_input_schema (deep-copied) to apply the same style of caps across all tools. New Swift and Python tests assert that all advertised schemas include bounds for every string and string-array field, and add regression coverage that oversized brain_digest.content is rejected at the schema layer.

Reviewed by Cursor Bugbot for commit d2bbb17. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

  • New Features

    • Input schema validation is now enforced for all tool calls, rejecting requests with oversized strings and array values
    • Automatic size constraints are applied to string and array input fields across all tools
    • Schema validation errors are now properly reported in tool call responses
  • Tests

    • Added comprehensive test coverage for input schema validation and constraint enforcement

Note

Add maxLength/maxItems bounds and schema validation to all MCP tool string inputs

  • Adds per-field maxLength caps for string inputs and maxItems/per-item maxLength caps for string-array inputs to all MCP tool schemas in both the Python layer (__init__.py) and Swift layer (MCPRouter.swift).
  • Introduces a validation step in the tools/call dispatch path that enforces these schema constraints (type, enum membership, maxLength, maxItems) before the tool handler executes, returning a JSON-RPC error on failure.
  • Adds a new ToolError.schemaValidation case in Swift to surface validation failures with a descriptive message.
  • Tests verify that every exposed tool schema declares bounds on string/string-array fields and that brain_digest rejects content exceeding 200,000 characters.
  • Behavioral Change: tools/call now returns a schema validation error for inputs that exceed declared limits, where previously oversized inputs would reach the handler unchecked.

Macroscope summarized d2bbb17.



coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

This change implements per-tool JSON input constraints across Swift and Python implementations. Tool input schemas are augmented with default maxLength and maxItems limits, and incoming tool arguments are validated against these schema constraints before dispatch. Schema validation failures return dedicated error responses.
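As a rough Python sketch of that flow (handler wiring and the result shape here are assumptions, not the actual router code), the ordering is: validate first, dispatch only on success.

from jsonschema import Draft202012Validator

def handle_tools_call(tool: dict, arguments: dict) -> dict:
    # Check arguments against the bounded inputSchema before dispatching.
    error = next(Draft202012Validator(tool["inputSchema"]).iter_errors(arguments), None)
    if error is not None:
        # Violations return a dedicated error result; the tool handler never runs.
        return {
            "isError": True,
            "content": [{"type": "text", "text": f"Schema validation error: {error.message}"}],
        }
    return tool["handler"](arguments)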

Changes

  • Swift Router & Validation (brain-bar/Sources/BrainBar/MCPRouter.swift): adds schema validation logic to enforce JSON input constraints. Introduces limitedInputSchema, applyInputLimits, and recursive validation methods that check object structure, required fields, string length limits, and array size constraints. A new ToolError.schemaValidation error type surfaces validation failures.
  • Swift Tests (brain-bar/Tests/BrainBarTests/MCPRouterTests.swift): adds recursive schema-walking test helpers to collect string and array field schemas, tests asserting maxLength bounds on all string fields and maxItems/item maxLength on string arrays, and a check that oversized content payloads are rejected at the schema validation layer.
  • Python Schema Bounding (src/brainlayer/mcp/__init__.py): wraps each tool's inputSchema with a _bounded_input_schema() function that recursively applies uniform length caps: maxLength for strings and maxItems/per-item maxLength for string arrays. Affects schema constraints for search, store, recall, entity, digest, and lifecycle tools.
  • Python Tests (tests/test_mcp_input_schema_limits.py): new test suite that validates MCP tool input schema constraints, with a schema enumeration helper, assertions that every string field has maxLength and every array-of-strings has maxItems with item bounds, and an integration test confirming brain_digest rejects oversized content payloads (a sketch of that test's shape follows this list).
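
That integration test roughly takes the following shape (a hedged sketch: the request-handler dispatch matches what the review comments below describe, but fixture setup and exact assertions are assumed):

import mcp.types as types

async def _call_brain_digest(server, content: str):
    # Dispatch through the server's CallToolRequest handler so the mcp
    # library's own jsonschema validation runs, as the real test does.
    request = types.CallToolRequest(
        method="tools/call",
        params=types.CallToolRequestParams(name="brain_digest", arguments={"content": content}),
    )
    result = await server.request_handlers[types.CallToolRequest](request)
    return result.root

# One character over the 200,000 cap should come back as a schema-level
# rejection rather than reaching the handler:
#   result = await _call_brain_digest(server, "x" * 200_001)
#   assert result.isError and "is too long" in str(result.content[0].text)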

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router
    participant SchemaValidator
    participant Tool
    participant ErrorHandler

    Client->>Router: tools/call (with arguments)
    Router->>SchemaValidator: Validate arguments against inputSchema
    alt Validation succeeds
        SchemaValidator-->>Router: Valid
        Router->>Tool: Dispatch to tool
        Tool-->>Router: Tool result
        Router-->>Client: Success response
    else Validation fails
        SchemaValidator-->>ErrorHandler: Schema validation error
        ErrorHandler-->>Client: Error response with validation details
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit's rejoicing verse:

Long inputs once ran wild and free,
But now they're bounded carefully,
With maxLength guards and careful size,
Schema validation's the new sunrise! 🌅
Input constraints make tools more wise,
Safety checks before we try. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 27.78%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check ✅ Passed: the title accurately describes the primary security-focused change, adding maxLength constraints to all string MCP inputs across both the Python and BrainBar implementations.
  • Linked Issues check ✅ Passed: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: skipped because no linked issues were found for this pull request.
  • Description check ✅ Passed: skipped because CodeRabbit's high-level summary is enabled.




chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 031e81eed7


Comment on lines +523 to +525
!(propertyValue is NSNull),
let propertySchema = properties[propertyName] as? [String: Any]
else { continue }

P2: Enforce schema limits on alias arguments before dispatch

The new validator skips any argument that is not declared in inputSchema (else { continue }), but handlers still consume alias keys like subscriber_id (for example in handleBrainSearch). That means a caller can send an arbitrarily long subscriber_id string and bypass the new maxLength caps entirely, then pass it through to search filtering/DB calls. This undercuts the security hardening goal of bounding all string MCP inputs; either declare these aliases in schema (with limits) or reject unknown fields so aliases cannot bypass validation.
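
One way to close that gap, sketched here in Python with jsonschema purely for illustration (the Swift validator would need the analogous unknown-key check):

from jsonschema import Draft202012Validator

schema = {
    "type": "object",
    "properties": {"query": {"type": "string", "maxLength": 4096}},
    # Rejecting undeclared keys means alias arguments such as subscriber_id
    # cannot slip past the per-field caps.
    "additionalProperties": False,
}

errors = list(Draft202012Validator(schema).iter_errors(
    {"query": "ok", "subscriber_id": "x" * 10_000}
))
assert errors and "subscriber_id" in errors[0].message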


coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_mcp_input_schema_limits.py (1)

47-71: ⚠️ Potential issue | 🟡 Minor

mcp dependency is unpinned; pin a version range rather than loosening the error-string assertions.

The schema-bounds test is solid and parallels the Swift test well. The oversized-content test correctly exercises the server's jsonschema validation by dispatching through server.request_handlers[types.CallToolRequest].

However, pyproject.toml specifies mcp>=1.0.0 with no upper bound. This loose constraint means the test's error-string assertions ("Input validation error:" and "is too long") could theoretically break if a future mcp release changes validation error formatting. Consider pinning to a known-good mcp version range (e.g., mcp>=1.0.0,<2.0.0) rather than loosening the test assertions. Loosening to check only presence of "content" and str(200_000) weakens test precision without addressing the underlying version coupling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_mcp_input_schema_limits.py` around lines 47 - 71, The pyproject
currently allows an unpinned mcp (mcp>=1.0.0) which can break the exact
error-message assertions in test_brain_digest_schema_rejects_oversized_content
in tests/test_mcp_input_schema_limits.py; update pyproject.toml to constrain mcp
to a known-compatible range (for example mcp>=1.0.0,<2.0.0) so the jsonschema
error format remains stable, and keep the test in
test_brain_digest_schema_rejects_oversized_content (which calls
_call_brain_digest and asserts result.isError and the error content includes the
validation text) as-is rather than loosening its assertions. Ensure the package
version change is reflected in dependency lock/update commands so CI uses the
pinned range.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75dfd5e7-7b71-4c5b-8aee-88a9ed094c0f

📥 Commits

Reviewing files that changed from the base of the PR and between 05b2eaf and 031e81e.

📒 Files selected for processing (4)
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • src/brainlayer/mcp/__init__.py
  • tests/test_mcp_input_schema_limits.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_mcp_input_schema_limits.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/mcp/__init__.py
🧠 Learnings (11)
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.

Applied to files:

  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
  • tests/test_mcp_input_schema_limits.py
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-18T00:12:15.607Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.

Applied to files:

  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
📚 Learning: 2026-03-18T00:12:36.931Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/MCPRouter.swift:0-0
Timestamp: 2026-03-18T00:12:36.931Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar MCP daemon), the notification guard `let isNotification = (rawID == nil || rawID is NSNull)` is the single and only point where a no-response decision is made. Any message that passes this guard has a non-nil, non-NSNull id and MUST return a proper JSON-RPC response. Returning `[:]` (empty dict = no response) anywhere after the notification guard is always a bug — it creates a silent client hang. Flag any `return [:]` that appears after the guard in future reviews.

Applied to files:

  • brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)

Applied to files:

  • tests/test_mcp_input_schema_limits.py
  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.

Applied to files:

  • brain-bar/Sources/BrainBar/MCPRouter.swift
📚 Learning: 2026-04-04T15:21:39.570Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:39.570Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-01T22:38:15.693Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T22:38:15.693Z
Learning: Applies to src/brainlayer/mcp/brain_supersede.py : `brain_supersede` tool must include safety gate for personal data (journals, notes, health/finance)

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Pipeline architecture: Extract → Classify → Chunk → Embed → Index, with post-processing for enrichment, brain graph, and Obsidian export

Applied to files:

  • src/brainlayer/mcp/__init__.py
🔇 Additional comments (3)
src/brainlayer/mcp/__init__.py (1)

384-1156: LGTM — consistent wrapping of every tool's inputSchema.

All Tool(...) registrations route through _bounded_input_schema(...), which keeps the bounding logic centralized. I spot-checked each tool against _INPUT_STRING_MAX_LENGTHS/_INPUT_STRING_ARRAY_LIMITS: every string and string-array field has either an explicit entry or a sensible implicit fallback, and all enum values fit within their configured caps.

brain-bar/Tests/BrainBarTests/MCPRouterTests.swift (1)

11-236: LGTM — good coverage of the new schema-bounding contract.

collectStringFields/collectStringArrays correctly recurse through the schema, including string-array item schemas (path suffixed with []), so testEachToolSchemaBoundsStringInputs will catch any future tool that forgets to route through limitedInputSchema. The oversized-content test also anchors the validation to a specific error shape ("Schema validation error" + exact length/limit), which will catch accidental bypasses of the validate(arguments:for:) call site in handleToolsCall.

brain-bar/Sources/BrainBar/MCPRouter.swift (1)

25-41: LGTM — limits table and per-tool wrapping.

The Swift stringMaxLengths / stringArrayLimits entries match the Python values for every overlapping field (content 200_000, query 4_096, tags 100/128, etc.), so the two surfaces agree on the fields BrainBar actually exposes. Each toolDefinitions entry goes through MCPRouter.limitedInputSchema(...) uniformly.

Also applies to: 582-712

Comment on lines +451 to +556
private static func limitedInputSchema(_ schema: [String: Any]) -> [String: Any] {
    return applyInputLimits(to: schema, fieldName: nil)
}

private static func applyInputLimits(to schema: [String: Any], fieldName: String?) -> [String: Any] {
    var schema = schema
    guard let type = schema["type"] as? String else {
        return schema
    }

    switch type {
    case "string":
        if schema["maxLength"] == nil {
            schema["maxLength"] = stringMaxLengths[fieldName ?? ""] ?? defaultStringMaxLength
        }
    case "array":
        if var items = schema["items"] as? [String: Any] {
            if (items["type"] as? String) == "string" {
                let limits = stringArrayLimits[fieldName ?? ""] ??
                    (maxItems: defaultStringArrayMaxItems, itemMaxLength: defaultStringMaxLength)
                if schema["maxItems"] == nil {
                    schema["maxItems"] = limits.maxItems
                }
                if items["maxLength"] == nil {
                    items["maxLength"] = limits.itemMaxLength
                }
            }
            schema["items"] = applyInputLimits(to: items, fieldName: fieldName)
        }
    case "object":
        if var properties = schema["properties"] as? [String: Any] {
            for (propertyName, value) in properties {
                guard let propertySchema = value as? [String: Any] else { continue }
                properties[propertyName] = applyInputLimits(to: propertySchema, fieldName: propertyName)
            }
            schema["properties"] = properties
        }
    default:
        break
    }

    return schema
}

private static func validate(arguments: [String: Any], for toolName: String) throws {
    guard
        let tool = toolDefinitions.first(where: { ($0["name"] as? String) == toolName }),
        let schema = tool["inputSchema"] as? [String: Any]
    else {
        return
    }

    try validate(value: arguments, against: schema, fieldPath: "arguments")
}

private static func validate(value: Any, against schema: [String: Any], fieldPath: String) throws {
    guard let type = schema["type"] as? String else {
        return
    }

    switch type {
    case "object":
        guard let object = value as? [String: Any] else {
            throw ToolError.schemaValidation("\(fieldPath) must be an object")
        }
        let required = schema["required"] as? [String] ?? []
        for key in required where object[key] == nil || object[key] is NSNull {
            throw ToolError.schemaValidation("\(key) is required")
        }
        if let properties = schema["properties"] as? [String: Any] {
            for (propertyName, propertyValue) in object {
                guard
                    !(propertyValue is NSNull),
                    let propertySchema = properties[propertyName] as? [String: Any]
                else { continue }
                try validate(value: propertyValue, against: propertySchema, fieldPath: propertyName)
            }
        }
    case "string":
        guard let stringValue = value as? String else {
            throw ToolError.schemaValidation("\(fieldPath) must be a string")
        }
        if let enumValues = schema["enum"] as? [String], !enumValues.contains(stringValue) {
            throw ToolError.schemaValidation("\(fieldPath) must be one of \(enumValues.joined(separator: ", "))")
        }
        if let maxLength = schema["maxLength"] as? Int, stringValue.count > maxLength {
            throw ToolError.schemaValidation(
                "\(fieldPath) length \(stringValue.count) exceeds maxLength \(maxLength)"
            )
        }
    case "array":
        guard let arrayValue = value as? [Any] else {
            throw ToolError.schemaValidation("\(fieldPath) must be an array")
        }
        if let maxItems = schema["maxItems"] as? Int, arrayValue.count > maxItems {
            throw ToolError.schemaValidation("\(fieldPath) item count \(arrayValue.count) exceeds maxItems \(maxItems)")
        }
        if let itemSchema = schema["items"] as? [String: Any] {
            for (index, item) in arrayValue.enumerated() {
                try validate(value: item, against: itemSchema, fieldPath: "\(fieldPath)[\(index)]")
            }
        }
    default:
        break
    }
}

🧹 Nitpick | 🔵 Trivial

Validation logic is sound; flat-schema assumption baked into fieldPath.

applyInputLimits and validate mirror the Python implementation closely and the guards (if schema["maxLength"] == nil, schema["maxItems"] == nil) correctly preserve any hand-set bounds in a tool definition. Routing ToolError.schemaValidation through the existing catch in handleToolsCall (so it surfaces as isError: true in the tool result rather than a JSON-RPC error) is consistent with how other tool errors are reported and is exactly what the new test pins down.

One minor observation for future maintainers: inside the object branch you recurse with fieldPath: propertyName rather than "\(fieldPath).\(propertyName)". Every input schema today is flat, so error messages like "content length 200001 exceeds maxLength 200000" read fine and match the test, but the moment someone adds a nested-object input the parent context will silently disappear from error messages. Worth a // flat-schema assumption comment or a trivial concat now.

🛡️ Optional: preserve parent context in nested-object errors
             if let properties = schema["properties"] as? [String: Any] {
                 for (propertyName, propertyValue) in object {
                     guard
                         !(propertyValue is NSNull),
                         let propertySchema = properties[propertyName] as? [String: Any]
                     else { continue }
-                    try validate(value: propertyValue, against: propertySchema, fieldPath: propertyName)
+                    let nextPath = fieldPath == "arguments" ? propertyName : "\(fieldPath).\(propertyName)"
+                    try validate(value: propertyValue, against: propertySchema, fieldPath: nextPath)
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@brain-bar/Sources/BrainBar/MCPRouter.swift` around lines 451 - 556, The
object-branch of validate currently recurses with fieldPath: propertyName which
loses parent context for nested objects; update the recursive calls in
validate(value:against:fieldPath:) to pass the full path (e.g.
"\(fieldPath).\(propertyName)" or handle empty root appropriately) so errors
include parent context, and add a brief comment in applyInputLimits/validate
noting the current flat-schema assumption if you want to keep compatibility;
refer to the validate function and the recursive calls that currently use
fieldPath: propertyName to locate the change.

Comment on lines +131 to +212
_DEFAULT_STRING_MAX_LENGTH = 256
_DEFAULT_STRING_ARRAY_MAX_ITEMS = 100
_INPUT_STRING_MAX_LENGTHS = {
    "action": 64,
    "agent_id": 128,
    "chunk_id": 128,
    "content": 200_000,
    "content_type": 64,
    "content_type_filter": 64,
    "context": 4_096,
    "correction_category": 128,
    "date_from": 32,
    "date_to": 32,
    "detail": 16,
    "entity_id": 128,
    "entity_type": 64,
    "file_path": 1_024,
    "function_name": 256,
    "intent": 32,
    "mode": 32,
    "name": 256,
    "new_chunk_id": 128,
    "old_chunk_id": 128,
    "outcome": 32,
    "pattern": 256,
    "plan_name": 256,
    "project": 256,
    "query": 4_096,
    "reason": 1_024,
    "reversibility": 32,
    "safety_check": 32,
    "sentiment": 32,
    "sentiment_filter": 32,
    "session_id": 128,
    "severity": 16,
    "source": 32,
    "source_filter": 512,
    "status": 32,
    "supersedes": 128,
    "tag": 128,
    "title": 512,
    "type": 32,
}
_INPUT_STRING_ARRAY_LIMITS = {
    "chunk_ids": {"max_items": 500, "item_max_length": 128},
    "files_changed": {"max_items": 200, "item_max_length": 1_024},
    "merge_chunk_ids": {"max_items": 100, "item_max_length": 128},
    "participants": {"max_items": 100, "item_max_length": 256},
    "tags": {"max_items": 100, "item_max_length": 128},
}


def _bounded_input_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Apply uniform input caps to string fields before exposing MCP schemas."""
    bounded = deepcopy(schema)
    _apply_input_limits(bounded)
    return bounded


def _apply_input_limits(node: Any, field_name: str | None = None) -> None:
    if not isinstance(node, dict):
        return

    node_type = node.get("type")
    if node_type == "string":
        node.setdefault("maxLength", _INPUT_STRING_MAX_LENGTHS.get(field_name, _DEFAULT_STRING_MAX_LENGTH))
        return

    if node_type == "array":
        items = node.get("items")
        if isinstance(items, dict) and items.get("type") == "string":
            limits = _INPUT_STRING_ARRAY_LIMITS.get(
                field_name,
                {"max_items": _DEFAULT_STRING_ARRAY_MAX_ITEMS, "item_max_length": _DEFAULT_STRING_MAX_LENGTH},
            )
            node.setdefault("maxItems", limits["max_items"])
            items.setdefault("maxLength", limits["item_max_length"])
        _apply_input_limits(items, field_name)
        return

    for prop_name, prop_schema in node.get("properties", {}).items():
        _apply_input_limits(prop_schema, prop_name)

🧹 Nitpick | 🔵 Trivial

Bounded-schema helpers look correct; silent 256-char fallback is the main footgun.

The recursion handles strings, string-arrays, and nested objects correctly, and the deepcopy in _bounded_input_schema avoids mutating caller-owned schemas. Two low-priority observations:

  1. Any future string input added without a matching _INPUT_STRING_MAX_LENGTHS entry will silently be capped at 256 chars. All current fields are mapped, but a long-form field added later (e.g., another content/context-style field) could be truncated in production before anyone notices. Consider either logging at import time for unmapped string field names seen in list_tools() schemas, or requiring an explicit opt-in for the default.
  2. _apply_input_limits only sets maxItems when items.type == "string". Arrays of non-string items (not currently present in any input schema) would therefore bypass the cardinality cap. Worth a comment or a defensive setdefault("maxItems", _DEFAULT_STRING_ARRAY_MAX_ITEMS) at the array level if you want uniform DoS protection regardless of item type.
🛡️ Optional: always cap array cardinality
     if node_type == "array":
         items = node.get("items")
+        node.setdefault("maxItems", _DEFAULT_STRING_ARRAY_MAX_ITEMS)
         if isinstance(items, dict) and items.get("type") == "string":
             limits = _INPUT_STRING_ARRAY_LIMITS.get(
                 field_name,
                 {"max_items": _DEFAULT_STRING_ARRAY_MAX_ITEMS, "item_max_length": _DEFAULT_STRING_MAX_LENGTH},
             )
-            node.setdefault("maxItems", limits["max_items"])
+            node["maxItems"] = limits["max_items"] if "maxItems" not in node else node["maxItems"]
             items.setdefault("maxLength", limits["item_max_length"])
         _apply_input_limits(items, field_name)
         return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/mcp/__init__.py` around lines 131 - 212, The code silently
falls back to _DEFAULT_STRING_MAX_LENGTH for unknown string field names and only
sets array maxItems for string-item arrays; update _apply_input_limits to (1)
detect when a string node uses the default (i.e., when field_name not in
_INPUT_STRING_MAX_LENGTHS) and emit a warning or raise/configure an explicit
opt-in (use the module logger and reference _INPUT_STRING_MAX_LENGTHS,
_DEFAULT_STRING_MAX_LENGTH, and the function
_apply_input_limits/_bounded_input_schema so reviewers can find it), and (2)
always set a cardinality cap on arrays by calling node.setdefault("maxItems",
<cap>) regardless of items.type (use _INPUT_STRING_ARRAY_LIMITS and
_DEFAULT_STRING_ARRAY_MAX_ITEMS as the source of truth) so arrays of non-string
items are also protected; keep the existing per-item maxLength logic for string
items.

EtanHey added a commit that referenced this pull request Apr 24, 2026
CI lint job required ruff format on the two files touched by PR #254.
No logic change — only whitespace/quote normalization.


EtanHey added 2 commits April 24, 2026 16:24
CI lint job required ruff format on the two files touched by PR #254.
No logic change — only whitespace/quote normalization.
EtanHey force-pushed the security/mcp-string-maxlength branch from 82a1cb8 to d2bbb17 on April 24, 2026 at 13:24


cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d2bbb17.

XCTAssertNotNil(arraySchema["maxItems"] as? Int, "\(name).\(fieldPath) must declare maxItems")
XCTAssertNotNil(itemSchema["maxLength"] as? Int, "\(name).\(fieldPath)[] must declare maxLength")
}
func testEachToolSchemaBoundsStringInputs() throws {

Garbled edit destroys two Swift test functions

High Severity

The diff botched the edit of testEachToolHasExpectedAnnotations and the insertion of the new testEachToolSchemaBoundsStringInputs. Lines from both functions are interleaved — dictionary entries like "brain_entity" float as bare statements, func declarations appear nested inside other function bodies, and brace matching is broken. Beyond the syntax issues, the expected dictionary in the annotations test was reduced from 11 tools to only "brain_search", so XCTAssertEqual(toolsByName.count, expected.count) would compare 11 to 1. The openWorldHint assertion was also removed. The entire Swift test suite cannot build, meaning the new schema-bounds validation has zero test coverage on the BrainBar side.

Additional Locations (1)

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2bbb177ad


Comment on lines +158 to +160
"brain_entity": (true, false, true, false),
let router = MCPRouter()
"brain_digest": (false, false, false, false),

P1: Remove stray statements that break MCPRouterTests compilation

This commit introduces loose dictionary entries/statements between test methods, so MCPRouterTests.swift is no longer valid Swift syntax and the BrainBar test target cannot compile (e.g., swiftc -typecheck fails at this block). That blocks all router regression tests from running, which is especially risky here because the same commit changes request-schema validation behavior.
