security: add maxLength to all string MCP inputs (cyberMaster H2) #254
Conversation
📝 Walkthrough
This change implements per-tool JSON input constraints across the Swift and Python implementations. Tool input schemas are augmented with default and field-specific `maxLength` caps (plus `maxItems` for string arrays), and `tools/call` arguments are validated against those bounds before dispatch.
Changes
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Router
    participant SchemaValidator
    participant Tool
    participant ErrorHandler
    Client->>Router: tools/call (with arguments)
    Router->>SchemaValidator: Validate arguments against inputSchema
    alt Validation succeeds
        SchemaValidator-->>Router: Valid
        Router->>Tool: Dispatch to tool
        Tool-->>Router: Tool result
        Router-->>Client: Success response
    else Validation fails
        SchemaValidator-->>ErrorHandler: Schema validation error
        ErrorHandler-->>Client: Error response with validation details
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 031e81eed7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```swift
!(propertyValue is NSNull),
let propertySchema = properties[propertyName] as? [String: Any]
else { continue }
```
Enforce schema limits on alias arguments before dispatch
The new validator skips any argument that is not declared in `inputSchema` (`else { continue }`), but handlers still consume alias keys like `subscriber_id` (for example in `handleBrainSearch`). That means a caller can send an arbitrarily long `subscriber_id` string and bypass the new `maxLength` caps entirely, then pass it through to search filtering/DB calls. This undercuts the security-hardening goal of bounding all string MCP inputs; either declare these aliases in the schema (with limits) or reject unknown fields so aliases cannot bypass validation.
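To make the two proposed fixes concrete, here is a minimal sketch of a bounded schema expressed as a Python dict. The `query` cap comes from this PR's limits table; the `subscriber_id` cap of 128 is an assumed value that mirrors the other `*_id` fields, and whether `additionalProperties` is acceptable for existing MCP clients is an open question:

```python
# Sketch only: two complementary ways to keep alias keys inside the caps.
brain_search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "maxLength": 4_096},
        # Option 1: declare the alias explicitly so it is validated too
        # (128 is an assumed cap, matching the other *_id fields).
        "subscriber_id": {"type": "string", "maxLength": 128},
    },
    # Option 2: reject any argument not declared above, so undeclared
    # aliases can never bypass validation.
    "additionalProperties": False,
}
```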
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_mcp_input_schema_limits.py (1)
Lines 47-71: ⚠️ Potential issue | 🟡 Minor
`mcp` dependency is unpinned; consider a version constraint, but the error-string assertions are overly defensive.
The schema-bounds test is solid and parallels the Swift test well. The oversized-content test correctly exercises the server's jsonschema validation by dispatching through `server.request_handlers[types.CallToolRequest]`. However, `pyproject.toml` specifies `mcp>=1.0.0` with no upper bound. This loose constraint means the test's error-string assertions (`"Input validation error:"` and `"is too long"`) could theoretically break if a future mcp release changes validation error formatting. Consider pinning to a known-good mcp version range (e.g., `mcp>=1.0.0,<2.0.0`) rather than loosening the test assertions. Loosening to check only the presence of `"content"` and `str(200_000)` weakens test precision without addressing the underlying version coupling.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_mcp_input_schema_limits.py` around lines 47 - 71, The pyproject currently allows an unpinned mcp (mcp>=1.0.0) which can break the exact error-message assertions in test_brain_digest_schema_rejects_oversized_content in tests/test_mcp_input_schema_limits.py; update pyproject.toml to constrain mcp to a known-compatible range (for example mcp>=1.0.0,<2.0.0) so the jsonschema error format remains stable, and keep the test in test_brain_digest_schema_rejects_oversized_content (which calls _call_brain_digest and asserts result.isError and the error content includes the validation text) as-is rather than loosening its assertions. Ensure the package version change is reflected in dependency lock/update commands so CI uses the pinned range.
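For reference, the `"is too long"` fragment the test asserts is the standard `jsonschema` message for a `maxLength` violation. A minimal, self-contained reproduction, assuming the `jsonschema` package (which the server's validation relies on, per the comment above) is installed:

```python
import jsonschema

# A string two characters past its cap triggers the maxLength validator.
schema = {"type": "string", "maxLength": 5}
try:
    jsonschema.validate("toolong", schema)
except jsonschema.ValidationError as err:
    print(err.message)  # "'toolong' is too long"
```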
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 75dfd5e7-7b71-4c5b-8aee-88a9ed094c0f
📒 Files selected for processing (4)
- brain-bar/Sources/BrainBar/MCPRouter.swift
- brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
- src/brainlayer/mcp/__init__.py
- tests/test_mcp_input_schema_limits.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Cursor Bugbot
- GitHub Check: Macroscope - Correctness Check
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Use `paths.py:get_db_path()` for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
- tests/test_mcp_input_schema_limits.py
- src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve `ai_code`, `stack_trace`, and `user_message` verbatim; skip `noise` entries entirely and summarize `build_log` and `dir_listing` entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
Implement `brain_supersede` with safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with timestamp
Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations
Run linting and formatting with: `ruff check src/ && ruff format src/`
Run tests with `pytest`
Use `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations to prevent WAL bloat
Files:
src/brainlayer/mcp/__init__.py
🧠 Learnings (11)
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.
Applied to files:
- brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
- tests/test_mcp_input_schema_limits.py
- brain-bar/Sources/BrainBar/MCPRouter.swift
- src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-18T00:12:15.607Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.
Applied to files:
brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
📚 Learning: 2026-03-18T00:12:36.931Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/MCPRouter.swift:0-0
Timestamp: 2026-03-18T00:12:36.931Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar MCP daemon), the notification guard `let isNotification = (rawID == nil || rawID is NSNull)` is the single and only point where a no-response decision is made. Any message that passes this guard has a non-nil, non-NSNull id and MUST return a proper JSON-RPC response. Returning `[:]` (empty dict = no response) anywhere after the notification guard is always a bug — it creates a silent client hang. Flag any `return [:]` that appears after the guard in future reviews.
Applied to files:
brain-bar/Tests/BrainBarTests/MCPRouterTests.swift
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)
Applied to files:
- tests/test_mcp_input_schema_limits.py
- src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.
Applied to files:
brain-bar/Sources/BrainBar/MCPRouter.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.
Applied to files:
brain-bar/Sources/BrainBar/MCPRouter.swift
📚 Learning: 2026-04-04T15:21:39.570Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:39.570Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-01T22:38:15.693Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T22:38:15.693Z
Learning: Applies to src/brainlayer/mcp/brain_supersede.py : `brain_supersede` tool must include safety gate for personal data (journals, notes, health/finance)
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Pipeline architecture: Extract → Classify → Chunk → Embed → Index, with post-processing for enrichment, brain graph, and Obsidian export
Applied to files:
src/brainlayer/mcp/__init__.py
🔇 Additional comments (3)
src/brainlayer/mcp/__init__.py (1)
384-1156: LGTM — consistent wrapping of every tool's `inputSchema`.
All `Tool(...)` registrations route through `_bounded_input_schema(...)`, which keeps the bounding logic centralized. I spot-checked each tool against `_INPUT_STRING_MAX_LENGTHS`/`_INPUT_STRING_ARRAY_LIMITS`: every string and string-array field has either an explicit entry or a sensible implicit fallback, and all enum values fit within their configured caps.
brain-bar/Tests/BrainBarTests/MCPRouterTests.swift (1)
11-236: LGTM — good coverage of the new schema-bounding contract.
`collectStringFields`/`collectStringArrays` correctly recurse through the schema, including string-array item schemas (path suffixed with `[]`), so `testEachToolSchemaBoundsStringInputs` will catch any future tool that forgets to route through `limitedInputSchema`. The oversized-content test also anchors the validation to a specific error shape (`"Schema validation error"` + exact length/limit), which will catch accidental bypasses of the `validate(arguments:for:)` call site in `handleToolsCall`.
brain-bar/Sources/BrainBar/MCPRouter.swift (1)
25-41: LGTM — limits table and per-tool wrapping.
The Swift `stringMaxLengths`/`stringArrayLimits` entries match the Python values for every overlapping field (`content` 200_000, `query` 4_096, `tags` 100/128, etc.), so the two surfaces agree on the fields BrainBar actually exposes. Each `toolDefinitions` entry goes through `MCPRouter.limitedInputSchema(...)` uniformly.
Also applies to: 582-712
```swift
private static func limitedInputSchema(_ schema: [String: Any]) -> [String: Any] {
    return applyInputLimits(to: schema, fieldName: nil)
}

private static func applyInputLimits(to schema: [String: Any], fieldName: String?) -> [String: Any] {
    var schema = schema
    guard let type = schema["type"] as? String else {
        return schema
    }

    switch type {
    case "string":
        if schema["maxLength"] == nil {
            schema["maxLength"] = stringMaxLengths[fieldName ?? ""] ?? defaultStringMaxLength
        }
    case "array":
        if var items = schema["items"] as? [String: Any] {
            if (items["type"] as? String) == "string" {
                let limits = stringArrayLimits[fieldName ?? ""] ??
                    (maxItems: defaultStringArrayMaxItems, itemMaxLength: defaultStringMaxLength)
                if schema["maxItems"] == nil {
                    schema["maxItems"] = limits.maxItems
                }
                if items["maxLength"] == nil {
                    items["maxLength"] = limits.itemMaxLength
                }
            }
            schema["items"] = applyInputLimits(to: items, fieldName: fieldName)
        }
    case "object":
        if var properties = schema["properties"] as? [String: Any] {
            for (propertyName, value) in properties {
                guard let propertySchema = value as? [String: Any] else { continue }
                properties[propertyName] = applyInputLimits(to: propertySchema, fieldName: propertyName)
            }
            schema["properties"] = properties
        }
    default:
        break
    }

    return schema
}

private static func validate(arguments: [String: Any], for toolName: String) throws {
    guard
        let tool = toolDefinitions.first(where: { ($0["name"] as? String) == toolName }),
        let schema = tool["inputSchema"] as? [String: Any]
    else {
        return
    }

    try validate(value: arguments, against: schema, fieldPath: "arguments")
}

private static func validate(value: Any, against schema: [String: Any], fieldPath: String) throws {
    guard let type = schema["type"] as? String else {
        return
    }

    switch type {
    case "object":
        guard let object = value as? [String: Any] else {
            throw ToolError.schemaValidation("\(fieldPath) must be an object")
        }
        let required = schema["required"] as? [String] ?? []
        for key in required where object[key] == nil || object[key] is NSNull {
            throw ToolError.schemaValidation("\(key) is required")
        }
        if let properties = schema["properties"] as? [String: Any] {
            for (propertyName, propertyValue) in object {
                guard
                    !(propertyValue is NSNull),
                    let propertySchema = properties[propertyName] as? [String: Any]
                else { continue }
                try validate(value: propertyValue, against: propertySchema, fieldPath: propertyName)
            }
        }
    case "string":
        guard let stringValue = value as? String else {
            throw ToolError.schemaValidation("\(fieldPath) must be a string")
        }
        if let enumValues = schema["enum"] as? [String], !enumValues.contains(stringValue) {
            throw ToolError.schemaValidation("\(fieldPath) must be one of \(enumValues.joined(separator: ", "))")
        }
        if let maxLength = schema["maxLength"] as? Int, stringValue.count > maxLength {
            throw ToolError.schemaValidation(
                "\(fieldPath) length \(stringValue.count) exceeds maxLength \(maxLength)"
            )
        }
    case "array":
        guard let arrayValue = value as? [Any] else {
            throw ToolError.schemaValidation("\(fieldPath) must be an array")
        }
        if let maxItems = schema["maxItems"] as? Int, arrayValue.count > maxItems {
            throw ToolError.schemaValidation("\(fieldPath) item count \(arrayValue.count) exceeds maxItems \(maxItems)")
        }
        if let itemSchema = schema["items"] as? [String: Any] {
            for (index, item) in arrayValue.enumerated() {
                try validate(value: item, against: itemSchema, fieldPath: "\(fieldPath)[\(index)]")
            }
        }
    default:
        break
    }
}
```
🧹 Nitpick | 🔵 Trivial
Validation logic is sound; flat-schema assumption baked into fieldPath.
`applyInputLimits` and `validate` mirror the Python implementation closely, and the guards (`if schema["maxLength"] == nil`, `schema["maxItems"] == nil`) correctly preserve any hand-set bounds in a tool definition. Routing `ToolError.schemaValidation` through the existing catch in `handleToolsCall` (so it surfaces as `isError: true` in the tool result rather than a JSON-RPC error) is consistent with how other tool errors are reported and is exactly what the new test pins down.
One minor observation for future maintainers: inside the object branch you recurse with `fieldPath: propertyName` rather than `"\(fieldPath).\(propertyName)"`. Every input schema today is flat, so error messages like `"content length 200001 exceeds maxLength 200000"` read fine and match the test, but the moment someone adds a nested-object input the parent context will silently disappear from error messages. Worth a `// flat-schema assumption` comment or a trivial concat now.
🛡️ Optional: preserve parent context in nested-object errors
```diff
 if let properties = schema["properties"] as? [String: Any] {
     for (propertyName, propertyValue) in object {
         guard
             !(propertyValue is NSNull),
             let propertySchema = properties[propertyName] as? [String: Any]
         else { continue }
-        try validate(value: propertyValue, against: propertySchema, fieldPath: propertyName)
+        let nextPath = fieldPath == "arguments" ? propertyName : "\(fieldPath).\(propertyName)"
+        try validate(value: propertyValue, against: propertySchema, fieldPath: nextPath)
     }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@brain-bar/Sources/BrainBar/MCPRouter.swift` around lines 451 - 556, The
object-branch of validate currently recurses with fieldPath: propertyName which
loses parent context for nested objects; update the recursive calls in
validate(value:against:fieldPath:) to pass the full path (e.g.
"\(fieldPath).\(propertyName)" or handle empty root appropriately) so errors
include parent context, and add a brief comment in applyInputLimits/validate
noting the current flat-schema assumption if you want to keep compatibility;
refer to the validate function and the recursive calls that currently use
fieldPath: propertyName to locate the change.
```python
_DEFAULT_STRING_MAX_LENGTH = 256
_DEFAULT_STRING_ARRAY_MAX_ITEMS = 100
_INPUT_STRING_MAX_LENGTHS = {
    "action": 64,
    "agent_id": 128,
    "chunk_id": 128,
    "content": 200_000,
    "content_type": 64,
    "content_type_filter": 64,
    "context": 4_096,
    "correction_category": 128,
    "date_from": 32,
    "date_to": 32,
    "detail": 16,
    "entity_id": 128,
    "entity_type": 64,
    "file_path": 1_024,
    "function_name": 256,
    "intent": 32,
    "mode": 32,
    "name": 256,
    "new_chunk_id": 128,
    "old_chunk_id": 128,
    "outcome": 32,
    "pattern": 256,
    "plan_name": 256,
    "project": 256,
    "query": 4_096,
    "reason": 1_024,
    "reversibility": 32,
    "safety_check": 32,
    "sentiment": 32,
    "sentiment_filter": 32,
    "session_id": 128,
    "severity": 16,
    "source": 32,
    "source_filter": 512,
    "status": 32,
    "supersedes": 128,
    "tag": 128,
    "title": 512,
    "type": 32,
}
_INPUT_STRING_ARRAY_LIMITS = {
    "chunk_ids": {"max_items": 500, "item_max_length": 128},
    "files_changed": {"max_items": 200, "item_max_length": 1_024},
    "merge_chunk_ids": {"max_items": 100, "item_max_length": 128},
    "participants": {"max_items": 100, "item_max_length": 256},
    "tags": {"max_items": 100, "item_max_length": 128},
}


def _bounded_input_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Apply uniform input caps to string fields before exposing MCP schemas."""
    bounded = deepcopy(schema)
    _apply_input_limits(bounded)
    return bounded


def _apply_input_limits(node: Any, field_name: str | None = None) -> None:
    if not isinstance(node, dict):
        return

    node_type = node.get("type")
    if node_type == "string":
        node.setdefault("maxLength", _INPUT_STRING_MAX_LENGTHS.get(field_name, _DEFAULT_STRING_MAX_LENGTH))
        return

    if node_type == "array":
        items = node.get("items")
        if isinstance(items, dict) and items.get("type") == "string":
            limits = _INPUT_STRING_ARRAY_LIMITS.get(
                field_name,
                {"max_items": _DEFAULT_STRING_ARRAY_MAX_ITEMS, "item_max_length": _DEFAULT_STRING_MAX_LENGTH},
            )
            node.setdefault("maxItems", limits["max_items"])
            items.setdefault("maxLength", limits["item_max_length"])
            _apply_input_limits(items, field_name)
        return

    for prop_name, prop_schema in node.get("properties", {}).items():
        _apply_input_limits(prop_schema, prop_name)
```
🧹 Nitpick | 🔵 Trivial
Bounded-schema helpers look correct; the silent 256-char fallback is the main footgun.
The recursion handles strings, string-arrays, and nested objects correctly, and the deepcopy in `_bounded_input_schema` avoids mutating caller-owned schemas. Two low-priority observations:
- Any future string input added without a matching `_INPUT_STRING_MAX_LENGTHS` entry will silently be capped at 256 chars. All current fields are mapped, but a long-form field added later (e.g., another `content`/`context`-style field) could be truncated in production before anyone notices. Consider either logging at import time for unmapped string field names seen in `list_tools()` schemas, or requiring an explicit opt-in for the default.
- `_apply_input_limits` only sets `maxItems` when `items.type == "string"`. Arrays of non-string items (not currently present in any input schema) would therefore bypass the cardinality cap. Worth a comment or a defensive `setdefault("maxItems", _DEFAULT_STRING_ARRAY_MAX_ITEMS)` at the array level if you want uniform DoS protection regardless of item type.
🛡️ Optional: always cap array cardinality
```diff
 if node_type == "array":
     items = node.get("items")
+    node.setdefault("maxItems", _DEFAULT_STRING_ARRAY_MAX_ITEMS)
     if isinstance(items, dict) and items.get("type") == "string":
         limits = _INPUT_STRING_ARRAY_LIMITS.get(
             field_name,
             {"max_items": _DEFAULT_STRING_ARRAY_MAX_ITEMS, "item_max_length": _DEFAULT_STRING_MAX_LENGTH},
         )
-        node.setdefault("maxItems", limits["max_items"])
+        node["maxItems"] = limits["max_items"] if "maxItems" not in node else node["maxItems"]
         items.setdefault("maxLength", limits["item_max_length"])
         _apply_input_limits(items, field_name)
     return
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/mcp/__init__.py` around lines 131 - 212, The code silently
falls back to _DEFAULT_STRING_MAX_LENGTH for unknown string field names and only
sets array maxItems for string-item arrays; update _apply_input_limits to (1)
detect when a string node uses the default (i.e., when field_name not in
_INPUT_STRING_MAX_LENGTHS) and emit a warning or raise/configure an explicit
opt-in (use the module logger and reference _INPUT_STRING_MAX_LENGTHS,
_DEFAULT_STRING_MAX_LENGTH, and the function
_apply_input_limits/_bounded_input_schema so reviewers can find it), and (2)
always set a cardinality cap on arrays by calling node.setdefault("maxItems",
<cap>) regardless of items.type (use _INPUT_STRING_ARRAY_LIMITS and
_DEFAULT_STRING_ARRAY_MAX_ITEMS as the source of truth) so arrays of non-string
items are also protected; keep the existing per-item maxLength logic for string
items.
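To make the helper's behavior concrete, a small usage sketch. It assumes the private helper is importable from `brainlayer.mcp`, the sample schema is illustrative, and the expected caps come from the limits tables shown above:

```python
from brainlayer.mcp import _bounded_input_schema  # private helper, imported for illustration only

sample = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},                              # known field -> 4_096 cap
        "tags": {"type": "array", "items": {"type": "string"}},   # known array -> 100 items / 128 chars
        "mystery": {"type": "string"},                            # unmapped -> silent 256 default
    },
}
bounded = _bounded_input_schema(sample)
assert bounded["properties"]["query"]["maxLength"] == 4_096
assert bounded["properties"]["tags"]["maxItems"] == 100
assert bounded["properties"]["tags"]["items"]["maxLength"] == 128
assert bounded["properties"]["mystery"]["maxLength"] == 256  # the footgun flagged above
# The deepcopy means the caller's schema is left untouched:
assert "maxLength" not in sample["properties"]["query"]
```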
CI lint job required ruff format on the two files touched by PR #254. No logic change — only whitespace/quote normalization.
Force-pushed 82a1cb8 to d2bbb17.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d2bbb17.
```swift
XCTAssertNotNil(arraySchema["maxItems"] as? Int, "\(name).\(fieldPath) must declare maxItems")
XCTAssertNotNil(itemSchema["maxLength"] as? Int, "\(name).\(fieldPath)[] must declare maxLength")
}
func testEachToolSchemaBoundsStringInputs() throws {
```
Garbled edit destroys two Swift test functions
High Severity
The diff botched the edit of `testEachToolHasExpectedAnnotations` and the insertion of the new `testEachToolSchemaBoundsStringInputs`. Lines from both functions are interleaved: dictionary entries like `"brain_entity"` float as bare statements, `func` declarations appear nested inside other function bodies, and brace matching is broken. Beyond the syntax issues, the expected dictionary in the annotations test was reduced from 11 tools to only `"brain_search"`, so `XCTAssertEqual(toolsByName.count, expected.count)` would compare 11 to 1. The `openWorldHint` assertion was also removed. The entire Swift test suite cannot build, meaning the new schema-bounds validation has zero test coverage on the BrainBar side.
Additional Locations (1)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2bbb177ad
| "brain_entity": (true, false, true, false), | ||
| let router = MCPRouter() | ||
| "brain_digest": (false, false, false, false), |
Remove stray statements that break MCPRouterTests compilation
This commit introduces loose dictionary entries/statements between test methods, so MCPRouterTests.swift is no longer valid Swift syntax and the BrainBar test target cannot compile (e.g., swiftc -typecheck fails at this block). That blocks all router regression tests from running, which is especially risky here because the same commit changes request-schema validation behavior.


Summary
Adds `maxLength` to all string MCP inputs in the Python server and BrainBar router, plus schema-level regression tests and BrainBar-side request validation for oversize string payloads.

Python MCP Field Limits
`action`, `agent_id`, `chunk_id`, `content`, `content_type`, `content_type_filter`, `context`, `correction_category`, `date_from`, `date_to`, `detail`, `entity_id`, `entity_type`, `file_path`, `function_name`, `intent`, `mode`, `name`, `new_chunk_id`, `old_chunk_id`, `outcome`, `pattern`, `phase`, `plan_name`, `project`, `query`, `reason`, `reversibility`, `safety_check`, `sentiment`, `sentiment_filter`, `session_id`, `severity`, `source`, `source_filter`, `status`, `supersedes`, `tag`, `title`, `type`

Python MCP String Array Limits
`chunk_ids[]`, `files_changed[]`, `merge_chunk_ids[]`, `participants[]`, `tags[]`

BrainBar Field Limits
`action`, `agent_id`, `chunk_id`, `content`, `detail`, `mode`, `project`, `query`, `session_id`, `tag`

BrainBar String Array Limits
`tags[]`

Verification
Focused checks
Repo-wide gate
Those 7 failures are pre-existing and reproduce on the clean main checkout as well:
Review
`cr review --plain` surfaced one test-hardening issue in `tests/test_mcp_input_schema_limits.py`; that guard was fixed. A second rerun hit the org review rate limit after the finding was already addressed.

Note
Medium Risk
Introduces new input validation that can reject previously-accepted oversized requests, potentially impacting clients if they exceed the new caps. Changes touch request handling paths in both Swift and Python MCP servers but are localized to schema/validation logic with targeted tests.
Overview
Adds uniform input size limits to MCP tool schemas and enforces them at runtime. The Swift `MCPRouter` now wraps every tool `inputSchema` with default/field-specific `maxLength` (and `maxItems` plus item `maxLength` for string arrays) and validates `tools/call` `arguments` against that schema before dispatch, returning a `Schema validation error` result on violations.

On the Python MCP server, tool `inputSchema`s are similarly wrapped via `_bounded_input_schema` (deep-copied) to apply the same style of caps across all tools. New Swift and Python tests assert that all advertised schemas include bounds for every string and string-array field, and add regression coverage that oversized `brain_digest.content` is rejected at the schema layer.

Reviewed by Cursor Bugbot for commit d2bbb17.
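For readers who want to see the enforcement end to end, here is a hedged sketch of the test pattern the review describes (dispatching through `server.request_handlers[types.CallToolRequest]`). The exact result shapes are assumptions about the `mcp>=1.0.0` Python SDK, not code from this PR:

```python
import mcp.types as types

async def assert_oversized_content_rejected(server):
    """server: the brainlayer MCP low-level Server instance (assumed)."""
    request = types.CallToolRequest(
        method="tools/call",
        params=types.CallToolRequestParams(
            name="brain_digest",
            arguments={"content": "x" * 200_001},  # one past the 200_000 cap
        ),
    )
    handler = server.request_handlers[types.CallToolRequest]
    result = (await handler(request)).root  # ServerResult is a pydantic RootModel
    assert result.isError  # schema layer rejects before the tool handler runs
```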
Summary by CodeRabbit
New Features
Tests
Note
Add `maxLength`/`maxItems` bounds and schema validation to all MCP tool string inputs
- Adds default and field-specific `maxLength` caps for string inputs and `maxItems`/per-item `maxLength` caps for string-array inputs to all MCP tool schemas in both the Python layer (`__init__.py`) and Swift layer (`MCPRouter.swift`).
- Adds validation in the `tools/call` dispatch path that enforces these schema constraints (type, enum membership, `maxLength`, `maxItems`) before the tool handler executes, returning a JSON-RPC error on failure.
- Adds a `ToolError.schemaValidation` case in Swift to surface validation failures with a descriptive message.
- Adds test coverage that `brain_digest` rejects content exceeding 200,000 characters.
- `tools/call` now returns a schema validation error for inputs that exceed declared limits, where previously oversized inputs would reach the handler unchecked.

Macroscope summarized d2bbb17.