Law firm production-readiness: ethical walls, encryption, IManage, air-gapped deployment #239
SunFlash12 wants to merge 4 commits into master
…age, air-gapped deployment

- Phase 1: Ethical wall service with Neo4j matter-based data isolation, MatterScopeDep for auto-filtering all capsule queries, pod ownership enforcement, LawfirmGuardMiddleware blocking dangerous routes.
- Phase 2: Wired AES-256-GCM encryption service into capsule storage, envelope encryption at rest with ENC: prefix detection.
- Phase 3: Legal data classifications (attorney-client privilege, work product, litigation hold), consent tracker for third-party processing.
- Phase 4: IManage DMS integration: REST client, document parser (PDF/DOCX/XLSX), sync service, search adapter, matter mapping.
- Phase 5: LAWFIRM deployment profile, startup prerequisite validation, docker-compose.lawfirm.yml for air-gapped deployment with Ollama, disabled external network calls (HIBP, Sentry, starter packs).
- Infrastructure: Fixed semgrep Windows compatibility (local system hook), ruff format compliance, new test suites for all phases.
Pull request overview
This PR expands Forge’s enterprise/law-firm readiness posture by adding air-gap enforcement, stronger startup validation and health reporting, typed API responses, plus new integrations and resilience features (IManage sync, ZK/snarkjs scaffolding, blockchain retry handling). It also adds extensive test coverage and CI benchmark smoke tests.
Changes:
- Add lawfirm air-gap scheduler behavior, startup multi-instance validation, and richer readiness/health signals.
- Introduce/extend integrations and protocols (IManage client/sync/search adapter tests; ZK/snarkjs wrapper + service toggles; blockchain tx retry and configurable graduation thresholds).
- Improve API/observability ergonomics (typed responses, new metrics, event schema versioning) and add broad regression test coverage + non-blocking benchmark CI step.
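The air-gap scheduler behavior from the first bullet could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `EXTERNAL_TASKS` is named in the file summary for `forge/services/scheduler.py`, but the task names, the `should_run` and `runnable_tasks` helpers, and the `"lawfirm"` profile string are hypothetical.

```python
# Hypothetical sketch: skip tasks that need outbound network access
# when the deployment profile is "lawfirm". Task names are assumptions.
EXTERNAL_TASKS = frozenset({"hibp_check", "sentry_flush", "starter_pack_sync"})


def should_run(task_name: str, profile: str) -> bool:
    """Return False for external tasks under the air-gapped lawfirm profile."""
    if profile.lower() == "lawfirm" and task_name in EXTERNAL_TASKS:
        return False
    return True


def runnable_tasks(task_names: list[str], profile: str) -> list[str]:
    """Filter a task list down to those allowed under the given profile."""
    return [name for name in task_names if should_run(name, profile)]
```

Keeping the external task set in one constant means a test can assert against it directly, which is presumably what `test_scheduler_lawfirm.py` does.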
Reviewed changes
Copilot reviewed 129 out of 131 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| forge-cascade-v2/tools/notify.py | Minor formatting change in notification stub. |
| forge-cascade-v2/tests/test_services/test_zk_snarkjs.py | Tests for ZK service simulation + snarkjs unavailable behavior. |
| forge-cascade-v2/tests/test_services/test_zk_coverage.py | Additional ZK service coverage (verify/get/vk/persistence). |
| forge-cascade-v2/tests/test_services/test_search_routing.py | Tests verifying search routing modes reference expected services. |
| forge-cascade-v2/tests/test_services/test_scheduler_lawfirm.py | Tests ensuring external scheduler tasks are skipped under lawfirm profile. |
| forge-cascade-v2/tests/test_services/test_imanage_sync.py | Tests for IManage sync service behaviors and filters. |
| forge-cascade-v2/tests/test_services/test_imanage_search_adapter.py | Tests for IManage search adapter normalization and error handling. |
| forge-cascade-v2/tests/test_services/test_imanage_client.py | Expanded tests for IManage REST client behavior and parsing. |
| forge-cascade-v2/tests/test_services/test_ethical_walls_bypass.py | Tests covering ethical wall enforcement and model defaults. |
| forge-cascade-v2/tests/test_services/test_directed_learning_quarantine.py | Tests for directed learning quarantine store DB interactions. |
| forge-cascade-v2/tests/test_services/test_directed_learning_catalog.py | Tests for directed learning source catalog behaviors. |
| forge-cascade-v2/tests/test_services/test_directed_learning_budget.py | Tests for directed learning daily budget tracker. |
| forge-cascade-v2/tests/test_services/test_consent_tracker.py | Adds consent revocation/audit field tests and UTC import adjustment. |
| forge-cascade-v2/tests/test_security/test_cypher_injection.py | Regression tests asserting inputs are enums/allowlists (anti-injection). |
| forge-cascade-v2/tests/test_monitoring/test_metrics_completeness.py | Tests verifying required metric enums/classes exist. |
| forge-cascade-v2/tests/test_models/test_key_rotation_flag.py | Tests for key_rotation_enabled settings flag default/override. |
| forge-cascade-v2/tests/test_models/test_event_schema.py | Tests for Event schema_version + EventType snapshot stability. |
| forge-cascade-v2/tests/test_kernel/test_correlation_id.py | Tests around correlation-id propagation and structlog bindings. |
| forge-cascade-v2/tests/test_blockchain/test_tx_resilience.py | Tests for EVM tx retry resilience and TransactionStatus enum. |
| forge-cascade-v2/tests/test_blockchain/test_graduation_config.py | Tests for configurable graduation thresholds and export. |
| forge-cascade-v2/tests/test_blockchain/test_contract_readiness.py | Tests for contract address readiness on mainnet/testnet. |
| forge-cascade-v2/tests/test_api/test_soulbound_auth.py | Tests for auth requirements on soulbound endpoints. |
| forge-cascade-v2/tests/test_api/test_pagination_limits.py | Tests for new pagination upper bounds/clamping behavior. |
| forge-cascade-v2/tests/test_api/test_openapi_schema.py | Tests for OpenAPI schema components and typed responses presence. |
| forge-cascade-v2/tests/test_api/test_multi_instance_validation.py | Tests validating multi-instance config rules (redis token blacklist). |
| forge-cascade-v2/tests/test_api/test_health_checks.py | Tests describing expected health/ready response formats. |
| forge-cascade-v2/tests/test_api/test_capsule_rate_limit.py | Tests for RateLimiter behavior and redis fallback. |
| forge-cascade-v2/tests/fixtures/event_types_v1.json | Snapshot fixture for EventType stability tests. |
| forge-cascade-v2/tests/benchmarks/test_smoke_benchmarks.py | Benchmark smoke tests for critical Python code paths. |
| forge-cascade-v2/test_ui_integration.py | Formatting-only changes to UI integration test script. |
| forge-cascade-v2/test_ghost_council_live.py | Formatting-only changes to ghost council live test script. |
| forge-cascade-v2/stubs/web3.pyi | Formatting improvements to web3 type stubs. |
| forge-cascade-v2/stubs/solana.pyi | Formatting cleanup to solana type stubs. |
| forge-cascade-v2/stubs/openai.pyi | Formatting cleanup to OpenAI SDK stubs. |
| forge-cascade-v2/stubs/hvac.pyi | Formatting cleanup to hvac stubs + small spacing additions. |
| forge-cascade-v2/stubs/eth_account.pyi | Formatting cleanup to eth_account stub signatures. |
| forge-cascade-v2/start_all_servers.py | Formatting-only changes (readability). |
| forge-cascade-v2/scripts/verify_system.py | Formatting-only changes in system verification script. |
| forge-cascade-v2/scripts/tools/tool_user_admin.py | Formatting-only change to string formatting. |
| forge-cascade-v2/scripts/tools/tool_stats_page.py | Formatting-only changes and small HTML quoting normalization. |
| forge-cascade-v2/scripts/tools/tool_server_manager.py | Formatting-only changes to Finding construction and strings. |
| forge-cascade-v2/scripts/tools/tool_security_scanner.py | Formatting-only changes to comprehensions and strings. |
| forge-cascade-v2/scripts/tools/tool_pr_reviewer.py | Formatting-only changes to regex tuples and gh CLI args formatting. |
| forge-cascade-v2/scripts/tools/tool_log_viewer.py | Formatting-only changes to ssh commands and Finding construction. |
| forge-cascade-v2/scripts/tools/tool_knowledge_health.py | Formatting-only changes to execute_query calls and string formatting. |
| forge-cascade-v2/scripts/tools/tool_incident_response.py | Formatting-only changes to Finding construction and strings. |
| forge-cascade-v2/scripts/tools/tool_flag_controller.py | Formatting-only changes to Finding construction and severity ordering. |
| forge-cascade-v2/scripts/tools/tool_feature_matrix.py | Formatting-only HTML string adjustments. |
| forge-cascade-v2/scripts/tools/tool_edge_yield_optimizer.py | Formatting-only changes and minor string simplification. |
| forge-cascade-v2/scripts/tools/tool_dead_code_detector.py | Formatting-only condition simplification. |
| forge-cascade-v2/scripts/tools/tool_db_manager.py | Formatting-only changes to query strings and WRITE_KEYWORDS layout. |
| forge-cascade-v2/scripts/tools/tool_data_freshness.py | Formatting-only changes to query strings and line appends. |
| forge-cascade-v2/scripts/tools/tool_coverage_tracker.py | Formatting-only changes to module metrics dict construction. |
| forge-cascade-v2/scripts/tools/tool_ci_monitor.py | Formatting-only changes to gh command args and string building. |
| forge-cascade-v2/scripts/tools/tool_api_docs_generator.py | Minor formatting changes in generated HTML and list comprehension. |
| forge-cascade-v2/scripts/tools/run_runall_test.py | Formatting-only changes; icon dicts expanded for readability. |
| forge-cascade-v2/scripts/tools/run_all.py | Minor argparse formatting adjustment. |
| forge-cascade-v2/scripts/tools/base.py | Formatting-only changes; ssh_exec arg list expanded. |
| forge-cascade-v2/scripts/test_cross_source.py | Formatting-only query string + output formatting tweak. |
| forge-cascade-v2/scripts/simple_import.py | Formatting-only changes to execute_single queries. |
| forge-cascade-v2/scripts/setup_db.py | Formatting-only changes + Neo4j version parsing string quotes. |
| forge-cascade-v2/scripts/seed_marketplace.py | Formatting-only changes to queries/prompts. |
| forge-cascade-v2/scripts/seed_data.py | Formatting-only changes (quotes, dict commas, readability). |
| forge-cascade-v2/scripts/moltbook_cleanup.py | Formatting-only changes to env default and argparse line. |
| forge-cascade-v2/scripts/load_wikidata.py | Formatting-only changes to printing and dict comprehension. |
| forge-cascade-v2/scripts/load_uberon.py | Formatting-only argparse formatting. |
| forge-cascade-v2/scripts/load_string_db.py | Formatting-only Neo4jClient instantiation layout. |
| forge-cascade-v2/scripts/load_stitch.py | Formatting-only Neo4jClient instantiation + argparse spacing. |
| forge-cascade-v2/scripts/load_semantic_scholar.py | Formatting-only argparse and asyncio.run layout. |
| forge-cascade-v2/scripts/load_rxnorm.py | Formatting-only Neo4jClient instantiation + argparse spacing. |
| forge-cascade-v2/scripts/load_reactome.py | Formatting-only argparse formatting. |
| forge-cascade-v2/scripts/load_primekg.py | Formatting-only printing + long query split. |
| forge-cascade-v2/scripts/load_orcid.py | Formatting-only Neo4jClient instantiation + argparse spacing. |
| forge-cascade-v2/scripts/load_opentargets.py | Formatting-only Neo4jClient instantiation layout. |
| forge-cascade-v2/scripts/load_openalex.py | Formatting-only conditional line wrapping. |
| forge-cascade-v2/scripts/load_openaire.py | Formatting-only asyncio.run invocation layout. |
| forge-cascade-v2/scripts/load_monarch.py | Formatting-only Neo4jClient instantiation + argparse spacing. |
| forge-cascade-v2/scripts/load_mesh.py | Formatting-only Neo4jClient instantiation layout. |
| forge-cascade-v2/scripts/load_intact.py | Formatting-only Neo4jClient instantiation + argparse spacing. |
| forge-cascade-v2/scripts/load_hpo.py | Formatting-only slice spacing. |
| forge-cascade-v2/scripts/load_hetionet.py | Formatting-only Neo4jClient instantiation layout. |
| forge-cascade-v2/scripts/load_geonames.py | Formatting-only argparse formatting. |
| forge-cascade-v2/scripts/load_ensembl.py | Formatting-only Neo4jClient instantiation layout. |
| forge-cascade-v2/scripts/load_ctd.py | Formatting-only Neo4jClient instantiation + argparse spacing. |
| forge-cascade-v2/scripts/load_crossref.py | Formatting-only asyncio.run line wrapping. |
| forge-cascade-v2/scripts/health_check.py | Formatting-only string quotes + minor comprehension wrap. |
| forge-cascade-v2/scripts/data_quality_check.py | Formatting-only list comprehensions and ternary formatting. |
| forge-cascade-v2/scripts/benchmark_dag_cleanup.py | Minor formatting/spacing around async benchmark entrypoint. |
| forge-cascade-v2/frontend/src/types/index.ts | Adds TRUST_LEVEL_VALUES numeric mapping for frontend parity. |
| forge-cascade-v2/forge/virtuals/tokenization/service.py | Makes graduation thresholds configurable via Settings with fallback. |
| forge-cascade-v2/forge/virtuals/tokenization/contracts.py | Adds TODO notes for mainnet lifecycle contracts blocked by audit. |
| forge-cascade-v2/forge/virtuals/models/base.py | Adds TransactionStatus enum + retry metadata fields on TransactionRecord. |
| forge-cascade-v2/forge/virtuals/chains/evm_client.py | Adds send_transaction_with_retry with exponential backoff. |
| forge-cascade-v2/forge/services/stitch/import_service.py | Expands documentation on STITCH ID parsing rationale. |
| forge-cascade-v2/forge/services/sider/import_service.py | Expands documentation on STITCH ID parsing rationale. |
| forge-cascade-v2/forge/services/scheduler.py | Adds EXTERNAL_TASKS and lawfirm air-gap enforcement for external tasks. |
| forge-cascade-v2/forge/services/hybrid_search.py | Adds cross-reference comments for hybrid search usage. |
| forge-cascade-v2/forge/services/hybrid_retriever.py | Adds cross-reference comments for route integration. |
| forge-cascade-v2/forge/resilience/observability/metrics.py | Adds extended MetricType entries (gov/fed/trust/zk). |
| forge-cascade-v2/forge/monitoring/metrics.py | Defines new counters/histograms and exports for reset. |
| forge-cascade-v2/forge/models/events.py | Adds schema_version to Event model and schema changelog block. |
| forge-cascade-v2/forge/kernel/pipeline.py | Binds correlation/pipeline/phase into structlog contextvars per phase. |
| forge-cascade-v2/forge/desci/zk/snarkjs_wrapper.py | New async subprocess wrapper around snarkjs CLI for proof ops. |
| forge-cascade-v2/forge/desci/zk/service.py | Adds snarkjs-enabled path with simulated flag and wrapper usage. |
| forge-cascade-v2/forge/desci/zk/models.py | Adds simulated field to ZKProof model. |
| forge-cascade-v2/forge/desci/zk/circuits/.gitkeep | Placeholder for circuit artifacts directory. |
| forge-cascade-v2/forge/config.py | Adds snarkjs_enabled and graduation threshold settings. |
| forge-cascade-v2/forge/api/routes/soulbound.py | Tightens auth: mint requires ActiveUserDep; slash/trust updates require TrustedUserDep. |
| forge-cascade-v2/forge/api/routes/search.py | Adds typed SearchResponse + routing table comments; advanced search path. |
| forge-cascade-v2/forge/api/routes/graph/exploration.py | Adds SAFETY comment about parameterized Cypher filtering. |
| forge-cascade-v2/forge/api/routes/federation.py | Adds typed response models for trust adjustments/history and uses them in handlers. |
| forge-cascade-v2/forge/api/routes/dataset_framework.py | Adds SAFETY comment for label interpolation origin. |
| forge-cascade-v2/forge/api/routes/capsules.py | Removes untyped response_model=dict[...] from endpoints (schema typing improvements). |
| forge-cascade-v2/forge/api/dependencies.py | Adds pagination max page enforcement + AdminPaginationParams + admin dep. |
| forge-cascade-v2/forge/api/app.py | Adds multi-instance config validation and richer /ready dependency_status response. |
| forge-cascade-v2/docs/DEPLOYMENT_CHECKLIST.md | New deployment checklist for single/multi-instance and lawfirm profile. |
| forge-cascade-v2/docs/API_CHANGELOG.md | New API changelog documenting stability policy and recent changes. |
| .semgrep/forge-custom.yml | Refines secret-detection rule to exclude enum-constant-like suffixes. |
| .pre-commit-config.yaml | Switches semgrep hook to local system hook for Windows compatibility. |
| .github/workflows/ci.yml | Adds non-blocking benchmark smoke test step (pytest-benchmark). |
    class TransactionStatus(str, Enum):
        """Status of a blockchain transaction with retry tracking."""

        PENDING = "pending"
        SUBMITTED = "submitted"
        CONFIRMED = "confirmed"
        FAILED = "failed"
        RETRYING = "retrying"
        DEAD_LETTER = "dead_letter"  # All retries exhausted

TransactionStatus enum is added here, but TransactionRecord.status remains a free-form str (and callers already emit values like "dead_letter"). To avoid inconsistent status strings across the codebase, consider switching TransactionRecord.status to TransactionStatus (or adding validation) so the enum becomes the source of truth.
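One way to make the enum the source of truth, sketched with a plain dataclass rather than the project's real model. `TransactionRecordSketch` is a hypothetical stand-in with only two fields; the actual `TransactionRecord` likely has a different base class and many more fields.

```python
from dataclasses import dataclass
from enum import Enum


class TransactionStatus(str, Enum):
    PENDING = "pending"
    SUBMITTED = "submitted"
    CONFIRMED = "confirmed"
    FAILED = "failed"
    RETRYING = "retrying"
    DEAD_LETTER = "dead_letter"


@dataclass
class TransactionRecordSketch:
    """Hypothetical stand-in for TransactionRecord; only two fields shown."""

    tx_hash: str
    status: TransactionStatus = TransactionStatus.PENDING

    def __post_init__(self) -> None:
        # Coerce plain strings to the enum; unknown values raise ValueError,
        # so callers can no longer emit ad-hoc status strings.
        self.status = TransactionStatus(self.status)
```

Because the enum subclasses `str`, existing callers that pass `"dead_letter"` keep working, while a typo such as `"dead-letter"` fails at construction time.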
    def test_correlation_id_auto_generated(self):
        """When no correlation_id provided, one is generated."""
        ctx = PipelineContext(
            pipeline_id="test-pipe-2",
            correlation_id="auto-generated-id",
            triggered_by="manual",
        )

test_correlation_id_auto_generated() claims to test auto-generation, but it passes an explicit correlation_id value. Since PipelineContext requires correlation_id, this test currently only asserts non-empty input; consider either removing/renaming it or testing auto-generation at the Pipeline.execute() layer where correlation_id is actually created.
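A sketch of what testing auto-generation at the execution layer might look like. Everything here is hypothetical: `PipelineContextSketch` and `execute_sketch` stand in for `PipelineContext` and `Pipeline.execute()`, and the UUID-hex format of the generated ID is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineContextSketch:
    """Hypothetical stand-in for PipelineContext."""

    pipeline_id: str
    correlation_id: str
    triggered_by: str


def execute_sketch(
    pipeline_id: str,
    triggered_by: str,
    correlation_id: Optional[str] = None,
) -> PipelineContextSketch:
    # Generate an ID only when the caller did not supply one.
    cid = correlation_id or uuid.uuid4().hex
    return PipelineContextSketch(pipeline_id, cid, triggered_by)


def test_correlation_id_auto_generated() -> None:
    ctx = execute_sketch("test-pipe-2", "manual")
    assert ctx.correlation_id
    uuid.UUID(hex=ctx.correlation_id)  # raises ValueError if malformed


def test_correlation_id_preserved() -> None:
    ctx = execute_sketch("test-pipe-2", "manual", correlation_id="req-123")
    assert ctx.correlation_id == "req-123"
```

The first test omits `correlation_id` entirely, so it exercises the generation path instead of echoing its own input.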
    class SearchResponse(BaseModel):
        """Typed response for search endpoints."""

        query: str
        mode: str
        total: int
        took_ms: float
        results: list[dict[str, Any]]
        filters_applied: dict[str, Any] = Field(default_factory=dict)
        metadata: dict[str, Any] | None = None

SearchResponse.results is typed as list[dict[str, Any]] even though SearchResultItem is defined just above. Returning raw dicts reduces OpenAPI specificity and makes the new SearchResultItem model unused; consider changing results to list[SearchResultItem] (and returning model instances) so the schema is truly typed.
    sig = inspect.signature(fn)
    param_annotations = [str(p.annotation) for p in sig.parameters.values()]
    # None of these should have ActiveUserDep or TrustedUserDep
    for ann in param_annotations:
        assert "UserDep" not in ann or "ActiveUserDep" not in ann, (
            f"{fn.__name__} should not require user auth"
        )

This assertion uses or, which makes it pass in cases where an auth dependency is present (e.g., TrustedUserDep contains 'UserDep' but not 'ActiveUserDep'). To correctly enforce that GET endpoints have no auth deps, the condition should require that neither ActiveUserDep nor TrustedUserDep (nor any user dep marker) appears in annotations.
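The flaw and one possible fix can be shown in isolation. The endpoint functions below are stand-ins with string annotations, and `auth_markers_in` is a hypothetical helper, not something from the PR.

```python
import inspect
from typing import Callable

# Markers named in the review comment; extend if other user deps exist.
AUTH_MARKERS = ("ActiveUserDep", "TrustedUserDep")


def auth_markers_in(fn: Callable[..., object]) -> list[str]:
    """Return the auth dependency markers found in fn's parameter annotations."""
    sig = inspect.signature(fn)
    annotations = [str(p.annotation) for p in sig.parameters.values()]
    return [m for m in AUTH_MARKERS if any(m in ann for ann in annotations)]


# Stand-in endpoints; string annotations avoid needing the real dep types.
def public_get(token_id: "str"): ...


def guarded_get(token_id: "str", user: "TrustedUserDep"): ...


# The original condition is True for a TrustedUserDep annotation,
# so the original assert would pass even though auth is required.
ann = "TrustedUserDep"
original_condition = "UserDep" not in ann or "ActiveUserDep" not in ann

# A stricter check: no marker may appear at all.
assert auth_markers_in(public_get) == [], "public_get should not require user auth"
```

With this helper the test body reduces to `assert auth_markers_in(fn) == []`, which fails for `guarded_get` as intended.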
    return TransactionRecord(
        tx_hash="",
        chain=self.chain.value,
        block_number=0,
        timestamp=datetime.now(UTC),
        from_address=self._operator_account.address
        if self._operator_account
        else "",
        to_address=to_address,
        value=value,
        gas_used=0,
        status="dead_letter",
        transaction_type="transfer" if not data else "contract_call",
    )

send_transaction_with_retry() returns a TransactionRecord with tx_hash="" and status="dead_letter" when retries are exhausted. An empty tx_hash makes the record hard to correlate in logs/storage, and the status value is outside TransactionRecord’s documented statuses; also the new retry_count/max_retries/last_error fields aren’t populated here. Consider returning a record with retry metadata filled and a consistent status representation (e.g., TransactionStatus.DEAD_LETTER) plus a non-empty identifier.
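A synchronous sketch of retry with exponential backoff that fills in the retry metadata on exhaustion. It is not the PR's implementation, which is presumably async and web3-based: `RetryOutcome`, `send_with_retry`, the `record_id` field, and the injectable `sleep` parameter are all hypothetical.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetryOutcome:
    """Hypothetical result record carrying retry metadata."""

    record_id: str  # non-empty even when no tx_hash exists
    tx_hash: str
    status: str
    retry_count: int
    max_retries: int
    last_error: Optional[str] = None


def send_with_retry(
    send: Callable[[], str],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> RetryOutcome:
    record_id = uuid.uuid4().hex
    last_error: Optional[str] = None
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        try:
            tx_hash = send()
            return RetryOutcome(record_id, tx_hash, "submitted", attempt, max_retries)
        except Exception as exc:  # sketch only; real code should narrow this
            last_error = str(exc)
            if attempt < max_retries:
                sleep(base_delay * 2**attempt)  # delay doubles on each retry
    return RetryOutcome(
        record_id, "", "dead_letter", max_retries, max_retries, last_error
    )
```

The generated `record_id` gives logs and storage something to correlate on when `tx_hash` is empty, and `last_error` preserves why the final attempt failed.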
The test expected page=999999 to be silently clamped, but the PaginationParams validator now correctly rejects it with HTTP 400.
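The reject-instead-of-clamp behavior can be sketched as follows. `PaginationSketch` and the `MAX_PAGE` value are assumptions; the real `PaginationParams` lives in `forge/api/dependencies.py`, and translating the error into an HTTP 400 would happen in the API layer.

```python
from dataclasses import dataclass

MAX_PAGE = 10_000  # assumed upper bound; the real limit is not shown here


@dataclass(frozen=True)
class PaginationSketch:
    """Hypothetical stand-in for PaginationParams."""

    page: int = 1
    per_page: int = 20

    def __post_init__(self) -> None:
        # Reject out-of-range pages rather than silently clamping them.
        if not 1 <= self.page <= MAX_PAGE:
            raise ValueError(f"page must be between 1 and {MAX_PAGE}")
```

A test for this behavior then asserts that constructing the params with `page=999999` raises, rather than checking for a clamped value.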
…name Update COMPLETE_CODEBASE_REPORT.md with 18 modified entries and 30 new entries from the enterprise readiness implementation. Fix lawfirm docker-compose service name from forge-api to cascade-api for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Ollama healthcheck: use `ollama list` instead of curl/wget (not available in container)
- Add missing env vars to compliance-api and virtuals-api in lawfirm compose override
- Fix LLM provider init: recognize Ollama as local provider that doesn't need an API key
- Fix async/sync mismatch: properly handle awaitable returns from initialize_encryption_service, init_file_watcher, close_file_watcher
- Pass ENCRYPTION_MASTER_KEY from .env to cascade-api container
- Comment out GPU reservation and TLS configs for local development
- Add Ollama to forge-network
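The async/sync mismatch fix above could be handled with a small helper like this one. It is a sketch under assumptions: `call_maybe_async` is a hypothetical name, and the commit may instead await each function directly.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_maybe_async(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn and await the result only if it is awaitable."""
    result = fn(*args, **kwargs)
    if inspect.isawaitable(result):
        return await result
    return result


# Stand-ins for startup hooks that may be sync or async.
def sync_init() -> str:
    return "sync"


async def async_init() -> str:
    return "async"
```

Inside an async startup handler, `await call_maybe_async(sync_init)` and `await call_maybe_async(async_init)` both return the underlying value, so the caller does not need to know which kind of function it was given.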
Summary
- … `BELONGS_TO_MATTER`, `SEPARATES`) with `EthicalWallService`, `MatterScopeDep` auto-filtering all capsule queries, pod ownership enforcement, and `LawfirmGuardMiddleware` blocking dangerous routes
- … `ENC:` prefix detection and envelope encryption
- … `docker-compose.lawfirm.yml` for air-gapped deployment with Ollama LLM, disabled external network calls (HIBP, Sentry, starter packs)

Key files
- `forge/services/ethical_walls.py`, `forge/api/deps_matter.py`, `forge/api/routes/ethical_walls.py`
- `forge/api/dependencies.py`, `forge/repositories/capsule_repository.py`
- `forge/services/imanage/` (5 files), `forge/services/document_parser.py`, `forge/api/routes/imanage.py`
- `forge/compliance/core/enums.py`, `forge/services/consent_tracker.py`
- `forge/resilience/profiles/deployment.py`, `forge/api/app.py`, `forge/api/middleware.py`
- `MatterSelector.tsx`, `MattersPage.tsx`, `EthicalWallsPage.tsx`, `lawFirmStore.ts`

Test plan
- … `ENC:...` in Neo4j, decrypted on API read
- `LawfirmGuardMiddleware` blocks federation/marketplace/bulk-export routes
- `pytest`: all existing tests pass
- `docker-compose -f docker-compose.yml -f docker-compose.lawfirm.yml up` … `tcpdump`