Law firm production-readiness: ethical walls, encryption, IManage, air-gapped deployment #239

Open
SunFlash12 wants to merge 4 commits into master from feat/lawfirm-production-readiness

@SunFlash12
Owner

Summary

  • Ethical Walls: Neo4j relationship-based data isolation (BELONGS_TO_MATTER, SEPARATES) with EthicalWallService, MatterScopeDep auto-filtering all capsule queries, pod ownership enforcement, and LawfirmGuardMiddleware blocking dangerous routes
  • Encryption at Rest: Wired existing AES-256-GCM encryption service into capsule storage with ENC: prefix detection and envelope encryption
  • IManage DMS Integration: REST client, document parser (PDF/DOCX/XLSX), sync service, search adapter, and matter mapping for on-premise IManage Work
  • Legal Compliance: Attorney-client privilege/work product/litigation hold data classifications, consent tracker for third-party processing
  • LAWFIRM Deployment Profile: Startup prerequisite validation, docker-compose.lawfirm.yml for air-gapped deployment with Ollama LLM, disabled external network calls (HIBP, Sentry, starter packs)
  • Infrastructure: Fixed semgrep Windows compatibility (local system hook), refined semgrep rules to exclude enum constants from secret detection, 29 new test files

Key files

| Area | Files |
| --- | --- |
| Ethical walls | forge/services/ethical_walls.py, forge/api/deps_matter.py, forge/api/routes/ethical_walls.py |
| Encryption | forge/api/dependencies.py, forge/repositories/capsule_repository.py |
| IManage | forge/services/imanage/ (5 files), forge/services/document_parser.py, forge/api/routes/imanage.py |
| Legal compliance | forge/compliance/core/enums.py, forge/services/consent_tracker.py |
| Deployment | forge/resilience/profiles/deployment.py, forge/api/app.py, forge/api/middleware.py |
| Frontend | MatterSelector.tsx, MattersPage.tsx, EthicalWallsPage.tsx, lawFirmStore.ts |

Test plan

  • Verify ethical wall isolation: user on matter A cannot see capsules on matter B when wall exists
  • Verify capsule encryption: content stored as ENC:... in Neo4j, decrypted on API read
  • Verify LAWFIRM startup validation rejects missing encryption key
  • Verify LawfirmGuardMiddleware blocks federation/marketplace/bulk-export routes
  • Verify IManage sync creates capsules with proper matter linkage
  • Verify pod ownership enforcement returns 403 for non-owners
  • Run pytest — all existing tests pass
  • Deploy with docker-compose -f docker-compose.yml -f docker-compose.lawfirm.yml up
  • Verify no outbound network connections via tcpdump

…age, air-gapped deployment

Phase 1: Ethical wall service with Neo4j matter-based data isolation,
MatterScopeDep for auto-filtering all capsule queries, pod ownership
enforcement, LawfirmGuardMiddleware blocking dangerous routes.

Phase 2: Wired AES-256-GCM encryption service into capsule storage,
envelope encryption at rest with ENC: prefix detection.

Phase 3: Legal data classifications (attorney-client privilege, work
product, litigation hold), consent tracker for third-party processing.

Phase 4: IManage DMS integration — REST client, document parser
(PDF/DOCX/XLSX), sync service, search adapter, matter mapping.

Phase 5: LAWFIRM deployment profile, startup prerequisite validation,
docker-compose.lawfirm.yml for air-gapped deployment with Ollama,
disabled external network calls (HIBP, Sentry, starter packs).

Infrastructure: Fixed semgrep Windows compatibility (local system hook),
ruff format compliance, new test suites for all phases.
Copilot AI review requested due to automatic review settings March 5, 2026 06:55
Copilot AI left a comment

Pull request overview

This PR expands Forge’s enterprise/law-firm readiness posture by adding air-gap enforcement, stronger startup validation and health reporting, typed API responses, plus new integrations and resilience features (IManage sync, ZK/snarkjs scaffolding, blockchain retry handling). It also adds extensive test coverage and CI benchmark smoke tests.

Changes:

  • Add lawfirm air-gap scheduler behavior, startup multi-instance validation, and richer readiness/health signals.
  • Introduce/extend integrations and protocols (IManage client/sync/search adapter tests; ZK/snarkjs wrapper + service toggles; blockchain tx retry and configurable graduation thresholds).
  • Improve API/observability ergonomics (typed responses, new metrics, event schema versioning) and add broad regression test coverage + non-blocking benchmark CI step.

Reviewed changes

Copilot reviewed 129 out of 131 changed files in this pull request and generated 5 comments.

File Description
forge-cascade-v2/tools/notify.py Minor formatting change in notification stub.
forge-cascade-v2/tests/test_services/test_zk_snarkjs.py Tests for ZK service simulation + snarkjs unavailable behavior.
forge-cascade-v2/tests/test_services/test_zk_coverage.py Additional ZK service coverage (verify/get/vk/persistence).
forge-cascade-v2/tests/test_services/test_search_routing.py Tests verifying search routing modes reference expected services.
forge-cascade-v2/tests/test_services/test_scheduler_lawfirm.py Tests ensuring external scheduler tasks are skipped under lawfirm profile.
forge-cascade-v2/tests/test_services/test_imanage_sync.py Tests for IManage sync service behaviors and filters.
forge-cascade-v2/tests/test_services/test_imanage_search_adapter.py Tests for IManage search adapter normalization and error handling.
forge-cascade-v2/tests/test_services/test_imanage_client.py Expanded tests for IManage REST client behavior and parsing.
forge-cascade-v2/tests/test_services/test_ethical_walls_bypass.py Tests covering ethical wall enforcement and model defaults.
forge-cascade-v2/tests/test_services/test_directed_learning_quarantine.py Tests for directed learning quarantine store DB interactions.
forge-cascade-v2/tests/test_services/test_directed_learning_catalog.py Tests for directed learning source catalog behaviors.
forge-cascade-v2/tests/test_services/test_directed_learning_budget.py Tests for directed learning daily budget tracker.
forge-cascade-v2/tests/test_services/test_consent_tracker.py Adds consent revocation/audit field tests and UTC import adjustment.
forge-cascade-v2/tests/test_security/test_cypher_injection.py Regression tests asserting inputs are enums/allowlists (anti-injection).
forge-cascade-v2/tests/test_monitoring/test_metrics_completeness.py Tests verifying required metric enums/classes exist.
forge-cascade-v2/tests/test_models/test_key_rotation_flag.py Tests for key_rotation_enabled settings flag default/override.
forge-cascade-v2/tests/test_models/test_event_schema.py Tests for Event schema_version + EventType snapshot stability.
forge-cascade-v2/tests/test_kernel/test_correlation_id.py Tests around correlation-id propagation and structlog bindings.
forge-cascade-v2/tests/test_blockchain/test_tx_resilience.py Tests for EVM tx retry resilience and TransactionStatus enum.
forge-cascade-v2/tests/test_blockchain/test_graduation_config.py Tests for configurable graduation thresholds and export.
forge-cascade-v2/tests/test_blockchain/test_contract_readiness.py Tests for contract address readiness on mainnet/testnet.
forge-cascade-v2/tests/test_api/test_soulbound_auth.py Tests for auth requirements on soulbound endpoints.
forge-cascade-v2/tests/test_api/test_pagination_limits.py Tests for new pagination upper bounds/clamping behavior.
forge-cascade-v2/tests/test_api/test_openapi_schema.py Tests for OpenAPI schema components and typed responses presence.
forge-cascade-v2/tests/test_api/test_multi_instance_validation.py Tests validating multi-instance config rules (redis token blacklist).
forge-cascade-v2/tests/test_api/test_health_checks.py Tests describing expected health/ready response formats.
forge-cascade-v2/tests/test_api/test_capsule_rate_limit.py Tests for RateLimiter behavior and redis fallback.
forge-cascade-v2/tests/fixtures/event_types_v1.json Snapshot fixture for EventType stability tests.
forge-cascade-v2/tests/benchmarks/test_smoke_benchmarks.py Benchmark smoke tests for critical Python code paths.
forge-cascade-v2/test_ui_integration.py Formatting-only changes to UI integration test script.
forge-cascade-v2/test_ghost_council_live.py Formatting-only changes to ghost council live test script.
forge-cascade-v2/stubs/web3.pyi Formatting improvements to web3 type stubs.
forge-cascade-v2/stubs/solana.pyi Formatting cleanup to solana type stubs.
forge-cascade-v2/stubs/openai.pyi Formatting cleanup to OpenAI SDK stubs.
forge-cascade-v2/stubs/hvac.pyi Formatting cleanup to hvac stubs + small spacing additions.
forge-cascade-v2/stubs/eth_account.pyi Formatting cleanup to eth_account stub signatures.
forge-cascade-v2/start_all_servers.py Formatting-only changes (readability).
forge-cascade-v2/scripts/verify_system.py Formatting-only changes in system verification script.
forge-cascade-v2/scripts/tools/tool_user_admin.py Formatting-only change to string formatting.
forge-cascade-v2/scripts/tools/tool_stats_page.py Formatting-only changes and small HTML quoting normalization.
forge-cascade-v2/scripts/tools/tool_server_manager.py Formatting-only changes to Finding construction and strings.
forge-cascade-v2/scripts/tools/tool_security_scanner.py Formatting-only changes to comprehensions and strings.
forge-cascade-v2/scripts/tools/tool_pr_reviewer.py Formatting-only changes to regex tuples and gh CLI args formatting.
forge-cascade-v2/scripts/tools/tool_log_viewer.py Formatting-only changes to ssh commands and Finding construction.
forge-cascade-v2/scripts/tools/tool_knowledge_health.py Formatting-only changes to execute_query calls and string formatting.
forge-cascade-v2/scripts/tools/tool_incident_response.py Formatting-only changes to Finding construction and strings.
forge-cascade-v2/scripts/tools/tool_flag_controller.py Formatting-only changes to Finding construction and severity ordering.
forge-cascade-v2/scripts/tools/tool_feature_matrix.py Formatting-only HTML string adjustments.
forge-cascade-v2/scripts/tools/tool_edge_yield_optimizer.py Formatting-only changes and minor string simplification.
forge-cascade-v2/scripts/tools/tool_dead_code_detector.py Formatting-only condition simplification.
forge-cascade-v2/scripts/tools/tool_db_manager.py Formatting-only changes to query strings and WRITE_KEYWORDS layout.
forge-cascade-v2/scripts/tools/tool_data_freshness.py Formatting-only changes to query strings and line appends.
forge-cascade-v2/scripts/tools/tool_coverage_tracker.py Formatting-only changes to module metrics dict construction.
forge-cascade-v2/scripts/tools/tool_ci_monitor.py Formatting-only changes to gh command args and string building.
forge-cascade-v2/scripts/tools/tool_api_docs_generator.py Minor formatting changes in generated HTML and list comprehension.
forge-cascade-v2/scripts/tools/run_runall_test.py Formatting-only changes; icon dicts expanded for readability.
forge-cascade-v2/scripts/tools/run_all.py Minor argparse formatting adjustment.
forge-cascade-v2/scripts/tools/base.py Formatting-only changes; ssh_exec arg list expanded.
forge-cascade-v2/scripts/test_cross_source.py Formatting-only query string + output formatting tweak.
forge-cascade-v2/scripts/simple_import.py Formatting-only changes to execute_single queries.
forge-cascade-v2/scripts/setup_db.py Formatting-only changes + Neo4j version parsing string quotes.
forge-cascade-v2/scripts/seed_marketplace.py Formatting-only changes to queries/prompts.
forge-cascade-v2/scripts/seed_data.py Formatting-only changes (quotes, dict commas, readability).
forge-cascade-v2/scripts/moltbook_cleanup.py Formatting-only changes to env default and argparse line.
forge-cascade-v2/scripts/load_wikidata.py Formatting-only changes to printing and dict comprehension.
forge-cascade-v2/scripts/load_uberon.py Formatting-only argparse formatting.
forge-cascade-v2/scripts/load_string_db.py Formatting-only Neo4jClient instantiation layout.
forge-cascade-v2/scripts/load_stitch.py Formatting-only Neo4jClient instantiation + argparse spacing.
forge-cascade-v2/scripts/load_semantic_scholar.py Formatting-only argparse and asyncio.run layout.
forge-cascade-v2/scripts/load_rxnorm.py Formatting-only Neo4jClient instantiation + argparse spacing.
forge-cascade-v2/scripts/load_reactome.py Formatting-only argparse formatting.
forge-cascade-v2/scripts/load_primekg.py Formatting-only printing + long query split.
forge-cascade-v2/scripts/load_orcid.py Formatting-only Neo4jClient instantiation + argparse spacing.
forge-cascade-v2/scripts/load_opentargets.py Formatting-only Neo4jClient instantiation layout.
forge-cascade-v2/scripts/load_openalex.py Formatting-only conditional line wrapping.
forge-cascade-v2/scripts/load_openaire.py Formatting-only asyncio.run invocation layout.
forge-cascade-v2/scripts/load_monarch.py Formatting-only Neo4jClient instantiation + argparse spacing.
forge-cascade-v2/scripts/load_mesh.py Formatting-only Neo4jClient instantiation layout.
forge-cascade-v2/scripts/load_intact.py Formatting-only Neo4jClient instantiation + argparse spacing.
forge-cascade-v2/scripts/load_hpo.py Formatting-only slice spacing.
forge-cascade-v2/scripts/load_hetionet.py Formatting-only Neo4jClient instantiation layout.
forge-cascade-v2/scripts/load_geonames.py Formatting-only argparse formatting.
forge-cascade-v2/scripts/load_ensembl.py Formatting-only Neo4jClient instantiation layout.
forge-cascade-v2/scripts/load_ctd.py Formatting-only Neo4jClient instantiation + argparse spacing.
forge-cascade-v2/scripts/load_crossref.py Formatting-only asyncio.run line wrapping.
forge-cascade-v2/scripts/health_check.py Formatting-only string quotes + minor comprehension wrap.
forge-cascade-v2/scripts/data_quality_check.py Formatting-only list comprehensions and ternary formatting.
forge-cascade-v2/scripts/benchmark_dag_cleanup.py Minor formatting/spacing around async benchmark entrypoint.
forge-cascade-v2/frontend/src/types/index.ts Adds TRUST_LEVEL_VALUES numeric mapping for frontend parity.
forge-cascade-v2/forge/virtuals/tokenization/service.py Makes graduation thresholds configurable via Settings with fallback.
forge-cascade-v2/forge/virtuals/tokenization/contracts.py Adds TODO notes for mainnet lifecycle contracts blocked by audit.
forge-cascade-v2/forge/virtuals/models/base.py Adds TransactionStatus enum + retry metadata fields on TransactionRecord.
forge-cascade-v2/forge/virtuals/chains/evm_client.py Adds send_transaction_with_retry with exponential backoff.
forge-cascade-v2/forge/services/stitch/import_service.py Expands documentation on STITCH ID parsing rationale.
forge-cascade-v2/forge/services/sider/import_service.py Expands documentation on STITCH ID parsing rationale.
forge-cascade-v2/forge/services/scheduler.py Adds EXTERNAL_TASKS and lawfirm air-gap enforcement for external tasks.
forge-cascade-v2/forge/services/hybrid_search.py Adds cross-reference comments for hybrid search usage.
forge-cascade-v2/forge/services/hybrid_retriever.py Adds cross-reference comments for route integration.
forge-cascade-v2/forge/resilience/observability/metrics.py Adds extended MetricType entries (gov/fed/trust/zk).
forge-cascade-v2/forge/monitoring/metrics.py Defines new counters/histograms and exports for reset.
forge-cascade-v2/forge/models/events.py Adds schema_version to Event model and schema changelog block.
forge-cascade-v2/forge/kernel/pipeline.py Binds correlation/pipeline/phase into structlog contextvars per phase.
forge-cascade-v2/forge/desci/zk/snarkjs_wrapper.py New async subprocess wrapper around snarkjs CLI for proof ops.
forge-cascade-v2/forge/desci/zk/service.py Adds snarkjs-enabled path with simulated flag and wrapper usage.
forge-cascade-v2/forge/desci/zk/models.py Adds simulated field to ZKProof model.
forge-cascade-v2/forge/desci/zk/circuits/.gitkeep Placeholder for circuit artifacts directory.
forge-cascade-v2/forge/config.py Adds snarkjs_enabled and graduation threshold settings.
forge-cascade-v2/forge/api/routes/soulbound.py Tightens auth: mint requires ActiveUserDep; slash/trust updates require TrustedUserDep.
forge-cascade-v2/forge/api/routes/search.py Adds typed SearchResponse + routing table comments; advanced search path.
forge-cascade-v2/forge/api/routes/graph/exploration.py Adds SAFETY comment about parameterized Cypher filtering.
forge-cascade-v2/forge/api/routes/federation.py Adds typed response models for trust adjustments/history and uses them in handlers.
forge-cascade-v2/forge/api/routes/dataset_framework.py Adds SAFETY comment for label interpolation origin.
forge-cascade-v2/forge/api/routes/capsules.py Removes untyped response_model=dict[...] from endpoints (schema typing improvements).
forge-cascade-v2/forge/api/dependencies.py Adds pagination max page enforcement + AdminPaginationParams + admin dep.
forge-cascade-v2/forge/api/app.py Adds multi-instance config validation and richer /ready dependency_status response.
forge-cascade-v2/docs/DEPLOYMENT_CHECKLIST.md New deployment checklist for single/multi-instance and lawfirm profile.
forge-cascade-v2/docs/API_CHANGELOG.md New API changelog documenting stability policy and recent changes.
.semgrep/forge-custom.yml Refines secret-detection rule to exclude enum-constant-like suffixes.
.pre-commit-config.yaml Switches semgrep hook to local system hook for Windows compatibility.
.github/workflows/ci.yml Adds non-blocking benchmark smoke test step (pytest-benchmark).

Comment on lines +87 to +96
class TransactionStatus(str, Enum):
    """Status of a blockchain transaction with retry tracking."""

    PENDING = "pending"
    SUBMITTED = "submitted"
    CONFIRMED = "confirmed"
    FAILED = "failed"
    RETRYING = "retrying"
    DEAD_LETTER = "dead_letter"  # All retries exhausted

Copilot AI Mar 5, 2026

TransactionStatus enum is added here, but TransactionRecord.status remains a free-form str (and callers already emit values like "dead_letter"). To avoid inconsistent status strings across the codebase, consider switching TransactionRecord.status to TransactionStatus (or adding validation) so the enum becomes the source of truth.
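
A minimal sketch of the suggestion above, using a stdlib dataclass as a reduced stand-in for `TransactionRecord` (the real model has more fields, and its base class is not shown in this excerpt). The `__post_init__` coercion is an assumption about how validation could be added, not the PR's code.

```python
from dataclasses import dataclass
from enum import Enum


class TransactionStatus(str, Enum):
    PENDING = "pending"
    SUBMITTED = "submitted"
    CONFIRMED = "confirmed"
    FAILED = "failed"
    RETRYING = "retrying"
    DEAD_LETTER = "dead_letter"


@dataclass
class TransactionRecord:  # reduced stand-in; only the fields needed here
    tx_hash: str
    status: TransactionStatus = TransactionStatus.PENDING

    def __post_init__(self) -> None:
        # Coerce legacy string values such as "dead_letter" into the enum.
        # An unknown status string raises ValueError instead of being stored.
        self.status = TransactionStatus(self.status)
```

Because the enum subclasses `str`, existing comparisons such as `record.status == "dead_letter"` keep working after the change. If the real model is a Pydantic `BaseModel`, annotating the field as `TransactionStatus` gives comparable validation without the explicit hook.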

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +26
def test_correlation_id_auto_generated(self):
    """When no correlation_id provided, one is generated."""
    ctx = PipelineContext(
        pipeline_id="test-pipe-2",
        correlation_id="auto-generated-id",
        triggered_by="manual",
    )

Copilot AI Mar 5, 2026

test_correlation_id_auto_generated() claims to test auto-generation, but it passes an explicit correlation_id value. Since PipelineContext requires correlation_id, this test currently only asserts non-empty input; consider either removing/renaming it or testing auto-generation at the Pipeline.execute() layer where correlation_id is actually created.
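
One way the test could exercise real auto-generation is sketched below. Everything here is a stand-in: the `PipelineContext` dataclass mirrors only the three fields visible in the excerpt, `execute` is a hypothetical free function playing the role of `Pipeline.execute()`, and the use of UUID4 for generated IDs is an assumption, since the excerpt does not show how the ID is created.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineContext:  # stand-in with only the fields shown in the excerpt
    pipeline_id: str
    correlation_id: str
    triggered_by: str


def execute(
    pipeline_id: str, triggered_by: str, correlation_id: Optional[str] = None
) -> PipelineContext:
    """Hypothetical entry point: fills in a correlation_id when none is given."""
    return PipelineContext(
        pipeline_id=pipeline_id,
        correlation_id=correlation_id or str(uuid.uuid4()),
        triggered_by=triggered_by,
    )


def test_correlation_id_auto_generated() -> None:
    ctx = execute("test-pipe-2", "manual")  # no correlation_id passed
    # Parsing raises ValueError if the generated value is not a valid UUID.
    assert uuid.UUID(ctx.correlation_id).version == 4


def test_explicit_correlation_id_preserved() -> None:
    ctx = execute("test-pipe-2", "manual", correlation_id="caller-supplied")
    assert ctx.correlation_id == "caller-supplied"
```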

Comment on lines +50 to +59
class SearchResponse(BaseModel):
    """Typed response for search endpoints."""

    query: str
    mode: str
    total: int
    took_ms: float
    results: list[dict[str, Any]]
    filters_applied: dict[str, Any] = Field(default_factory=dict)
    metadata: dict[str, Any] | None = None

Copilot AI Mar 5, 2026

SearchResponse.results is typed as list[dict[str, Any]] even though SearchResultItem is defined just above. Returning raw dicts reduces OpenAPI specificity and makes the new SearchResultItem model unused; consider changing results to list[SearchResultItem] (and returning model instances) so the schema is truly typed.

Comment on lines +58 to +64
sig = inspect.signature(fn)
param_annotations = [str(p.annotation) for p in sig.parameters.values()]
# None of these should have ActiveUserDep or TrustedUserDep
for ann in param_annotations:
    assert "UserDep" not in ann or "ActiveUserDep" not in ann, (
        f"{fn.__name__} should not require user auth"
    )

Copilot AI Mar 5, 2026

This assertion uses or, which makes it pass in cases where an auth dependency is present (e.g., TrustedUserDep contains 'UserDep' but not 'ActiveUserDep'). To correctly enforce that GET endpoints have no auth deps, the condition should require that neither ActiveUserDep nor TrustedUserDep (nor any user dep marker) appears in annotations.
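
The logic gap can be reproduced in isolation. In the sketch below, `ActiveUserDep` and `TrustedUserDep` are plain placeholder classes standing in for the project's dependencies, and `original_check_passes`, `auth_params`, and the three `get_*` functions are hypothetical helpers written for this illustration.

```python
import inspect


class ActiveUserDep:  # placeholder for the real dependency type
    pass


class TrustedUserDep:  # placeholder for the real dependency type
    pass


AUTH_MARKERS = ("ActiveUserDep", "TrustedUserDep")


def original_check_passes(fn) -> bool:
    """Evaluate the reviewed condition; True means the assert would pass."""
    anns = [str(p.annotation) for p in inspect.signature(fn).parameters.values()]
    return all("UserDep" not in a or "ActiveUserDep" not in a for a in anns)


def auth_params(fn) -> list:
    """Return names of parameters whose annotation mentions any auth marker."""
    return [
        name
        for name, p in inspect.signature(fn).parameters.items()
        if any(marker in str(p.annotation) for marker in AUTH_MARKERS)
    ]


def get_public(token_id: str): ...
def get_trusted(token_id: str, user: TrustedUserDep): ...
def get_active(token_id: str, user: ActiveUserDep): ...
```

Here `original_check_passes(get_trusted)` is `True` even though the endpoint requires auth, while `auth_params(get_trusted)` reports the offending parameter; asserting `auth_params(fn) == []` closes the gap. Note that if the real dependencies are `typing.Annotated` aliases, `str(p.annotation)` may not contain the alias name at all, in which case comparing annotations against the alias objects would be more robust than string matching.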

Comment on lines +423 to +436
return TransactionRecord(
    tx_hash="",
    chain=self.chain.value,
    block_number=0,
    timestamp=datetime.now(UTC),
    from_address=self._operator_account.address
    if self._operator_account
    else "",
    to_address=to_address,
    value=value,
    gas_used=0,
    status="dead_letter",
    transaction_type="transfer" if not data else "contract_call",
)

Copilot AI Mar 5, 2026

send_transaction_with_retry() returns a TransactionRecord with tx_hash="" and status="dead_letter" when retries are exhausted. An empty tx_hash makes the record hard to correlate in logs/storage, and the status value is outside TransactionRecord’s documented statuses; also the new retry_count/max_retries/last_error fields aren’t populated here. Consider returning a record with retry metadata filled and a consistent status representation (e.g., TransactionStatus.DEAD_LETTER) plus a non-empty identifier.
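
A rough, synchronous sketch of what a dead-letter record with populated retry metadata might look like. It is not the PR's `send_transaction_with_retry()`: `send_with_retry`, the injected `send` callable, the `attempt_id` field, and the default delays are all assumptions made for illustration, and the record is trimmed to the fields under discussion.

```python
import time
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class TransactionStatus(str, Enum):
    SUBMITTED = "submitted"
    DEAD_LETTER = "dead_letter"


@dataclass
class TransactionRecord:  # trimmed stand-in for the real model
    tx_hash: str
    status: TransactionStatus
    attempt_id: str  # hypothetical local ID, usable when no tx_hash exists
    retry_count: int = 0
    max_retries: int = 0
    last_error: Optional[str] = None


def send_with_retry(
    send: Callable[[], str],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> TransactionRecord:
    """Call send() up to max_retries + 1 times with exponential backoff."""
    attempt_id = uuid.uuid4().hex
    last_error: Optional[str] = None
    for attempt in range(max_retries + 1):
        try:
            tx_hash = send()
            return TransactionRecord(
                tx_hash, TransactionStatus.SUBMITTED, attempt_id,
                retry_count=attempt, max_retries=max_retries,
            )
        except Exception as exc:  # broad on purpose in this sketch
            last_error = str(exc)
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
    # All attempts failed: keep the metadata so the record can be correlated.
    return TransactionRecord(
        "", TransactionStatus.DEAD_LETTER, attempt_id,
        retry_count=max_retries, max_retries=max_retries, last_error=last_error,
    )
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real delays; an async variant would await a sleep coroutine instead.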

Verify Agent 11 and others added 3 commits March 4, 2026 23:27
The test expected page=999999 to be silently clamped, but the
PaginationParams validator now correctly rejects it with HTTP 400.
…name

Update COMPLETE_CODEBASE_REPORT.md with 18 modified entries and 30 new
entries from the enterprise readiness implementation. Fix lawfirm
docker-compose service name from forge-api to cascade-api for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Ollama healthcheck: use `ollama list` instead of curl/wget (not
  available in container)
- Add missing env vars to compliance-api and virtuals-api in lawfirm
  compose override
- Fix LLM provider init: recognize Ollama as local provider that doesn't
  need an API key
- Fix async/sync mismatch: properly handle awaitable returns from
  initialize_encryption_service, init_file_watcher, close_file_watcher
- Pass ENCRYPTION_MASTER_KEY from .env to cascade-api container
- Comment out GPU reservation and TLS configs for local development
- Add Ollama to forge-network