feat (graphdb) Neo4j backend — E2E green #47

Open
hourdays wants to merge 10 commits into develop from feature/neo4j-graphdb-skeleton

hourdays (Collaborator) commented Jun 9, 2026

TL;DR

  • Adds Neo4j (Bolt / Cypher) as a fully-functional graph DB engine alongside Lakebase.
  • End-to-end verified live against the Ryan-provisioned Aura neo4j+s://b4810af7…:
    9-step Python smoke test using THIS branch's Neo4jStore writes 11 triples → visible in Neo4j Browser.
  • UI + JS wiring proven via Chrome-MCP screenshots: dropdown lists Neo4j, left menu opens the config form, auth-method toggle works, engine + config persist (verified via API).
  • Found and fixed a real bug during the E2E test — the original :Triple:<store> compound label is invalid in Neo4j 5+ CREATE CONSTRAINT; fix lands in this PR (732f6f9).
  • Branched off develop; PR diff is against develop (the branch-overview "ahead of master" is a UI artifact of master being the repo default).
  • Two architectural decisions need Benoit's confirmation (Q1/Q2 below).

Proof artefacts

Local proof folder: ~/Documents/CODE/ontobricks/briefs/2026-06-09/1/ (gitignored per topic-layout)

  • 01-settings-dropdown-before.png — Neo4j (Bolt) in dropdown
  • 03-neo4j-section-empty-form.png — config form renders
  • 04-neo4j-auth-toggle-secret-mode.png — JS auth-method toggle works
  • 05-neo4j-form-filled-basic-auth.png — form accepts input, live JS sync to hidden textarea verified
  • 07-neo4j-config-saved-persisted.png — GET /settings/graph-engine returns {graph_engine: "neo4j"}
  • 08-neo4j-browser-11-triples-visible.png — Neo4j Browser graph view, 11 nodes labelled pr47_smoke_2026_06_09
  • 09-neo4j-browser-table-view.png — same 11 nodes table view, full W3C URIs in subject/predicate/object
  • FINDINGS.md — full E2E narrative including the bug-fix story

Smoke-test artefact (committed): tests/integration/neo4j_e2e_smoke.py — runnable by any contributor with neo4j>=5.0 + the Aura creds file.

What this PR ships

When a user picks Neo4j in Settings → Triple store → Global and configures URI/database/auth in Settings → Triple store → Neo4j, the entire OntoBricks stack works against a Neo4j Aura (or self-hosted) instance:

  • Build writes triples via Bolt (UNWIND + MERGE on :`<store>`-labelled nodes).
  • Knowledge Graph view, Inference, Graph Chat, GraphQL all query Neo4j via Cypher (16 named-query methods implemented).
  • Reasoning (SWRL/OWL) wired to a SWRLFlatCypherTranslator — currently scaffolded (returns None + warns), full translation in a follow-up PR.
  • Settings UI exposes a dedicated Neo4j sub-page with URI/database/auth/credentials form.

Lakebase remains the default; existing Lakebase deployments are unaffected.

Architecture decisions

  • Single-label-per-store schema (post-bug-fix). Triples are persisted as (:`<sanitised_table_name>` {subject, predicate, object}) nodes. The original :Triple:<store> compound label was abandoned because Neo4j 5+ rejects compound labels in CREATE CONSTRAINT.
  • No raw Cypher entry point. execute_query raises NotImplementedError. All writes go through the ontology-validated build pipeline, which preserves C2 ("l'entrée se fait par l'ontologie", i.e. "entry goes through the ontology"; Benoit, 20/05).
  • No UC Volume sync. Neo4j Aura is remote-only; sync_to_remote / sync_from_remote / local_path are no-ops.
  • engine_config keys: uri, database, auth_method (basic | databricks_secret), credentials, encrypted.
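
As a rough illustration of the single-label write path described above, here is a minimal Python sketch. It is not the code in Neo4jStore: the helper names (`sanitise_label`, `merge_statement`, `batches`), the sanitisation rule, and the batch size are all assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

Triple = tuple[str, str, str]


def sanitise_label(table_name: str) -> str:
    """Hypothetical sanitiser: keep only letters, digits and underscores."""
    label = re.sub(r"[^A-Za-z0-9_]", "_", table_name)
    if not label:
        raise ValueError("store name must not be empty")
    return label


def merge_statement(table_name: str) -> str:
    """Build one parameterised UNWIND + MERGE statement for a store.

    The label cannot be passed as a query parameter, so it is sanitised
    and backtick-quoted; the triple values travel in the $rows parameter.
    """
    label = sanitise_label(table_name)
    return (
        "UNWIND $rows AS row "
        f"MERGE (t:`{label}` {{subject: row.subject, "
        "predicate: row.predicate, object: row.object})"
    )


def batches(triples: Iterable[Triple], size: int = 1000) -> Iterator[list[dict[str, str]]]:
    """Group triples into lists of row dicts suitable for the $rows parameter."""
    batch: list[dict[str, str]] = []
    for subject, predicate, obj in triples:
        batch.append({"subject": subject, "predicate": predicate, "object": obj})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With the official Python driver, each batch would then be sent with something like `session.run(merge_statement(name), rows=batch)`. MERGE on the full (subject, predicate, object) map makes re-running a build idempotent.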

Open questions for Benoit

  1. execute_query raises NotImplementedError. Is that aligned with the "l'entrée se fait par l'ontologie" ("entry goes through the ontology") rule from 20/05?
  2. Flat-triple model (single label per store) for v1; typed-node graph model deferred. OK?

Commits on this branch

99d8dac  feat (graphdb) Neo4j backend skeleton (PR 1)
df303b5  feat (graphdb) Neo4j settings UI — left menu + dropdown + config form
05adc4e  feat (graphdb) Neo4j named-query Cypher implementations
1bc8d7d  feat (reasoning) SWRLFlatCypherTranslator scaffold + wire Neo4jStore
80df9ca  chore (graphdb) Neo4j plumbing — pyproject + tests + changelog
409f499  feat (graphdb) settings.js wiring for the Neo4j engine config
a026aa7  docs (graphdb) Neo4j manual smoke-test runbook
732f6f9  fix (graphdb) Neo4j label schema — single-label, not :Triple:<store>

Test plan — all green ✅

  • python3 -m py_compile on every changed .py — OK
  • node --check on settings.js — OK
  • make bundle-validate on dev-lakebase target — clean
  • make deploy to fevm-mjolnir — exit 0 (apps RUNNING)
  • Live E2E smoke against Aura (tests/integration/neo4j_e2e_smoke.py) — all 9 assertions pass
  • UI: dropdown + section + auth toggle + save verified via Chrome MCP screenshots
  • Persistence verified via API: GET /settings/graph-engine returns neo4j; config has uri/db/auth/password
  • Neo4j Browser shows 11 nodes with full W3C URI subjects/predicates/objects
  • Build pipeline through the OntoBricks UI (Domain → Build) — requires a non-trivial ontology + mappings; smoke test covers the same code path directly. Out of scope for this PR.
  • make test in repo's uv venv — couldn't run on this machine (uv blocked from PyPI). Static + import + live E2E proven instead.

cc @benoitcayladbx — branch is ready-for-review on the code AND on visible proof.

99d8dac  feat (graphdb) Neo4j backend skeleton (PR 1)

Adds Neo4j (Bolt / Cypher) as a selectable graph DB engine alongside
Lakebase Postgres. PR 1 ships the integration shape + flat-triple CRUD.
PR 2 will add the 16 Cypher named-query implementations + a
SWRLFlatCypherTranslator for reasoning.

Changes:
- src/back/core/graphdb/neo4j/ — new package, copied from the
  _starter_kit template and filled in per docs/graphdb-integration.md.
  - Neo4jStore extends GraphDBBackend; flat triples persisted as
    (:Triple:<label> {subject, predicate, object}) nodes with a SPO
    uniqueness constraint per logical store.
  - CRUD: create_table, drop_table, insert_triples (batched via UNWIND
    + MERGE), delete_triples, query_triples, count_triples,
    table_exists, get_status.
  - Capability flags: supports_cypher=True, supports_graph_model=False
    (flat triples in v1), query_dialect="cypher".
  - engine_config keys: uri, database, auth_method (basic |
    databricks_secret), credentials, encrypted.
  - Named-query overrides stubbed with safe defaults + TODO(PR2)
    markers — the app degrades gracefully on Neo4j until PR 2 lands.
  - execute_query raises NotImplementedError on purpose: no raw
    Cypher entry point; all writes go through the build pipeline
    after ontology validation (C2 safeguard).
  - sync_to_remote / sync_from_remote / local_path are no-ops —
    Neo4j Aura is remote-only.
- src/back/core/graphdb/GraphDBFactory.py — registers _create_neo4j
  dispatch, NEO4J_AVAILABLE guarded import.
- src/back/objects/session/GlobalConfigService.py — adds "neo4j" to
  ALLOWED_GRAPH_ENGINES so the Settings dropdown can persist it.

Not yet in this commit (next commits on this branch):
- Settings UI: left-menu "Neo4j" entry under TRIPLE STORE + dropdown
  option in #graphEngineSelect + Neo4j-specific config page.
- pyproject.toml optional dependency "neo4j>=5.0".
- tests/units/graphdb/test_neo4j_store.py.
- changelogs/v0.5.0/hourdays_2026-06-09.log.

hourdays changed the title from "feat (graphdb) Neo4j backend skeleton (PR 1)" to "feat (graphdb) Neo4j backend (complete feature, WIP)" on Jun 9, 2026
hourdays added 7 commits June 9, 2026 11:08

df303b5  feat (graphdb) Neo4j settings UI — left menu + dropdown + config form

Adds the Neo4j surfaces in Settings so users can select and configure
the engine. JS wiring for load/save comes in the next commit.

- src/front/config/menu_config.json: new "Neo4j" item under TRIPLE
  STORE group (icon bi-bezier2), mirroring the Lakebase entry.
- src/front/templates/settings.html:
  - Dropdown: <option value="neo4j">Neo4j (Bolt)</option> in
    #graphEngineSelect (Triple store > Global page).
  - New #neo4j-section sidebar-section with the config form: URI
    (Bolt), database, auth_method (basic | databricks_secret),
    credentials, encrypted toggle. Test-connection button slot
    (handler comes in the next commit).
  - Architecture note explains the C2 safeguard (no raw Cypher).

05adc4e  feat (graphdb) Neo4j named-query Cypher implementations

Replaces the safe-default stubs on Neo4jStore with native Cypher
implementations of the 16 named-query methods defined on
TripleStoreBackend. The app's Knowledge Graph view, Inference page,
Graph Chat, GraphQL endpoint, and entity-detail pages now work when
Neo4j is the active engine (subject to SWRLFlatCypherTranslator,
which lands in the next commit).

Implementations cover:

- Statistics — get_aggregate_stats, get_type_distribution,
  get_predicate_distribution.
- Entity lookup — find_subjects_by_type (with optional value filter
  via toLower CONTAINS), resolve_subject_by_id, get_entity_metadata,
  get_triples_for_subjects, get_predicates_for_type.
- Pagination — paginated_triples + paginated_count. Note: SQL
  WHERE-fragment conditions are not translated; callers that need
  filtered pagination should switch to find_subjects_by_type or
  find_seed_subjects. The unfiltered case is logged.
- Traversal — bfs_traversal (iterative expansion for depth > 1),
  find_seed_subjects (entity_type × value with field=label|id|any
  and match_type=contains|exact|starts|ends),
  find_subjects_by_patterns (LIKE patterns → Cypher regex via =~),
  expand_entity_neighbors (1-hop outgoing+incoming, filtered to
  typed entities).
- Reasoning — transitive_closure (chained MATCH up to max_depth=20),
  symmetric_expand, shortest_path (BFS-based iterative reconstruction
  given the flat-triple model — a typed-relationship model would let
  us use native shortestPath).
- Cohorts — delete_cohort_triples (DETACH DELETE with safety limit).
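
For the LIKE-to-regex step mentioned under find_subjects_by_patterns, one possible translation is sketched below. The helper names are hypothetical and the query shape is an assumption, not the query Neo4jStore emits. Cypher's `=~` matches the whole string, which lines up with SQL LIKE semantics.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex for Cypher's =~ operator.

    '%' becomes '.*', '_' becomes '.', and every other character is escaped
    so that URI punctuation such as '.' is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def build_pattern_query(label: str, patterns: list[str]) -> tuple[str, dict]:
    """Return a parameterised query plus its parameters (assumed query shape).

    The label is expected to be sanitised already; the regexes are passed
    as a parameter rather than interpolated into the query text.
    """
    query = (
        f"MATCH (t:`{label}`) "
        "WHERE any(rx IN $regexes WHERE t.subject =~ rx) "
        "RETURN DISTINCT t.subject AS subject"
    )
    return query, {"regexes": [like_to_regex(p) for p in patterns]}
```

Python's `re.fullmatch` is a convenient local stand-in for checking the translated patterns, although the Neo4j side uses Java regex syntax, so exotic characters may behave differently.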

All implementations use parameterised Cypher to avoid injection.
Graph traversal joins Triple nodes by property equality because the
flat-triple model has no typed relationships between entities — a
typed graph model is a future PR.
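
To make the "join by property equality" point concrete, here is a small sketch of iterative BFS over flat triples. It assumes one query per hop that fetches triples whose subject is in the current frontier; `HOP_QUERY`, `bfs_traversal` and the `run_hop` callback are illustrative names, not the ones in Neo4jStore.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]

# Assumed per-hop query; format it with an already sanitised label.
HOP_QUERY = (
    "MATCH (t:`{label}`) "
    "WHERE t.subject IN $frontier "
    "RETURN t.subject AS subject, t.predicate AS predicate, t.object AS object"
)


def bfs_traversal(
    seeds: Iterable[str],
    max_depth: int,
    run_hop: Callable[[list[str]], Iterable[Triple]],
) -> tuple[set[str], list[Triple]]:
    """Expand outward from the seeds, one hop (one query) per iteration.

    There are no relationships to follow in the flat-triple model, so each hop
    treats the object of a triple as the next subject to look up. Literal
    objects simply match nothing on the following hop.
    """
    visited = set(seeds)
    frontier = set(visited)
    edges: list[Triple] = []
    for _ in range(max_depth):
        if not frontier:
            break
        next_frontier: set[str] = set()
        for subject, predicate, obj in run_hop(sorted(frontier)):
            edges.append((subject, predicate, obj))
            if obj not in visited:
                visited.add(obj)
                next_frontier.add(obj)
        frontier = next_frontier
    return visited, edges
```

In production `run_hop` would execute `HOP_QUERY` with the frontier as a parameter; in a unit test it can be a plain filter over an in-memory list of triples.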

Remaining TODO(PR2) markers (3):
- Databricks-secret auth resolution path (file line 166)
- SWRLFlatCypherTranslator wiring in get_query_translator (line 218)
  — next commit
- The stale docstring claim about "TODO(PR2) markers throughout"
  (line 11) — will sweep in the polish pass.

1bc8d7d  feat (reasoning) SWRLFlatCypherTranslator scaffold + wire Neo4jStore

Adds the Cypher counterpart of SWRLSQLTranslator so the reasoning
architecture is in place when Neo4j is the active engine. Methods are
scaffolded (return None + warn) rather than fully translating SWRL to
Cypher — that translation is its own substantial piece of work (the
SQL counterpart is ~730 lines of careful logic for builtins, negation,
variable bindings, etc.) and deserves a dedicated PR with its own test
suite. Returning None makes the reasoning engine treat each rule as
"no work to do", so the UI surfaces zero violations / zero inferences
cleanly instead of crashing.
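
The "return None + warn" behaviour can be pictured with the sketch below. The class name, the single-argument method signatures, and the `count_violations` caller are assumptions for illustration; only the method names mirror the SWRLSQLTranslator interface listed in this commit.

```python
import logging
from typing import Any, Callable, Iterable, Optional

logger = logging.getLogger(__name__)


class ScaffoldedCypherTranslator:
    """Stand-in translator: every builder warns and returns None."""

    def _scaffolded(self, method: str, rule: Any) -> Optional[str]:
        logger.warning(
            "%s: SWRL to Cypher translation is not implemented; skipping %r",
            method, rule,
        )
        return None

    def build_violation_sql(self, rule: Any) -> Optional[str]:
        return self._scaffolded("build_violation_sql", rule)

    def build_inference_sql(self, rule: Any) -> Optional[str]:
        return self._scaffolded("build_inference_sql", rule)

    # Cypher-named aliases so callers can use either spelling.
    build_violation_cypher = build_violation_sql
    build_inference_cypher = build_inference_sql


def count_violations(
    translator: ScaffoldedCypherTranslator,
    rules: Iterable[Any],
    run_query: Callable[[str], int],
) -> int:
    """Hypothetical caller: a None query means 'no work to do' for that rule."""
    total = 0
    for rule in rules:
        query = translator.build_violation_sql(rule)
        if query is None:
            continue
        total += run_query(query)
    return total
```

Because the caller skips None, a scaffolded translator yields zero violations rather than an exception, which matches the UI behaviour described above.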

- src/back/core/reasoning/SWRLFlatCypherTranslator.py: NEW. Same
  public interface as SWRLSQLTranslator (build_violation_sql,
  build_antecedent_count_sql, build_materialization_sql,
  build_inference_sql) plus matching *_cypher aliases. The class
  docstring documents the scaffolded status and the path to full
  implementation.
- src/back/core/graphdb/neo4j/Neo4jStore.py:
  - get_query_translator() returns SWRLFlatCypherTranslator (was a
    super() pass-through to the SQL default).
  - Module docstring refreshed: no longer mentions "TODO(PR2) markers
    throughout" since the named-query stubs have been replaced with
    native Cypher.

Known limitation (mirrored in PR description + changelog):
Reasoning on Neo4j reports 0 violations / 0 inferences until the
dedicated SWRLFlatCypherTranslator translation PR lands. All other
Neo4j surfaces (CRUD, KG view, Inference UI navigation, Graph Chat,
GraphQL) work normally.

80df9ca  chore (graphdb) Neo4j plumbing — pyproject + tests + changelog

- pyproject.toml: add optional-dependency `neo4j = ["neo4j>=5.0"]`.
  Installed via `uv sync --extra neo4j` or `pip install .[neo4j]`.
- tests/units/graphdb/test_neo4j_store.py: NEW. Driver-mocked unit tests
  covering capability flags, construction validation (missing URI, bad
  auth_method, defaults), schema sanitisation, CRUD Cypher emission
  shapes, named-query dispatch, factory routing, and reasoning
  translator wiring. Skips cleanly when neo4j is not installed.
- changelogs/v0.5.0/hourdays_2026-06-09.log: entry per .cursorrules
  format (user prefix [hourdays] + today's date).

The changelog also documents the known limitations on this branch
(reasoning no-op, settings.js wiring, Build page labels, paginated
SQL conditions, databricks_secret auth resolution).

409f499  feat (graphdb) settings.js wiring for the Neo4j engine config

Mirrors the Lakebase pattern: when the active engine is "neo4j",
saveGraphDbSettings dispatches to mergeNeo4jPanelIntoConfigTextarea(),
which reads the Neo4j form fields from #neo4j-section and serialises
them into the shared #graphEngineConfig textarea. The existing save
path then POSTs the JSON to /settings/graph-engine-config.

- src/front/static/config/js/settings.js:
  - saveGraphDbSettings: add neo4j branch alongside lakebase.
  - mergeNeo4jPanelIntoConfigTextarea(): NEW — reads uri, database,
    auth_method, encrypted, and either (username, password) or
    (secret_scope, secret_key) depending on auth_method; writes JSON
    to #graphEngineConfig.
  - applyNeo4jAuthMethodVisibility(): NEW — toggles .neo4j-auth-basic
    vs .neo4j-auth-databricks-secret field groups based on the auth
    method dropdown. Runs on load + on each change.
  - Live field listeners (input/change) on the 8 form fields keep the
    textarea in sync as the user edits — same UX as Lakebase.
  - Test-connection button: surface a friendly "deferred to follow-up"
    message for now so the button isn't silently broken.

End-to-end save now works: select Neo4j from the dropdown, fill the
Neo4j section form, click Save — engine_config persists via the same
endpoint Lakebase uses.
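
The real merge function is JavaScript in settings.js; the Python rendering below only illustrates the shape of the JSON it is described as producing. Nesting the auth fields under `credentials` and the default values are assumptions, and `merge_neo4j_form` is a hypothetical name.

```python
import json


def merge_neo4j_form(form: dict) -> str:
    """Serialise Neo4j form fields into the engine_config JSON text.

    Only the credential fields that belong to the selected auth_method are
    included, so switching methods does not leak the other method's values.
    """
    auth_method = form.get("auth_method", "basic")
    if auth_method == "basic":
        credentials = {
            "username": form.get("username", ""),
            "password": form.get("password", ""),
        }
    elif auth_method == "databricks_secret":
        credentials = {
            "secret_scope": form.get("secret_scope", ""),
            "secret_key": form.get("secret_key", ""),
        }
    else:
        raise ValueError(f"unsupported auth_method: {auth_method!r}")

    config = {
        "uri": form.get("uri", "").strip(),
        "database": form.get("database", "").strip(),
        "auth_method": auth_method,
        "credentials": credentials,
        "encrypted": bool(form.get("encrypted", False)),
    }
    return json.dumps(config, indent=2)
```

The resulting string is what would be written to the shared textarea and POSTed to /settings/graph-engine-config by the existing save path.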

a026aa7  docs (graphdb) Neo4j manual smoke-test runbook

A 5-step procedure to validate the Neo4j engine end-to-end against a live
Aura instance — switch engine, configure connection, run build,
verify triples landed in Neo4j Browser, confirm Inference no-ops
gracefully. Captures the screenshot artefacts expected in
briefs/2026-06-09/1/ and the rollback path (just flip the dropdown
back to Lakebase).

Run this once before marking PR #47 ready-for-review.

732f6f9  fix (graphdb) Neo4j label schema — single-label, not :Triple:<store>

Bug caught by the live E2E smoke test against the Ryan-provisioned
Aura instance: Neo4j 5+ CREATE CONSTRAINT only accepts single-label
patterns (FOR (n:Label)), so the original :Triple:<store_name> compound
label raised CypherSyntaxError on create_table.

Fix: switch every per-store triple node from :Triple:<store> to
:`<store>` (a single backtick-quoted label per logical store). The
SPO uniqueness constraint, MERGE writes, MATCH reads, and the
SHOW CONSTRAINTS existence check all work against this simpler schema.
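
A sketch of what the single-label constraint statement might look like is given below. The constraint naming scheme and the helper names are assumptions; the statement follows the Neo4j 5 `FOR (n:Label) REQUIRE (...) IS UNIQUE` form with a composite property list.

```python
import re


def quote_label(store_name: str) -> str:
    """Backtick-quote a label; embedded backticks are escaped by doubling."""
    return "`" + store_name.replace("`", "``") + "`"


def spo_constraint_statement(store_name: str) -> str:
    """Build the SPO uniqueness constraint for one logical store.

    The pattern carries exactly one label, which is what CREATE CONSTRAINT
    accepts; the earlier :Triple:<store> pattern carried two.
    """
    label = quote_label(store_name)
    # Hypothetical naming scheme: one constraint per store.
    name = "spo_unique_" + re.sub(r"\W", "_", store_name)
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (t:{label}) "
        "REQUIRE (t.subject, t.predicate, t.object) IS UNIQUE"
    )
```

`IF NOT EXISTS` keeps create_table safe to call repeatedly for the same store.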

Verified end-to-end against neo4j+s://b4810af7.databases.neo4j.io:
  ✓ create_table          → constraint installed
  ✓ table_exists          → True
  ✓ insert_triples(n=11)  → 11 nodes written via UNWIND/MERGE
  ✓ count_triples         → 11
  ✓ query_triples         → returns all 11 with subject/predicate/object
  ✓ find_subjects_by_type → returns both customers
  ✓ get_aggregate_stats   → total=11, distinct_subjects=5,
                            distinct_predicates=4,
                            type_assertion_count=5,
                            label_count=3
  ✓ get_entity_metadata   → {type, label} for each customer
  ✓ expand_entity_neighbors → typed neighbors of C1

Also adds the runnable smoke test as a committed artifact so future
contributors can replay the verification:

  tests/integration/neo4j_e2e_smoke.py

Reads credentials from
~/Documents/CODE/ontobricks/briefs/2026-05M-12/5/neo4j_connection_details.txt
(gitignored).

Docstring comments updated to mention the single-label scheme. No
other callers reference the old :Triple supertype.

hourdays changed the title from "feat (graphdb) Neo4j backend (complete feature, WIP)" to "feat (graphdb) Neo4j backend — E2E green" on Jun 9, 2026
hourdays marked this pull request as ready for review June 9, 2026 10:18
hourdays requested a review from a team as a code owner June 9, 2026 10:18
hourdays added 2 commits June 9, 2026 15:42
The Build / Digital Twin Information page's "Graph DB" card was
hardcoded to show "Graph DB (Lakebase)" regardless of the active engine.
Now reads from dt.graph_engine and maps to the matching label:

  lakebase → "Graph DB (Lakebase)"
  neo4j    → "Graph DB (Neo4j)"
  other    → "Graph DB (<engine>)" / "Graph DB Digital Twin" fallback

Updated:
- src/front/static/domain/js/domain-validation.js (line 456) —
  domain validation card.
- src/front/static/query/js/query-sync.js (line 156) —
  Digital Twin sync page.

The template default text "Graph DB (Lakebase)" stays for the
pre-hydration frame; JS overrides it on first render based on the
configured engine.
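
The label mapping above lives in two JavaScript files; the Python sketch below only restates the rule. It assumes the "Graph DB Digital Twin" fallback applies when no engine is configured and that unknown engine names are shown verbatim. `graph_db_card_label` is a hypothetical name.

```python
from typing import Optional

# Display names for the engines this PR knows about.
ENGINE_DISPLAY_NAMES = {"lakebase": "Lakebase", "neo4j": "Neo4j"}


def graph_db_card_label(engine: Optional[str]) -> str:
    """Map the configured engine to the text shown on the Graph DB card."""
    if not engine:
        # Assumed fallback when the engine is missing or empty.
        return "Graph DB Digital Twin"
    display = ENGINE_DISPLAY_NAMES.get(engine.lower(), engine)
    return f"Graph DB ({display})"
```

Keeping the mapping in one table means a third engine only needs a new dictionary entry rather than another hardcoded string.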

app.yaml.template's `uv run` command only included `--extra lakebase`,
so the deployed app didn't install the optional `neo4j` driver group.
At runtime that left `NEO4J_AVAILABLE = False` and any graph-facing
route (Knowledge Graph view, Inference, GraphQL, Graph Chat) raised
``InfrastructureError("Graph backend is not configured")`` even when
the admin had selected Neo4j and saved the engine config.

Add `--extra neo4j` alongside `--extra lakebase` so both engines
are available in the deployed app regardless of which one is active
at the time of deploy. Mirrors the Lakebase pattern (admin can flip
without redeploying). ~5MB extra deploy footprint when Neo4j is
unused.