feat (graphdb) Neo4j backend - E2E green #47
Open
hourdays wants to merge 10 commits into
Adds Neo4j (Bolt / Cypher) as a selectable graph DB engine alongside
Lakebase Postgres. PR 1 ships the integration shape + flat-triple CRUD.
PR 2 will add the 16 Cypher named-query implementations + a
SWRLFlatCypherTranslator for reasoning.
Changes:
- src/back/core/graphdb/neo4j/ — new package, copied from the
_starter_kit template and filled in per docs/graphdb-integration.md.
- Neo4jStore extends GraphDBBackend; flat triples persisted as
(:Triple:<label> {subject, predicate, object}) nodes with a SPO
uniqueness constraint per logical store.
- CRUD: create_table, drop_table, insert_triples (batched via UNWIND
+ MERGE), delete_triples, query_triples, count_triples,
table_exists, get_status.
- Capability flags: supports_cypher=True, supports_graph_model=False
(flat triples in v1), query_dialect="cypher".
- engine_config keys: uri, database, auth_method (basic |
databricks_secret), credentials, encrypted.
- Named-query overrides stubbed with safe defaults + TODO(PR2)
markers — the app degrades gracefully on Neo4j until PR 2 lands.
- execute_query raises NotImplementedError on purpose: no raw
Cypher entry point; all writes go through the build pipeline
after ontology validation (C2 safeguard).
- sync_to_remote / sync_from_remote / local_path are no-ops —
Neo4j Aura is remote-only.
- src/back/core/graphdb/GraphDBFactory.py — registers _create_neo4j
dispatch, NEO4J_AVAILABLE guarded import.
- src/back/objects/session/GlobalConfigService.py — adds "neo4j" to
ALLOWED_GRAPH_ENGINES so the Settings dropdown can persist it.
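The engine_config keys listed above could be validated roughly as follows. This is a minimal sketch, not the code in Neo4jStore: the helper name `parse_engine_config`, the `Neo4jEngineConfig` dataclass, and the default values (database "neo4j", auth_method "basic", encrypted True) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# The two auth methods named in the commit message.
ALLOWED_AUTH_METHODS = ("basic", "databricks_secret")


@dataclass(frozen=True)
class Neo4jEngineConfig:
    """Hypothetical typed view of the engine_config keys."""
    uri: str
    database: str
    auth_method: str
    credentials: Mapping[str, Any]
    encrypted: bool


def parse_engine_config(raw: Mapping[str, Any]) -> Neo4jEngineConfig:
    """Validate a raw engine_config mapping and apply assumed defaults."""
    uri = str(raw.get("uri") or "").strip()
    if not uri:
        raise ValueError("engine_config.uri is required")

    auth_method = raw.get("auth_method", "basic")  # assumed default
    if auth_method not in ALLOWED_AUTH_METHODS:
        raise ValueError(f"unsupported auth_method: {auth_method!r}")

    return Neo4jEngineConfig(
        uri=uri,
        database=raw.get("database", "neo4j"),  # assumed default
        auth_method=auth_method,
        credentials=dict(raw.get("credentials") or {}),
        encrypted=bool(raw.get("encrypted", True)),  # assumed default
    )
```

Failing fast on a missing URI or an unknown auth_method keeps configuration errors out of the connection path, which is the same split the unit tests later in this PR describe (missing URI, bad auth_method, defaults).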
Not yet in this commit (next commits on this branch):
- Settings UI: left-menu "Neo4j" entry under TRIPLE STORE + dropdown
option in #graphEngineSelect + Neo4j-specific config page.
- pyproject.toml optional dependency "neo4j>=5.0".
- tests/units/graphdb/test_neo4j_store.py.
- changelogs/v0.5.0/hourdays_2026-06-09.log.
Adds the Neo4j surfaces in Settings so users can select and configure
the engine. JS wiring for load/save comes in the next commit.
- src/front/config/menu_config.json: new "Neo4j" item under TRIPLE
STORE group (icon bi-bezier2), mirroring the Lakebase entry.
- src/front/templates/settings.html:
- Dropdown: <option value="neo4j">Neo4j (Bolt)</option> in
#graphEngineSelect (Triple store > Global page).
- New #neo4j-section sidebar-section with the config form: URI
(Bolt), database, auth_method (basic | databricks_secret),
credentials, encrypted toggle. Test-connection button slot
(handler comes in the next commit).
- Architecture note explains the C2 safeguard (no raw Cypher).
Replaces the safe-default stubs on Neo4jStore with native Cypher
implementations of the 16 named-query methods defined on
TripleStoreBackend. The app's Knowledge Graph view, Inference page,
Graph Chat, GraphQL endpoint, and entity-detail pages now work when
Neo4j is the active engine (subject to SWRLFlatCypherTranslator, which
lands in the next commit).
Implementations cover:
- Statistics: get_aggregate_stats, get_type_distribution,
get_predicate_distribution.
- Entity lookup: find_subjects_by_type (with optional value filter via
toLower CONTAINS), resolve_subject_by_id, get_entity_metadata,
get_triples_for_subjects, get_predicates_for_type.
- Pagination: paginated_triples + paginated_count. Note: SQL
WHERE-fragment conditions are not translated; callers that need
filtered pagination should switch to find_subjects_by_type or
find_seed_subjects. The unfiltered case is logged.
- Traversal: bfs_traversal (iterative expansion for depth > 1),
find_seed_subjects (entity_type × value with field=label|id|any and
match_type=contains|exact|starts|ends), find_subjects_by_patterns
(LIKE patterns → Cypher regex via =~), expand_entity_neighbors (1-hop
outgoing+incoming, filtered to typed entities).
- Reasoning: transitive_closure (chained MATCH up to max_depth=20),
symmetric_expand, shortest_path (BFS-based iterative reconstruction
given the flat-triple model; a typed-relationship model would let us
use native shortestPath).
- Cohorts: delete_cohort_triples (DETACH DELETE with safety limit).
All implementations use parameterised Cypher to avoid injection. Graph
traversal joins Triple nodes by property equality because the
flat-triple model has no typed relationships between entities; a typed
graph model is a future PR.
Remaining TODO(PR2) markers (3):
- Databricks-secret auth resolution path (file line 166)
- SWRLFlatCypherTranslator wiring in get_query_translator (line 218),
next commit
- The stale docstring claim about "TODO(PR2) markers throughout"
(line 11), will sweep in the polish pass.
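For find_subjects_by_patterns, the "LIKE patterns → Cypher regex via =~" step could look roughly like this. It is a sketch under assumptions, not the code in Neo4jStore: `like_to_regex` and `subjects_by_patterns_query` are hypothetical names, the label is assumed to be sanitised already, and LIKE escape characters are not handled.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex for Cypher's =~ operator.

    % matches any run of characters and _ matches exactly one; every other
    character is escaped so it is matched literally. Cypher's =~ has to
    match the whole string, which lines up with LIKE semantics.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def subjects_by_patterns_query(label: str) -> str:
    """Parameterised query; the patterns travel in $patterns, never in the text."""
    # Assumption: `label` has already been sanitised by the caller.
    return (
        f"MATCH (t:`{label}`) "
        "WHERE any(p IN $patterns WHERE t.subject =~ p) "
        "RETURN DISTINCT t.subject AS subject"
    )


# Example parameters for a driver call (values are made up).
params = {"patterns": [like_to_regex("http://ex.org/%")]}
```

Escaping every literal character matters here because subjects are URIs, and an unescaped dot in a pattern would otherwise match any character.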
Adds the Cypher counterpart of SWRLSQLTranslator so the reasoning
architecture is in place when Neo4j is the active engine. Methods are
scaffolded (return None + warn) rather than fully translating SWRL to
Cypher — that translation is its own substantial piece of work (the
SQL counterpart is ~730 lines of careful logic for builtins, negation,
variable bindings, etc.) and deserves a dedicated PR with its own test
suite. Returning None makes the reasoning engine treat each rule as
"no work to do", so the UI surfaces zero violations / zero inferences
cleanly instead of crashing.
- src/back/core/reasoning/SWRLFlatCypherTranslator.py: NEW. Same
public interface as SWRLSQLTranslator (build_violation_sql,
build_antecedent_count_sql, build_materialization_sql,
build_inference_sql) plus matching *_cypher aliases. The class
docstring documents the scaffolded status and the path to full
implementation.
- src/back/core/graphdb/neo4j/Neo4jStore.py:
- get_query_translator() returns SWRLFlatCypherTranslator (was a
super() pass-through to the SQL default).
- Module docstring refreshed: no longer mentions "TODO(PR2) markers
throughout" since the named-query stubs have been replaced with
native Cypher.
Known limitation (mirrored in PR description + changelog):
Reasoning on Neo4j reports 0 violations / 0 inferences until the
dedicated SWRLFlatCypherTranslator translation PR lands. All other
Neo4j surfaces (CRUD, KG view, Inference UI navigation, Graph Chat,
GraphQL) work normally.
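The scaffolded behaviour described above (return None + warn, so each rule becomes "no work to do") can be sketched as follows. The method names come from this commit message; the class name `ScaffoldedCypherTranslator`, the single `rule` parameter, and the `runnable_queries` caller are assumptions for illustration, not the real signatures.

```python
import logging
from typing import Any, Iterable, List, Optional

logger = logging.getLogger(__name__)


class ScaffoldedCypherTranslator:
    """Illustrative stand-in: every builder warns and returns None."""

    def _not_translated(self, method: str) -> None:
        logger.warning(
            "%s: SWRL to Cypher translation is not implemented; rule skipped",
            method,
        )
        return None

    def build_violation_sql(self, rule: Any) -> Optional[str]:
        return self._not_translated("build_violation_sql")

    def build_antecedent_count_sql(self, rule: Any) -> Optional[str]:
        return self._not_translated("build_antecedent_count_sql")

    def build_materialization_sql(self, rule: Any) -> Optional[str]:
        return self._not_translated("build_materialization_sql")

    def build_inference_sql(self, rule: Any) -> Optional[str]:
        return self._not_translated("build_inference_sql")

    # Matching *_cypher aliases share the same implementations.
    build_violation_cypher = build_violation_sql
    build_antecedent_count_cypher = build_antecedent_count_sql
    build_materialization_cypher = build_materialization_sql
    build_inference_cypher = build_inference_sql


def runnable_queries(translator: Any, rules: Iterable[Any]) -> List[str]:
    """Hypothetical caller: rules whose translation is None are skipped."""
    queries = []
    for rule in rules:
        query = translator.build_violation_sql(rule)
        if query is not None:
            queries.append(query)
    return queries
```

With this shape a caller ends up with an empty query list, which is how zero violations and zero inferences would surface without an exception.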
- pyproject.toml: add optional-dependency `neo4j = ["neo4j>=5.0"]`.
Installed via `uv sync --extra neo4j` or `pip install .[neo4j]`.
- tests/units/graphdb/test_neo4j_store.py: NEW. Driver-mocked unit
tests covering capability flags, construction validation (missing
URI, bad auth_method, defaults), schema sanitisation, CRUD Cypher
emission shapes, named-query dispatch, factory routing, and reasoning
translator wiring. Skips cleanly when neo4j is not installed.
- changelogs/v0.5.0/hourdays_2026-06-09.log: entry per .cursorrules
format (user prefix [hourdays] + today's date). The changelog also
documents the known limitations on this branch (reasoning no-op,
settings.js wiring, Build page labels, paginated SQL conditions,
databricks_secret auth resolution).
Mirrors the Lakebase pattern: when the active engine is "neo4j",
saveGraphDbSettings dispatches to mergeNeo4jPanelIntoConfigTextarea(),
which reads the Neo4j form fields from #neo4j-section and serialises
them into the shared #graphEngineConfig textarea. The existing save
path then POSTs the JSON to /settings/graph-engine-config.
- src/front/static/config/js/settings.js:
- saveGraphDbSettings: add neo4j branch alongside lakebase.
- mergeNeo4jPanelIntoConfigTextarea(): NEW — reads uri, database,
auth_method, encrypted, and either (username, password) or
(secret_scope, secret_key) depending on auth_method; writes JSON
to #graphEngineConfig.
- applyNeo4jAuthMethodVisibility(): NEW — toggles .neo4j-auth-basic
vs .neo4j-auth-databricks-secret field groups based on the auth
method dropdown. Runs on load + on each change.
- Live field listeners (input/change) on the 8 form fields keep the
textarea in sync as the user edits — same UX as Lakebase.
- Test-connection button: surface a friendly "deferred to follow-up"
message for now so the button isn't silently broken.
End-to-end save now works: select Neo4j from the dropdown, fill the
Neo4j section form, click Save — engine_config persists via the same
endpoint Lakebase uses.
5-step procedure to validate the Neo4j engine end-to-end against a live Aura instance — switch engine, configure connection, run build, verify triples landed in Neo4j Browser, confirm Inference no-ops gracefully. Captures the screenshot artefacts expected in briefs/2026-06-09/1/ and the rollback path (just flip the dropdown back to Lakebase). Run this once before marking PR #47 ready-for-review.
Bug caught by the live E2E smoke test against the Ryan-provisioned
Aura instance: Neo4j 5+ CREATE CONSTRAINT only accepts single-label
patterns (FOR (n:Label)), so the original :Triple:<store_name> compound
label raised CypherSyntaxError on create_table.
Fix: switch every per-store triple node from `:Triple:<store>` to
:`<store>` (a single backtick-quoted label per logical store). The
SPO uniqueness constraint, MERGE writes, MATCH reads, and the SHOW
CONSTRAINTS existence check all work against this simpler schema.
Verified end-to-end against neo4j+s://b4810af7.databases.neo4j.io:
✓ create_table → constraint installed
✓ table_exists → True
✓ insert_triples(n=11) → 11 nodes written via UNWIND/MERGE
✓ count_triples → 11
✓ query_triples → returns all 11 with subject/predicate/object
✓ find_subjects_by_type → returns both customers
✓ get_aggregate_stats → total=11, distinct_subjects=5,
distinct_predicates=4,
type_assertion_count=5,
label_count=3
✓ get_entity_metadata → {type, label} for each customer
✓ expand_entity_neighbors → typed neighbors of C1
Also adds the runnable smoke test as a committed artifact so future
contributors can replay the verification:
tests/integration/neo4j_e2e_smoke.py
Reads credentials from
~/Documents/CODE/ontobricks/briefs/2026-05M-12/5/neo4j_connection_details.txt
(gitignored).
Docstring comments updated to mention the single-label scheme. No
other callers reference the old :Triple supertype.
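The single-label scheme can be illustrated with the Cypher strings that create_table and insert_triples might emit. This is a sketch under assumptions: the helper names, the constraint name suffix `_spo`, the sanitisation rule, and the batch size are invented here and may differ from Neo4jStore.

```python
import re
from typing import Dict, Iterable, Iterator, List

Triple = Dict[str, str]


def sanitise_label(store: str) -> str:
    """Reduce a store name to characters that are safe inside backticks."""
    label = re.sub(r"[^A-Za-z0-9_]", "_", store)
    if not label:
        raise ValueError("store name must not be empty")
    return label


def constraint_cypher(store: str) -> str:
    """SPO uniqueness constraint on one label (Neo4j 5+ syntax)."""
    label = sanitise_label(store)
    return (
        f"CREATE CONSTRAINT `{label}_spo` IF NOT EXISTS "
        f"FOR (t:`{label}`) "
        "REQUIRE (t.subject, t.predicate, t.object) IS UNIQUE"
    )


def merge_cypher(store: str) -> str:
    """Batched upsert: one MERGE per row supplied in the $rows parameter."""
    label = sanitise_label(store)
    return (
        "UNWIND $rows AS row "
        f"MERGE (t:`{label}` {{subject: row.subject, "
        "predicate: row.predicate, object: row.object})"
    )


def batches(rows: Iterable[Triple], size: int = 500) -> Iterator[List[Triple]]:
    """Yield successive lists of at most `size` rows."""
    batch: List[Triple] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With the official Python driver, each batch would typically be sent as `session.run(merge_cypher(store), rows=batch)`. Because MERGE matches on all three properties, replaying the same batch does not create duplicate nodes.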
The Build / Digital Twin Information page's "Graph DB" card was
hardcoded to show "Graph DB (Lakebase)" regardless of the active
engine. Now reads from dt.graph_engine and maps to the matching label:
lakebase → "Graph DB (Lakebase)"
neo4j → "Graph DB (Neo4j)"
other → "Graph DB (<engine>)" / "Graph DB Digital Twin" fallback
Updated:
- src/front/static/domain/js/domain-validation.js (line 456): domain
validation card.
- src/front/static/query/js/query-sync.js (line 156): Digital Twin
sync page.
The template default text "Graph DB (Lakebase)" stays for the
pre-hydration frame; JS overrides it on first render based on the
configured engine.
app.yaml.template's uv run command only included `--extra lakebase`,
so the deployed app didn't install the optional `neo4j` driver group.
At runtime that left `NEO4J_AVAILABLE = False` and any graph-facing
route (Knowledge Graph view, Inference, GraphQL, Graph Chat) raised
``InfrastructureError("Graph backend is not configured")`` even when
the admin had selected Neo4j and saved the engine config.
Add `--extra neo4j` alongside `--extra lakebase` so both engines
are available in the deployed app regardless of which one is active
at the time of deploy. Mirrors the Lakebase pattern (admin can flip
without redeploying). ~5MB extra deploy footprint when Neo4j is
unused.
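The failure mode described above comes down to an availability check on an optional dependency. A minimal sketch, assuming a hypothetical `driver_available` helper and a local stand-in for the application's InfrastructureError (the real GraphDBFactory guard may be written differently, for example as a try/except around the import):

```python
import importlib.util


class InfrastructureError(RuntimeError):
    """Local stand-in for the application's error type (assumed shape)."""


def driver_available(module_name: str = "neo4j") -> bool:
    """True when the optional driver package can be imported."""
    return importlib.util.find_spec(module_name) is not None


def require_graph_backend(engine: str, module_name: str) -> str:
    """Hypothetical guard: refuse an engine whose driver is not installed."""
    if not driver_available(module_name):
        raise InfrastructureError("Graph backend is not configured")
    return engine
```

Installing both extras at deploy time makes this check pass for either engine, so switching engines in Settings does not need a redeploy.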
TL;DR
- neo4j+s://b4810af7…: 9-step Python smoke test using THIS branch's Neo4jStore writes 11 triples → visible in Neo4j Browser.
- :Triple:<store> compound label is invalid in Neo4j 5+ CREATE CONSTRAINT; fix lands in this PR (732f6f9).
- develop; PR diff is against develop (the branch-overview "ahead of master" is a UI artifact of master being the repo default).

Proof artefacts
Local proof folder: ~/Documents/CODE/ontobricks/briefs/2026-06-09/1/ (gitignored per topic-layout)
- 01-settings-dropdown-before.png: Neo4j (Bolt) in dropdown
- 03-neo4j-section-empty-form.png: config form renders
- 04-neo4j-auth-toggle-secret-mode.png: JS auth-method toggle works
- 05-neo4j-form-filled-basic-auth.png: form accepts input, live JS sync to hidden textarea verified
- 07-neo4j-config-saved-persisted.png: GET /settings/graph-engine returns {graph_engine: "neo4j"}
- 08-neo4j-browser-11-triples-visible.png: Neo4j Browser graph view, 11 nodes labelled pr47_smoke_2026_06_09
- 09-neo4j-browser-table-view.png: same 11 nodes table view, full W3C URIs in subject/predicate/object
- FINDINGS.md: full E2E narrative including the bug-fix story
Smoke-test artefact (committed):
- tests/integration/neo4j_e2e_smoke.py: runnable by any contributor with neo4j>=5.0 + the Aura creds file.

What this PR ships
When a user picks Neo4j in Settings → Triple store → Global and configures URI/database/auth in Settings → Triple store → Neo4j, the entire OntoBricks stack works against a Neo4j Aura (or self-hosted) instance:
- UNWIND+MERGE on :<store>:-labelled nodes).
- SWRLFlatCypherTranslator: currently scaffolded (returns None + warns), full translation in a follow-up PR.
Lakebase remains the default; existing Lakebase deployments are unaffected.

Architecture decisions
- (:<sanitised_table_name> {subject, predicate, object}) nodes. The original idea of a :Triple:<store> compound label was abandoned because Neo4j 5+ rejects compound labels in CREATE CONSTRAINT.
- execute_query raises NotImplementedError. All writes go through the ontology-validated build pipeline, which preserves C2 ("entry goes through the ontology", Benoit 20/05).
- sync_to_remote / sync_from_remote / local_path are no-ops.
- engine_config keys: uri, database, auth_method (basic | databricks_secret), credentials, encrypted.

Open questions for Benoit
- execute_query → NotImplementedError. Aligned with the "entry goes through the ontology" rule from 20/05?

Commits on this branch

Test plan: all green ✅
- python3 -m py_compile on every changed .py: OK
- node --check on settings.js: OK
- make bundle-validate on dev-lakebase target: clean
- make deploy to fevm-mjolnir: exit 0 (apps RUNNING)
- tests/integration/neo4j_e2e_smoke.py: all 9 assertions pass
- GET /settings/graph-engine returns neo4j; config has uri/db/auth/password
- make test in repo's uv venv: couldn't run on this machine (uv blocked from PyPI). Static + import + live E2E proven instead.

cc @benoitcayladbx: branch is ready-for-review on the code AND on visible proof.