docs: subqueries deep dive blog post and technical briefs #4208
Structured outline for Rob to prose up in place. Covers DNF decomposition, move-in/move-out splice model, reverse-indexed stream routing, and oracle testing. Published: false. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add rob to blog authors. Draft title, description, excerpt. Fix TanStack DB 0.6 link year (2025 → 2026). Remove missing image ref. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
❌ 1 test failed.
b4d8200 to 83fef0b
✅ Deploy Preview for electric-next ready!
| That isn't tolerable at any size. A resync means refetching data from the server and additional latency — every time somebody three tables away makes a small change. The shape was doing exactly what we'd built it to do. We just hadn't built it to do enough. |
> That isn't tolerable at any size.

For small shapes it's fine, surely? In fact, if the shape is smaller than the move-in query results, a resync is actually the quicker path.
The issue is large shapes.
I liked how Claude put it in your session:

> For a small shape that is acceptable. For a shape carrying tens of thousands of rows already sitting in IndexedDB on a user's laptop, watching the whole thing be torn down and rebuilt because somebody three timezones away was added to a workspace is, to put it mildly, not what you want.
| The structure of that index tree is where DNF earns its place. A tree of indexed conditions is naturally a disjunction of conjunctions: each `AND`-chain is a path through the indexes that narrows the candidate set step by step, and each `OR` is an independent path that contributes its own candidates. Disjunctive normal form — a flat `OR` of `AND`s — is the canonical shape that maps onto this architecture exactly. Compile any boolean expression to DNF first, and every disjunct becomes an independent indexed path; arbitrary boolean structure becomes routable in constant time without changing the engine underneath. |
This is not the main reason we're using DNF. We're not currently using DNF in this area, though we plan to.
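For anyone following the thread, here is a minimal sketch of what "compile any boolean expression to DNF" means in practice. It is illustrative Python, not Electric's implementation: the tuple encoding and the names `to_nnf` and `to_dnf` are assumptions made for this example. It pushes `NOT` down to the atoms with De Morgan's laws, then distributes `AND` over `OR`.

```python
from itertools import product

# Hypothetical encoding: an atom is a string; compound expressions are
# ("and", a, b, ...), ("or", a, b, ...) or ("not", a).

def to_nnf(expr, negate=False):
    """Push NOT down to the atoms (negation normal form)."""
    if isinstance(expr, str):
        return ("not", expr) if negate else expr
    op = expr[0]
    if op == "not":
        return to_nnf(expr[1], not negate)
    if negate:
        # De Morgan: negating swaps AND and OR
        op = "or" if op == "and" else "and"
    return (op, *[to_nnf(arg, negate) for arg in expr[1:]])

def _distribute(expr):
    # Literals (atoms or negated atoms) form a single one-literal disjunct.
    if isinstance(expr, str) or expr[0] == "not":
        return [[expr]]
    op, *args = expr
    parts = [_distribute(arg) for arg in args]
    if op == "or":
        # OR: concatenate the disjuncts of each branch
        return [disjunct for part in parts for disjunct in part]
    # AND: cross product of the branches' disjuncts
    return [[lit for d in combo for lit in d] for combo in product(*parts)]

def to_dnf(expr):
    """Return a list of disjuncts; each disjunct is a list of literals."""
    return _distribute(to_nnf(expr))

where = ("and", "status = 'open'", ("or", "org_id IN sq1", "team_id IN sq2"))
disjuncts = to_dnf(where)
```

Here `disjuncts` holds two conjunctions, one per subquery, each carrying the shared `status` predicate. Note that distribution can grow the expression exponentially in the worst case.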
| ### Move-in planning with DNF | ||
| Most dependency changes touch a single disjunct. When a user joins an organisation, only the disjunct that depends on `org_id` needs new rows; the others are unaffected. DNF makes that targeted: each disjunct's predicates form a single, plannable SQL query, so the catch-up runs against just the rows that newly satisfy the changed disjunct. |
I'm not sure this is true. I think there are edge cases that prevent it from being this clean.
| The engine compiles each disjunct into a parametrised query at registration time. When a dependency moves, it picks the disjuncts that depend on the changed value, binds the new parameters, and runs the query at the dependency's commit LSN. The result is a row set scoped to those disjuncts, not a re-evaluation of the full WHERE clause. |
We should probably mention move-in broadcasts as well.
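To make the edge-case concern from earlier in the thread concrete, here is a toy in-memory sketch of per-disjunct move-in selection. It is not the engine's SQL generation and says nothing about LSNs or broadcasts; the data model (`disjuncts` as lists of `(column, dependency)` pairs, `views` as dicts of value sets) and the function names are assumptions for illustration. It shows one case a naive "just query the changed disjunct" approach has to handle: a row that newly satisfies the changed disjunct may already be in the shape through another disjunct.

```python
def row_matches(row, disjuncts, views, only_dep=None):
    """True if the row satisfies any disjunct, optionally only those using only_dep."""
    for disjunct in disjuncts:
        deps = {dep for _, dep in disjunct}
        if only_dep is not None and only_dep not in deps:
            continue
        if all(row[col] in views[dep] for col, dep in disjunct):
            return True
    return False

def move_in_rows(rows, disjuncts, views_before, views_after, changed_dep):
    """Rows that enter the shape because changed_dep gained values."""
    return [
        row for row in rows
        # newly satisfied via a disjunct that depends on the changed subquery...
        if row_matches(row, disjuncts, views_after, only_dep=changed_dep)
        # ...and not already in the shape through any disjunct beforehand
        and not row_matches(row, disjuncts, views_before)
    ]

# Hypothetical shape: org_id IN (user_orgs) OR team_id IN (user_teams)
disjuncts = [[("org_id", "user_orgs")], [("team_id", "user_teams")]]
rows = [
    {"id": 1, "org_id": 10, "team_id": 7},  # already visible via team 7
    {"id": 2, "org_id": 10, "team_id": 9},
    {"id": 3, "org_id": 11, "team_id": 9},
]
before = {"user_orgs": set(), "user_teams": {7}}
after = {"user_orgs": {10}, "user_teams": {7}}  # the user joins org 10

moved_in = move_in_rows(rows, disjuncts, before, after, "user_orgs")
```

Only row 2 moves in: row 1 also matches the `org_id` disjunct now, but it was already in the shape via `team_id`, so sending it again would be a duplicate insert.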
| <!-- STRUCTURAL: The performance/scaling section. Previous sections explain | ||
| correctness — this explains how we make it fast. Shorter section. --> | ||
| The matching engine has to answer two different questions about every shape with a subquery. On the hot path, when a change arrives from Postgres, the engine needs to know which shapes might be affected; the answer can be approximate, since downstream verification rejects false positives. Separately, when evaluating a shape's `WHERE` clause against a candidate row, the engine needs to know whether a specific value is currently in that shape's subquery view, and the answer has to be precise. Both questions hit the same underlying data: the set of values each shape's subquery currently matches. |
> the answer can be approximate

"Approximate" sounds unreliable. Perhaps "conservative" would work better.
| Splitting these into two indexes would be the obvious move. The reason we don't is consistency. Every dependency change updates both — when a value enters a subquery's view, the routing entries that include the value and the exact-membership entries that confirm it are written together. Splitting the index would force coordination across two writes, and a window where the routing layer says "this shape cares" while the membership layer says "this shape doesn't include the value" is a window for silent inconsistency. | ||
| A single ETS write per value avoids that. Routing and membership see the same world. |
"Splitting these into two indexes"
- They're not two indexes, and they are separate: the former is an index and the latter is a materialized view in the consumer.
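A rough sketch of the distinction raised in the last two comments: a routing index that answers conservatively (it may over-report, never under-report) and a separate per-shape materialized view that answers exactly. This is plain Python for illustration only; the class names `ReverseIndex` and `ConsumerView` and the function `affected_shapes` are hypothetical and do not reflect the actual ETS layout or consumer code.

```python
from collections import defaultdict

class ReverseIndex:
    """Routing index: value -> ids of shapes that might care (may over-report)."""

    def __init__(self):
        self._shapes_by_value = defaultdict(set)

    def add(self, shape_id, value):
        self._shapes_by_value[value].add(shape_id)

    def candidates(self, value):
        # Copy so callers cannot mutate the index; no entry means no candidates.
        return set(self._shapes_by_value.get(value, ()))

class ConsumerView:
    """Per-shape materialized view of the subquery result (exact answer)."""

    def __init__(self, values=()):
        self._values = set(values)

    def includes(self, value):
        return value in self._values

def affected_shapes(index, views, value):
    """Route conservatively, then confirm against each shape's own view."""
    return {s for s in index.candidates(value) if views[s].includes(value)}

index = ReverseIndex()
index.add("shape_a", 10)
index.add("shape_b", 10)  # over-broad routing entry, for illustration
views = {"shape_a": ConsumerView({10}), "shape_b": ConsumerView({11})}

routed = index.candidates(10)
confirmed = affected_shapes(index, views, 10)
```

Routing returns both shapes; the exact check keeps only `shape_a`. A false positive in routing costs extra work, while a false negative would drop a change, which is why "conservative" describes the requirement better than "approximate".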
| <!-- ASSET: Rob's annotated SQL or diagram showing the generated move-in query | ||
| for a concrete example --> | ||
| ### Move-out handling |
We may also mention that move-ins become move-outs, and vice versa, with `x NOT IN (subquery)` or `NOT (x = 3 OR y IN (subquery))`.
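A small sketch of that polarity flip, assuming a simplified model where a dependency event is either a value being added to or removed from the subquery result. The function name `effect_on_shape` and the event strings are made up for this example.

```python
def effect_on_shape(dep_event, negated):
    """Map a subquery change to its effect on rows referencing the value.

    dep_event: "added" or "removed" (value entered or left the subquery result).
    negated:   True when the subquery sits under an odd number of NOTs,
               e.g. `x NOT IN (subquery)`.
    """
    if dep_event not in ("added", "removed"):
        raise ValueError(f"unknown event: {dep_event}")
    # Under negation, a value entering the subquery makes rows stop matching.
    gained = (dep_event == "added") != negated
    return "move-in" if gained else "move-out"

# Concrete check with `x NOT IN (subquery)`:
rows = [{"id": 1, "x": 5}, {"id": 2, "x": 6}]
view_before, view_after = set(), {5}  # 5 is added to the subquery result

in_shape_before = {r["id"] for r in rows if r["x"] not in view_before}
in_shape_after = {r["id"] for r in rows if r["x"] not in view_after}
```

Row 1 is in the shape before the change and gone after it, which matches `effect_on_shape("added", negated=True)` returning `"move-out"`. In the second example from the comment, `NOT (x = 3 OR y IN (subquery))` rewrites to `x <> 3 AND y NOT IN (subquery)` (leaving SQL NULL semantics aside), so the subquery is negated there too.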
Summary
- `subqueries-deep-dive.md`
- `subqueries-technical-brief.md`
- `shape-indexing-technical-brief.md`

Test plan
🤖 Generated with Claude Code