docs: subqueries deep dive blog post and technical briefs#4208

Open
balegas wants to merge 7 commits into main from vbalegas/subqueries-deep-dive

Contributor

@balegas balegas commented Apr 28, 2026

Summary

  • Adds the subqueries deep dive blog post draft (subqueries-deep-dive.md)
  • Adds the detailed technical brief / outline as a separate reference doc (subqueries-technical-brief.md)
  • Adds the shape indexing technical brief (shape-indexing-technical-brief.md)

Test plan

  • Review blog post content for accuracy
  • Review technical brief for completeness
  • Confirm outline is properly separated from the blog post

🤖 Generated with Claude Code

thruflo and others added 6 commits March 31, 2026 11:32
Structured outline for Rob to prose up in place. Covers DNF decomposition,
move-in/move-out splice model, reverse-indexed stream routing, and oracle
testing. Published: false.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add rob to blog authors. Draft title, description, excerpt. Fix
TanStack DB 0.6 link year (2025 → 2026). Remove missing image ref.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov Bot commented Apr 28, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1324 | 1 | 1323 | 26 |
View the top 1 failed test(s) by shortest run time
test/integration.test.ts > HTTP Sync > multiple clients can get the same data in parallel (liveSSE=true)
Stack Traces | 30s run time
Error: Test timed out in 30000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/integration.test.ts:492:21


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
balegas force-pushed the vbalegas/subqueries-deep-dive branch from b4d8200 to 83fef0b on April 28, 2026 12:52

netlify Bot commented Apr 28, 2026

Deploy Preview for electric-next ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 83fef0b |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/69f0ad784913a6000819d4e4 |
| 😎 Deploy Preview | https://deploy-preview-4208--electric-next.netlify.app |

Comment on lines +51 to +52

That isn't tolerable at any size. A resync means refetching data from the server and additional latency — every time somebody three tables away makes a small change. The shape was doing exactly what we'd built it to do. We just hadn't built it to do enough.
Contributor

That isn't tolerable at any size.

For small shapes it's fine, surely? In fact if it's smaller than the move-in query results it's actually the quicker path.

The issue is large shapes.

I liked how Claude put it in your session:

For a small shape that is acceptable. For a shape carrying tens of thousands of rows already sitting in IndexedDB on a user's laptop, watching the whole thing be torn down and rebuilt because somebody three timezones away was added to a workspace is, to put it mildly, not what you want.


The structure of that index tree is where DNF earns its place. A tree of indexed conditions is naturally a disjunction of conjunctions: each `AND`-chain is a path through the indexes that narrows the candidate set step by step, and each `OR` is an independent path that contributes its own candidates. Disjunctive normal form — a flat `OR` of `AND`s — is the canonical shape that maps onto this architecture exactly. Compile any boolean expression to DNF first, and every disjunct becomes an independent indexed path; arbitrary boolean structure becomes routable in constant time without changing the engine underneath.
Contributor

This is not the main reason we're using DNF. We're not currently using DNF in this area, though we plan to.
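
For readers following the thread, here is a minimal sketch of what "compile any boolean expression to DNF" means in the quoted paragraph. It is only an illustration: the nested-tuple encoding and the `to_nnf` / `to_dnf` names are assumptions made for this example, not Electric's implementation, and it says nothing about where the engine does or does not use DNF today.

```python
from itertools import product

# Hypothetical encoding: ("and", a, b, ...), ("or", a, b, ...), ("not", a),
# or a plain string naming an atomic predicate.

def to_nnf(expr, negate=False):
    """Push NOT down to the atoms using De Morgan's laws."""
    if isinstance(expr, str):
        return ("not", expr) if negate else expr
    op, *args = expr
    if op == "not":
        return to_nnf(args[0], not negate)
    if negate:
        op = "or" if op == "and" else "and"
    return (op, *[to_nnf(a, negate) for a in args])

def to_dnf(expr):
    """Return a list of disjuncts; each disjunct is a frozenset of literals."""
    def walk(e):
        if isinstance(e, str) or e[0] == "not":
            return [frozenset([e])]
        op, *args = e
        parts = [walk(a) for a in args]
        if op == "or":
            # OR: every operand contributes its own disjuncts
            return [d for p in parts for d in p]
        # AND distributes over OR: pick one disjunct from each operand
        return [frozenset().union(*combo) for combo in product(*parts)]
    return walk(to_nnf(expr))

where = ("and", "status = 'active'",
         ("or", "org_id IN (sq_orgs)", "team_id IN (sq_teams)"))
disjuncts = to_dnf(where)
negated = to_dnf(("not", ("or", "x = 3", "y IN (sq)")))
```

Here `disjuncts` holds two conjunctions, one per `OR` branch, each carrying the shared `status` predicate. Distribution can grow the number of disjuncts exponentially for deeply nested expressions, which a real compiler would need to bound.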


### Move-in planning with DNF

Most dependency changes touch a single disjunct. When a user joins an organisation, only the disjunct that depends on `org_id` needs new rows; the others are unaffected. DNF makes that targeted: each disjunct's predicates form a single, plannable SQL query, so the catch-up runs against just the rows that newly satisfy the changed disjunct.
Contributor

I'm not sure this is true. I think there are edge cases that prevent it from being this clean.


Most dependency changes touch a single disjunct. When a user joins an organisation, only the disjunct that depends on `org_id` needs new rows; the others are unaffected. DNF makes that targeted: each disjunct's predicates form a single, plannable SQL query, so the catch-up runs against just the rows that newly satisfy the changed disjunct.

The engine compiles each disjunct into a parametrised query at registration time. When a dependency moves, it picks the disjuncts that depend on the changed value, binds the new parameters, and runs the query at the dependency's commit LSN. The result is a row set scoped to those disjuncts, not a re-evaluation of the full WHERE clause.
Contributor

We should probably mention move-in broadcasts as well
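
The per-disjunct planning described in the quoted paragraphs could be sketched roughly as follows. Everything named here is hypothetical (the `Disjunct` record, `plan_move_in`, the `issues` table, the `user_orgs` / `user_teams` dependency names), and the sketch ignores the commit LSN, move-in broadcasts, and the edge cases raised earlier in this thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Disjunct:
    sql: str               # parametrised query compiled at registration time
    depends_on: frozenset  # names of the subqueries this disjunct references

def plan_move_in(disjuncts, changed_dep, new_values):
    """Return (sql, params) pairs for only the disjuncts that reference changed_dep."""
    return [
        (d.sql, {changed_dep: list(new_values)})
        for d in disjuncts
        if changed_dep in d.depends_on
    ]

# Hypothetical shape: issues visible through org membership OR team membership.
shape = [
    Disjunct("SELECT * FROM issues WHERE org_id = ANY(%(user_orgs)s)",
             frozenset({"user_orgs"})),
    Disjunct("SELECT * FROM issues WHERE team_id = ANY(%(user_teams)s)",
             frozenset({"user_teams"})),
]

# A user joins organisation 42: only the org disjunct is planned.
plan = plan_move_in(shape, "user_orgs", [42])
```

The point of the sketch is the filter on `depends_on`: the team disjunct never produces a query, so the catch-up work is scoped to the dependency that actually moved.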

<!-- STRUCTURAL: The performance/scaling section. Previous sections explain
correctness — this explains how we make it fast. Shorter section. -->

The matching engine has to answer two different questions about every shape with a subquery. On the hot path, when a change arrives from Postgres, the engine needs to know which shapes might be affected; the answer can be approximate, since downstream verification rejects false positives. Separately, when evaluating a shape's `WHERE` clause against a candidate row, the engine needs to know whether a specific value is currently in that shape's subquery view, and the answer has to be precise. Both questions hit the same underlying data: the set of values each shape's subquery currently matches.
Contributor

the answer can be approximate

"Approximate" sounds unreliable. Perhaps "conservative" would work better.
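
To make the "conservative" wording concrete, here is a small sketch of the two questions from the quoted paragraph: a routing lookup that may over-include shapes, followed by an exact membership check that rejects false positives. The data structures and function names are assumptions for illustration only, and the stale routing entry is left in place on purpose so that a false positive shows up.

```python
from collections import defaultdict

# Hypothetical structures, kept separate on purpose:
routes = defaultdict(set)   # value -> shape ids that *might* care (conservative)
members = defaultdict(set)  # shape id -> exact values in that shape's subquery view

def value_enters(shape_id, value):
    routes[value].add(shape_id)
    members[shape_id].add(value)

def value_leaves(shape_id, value):
    # Only the exact view is updated in this sketch; the routing entry stays
    # behind as a deliberate false positive.
    members[shape_id].discard(value)

def candidate_shapes(value):
    """Hot path: a superset of the affected shapes, never a subset."""
    return set(routes.get(value, ()))

def affected_shapes(value):
    """Precise answer: verify each candidate against its exact membership."""
    return {s for s in candidate_shapes(value) if value in members[s]}

value_enters("shape_a", 7)
value_enters("shape_b", 7)
value_leaves("shape_b", 7)

candidates = candidate_shapes(7)
affected = affected_shapes(7)
```

`candidates` still includes `shape_b`, while `affected` does not: the routing answer is allowed to be too wide, never too narrow.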

Comment on lines +245 to +247
Splitting these into two indexes would be the obvious move. The reason we don't is consistency. Every dependency change updates both — when a value enters a subquery's view, the routing entries that include the value and the exact-membership entries that confirm it are written together. Splitting the index would force coordination across two writes, and a window where the routing layer says "this shape cares" while the membership layer says "this shape doesn't include the value" is a window for silent inconsistency.

A single ETS write per value avoids that. Routing and membership see the same world.
Contributor

"Splitting these into two indexes"

  • They're not two indexes, and they are already separate: the former is an index and the latter is a materialized view in the consumer.

<!-- ASSET: Rob's annotated SQL or diagram showing the generated move-in query
for a concrete example -->

### Move-out handling
Contributor

We may also want to mention that move-ins become move-outs, and vice versa, with `x NOT IN (subquery)` or `NOT(x=3 OR y IN subquery)`.
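
A minimal sketch of that polarity flip, assuming a hypothetical tuple encoding where `("in", column, subquery)` marks a subquery reference: each enclosing `NOT` inverts what a value entering the subquery can cause for the shape. The function name and encoding are illustrative, not taken from the codebase, and a subquery referenced under both polarities would need more than the single label used here.

```python
def dependency_effects(expr, negated=False):
    """Map each subquery name to what a value *entering* that subquery can cause."""
    if isinstance(expr, str):
        return {}  # plain predicate, no subquery dependency
    op, *args = expr
    if op == "in":
        _column, subquery = args
        return {subquery: "move-out" if negated else "move-in"}
    if op == "not":
        # Each NOT flips the polarity of everything beneath it
        return dependency_effects(args[0], not negated)
    effects = {}
    for arg in args:  # "and" / "or" keep the current polarity
        effects.update(dependency_effects(arg, negated))
    return effects

plain = dependency_effects(("in", "x", "sq"))
not_in = dependency_effects(("not", ("in", "x", "sq")))
nested = dependency_effects(("not", ("or", "x = 3", ("in", "y", "sq"))))
double = dependency_effects(("not", ("not", ("in", "x", "sq"))))
```

Under a single `NOT`, a value entering the subquery can only remove rows from the shape, so the engine would treat it as a move-out candidate; a double negation restores the original direction.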

3 participants