diff --git a/website/blog/posts/2026-03-31-subqueries-deep-dive.md b/website/blog/posts/2026-03-31-subqueries-deep-dive.md
new file mode 100644
index 0000000000..303b864916
--- /dev/null
+++ b/website/blog/posts/2026-03-31-subqueries-deep-dive.md
@@ -0,0 +1,386 @@
+---
+title: 'Expressive subqueries without resyncs — the engineering deep dive'
+description: >-
+  A deep dive into how we rebuilt Electric's subquery support — DNF decomposition, splice-point move handling, reverse-indexed stream routing, and oracle testing for correctness.
+excerpt: >-
+  We shipped a major upgrade to Electric's subquery support. More expressive cross-table filtering, no unnecessary resyncs when data moves in and out of shapes. This post dives into the engineering.
+authors: [rob, icehaunter]
+image: '/img/blog/subqueries/header.jpg'
+tags: [electric, subqueries, sync, engineering]
+outline: [2, 3]
+post: true
+published: true
+---
+
+[Subqueries](/docs/guides/shapes#subqueries) are a key feature of Electric's [Postgres Sync](/primitives/postgres-sync), enabling cross-table filtering when syncing [subsets of data](/docs/guides/shapes) into your apps.
+
+With [v1.X](#) we've shipped a major upgrade to subqueries that makes them more expressive and powerful, supporting more real-world data loading patterns and solving edge cases where data moving into and out of shapes caused unnecessary resyncs.
+
+This post dives into the engineering details and shows how we approached the problem with techniques including [DNF decomposition](#arbitrary-expressive-subqueries-via-dnf-decomposition) and [multi-timeline reverse indexes](#replication-stream-routing-with-reverse-indexes), and how we used our [Postgres oracle tests](#oracle-testing-for-correctness) to verify correctness and consistency.
+
+> [!Warning] ✨  Try it now
+> [Read the docs](/docs/guides/shapes#subqueries) and the [release notes](#), and try the [move-in move-out visualiser](#) demo app.
+
+## Understanding subqueries
+
+- Electric syncs subsets of Postgres into local apps using shapes — table +
+  where clause + optional columns
+- Real-world apps have relational data; you often need to filter what you sync
+  based on related tables
+- Subqueries solve this:
+  `WHERE user_id IN (SELECT user_id FROM memberships WHERE org_id = $1)`
+- This is how you get "sync the users in my org" or "sync tasks for active
+  projects" — the bread and butter of app data loading
+- When the underlying data changes — a user joins an org, a project gets
+  archived — rows need to move in and out of shapes dynamically
+
+### Previous limitations
+
+- Our previous subquery support handled the common case well but was
+  constrained: single subquery per shape, limited boolean logic
+- The really hard problem: when related data changes and rows move in/out, how
+  do you update the shape incrementally without resending everything?
+- You need to know exactly which rows are newly included or excluded, at a
+  precise point in the replication stream, without race conditions or duplicates
+- Real-world apps with complex filtering logic and dynamic relational data were
+  hitting these limitations, causing unnecessary resyncs and data reloading
+
+## Arbitrary expressive subqueries via DNF decomposition
+
+- Previously limited to a single subquery per shape with constrained boolean
+  logic
+- Now supports arbitrary combinations: `WHERE x IN sq1 OR y IN sq2`, `AND`,
+  `NOT IN`, nested expressions
+- Show before/after examples of what you can now express
+
+### Why DNF?
+
+- The core insight: a single subquery move maps cleanly to one "what newly
+  entered?"
+  query
+- With `WHERE x IN sq1 OR y IN sq2`, a move in sq1 should only fetch rows
+  newly included by sq1 that weren't already present via sq2
+- DNF gives the right planning unit — each disjunct is one independent reason
+  a row can be in a shape
+- A move only affects the disjuncts that reference the changed dependency
+- Move-in queries can be scoped to just those disjuncts
+
+### DNF compilation
+
+- Where clause is normalised to positive DNF:
+  `(term AND term) OR (term AND term) OR ...`
+- Each term is either a plain row predicate or a positive `IN (SELECT ...)`
+  subquery predicate
+- The shape keeps compiled metadata: disjuncts, position count,
+  dependency-to-position mapping
+
+### The NOT problem
+
+- NOT with subqueries is genuinely hard — negation breaks the clean
+  disjunct-scoping model
+
+## Efficient move-in/move-out without resync
+
+- When related data changes, rows move in and out of shapes — a user joins an
+  org, a project gets archived
+- Previously this triggered a full resync — client gets a 409, reloads
+  everything
+- Now Electric computes exactly which rows are newly included or excluded and
+  streams just those changes
+
+### The splice model
+
+- Core idea: buffer replication stream changes while running a precise move-in
+  query, then splice the results into the stream at exactly the right point
+
+### Move-in planning with DNF
+
+- A move in dependency D with new values V: identify impacted disjuncts, build
+  a candidate predicate scoped to just those disjuncts, exclude rows already
+  present via other disjuncts
+- The query is narrow — only fetches rows that are genuinely new to the shape
+
+### Move-out handling
+
+- Move-outs are simpler — emit position-aware broadcasts, clients re-evaluate
+  inclusion locally
+- No query needed; the client already has the row and just needs updated
+  active_conditions
+
+### Tags and active_conditions
+
+- Rows carry per-disjunct tags and per-position
+  active_conditions booleans
+- Clients evaluate inclusion: for each tag, AND the active_conditions at its
+  positions, OR the results across tags
+- Move broadcasts update active_conditions for rows already on the client
+  without resending the row data
+
+## Replication stream routing with reverse indexes
+
+- When a change arrives from Postgres, Electric needs to figure out which
+  shapes it's relevant to
+- With many shapes using subqueries, naively evaluating every shape's where
+  clause for every change doesn't scale
+- We use a reverse index backed by ETS that maps typed values to shape
+  handles — a single lookup finds candidate shapes instead of iterating
+
+### How the reverse index works
+
+- For positive predicates (`x IN sq`): look up the value, get shapes whose
+  membership contains it
+- For negated predicates (`x NOT IN sq`): complement at read time — all
+  negated shapes minus those containing the value
+- Candidates are verified against the full where clause to handle non-subquery
+  branches
+
+### Consistency during moves
+
+- During a move-in, the index needs to be broad enough to capture changes
+  relevant to both pre-splice and post-splice views
+- For positive dependencies: store the union of before and after membership
+- For negative dependencies: store the intersection
+- This may over-route changes, but correctness is enforced downstream by
+  `convert_change` using the right subquery view for the change's position
+  relative to the splice boundary
+
+## Oracle testing for correctness
+
+- Incremental view maintenance with splice points and multi-timeline routing
+  is hard to get right
+- Edge cases are combinatorial — boolean logic × move timing × concurrent
+  changes × multiple dependencies
+- Unit tests can't cover the state space; you need a fundamentally different
+  testing approach
+
+### Postgres as oracle
+
+- The core idea: Postgres already knows the right answer — run the full query
+  and compare
+- For any sequence of operations, the oracle runs the equivalent `SELECT`
+  against the current database state
+- Our incremental system must produce exactly the same result set at every
+  point
+
+### What we found
+
+## Next steps
+
+- Try it now: subqueries work with any where clause — see the
+  [shapes guide](/docs/guides/shapes#subqueries) for syntax and
+  examples
+- Subquery support ships in Electric vX.X — upgrade and start using richer
+  cross-table filtering in your shapes
+- Works with [TanStack DB 0.6](/blog/2026/03/25/tanstack-db-0.6-app-ready-with-persistence-and-includes)'s
+  query-driven sync for progressive data loading with relational filtering
+- Join the conversation on [Discord](https://discord.electric-sql.com) — we'd
+  love to hear what data loading patterns you're building
+
+***
+
diff --git a/website/data/blog/authors.yaml b/website/data/blog/authors.yaml
index 703cca5d39..9369395abc 100644
--- a/website/data/blog/authors.yaml
+++ b/website/data/blog/authors.yaml
@@ -46,6 +46,12 @@ kevindeporre:
   image: /img/team/kevin.jpg
   url: /about/team#kevin
 
+rob:
+  name: "Rob A'Court"
+  title: Founding Engineer
+  image: /img/team/rob.jpg
+  url: /about/team#rob
+
 tdrz:
   name: Tudor Zaharia
   title: Founding Engineer
diff --git a/website/docs/guides/shapes.md b/website/docs/guides/shapes.md
index e6b053e6b3..f5e68a5739 100644
--- a/website/docs/guides/shapes.md
+++ b/website/docs/guides/shapes.md
@@ -116,7 +116,7 @@ Where clauses have the following constraints:
 
 1. can't use non-deterministic SQL functions like `count()` or `now()`
 
-#### Subqueries (experimental)
+#### Subqueries
 
 Electric supports subqueries in where clauses, allowing you to filter rows based on data in other tables.
This enables cross-table filtering patterns—for example, syncing only users who belong to a specific organization:
 
diff --git a/website/public/img/blog/subqueries/header.jpg b/website/public/img/blog/subqueries/header.jpg
new file mode 100644
index 0000000000..3e7989c8ec
Binary files /dev/null and b/website/public/img/blog/subqueries/header.jpg differ