Code of Conduct
AI Policy
Is your feature request related to a problem? Please describe.
When a no_attributes? true relationship filter combines multiple parent() references with or, the generated SQL joins the parent relationships with LEFT OUTER JOINs before the final LATERAL join. This creates a cartesian product — if one parent relationship has N rows and another has M rows, the intermediate result has N×M rows, and the lateral join executes N×M times.
defmodule Post do
relationships do
has_many :tags_a, Tag do
no_attributes? true
filter expr(parent(author.favorite_tags.id) == id)
end
has_many :tags_b, Tag do
no_attributes? true
filter expr(parent(category.default_tags.id) == id)
end
# Combine both into one relationship
# When this relationship has any sort (explicit or via default_sort),
# the lateral subquery includes row_number() AS __order__,
# which breaks DISTINCT deduplication.
has_many :all_tags, Tag do
no_attributes? true
sort inserted_at: :asc # <-- any sort triggers the correctness bug
filter expr(
parent(tags_a.id) == id or
parent(tags_b.id) == id
)
end
end
end
The generated SQL looks roughly like:
SELECT DISTINCT post.id, lateral_result.*, lateral_result.__order__
FROM post
LEFT JOIN favorite_tag ON ... -- tags_a path (N rows)
LEFT JOIN LATERAL (...) AS resolved_a ON TRUE
LEFT JOIN default_tag ON ... -- tags_b path (M rows)
LEFT JOIN LATERAL (...) AS resolved_b ON TRUE
INNER JOIN LATERAL (
SELECT *, row_number() OVER (ORDER BY ...) AS __order__
FROM tag
WHERE resolved_a.id = tag.id OR resolved_b.id = tag.id
) AS lateral_result ON TRUE
There are two issues:
-
Cartesian product (performance) — The LEFT JOINs for tags_a (N rows) and tags_b (M rows) multiply, creating N×M intermediate rows. The final lateral join executes N×M times instead of once. This happens regardless of whether a sort exists.
-
Duplicate results when sort exists (correctness) — Ash applies SELECT DISTINCT to deduplicate the cartesian product. Without any sort, this works correctly — the same record always has identical column values across lateral executions, so DISTINCT removes all duplicates.
However, when the relationship has any sort (explicit sort in the relationship definition, default_sort from an extension, or sort applied via read_action), the lateral subquery includes row_number() OVER (ORDER BY ...) AS __order__ to preserve ordering, and this column is included in the DISTINCT evaluation.
Since each lateral execution operates on a different subset of matching tags (depending on which cartesian product row it sees), the same tag can receive different __order__ values across executions, making DISTINCT unable to deduplicate.
Concrete example:
tags_a matches: [tag_1(inserted_at=10:00), tag_2(inserted_at=12:00)]
tags_b matches: [tag_3(inserted_at=11:00)]
Lateral execution 1 (resolved_a=tag_1):
matching set = {tag_1(10:00), tag_3(11:00)}
→ tag_3 gets __order__ = 2
Lateral execution 2 (resolved_a=tag_2):
matching set = {tag_3(11:00), tag_2(12:00)}
→ tag_3 gets __order__ = 1
DISTINCT sees (tag_3, __order__=2) and (tag_3, __order__=1) as different rows.
Result: tag_3 appears twice.
Note: if the sort column values happen to be monotonically ordered across sources (e.g., all tags_a timestamps < all tags_b timestamps), __order__ stays consistent and DISTINCT works correctly. This makes the bug hard to reproduce in tests (where records are created sequentially with monotonic timestamps) but easy to trigger in production (where records from different sources have interleaved sort values).
Describe the solution you'd like
Approach 1: Automatic optimization (no syntax change)
When Ash encounters multiple parent(rel.id) == id conditions combined with or in a no_attributes? true relationship filter, it should automatically generate EXISTS subqueries instead of lateral joins. This requires no API change — the current expression:
filter expr(
parent(tags_a.id) == id or
parent(tags_b.id) == id
)
-- Current: lateral joins → cartesian product
FROM tag
JOIN LATERAL (...tags_a...) ON true
JOIN LATERAL (...tags_b...) ON true
WHERE tags_a.id = tag.id OR tags_b.id = tag.id
-- Proposed: EXISTS subqueries → no cartesian product
FROM tag
WHERE EXISTS (SELECT 1 FROM ...tags_a... WHERE tags_a.id = tag.id)
OR EXISTS (SELECT 1 FROM ...tags_b... WHERE tags_b.id = tag.id)
Approach 2: Explicit parent_exists expression
If automatic detection is too complex, a new expression could let users opt in explicitly:
filter expr(
parent_exists(:tags_a, id == parent(id)) or
parent_exists(:tags_b, id == parent(id))
)
Where parent_exists(rel, condition) means: "on the parent resource, check if rel has a record matching condition." Inside the condition, id refers to the relationship target's field, and parent(id) refers back to the current child resource's field.
This is analogous to exists/2 but operates on the parent's relationships from within a no_attributes? true filter context.
Describe alternatives you've considered
- Implementing a manual relationship that loads and unions the constituent relationships instead of relying on a single complex filter expression.
defmodule MyApp.Relationships.UnionMany do
use Ash.Resource.ManualRelationship
@impl true
def load(records, opts, ctx) do
relationships = opts |> Keyword.fetch!(:relationships)
ash = to_opts(ctx)
records =
records
|> Ash.load!(relationships, [lazy?: true] ++ ash)
{:ok,
Map.new(records, fn record ->
items =
relationships
|> Enum.flat_map(&(record |> Map.get(&1) |> List.wrap()))
|> Enum.uniq_by(& &1.id)
{record.id, items}
end)}
end
end
# Usage
has_many :all_tags, Tag do
manual {MyApp.Relationships.UnionMany, relationships: [:tags_a, :tags_b]}
end
-
Rewriting the filter using child-side FKs — Instead of parent(rel.id) == id, use child_fk == parent(field) and column == :value to avoid the extra joins. This works when the child has direct FKs but doesn't generalize to cases where 2+ parent() joins are unavoidable.
-
Using a calculation instead of a relationship — Loading the sub-relationships individually and concatenating. This works but loses the ability to further load nested relationships on the result.
Additional context
No response
Code of Conduct
AI Policy
Is your feature request related to a problem? Please describe.
When a no_attributes? true
relationshipfilter combines multipleparent()references withor, the generated SQL joins the parent relationships with LEFT OUTER JOINs before the final LATERAL join. This creates a cartesian product — if one parent relationship has N rows and another has M rows, the intermediate result has N×M rows, and the lateral join executes N×M times.The generated SQL looks roughly like:
There are two issues:
Cartesian product (performance) — The LEFT JOINs for
tags_a(N rows) andtags_b(M rows) multiply, creating N×M intermediate rows. The final lateral join executes N×M times instead of once. This happens regardless of whether a sort exists.Duplicate results when sort exists (correctness) — Ash applies
SELECT DISTINCTto deduplicate the cartesian product. Without any sort, this works correctly — the same record always has identical column values across lateral executions, soDISTINCTremoves all duplicates.However, when the relationship has any sort (explicit sort in the relationship definition,
default_sortfrom an extension, or sort applied viaread_action), the lateral subquery includesrow_number() OVER (ORDER BY ...) AS __order__to preserve ordering, and this column is included in theDISTINCTevaluation.Since each lateral execution operates on a different subset of matching tags (depending on which cartesian product row it sees), the same tag can receive different
__order__values across executions, makingDISTINCTunable to deduplicate.Concrete example:
Note: if the sort column values happen to be monotonically ordered across sources (e.g., all
tags_atimestamps < alltags_btimestamps),__order__stays consistent and DISTINCT works correctly. This makes the bug hard to reproduce in tests (where records are created sequentially with monotonic timestamps) but easy to trigger in production (where records from different sources have interleaved sort values).Describe the solution you'd like
Approach 1: Automatic optimization (no syntax change)
When Ash encounters multiple
parent(rel.id) == idconditions combined with or in ano_attributes? truerelationship filter, it should automatically generateEXISTSsubqueries instead of lateral joins. This requires no API change — the current expression:Approach 2: Explicit parent_exists expression
If automatic detection is too complex, a new expression could let users opt in explicitly:
Where
parent_exists(rel, condition)means: "on the parent resource, check ifrelhas a record matching condition." Inside the condition,idrefers to the relationship target's field, andparent(id)refers back to the current child resource's field.This is analogous to
exists/2but operates on the parent's relationships from within ano_attributes? truefilter context.Describe alternatives you've considered
Rewriting the filter using child-side FKs — Instead of
parent(rel.id) == id, usechild_fk == parent(field)andcolumn == :valueto avoid the extra joins. This works when the child has direct FKs but doesn't generalize to cases where 2+parent()joins are unavoidable.Using a calculation instead of a relationship — Loading the sub-relationships individually and concatenating. This works but loses the ability to further load nested relationships on the result.
Additional context
No response