Large query handling for SQLGraph #281

Open
yfukai wants to merge 2 commits into royerlab:main from yfukai:large_query

@yfukai yfukai commented Apr 14, 2026

This pull request adds a mechanism for handling large lists of IDs in SQL queries. It prevents SQL variable overflow errors by switching dynamically between inline IN clauses and temporary scratch tables. A new _SqlIdSet helper class encapsulates this logic, all relevant filtering and degree calculation code paths now use it, and new tests verify correctness, especially in edge cases where the number of IDs approaches backend-imposed limits.

Key changes include:

Core SQL handling improvements:

  • Introduced the _SqlIdSet class, which automatically decides whether to use an inline IN clause or a temporary scratch table based on the number of IDs and the number of times they are used in a query, preventing SQL variable overflow (OperationalError: too many SQL variables). Scratch tables are created and cleaned up as needed, with automatic resource management using weakref.finalize.
  • Updated all filtering logic in SQLGraph.filter, overlaps, and _get_degree to use _SqlIdSet, ensuring consistent handling of large ID sets across all query paths.
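
The switch between the two strategies can be sketched roughly as follows. This is a minimal illustration built on Python's standard `sqlite3` module, not the actual `_SqlIdSet` implementation from this PR. The `nodes` table, the `select_nodes_by_ids` function, the `scratch_ids_N` table names, and the 999-variable limit are all assumptions made for the example.

```python
import sqlite3
from itertools import count

# Assumed limit for illustration only; the real value depends on the backend and build.
MAX_VARS = 999
_scratch_counter = count()


def select_nodes_by_ids(conn, ids, n_uses=1, max_vars=MAX_VARS):
    """Select rows of a hypothetical `nodes` table whose node_id is in `ids`.

    `n_uses` is how many times the ID list would appear in the full query;
    each appearance consumes one bound variable per ID.
    """
    ids = list(ids)
    if not ids:
        return []

    if len(ids) * n_uses <= max_vars:
        # Small enough: bind every ID as a parameter of an inline IN clause.
        placeholders = ",".join("?" * len(ids))
        sql = (
            f"SELECT node_id FROM nodes "
            f"WHERE node_id IN ({placeholders}) ORDER BY node_id"
        )
        return conn.execute(sql, ids).fetchall()

    # Too many variables: stage the IDs in a temporary scratch table instead.
    name = f"scratch_ids_{next(_scratch_counter)}"
    conn.execute(f"CREATE TEMP TABLE {name} (id INTEGER PRIMARY KEY)")
    try:
        conn.executemany(
            f"INSERT OR IGNORE INTO {name} (id) VALUES (?)",
            ((i,) for i in ids),
        )
        sql = (
            f"SELECT node_id FROM nodes "
            f"WHERE node_id IN (SELECT id FROM {name}) ORDER BY node_id"
        )
        return conn.execute(sql).fetchall()
    finally:
        # Always clean up, even if the query itself failed.
        conn.execute(f"DROP TABLE IF EXISTS {name}")
```

In this sketch the scratch table lives only for the duration of one call. The PR description instead ties the table's lifetime to the helper object, which allows the same ID set to be reused across several statements.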

Testing and validation:

  • Added new tests in test_subgraph.py that create graphs with enough nodes to trigger the scratch-table code path, verify correct filtering and degree calculations, and ensure the cutoff logic accounts for the number of ID occurrences per query. This includes edge cases near the cutoff boundary.
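
The boundary arithmetic that such tests exercise can be illustrated with a small sketch. The function names and the limit of 999 are hypothetical and chosen only to make the numbers concrete; the PR's real cutoff may be computed differently.

```python
def needs_scratch_table(n_ids: int, n_uses: int, max_vars: int) -> bool:
    """Return True when binding the IDs inline would exceed the variable limit.

    Each time the ID list appears in a query, it consumes one bound
    variable per ID, so the total is n_ids * n_uses.
    """
    return n_ids * n_uses > max_vars


def max_inline_ids(n_uses: int, max_vars: int) -> int:
    """Largest ID count that still fits inline for the given number of uses."""
    return max_vars // n_uses


# With an assumed limit of 999 and the list referenced twice in one query
# (for example, once for source IDs and once for target IDs):
assert max_inline_ids(n_uses=2, max_vars=999) == 499
assert not needs_scratch_table(499, n_uses=2, max_vars=999)  # 998 variables
assert needs_scratch_table(500, n_uses=2, max_vars=999)      # 1000 variables
```

A cutoff that compared only the ID count against the limit would treat 500 IDs as safe here, which is the kind of off-by-a-factor error the occurrence-aware tests guard against.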

Internal utilities and cleanup:

  • Added helper functions for dropping scratch tables and closing _SqlIdSet instances, with error handling and logging for robustness during interpreter shutdown or unexpected errors.
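
One way such cleanup can be arranged is sketched below, again with the standard `sqlite3` module rather than the PR's actual code. `ScratchIds` and `_drop_scratch_table` are hypothetical names for this illustration. The only library behavior relied on is that a `weakref.finalize` callback runs at most once, whether triggered explicitly, by garbage collection, or at interpreter exit.

```python
import logging
import sqlite3
import weakref

logger = logging.getLogger(__name__)


def _drop_scratch_table(conn, table_name):
    """Best-effort drop that never raises.

    It may run during interpreter shutdown or after the connection was closed.
    """
    try:
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    except Exception:
        logger.debug("Could not drop scratch table %s", table_name, exc_info=True)


class ScratchIds:
    """Hypothetical owner of a temporary table holding a set of IDs."""

    def __init__(self, conn, table_name, ids):
        self.table_name = table_name
        conn.execute(f"CREATE TEMP TABLE {table_name} (id INTEGER PRIMARY KEY)")
        conn.executemany(
            f"INSERT OR IGNORE INTO {table_name} (id) VALUES (?)",
            ((i,) for i in ids),
        )
        # The callback must not reference `self`; otherwise the object
        # would be kept alive and never finalized.
        self._finalizer = weakref.finalize(
            self, _drop_scratch_table, conn, table_name
        )

    def close(self):
        # Calling a finalizer more than once is a no-op after the first call.
        self._finalizer()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Explicit `close()` (or a `with` block) gives deterministic cleanup, while the finalizer acts as a safety net for objects that are simply dropped.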

These changes make the SQL backend resilient to large filter operations and improve maintainability by centralizing the handling of SQL variable limits.

@yfukai yfukai changed the title from "Large query" to "Large query handling for SQLGraph" on Apr 14, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 91.78082% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.69%. Comparing base (6ea4fbf) to head (c43ac58).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tracksdata/graph/_sql_graph.py | 91.78% | 5 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
+ Coverage   87.62%   87.69%   +0.06%     
==========================================
  Files          57       57              
  Lines        4865     4924      +59     
  Branches      858      864       +6     
==========================================
+ Hits         4263     4318      +55     
- Misses        380      384       +4     
  Partials      222      222              
