Large query handling for SQLGraph by yfukai · Pull Request #281 · royerlab/tracksdata

yfukai · 2026-04-14T03:22:25Z

This pull request introduces a mechanism for handling large lists of IDs in SQL queries. It prevents SQL variable overflow errors by switching dynamically between inline IN clauses and temporary scratch tables. It adds a new _SqlIdSet helper class to encapsulate this logic, updates all relevant filtering and degree calculation code paths to use it, and adds tests for correctness, especially around edge cases where the number of IDs approaches backend-imposed limits.

Key changes include:

Core SQL handling improvements:

Introduced the _SqlIdSet class, which chooses between an inline IN clause and a temporary scratch table based on the number of IDs and the number of times they are used in a query. This prevents SQL variable overflow (OperationalError: too many SQL variables). Scratch tables are created and cleaned up as needed, with automatic resource management using weakref.finalize.
Updated all filtering logic in SQLGraph.filter, overlaps, and _get_degree to use _SqlIdSet, ensuring consistent handling of large ID sets across all query paths.
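
As a rough illustration of the inline-versus-scratch-table decision described above, here is a minimal sketch using the standard-library sqlite3 module. It is not the _SqlIdSet implementation from this PR: the function names, the table naming scheme, and the deliberately tiny MAX_SQL_VARIABLES value are assumptions made for the example.

```python
import sqlite3
from itertools import count

# Hypothetical limit, kept tiny so the switch is easy to trigger.
# Real backends impose their own (much larger) limits.
MAX_SQL_VARIABLES = 10

_scratch_counter = count()


def id_filter_clause(conn, column, ids, n_uses=1, max_vars=MAX_SQL_VARIABLES):
    """Return (sql_fragment, params, scratch_table_or_None) for `column IN ids`."""
    ids = list(ids)
    # Every occurrence of the ID list in the final query binds len(ids) variables.
    if len(ids) * n_uses <= max_vars:
        placeholders = ", ".join("?" for _ in ids)
        return f"{column} IN ({placeholders})", ids, None

    # Too many variables: stage the IDs in a temporary table instead.
    table = f"scratch_ids_{next(_scratch_counter)}"
    conn.execute(f"CREATE TEMP TABLE {table} (id INTEGER PRIMARY KEY)")
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} (id) VALUES (?)", ((i,) for i in ids)
    )
    return f"{column} IN (SELECT id FROM {table})", [], table


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, t INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(i, i % 5) for i in range(100)])


def select_nodes(ids):
    """Filter nodes by ID and report whether a scratch table was needed."""
    clause, params, scratch = id_filter_clause(conn, "node_id", ids)
    try:
        query = f"SELECT node_id FROM nodes WHERE {clause} ORDER BY node_id"
        rows = conn.execute(query, params).fetchall()
    finally:
        if scratch is not None:
            conn.execute(f"DROP TABLE IF EXISTS {scratch}")
    return [r[0] for r in rows], scratch is not None
```

Both paths return the same rows; only the way the IDs reach the database differs, which is what lets callers stay unaware of the variable limit.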

Testing and validation:

Added new tests in test_subgraph.py that create graphs with enough nodes to trigger the scratch-table code path, verify correct filtering and degree calculations, and ensure the cutoff logic accounts for the number of ID occurrences per query. This includes edge cases near the cutoff boundary.
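
The kind of boundary check described above could look roughly like the following sketch. It assumes a simple cutoff rule (number of IDs times number of occurrences compared against the variable limit); the actual rule and test names in test_subgraph.py may differ, and uses_scratch_table and check_cutoff_boundary are hypothetical names.

```python
def uses_scratch_table(n_ids: int, n_occurrences: int, max_vars: int) -> bool:
    """Hypothetical cutoff rule: all bound variables must fit under the limit."""
    return n_ids * n_occurrences > max_vars


def check_cutoff_boundary(max_vars: int = 999) -> None:
    # One occurrence: exactly at the limit stays inline, one more ID switches.
    assert not uses_scratch_table(max_vars, 1, max_vars)
    assert uses_scratch_table(max_vars + 1, 1, max_vars)

    # Two occurrences of the same ID list (for example, filtering both ends
    # of an edge) halve the number of IDs that still fit inline.
    half = max_vars // 2
    assert not uses_scratch_table(half, 2, max_vars)
    assert uses_scratch_table(half + 1, 2, max_vars)
```

Checking both sides of the boundary for each occurrence count is what catches off-by-one mistakes in the cutoff.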

Internal utilities and cleanup:

Added helper functions for dropping scratch tables and closing _SqlIdSet instances, with error handling and logging for robustness during interpreter shutdown or unexpected errors.

These changes make the SQL backend resilient to large filter operations and improve maintainability by centralizing the handling of SQL variable limits.

codecov-commenter · 2026-04-14T03:26:17Z

Codecov Report

❌ Patch coverage is 91.78082% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.69%. Comparing base (6ea4fbf) to head (c43ac58).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tracksdata/graph/_sql_graph.py | 91.78% | 5 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
+ Coverage   87.62%   87.69%   +0.06%     
==========================================
  Files          57       57              
  Lines        4865     4924      +59     
  Branches      858      864       +6     
==========================================
+ Hits         4263     4318      +55     
- Misses        380      384       +4     
  Partials      222      222


yfukai added 2 commits April 14, 2026 12:09

working version

070e20d

further fix

c43ac58

yfukai changed the title ~~Large query~~ Large query handling for SQLGraph Apr 14, 2026

yfukai wants to merge 2 commits into royerlab:main from yfukai:large_query
