⚡ Optimize Graph Entity Similarity Search #494
DESCRIPTION: Optimized the search for similar graph entities by replacing a Python-side N+1 query pattern with recursive CTEs.
IMPACT: Reduced execution time for finding similar entities from ~1.03s to ~0.02s for 1000 candidate entities.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
📝 Walkthrough
The PR refactors
Changes: Graph manager query flow
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ippoc/mnemosyne/graph/manager.py (1)
61-104: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Initialize the database before the write path opens a session.
`add_triple` still skips `await self.init_db()`, so a fresh `GraphManager` can fail with `no such table: kg_entities` unless every caller remembers to bootstrap first.
🔧 Proposed fix
     async def add_triple(
         self,
         source: str,
         relation: str,
         target: str,
         source_type="Concept",
         target_type="Concept",
     ):
         """
         Adds (Source) -> [Relation] -> (Target) to the graph.
         Idempotent (get_or_create).
         """
    +    # Placed after the docstring so the string literal stays the method's docstring.
    +    await self.init_db()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ippoc/mnemosyne/graph/manager.py` around lines 61 - 104, The write path in add_triple still opens a session without ensuring the schema exists, so a fresh GraphManager can hit missing-table errors. Update GraphManager.add_triple to initialize the database before any session work by awaiting self.init_db() at the start of the method, and keep the rest of the get_or_create / Relation insert logic unchanged.
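The finding above can be illustrated with a minimal sketch, assuming a SQLite backend. `GraphManagerSketch`, the `kg_relations` table, the column names, and the `_initialized` flag are hypothetical and are not taken from the project's `manager.py`; only `init_db`, `add_triple`, and `kg_entities` appear in the review. The point is that an idempotent `init_db()` makes it cheap to call at the start of every write.

```python
import asyncio
import sqlite3


class GraphManagerSketch:
    """Hypothetical stand-in for GraphManager; not the project's real class."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)
        self._initialized = False  # assumed guard flag, not from the PR

    async def init_db(self) -> None:
        # Idempotent: safe to call before every write. There is no await
        # between the check and the flag update, so tasks on one event
        # loop cannot interleave here.
        if self._initialized:
            return
        self._conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS kg_entities (
                id INTEGER PRIMARY KEY,
                name TEXT UNIQUE
            );
            CREATE TABLE IF NOT EXISTS kg_relations (
                source_id INTEGER,
                relation TEXT,
                target_id INTEGER,
                UNIQUE (source_id, relation, target_id)
            );
            """
        )
        self._initialized = True

    def _get_or_create(self, name: str) -> int:
        # INSERT OR IGNORE keeps the entity unique; then read back its id.
        self._conn.execute(
            "INSERT OR IGNORE INTO kg_entities (name) VALUES (?)", (name,)
        )
        row = self._conn.execute(
            "SELECT id FROM kg_entities WHERE name = ?", (name,)
        ).fetchone()
        return row[0]

    async def add_triple(self, source: str, relation: str, target: str) -> None:
        """Adds (source) -> [relation] -> (target); idempotent."""
        await self.init_db()  # bootstrap the schema before any table access
        source_id = self._get_or_create(source)
        target_id = self._get_or_create(target)
        self._conn.execute(
            "INSERT OR IGNORE INTO kg_relations VALUES (?, ?, ?)",
            (source_id, relation, target_id),
        )

    def count_relations(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM kg_relations").fetchone()[0]


async def demo() -> int:
    manager = GraphManagerSketch()  # fresh manager, init_db() never called by hand
    await manager.add_triple("rust", "is_a", "language")
    await manager.add_triple("rust", "is_a", "language")  # repeat is a no-op
    return manager.count_relations()
```

Running `asyncio.run(demo())` on a fresh manager does not raise `no such table`, because the write path bootstraps the schema itself.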
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e8c0c992-7dca-4c8c-b125-a1a2d7fbc64f
📒 Files selected for processing (2)
.jules/bolt.md
src/ippoc/mnemosyne/graph/manager.py
💡 What: Refactored `find_similar_entities` in `GraphManager` to use a CTE instead of iterating through all other entities and executing individual SQL queries.
🎯 Why: The previous implementation suffered from an N+1 query problem, fetching relations individually for every entity to compare, significantly slowing down relationship discovery as the graph grows.
📊 Measured Improvement: Execution time on a benchmark with 1000 entities dropped from ~1.03s to ~0.02s.
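The change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual query: the `kg_relations` table, its columns, the demo data, and the similarity measure (number of shared relation targets) are all hypothetical, and a plain `WITH` CTE is used here, whereas the PR description mentions recursive CTEs whose SQL is not shown. `similar_n_plus_one` issues one query per candidate entity; `similar_cte` gets the same answer in a single round trip.

```python
import sqlite3


def build_demo_graph() -> sqlite3.Connection:
    """Builds a tiny in-memory graph with an assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE kg_entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE kg_relations (source_id INTEGER, relation TEXT, target_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO kg_entities (id, name) VALUES (?, ?)",
        [(1, "python"), (2, "rust"), (3, "go"), (4, "language"),
         (5, "compiled"), (6, "reptile"), (7, "snake")],
    )
    conn.executemany(
        "INSERT INTO kg_relations VALUES (?, ?, ?)",
        [(1, "is_a", 4), (2, "is_a", 4), (2, "is", 5),
         (3, "is_a", 4), (3, "is", 5), (7, "is_a", 6)],
    )
    return conn


def similar_n_plus_one(conn: sqlite3.Connection, name: str):
    """Old pattern: one query for the entity list, then one more per entity."""
    target = {row[0] for row in conn.execute(
        "SELECT r.target_id FROM kg_relations r "
        "JOIN kg_entities e ON e.id = r.source_id WHERE e.name = ?", (name,))}
    scores = {}
    others = conn.execute(
        "SELECT id, name FROM kg_entities WHERE name != ?", (name,)).fetchall()
    for other_id, other_name in others:
        # This per-entity query is the "+1" repeated N times.
        neighbors = {row[0] for row in conn.execute(
            "SELECT target_id FROM kg_relations WHERE source_id = ?", (other_id,))}
        shared = len(target & neighbors)
        if shared:
            scores[other_name] = shared
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


SIMILAR_SQL = """
WITH target_neighbors AS (
    SELECT DISTINCT r.target_id
    FROM kg_relations r
    JOIN kg_entities e ON e.id = r.source_id
    WHERE e.name = :name
)
SELECT e.name, COUNT(DISTINCT r.target_id) AS shared
FROM kg_relations r
JOIN target_neighbors t ON t.target_id = r.target_id
JOIN kg_entities e ON e.id = r.source_id
WHERE e.name != :name
GROUP BY e.id, e.name
ORDER BY shared DESC, e.name ASC
"""


def similar_cte(conn: sqlite3.Connection, name: str):
    """New pattern: the database computes all overlaps in one statement."""
    return [(n, s) for n, s in conn.execute(SIMILAR_SQL, {"name": name})]
```

With the demo data, both functions rank `go` ahead of `python` as neighbors of `rust`. The single-statement version avoids a round trip per candidate, which is the kind of saving the PR reports, although the numbers measured in the PR come from its own benchmark and not from this sketch.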
PR created automatically by Jules for task 17118169448170855412 started by @Theory903
Summary by CodeRabbit
Documentation
Refactor