Omni lite #1875
Conversation
- Delete sql_create_temp_table_parser.py
- Remove add_temp_schema_to_graph from queries.py, queries_parser.py, and sql_create_table_parser.py
- Remove TEMP_TABLE/TEMP_COLUMN labels from model/query.py, queries_comparison.py, queries_dal.py, and sql_select_parser.py
- Fix shared.graph import paths to use nemo_retriever.tabular_data.ingestion.graph throughout
- Replace flatten_list with flat_list_recursive from utils.py
- Add sqloxide dependency to pyproject.toml
- Move populate_queries/populate_views to services/queries.py; update write_to_graph.py imports

Made-with: Cursor
…s constants; add parser tests
- Fix shared.graph import paths in sql_select_parser, sql_insert_into_parser, sql_update_table_parser, sql_merge_table_parser, and sql_view_parser
- Add SQLType, SQL, Parser, Views, Labels.ALIAS/OPERATOR/COMMAND/FUNCTION/CONSTANT, ArgsForSQLFunctions, SQLFunctions, JoinNodes, SQLFunctionsWithConsantArg stubs to reserved_words.py
- Add test_queries_parser.py: 25 unit tests covering pre_process, dispatch_sqls routing, and parse_single (dialect fallback, error cases, schema propagation)

Made-with: Cursor
…queries
- Simplify expand_info Cypher: drop document, zone, and user_participants filtering from n:term, n:attribute, n:analysis, and n:metric cases
- Remove certified_for_zones and zones fields from returned sql objects
- Remove user_participants from query params dict
- Fix imports in CandidateRetrievalAgent and inline get_question_for_processing
- Remove account_id and user_participants from CandidateRetrievalAgent.run and extract_candidates call

Made-with: Cursor
…lot extractor
- Delete old graph/parsers/sql/ per-statement parsers (select, insert, update, merge, create-table, view) and replace with a single sqlglot_extractor.py that resolves tables/columns for any SQL dialect without a full AST
- Rewrite services/queries.py: parse_query_slim + chunked populate_queries; remove populate_views, filter_and_sort_queries_df and related dead code
- Move parsers/sql/ and sqlglot_extractor.py out of graph/ sub-package into the canonical ingestion/parsers/ directory; delete empty graph/ folder
- Fix all .graph.* imports across dal/, model/, services/, parsers/
- Add services/schema.get_account_schemas(); fix Node→Neo4jNode throughout
- Remove dead queries_comparison.py and test_queries_parser.py

Made-with: Cursor
…and tests
- Replace sqllineage with pure sqlglot for column attribution; remove
sqllineage dependency from pyproject.toml
- Fix join_keys extraction: use select.args["from_"] (not "from") and
fall back to ON-equality detection when USING is converted at parse time
- Support schema-qualified tables (schema.table): qualified keys in
source_table_names and alias_map keep same-named tables from different
schemas distinct; _build_schema_dict stays flat for qualify() compat
- Remove lazy get_account_schemas() fallback; all_schemas is always
caller-provided (default {})
- Add test_sqlglot_extractor.py covering the two duckdb_connector queries
(RFM segmentation, CLV) plus edge-case tests, all without Neo4j
Made-with: Cursor
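The commits above rework column and table attribution around sqlglot. A minimal sketch of what that style of extraction can look like follows; it is illustrative only, and the function name, dialect handling, and schema qualification shown are assumptions, not the actual sqlglot_extractor.py API.

```python
import sqlglot
from sqlglot import exp


def extract_tables_and_columns(sql: str, dialect: str = "duckdb"):
    """Illustrative only: list referenced tables (schema-qualified when present) and columns."""
    tree = sqlglot.parse_one(sql, read=dialect)
    tables = {
        (f"{t.db}.{t.name}" if t.db else t.name)  # keep same-named tables from different schemas distinct
        for t in tree.find_all(exp.Table)
    }
    columns = {c.name for c in tree.find_all(exp.Column)}
    return tables, columns


# Example: a schema-qualified join, parsed with the DuckDB dialect
print(extract_tables_and_columns(
    "SELECT o.id, c.name FROM sales.orders o JOIN crm.customers c ON o.customer_id = c.id"
))
```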
- Remove dead functions from queries_dal.py (get_sql_by_id, etc.)
- Delete unused reserved_words classes (Views, SQL, RelTypes, SQLType, Parser)
- Remove utterance and default_schema params from Query
- Simplify parse_queries_df to return failed_queries instead of taking it as input
- Optimise table_to_schema lookup in sqlglot_extractor (use table_exists, two-pass strategy)
- Rename get_account_schemas → get_schemas; drop multi-DB loop (single DB only)
- Delete dead sql/queries_parser.py and its parsers/sql package
- Update test mock to implement table_exists

Made-with: Cursor
- Remove sql_node, utterance, default_schema, query_tag params from Query
- Drop reached_columns_ids field and its getter/setter (unused)
- Add parse_query_single for parsing a bare SQL string
- parse_queries_df returns failed_queries instead of taking it as input

Made-with: Cursor
Fixes flake8 F401; tests do not use pytest directly. Made-with: Cursor
Drop the extra top-level definition; the same implementation remains later in the module. Fixes flake8 F811. Made-with: Cursor
data_for_populate_tabular includes a queries DataFrame; extend EXPECTED_DATA_KEYS and assertions. Document extract_tabular_db_data return keys. Made-with: Cursor
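A rough sketch of the kind of assertion this adds; only the `queries` key and the `EXPECTED_DATA_KEYS` constant are taken from the commit message above, the other key names are placeholders.

```python
import pandas as pd

# Hypothetical expected keys; only "queries" is confirmed by the commit message above.
EXPECTED_DATA_KEYS = {"tables", "columns", "queries"}


def check_extract_payload(data: dict) -> None:
    # Every expected key must be present, and the queries payload must be a DataFrame.
    assert EXPECTED_DATA_KEYS <= set(data.keys())
    assert isinstance(data["queries"], pd.DataFrame)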
…a extract
- Add dialect attribute (default "sqlite") to SQLDatabase ABC so every connector advertises its SQL dialect.
- DuckDB connector already sets self.dialect = "duckdb".
- Fix TabularSchemaExtractOp.process() to read dialect from the data argument instead of the undefined tabular_params name.
- Pass dialect through extract_data so store_relational_db_in_neo4j receives the correct value.

Made-with: Cursor
Pass dialect=duckdb to match store_relational_db_in_neo4j signature; include queries in dummy_data for extract payload shape. Made-with: Cursor
…DuckDB dev connector
Remove class-level dialect default and dialect kwarg from SQLDatabase __init__. Expose dialect as a property; DuckDB connector returns "duckdb".

Made-with: Cursor
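A minimal sketch of the property-based shape described in this commit; the `DuckDBConnector` class name and the omission of other abstract methods are simplifications, not the actual SQLDatabase interface.

```python
from abc import ABC, abstractmethod


class SQLDatabase(ABC):
    """Connector base class: no class-level dialect default, no dialect kwarg in __init__."""

    @property
    @abstractmethod
    def dialect(self) -> str:
        """SQL dialect understood by this connector (e.g. used when parsing queries)."""


class DuckDBConnector(SQLDatabase):
    @property
    def dialect(self) -> str:
        return "duckdb"
```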
Made-with: Cursor
The retriever instance (and its embedding model weights) was being created from scratch on every call to search_lancedb_semantic_index. Cache it at module level keyed by (uri, table_name) so weights load once. Made-with: Cursor
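A sketch of the module-level cache described above; the `Retriever` stub stands in for whatever search_lancedb_semantic_index actually constructs, so class and method names here are placeholders.

```python
from functools import lru_cache


class Retriever:  # stand-in for the real LanceDB retriever that loads embedding-model weights
    def __init__(self, uri: str, table_name: str):
        self.uri, self.table_name = uri, table_name  # the real class loads model weights here

    def search(self, query: str):
        return []


@lru_cache(maxsize=None)
def _get_retriever(uri: str, table_name: str) -> Retriever:
    # Cached per (uri, table_name): the expensive construction happens once per key.
    return Retriever(uri, table_name)


def search_lancedb_semantic_index(uri: str, table_name: str, query: str):
    return _get_retriever(uri, table_name).search(query)
```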
Made-with: Cursor

# Conflicts:
#	nemo_retriever/pyproject.toml
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/dal/queries_dal.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/embeddings.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/model/query.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/services/queries.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/write_to_graph.py
#	nemo_retriever/src/nemo_retriever/tabular_data/sql_database.py
#	nemo_retriever/tabular-dev-tools/duckdb_connector.py
#	nemo_retriever/tests/test_tabular_pipeline.py
#	nemo_retriever/uv.lock
Made-with: Cursor
Greptile Summary

This PR introduces the "Omni lite" text-to-SQL pipeline: a LangGraph-based multi-agent system that translates natural language questions into SQL, executes them via an injected connector, and formats the result. The implementation replaces prior AGENTS.md/SKILL.md scaffolding with concrete Python agent classes built on a new BaseAgent abstraction. There are three P0/P1 defects that should be fixed before merging:
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/agents/sql_execution.py | New SQL execution agent — _run_sql implicitly returns None when no connector is provided, causing an AttributeError crash on every invocation without a connector. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/llm_invoke.py | New LLM invocation helpers — off-by-one in retry guard means safe_invoke_with_structured_output never raises on exhausted retries, silently returning None. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/utils.py | Large utility module — module-level conn = get_neo4j_conn() establishes a Neo4j connection at import time, violating lazy-loading policy; also has unvalidated f-string interpolation in Cypher queries. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/main.py | Entry point — _make_llm(), create_graph(), and graph.compile() all execute at module scope, making import fail hard if credentials are missing. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/text_to_sql_graph.py | LangGraph definition — routing function route_sql_validation mutates shared state, which is unsafe in LangGraph's replay/checkpoint model; otherwise graph structure is clean. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/base.py | New BaseAgent and agent_wrapper — well-structured with good error handling and logging; missing SPDX header. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/state.py | State TypedDicts — clean definitions; contains commented-out code with a stale TODO that should be removed. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/retrieval_override.py | LanceDB label-filter extension of Retriever — correctly handles SQL escaping via _sql_in_literals; has SPDX header. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/agents/candidates_preparation.py | Candidate preparation agent — solid implementation; missing SPDX header like other agent files. |
| nemo_retriever/src/nemo_retriever/tabular_data/ingestion/embeddings.py | New Neo4j→embedding dataframe helper — clean, has SPDX header, queries are read-only with no injection risk. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[entities_extraction] --> B[retrieve_candidates]
    B --> C[prepare_candidates]
    C --> D[construct_sql_from_candidates]
    D -->|constructable| E[validate_sql_query]
    D -->|unconstructable| K[unconstructable_sql_response]
    E -->|valid_sql| F[validate_intent]
    E -->|skip_intent_validation| G[execute_sql_query]
    E -->|invalid_sql| H[reconstruct_sql]
    E -->|fallback| I[construct_sql_not_from_snippets]
    E -->|unconstructable| K
    F -->|valid_sql| G
    F -->|invalid_sql| H
    G -->|valid_sql| J[format_and_respond]
    G -->|invalid_sql| H
    H --> E
    I --> E
    J --> END([END])
    K --> END
```
Reviews (1): Last reviewed commit: "regenerate uv.lock after merge from main"
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/agents/sql_execution.py (lines 24-34):

```python
def _run_sql(sql: str, state: AgentState) -> QueryResponse:
    """Execute SQL via the injected ``connector``."""
    connector = state.get("connector")
    if connector is not None:
        try:
            df = connector.execute(sql)
        except Exception as e:
            logger.exception("SQL execution failed (injected connector)")
            return QueryResponse(result=None, sliced=False, error=str(e))
        payload = df.to_json(orient="records", default_handler=str) if len(df) else "[]"
        return QueryResponse(result=[payload], sliced=False, error=None)
```
**`_run_sql` returns `None` when no connector is provided**
When connector is None, the function falls off the end and implicitly returns None. The caller unconditionally accesses response_from_db.error on the return value, causing AttributeError: 'NoneType' object has no attribute 'error' whenever the graph is invoked without a connector.
```python
def _run_sql(sql: str, state: AgentState) -> QueryResponse:
    """Execute SQL via the injected ``connector``."""
    connector = state.get("connector")
    if connector is not None:
        try:
            df = connector.execute(sql)
        except Exception as e:
            logger.exception("SQL execution failed (injected connector)")
            return QueryResponse(result=None, sliced=False, error=str(e))
        payload = df.to_json(orient="records", default_handler=str) if len(df) else "[]"
        return QueryResponse(result=[payload], sliced=False, error=None)
    return QueryResponse(result=None, sliced=False, error="No database connector provided")
```
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/llm_invoke.py (lines 29-44):

```python
except ValidationError as e:
    if attempt < RETRY_MAX_ATTEMPTS:
        # Explain what's missing/invalid and ask the model to fix it
        current_messages.append(
            SystemMessage(
                content=(
                    "Your previous output did not validate. "
                    f"Validation errors:\n{str(e)}\n"
                    "Please return a **fully valid** object that satisfies the schema. "
                    "Do not omit required fields. Do not include extra keys."
                )
            )
        )
    else:
        logger.error(f"Validation failed after {RETRY_MAX_ATTEMPTS} attempts for {schema_name}")
        raise  # If still failing after max tries, raise
```
**Off-by-one makes the final-failure `raise` unreachable**
The loop runs for attempt in range(RETRY_MAX_ATTEMPTS) (values 0, 1, 2). The guard if attempt < RETRY_MAX_ATTEMPTS is always True (max loop value is 2, which is always < 3), so the else: raise branch is never entered. On the last attempt's ValidationError, the error message is appended to current_messages, the loop exits cleanly, and safe_invoke_with_structured_output returns None implicitly instead of raising — silently swallowing the failure.
```python
except ValidationError as e:
    if attempt < RETRY_MAX_ATTEMPTS - 1:
        current_messages.append(
            SystemMessage(
                content=(
                    "Your previous output did not validate. "
                    f"Validation errors:\n{str(e)}\n"
                    "Please return a **fully valid** object that satisfies the schema. "
                    "Do not omit required fields. Do not include extra keys."
                )
            )
        )
    else:
        logger.error(f"Validation failed after {RETRY_MAX_ATTEMPTS} attempts for {schema_name}")
        raise  # If still failing after max tries, raise
```
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/utils.py (line 34):

```python
except ImportError:
    pass


conn = get_neo4j_conn()
```
**Module-level Neo4j connection established at import time**
conn = get_neo4j_conn() executes at module scope, so importing this module (or anything that transitively imports it) immediately opens a Neo4j connection. This violates the lazy-loading policy, breaks unit tests that run without Neo4j, and makes any import failure (missing env vars, unreachable host) a hard crash rather than a deferred error.
Move conn into each function that needs it, or use a module-level cached accessor:
```python
_conn = None


def _get_conn():
    global _conn
    if _conn is None:
        _conn = get_neo4j_conn()
    return _conn
```

Then replace all bare `conn.` calls with `_get_conn().`.
Rule Used: Models must be loaded lazily (on first use or expl... (source)
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/main.py (lines 16-23):

```python
try:
    llm_client = _make_llm()
except ValueError as e:
    logger.error("Failed to initialize LLM client: %s", e)
    llm_client = None


graph = create_graph()
app = graph.compile()
```
**LLM client and graph compiled at import time**
_make_llm(), create_graph(), and graph.compile() all run at module scope. Importing main immediately makes LLM API calls (or at least validates credentials) and allocates graph objects. This violates the lazy-initialization policy and means any missing env vars (LLM_INVOKE_URL, LLM_API_KEY) produce a hard failure on import, not on first use.
Wrap these in a lazy accessor or a factory function that callers invoke explicitly, similar to the pattern already used in utils.py for _make_llm.
Rule Used: Models must be loaded lazily (on first use or expl... (source)
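A sketch of the lazy pattern this comment asks for; `_make_llm` and `create_graph` come from the reviewed snippet above, while the cached-accessor layout itself is just one possible option.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_llm_client():
    # Credentials are only read on first use, not at import time.
    return _make_llm()


@lru_cache(maxsize=1)
def get_app():
    # The graph is built and compiled on the first call, then reused.
    return create_graph().compile()
```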
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/text_to_sql_graph.py (lines 42-53):

```python
if state["decision"] == "invalid_sql":
    attempts = state["path_state"].get("sql_attempts", 0)
    logger.info(f"Construct sql attempt: {attempts}")
    state["path_state"]["sql_attempts"] = attempts + 1
    if attempts == 4:
        logger.info("Can not construct sql from snippets, try from relevant tables. Fallback.")
        return "fallback"  # try constructing from tables, not only snippets
    elif attempts < 8:
        return "invalid_sql"
    elif attempts == 8:
        logger.error("SQL construction failed after 8 attempts")
        return "unconstructable"
```
**Routing function mutates shared state**
route_sql_validation writes to state["path_state"]["sql_attempts"] in-place. LangGraph router functions are expected to be pure read-only predicates; side effects here can cause double-counting if the router is replayed (e.g., due to checkpointing or streaming). Move the counter increment into the agent's execute method (e.g., SQLReconstructionAgent) and make the router only read sql_attempts for its decision.
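A sketch of the split this comment suggests; the routing thresholds mirror the snippet above, but the `SQLReconstructionAgent.execute` body and the exact return shape are illustrative, not the real agent.

```python
def route_sql_validation(state) -> str:
    # Pure read-only routing: no writes to state here.
    if state["decision"] != "invalid_sql":
        return state["decision"]
    attempts = state["path_state"].get("sql_attempts", 0)
    if attempts == 4:
        return "fallback"  # try constructing from tables, not only snippets
    if attempts >= 8:
        return "unconstructable"
    return "invalid_sql"


class SQLReconstructionAgent:
    def execute(self, state) -> dict:
        # The agent owns the side effect: increment the attempt counter once per reconstruction.
        path_state = dict(state["path_state"])
        path_state["sql_attempts"] = path_state.get("sql_attempts", 0) + 1
        return {"path_state": path_state}
```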
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/utils.py (lines 116-126):

```python
def get_stored_usage_percentiles(percentiles_type_name: str):
    query = f"""
    MATCH (n:Database)
    RETURN n.{f"{percentiles_type_name}_25"} as usage_percentile_25,
           n.{f"{percentiles_type_name}_75"} as usage_percentile_75
    """
    results = conn.query_read(
        query=query,
        parameters={},
    )
    return results
```
**Property name interpolated directly into Cypher query**
get_stored_usage_percentiles embeds percentiles_type_name directly into the Cypher string via an f-string. Cypher does not support parameterized property names, so this cannot be fully parameterized, but the caller must ensure only trusted constants are passed. Consider asserting/validating percentiles_type_name against an allowlist of known constants (QUERIES_USAGE_PERCENTILE, TABLES_USAGE_PERCENTILE, COLUMNS_USAGE_PERCENTILE) before interpolation to make the constraint explicit.
```python
_ALLOWED_PERCENTILE_NAMES = {QUERIES_USAGE_PERCENTILE, TABLES_USAGE_PERCENTILE, COLUMNS_USAGE_PERCENTILE}

def get_stored_usage_percentiles(percentiles_type_name: str):
    if percentiles_type_name not in _ALLOWED_PERCENTILE_NAMES:
        raise ValueError(f"Unknown percentile type: {percentiles_type_name!r}")
    ...
```

Rule Used: Always use parameterized queries when interacting ... (source)
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/state.py (lines 45-47):

```python
# normalized_question = path_state.get("normalized_question")
# TODO remove normalized question, for question : give me all student from seattle , seattle was removed
normalized_question = path_state.get("initial_question")
```
**Commented-out logic with stale TODO**
The commented-out normalized_question path and TODO have been superseded by the current implementation. The comment contradicts the behaviour of the function (it now reads path_state["initial_question"], not initial_question from state). Remove the dead code and clarify the intent so future readers understand why path_state["initial_question"] takes precedence.
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/base.py (line 1, new file):
**Missing SPDX license header**
This file, along with the other new Python files added in this PR (`main.py`, `utils.py`, `state.py`, `models.py`, `llm_invoke.py`, `text_to_sql_graph.py`, `prompts.py`, and all agents under `agents/`), is missing the required SPDX header. The `embeddings.py` change already has it correctly. Add to the top of each affected file:
```python
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```

Rule Used: Python files added in this PR must include the SPD... (source)
Description
Checklist