
Omni lite #1875

Closed

tomer-levin-nv wants to merge 78 commits into NVIDIA:main from ftatiana-nv:omni_lite

Conversation

@tomer-levin-nv
Collaborator

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

polazilber and others added 30 commits April 9, 2026 11:16
- Delete sql_create_temp_table_parser.py
- Remove add_temp_schema_to_graph from queries.py, queries_parser.py, and sql_create_table_parser.py
- Remove TEMP_TABLE/TEMP_COLUMN labels from model/query.py, queries_comparison.py, queries_dal.py, and sql_select_parser.py
- Fix shared.graph import paths to use nemo_retriever.tabular_data.ingestion.graph throughout
- Replace flatten_list with flat_list_recursive from utils.py
- Add sqloxide dependency to pyproject.toml
- Move populate_queries/populate_views to services/queries.py; update write_to_graph.py imports

Made-with: Cursor
…s constants; add parser tests

- Fix shared.graph import paths in sql_select_parser, sql_insert_into_parser,
  sql_update_table_parser, sql_merge_table_parser, and sql_view_parser
- Add SQLType, SQL, Parser, Views, Labels.ALIAS/OPERATOR/COMMAND/FUNCTION/CONSTANT,
  ArgsForSQLFunctions, SQLFunctions, JoinNodes, SQLFunctionsWithConsantArg stubs
  to reserved_words.py
- Add test_queries_parser.py: 25 unit tests covering pre_process, dispatch_sqls
  routing, and parse_single (dialect fallback, error cases, schema propagation)

Made-with: Cursor
…queries

- Simplify expand_info Cypher: drop document, zone, and user_participants
  filtering from n:term, n:attribute, n:analysis, and n:metric cases
- Remove certified_for_zones and zones fields from returned sql objects
- Remove user_participants from query params dict
- Fix imports in CandidateRetrievalAgent and inline get_question_for_processing
- Remove account_id and user_participants from CandidateRetrievalAgent.run
  and extract_candidates call

Made-with: Cursor
…lot extractor

- Delete old graph/parsers/sql/ per-statement parsers (select, insert, update,
  merge, create-table, view) and replace with a single sqlglot_extractor.py
  that resolves tables/columns for any SQL dialect without a full AST
- Rewrite services/queries.py: parse_query_slim + chunked populate_queries;
  remove populate_views, filter_and_sort_queries_df and related dead code
- Move parsers/sql/ and sqlglot_extractor.py out of graph/ sub-package into
  the canonical ingestion/parsers/ directory; delete empty graph/ folder
- Fix all .graph.* imports across dal/, model/, services/, parsers/
- Add services/schema.get_account_schemas(); fix Node→Neo4jNode throughout
- Remove dead queries_comparison.py and test_queries_parser.py

Made-with: Cursor
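
For context on the extractor approach described in the commit above, here is a minimal, hypothetical sketch of resolving tables and columns with sqlglot; the function name and the exact key format are illustrative assumptions, not the PR's actual sqlglot_extractor.py code.

```python
# Illustrative sketch only, not the PR's sqlglot_extractor.py implementation.
from sqlglot import exp, parse_one


def extract_tables_and_columns(sql: str, dialect: str = "duckdb"):
    """Collect table and column references from a single SQL statement."""
    tree = parse_one(sql, read=dialect)
    tables = {
        # Keep schema-qualified keys (e.g. "sales.orders") so same-named tables
        # in different schemas stay distinct; unqualified tables use the bare name.
        f"{t.db}.{t.name}" if t.db else t.name
        for t in tree.find_all(exp.Table)
    }
    columns = {(c.table, c.name) for c in tree.find_all(exp.Column)}
    return tables, columns
```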
…and tests

- Replace sqllineage with pure sqlglot for column attribution; remove
  sqllineage dependency from pyproject.toml
- Fix join_keys extraction: use select.args["from_"] (not "from") and
  fall back to ON-equality detection when USING is converted at parse time
- Support schema-qualified tables (schema.table): qualified keys in
  source_table_names and alias_map keep same-named tables from different
  schemas distinct; _build_schema_dict stays flat for qualify() compat
- Remove lazy get_account_schemas() fallback; all_schemas is always
  caller-provided (default {})
- Add test_sqlglot_extractor.py covering the two duckdb_connector queries
  (RFM segmentation, CLV) plus edge-case tests, all without Neo4j

Made-with: Cursor
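
As a rough illustration of the ON-equality fallback mentioned above, the sketch below shows the general sqlglot pattern; it is not the PR's exact join_keys code, and the names are assumptions.

```python
# Hypothetical sketch of ON-equality join-key detection with sqlglot.
from sqlglot import exp, parse_one


def join_keys(sql: str, dialect: str = "duckdb"):
    """Return (left, right) column pairs from equality conditions in JOIN ... ON."""
    tree = parse_one(sql, read=dialect)
    pairs = []
    for join in tree.find_all(exp.Join):
        on = join.args.get("on")
        if on is None:
            continue  # e.g. CROSS JOIN, or a USING clause not rewritten to ON
        for eq in on.find_all(exp.EQ):
            left, right = eq.this, eq.expression
            if isinstance(left, exp.Column) and isinstance(right, exp.Column):
                pairs.append((left.sql(), right.sql()))
    return pairs
```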
- Remove dead functions from queries_dal.py (get_sql_by_id, etc.)
- Delete unused reserved_words classes (Views, SQL, RelTypes, SQLType, Parser)
- Remove utterance and default_schema params from Query
- Simplify parse_queries_df to return failed_queries instead of taking it as input
- Optimise table_to_schema lookup in sqlglot_extractor (use table_exists, two-pass strategy)
- Rename get_account_schemas → get_schemas; drop multi-DB loop (single DB only)
- Delete dead sql/queries_parser.py and its parsers/sql package
- Update test mock to implement table_exists

Made-with: Cursor
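
The two-pass table-to-schema lookup mentioned above can be pictured roughly as follows. This is a hypothetical sketch; the real extractor's data structures and signatures may differ, and it only assumes each schema object exposes the table_exists check named in the commit.

```python
# Hypothetical illustration of a two-pass table -> schema lookup.
def table_to_schema(table: str, schemas: dict, default_schema: str | None = None):
    """Resolve which schema owns `table`, preferring the default schema."""
    # Pass 1: check the default schema first (cheapest, most common case).
    if default_schema is not None:
        schema = schemas.get(default_schema)
        if schema is not None and schema.table_exists(table):
            return default_schema
    # Pass 2: fall back to scanning the remaining schemas.
    for name, schema in schemas.items():
        if name != default_schema and schema.table_exists(table):
            return name
    return None
```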
- Remove sql_node, utterance, default_schema, query_tag params from Query
- Drop reached_columns_ids field and its getter/setter (unused)
- Add parse_query_single for parsing a bare SQL string
- parse_queries_df returns failed_queries instead of taking it as input

Made-with: Cursor
polazilber and others added 23 commits April 15, 2026 10:38
Fixes flake8 F401; tests do not use pytest directly.

Made-with: Cursor
Drop the extra top-level definition; the same implementation remains
later in the module. Fixes flake8 F811.

Made-with: Cursor
data_for_populate_tabular includes a queries DataFrame; extend
EXPECTED_DATA_KEYS and assertions. Document extract_tabular_db_data return keys.

Made-with: Cursor
…a extract

- Add dialect attribute (default "sqlite") to SQLDatabase ABC so every
  connector advertises its SQL dialect.
- DuckDB connector already sets self.dialect = "duckdb".
- Fix TabularSchemaExtractOp.process() to read dialect from the data
  argument instead of the undefined tabular_params name.
- Pass dialect through extract_data so store_relational_db_in_neo4j
  receives the correct value.

Made-with: Cursor
Pass dialect=duckdb to match store_relational_db_in_neo4j signature; include
queries in dummy_data for extract payload shape.

Made-with: Cursor
…DuckDB dev connector

Remove class-level dialect default and dialect kwarg from SQLDatabase __init__.
Expose dialect as a property; DuckDB connector returns "duckdb".

Made-with: Cursor
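
A minimal sketch of the property-based dialect exposure described above; the connector class name and imports are assumptions for illustration, not the PR's actual code.

```python
# Sketch only: actual class and module names in the PR may differ.
from abc import ABC, abstractmethod


class SQLDatabase(ABC):
    @property
    @abstractmethod
    def dialect(self) -> str:
        """SQL dialect advertised by the connector (e.g. used when parsing queries)."""


class DuckDBConnector(SQLDatabase):  # hypothetical name for the dev connector
    @property
    def dialect(self) -> str:
        return "duckdb"
```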
The retriever instance (and its embedding model weights) was being
created from scratch on every call to search_lancedb_semantic_index.
Cache it at module level keyed by (uri, table_name) so weights load once.

Made-with: Cursor
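
A minimal sketch of the module-level cache described in the commit above, assuming a Retriever class whose constructor takes uri and table_name (the real signature may differ).

```python
# Hypothetical sketch: cache retriever instances so embedding weights load once.
# Retriever is assumed to be imported elsewhere in the module.
_RETRIEVER_CACHE: dict[tuple[str, str], "Retriever"] = {}


def _get_retriever(uri: str, table_name: str) -> "Retriever":
    key = (uri, table_name)
    if key not in _RETRIEVER_CACHE:
        # Constructor arguments are assumed for illustration.
        _RETRIEVER_CACHE[key] = Retriever(uri=uri, table_name=table_name)
    return _RETRIEVER_CACHE[key]
```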
Made-with: Cursor

# Conflicts:
#	nemo_retriever/pyproject.toml
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/dal/queries_dal.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/embeddings.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/model/query.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/services/queries.py
#	nemo_retriever/src/nemo_retriever/tabular_data/ingestion/write_to_graph.py
#	nemo_retriever/src/nemo_retriever/tabular_data/sql_database.py
#	nemo_retriever/tabular-dev-tools/duckdb_connector.py
#	nemo_retriever/tests/test_tabular_pipeline.py
#	nemo_retriever/uv.lock
@tomer-levin-nv tomer-levin-nv requested review from a team as code owners April 20, 2026 13:36
@tomer-levin-nv tomer-levin-nv requested a review from jdye64 April 20, 2026 13:36
@copy-pr-bot

copy-pr-bot Bot commented Apr 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR introduces the "Omni lite" text-to-SQL pipeline: a LangGraph-based multi-agent system that translates natural language questions into SQL, executes them via an injected connector, and formats the result. The implementation replaces prior AGENTS.md/SKILL.md scaffolding with concrete Python agent classes built on a new BaseAgent abstraction.

There are three P0/P1 defects that should be fixed before merging:

  • _run_sql returns None when no connector — causes an AttributeError crash on response_from_db.error in every run without an injected connector (sql_execution.py).
  • Off-by-one in retry guard: if attempt < RETRY_MAX_ATTEMPTS is always True in the loop, so safe_invoke_with_structured_output never raises on exhausted retries and silently returns None instead (llm_invoke.py).
  • Module-level side effects: conn = get_neo4j_conn() in utils.py and the _make_llm() / create_graph() / graph.compile() calls in main.py all execute at import time, violating the lazy-initialization policy and hard-failing when env vars are absent.

Confidence Score: 3/5

Not safe to merge — a crash on every no-connector execution, a silent retry swallow, and module-level LLM/DB init need to be addressed first.

Three P1 findings (one borderline P0) affect the primary execution path: null-deref crash, silent failure swallowing, and import-time side effects that break test isolation and deployment. These are present defects in the changed code, not speculative concerns.

sql_execution.py, llm_invoke.py, utils.py, and main.py each contain a confirmed defect on the primary execution path.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/agents/sql_execution.py | New SQL execution agent — _run_sql implicitly returns None when no connector is provided, causing an AttributeError crash on every invocation without a connector. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/llm_invoke.py | New LLM invocation helpers — off-by-one in retry guard means safe_invoke_with_structured_output never raises on exhausted retries, silently returning None. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/utils.py | Large utility module — module-level conn = get_neo4j_conn() establishes a Neo4j connection at import time, violating lazy-loading policy; also has unvalidated f-string interpolation in Cypher queries. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/main.py | Entry point — _make_llm(), create_graph(), and graph.compile() all execute at module scope, making import fail hard if credentials are missing. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/text_to_sql_graph.py | LangGraph definition — routing function route_sql_validation mutates shared state, which is unsafe in LangGraph's replay/checkpoint model; otherwise graph structure is clean. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/base.py | New BaseAgent and agent_wrapper — well-structured with good error handling and logging; missing SPDX header. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/state.py | State TypedDicts — clean definitions; contains commented-out code with a stale TODO that should be removed. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/retrieval_override.py | LanceDB label-filter extension of Retriever — correctly handles SQL escaping via _sql_in_literals; has SPDX header. |
| nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/agents/candidates_preparation.py | Candidate preparation agent — solid implementation; missing SPDX header like other agent files. |
| nemo_retriever/src/nemo_retriever/tabular_data/ingestion/embeddings.py | New Neo4j→embedding dataframe helper — clean, has SPDX header, queries are read-only with no injection risk. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[entities_extraction] --> B[retrieve_candidates]
    B --> C[prepare_candidates]
    C --> D[construct_sql_from_candidates]
    D -->|constructable| E[validate_sql_query]
    D -->|unconstructable| K[unconstructable_sql_response]
    E -->|valid_sql| F[validate_intent]
    E -->|skip_intent_validation| G[execute_sql_query]
    E -->|invalid_sql| H[reconstruct_sql]
    E -->|fallback| I[construct_sql_not_from_snippets]
    E -->|unconstructable| K
    F -->|valid_sql| G
    F -->|invalid_sql| H
    G -->|valid_sql| J[format_and_respond]
    G -->|invalid_sql| H
    H --> E
    I --> E
    J --> END([END])
    K --> END

Reviews (1): Last reviewed commit: "regenerate uv.lock after merge from main"

Comment on lines +24 to +34 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/agents/sql_execution.py
def _run_sql(sql: str, state: AgentState) -> QueryResponse:
    """Execute SQL via the injected ``connector``."""
    connector = state.get("connector")
    if connector is not None:
        try:
            df = connector.execute(sql)
        except Exception as e:
            logger.exception("SQL execution failed (injected connector)")
            return QueryResponse(result=None, sliced=False, error=str(e))
        payload = df.to_json(orient="records", default_handler=str) if len(df) else "[]"
        return QueryResponse(result=[payload], sliced=False, error=None)

P0 _run_sql returns None when no connector is provided

When connector is None, the function falls off the end and implicitly returns None. The caller unconditionally accesses response_from_db.error on the return value, causing AttributeError: 'NoneType' object has no attribute 'error' whenever the graph is invoked without a connector.

def _run_sql(sql: str, state: AgentState) -> QueryResponse:
    """Execute SQL via the injected ``connector``."""
    connector = state.get("connector")
    if connector is not None:
        try:
            df = connector.execute(sql)
        except Exception as e:
            logger.exception("SQL execution failed (injected connector)")
            return QueryResponse(result=None, sliced=False, error=str(e))
        payload = df.to_json(orient="records", default_handler=str) if len(df) else "[]"
        return QueryResponse(result=[payload], sliced=False, error=None)
    return QueryResponse(result=None, sliced=False, error="No database connector provided")

Comment on lines +29 to +44 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/llm_invoke.py
        except ValidationError as e:
            if attempt < RETRY_MAX_ATTEMPTS:
                # Explain what's missing/invalid and ask the model to fix it
                current_messages.append(
                    SystemMessage(
                        content=(
                            "Your previous output did not validate. "
                            f"Validation errors:\n{str(e)}\n"
                            "Please return a **fully valid** object that satisfies the schema. "
                            "Do not omit required fields. Do not include extra keys."
                        )
                    )
                )
            else:
                logger.error(f"Validation failed after {RETRY_MAX_ATTEMPTS} attempts for {schema_name}")
                raise  # If still failing after max tries, raise

P1 Off-by-one makes the final-failure raise unreachable

The loop runs for attempt in range(RETRY_MAX_ATTEMPTS) (values 0, 1, 2). The guard if attempt < RETRY_MAX_ATTEMPTS is always True (max loop value is 2, which is always < 3), so the else: raise branch is never entered. On the last attempt's ValidationError, the error message is appended to current_messages, the loop exits cleanly, and safe_invoke_with_structured_output returns None implicitly instead of raising — silently swallowing the failure.

        except ValidationError as e:
            if attempt < RETRY_MAX_ATTEMPTS - 1:
                current_messages.append(
                    SystemMessage(
                        content=(
                            "Your previous output did not validate. "
                            f"Validation errors:\n{str(e)}\n"
                            "Please return a **fully valid** object that satisfies the schema. "
                            "Do not omit required fields. Do not include extra keys."
                        )
                    )
                )
            else:
                logger.error(f"Validation failed after {RETRY_MAX_ATTEMPTS} attempts for {schema_name}")
                raise  # If still failing after max tries, raise

Comment on line 34 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/utils.py

except ImportError:
    pass

conn = get_neo4j_conn()

P1 Module-level Neo4j connection established at import time

conn = get_neo4j_conn() executes at module scope, so importing this module (or anything that transitively imports it) immediately opens a Neo4j connection. This violates the lazy-loading policy, breaks unit tests that run without Neo4j, and makes any import failure (missing env vars, unreachable host) a hard crash rather than a deferred error.

Move conn into each function that needs it, or use a module-level cached accessor:

_conn = None

def _get_conn():
    global _conn
    if _conn is None:
        _conn = get_neo4j_conn()
    return _conn

Then replace every bare conn. reference with a _get_conn() call.

Rule Used: Models must be loaded lazily (on first use or expl... (source)


Comment on lines +16 to +23 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/main.py
try:
    llm_client = _make_llm()
except ValueError as e:
    logger.error("Failed to initialize LLM client: %s", e)
    llm_client = None

graph = create_graph()
app = graph.compile()

P1 LLM client and graph compiled at import time

_make_llm(), create_graph(), and graph.compile() all run at module scope. Importing main immediately makes LLM API calls (or at least validates credentials) and allocates graph objects. This violates the lazy-initialization policy and means any missing env vars (LLM_INVOKE_URL, LLM_API_KEY) produce a hard failure on import, not on first use.

Wrap these in a lazy accessor or a factory function that callers invoke explicitly, similar to the pattern already used in utils.py for _make_llm.

Rule Used: Models must be loaded lazily (on first use or expl... (source)
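
One possible shape of the suggested lazy accessor, as a sketch: get_llm_client, get_app, and the module-level caches are hypothetical names; _make_llm and create_graph are the functions referenced above.

```python
# Hypothetical sketch of deferring LLM/graph construction to first use.
_llm_client = None
_app = None


def get_llm_client():
    global _llm_client
    if _llm_client is None:
        _llm_client = _make_llm()  # env vars only required on first use
    return _llm_client


def get_app():
    global _app
    if _app is None:
        graph = create_graph()
        _app = graph.compile()
    return _app
```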


Comment on lines +42 to +53 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/text_to_sql_graph.py
if state["decision"] == "invalid_sql":
attempts = state["path_state"].get("sql_attempts", 0)
logger.info(f"Construct sql attempt: {attempts}")
state["path_state"]["sql_attempts"] = attempts + 1
if attempts == 4:
logger.info("Can not construct sql from snippets, try from relevant tables. Fallback.")
return "fallback" # try constructing from tables, not only snippets
elif attempts < 8:
return "invalid_sql"
elif attempts == 8:
logger.error("SQL construction failed after 8 attempts")
return "unconstructable"

P1 Routing function mutates shared state

route_sql_validation writes to state["path_state"]["sql_attempts"] in-place. LangGraph router functions are expected to be pure read-only predicates; side effects here can cause double-counting if the router is replayed (e.g., due to checkpointing or streaming). Move the counter increment into the agent's execute method (e.g., SQLReconstructionAgent) and make the router only read sql_attempts for its decision.
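
A sketch of the suggested split, in which the router becomes read-only and the agent owns the counter (signatures and surrounding branches are simplified assumptions, not the PR's code):

```python
# Hypothetical sketch: keep route_sql_validation free of side effects.
def route_sql_validation(state) -> str:
    if state["decision"] != "invalid_sql":
        return state["decision"]
    attempts = state["path_state"].get("sql_attempts", 0)  # read-only
    if attempts == 4:
        return "fallback"
    if attempts < 8:
        return "invalid_sql"
    return "unconstructable"


# ...while the retrying agent increments the counter inside its execute step.
class SQLReconstructionAgent:
    def execute(self, state):
        state["path_state"]["sql_attempts"] = state["path_state"].get("sql_attempts", 0) + 1
        # ...reconstruct the SQL and update state as before...
        return state
```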


Comment on lines +116 to +126 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/utils.py
def get_stored_usage_percentiles(percentiles_type_name: str):
    query = f"""
    MATCH (n:Database)
    RETURN n.{f"{percentiles_type_name}_25"} as usage_percentile_25,
           n.{f"{percentiles_type_name}_75"} as usage_percentile_75
    """
    results = conn.query_read(
        query=query,
        parameters={},
    )
    return results

P2 Property name interpolated directly into Cypher query

get_stored_usage_percentiles embeds percentiles_type_name directly into the Cypher string via an f-string. Cypher does not support parameterized property names, so this cannot be fully parameterized, but the caller must ensure only trusted constants are passed. Consider asserting/validating percentiles_type_name against an allowlist of known constants (QUERIES_USAGE_PERCENTILE, TABLES_USAGE_PERCENTILE, COLUMNS_USAGE_PERCENTILE) before interpolation to make the constraint explicit.

_ALLOWED_PERCENTILE_NAMES = {QUERIES_USAGE_PERCENTILE, TABLES_USAGE_PERCENTILE, COLUMNS_USAGE_PERCENTILE}

def get_stored_usage_percentiles(percentiles_type_name: str):
    if percentiles_type_name not in _ALLOWED_PERCENTILE_NAMES:
        raise ValueError(f"Unknown percentile type: {percentiles_type_name!r}")
    ...

Rule Used: Always use parameterized queries when interacting ... (source)


Comment on lines +45 to +47 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/state.py
# normalized_question = path_state.get("normalized_question")
# TODO remove normalized question, for question : give me all student from seattle , seattle was removed
normalized_question = path_state.get("initial_question")

P2 Commented-out logic with stale TODO

The commented-out normalized_question path and TODO have been superseded by the current implementation. The comment contradicts the behaviour of the function (it now reads path_state["initial_question"], not initial_question from state). Remove the dead code and clarify the intent so future readers understand why path_state["initial_question"] takes precedence.


Comment on line 1 in nemo_retriever/src/nemo_retriever/tabular_data/retrieval/text_to_sql/base.py

@@ -0,0 +1,205 @@
"""

P2 Missing SPDX license header

This file and the other new Python files added in this PR (main.py, utils.py, state.py, models.py, llm_invoke.py, text_to_sql_graph.py, prompts.py, and all agents under agents/) are missing the required SPDX header. The embeddings.py change already has it correctly applied. Add the following to the top of each affected file:

# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Rule Used: Python files added in this PR must include the SPD... (source)

