Closed
78 commits
469dcfe
refactor(graph): remove all temp schema/table/column concepts
polazilber Apr 5, 2026
168f442
refactor(graph): fix shared.graph imports + add missing reserved_word…
polazilber Apr 5, 2026
eb5fba8
omni graph skeleton
ftatiana-nv Mar 31, 2026
b417cbf
remove language
ftatiana-nv Mar 31, 2026
b20cb49
add functions
ftatiana-nv Mar 31, 2026
e23b9b9
refactor(omni_lite): remove zone/user/document filtering from Cypher …
polazilber Mar 31, 2026
4d84336
continue with candidates
ftatiana-nv Apr 1, 2026
c22d4b8
continue skeleton
ftatiana-nv Apr 5, 2026
0c3a6a4
all agents are in
ftatiana-nv Apr 5, 2026
32be1d7
flow fix
ftatiana-nv Apr 5, 2026
b093728
add main
ftatiana-nv Apr 5, 2026
ed31d72
overide retrieval
ftatiana-nv Apr 5, 2026
7c5d1fe
main + entities extraction
ftatiana-nv Apr 6, 2026
3c4ea36
candidates retrieval elaboration
ftatiana-nv Apr 6, 2026
7745f59
candidates preparation
ftatiana-nv Apr 6, 2026
4401e19
construct sql from semantic elaboration
ftatiana-nv Apr 6, 2026
dbe2711
validation tbd
ftatiana-nv Apr 6, 2026
3bd5666
fix imports
ftatiana-nv Apr 7, 2026
f4164e2
fixes, retrieval tbd, insert metadata label in ingestion
ftatiana-nv Apr 7, 2026
5495a2f
retrieval tbd
ftatiana-nv Apr 9, 2026
9850937
add columns to lancedb ingestion
ftatiana-nv Apr 9, 2026
d173bbb
refactor(ingestion): replace per-statement SQL parsers with slim sqlg…
polazilber Apr 9, 2026
2d3cba3
preapare candidates, add columns to tables
ftatiana-nv Apr 10, 2026
a79900f
fix execution, full pipeline
ftatiana-nv Apr 12, 2026
644a20b
response from db
ftatiana-nv Apr 12, 2026
bf305c0
fixes
ftatiana-nv Apr 12, 2026
119409f
feat(ingestion): rewrite sqlglot extractor with multi-schema support …
polazilber Apr 12, 2026
d29a1a7
refactor(ingestion): clean up dead code and simplify APIs
polazilber Apr 13, 2026
7e2406a
Merge remote-tracking branch 'fork/queries_parser' into omni_lite
ftatiana-nv Apr 13, 2026
310d15a
refactor(ingestion): simplify Query model and parse_queries_df
polazilber Apr 13, 2026
dbb9c5f
Merge remote-tracking branch 'fork/queries_parser' into omni_lite
ftatiana-nv Apr 13, 2026
3e032cf
chore(deps): add sqlglot>=30.0.0 to dependencies
polazilber Apr 13, 2026
6e8a9b4
Merge remote-tracking branch 'fork/queries_parser' into omni_lite
ftatiana-nv Apr 13, 2026
26afd06
refactor(ingestion): remove multi-DB loop in write_to_graph (single D…
polazilber Apr 13, 2026
ec6d502
refactor(ingestion): move get_db_ids_and_names and get_schemas to deb…
polazilber Apr 13, 2026
7acac3f
feat(ingestion): add parse_query_single for single SQL string parsing
polazilber Apr 13, 2026
e8aa3cf
parser validation
ftatiana-nv Apr 13, 2026
1bb350a
Merge remote-tracking branch 'fork/queries_parser' into omni_lite
ftatiana-nv Apr 13, 2026
30a91bc
feat(deps): add langchain-community and langgraph, fix torch macOS ma…
liavnave Apr 13, 2026
63c133d
labels fix
ftatiana-nv Apr 13, 2026
9120ac4
refactor(graph): remove all temp schema/table/column concepts
polazilber Apr 14, 2026
b8c705e
refactor(graph): fix shared.graph imports + add missing reserved_word…
polazilber Apr 14, 2026
b5059e3
refactor(ingestion): replace per-statement SQL parsers with slim sqlg…
polazilber Apr 14, 2026
008a893
feat(ingestion): rewrite sqlglot extractor with multi-schema support …
polazilber Apr 14, 2026
2d8807d
refactor(ingestion): clean up dead code and simplify APIs
polazilber Apr 14, 2026
e201c0b
refactor(ingestion): simplify Query model and parse_queries_df
polazilber Apr 14, 2026
7d644f9
chore(deps): add sqlglot>=30.0.0 to dependencies
polazilber Apr 14, 2026
6f9fbe9
refactor(ingestion): remove multi-DB loop in write_to_graph (single D…
polazilber Apr 14, 2026
97bc84f
refactor(ingestion): move get_db_ids_and_names and get_schemas to deb…
polazilber Apr 14, 2026
9e23fdd
feat(ingestion): add parse_query_single for single SQL string parsing
polazilber Apr 14, 2026
f989d67
refactor(ingestion): rename Query param q -> sql_text
polazilber Apr 14, 2026
3ceb14b
refactor(dev-tools): load sample queries from CSV; add comment to par…
polazilber Apr 14, 2026
3fe0b87
refactor(ingestion): rename q_timestamp/q_count to sql_timestamp/sql_…
polazilber Apr 14, 2026
57ea9e8
refactor(ingestion): use reserved_words constants in embeddings Cyphe…
polazilber Apr 14, 2026
40dec93
style: apply pre-commit formatting fixes (line length, blank lines, E…
polazilber Apr 14, 2026
4025ee7
Remove unused pytest import from test_sqlglot_extractor
polazilber Apr 14, 2026
a117dee
Remove duplicate chunks helper in ingestion utils
polazilber Apr 14, 2026
c7ae22d
Align tabular pipeline tests with queries in extract payload
polazilber Apr 14, 2026
3e7ecc9
feat(tabular): add dialect to SQLDatabase base class; wire into schem…
tomer-levin-nv Apr 14, 2026
cc28e19
Fix store_relational_db_in_neo4j test for required dialect and queries
polazilber Apr 14, 2026
987d2f8
refactor(tabular): use dialect property on SQLDatabase; implement in …
polazilber Apr 15, 2026
f1768ba
style(tabular): black-format sql_database dialect property
polazilber Apr 15, 2026
b6c7287
Merge remote-tracking branch 'origin/main' into omni_lite
ftatiana-nv Apr 15, 2026
c24114d
merge
ftatiana-nv Apr 15, 2026
5446bc1
buigfix
ftatiana-nv Apr 15, 2026
cc59990
add retry to invoke
ftatiana-nv Apr 15, 2026
efe2548
pr comments
ftatiana-nv Apr 15, 2026
903e2c2
pr comments
ftatiana-nv Apr 15, 2026
cbfe204
pr comments
ftatiana-nv Apr 15, 2026
cf9b2ed
pr comments
ftatiana-nv Apr 15, 2026
b58650a
fix format
ftatiana-nv Apr 16, 2026
a7bb70f
pr comments
ftatiana-nv Apr 16, 2026
b128613
pr comments
ftatiana-nv Apr 16, 2026
943807c
pr comments
ftatiana-nv Apr 16, 2026
ed54644
renaming
ftatiana-nv Apr 16, 2026
6c512bc
cache TextToSqlRetriever to avoid reloading weights per query
liavnave Apr 20, 2026
788d5e8
Merge remote-tracking branch 'origin/main' into omni_lite
liavnave Apr 20, 2026
c166943
regenerate uv.lock after merge from main
liavnave Apr 20, 2026
3 changes: 3 additions & 0 deletions nemo_retriever/pyproject.toml
@@ -104,6 +104,9 @@ stores = [
     "duckdb>=1.2.0",
     "duckdb-engine>=0.13.0",
     "neo4j>=5.0",
+    "langchain-nvidia-ai-endpoints>=0.3.0",
+    "langchain-community>=0.4.0",
+    "langgraph>=1.1.0",
 ]

 # BEIR benchmarking and evaluation tools (not needed for production use).
@@ -18,6 +18,8 @@

 logger = logging.getLogger(__name__)

+conn = get_neo4j_conn()
+

 def load_schema_from_graph(
     db_name,
@@ -35,6 +35,30 @@ def query_neo4j_tables_for_embedding() -> List[dict]:
     return result[0].get("docs") or []


+def query_neo4j_columns_for_embedding() -> List[dict]:
+    """Return one doc per ``Column`` node for embedding (distinct from table-level rows)."""
+    neo4j_conn = get_neo4j_conn()
+    query = f"""
+    MATCH (d:{Labels.DB})-[:{Edges.CONTAINS}]->(s:{Labels.SCHEMA})
+          -[:{Edges.CONTAINS}]->(t:{Labels.TABLE})
+          -[:{Edges.CONTAINS}]->(c:{Labels.COLUMN})
+    RETURN collect({{
+        text: "db_name: " + d.name + ", schema_name: " + s.name +
+              ", table_name: " + t.name + ", column_name: " + c.name +
+              ", data_type: " + coalesce(toString(c.data_type), "") +
+              CASE WHEN c.description IS NOT NULL AND trim(toString(c.description)) <> ""
+                   THEN ", column_description: " + toString(c.description) ELSE "" END,
+        name: c.name,
+        label: labels(c)[0],
+        id: c.id
+    }}) as docs
+    """
+    result = neo4j_conn.query_read(query, parameters={})
+    if not result:
+        return []
+    return result[0].get("docs") or []
+
+
 def fetch_tabular_embedding_dataframe() -> pd.DataFrame:
     """Fetch all tabular entity docs from Neo4j and return a DataFrame ready for embedding.

@@ -43,7 +67,9 @@ def fetch_tabular_embedding_dataframe() -> pd.DataFrame:
     unstructured pipeline so run_pipeline_tasks_on_df works without changes.
     """
     _empty = pd.DataFrame(columns=["text", "_embed_modality", "path", "page_number", "metadata"])
-    docs = query_neo4j_tables_for_embedding()
+    table_docs = query_neo4j_tables_for_embedding()
+    column_docs = query_neo4j_columns_for_embedding()
+    docs = list(table_docs) + list(column_docs)
     if not docs:
         return _empty
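
To make the shape of the new column docs easier to review, here is a minimal Python sketch of the `text` string that the Cypher query above appears to build, followed by the table-plus-column concatenation from this hunk. The helper name `column_doc_text` and the sample values are hypothetical and not part of this PR; the sketch also assumes the db, schema, table, and column names are never null.

```python
from typing import Optional


def column_doc_text(
    db_name: str,
    schema_name: str,
    table_name: str,
    column_name: str,
    data_type: Optional[object] = None,
    description: Optional[object] = None,
) -> str:
    """Hypothetical Python mirror of the `text` expression in the Cypher query."""
    text = (
        f"db_name: {db_name}, schema_name: {schema_name}, "
        f"table_name: {table_name}, column_name: {column_name}, "
        # Mirrors coalesce(toString(c.data_type), ""): a missing type becomes "".
        f"data_type: {'' if data_type is None else str(data_type)}"
    )
    # Mirrors the CASE branch: the description is appended only when it is
    # present and not blank after trimming.
    if description is not None and str(description).strip() != "":
        text += f", column_description: {description}"
    return text


# Made-up sample docs, only to illustrate the concatenation step.
table_docs = [{"text": "table-level text", "name": "orders"}]
column_docs = [
    {"text": column_doc_text("sales", "public", "orders", "id", "INTEGER"), "name": "id"}
]
docs = list(table_docs) + list(column_docs)

print(column_docs[0]["text"])
# db_name: sales, schema_name: public, table_name: orders, column_name: id, data_type: INTEGER
print(len(docs))  # 2
```

Table docs come first and column docs second, so downstream consumers see both entity kinds in one list.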

@@ -41,6 +41,7 @@ def parse_query_slim(sql_text: str, query_obj: Query, dialect: str, schemas: dic
     if not table_matches:
         return False

+    column_ids: list[str] = []
     for table_key, match in table_matches.items():
         # table_key may be "schema.table" or just "table"; bare name is always the last part.
         bare_name = table_key.split(".")[-1]
@@ -72,6 +73,7 @@ def parse_query_slim(sql_text: str, query_obj: Query, dialect: str, schemas: dic
             try:
                 if schema.is_column_in_table(table_node, col_name):
                     col_node = schema.get_column_node(col_name, bare_name)
+                    column_ids.append(str(col_node.id))
                     query_obj.edges.append((query_obj.sql_node, col_node, edge_props))
             except Exception:
                 continue
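
The comment in `parse_query_slim` says `table_key` may be `schema.table` or just `table`. A minimal sketch of that split follows, using a hypothetical helper name `bare_table_name` that does not appear in this PR. It assumes identifiers contain no quoted dots.

```python
def bare_table_name(table_key: str) -> str:
    """Return the last dot-separated part of a table key (hypothetical helper)."""
    return table_key.split(".")[-1]


print(bare_table_name("sales.orders"))  # orders
print(bare_table_name("orders"))        # orders
```

A quoted identifier such as `"my.schema"."orders"` would need a real SQL parser rather than a plain `split`.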
@@ -8,6 +8,12 @@
 import pandas as pd


+def chunks(lst, n):
+    """Yield successive n-sized chunks from lst."""
+    for i in range(0, len(lst), n):
+        yield lst[i : i + n]
+
+
 def flat_list_recursive(nested_list):
     output = []
     for i in nested_list:
@@ -51,12 +57,6 @@ def remove_redundant_parentheses(text):
     return text


-def chunks(lst, n):
-    """Yield successive n-sized chunks from lst."""
-    for i in range(0, len(lst), n):
-        yield lst[i : i + n]
-
-
 def normalize_tables(df: pd.DataFrame) -> pd.DataFrame:
     """Normalize and type a tables DataFrame. Expects a DataFrame only."""
     types = {
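
For reference, a short usage sketch of the `chunks` helper that these two hunks move to the top of the module. The function body is copied from the diff; the sample list and batch size are made up for illustration.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i : i + n]


# The last chunk is shorter when len(lst) is not a multiple of n.
batches = list(chunks([1, 2, 3, 4, 5], 2))
print(batches)  # [[1, 2], [3, 4], [5]]
```

Because `chunks` is a generator, wrap the call in `list(...)` when the batches need to be counted or reused.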

This file was deleted.

This file was deleted.

This file was deleted.
