2 changes: 1 addition & 1 deletion README.md
@@ -18,7 +18,7 @@ to find, contextualize, and extract text, tables, charts and infographics that y
> [!Note]
> NeMo Retriever extraction is also known as NVIDIA Ingest and nv-ingest.

-NeMo Retriever Library enables parallelization of splitting documents into pages where artifacts are classified (such as text, tables, charts, and infographics), extracted, and further contextualized through optical character recognition (OCR) into a well defined JSON schema. From there, NeMo Retriever Library manages computaiton of embeddings for the extracted content as well as storing them in a vector database [Milvus](https://milvus.io/).
+NeMo Retriever Library enables parallelization of splitting documents into pages where artifacts are classified (such as text, tables, charts, and infographics), extracted, and further contextualized through optical character recognition (OCR) into a well defined JSON schema. From there, NeMo Retriever Library manages computation of embeddings for the extracted content as well as storing them in a vector database [Milvus](https://milvus.io/).

The following diagram shows the NeMo Retriever Library pipeline.

5 changes: 5 additions & 0 deletions tests/integration/conftest.py
@@ -18,6 +18,11 @@ def _wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float
     Wait until a TCP port on a host is accepting connections or raise TimeoutError.
     This makes the tests robust against pipeline warm-up variability.
     """
+    if timeout <= 0:
+        raise ValueError(f"timeout must be positive, got {timeout}")
+    if interval <= 0:
+        raise ValueError(f"interval must be positive, got {interval}")
+
     deadline = time.time() + timeout
     last_err: Exception | None = None
     while time.time() < deadline:
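
For readers who want to see the new guards in context, here is a minimal, self-contained sketch of a port-wait helper with the same argument validation. It is not the repository's implementation: the diff cuts off both the default for `interval` and the body of the loop, so the name `wait_for_port_sketch`, the `0.5` default, and everything inside the `while` are assumptions made for illustration.

```python
import socket
import time


def wait_for_port_sketch(
    host: str,
    port: int,
    timeout: float = 120.0,
    interval: float = 0.5,  # assumed default; the real one is truncated in the diff
) -> None:
    """Hypothetical stand-in for _wait_for_port (loop body is assumed)."""
    # Guards copied from the diff: reject non-positive arguments up front.
    if timeout <= 0:
        raise ValueError(f"timeout must be positive, got {timeout}")
    if interval <= 0:
        raise ValueError(f"interval must be positive, got {interval}")

    deadline = time.time() + timeout
    last_err: Exception | None = None
    while time.time() < deadline:
        try:
            # Assumed probe: try to open a TCP connection and close it again.
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError as err:
            last_err = err
            time.sleep(interval)
    raise TimeoutError(
        f"{host}:{port} not accepting connections after {timeout}s: {last_err}"
    )
```

In this sketch, a non-positive `timeout` without the guard would skip the loop entirely and raise `TimeoutError` without ever probing the port, so the early `ValueError` points at the misconfiguration rather than at the service under test.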