Kheiss/up ovr #1861
Update all hardcoded version references from 26.1.2 to 26.3.0-RC1 across helm charts, docker-compose, FastAPI, docs, and examples.
Made-with: Cursor
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
…ing long VLM captioning
Large PDFs with VLM captioning enabled can take 2-22+ hours depending on hardware. The previous defaults (STATE_TTL=7200s, RESULT_DATA_TTL=3600s) caused job state to expire mid-processing, resulting in 404 "Job ID not found or state has expired" errors even though the pipeline completed successfully.
Raises both defaults to 172800s (48 hours), providing sufficient headroom for all observed workloads. Users can still override via RESULT_DATA_TTL_SECONDS and STATE_TTL_SECONDS environment variables.
Fixes: Customer bug 5914605
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: sosahi <syousefisahi@nvidia.com>
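The TTL change in the commit message above can be illustrated with a minimal sketch. Only the environment variable names (`STATE_TTL_SECONDS`, `RESULT_DATA_TTL_SECONDS`) and the values 7200s, 3600s, and 172800s come from the commit message; the helper names `ttl_from_env` and `outlives_job`, and the choice to fall back to the default on empty or invalid values, are assumptions made for illustration and are not taken from the project's code.

```python
import os

# 48 hours, the new default stated in the commit message.
DEFAULT_TTL_SECONDS = 172_800


def ttl_from_env(name, default=DEFAULT_TTL_SECONDS, env=None):
    """Return a positive integer TTL read from `env`, or `default`.

    Hypothetical helper: falling back on empty, non-numeric, or
    non-positive values is an assumption, not documented behavior.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def outlives_job(ttl_seconds, job_hours):
    """True if state kept for `ttl_seconds` survives a job of `job_hours`."""
    return ttl_seconds >= job_hours * 3600


# The old 2-hour state TTL expires long before a 22-hour job finishes,
# while the new 48-hour default leaves headroom.
old_ok = outlives_job(7200, 22)
new_ok = outlives_job(ttl_from_env("STATE_TTL_SECONDS", env={}), 22)
```

Under these assumptions, a job at the upper end of the reported range (22 hours, or 79200 seconds) exceeds the old 7200s state TTL but fits within 172800s, which matches the 404 errors the commit message describes.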
…ine; misc README/Helm fixes
Updating files per bugs 5966185, 5966211, and 5966281 (#1742)
Co-authored-by: sosahi <syousefisahi@nvidia.com>
Made-with: Cursor
Greptile Summary
This PR prepends new introductory content to
| Filename | Overview |
|---|---|
| docs/docs/extraction/overview.md | New introductory block (rename note, deprecation note, revised description) inserted before the existing intro, leaving two conflicting intro paragraphs with inconsistent wording about library capabilities. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["overview.md (before PR)"] --> B["Intro paragraph\n(high retrieval accuracy...)"]
B --> C["Parallelization paragraph\n(manages embeddings, stores to LanceDB)"]
C --> D["What NeMo Retriever Library Is ✔️"]
E["overview.md (after PR)"] --> F["NEW: Intro paragraph\n(scalable, performance-oriented...)"]
F --> G["NEW: Rename note (nv-ingest → NeMo Retriever Library)"]
G --> H["NEW: Parallelization paragraph\n(optionally manages embeddings, LanceDB or Milvus)"]
H --> I["NEW: Deprecation note (Cached/Deplot)"]
I --> J["OLD: Intro paragraph\n(high retrieval accuracy...) ⚠️ DUPLICATE"]
J --> K["OLD: Parallelization paragraph\n(manages, stores) ⚠️ CONFLICTS"]
K --> L["What NeMo Retriever Library Is ✔️"]
style J fill:#ffcccc
style K fill:#ffcccc
Comments Outside Diff (1)
- docs/docs/extraction/overview.md, line 17-20: Duplicate and conflicting introductory content
The newly added paragraphs (lines 3–11) introduce NeMo Retriever Library in a way that is nearly duplicate — but subtly inconsistent — with the original paragraphs that still remain here. Line 3 calls it "scalable, performance-oriented" while line 17 calls it "high retrieval accuracy, performant, and scalable"; lines 9–11 say the library "can optionally manage" embedding and storage, while lines 19–20 say it "manages" and "stores into". Readers will encounter two competing intro sections with contradictory details about the library's capabilities and listed file types. The old paragraph block (lines 17–20) should be removed or merged into the new intro.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/docs/extraction/overview.md
Line: 17-20
Comment:
**Duplicate and conflicting introductory content**
The newly added paragraphs (lines 3–11) introduce NeMo Retriever Library in a way that is nearly duplicate — but subtly inconsistent — with the original paragraphs that still remain here. Line 3 calls it "scalable, performance-oriented" while line 17 calls it "high retrieval accuracy, performant, and scalable"; lines 9–11 say the library "can optionally manage" embedding and storage, while lines 19–20 say it "manages" and "stores into". Readers will encounter two competing intro sections with contradictory details about the library's capabilities and listed file types. The old paragraph block (lines 17–20) should be removed or merged into the new intro.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docs/docs/extraction/overview.md
Line: 9
Comment:
**Missing hyphen in compound modifier**
"well defined" modifying "JSON schema" is a compound adjective and should be hyphenated.
```suggestion
NeMo Retriever Library enables parallelization of splitting documents into pages where artifacts are classified (such as text, tables, charts, and infographics), extracted, and further contextualized through optical character recognition (OCR) into a well-defined JSON schema.
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (4): Last reviewed commit: "Merge branch 'main' into kheiss/up-ovr"
@kheiss-uwzoo Approved, but there are merge conflicts that need to be resolved before I can merge.
NVIDIA NeMo Retriever Library is a scalable, performance-oriented framework for document content and metadata extraction. It supports both NVIDIA NIM microservices and a wide range of models to find, contextualize, and extract text, tables, charts, and infographics for use in downstream generative and retrieval-augmented applications.