Kheiss/up ovr by kheiss-uwzoo · Pull Request #1861 · NVIDIA/NeMo-Retriever

kheiss-uwzoo · 2026-04-15T20:54:26Z

NVIDIA NeMo Retriever Library is a scalable, performance-oriented framework for document content and metadata extraction. It supports both NVIDIA NIM microservices and a wide range of models to find, contextualize, and extract text, tables, charts, and infographics for use in downstream generative and retrieval-augmented applications.


…ing long VLM captioning

Large PDFs with VLM captioning enabled can take 2-22+ hours depending on hardware. The previous defaults (STATE_TTL=7200s, RESULT_DATA_TTL=3600s) caused job state to expire mid-processing, resulting in 404 "Job ID not found or state has expired" errors even though the pipeline completed successfully.

Raises both defaults to 172800s (48 hours), providing sufficient headroom for all observed workloads. Users can still override via the RESULT_DATA_TTL_SECONDS and STATE_TTL_SECONDS environment variables.

Fixes: Customer bug 5914605

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
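The TTL fix above boils down to an environment-variable override with a 48-hour fallback. The Python sketch below illustrates that pattern and the arithmetic behind the bug (a 22-hour job measured against a 7200 s state TTL). It is an illustration under assumptions, not the library's code: only the variable names STATE_TTL_SECONDS and RESULT_DATA_TTL_SECONDS and the default values come from the commit message, while the helpers ttl_from_env and fits_within_ttl are hypothetical.

```python
import os

# Hypothetical sketch: the env var names and the 48-hour default come from the
# commit message; these helpers are illustrative, not NeMo Retriever's code.
DEFAULT_TTL_SECONDS = 48 * 60 * 60  # 172800 s, the new default for both TTLs


def ttl_from_env(var_name: str, default: int = DEFAULT_TTL_SECONDS) -> int:
    """Return a TTL in seconds from an env var, falling back to the default."""
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # int() tolerates surrounding whitespace
    if value <= 0:
        raise ValueError(f"{var_name} must be a positive number of seconds, got {value}")
    return value


def fits_within_ttl(job_hours: float, ttl_seconds: int) -> bool:
    """True if a job of the given duration finishes before its state would expire."""
    return job_hours * 3600 <= ttl_seconds


if __name__ == "__main__":
    state_ttl = ttl_from_env("STATE_TTL_SECONDS")
    result_ttl = ttl_from_env("RESULT_DATA_TTL_SECONDS")
    print("state TTL:", state_ttl, "result TTL:", result_ttl)
    # A 22-hour VLM captioning job overruns the old 7200 s state TTL.
    print("fits old 7200 s TTL:", fits_within_ttl(22, 7200))
    print("fits current state TTL:", fits_within_ttl(22, state_ttl))
```

Rejecting non-positive overrides is a design choice of this sketch; the commit message does not say how the library validates these values.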


greptile-apps · 2026-04-15T20:55:38Z

Greptile Summary

This PR prepends new introductory content to docs/docs/extraction/overview.md: a rename notice (nv-ingest → NeMo Retriever Library), a deprecation note for Cached/Deplot, and a revised high-level description paragraph. However, the original introductory paragraphs were not removed, leaving the document with two conflicting intro blocks that differ in wording, scope of listed capabilities, and whether embedding/storage steps are described as optional.

Confidence Score: 3/5

The duplicate and contradictory intro blocks will confuse readers; the old paragraphs should be removed or merged into the new intro before this PR is merged.

A P1 documentation correctness issue remains: two competing introductory sections with inconsistent descriptions of the library's capabilities. This directly harms the user-facing docs and should be resolved before merging.

docs/docs/extraction/overview.md — lines 17–20 (original intro) conflict with the newly added lines 3–11.

Important Files Changed

| Filename | Overview |
|---|---|
| docs/docs/extraction/overview.md | New introductory block (rename note, deprecation note, revised description) inserted before the existing intro, leaving two conflicting intro paragraphs with inconsistent wording about library capabilities. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["overview.md (before PR)"] --> B["Intro paragraph\n(high retrieval accuracy...)"]
    B --> C["Parallelization paragraph\n(manages embeddings, stores to LanceDB)"]
    C --> D["What NeMo Retriever Library Is ✔️"]

    E["overview.md (after PR)"] --> F["NEW: Intro paragraph\n(scalable, performance-oriented...)"]
    F --> G["NEW: Rename note (nv-ingest → NeMo Retriever Library)"]
    G --> H["NEW: Parallelization paragraph\n(optionally manages embeddings, LanceDB or Milvus)"]
    H --> I["NEW: Deprecation note (Cached/Deplot)"]
    I --> J["OLD: Intro paragraph\n(high retrieval accuracy...) ⚠️ DUPLICATE"]
    J --> K["OLD: Parallelization paragraph\n(manages, stores) ⚠️ CONFLICTS"]
    K --> L["What NeMo Retriever Library Is ✔️"]

    style J fill:#ffcccc
    style K fill:#ffcccc
```

Comments Outside Diff (1)

docs/docs/extraction/overview.md, lines 17–20

Duplicate and conflicting introductory content

The newly added paragraphs (lines 3–11) introduce NeMo Retriever Library in a way that is nearly duplicate — but subtly inconsistent — with the original paragraphs that still remain here. Line 3 calls it "scalable, performance-oriented" while line 17 calls it "high retrieval accuracy, performant, and scalable"; lines 9–11 say the library "can optionally manage" embedding and storage, while lines 19–20 say it "manages" and "stores into". Readers will encounter two competing intro sections with contradictory details about the library's capabilities and listed file types. The old paragraph block (lines 17–20) should be removed or merged into the new intro.


Review comment on docs/docs/extraction/overview.md, line 9:

Missing hyphen in compound modifier

"well defined" modifying "JSON schema" is a compound adjective and should be hyphenated.

```suggestion
NeMo Retriever Library enables parallelization of splitting documents into pages where artifacts are classified (such as text, tables, charts, and infographics), extracted, and further contextualized through optical character recognition (OCR) into a well-defined JSON schema.
```

Reviews (4): Last reviewed commit: "Merge branch 'main' into kheiss/up-ovr"

jdye64 · 2026-04-22T17:07:43Z

@kheiss-uwzoo Approved, but there are merge conflicts that need to be resolved before I can merge.

kheiss-uwzoo and others added 30 commits February 19, 2026 10:36

Update PDF blueprint architecture diagram

cd3c368

Merge remote-tracking branch 'upstream/main'

70b5a80

Merge remote-tracking branch 'upstream/main'

7f0248c

Merge remote-tracking branch 'upstream/main'

0dd5f1b

Merge remote-tracking branch 'upstream/main'

dea2770

Merge remote-tracking branch 'upstream/main'

3ff2f1f

Merge remote-tracking branch 'upstream/main'

a886244

Merge remote-tracking branch 'upstream/main'

b44f7ad

Merge remote-tracking branch 'upstream/main'

addf637

Merge remote-tracking branch 'upstream/main'

5900322

Merge remote-tracking branch 'upstream/main'

d12df70

Merge remote-tracking branch 'upstream/main'

67e674b

Merge remote-tracking branch 'upstream/main'

83c3c42

Introduce release branch 26.03 with version 26.3.0-RC1

371d883

Update all hardcoded version references from 26.1.2 to 26.3.0-RC1 across helm charts, docker-compose, FastAPI, docs, and examples. Made-with: Cursor

Merge remote-tracking branch 'upstream/main'

4af706f

Merge remote-tracking branch 'upstream/main'

a5812fa

Merge remote-tracking branch 'upstream/main'

6ecb070

Release prep: Update version to 26.03.0-RC1 (#1574)

72173fc

(retriever) Add .split() for text chunking by token count (#1547) (#1576)

852910c

(retriever) add documentation for image file support (#1571) (#1577)

64c694b

Co-authored-by: Kurt Heiss <kheiss@nvidia.com>

[26.03] Refactor get_*_model_name to avoid caching fallback model name (#1578)

d38abb2

[26.03] (helm) More nemotron rebranding (#1581)

fbd2e28

Merge remote-tracking branch 'upstream/main'

ba92f69

Add source_id column back to lancedb

1835ba7

upmerge

db03ed7

fix reranker in inproc (#1588)

5cbf38e

Add source_id to output columns

6459e60

fix in process extract to handle txt (#1589)

ed95c44

Co-authored-by: Jeremy Dyer <jdye64@gmail.com>

Release prep: 26.03.0-RC2 (#1591)

9568b50

kheiss-uwzoo and others added 17 commits March 30, 2026 10:24

Kheiss/5966722 (#1743)

9dc88b5

Updated files per bugs 5970369, 5966307, and 5966925 (#1740)

6c3c2a6

Align VLM caption model and MinIO defaults with runtime (#1739)

53262b4

Co-authored-by: sosahi <syousefisahi@nvidia.com>

added licensing info to documentation (#1750)

1a91164

updated quickstart guide file per 5966239 (#1751)

b5d7b96

update support matrix to add footnotes

4744677

update support matrix to add footnotes (#1752)

e8759e2

Merge remote-tracking branch 'upstream/26.03' into 26.03

f39912f

Kheiss/5966297update (#1758)

29f787b

Align VLM caption model, fix V2 ingest() example, document run_pipeline; misc README/Helm fixes

c5e1c22

Updating files per bugs 5966185, 5966211, and 5966281 (#1742)

Co-authored-by: sosahi <syousefisahi@nvidia.com>

Merge remote-tracking branch 'upstream/26.03' into 26.03

7461ce4

Merge branch '26.03' into main

d56a8cb

Made-with: Cursor

Merge remote-tracking branch 'upstream/main'

3e80634

Merge remote-tracking branch 'upstream/main'

7f73df3

Merge remote-tracking branch 'upstream/main'

4ce21b5

Update overview.md

1bb2f98

updated overview text

0eba904

kheiss-uwzoo requested review from a team as code owners April 15, 2026 20:54

kheiss-uwzoo requested a review from jioffe502 April 15, 2026 20:54

Merge branch 'main' into kheiss/up-ovr

8b808e8

greptile-apps Bot reviewed Apr 15, 2026


Comment thread on docs/docs/extraction/overview.md (outdated)

kheiss-uwzoo added 2 commits April 15, 2026 13:57

Update overview.md

c1cb5e1

Merge branch 'main' into kheiss/up-ovr

04aff9d

jdye64 approved these changes Apr 22, 2026


Merge branch 'main' into kheiss/up-ovr

f97887d

kheiss-uwzoo closed this Apr 22, 2026

kheiss-uwzoo deleted the kheiss/up-ovr branch April 30, 2026 15:39

kheiss-uwzoo wants to merge 106 commits into main from kheiss/up-ovr


8 participants
