feat: MiniCPM-V-2.6 serving by PanZezhong1725 · Pull Request #396 · InfiniTensor/InfiniLM

PanZezhong1725 · 2026-05-22T01:12:17Z

Summary

Adds MiniCPM-V multimodal request support around image inputs.
Introduces a dedicated MiniCPMVProcessor that:
-- loads HF AutoProcessor / tokenizer,
-- applies MiniCPM-V chat templates,
-- processes image_url messages,
-- builds batched model inputs including pixel_values, tgt_sizes, image_bound, and image request ids.
Changes multimodal input plumbing from single tensors to per-request tensor lists:
-- pixel_values
-- image_bound
-- tgt_sizes
-- image_req_ids
Updates C++ InferEngine, RankWorker, pybind bindings, and InfinilmModel::Input to carry those per-request multimodal tensors.
Adds forward-context multimodal metadata so C++ model code can know which scheduled requests contain image data.
Updates MiniCPMVModel so image embeddings are replaced inside the already-batched token embeddings using request offsets, instead of assuming a single request.
Fixes MiniCPM-V batching behavior for continuous batching / paged attention.
Extends prefix-cache hashing to include multimodal identifiers, preventing different images with similar token text from incorrectly sharing cached blocks.
Adds logic to avoid reusing cache blocks that partially cover incomplete multimodal spans.
Optimizes MiniCPM-V vision path:
-- resampler now precomputes a max-size 2D sin/cos embedding table,
-- SigLIP attention switches to fused QKV projection,
-- SigLIP can use the configured attention backend, including flash attention.
Makes fused linear helpers easier to use without explicit quantization by defaulting to NoneQuantization.
Adds OpenAI-style image request tooling:
-- test/service/request.py
-- image-aware scripts/test_perf.py
-- README example for single-request image service testing.
Adjusts examples/test_infer.py to insert image content into the conversation structure correctly.

Motivation

Closes #343

Type of Change

feat — new feature / new model
fix — bug fix
perf — performance improvement (no behavioral change)
refactor — code restructuring without behavior change
test — adding or fixing tests only
docs — documentation only
build / ci — build system or CI configuration
chore — tooling, formatting, or other non-code changes
Breaking change

Test Results of Involved Models on Supported Platforms (Please attach screenshots)

MiniCPM-V-2.6 serving:

9g8b test_infer

qwen3 bench

Benchmark / Performance Impact

Notes for Reviewers

Checklist

Every contributor must verify every item below before requesting
review. Tick each box only after the check has actually been performed —
do not tick speculatively. If an item truly does not apply, replace the
checkbox with N/A and briefly explain why in an inline comment.

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
No stray merge commits from main — the branch is rebased cleanly on top of the current main.
No fixup! / squash! / wip commits remain.
Existing PR/branch/commit that followed the legacy issue format.

Scope and Design

Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
No unrelated formatting churn that would obscure the diff.
Public API changes (if any) are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene (applies to all languages)

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
All comments and error messages are in English (CONTRIBUTING.md §Code/General).
Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

Code follows the Google C++ Style Guide strictly.
Error and warning message wording follows the LLVM Coding Standards (CONTRIBUTING.md §C++).
Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
No raw new/delete; RAII / smart pointers / existing allocators are used.
Changed files are formatted by scripts/format.py.
No changes/reference to csrc/models/llama_legacy/.

Python Specific (if Python files changed)

Code is PEP 8 compliant.
Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
Docstrings (if any) follow PEP 257 (CONTRIBUTING.md §Python).
Changed files are formatted by scripts/format.py.
No changes/reference to python/infinilm/auto_config.py.

Testing

For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
Passed single request test (examples/test_infer.py), or specify the reason for skipping.
Passed offline performance test (examples/bench.py), or specify the reason for skipping.
Passed sanity test (test/bench/test_benchmark.py), or specify the reason for skipping.
Passed service test (python/infinilm/server/inference_server.py + scripts/test_perf.py), or specify the reason for skipping.

Build, CI, and Tooling

The project builds cleanly from a fresh directory on at least one affected platform.

Documentation

README.md, CONTRIBUTING.md, or inline docs updated when behavior, build flags, or developer workflow changed.
Any user-visible breaking change is called out explicitly under "Motivation" and in the commit/PR title with a ! or BREAKING CHANGE: footer.

Security and Safety

No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
Third-party code is license-compatible and attributed.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

feat: add prefix hashing for mm_data

feat: minicpm-v continuous-batching, service test with image inputs

PanZezhong1725 and others added 8 commits May 11, 2026 06:48

issue/344 add prefix hashing for mm_data

9e79068

Merge pull request #364 from InfiniTensor/issue/344

4a75a67

feat: add prefix hashing for mm_data

issue/369 minicpm-v continuous-batching, service test with image inputs

29473f9

Merge pull request #377 from InfiniTensor/issue/369

778a6d2

feat: minicpm-v continuous-batching, service test with image inputs

issue/343 fix LLM mm data processing

14c78c3

Merge remote-tracking branch 'origin/main' into issue/343

1b05f5f

issue/343 optimize minicpmv resampler

429471e

issue/343 fix batching

d493491

PanZezhong1725 force-pushed the issue/343 branch from 45020b8 to afa7033 Compare May 29, 2026 07:17

issue/343 optimize siglip attention

cd3bb2b

PanZezhong1725 force-pushed the issue/343 branch from afa7033 to cd3bb2b Compare June 5, 2026 06:18

wooway777 marked this pull request as ready for review June 5, 2026 07:15

wooway777 requested a review from a team June 5, 2026 07:15

PanZezhong1725 linked an issue Jun 5, 2026 that may be closed by this pull request

[DEV] MiniCPM-V模型推理服务 #343

Open

PanZezhong1725 requested a review from wooway777 June 5, 2026 08:13

Merge remote-tracking branch 'origin/main' into issue/343

bcbf633

PanZezhong1725 requested a review from ma-hang June 5, 2026 09:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: MiniCPM-V-2.6 serving#396

feat: MiniCPM-V-2.6 serving#396
PanZezhong1725 wants to merge 10 commits into
mainfrom
issue/343

PanZezhong1725 commented May 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

PanZezhong1725 commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Type of Change

Test Results of Involved Models on Supported Platforms (Please attach screenshots)

Benchmark / Performance Impact

Notes for Reviewers

Checklist

Title, Branch, and Commits

Scope and Design

General Code Hygiene (applies to all languages)

C++ Specific (if C++ files changed)

Python Specific (if Python files changed)

Testing

Build, CI, and Tooling

Documentation

Security and Safety

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

PanZezhong1725 commented May 22, 2026 •

edited

Loading