
feat(core): GPU instancing auto-batching #2957

Open
GuoLei1990 wants to merge 71 commits into dev/2.0 from feat/gpu-instancing

Conversation

GuoLei1990 (Member) commented Apr 7, 2026

Closes #194

Summary

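  • Introduces automatic GPU instancing for MeshRenderer: per-instance data is packed into a UBO to drive instanced draw calls
  • InstanceBatch packs renderer uniforms (ModelMat, Layer, etc.) into a shared std140 UBO
  • ShaderFactory.injectInstanceUBO automatically scans the shader for renderer uniforms and replaces them with UBO array accesses plus #define remapping
  • ModelMat is stored as mat3x4 (affine optimization, 48 bytes vs. 64 bytes); derived uniforms (MVMat/MVPMat/NormalMat) are computed on the fly via #define
  • InstanceLayout is computed at shader compile time and cached on ShaderProgram, so rendering reads it directly with no recomputation
  • Per-pass independent layouts: each pass computes its struct size from the uniforms it actually needs, so lightweight passes such as ShadowCaster get a higher instanceMaxCount
  • MeshRenderer._canBatch/_batch implement the batching test (same primitive + material + front-face)
  • ShaderProgram._recordLocation skips UBO members (location === null), avoiding the creation of unused ShaderUniform objects
  • The opaque sort strategy now sorts by material/primitive (batching first); distance sorting is removed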
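
The 48-byte vs. 64-byte figure for ModelMat can be illustrated with a minimal sketch. It assumes the world matrix is a column-major 4×4 array of 16 floats whose last row is (0, 0, 0, 1); the helper name `packAffineRows` is hypothetical and is not taken from the PR. How the shader recombines the three vec4 rows is not shown here.

```typescript
// Hypothetical helper, not the PR's actual code.
// `world` is assumed to be a column-major 4x4 affine matrix (16 floats)
// whose last row is (0, 0, 0, 1), so that row does not need to be stored.
function packAffineRows(world: ArrayLike<number>, out: Float32Array, floatOffset: number): void {
  for (let row = 0; row < 3; row++) {
    for (let col = 0; col < 4; col++) {
      // Column-major storage: element (row, col) lives at index col * 4 + row.
      out[floatOffset + row * 4 + col] = world[col * 4 + row];
    }
  }
}

// 3 rows x 4 floats x 4 bytes = 48 bytes (a full mat4 would take 64).
const BYTES_PER_PACKED_MATRIX = 3 * 4 * 4;

// Translation by (5, 6, 7): in column-major order it sits at indices 12..14.
const translationMatrix = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 5, 6, 7, 1];
const packedRows = new Float32Array(12);
packAffineRows(translationMatrix, packedRows, 0);
// packedRows now holds the rows (1,0,0,5), (0,1,0,6), (0,0,1,7).
```

Dropping the constant last row is what makes the smaller per-instance struct possible; it is only valid for affine transforms.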

Performance

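Test scene: 2500 glTF models (Avocado) + 2500 custom-shader cubes, all with dynamic rotation, scaling, and color animation

| Device | dev/2.0 (no instancing) | feat/gpu-instancing | Improvement |
|---|---|---|---|
| iPhone 16 Pro Max | 30 FPS | 50 FPS | +67% |

Screenshot from an on-device iPhone run (59 FPS / 21 Draw Calls / 5000 objects):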

Future Optimization

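  • ShaderLab precompiled metadata: injectInstanceUBO currently obtains renderer uniform information by scanning the GLSL text with regular expressions. If ShaderLab precompilation provided uniform metadata (name, type, group), the regex scan could be replaced by exact code assembly, improving robustness and extensibility
  • SubRenderElement-level sorting: opaque sorting should be pushed down to the SubRenderElement level so that multi-submesh objects do not break batching continuity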
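
To make the first point concrete, here is a rough sketch of what a regex-based scan of GLSL text might look like. The function name `scanRendererUniforms`, the `renderer_` prefix filter, and the uniform types in the sample source are assumptions for illustration; the PR's real scanner is not reproduced here.

```typescript
interface ScannedUniform {
  type: string;
  name: string;
}

// Hypothetical illustration of a text-based scan; not the PR's implementation.
// Matches lines like `uniform mat4 renderer_ModelMat;` with an optional precision qualifier.
function scanRendererUniforms(glsl: string): ScannedUniform[] {
  const pattern = /^\s*uniform\s+(?:(?:lowp|mediump|highp)\s+)?(\w+)\s+(renderer_\w+)\s*;/gm;
  const found: ScannedUniform[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(glsl)) !== null) {
    found.push({ type: match[1], name: match[2] });
  }
  return found;
}

// Sample source; the declared types are assumed for the example.
const sampleSource = [
  "uniform mat4 renderer_ModelMat;",
  "uniform highp vec4 renderer_Layer;",
  "uniform vec4 material_BaseColor;",
  "// uniform float renderer_Commented;"
].join("\n");

const scanned = scanRendererUniforms(sampleSource);
// scanned contains renderer_ModelMat (mat4) and renderer_Layer (vec4) only.
```

A pattern like this silently misses array uniforms, multi-declarations, and anything hidden behind preprocessor branches, which is the kind of fragility that precompiled metadata would avoid.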

Key Files

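| File | Responsibility |
|---|---|
| RenderPipeline/InstanceBatch.ts | UBO buffer management, per-instance data upload |
| RenderPipeline/RenderQueue.ts | Instance-aware render loop |
| shaderlib/ShaderFactory.ts | Uniform scanning, std140 layout computation, UBO code generation and injection |
| shader/ShaderPass.ts | Shader compilation entry point; stores InstanceLayout on ShaderProgram |
| shader/ShaderProgram.ts | Adds the _instanceLayout field; skips reflection of UBO members |
| mesh/MeshRenderer.ts | _canBatch/_batch batching logic |
| shader/ShaderProgramMap.ts | Multi-level bitmask ShaderProgram cache |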
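
One way a "multi-level bitmask" cache could be organized is as nested maps, one level per 32-bit mask word of the enabled macro set. The sketch below is only an assumption about the general technique; the class name `MaskKeyedCache` and its API are hypothetical, and ShaderProgramMap may be structured differently.

```typescript
// Hypothetical sketch: a cache keyed by a fixed-length array of 32-bit macro masks.
// Each mask word selects a child map; the last word selects the cached value.
class MaskKeyedCache<T> {
  private readonly _root = new Map<number, unknown>();
  private readonly _levels: number;

  constructor(levels: number) {
    if (levels < 1) throw new Error("levels must be at least 1");
    this._levels = levels;
  }

  get(masks: ReadonlyArray<number>): T | undefined {
    const node = this._walk(masks, false);
    return node === undefined ? undefined : (node.get(masks[this._levels - 1]) as T | undefined);
  }

  set(masks: ReadonlyArray<number>, value: T): void {
    const node = this._walk(masks, true) as Map<number, unknown>;
    node.set(masks[this._levels - 1], value);
  }

  // Walks (and optionally creates) the intermediate maps for all but the last mask word.
  private _walk(masks: ReadonlyArray<number>, create: boolean): Map<number, unknown> | undefined {
    if (masks.length !== this._levels) throw new Error("unexpected mask count");
    let node: Map<number, unknown> = this._root;
    for (let i = 0; i < this._levels - 1; i++) {
      let next = node.get(masks[i]) as Map<number, unknown> | undefined;
      if (next === undefined) {
        if (!create) return undefined;
        next = new Map<number, unknown>();
        node.set(masks[i], next);
      }
      node = next;
    }
    return node;
  }
}

const programCache = new MaskKeyedCache<string>(2);
programCache.set([0b01, 0b00], "program A");
programCache.set([0b01, 0b10], "program B");
```

Keying on mask words avoids building a string key per lookup, at the cost of one map hop per word.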

Test plan

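  • Multiple MeshRenderers sharing a mesh + material render correctly with instancing
  • The shadow pass works correctly with instanced objects
  • Non-instancing renderers (SkinnedMeshRenderer, 2D sprite) are unaffected
  • Falls back correctly when there are no renderer uniforms (no UBO layout)
  • Performance check: the draw call count drops significantly for large numbers of identical objects
  • Measured on iPhone 16 Pro Max: 30 FPS → 50 FPS (+67%)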

…nce data

Introduce automatic GPU instancing for MeshRenderer. The system scans
renderer-group uniforms across shader passes, builds a unified std140
UBO layout, and packs per-instance data (ModelMat, Layer, etc.) each
frame. Key changes:

- InstanceDataPacker: packs renderer data into shared UBO for instanced draw
- ShaderFactory: unified _scanInstanceUniforms, _buildLayout, _injectInstanceUBO
- MeshRenderer._canBatch/_batch: instancing merge logic
- ShaderPass/SubShader: instance-aware compilation with macro cache
- GLSLIfdefResolver: compile-time #ifdef resolution for instance field scanning
- MacroCachePool: pooled ShaderMacroCollection for shader program caching
- RenderQueue: instance-aware draw path with UBO binding
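
As a rough illustration of what building a std140 struct layout involves, the sketch below applies the standard std140 alignment rules to a list of fields. The function name `buildStd140Struct`, the supported type set, and the assumption that Layer is a vec4 are all illustrative and not taken from the PR's _buildLayout.

```typescript
type Std140Type = "float" | "int" | "vec2" | "vec3" | "vec4" | "mat4";

// Base alignment and size in bytes under std140 (a mat4 is four vec4 columns).
const STD140_RULES: Record<Std140Type, { align: number; size: number }> = {
  float: { align: 4, size: 4 },
  int: { align: 4, size: 4 },
  vec2: { align: 8, size: 8 },
  vec3: { align: 16, size: 12 },
  vec4: { align: 16, size: 16 },
  mat4: { align: 16, size: 64 }
};

interface FieldLayout {
  name: string;
  type: Std140Type;
  offset: number;
}

// Hypothetical sketch: computes member offsets and the padded struct size
// for a struct used as an element of a std140 uniform-block array.
function buildStd140Struct(
  fields: ReadonlyArray<{ name: string; type: Std140Type }>
): { fields: FieldLayout[]; structSize: number } {
  let offset = 0;
  const layout: FieldLayout[] = [];
  for (const { name, type } of fields) {
    const { align, size } = STD140_RULES[type];
    offset = Math.ceil(offset / align) * align; // round up to the member's alignment
    layout.push({ name, type, offset });
    offset += size;
  }
  // Structs in std140 are padded to a multiple of 16 bytes.
  return { fields: layout, structSize: Math.ceil(offset / 16) * 16 };
}

// Assumed field types, for illustration only.
const fullMatrixLayout = buildStd140Struct([
  { name: "ModelMat", type: "mat4" },
  { name: "Layer", type: "vec4" }
]);
const mixedLayout = buildStd140Struct([
  { name: "a", type: "float" },
  { name: "b", type: "vec3" },
  { name: "c", type: "vec2" }
]);
```

The padding rules are why field order and type choice affect how many instances fit in one uniform block.
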

coderabbitai Bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds WebGL2 GPU instancing: instance UBO management, instance-aware shader compilation and caching, batching API/signature changes, render-queue instanced draw paths, device/GL helpers, examples and E2E tests, and supporting shader/uniform enhancements.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GPU Instancing Core | packages/core/src/RenderPipeline/InstanceBatch.ts, packages/core/src/shader/ShaderBlockProperty.ts, packages/core/src/shader/enums/ConstantBufferBindingPoint.ts | New InstanceBatch class managing CPU packing and dynamic UBO allocation; ShaderBlockProperty for uniform block ids; ConstantBuffer binding enum for renderer-instance UBO. |
| Batching Parameter Refactoring | packages/core/src/RenderPipeline/BatchUtils.ts, packages/core/src/2d/sprite/SpriteRenderer.ts, packages/core/src/2d/sprite/SpriteMask.ts, packages/core/src/2d/text/TextRenderer.ts, packages/core/src/Renderer.ts, packages/ui/src/component/UIRenderer.ts | Renamed batching parameters from (elementA, elementB) → (preSubElement, subElement) and adjusted nullability/optional semantics; updated BatchUtils to consume new parameter roles. |
| SubRenderElement & RenderPipeline | packages/core/src/RenderPipeline/SubRenderElement.ts, packages/core/src/RenderPipeline/BasicRenderPipeline.ts, packages/core/src/RenderPipeline/RenderQueue.ts, packages/core/src/RenderPipeline/BatcherManager.ts | Replaced shaderPasses with subShader; added instancedRenderers on SubRenderElement; BasicRenderPipeline and RenderQueue updated to use SubShader and to perform instanced draw paths; BatcherManager exposes InstanceBatch and calls updated renderer batching signature. |
| Renderer & Mesh Changes | packages/core/src/Renderer.ts, packages/core/src/mesh/MeshRenderer.ts, packages/core/src/mesh/SkinnedMeshRenderer.ts | Renderer shader-property fields visibility adjusted; batching hooks updated to new signatures; MeshRenderer supports collecting instanced renderers into SubRenderElement; SkinnedMeshRenderer disables batching. |
| Shader Program & Compilation | packages/core/src/shader/ShaderPass.ts, packages/core/src/shader/ShaderProgram.ts, packages/core/src/shader/ShaderProgramMap.ts, packages/core/src/shader/ShaderBlockProperty.ts, packages/core/src/shaderlib/ShaderFactory.ts, packages/core/src/graphic/TransformFeedbackShader.ts | Switched from ShaderProgramPool → ShaderProgramMap cache; added _compileShaderProgram path that computes instance layouts and injects instance UBOs; ShaderProgram now records uniform block ids and supports bindUniformBlocks; ShaderFactory extended to generate instance-aware GLSL and return instance layout; related compile/cache call sites updated. |
| Shader Uniforms & Uploads | packages/core/src/shader/ShaderUniform.ts | Added upload methods for Mat2/Mat3 and rectangular matrix types (WebGL2 variants). |
| Engine & Caching Fields | packages/core/src/Engine.ts, packages/core/src/shader/ShaderProgramMap.ts | Replaced internal _shaderProgramPools with _shaderProgramMaps and adapted lazy accessors; ShaderProgramMap renamed/refactored with destroy() lifecycle. |
| RHI / WebGL2 Support | packages/rhi-webgl/src/GLBuffer.ts, packages/rhi-webgl/src/WebGLGraphicDevice.ts, packages/core/src/graphic/enums/BufferBindFlag.ts | Added ConstantBuffer bind flag and mapped it to UNIFORM_BUFFER; WebGL device gains bindUniformBufferBase, bindUniformBlock, and getMaxUniformBlockSize helpers; GLBuffer target selection updated. |
| 2D / UI Render Element Updates | packages/core/src/2d/sprite/SpriteMask.ts, packages/ui/src/component/advanced/Image.ts, packages/ui/src/component/advanced/Text.ts | Overlay/2D sub-render elements now assign subShader instead of shaderPasses; small adjustments to subRenderElement setup to match SubShader change. |
| Examples, E2E & Tests | examples/src/gpu-instancing-auto-batch.ts, examples/src/gpu-instancing-custom-data.ts, e2e/case/gpu-instancing-auto-batch.ts, e2e/case/gpu-instancing-custom-data.ts, e2e/config.ts, examples/package.json, tests/src/shader-lab/* | Added examples and E2E tests for instancing; registered new E2E config entries; examples add custom shaders and animated instanced entities; tests updated to call _compileShaderProgram and benchmark helpers adjusted. |
| Tooling / Packages | package.json, examples/package.json | Dev tooling upgrades (ESLint, TypeScript ESLint parser/plugin, lint-staged) and added @galacean/engine-toolkit-stats dependency in examples. |
| Other Render Pipeline Adjustments | packages/core/src/RenderPipeline/BasicRenderPipeline.ts, packages/core/src/RenderPipeline/BatchUtils.ts, packages/core/src/RenderPipeline/InstanceBatch.ts | pushRenderElement now consumes SubShader; BatchUtils.batchFor2D signature adjusted; InstanceBatch integrated into BatcherManager and RenderQueue flows for chunked uploads and instanced draw submission. |

Sequence Diagram(s)

sequenceDiagram
    participant RenderQueue as RenderQueue
    participant BatcherManager as BatcherManager
    participant InstanceBatch as InstanceBatch
    participant GPUBuffer as GPU Constant Buffer
    participant ShaderProgram as ShaderProgram

    RenderQueue->>BatcherManager: request instanceBatch (lazy)
    BatcherManager->>InstanceBatch: setLayout(layout)
    InstanceBatch->>GPUBuffer: create/realloc UBO (if needed)
    loop per instanced chunk
        RenderQueue->>InstanceBatch: upload(renderers[], start, count)
        InstanceBatch->>InstanceBatch: pack per-instance fields into CPU buffer
        InstanceBatch->>GPUBuffer: setData(range, Discard)
    end
    RenderQueue->>ShaderProgram: bindUniformBlocks(bindingMap)
    ShaderProgram->>GPUBuffer: uniformBlockBinding(bindingPoint)
    RenderQueue->>RenderQueue: issue drawPrimitive with instanceCount

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I packed the fields in quiet rows,
UBOs hum where instance data flows,
Shaders peep and vary hue,
Batches leap — five thousand, who knew?
A tiny rabbit hops — render goes!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title "feat(core): GPU instancing auto-batching" directly and clearly summarizes the primary feature added in this changeset: GPU instancing with automatic batching for improved rendering performance. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Move _gpuInstanceMacro after _macroMap declaration to fix static
initialization order. Also apply prettier formatting fixes.

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 44.31227% with 749 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.08%. Comparing base (a54642f) to head (06759e6).
⚠️ Report is 9 commits behind head on dev/2.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/core/src/shaderlib/ShaderFactory.ts | 43.50% | 174 Missing ⚠️ |
| examples/src/gpu-instancing-auto-batch.ts | 0.00% | 116 Missing and 1 partial ⚠️ |
| examples/src/gpu-instancing-custom-data.ts | 0.00% | 82 Missing and 1 partial ⚠️ |
| e2e/case/gpu-instancing-custom-data.ts | 0.00% | 45 Missing and 1 partial ⚠️ |
| packages/core/src/RenderPipeline/InstanceBuffer.ts | 46.51% | 46 Missing ⚠️ |
| e2e/case/gpu-instancing-auto-batch.ts | 0.00% | 42 Missing and 1 partial ⚠️ |
| packages/core/src/RenderPipeline/RenderQueue.ts | 52.23% | 32 Missing ⚠️ |
| packages/core/src/shader/ShaderProgram.ts | 52.08% | 23 Missing ⚠️ |
| ...ges/core/src/RenderPipeline/BasicRenderPipeline.ts | 43.24% | 21 Missing ⚠️ |
| packages/core/src/RenderPipeline/BatchUtils.ts | 20.00% | 16 Missing ⚠️ |

... and 20 more

Additional details and impacted files
@@             Coverage Diff             @@
##           dev/2.0    #2957      +/-   ##
===========================================
- Coverage    77.38%   77.08%   -0.30%     
===========================================
  Files          900      907       +7     
  Lines        98752    99807    +1055     
  Branches      9817     9866      +49     
===========================================
+ Hits         76415    76933     +518     
- Misses       22170    22703     +533     
- Partials       167      171       +4     

| Flag | Coverage Δ |
|---|---|
| unittests | 77.08% <44.31%> (-0.30%) ⬇️ |

…es layout

The instance UBO is injected at compile time by ShaderFactory, so the
original shader uniform declarations don't need modification.
Bring back normalMat extraction, transform_declare reordering,
trailing whitespace fixes, and VertexPBR indent fix. Only the
renderer_Layer relocation stays reverted.
… 3×vec4

- Fix SubRenderElement.set() not resetting instanceDataPacker, causing
  stale packer references from previous frames to break all batching
- Use whitelist + _group fallback for identifying renderer uniforms in
  _scanInstanceUniforms (fixes _group===undefined for ModelMat)
- Store ModelMat as 3×vec4 (affine rows) instead of mat4 in UBO,
  saving 16 bytes per instance (structSize 80→64, +25% instances/batch)
- Add camera_VPMat to transform_declare.glsl for derived MVP define
- Extract struct definition outside uniform block (GLSL ES 3.00 compat)
- Fix _insertUBOBlock to only scan initial #define preamble
- Pass instanceID to fragment shader via flat varying
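
The "+25% instances/batch" figure follows from dividing the uniform block size by the per-instance struct size. A small worked example, assuming the WebGL2 minimum guaranteed MAX_UNIFORM_BLOCK_SIZE of 16384 bytes (real devices may report more) and a hypothetical helper name:

```typescript
// WebGL2 guarantees MAX_UNIFORM_BLOCK_SIZE is at least 16384 bytes.
const MIN_GUARANTEED_UBO_BYTES = 16384;

// Hypothetical helper: how many whole instance structs fit in one uniform block.
function maxInstancesPerBatch(maxUniformBlockSize: number, structSize: number): number {
  return Math.floor(maxUniformBlockSize / structSize);
}

const instancesWithMat4 = maxInstancesPerBatch(MIN_GUARANTEED_UBO_BYTES, 80); // 204
const instancesWithRows = maxInstancesPerBatch(MIN_GUARANTEED_UBO_BYTES, 64); // 256
const gainPercent = ((instancesWithRows - instancesWithMat4) / instancesWithMat4) * 100; // about 25.5
```
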
github-actions Bot added the documentation label (Improvements or additions to documentation) Apr 7, 2026
…lity

Wrap derived NormalMat define with mat4() so instancing and non-instancing
paths both produce mat4, avoiding shader compilation errors.
Add custom instance data example to verify per-renderer uniform batching.
Rename elementA/elementB to preSubElement/subElement across all
renderer subclasses and BatchUtils. Change _batch signature so
preSubElement is nullable (null = batch head, no previous element
to merge with), and subElement is always required.
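
A minimal sketch of how a caller might drive a batching hook with this shape: `null` as the first argument starts a new batch head, and a non-null head absorbs the current element. The types and function names below (`BatchSketchElement`, `canBatch`, `batch`, `batchAll`) are hypothetical stand-ins, not the engine's actual classes.

```typescript
interface BatchSketchElement {
  rendererId: number;
  materialId: number;
  primitiveId: number;
  instancedRenderers: number[];
}

// Hypothetical merge test: same material and same primitive.
function canBatch(pre: BatchSketchElement, cur: BatchSketchElement): boolean {
  return pre.materialId === cur.materialId && pre.primitiveId === cur.primitiveId;
}

// pre === null means `cur` becomes a batch head; otherwise `cur` is merged into `pre`.
function batch(pre: BatchSketchElement | null, cur: BatchSketchElement): void {
  if (pre === null) {
    cur.instancedRenderers.length = 0;
    cur.instancedRenderers.push(cur.rendererId);
  } else {
    pre.instancedRenderers.push(cur.rendererId);
  }
}

// Walks an already-sorted list and returns the batch heads.
function batchAll(sorted: BatchSketchElement[]): BatchSketchElement[] {
  const heads: BatchSketchElement[] = [];
  let head: BatchSketchElement | null = null;
  for (const cur of sorted) {
    if (head !== null && canBatch(head, cur)) {
      batch(head, cur);
    } else {
      batch(null, cur);
      head = cur;
      heads.push(cur);
    }
  }
  return heads;
}

const makeElement = (rendererId: number, materialId: number): BatchSketchElement => ({
  rendererId,
  materialId,
  primitiveId: 1,
  instancedRenderers: []
});
const batchHeads = batchAll([makeElement(0, 1), makeElement(1, 1), makeElement(2, 2), makeElement(3, 1)]);
```

Only consecutive compatible elements merge here, which is why the sort order ahead of this walk matters.
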
- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker
- Rename getOrCreate() to get() in InstanceDataPackerPool
- Clear compileMacros in InstanceDataPacker.reset()
Move macro merging, layout computation, and UBO packing from batch
phase to render phase. _batch now only collects renderers into a
pre-allocated list. RenderQueue.render handles macro union, layout
lookup, and splits by maxInstanceCount for sub-batch rendering.

- SubRenderElement: instanceDataPacker → instancedRenderers (pooled array)
- MeshRenderer._canBatch: remove maxInstanceCount check
- MeshRenderer._batch: only push renderers, zero allocation
- InstanceDataPacker: remove compileMacros/addRenderer/instanceCount,
  add packAndUpload(renderers, start, count)
- InstanceDataPackerPool: remove uploadBuffer, simplify reset
- BatcherManager: remove instancing uploadBuffer call
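
The split by maxInstanceCount can be pictured as a simple chunking loop. This is a sketch under the assumption that each chunk maps to one upload plus one instanced draw; `forEachInstanceChunk` and its callback are hypothetical names.

```typescript
// Hypothetical sketch: visits [start, start + count) ranges no larger than maxInstanceCount.
function forEachInstanceChunk(
  total: number,
  maxInstanceCount: number,
  draw: (start: number, count: number) => void
): void {
  if (maxInstanceCount < 1) throw new Error("maxInstanceCount must be at least 1");
  for (let start = 0; start < total; start += maxInstanceCount) {
    draw(start, Math.min(maxInstanceCount, total - start));
  }
}

// 600 collected renderers with room for 256 instances per uniform block.
const chunkCalls: Array<[number, number]> = [];
forEachInstanceChunk(600, 256, (start, count) => {
  chunkCalls.push([start, count]);
});
// chunkCalls: [0, 256], [256, 256], [512, 88]
```
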
Packer is now stateless (setLayout + packAndUpload + draw), so only
one instance is needed. Discard upload ensures no GPU stall when
reusing the same buffer across shadow and main passes.

- Delete InstanceDataPackerPool.ts
- BatcherManager: instanceDataPackerPool → instanceDataPacker
- Remove resetInstanceDataPackerPool lifecycle
- Saves GPU memory by using single buffer instead of pool
Rename class, file, and all references to better reflect its role as
an instance batch manager rather than a generic data packer.
It's a macro-keyed map, not a pool (no borrow/return semantics).
Align variable and method names with MacroMap rename — these are maps,
not pools.
Callers always pass a valid buffer, no need for null guard.
- nativeBuffer → buffer, _uboData → _data
- Replace separate instanceFields/_structSize with single _layout ref
- setLayout() now takes InstanceLayout directly
- Remove unnecessary null guard on _layout
- Inline uploadElements variable
- Destructure floatView/intView from this
- Improve worldMatrix comment
- Remove unnecessary component→renderer alias, use component directly
- Hoist bindUniformBlocks/bindUniformBufferBase out of sub-batch loop
- Move primitive.instanceCount=0 after loop (only need to reset once)
- Remove redundant let layout = undefined
- Upgrade lint-staged from v10.5 to v16.4.0
- Fix glob from *.{ts} to **/*.ts to match subdirectory files
- Remove redundant git add from tasks
- eslint 8.44 → 8.57
- @typescript-eslint/parser and eslint-plugin 6.x → 8.x
- Eliminates "unsupported TypeScript version" warning
Dispose means the object is permanently released, so null the array
to free memory instead of just clearing length.
Eliminate redundant SubShader._getInstanceLayout / ShaderPass._scanInstanceFields
by reusing the shader compilation chain — _injectInstanceUBO now returns
InstanceLayout directly, stored on ShaderProgram._instanceLayout.

Remove the RenderElement container layer and promote SubRenderElement
(renamed to RenderElement) as the direct sort/render unit. Previously,
sorting was done at the container level while batching operated on
individual sub-elements, causing multi-submesh objects to break batches.

Now sorting and batching are aligned at the same granularity, allowing
same-material elements from different objects to be properly batched
together, reducing draw calls.
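
A toy example of why the sort granularity matters, assuming two objects that each have two submeshes with different materials. The element shape and the function names are hypothetical; the real sort keys may include more than material and primitive.

```typescript
interface SubmeshElement {
  objectId: number;
  materialId: number;
  primitiveId: number;
}

// Counts runs of consecutive elements that share material and primitive.
function countBatches(elements: ReadonlyArray<SubmeshElement>): number {
  let batches = 0;
  let prev: SubmeshElement | undefined;
  for (const cur of elements) {
    if (!prev || prev.materialId !== cur.materialId || prev.primitiveId !== cur.primitiveId) {
      batches++;
    }
    prev = cur;
  }
  return batches;
}

// Sorts a copy by material first, then by primitive.
function sortForBatching(elements: ReadonlyArray<SubmeshElement>): SubmeshElement[] {
  return elements.slice().sort((a, b) => a.materialId - b.materialId || a.primitiveId - b.primitiveId);
}

// Container-level order keeps each object's submeshes adjacent: A1, B1, A2, B2.
const containerOrder: SubmeshElement[] = [
  { objectId: 1, materialId: 1, primitiveId: 10 },
  { objectId: 1, materialId: 2, primitiveId: 11 },
  { objectId: 2, materialId: 1, primitiveId: 10 },
  { objectId: 2, materialId: 2, primitiveId: 11 }
];

const batchesBefore = countBatches(containerOrder); // 4
const batchesAfter = countBatches(sortForBatching(containerOrder)); // 2
```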

The batched field on RenderElement was written by BatcherManager but
never effectively consumed - 2D overrides hardcoded true, 3D path
skipped via isInstanced. Remove the field, _batchedTransformShaderData
cache, and consolidate transform update methods into two clear paths:
_updateTransformShaderData (3D) and _updateWorldSpaceTransformShaderData (2D).

…Queue

- BatchUtils -> VertexMergeBatcher (file and class); batchFor2D -> batch
- BatcherManager.instanceBatch -> instanceBuffer (align with class name)
- canBatchSprite: reorder conditions for better short-circuit
- RenderQueue: hoist needMaskType out of loop, add phase comments
# Conflicts:
#	packages/shader/src/shaders/Transform.glsl

GuoLei1990 (Member, Author) commented

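Addendum [P0] RenderElement.ts:dispose - instancedRenderers = null causes an NPE when the object is reused from the pool

dispose() sets this.instancedRenderers = null, but set() executes this.instancedRenderers.length = 0. When a RenderElement is recycled by the ObjectPool and later taken out again, calling set() throws a TypeError on null.length. Either change dispose() to this.instancedRenderers.length = 0 so the array reference stays alive, or recreate the array in set().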

 dispose(): void {
-    this.instancedRenderers = null;
+    this.instancedRenderers.length = 0;
 }

(From a follow-up in-depth review; missed in the previous review)
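
The reuse hazard can be reproduced with a tiny stand-in pool. This is a sketch that assumes the pool calls dispose() when an element is returned and hands the same object back later; `SketchPool` and `PooledSketchElement` are hypothetical and much simpler than the engine's ObjectPool and RenderElement.

```typescript
interface Poolable {
  dispose(): void;
}

// Hypothetical minimal pool: dispose() runs on release, and objects are reused on get().
class SketchPool<T extends Poolable> {
  private _free: T[] = [];
  private _create: () => T;

  constructor(create: () => T) {
    this._create = create;
  }

  get(): T {
    return this._free.pop() ?? this._create();
  }

  release(item: T): void {
    item.dispose();
    this._free.push(item);
  }
}

class PooledSketchElement implements Poolable {
  instancedRenderers: number[] = [];

  set(rendererId: number): void {
    // Would throw a TypeError if dispose() had nulled the array.
    this.instancedRenderers.length = 0;
    this.instancedRenderers.push(rendererId);
  }

  dispose(): void {
    // Clear in place so the array reference survives pool reuse.
    this.instancedRenderers.length = 0;
  }
}

const elementPool = new SketchPool(() => new PooledSketchElement());
const firstUse = elementPool.get();
firstUse.set(1);
elementPool.release(firstUse);
const secondUse = elementPool.get(); // same object as firstUse
secondUse.set(2);
```

Clearing in place keeps the pooled object usable without a per-reuse allocation; recreating the array in set() would also work but allocates on every reuse.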


Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)

3 participants