feat(core): GPU instancing auto-batching #2957
…nce data

Introduce automatic GPU instancing for MeshRenderer. The system scans renderer-group uniforms across shader passes, builds a unified std140 UBO layout, and packs per-instance data (ModelMat, Layer, etc.) each frame.

Key changes:
- InstanceDataPacker: packs renderer data into shared UBO for instanced draw
- ShaderFactory: unified _scanInstanceUniforms, _buildLayout, _injectInstanceUBO
- MeshRenderer._canBatch/_batch: instancing merge logic
- ShaderPass/SubShader: instance-aware compilation with macro cache
- GLSLIfdefResolver: compile-time #ifdef resolution for instance field scanning
- MacroCachePool: pooled ShaderMacroCollection for shader program caching
- RenderQueue: instance-aware draw path with UBO binding
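Building a std140 layout for the per-instance struct can be illustrated with a small sketch. It assumes the standard std140 rules for non-array scalar, vector, and matrix members and a 16,384-byte block (the WebGL2 minimum for `MAX_UNIFORM_BLOCK_SIZE`). `FieldType`, `buildInstanceLayout`, and the field names are hypothetical and are not the engine's `_buildLayout`.

```typescript
// Hypothetical sketch of a std140 layout for an array of per-instance structs.
// Names (FieldType, buildInstanceLayout) are illustrative, not the engine API.
type FieldType = "float" | "int" | "vec2" | "vec3" | "vec4" | "mat3x4" | "mat4";

interface FieldDecl {
  name: string;
  type: FieldType;
}

interface InstanceField extends FieldDecl {
  offset: number; // byte offset inside one struct element
}

interface InstanceLayout {
  fields: InstanceField[];
  structSize: number; // array stride in bytes
  maxInstanceCount: number; // how many structs fit in one uniform block
}

// std140 base alignment and size (bytes) for each supported non-array type.
// Matrices are stored column by column, each column padded to a vec4.
const STD140: Record<FieldType, { align: number; size: number }> = {
  float: { align: 4, size: 4 },
  int: { align: 4, size: 4 },
  vec2: { align: 8, size: 8 },
  vec3: { align: 16, size: 12 },
  vec4: { align: 16, size: 16 },
  mat3x4: { align: 16, size: 48 }, // 3 columns of vec4
  mat4: { align: 16, size: 64 }, // 4 columns of vec4
};

function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

function buildInstanceLayout(decls: FieldDecl[], maxBlockBytes: number): InstanceLayout {
  const fields: InstanceField[] = [];
  let offset = 0;
  for (const decl of decls) {
    const { align, size } = STD140[decl.type];
    offset = alignUp(offset, align);
    fields.push({ ...decl, offset });
    offset += size;
  }
  // A std140 struct is aligned to (at least) 16 bytes and padded to that
  // alignment, so the padded size is also the stride of an array of structs.
  const structSize = alignUp(offset, 16);
  return { fields, structSize, maxInstanceCount: Math.floor(maxBlockBytes / structSize) };
}

const layout = buildInstanceLayout(
  [
    { name: "ModelMat", type: "mat4" },
    { name: "Layer", type: "int" },
  ],
  16384,
);
console.log(layout.structSize, layout.maxInstanceCount); // 80 204
```

Passing `mat3x4` instead of `mat4` for `ModelMat` gives a 64-byte stride and 256 instances per block, which corresponds to the 80→64 `structSize` reduction mentioned in the commit history.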
Walkthrough

Adds WebGL2 GPU instancing: instance UBO management, instance-aware shader compilation and caching, batching API/signature changes, render-queue instanced draw paths, device/GL helpers, examples and E2E tests, and supporting shader/uniform enhancements.
Sequence Diagram(s)

sequenceDiagram
participant RenderQueue as RenderQueue
participant BatcherManager as BatcherManager
participant InstanceBatch as InstanceBatch
participant GPUBuffer as GPU Constant Buffer
participant ShaderProgram as ShaderProgram
RenderQueue->>BatcherManager: request instanceBatch (lazy)
BatcherManager->>InstanceBatch: setLayout(layout)
InstanceBatch->>GPUBuffer: create/realloc UBO (if needed)
loop per instanced chunk
RenderQueue->>InstanceBatch: upload(renderers[], start, count)
InstanceBatch->>InstanceBatch: pack per-instance fields into CPU buffer
InstanceBatch->>GPUBuffer: setData(range, Discard)
end
RenderQueue->>ShaderProgram: bindUniformBlocks(bindingMap)
ShaderProgram->>GPUBuffer: uniformBlockBinding(bindingPoint)
RenderQueue->>RenderQueue: issue drawPrimitive with instanceCount
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Move _gpuInstanceMacro after _macroMap declaration to fix static initialization order. Also apply prettier formatting fixes.
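The initialization-order bug follows a general JavaScript/TypeScript rule: static fields are initialized top to bottom in declaration order, so a static initializer that reaches a static declared later sees `undefined`. The sketch below is hypothetical; the class and member names only echo the commit message and do not reproduce the engine's code.

```typescript
// Hypothetical registry that hands out one numeric id per macro name.
// Static initializers run top to bottom, in declaration order.

function defineBrokenRegistry(): number {
  class BrokenRegistry {
    // Runs first: getByName() reads _macroMap, which is still undefined here.
    static readonly _gpuInstanceMacro: number = BrokenRegistry.getByName("RENDERER_GPU_INSTANCE");
    static readonly _macroMap = new Map<string, number>();

    static getByName(name: string): number {
      let id = BrokenRegistry._macroMap.get(name);
      if (id === undefined) {
        id = BrokenRegistry._macroMap.size;
        BrokenRegistry._macroMap.set(name, id);
      }
      return id;
    }
  }
  return BrokenRegistry._gpuInstanceMacro;
}

class FixedRegistry {
  // Declared first, so the map exists before the initializer below runs.
  static readonly _macroMap = new Map<string, number>();
  static readonly _gpuInstanceMacro: number = FixedRegistry.getByName("RENDERER_GPU_INSTANCE");

  static getByName(name: string): number {
    let id = FixedRegistry._macroMap.get(name);
    if (id === undefined) {
      id = FixedRegistry._macroMap.size;
      FixedRegistry._macroMap.set(name, id);
    }
    return id;
  }
}

let brokenError = "none";
try {
  defineBrokenRegistry(); // class evaluation throws while running static initializers
} catch (e) {
  brokenError = e instanceof TypeError ? "TypeError" : String(e);
}
console.log(brokenError, FixedRegistry._gpuInstanceMacro); // TypeError 0
```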
Codecov Report

❌ Patch coverage is

@@ Coverage Diff @@
## dev/2.0 #2957 +/- ##
===========================================
- Coverage 77.38% 77.08% -0.30%
===========================================
Files 900 907 +7
Lines 98752 99807 +1055
Branches 9817 9866 +49
===========================================
+ Hits 76415 76933 +518
- Misses 22170 22703 +533
- Partials 167 171 +4
…es layout

The instance UBO is injected at compile time by ShaderFactory, so the original shader uniform declarations don't need modification.
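Leaving the original declarations untouched while still removing them from the compiled source can be done by scanning and stripping them during compilation. The sketch below assumes renderer uniforms are declared one per line as `uniform [precision] type renderer_*;` and that a whitelist alone decides which ones move into the UBO (the commits also mention a `_group` fallback, which is not modeled). The regex and `scanAndStripRendererUniforms` are hypothetical, not ShaderFactory's actual code.

```typescript
// Hypothetical sketch: find whitelisted renderer uniforms in GLSL source and
// strip their declarations so an injected UBO can provide them instead.
interface ScannedUniform {
  type: string;
  name: string;
}

// One declaration per line: optional precision qualifier, type, renderer_* name.
const RENDERER_UNIFORM =
  /^[ \t]*uniform[ \t]+(?:(?:lowp|mediump|highp)[ \t]+)?(\w+)[ \t]+(renderer_\w+)[ \t]*;[ \t]*$/gm;

function scanAndStripRendererUniforms(
  source: string,
  whitelist: ReadonlySet<string>,
): { uniforms: ScannedUniform[]; stripped: string } {
  const uniforms: ScannedUniform[] = [];
  const stripped = source.replace(RENDERER_UNIFORM, (line: string, type: string, name: string) => {
    if (!whitelist.has(name)) return line; // not instanced: keep the declaration
    uniforms.push({ type, name });
    return ""; // removed; the injected block provides this name instead
  });
  return { uniforms, stripped };
}

const vertexSource = [
  "uniform mat4 renderer_ModelMat;",
  "uniform highp int renderer_Layer;",
  "uniform vec4 material_BaseColor;",
  "uniform vec4 renderer_Custom;",
  "void main() {}",
].join("\n");

const result = scanAndStripRendererUniforms(vertexSource, new Set(["renderer_ModelMat", "renderer_Layer"]));
console.log(result.uniforms.map((u) => `${u.type} ${u.name}`)); // ["mat4 renderer_ModelMat", "int renderer_Layer"]
```

A text-level scan like this cannot tell whether a declaration sits inside a disabled `#ifdef` branch, which is the kind of case a compile-time `#ifdef` resolver has to handle before scanning.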
Bring back normalMat extraction, transform_declare reordering, trailing whitespace fixes, and VertexPBR indent fix. Only the renderer_Layer relocation stays reverted.
… 3×vec4

- Fix SubRenderElement.set() not resetting instanceDataPacker, causing stale packer references from previous frames to break all batching
- Use whitelist + _group fallback for identifying renderer uniforms in _scanInstanceUniforms (fixes _group === undefined for ModelMat)
- Store ModelMat as 3×vec4 (affine rows) instead of mat4 in UBO, saving 16 bytes per instance (structSize 80→64, +25% instances/batch)
- Add camera_VPMat to transform_declare.glsl for derived MVP define
- Extract struct definition outside uniform block (GLSL ES 3.00 compat)
- Fix _insertUBOBlock to only scan initial #define preamble
- Pass instanceID to fragment shader via flat varying
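Storing only three rows of ModelMat works because an affine world matrix always has (0, 0, 0, 1) as its last row. The sketch below assumes a column-major 16-element source array (the convention WebGL and glTF use) and packs the first three rows into a `Float32Array`. `packAffineRows` and `transformPoint` are hypothetical helpers; `transformPoint` mirrors what a shader would compute with one `dot` product per row.

```typescript
// Column-major 4x4: element (row r, column c) lives at index c * 4 + r.
// Writes rows 0..2 (4 floats each) to out[offset .. offset + 11].
function packAffineRows(m: ArrayLike<number>, out: Float32Array, offset: number): void {
  for (let r = 0; r < 3; r++) {
    for (let c = 0; c < 4; c++) {
      out[offset + r * 4 + c] = m[c * 4 + r];
    }
  }
}

// Equivalent of dot(row_r, vec4(p, 1.0)) for each of the three stored rows.
function transformPoint(
  rows: Float32Array,
  offset: number,
  p: [number, number, number],
): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  for (let r = 0; r < 3; r++) {
    const b = offset + r * 4;
    out[r] = rows[b] * p[0] + rows[b + 1] * p[1] + rows[b + 2] * p[2] + rows[b + 3];
  }
  return out;
}

// Uniform scale 2 followed by translation (5, 6, 7), column-major.
const world = [
  2, 0, 0, 0,
  0, 2, 0, 0,
  0, 0, 2, 0,
  5, 6, 7, 1,
];
const rows = new Float32Array(12);
packAffineRows(world, rows, 0);
console.log(Array.from(rows)); // [2, 0, 0, 5, 0, 2, 0, 6, 0, 0, 2, 7]
console.log(transformPoint(rows, 0, [1, 1, 1])); // [7, 8, 9]
```

Twelve floats take 48 bytes instead of 64, which is where the 16-byte per-instance saving comes from.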
…lity

Wrap derived NormalMat define with mat4() so instancing and non-instancing paths both produce mat4, avoiding shader compilation errors. Add custom instance data example to verify per-renderer uniform batching.
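Generating the injected block with `#define` remapping, where a define can wrap the stored value so it always yields the type the original uniform had, can be sketched as follows. In this example rows stored as the columns of a `mat3x4` are turned back into a `mat4` with `mat4(transpose(...))`. The block, struct, and array names (`RendererInstanceBlock`, `RendererInstance`, `renderer_InstanceData`) and the use of `gl_InstanceID` as the index are assumptions, not the engine's generated code.

```typescript
// Hypothetical description of one instanced uniform.
interface InstanceUniform {
  uniformName: string; // name the original shader code uses
  member: string; // struct member name inside the UBO
  glslType: string; // storage type inside the UBO
  wrap?: (access: string) => string; // converts the stored value back to the original type
}

function buildInstanceBlock(fields: InstanceUniform[], maxInstanceCount: number): string {
  // Struct declared outside the uniform block, then used as the array element type.
  const lines: string[] = ["struct RendererInstance {"];
  for (const f of fields) lines.push(`  ${f.glslType} ${f.member};`);
  lines.push("};", "layout(std140) uniform RendererInstanceBlock {");
  lines.push(`  RendererInstance renderer_InstanceData[${maxInstanceCount}];`, "};");
  // Each original uniform name becomes a macro that reads this instance's element.
  for (const f of fields) {
    const access = `renderer_InstanceData[gl_InstanceID].${f.member}`;
    lines.push(`#define ${f.uniformName} ${f.wrap ? f.wrap(access) : access}`);
  }
  return lines.join("\n");
}

const glsl = buildInstanceBlock(
  [
    // transpose(mat3x4) is a mat4x3; mat4(...) fills the missing row from identity.
    { uniformName: "renderer_ModelMat", member: "ModelMat", glslType: "mat3x4", wrap: (a) => `mat4(transpose(${a}))` },
    { uniformName: "renderer_Layer", member: "Layer", glslType: "int" },
  ],
  256,
);
console.log(glsl);
```

`gl_InstanceID` exists only in vertex shaders, so fragment-shader reads would need the index passed through a flat varying instead.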
Rename elementA/elementB to preSubElement/subElement across all renderer subclasses and BatchUtils. Change _batch signature so preSubElement is nullable (null = batch head, no previous element to merge with), and subElement is always required.
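The nullable-head convention can be sketched as below: `null` marks the start of a new batch, and any later element that passes the merge check adds its renderer to the head's list. This is a simplified model that assumes the merge criteria listed in the PR summary (same primitive, material, and front face) and, unlike the real signature, passes the current batch head as the non-null argument. `Element`, `canBatch`, `batch`, and `collectBatches` are hypothetical stand-ins for the engine's classes.

```typescript
// Hypothetical, simplified render element.
interface Element {
  primitive: string;
  material: string;
  frontFaceInvert: boolean;
  renderer: string;
  instancedRenderers: string[]; // only meaningful on a batch head
}

function canBatch(pre: Element, sub: Element): boolean {
  return (
    pre.primitive === sub.primitive &&
    pre.material === sub.material &&
    pre.frontFaceInvert === sub.frontFaceInvert
  );
}

// pre === null: sub starts a new batch and becomes its head.
function batch(pre: Element | null, sub: Element): void {
  if (pre === null) {
    sub.instancedRenderers.length = 0; // reuse the existing array, no allocation
    sub.instancedRenderers.push(sub.renderer);
  } else {
    pre.instancedRenderers.push(sub.renderer);
  }
}

function collectBatches(elements: Element[]): Element[] {
  const heads: Element[] = [];
  let head: Element | null = null;
  for (const element of elements) {
    if (head !== null && canBatch(head, element)) {
      batch(head, element);
    } else {
      batch(null, element);
      head = element;
      heads.push(element);
    }
  }
  return heads;
}

const make = (primitive: string, material: string, renderer: string): Element => ({
  primitive,
  material,
  frontFaceInvert: false,
  renderer,
  instancedRenderers: [],
});

const heads = collectBatches([
  make("cube", "red", "r0"),
  make("cube", "red", "r1"),
  make("cube", "blue", "r2"),
  make("cube", "blue", "r3"),
]);
console.log(heads.map((h) => h.instancedRenderers)); // [["r0", "r1"], ["r2", "r3"]]
```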
- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker
- Rename getOrCreate() to get() in InstanceDataPackerPool
- Clear compileMacros in InstanceDataPacker.reset()
Move macro merging, layout computation, and UBO packing from batch phase to render phase. _batch now only collects renderers into a pre-allocated list. RenderQueue.render handles macro union, layout lookup, and splits by maxInstanceCount for sub-batch rendering.

- SubRenderElement: instanceDataPacker → instancedRenderers (pooled array)
- MeshRenderer._canBatch: remove maxInstanceCount check
- MeshRenderer._batch: only push renderers, zero allocation
- InstanceDataPacker: remove compileMacros/addRenderer/instanceCount, add packAndUpload(renderers, start, count)
- InstanceDataPackerPool: remove uploadBuffer, simplify reset
- BatcherManager: remove instancing uploadBuffer call
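Splitting one collected batch by `maxInstanceCount` reduces to a chunked loop. A minimal sketch, where `upload` and `draw` are hypothetical callbacks standing in for the packing step (which takes `renderers, start, count`, like `packAndUpload`) and for an instanced draw call.

```typescript
// Hypothetical callbacks for the pack/upload step and the instanced draw.
type Upload = (renderers: readonly string[], start: number, count: number) => void;
type Draw = (instanceCount: number) => void;

// Issues one instanced draw per chunk of at most maxInstanceCount renderers.
function renderInSubBatches(
  renderers: readonly string[],
  maxInstanceCount: number,
  upload: Upload,
  draw: Draw,
): number {
  if (!Number.isInteger(maxInstanceCount) || maxInstanceCount < 1) {
    throw new RangeError("maxInstanceCount must be a positive integer");
  }
  let drawCalls = 0;
  for (let start = 0; start < renderers.length; start += maxInstanceCount) {
    const count = Math.min(maxInstanceCount, renderers.length - start);
    upload(renderers, start, count); // pack and upload only this slice
    draw(count); // instanced draw with instanceCount = count
    drawCalls++;
  }
  return drawCalls;
}

const renderers = Array.from({ length: 500 }, (_, i) => `renderer${i}`);
const counts: number[] = [];
const drawCalls = renderInSubBatches(renderers, 204, () => {}, (n) => counts.push(n));
console.log(drawCalls, counts); // 3 [204, 204, 92]
```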
Packer is now stateless (setLayout + packAndUpload + draw), so only one instance is needed. Discard upload ensures no GPU stall when reusing the same buffer across shadow and main passes.

- Delete InstanceDataPackerPool.ts
- BatcherManager: instanceDataPackerPool → instanceDataPacker
- Remove resetInstanceDataPackerPool lifecycle
- Saves GPU memory by using single buffer instead of pool
Rename class, file, and all references to better reflect its role as an instance batch manager rather than a generic data packer.
It's a macro-keyed map, not a pool (no borrow/return semantics).
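A macro-keyed map in this sense is only a cache lookup: the same macro set always maps to the same entry, and nothing is borrowed or returned. A minimal sketch, assuming a macro set is represented as a bitmask split into 32-bit words; `MacroKeyedMap` and its string key format are hypothetical.

```typescript
// Hypothetical cache keyed by a macro bitmask (array of 32-bit words).
class MacroKeyedMap<T> {
  private readonly _map = new Map<string, T>();

  // Trailing zero words do not change the macro set, so drop them from the key.
  private static _key(mask: readonly number[]): string {
    let end = mask.length;
    while (end > 0 && mask[end - 1] === 0) end--;
    return mask.slice(0, end).join(",");
  }

  get(mask: readonly number[]): T | undefined {
    return this._map.get(MacroKeyedMap._key(mask));
  }

  set(mask: readonly number[], value: T): void {
    this._map.set(MacroKeyedMap._key(mask), value);
  }

  get size(): number {
    return this._map.size;
  }
}

const programs = new MacroKeyedMap<string>();
programs.set([0b101], "program A");
console.log(programs.get([0b101, 0])); // "program A"
console.log(programs.get([0b001])); // undefined
```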
Align variable and method names with MacroMap rename — these are maps, not pools.
Callers always pass a valid buffer, no need for null guard.
- nativeBuffer → buffer, _uboData → _data
- Replace separate instanceFields/_structSize with single _layout ref
- setLayout() now takes InstanceLayout directly
- Remove unnecessary null guard on _layout
- Inline uploadElements variable
- Destructure floatView/intView from this
- Improve worldMatrix comment
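Packing into one `ArrayBuffer` through a `Float32Array` view and an `Int32Array` view lets float and integer fields share a single upload. The sketch hard-codes an assumed layout (three `vec4` rows for the world matrix at byte 0, an `int` layer at byte 48, 64-byte stride). `InstancePackerSketch` and `RendererData` are hypothetical, and the actual GPU upload is left out; the method only returns the byte range that would be uploaded.

```typescript
// Assumed layout: world matrix as 3 vec4 rows (48 bytes) + int layer, padded to 64.
const STRUCT_SIZE = 64;
const LAYER_OFFSET = 48;

// Hypothetical per-renderer data: 12 floats (3 affine rows) and a layer id.
interface RendererData {
  worldRows: readonly number[];
  layer: number;
}

class InstancePackerSketch {
  private readonly _data: ArrayBuffer;
  private readonly _floatView: Float32Array;
  private readonly _intView: Int32Array;

  constructor(maxInstanceCount: number) {
    this._data = new ArrayBuffer(maxInstanceCount * STRUCT_SIZE);
    // Two typed views over the same bytes: floats and ints share one buffer.
    this._floatView = new Float32Array(this._data);
    this._intView = new Int32Array(this._data);
  }

  // Packs renderers[start, start + count) and returns the bytes to upload.
  pack(renderers: readonly RendererData[], start: number, count: number): Uint8Array {
    const { _floatView: floatView, _intView: intView } = this;
    for (let i = 0; i < count; i++) {
      const renderer = renderers[start + i];
      const base = (i * STRUCT_SIZE) / 4; // struct start, in 4-byte words
      floatView.set(renderer.worldRows, base);
      intView[base + LAYER_OFFSET / 4] = renderer.layer;
    }
    return new Uint8Array(this._data, 0, count * STRUCT_SIZE);
  }
}

const packer = new InstancePackerSketch(256);
const bytes = packer.pack([{ worldRows: [1, 0, 0, 10, 0, 1, 0, 20, 0, 0, 1, 30], layer: 3 }], 0, 1);
console.log(bytes.byteLength); // 64
```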
- Remove unnecessary component→renderer alias, use component directly
- Hoist bindUniformBlocks/bindUniformBufferBase out of sub-batch loop
- Move primitive.instanceCount=0 after loop (only need to reset once)
- Remove redundant let layout = undefined
- Upgrade lint-staged from v10.5 to v16.4.0
- Fix glob from *.{ts} to **/*.ts to match subdirectory files
- Remove redundant git add from tasks
- eslint 8.44 → 8.57
- @typescript-eslint/parser and eslint-plugin 6.x → 8.x
- Eliminates "unsupported TypeScript version" warning
Dispose means the object is permanently released, so null the array to free memory instead of just clearing length.
Eliminate redundant SubShader._getInstanceLayout / ShaderPass._scanInstanceFields by reusing the shader compilation chain — _injectInstanceUBO now returns InstanceLayout directly, stored on ShaderProgram._instanceLayout.
Remove the RenderElement container layer and promote SubRenderElement (renamed to RenderElement) as the direct sort/render unit. Previously, sorting was done at the container level while batching operated on individual sub-elements, causing multi-submesh objects to break batches. Now sorting and batching are aligned at the same granularity, allowing same-material elements from different objects to be properly batched together, reducing draw calls.
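Why the sort granularity matters can be shown by counting draw calls as runs of consecutive elements that share a batch key. This toy model assumes three objects that share one two-submesh mesh, with submesh 0 using material A and submesh 1 using material B; `SubElement`, `batchKey`, and `countDrawCalls` are hypothetical.

```typescript
// Hypothetical per-submesh render element.
interface SubElement {
  objectId: number;
  mesh: string;
  subMesh: number;
  material: string;
}

// Elements can merge only if primitive (mesh + submesh) and material match.
const batchKey = (e: SubElement): string => `${e.mesh}#${e.subMesh}|${e.material}`;

// Each run of equal keys in render order becomes one draw call.
function countDrawCalls(order: readonly SubElement[]): number {
  let draws = 0;
  let previous: string | null = null;
  for (const e of order) {
    const key = batchKey(e);
    if (key !== previous) {
      draws++;
      previous = key;
    }
  }
  return draws;
}

const objects: SubElement[][] = [0, 1, 2].map((objectId) => [
  { objectId, mesh: "meshA", subMesh: 0, material: "A" },
  { objectId, mesh: "meshA", subMesh: 1, material: "B" },
]);

// Container-level sorting: objects are ordered, their submeshes stay grouped.
const containerOrder: SubElement[] = [];
for (const parts of objects) containerOrder.push(...parts);

// Element-level sorting: every submesh element is sorted on its own key.
const elementOrder = [...containerOrder].sort((a, b) => batchKey(a).localeCompare(batchKey(b)));

console.log(countDrawCalls(containerOrder), countDrawCalls(elementOrder)); // 6 2
```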
The batched field on RenderElement was written by BatcherManager but never effectively consumed - 2D overrides hardcoded true, 3D path skipped via isInstanced. Remove the field, _batchedTransformShaderData cache, and consolidate transform update methods into two clear paths: _updateTransformShaderData (3D) and _updateWorldSpaceTransformShaderData (2D).
…o InstanceBufferLayout
…Queue

- BatchUtils -> VertexMergeBatcher (file and class); batchFor2D -> batch
- BatcherManager.instanceBatch -> instanceBuffer (align with class name)
- canBatchSprite: reorder conditions for better short-circuit
- RenderQueue: hoist needMaskType out of loop, add phase comments
# Conflicts:
# packages/shader/src/shaders/Transform.glsl
Addendum [P0] RenderElement.ts:dispose:
dispose(): void {
- this.instancedRenderers = null;
+ this.instancedRenderers.length = 0;
}
(Added from the in-depth review; this was missed in the previous review.)
Closes #194
Summary
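- InstanceBatch packs renderer uniforms (ModelMat, Layer, etc.) into a shared std140 UBO
- ShaderFactory.injectInstanceUBO automatically scans shaders for renderer uniforms and replaces them with UBO array accesses plus #define remapping
- mat3x4 storage (affine optimization: 48 bytes instead of 64); derived uniforms (MVMat/MVPMat/NormalMat) are computed on the fly via #define
- MeshRenderer._canBatch/_batch implement the batching check (same primitive + material + front face)
- ShaderProgram._recordLocation skips UBO members (location === null), avoiding the creation of useless ShaderUniform objects

Performance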
Test scene: 2500 glTF models (Avocado) + 2500 custom-shader cubes, all with dynamic rotation, scaling, and color animation.
Screenshot measured on an iPhone (59 FPS / 21 draw calls / 5000 objects):
Future Optimization
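injectInstanceUBO obtains renderer uniform information by scanning the GLSL text with regular expressions. If ShaderLab provided uniform metadata (name, type, group) at precompile time, the regex scan could be replaced with exact code splicing, improving robustness and extensibility.

Key Files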
- RenderPipeline/InstanceBatch.ts
- RenderPipeline/RenderQueue.ts
- shaderlib/ShaderFactory.ts
- shader/ShaderPass.ts
- shader/ShaderProgram.ts: _instanceLayout field; skips reflection of UBO members
- mesh/MeshRenderer.ts: _canBatch/_batch batching logic
- shader/ShaderProgramMap.ts

Test plan