feat(core): GPU instancing auto-batching#2957
Conversation
…nce data Introduce automatic GPU instancing for MeshRenderer. The system scans renderer-group uniforms across shader passes, builds a unified std140 UBO layout, and packs per-instance data (ModelMat, Layer, etc.) each frame. Key changes: - InstanceDataPacker: packs renderer data into shared UBO for instanced draw - ShaderFactory: unified _scanInstanceUniforms, _buildLayout, _injectInstanceUBO - MeshRenderer._canBatch/_batch: instancing merge logic - ShaderPass/SubShader: instance-aware compilation with macro cache - GLSLIfdefResolver: compile-time #ifdef resolution for instance field scanning - MacroCachePool: pooled ShaderMacroCollection for shader program caching - RenderQueue: instance-aware draw path with UBO binding
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds WebGL2 GPU instancing: instance UBO management, instance-aware shader compilation and caching, batching API/signature changes, render-queue instanced draw paths, device/GL helpers, examples and E2E tests, and supporting shader/uniform enhancements. Changes
Sequence Diagram(s)sequenceDiagram
participant RenderQueue as RenderQueue
participant BatcherManager as BatcherManager
participant InstanceBatch as InstanceBatch
participant GPUBuffer as GPU Constant Buffer
participant ShaderProgram as ShaderProgram
RenderQueue->>BatcherManager: request instanceBatch (lazy)
BatcherManager->>InstanceBatch: setLayout(layout)
InstanceBatch->>GPUBuffer: create/realloc UBO (if needed)
loop per instanced chunk
RenderQueue->>InstanceBatch: upload(renderers[], start, count)
InstanceBatch->>InstanceBatch: pack per-instance fields into CPU buffer
InstanceBatch->>GPUBuffer: setData(range, Discard)
end
RenderQueue->>ShaderProgram: bindUniformBlocks(bindingMap)
ShaderProgram->>GPUBuffer: uniformBlockBinding(bindingPoint)
RenderQueue->>RenderQueue: issue drawPrimitive with instanceCount
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Move _gpuInstanceMacro after _macroMap declaration to fix static initialization order. Also apply prettier formatting fixes.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## dev/2.0 #2957 +/- ##
===========================================
- Coverage 77.38% 77.02% -0.36%
===========================================
Files 900 907 +7
Lines 98752 99611 +859
Branches 9817 9818 +1
===========================================
+ Hits 76415 76728 +313
- Misses 22170 22711 +541
- Partials 167 172 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…es layout The instance UBO is injected at compile time by ShaderFactory, so the original shader uniform declarations don't need modification.
Bring back normalMat extraction, transform_declare reordering, trailing whitespace fixes, and VertexPBR indent fix. Only the renderer_Layer relocation stays reverted.
… 3×vec4 - Fix SubRenderElement.set() not resetting instanceDataPacker, causing stale packer references from previous frames to break all batching - Use whitelist + _group fallback for identifying renderer uniforms in _scanInstanceUniforms (fixes _group===undefined for ModelMat) - Store ModelMat as 3×vec4 (affine rows) instead of mat4 in UBO, saving 16 bytes per instance (structSize 80→64, +25% instances/batch) - Add camera_VPMat to transform_declare.glsl for derived MVP define - Extract struct definition outside uniform block (GLSL ES 3.00 compat) - Fix _insertUBOBlock to only scan initial #define preamble - Pass instanceID to fragment shader via flat varying
…lity Wrap derived NormalMat define with mat4() so instancing and non-instancing paths both produce mat4, avoiding shader compilation errors. Add custom instance data example to verify per-renderer uniform batching.
Rename elementA/elementB to preSubElement/subElement across all renderer subclasses and BatchUtils. Change _batch signature so preSubElement is nullable (null = batch head, no previous element to merge with), and subElement is always required.
- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker - Rename getOrCreate() to get() in InstanceDataPackerPool - Clear compileMacros in InstanceDataPacker.reset()
Move macro merging, layout computation, and UBO packing from batch phase to render phase. _batch now only collects renderers into a pre-allocated list. RenderQueue.render handles macro union, layout lookup, and splits by maxInstanceCount for sub-batch rendering. - SubRenderElement: instanceDataPacker → instancedRenderers (pooled array) - MeshRenderer._canBatch: remove maxInstanceCount check - MeshRenderer._batch: only push renderers, zero allocation - InstanceDataPacker: remove compileMacros/addRenderer/instanceCount, add packAndUpload(renderers, start, count) - InstanceDataPackerPool: remove uploadBuffer, simplify reset - BatcherManager: remove instancing uploadBuffer call
Packer is now stateless (setLayout + packAndUpload + draw), so only one instance is needed. Discard upload ensures no GPU stall when reusing the same buffer across shadow and main passes. - Delete InstanceDataPackerPool.ts - BatcherManager: instanceDataPackerPool → instanceDataPacker - Remove resetInstanceDataPackerPool lifecycle - Saves GPU memory by using single buffer instead of pool
Rename class, file, and all references to better reflect its role as an instance batch manager rather than a generic data packer.
It's a macro-keyed map, not a pool (no borrow/return semantics).
Align variable and method names with MacroMap rename — these are maps, not pools.
Callers always pass a valid buffer, no need for null guard.
- nativeBuffer → buffer, _uboData → _data - Replace separate instanceFields/_structSize with single _layout ref - setLayout() now takes InstanceLayout directly
- Remove unnecessary null guard on _layout - Inline uploadElements variable - Destructure floatView/intView from this - Improve worldMatrix comment
- Remove unnecessary component→renderer alias, use component directly - Hoist bindUniformBlocks/bindUniformBufferBase out of sub-batch loop - Move primitive.instanceCount=0 after loop (only need to reset once) - Remove redundant let layout = undefined
- Upgrade lint-staged from v10.5 to v16.4.0
- Fix glob from *.{ts} to **/*.ts to match subdirectory files
- Remove redundant git add from tasks
- eslint 8.44 → 8.57 - @typescript-eslint/parser and eslint-plugin 6.x → 8.x - Eliminates "unsupported TypeScript version" warning
Dispose means the object is permanently released, so null the array to free memory instead of just clearing length.
Eliminate redundant SubShader._getInstanceLayout / ShaderPass._scanInstanceFields by reusing the shader compilation chain — _injectInstanceUBO now returns InstanceLayout directly, stored on ShaderProgram._instanceLayout.
5bf37bd to
9533e2e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/src/shader-lab/PrecompileBenchmark.test.ts (1)
418-455:⚠️ Potential issue | 🟡 MinorDestroy the temporary
ShaderPrograms insidebenchSplit.Line 444 allocates a fresh WebGL program on every iteration and never releases it. That leaks GPU objects across scenarios and makes the later timings noisier than the earlier ones.
♻️ Proposed fix
): { cpu: number; gpu: number; total: number; vsLen: number; fsLen: number } { // Warmup for (let i = 0; i < warmup; i++) { - compileFn(macroCollection); + const { vertexSource, fragmentSource } = compileFn(macroCollection); + const program = new ShaderProgram(engine, vertexSource, fragmentSource); + program.destroy(); } const cpuTimes: number[] = []; const gpuTimes: number[] = []; let vsLen = 0; @@ vsLen = vertexSource.length; fsLen = fragmentSource.length; // GPU: create ShaderProgram // `@ts-ignore` - new ShaderProgram(engine, vertexSource, fragmentSource); + const program = new ShaderProgram(engine, vertexSource, fragmentSource); const t2 = performance.now(); + program.destroy(); cpuTimes.push(t1 - t0); gpuTimes.push(t2 - t1); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/src/shader-lab/PrecompileBenchmark.test.ts` around lines 418 - 455, benchSplit currently allocates a new ShaderProgram on each iteration without releasing it, leaking GPU resources; change the instantiation inside the loop to store the new ShaderProgram in a variable (e.g., const prog = new ShaderProgram(engine, vertexSource, fragmentSource)) and then immediately destroy it after measuring the creation time by calling its cleanup method (e.g., prog.destroy() or prog.dispose() depending on the ShaderProgram API); ensure you still capture t2 after creation and call the destroy method before the next iteration so GPU objects are released between runs.
♻️ Duplicate comments (2)
packages/core/src/RenderPipeline/RenderQueue.ts (2)
72-90:⚠️ Potential issue | 🔴 CriticalTreat
program._instanceLayout === nullas non-instanced before skipping renderer uploads.Line 73 takes the instancing fast path before the compiled program is known. When
program._instanceLayoutlater turns out to benull, this code has already skipped per-renderer transform/uniform uploads, then Lines 237-238 submit only one non-instanced draw. That drops the rest of the batch.Also applies to: 160-165, 189-195, 221-239
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/RenderPipeline/RenderQueue.ts` around lines 72 - 90, The instancing fast-path uses only instancedRenderers.length to skip per-renderer uploads, but you must also ensure the compiled program actually supports instancing; change the instancing check to consider program._instanceLayout (e.g. compute isInstanced = instancedRenderers.length > 0 && program._instanceLayout !== null) before skipping uploads so component._updateTransformShaderData still runs when program._instanceLayout is null; apply the same guard wherever similar logic appears (the other occurrences around the blocks using subElement, instancedRenderers, component._updateTransformShaderData, rendererUpdateFlag and ContextRendererUpdateFlag).
72-121:⚠️ Potential issue | 🟠 MajorSplit instanced batches on renderer-level macros and stencil state.
The compile macros and mask behavior are taken only from
subElement.component, but the batching rule described in this PR only keys on primitive/material/front-face. A mixed batch with differentreceiveShadows,_maskInteraction, or_maskLayervalues will therefore render every instance with the first renderer's state.Also applies to: 213-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/RenderPipeline/RenderQueue.ts` around lines 72 - 121, The instanced batching currently only unions material/global macros into compileMacros and ignores per-renderer macros and stencil/mask/receiveShadows/front-face state, so mixed renderers in one batch reuse the first renderer's state; fix by also unioning the renderer-specific macro collection (rendererData._macroCollection) into compileMacros via ShaderMacroCollection.unionCollection, and make the instancing decision (enabling InstanceBatch.gpuInstanceMacro) and mask/stencil handling conditional on renderer-level properties (component._maskInteraction, component._maskLayer, component.receiveShadows, primitive.frontFace or equivalent) so batches are split when those differ; apply the same change in the other identical block referenced (lines 213-219).
🧹 Nitpick comments (3)
examples/src/gpu-instancing-custom-data.ts (2)
24-24: Remove duplicateShaderPropertylookup.
renderer_CustomColoris already resolved at Line 24; re-fetching it at Line 122 is redundant.Use the existing shared property constant
-const _customColorProperty = ShaderProperty.getByName("renderer_CustomColor"); +const _customColorProperty = ShaderProperty.getByName("renderer_CustomColor"); @@ - const customColorProperty = ShaderProperty.getByName("renderer_CustomColor"); @@ - renderer.shaderData.setVector4(customColorProperty, initColor); + renderer.shaderData.setVector4(_customColorProperty, initColor);Also applies to: 122-122
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/src/gpu-instancing-custom-data.ts` at line 24, There is a duplicate ShaderProperty lookup for "renderer_CustomColor" — remove the second call to ShaderProperty.getByName("renderer_CustomColor") and use the already-declared constant _customColorProperty instead (replace any re-fetch in the code with _customColorProperty, e.g., in the code block that currently reassigns or re-gets the property).
65-65: Avoid per-framegetComponentlookup in this hot path.Line 65 does a component query every frame per entity; with 5000 entities this is avoidable overhead.
Cache MeshRenderer once and reuse
class SpiralFlash extends Script { + private _renderer: MeshRenderer | null = null; @@ onUpdate(deltaTime: number): void { @@ - this.entity.getComponent(MeshRenderer).shaderData.setVector4(_customColorProperty, this._color); + const renderer = this._renderer ?? (this._renderer = this.entity.getComponent(MeshRenderer)); + renderer?.shaderData.setVector4(_customColorProperty, this._color); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/src/gpu-instancing-custom-data.ts` at line 65, Per-frame call to this.entity.getComponent(MeshRenderer) is expensive; cache the MeshRenderer once (e.g., in the class constructor/awake/start) into a field like this._meshRenderer and replace the per-frame lookup with this._meshRenderer.shaderData.setVector4(_customColorProperty, this._color); ensure you initialize the cached reference (and guard or throw if missing) so subsequent frames reuse the MeshRenderer instead of calling getComponent each frame.tests/src/shader-lab/PrecompileABTest.test.ts (1)
185-297: Add at least oneGLSLES300compile case for the new instancing path.All touched compile assertions still run against
ShaderLanguage.GLSLES100, so this suite never exercises the WebGL2-only UBO /matNxMpath introduced by the PR. A singleGLSLES300case with renderer uniforms would give this A/B suite coverage overShaderFactory.injectInstanceUBOand the new matrix upload plumbing.Also applies to: 302-367, 517-576
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/src/shader-lab/PrecompileABTest.test.ts` around lines 185 - 297, The test suite never exercises the WebGL2/GLSLES300 instancing/UBO path; add at least one test case that calls validatePrecompiledWebGL with ShaderLanguage.GLSLES300 and a macro set that enables renderer uniforms/instancing (e.g. extend baseMacros with the renderer/instancing macros used by ShaderFactory.injectInstanceUBO and matrix upload code), so the precompile + _compileShaderProgram path covers the new matNxM / UBO upload plumbing; update the simpleShaders loop or add an explicit it("... GLSLES300") test using PBRSource or a shader that declares UBOs to ensure buildMacroCollection and ShaderPass._compileShaderProgram run the GLSLES300 code path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/shader/ShaderPass.ts`:
- Around line 174-183: isGPUInstance is computed solely from the macro
collection which allows UBO-style instancing to be injected on WebGL1; update
the _compileShaderProgram method so isGPUInstance is true only if the macro is
enabled AND the runtime supports WebGL2 (e.g. check engine.webGLVersion === 2 or
the engine property that indicates WebGL2 support). In practice modify the
isGPUInstance assignment near the top of _compileShaderProgram to combine
macroCollection.isEnable(InstanceBatch.gpuInstanceMacro) with the engine WebGL2
check so injectInstanceUBO (called indirectly by
_compileShaderLabSource/_compilePlatformSource) is never used on WebGL1.
---
Outside diff comments:
In `@tests/src/shader-lab/PrecompileBenchmark.test.ts`:
- Around line 418-455: benchSplit currently allocates a new ShaderProgram on
each iteration without releasing it, leaking GPU resources; change the
instantiation inside the loop to store the new ShaderProgram in a variable
(e.g., const prog = new ShaderProgram(engine, vertexSource, fragmentSource)) and
then immediately destroy it after measuring the creation time by calling its
cleanup method (e.g., prog.destroy() or prog.dispose() depending on the
ShaderProgram API); ensure you still capture t2 after creation and call the
destroy method before the next iteration so GPU objects are released between
runs.
---
Duplicate comments:
In `@packages/core/src/RenderPipeline/RenderQueue.ts`:
- Around line 72-90: The instancing fast-path uses only
instancedRenderers.length to skip per-renderer uploads, but you must also ensure
the compiled program actually supports instancing; change the instancing check
to consider program._instanceLayout (e.g. compute isInstanced =
instancedRenderers.length > 0 && program._instanceLayout !== null) before
skipping uploads so component._updateTransformShaderData still runs when
program._instanceLayout is null; apply the same guard wherever similar logic
appears (the other occurrences around the blocks using subElement,
instancedRenderers, component._updateTransformShaderData, rendererUpdateFlag and
ContextRendererUpdateFlag).
- Around line 72-121: The instanced batching currently only unions
material/global macros into compileMacros and ignores per-renderer macros and
stencil/mask/receiveShadows/front-face state, so mixed renderers in one batch
reuse the first renderer's state; fix by also unioning the renderer-specific
macro collection (rendererData._macroCollection) into compileMacros via
ShaderMacroCollection.unionCollection, and make the instancing decision
(enabling InstanceBatch.gpuInstanceMacro) and mask/stencil handling conditional
on renderer-level properties (component._maskInteraction, component._maskLayer,
component.receiveShadows, primitive.frontFace or equivalent) so batches are
split when those differ; apply the same change in the other identical block
referenced (lines 213-219).
---
Nitpick comments:
In `@examples/src/gpu-instancing-custom-data.ts`:
- Line 24: There is a duplicate ShaderProperty lookup for "renderer_CustomColor"
— remove the second call to ShaderProperty.getByName("renderer_CustomColor") and
use the already-declared constant _customColorProperty instead (replace any
re-fetch in the code with _customColorProperty, e.g., in the code block that
currently reassigns or re-gets the property).
- Line 65: Per-frame call to this.entity.getComponent(MeshRenderer) is
expensive; cache the MeshRenderer once (e.g., in the class
constructor/awake/start) into a field like this._meshRenderer and replace the
per-frame lookup with
this._meshRenderer.shaderData.setVector4(_customColorProperty, this._color);
ensure you initialize the cached reference (and guard or throw if missing) so
subsequent frames reuse the MeshRenderer instead of calling getComponent each
frame.
In `@tests/src/shader-lab/PrecompileABTest.test.ts`:
- Around line 185-297: The test suite never exercises the WebGL2/GLSLES300
instancing/UBO path; add at least one test case that calls
validatePrecompiledWebGL with ShaderLanguage.GLSLES300 and a macro set that
enables renderer uniforms/instancing (e.g. extend baseMacros with the
renderer/instancing macros used by ShaderFactory.injectInstanceUBO and matrix
upload code), so the precompile + _compileShaderProgram path covers the new
matNxM / UBO upload plumbing; update the simpleShaders loop or add an explicit
it("... GLSLES300") test using PBRSource or a shader that declares UBOs to
ensure buildMacroCollection and ShaderPass._compileShaderProgram run the
GLSLES300 code path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 24851768-3091-4202-b1a4-22334899b012
⛔ Files ignored due to path filters (16)
e2e/fixtures/originImage/Animator_animator-crossfade.jpgis excluded by!**/*.jpge2e/fixtures/originImage/Animator_animator-play.jpgis excluded by!**/*.jpge2e/fixtures/originImage/Camera_camera-opaque-texture.jpgis excluded by!**/*.jpge2e/fixtures/originImage/GPUInstancing_gpu-instancing-auto-batch.jpgis excluded by!**/*.jpge2e/fixtures/originImage/GPUInstancing_gpu-instancing-custom-data.jpgis excluded by!**/*.jpge2e/fixtures/originImage/Physics_physx-collision.jpgis excluded by!**/*.jpge2e/fixtures/originImage/Physics_physx-customUrl.jpgis excluded by!**/*.jpge2e/fixtures/originImage/Physics_physx-mesh-collider-data.jpgis excluded by!**/*.jpgpackages/core/src/shaderlib/extra/depthOnly.vs.glslis excluded by!**/*.glslpackages/core/src/shaderlib/extra/shadow-map.vs.glslis excluded by!**/*.glslpackages/core/src/shaderlib/extra/skybox.vs.glslis excluded by!**/*.glslpackages/core/src/shaderlib/normal_vert.glslis excluded by!**/*.glslpackages/core/src/shaderlib/transform_declare.glslis excluded by!**/*.glslpackages/shader/src/shaders/Transform.glslis excluded by!**/*.glslpackages/shader/src/shaders/shadingPBR/VertexPBR.glslis excluded by!**/*.glslpnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (38)
e2e/case/gpu-instancing-auto-batch.tse2e/case/gpu-instancing-custom-data.tse2e/config.tsexamples/package.jsonexamples/src/gpu-instancing-auto-batch.tsexamples/src/gpu-instancing-custom-data.tspackage.jsonpackages/core/src/2d/sprite/SpriteMask.tspackages/core/src/2d/sprite/SpriteRenderer.tspackages/core/src/2d/text/TextRenderer.tspackages/core/src/Engine.tspackages/core/src/RenderPipeline/BasicRenderPipeline.tspackages/core/src/RenderPipeline/BatchUtils.tspackages/core/src/RenderPipeline/BatcherManager.tspackages/core/src/RenderPipeline/InstanceBatch.tspackages/core/src/RenderPipeline/RenderQueue.tspackages/core/src/RenderPipeline/SubRenderElement.tspackages/core/src/Renderer.tspackages/core/src/graphic/TransformFeedbackShader.tspackages/core/src/graphic/enums/BufferBindFlag.tspackages/core/src/mesh/MeshRenderer.tspackages/core/src/mesh/SkinnedMeshRenderer.tspackages/core/src/shader/Shader.tspackages/core/src/shader/ShaderBlockProperty.tspackages/core/src/shader/ShaderPass.tspackages/core/src/shader/ShaderProgram.tspackages/core/src/shader/ShaderProgramMap.tspackages/core/src/shader/ShaderUniform.tspackages/core/src/shader/enums/ConstantBufferBindingPoint.tspackages/core/src/shaderlib/ShaderFactory.tspackages/rhi-webgl/src/GLBuffer.tspackages/rhi-webgl/src/WebGLGraphicDevice.tspackages/ui/src/component/UIRenderer.tspackages/ui/src/component/advanced/Image.tspackages/ui/src/component/advanced/Text.tstests/src/shader-lab/PrecompileABTest.test.tstests/src/shader-lab/PrecompileBenchmark.test.tstests/src/shader-lab/ShaderValidate.ts
✅ Files skipped from review due to trivial changes (9)
- examples/package.json
- packages/core/src/graphic/enums/BufferBindFlag.ts
- packages/core/src/shader/enums/ConstantBufferBindingPoint.ts
- packages/ui/src/component/advanced/Image.ts
- packages/ui/src/component/advanced/Text.ts
- packages/core/src/mesh/MeshRenderer.ts
- examples/src/gpu-instancing-auto-batch.ts
- packages/core/src/graphic/TransformFeedbackShader.ts
- packages/core/src/RenderPipeline/BatchUtils.ts
🚧 Files skipped from review as they are similar to previous changes (13)
- tests/src/shader-lab/ShaderValidate.ts
- packages/rhi-webgl/src/GLBuffer.ts
- packages/core/src/2d/sprite/SpriteRenderer.ts
- package.json
- e2e/case/gpu-instancing-auto-batch.ts
- packages/core/src/RenderPipeline/SubRenderElement.ts
- packages/core/src/shader/ShaderBlockProperty.ts
- e2e/config.ts
- e2e/case/gpu-instancing-custom-data.ts
- packages/core/src/2d/text/TextRenderer.ts
- packages/rhi-webgl/src/WebGLGraphicDevice.ts
- packages/core/src/RenderPipeline/InstanceBatch.ts
- packages/core/src/shaderlib/ShaderFactory.ts
| private _compileShaderProgram(engine: Engine, macroCollection: ShaderMacroCollection): ShaderProgram { | ||
| const isGPUInstance = macroCollection.isEnable(InstanceBatch.gpuInstanceMacro); | ||
| const { vertexSource, fragmentSource, instanceLayout } = | ||
| this._platformTarget != undefined | ||
| ? this._compileShaderLabSource(engine, macroCollection) | ||
| : this._compilePlatformSource(engine, macroCollection); | ||
| ? this._compileShaderLabSource(engine, macroCollection, isGPUInstance) | ||
| : this._compilePlatformSource(engine, macroCollection, isGPUInstance); | ||
|
|
||
| return new ShaderProgram(engine, vertexSource, fragmentSource); | ||
| const program = new ShaderProgram(engine, vertexSource, fragmentSource); | ||
| program._instanceLayout = instanceLayout; | ||
| return program; |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does WebGL 1.0 / GLSL ES 1.00 support uniform blocks or layout(std140) declarations, or are those only available in WebGL2 / GLSL ES 3.00?
💡 Result:
No, WebGL 1.0 with GLSL ES 1.00 does not support uniform blocks or layout(std140) declarations. These features are only available in WebGL 2.0 with GLSL ES 3.00.
Citations:
- 1: https://registry.khronos.org/webgl/specs/latest/1.0/
- 2: https://learnwebgl.brown37.net/12_shader_language/documents/_GLSL_ES_Specification_1.0.17.pdf
- 3: https://www.fsynth.com/pdf/webgl1_glsl_1.pdf
- 4: https://registry.khronos.org/webgl/specs/latest/2.0/
- 5: https://stackoverflow.com/questions/33857321/usage-of-webgl2-uniform-buffer-object-and-layoutstd140
- 6: https://registry.khronos.org/OpenGL/specs/es/2.0/GLSL_ES_Specification_1.00.pdf
- 7: https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)
- 8: https://stackoverflow.com/questions/59474875/opengl-es2-syntax-error-when-using-layout-keyword-in-vertex-shaderios
- 9: https://stackoverflow.com/questions/42525541/opengl-es-2-0-attribute-vs-layout
🏁 Script executed:
# Check the actual content of ShaderPass.ts around the mentioned lines
cd packages/core/src/shader && wc -l ShaderPass.tsRepository: galacean/engine
Length of output: 76
🏁 Script executed:
# Read the critical section mentioned in the review (lines 174-250)
sed -n '174,250p' packages/core/src/shader/ShaderPass.ts | cat -nRepository: galacean/engine
Length of output: 3701
🏁 Script executed:
# Search for isWebGL2 property in the codebase to confirm it exists
rg "isWebGL2" --type ts -B 2 -A 2Repository: galacean/engine
Length of output: 43704
🏁 Script executed:
# Verify that InstanceBatch.gpuInstanceMacro exists
rg "gpuInstanceMacro" --type ts -B 2 -A 2Repository: galacean/engine
Length of output: 1316
Gate UBO instancing behind WebGL2 before shader injection.
isGPUInstance is derived only from the macro collection at line 175. On WebGL1, if the gpuInstanceMacro is enabled, injectInstanceUBO() will inject std140 uniform block syntax into the shader while #version 100 is emitted, causing shader compilation to fail instead of falling back to non-instanced rendering.
🔧 Proposed fix
private _compileShaderProgram(engine: Engine, macroCollection: ShaderMacroCollection): ShaderProgram {
- const isGPUInstance = macroCollection.isEnable(InstanceBatch.gpuInstanceMacro);
+ const isGPUInstance =
+ engine._hardwareRenderer.isWebGL2 && macroCollection.isEnable(InstanceBatch.gpuInstanceMacro);
const { vertexSource, fragmentSource, instanceLayout } =
this._platformTarget != undefined
? this._compileShaderLabSource(engine, macroCollection, isGPUInstance)
: this._compilePlatformSource(engine, macroCollection, isGPUInstance);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/shader/ShaderPass.ts` around lines 174 - 183, isGPUInstance
is computed solely from the macro collection which allows UBO-style instancing
to be injected on WebGL1; update the _compileShaderProgram method so
isGPUInstance is true only if the macro is enabled AND the runtime supports
WebGL2 (e.g. check engine.webGLVersion === 2 or the engine property that
indicates WebGL2 support). In practice modify the isGPUInstance assignment near
the top of _compileShaderProgram to combine
macroCollection.isEnable(InstanceBatch.gpuInstanceMacro) with the engine WebGL2
check so injectInstanceUBO (called indirectly by
_compileShaderLabSource/_compilePlatformSource) is never used on WebGL1.
Prevent incorrect batching when renderers have different shader macros (enableVertexColor, custom macros on renderer.shaderData, etc.)
Neither uniform is used in any engine shader. Unity/Unreal/Godot also don't provide equivalents. Removes per-frame Matrix.invert overhead
|
感谢详细的 review,逐条回复: Review 1[P1] ShaderPass.ts:175 — isGPUInstance 未加 WebGL2 守卫不需要。 上游 [P1] MeshRenderer.ts:177-184 — _canBatch 缺少 renderer 级 shader 状态比较已修复。 加了 [P1] ShaderFactory.ts:59-68 — renderer_LocalMat / renderer_MVInvMat derived 但无 #define已删除这两个 uniform(commit 761bec4)。 引擎内置 shader 没有任何代码使用这两个变量,Unity/Unreal/Godot 也都没有对应的内置变量。删除后还省了每帧一次 [P1] RenderQueue.ts:116 — compileMacros 缺少 renderer 级宏不存在。 [P2] RenderQueue.ts:213-218 — mask 状态未比较不需要。 [P2] ShaderFactory.ts — mat3x4 仿射假设不需要额外注释。 [P2] noise_common.glsl 残留常量不相关。 本 PR 是 GPU instancing,没有改动 noise 相关文件,与 #2960 不存在文件交叉。 [P2] SubRenderElement.ts dispose — 池复用 NPE不存在。 [P2] GLBuffer.ts — ConstantBuffer 无 WebGL2 检查不需要。 同 P1 第一条,RHI 层是内部代码,信任上游约束。 [P3] ShaderProgramMap.ts — engine 参数可选已修复(commit 91e2ce2)。 改为 required。 Review 2[P0] SubRenderElement.ts:49 — dispose 后池复用 NPE同上,不存在。 [P1] NormalMat inverse() 每顶点计算确实存在 GPU 开销,但不是本 PR 需要解决的。 非 instancing 路径下 NormalMat 是 CPU 预计算上传的,instancing 路径改为 [P1] 移除 opaque distance sort 影响 overdraw经过详细分析,不需要恢复距离排序。 移动端 TBDR 架构(Apple HSR、Mali FPK、Adreno LRZ)在硬件层面处理 overdraw,距离排序对 fragment 开销没有收益。顶点开销方面,排不排序都一样(被遮挡物体的顶点仍需处理)。合批优先的排序策略是行业趋势——Unity opaque 队列也是 shader/material 优先,Unreal/Godot 移动端同样不做距离排序。 [P2] InstanceBatch 小批次显存浪费当前设计合理。 UBO buffer 全局复用(只有一个 InstanceBatch),不是每个 batch 分配一个 buffer。 [P2] _scanInstanceUniforms 误匹配注释概率极低,且 PR 描述已规划 ShaderLab 预编译替代。 不阻塞。 [P2] renderer_MVInvMat derived 无 #define已删除(同上)。 [P2] _canBatch 中 isWebGL2 每次调用开销可忽略。 [P3] _derivedDefines 格式不影响功能,保持现状。 简化建议回复
|
…tin map Log error when renderer-group array uniforms are found during instancing UBO injection — arrays are not supported (consistent with Unity/Unreal/Godot). Also replace Map with Record<string, boolean> for _builtinRendererUniforms.
Review 5 回复[P1] NormalMat per-vertex inverse()同 Review 2 回复。 是已知 trade-off,GPU ALU 换 UBO 空间。大多数场景无 non-uniform scale 时 [P2] _derivedDefines 同时注入 VS 和 FS需要注入。
camera uniforms( [P2] Opaque 排序移除距离因子同 Review 2 回复。 移动端 TBDR 硬件处理 overdraw,距离排序无收益。合批优先是行业趋势。 [P2] upload() 中 worldMatrix getter 触发 lazy update这是 Transform 的公开契约,不是隐式依赖。 [P2] _scanInstanceUniforms 正则扫描同 Review 2 回复。 ShaderLab 预编译路径有结构化元数据,将替代正则方案。 [P3] _builtinRendererUniforms Map已修复。 最新 commit 已改为 |
Review 6 回复所有问题在之前的回复中已覆盖,逐条标注: [P1] NormalMat inverse() — 已回复(Review 2、5)已知 tradeoff,GPU ALU 换 UBO 空间。后续可优化。 [P1] opaque distance sort — 已回复(Review 2、5)移动端 TBDR 硬件处理 overdraw,合批优先是行业趋势。 [P2] _derivedDefines 注入 VS 和 FS — 已回复(Review 5)camera uniforms 在 VS 和 FS 中都有声明(通过 [P2] InstanceBatch setData 参数语义 — 误报
[P3] _derivedDefines 末行缺 \n — 已回复(Review 2)不影响功能,保持现状。 |
| } | ||
|
|
||
| /** | ||
| * @internal |
There was a problem hiding this comment.
不需要 @internal ,因为没有 export
后面 export 的 interface 应该放前面点
|
Top 1 — SubRenderElement ObjectPool NPE: Frame 1: new SubRenderElement() → instancedRenderers = [] ← class field Fix: Top 2 — _buildLayout structSize=0 崩溃: fieldMap 有 entry 但 type 全不在 _std140TypeInfoMap → addField 全部跳过 Fix: if (instanceFields.length === 0) return null;
const structSize = Math.ceil(currentOffset / 16) * 16;
const instanceMaxCount = Math.floor(maxUBOSize / structSize);
if (instanceMaxCount < 2) {
Logger.warn("GPU Instancing: struct too large, falling back");
return null;
} |
GuoLei1990
left a comment
There was a problem hiding this comment.
Re-review (Round 13) — GPU Instancing Auto-Batching
已关闭问题清单
前 12 轮提出的所有关键问题已修复或经作者解释关闭:
| # | 问题 | 状态 |
|---|---|---|
| 1 | _canBatch 缺少 renderer-level state |
✅ 已修复 (macroCollection.isEqual) |
| 2 | renderer_LocalMat/renderer_MVInvMat 删除 |
✅ 已修复 |
| 3 | ShaderProgramMap 重命名 | ✅ 已修复 |
| 4 | _builtinRendererUniforms Map → Record |
✅ 已修复 |
| 5 | SubRenderElement dispose NPE | ✅ 作者解释成立 |
| 6 | NormalMat per-vertex inverse | ✅ 已知 tradeoff |
| 7 | Opaque 移除距离排序 | ✅ TBDR 硬件处理 overdraw |
| 8-20 | 其他已关闭问题 | ✅ 详见 Round 11/12 |
总结
UBO-based auto instancing 方向正确,mat3x4 仿射优化、per-pass 独立布局、#define 重映射设计清晰。iPhone 16 PM 实测 +67% 数据有说服力。自 Round 12 以来无新 commit,前轮 P0 仍未修复。
问题
P0
-
[P0] ShaderFactory.ts
_buildLayout— structSize=0 导致 Infinity/NaN 崩溃前轮已提出但仍未修复。当
fieldMap有 entry 但所有 type 不在_std140TypeInfoMap中时(如用户只声明了uniform mat3 renderer_Custom;),_scanInstanceUniforms将 uniform 从 shader 中移除并加入fieldMap(hasField=true),但addField全部 skip →instanceFields为空 →currentOffset=0→structSize = Math.ceil(0/16)*16 = 0→instanceMaxCount = Math.floor(maxUBOSize/0) = Infinity→InstanceBatch.setLayout中new ArrayBuffer(Infinity * 0) = NaN→ RangeError 崩溃建议修复:在
_buildLayout的structSize计算后加守卫:if (instanceFields.length === 0) { return { instanceFields, instanceMaxCount: 0, structSize: 0 }; }
并在
injectInstanceUBO中检查instanceMaxCount < 2时返回instanceLayout: null
P1
-
[P1] ShaderFactory.ts
addField— 不支持的 uniform 类型静默跳过,应报错前轮已提出但仍未修复。
if (!info) return;静默跳过不支持的类型。用户的 renderer-group uniform 声明已被_scanInstanceUniforms从 shader 中移除,但又没进入 UBO struct,也没有#define重映射,shader 引用不存在的符号 → 编译错误,用户无法诊断原因。Array uniform 已有
Logger.error,此处应同样加上:if (!info) { Logger.error(`GPU Instancing does not support uniform type "${type}" for "${name}"`); return; }
-
[P1]
camera_VPMat在 ShaderLabTransform.glsl中缺失packages/core/src/shaderlib/transform_declare.glsl新增了uniform mat4 camera_VPMat;,但packages/shader/src/shaders/Transform.glsl(ShaderLab 路径)没有对应添加。当 ShaderLab shader 启用 instancing 时,注入的#define renderer_MVPMat (camera_VPMat * renderer_ModelMat)会引用未声明的camera_VPMat→ shader 编译失败。修复:在
Transform.glsl中camera_ProjMat后添加mat4 camera_VPMat;
简化建议
_buildLayout 返回 null 路径统一:让 _buildLayout 在 instanceFields.length === 0 时返回 null,外层 injectInstanceUBO 只需一个 if (!instanceLayout) 判断,消除当前的 !hasField 和 instanceMaxCount 两个独立 fallback 分支
Closes #194
Summary
InstanceBatch将 renderer uniform(ModelMat、Layer 等)打包到共享的 std140 UBO 中ShaderFactory.injectInstanceUBO自动扫描 shader 中的 renderer uniform,替换为 UBO 数组访问 +#define重映射mat3x4存储(仿射优化,48 字节 vs 64 字节),派生 uniform(MVMat/MVPMat/NormalMat)通过#define实时计算MeshRenderer._canBatch/_batch实现合批判定(相同 primitive + material + front-face)ShaderProgram._recordLocation跳过 UBO 成员(location === null),避免无用 ShaderUniform 创建Performance
测试场景: 2500 glTF 模型(Avocado) + 2500 自定义 shader 立方体,全部动态旋转 + 缩放 + 颜色动画
iPhone 实测截图(59 FPS / 21 Draw Calls / 5000 objects):
Future Optimization
injectInstanceUBO通过正则扫描 GLSL 文本获取 renderer uniform 信息。如果 ShaderLab 预编译时提供 uniform 元数据(name, type, group),可以消除正则扫描,改为精确拼接,提升代码健壮性和可扩展性Key Files
RenderPipeline/InstanceBatch.tsRenderPipeline/RenderQueue.tsshaderlib/ShaderFactory.tsshader/ShaderPass.tsshader/ShaderProgram.ts_instanceLayout字段,跳过 UBO 成员反射mesh/MeshRenderer.ts_canBatch/_batch合批逻辑shader/ShaderProgramMap.tsTest plan