Skip to content

feat(core): GPU instancing auto-batching#2957

Open
GuoLei1990 wants to merge 58 commits intodev/2.0from
feat/gpu-instancing
Open

feat(core): GPU instancing auto-batching#2957
GuoLei1990 wants to merge 58 commits intodev/2.0from
feat/gpu-instancing

Conversation

@GuoLei1990
Copy link
Copy Markdown
Member

@GuoLei1990 GuoLei1990 commented Apr 7, 2026

Closes #194

Summary

  • 为 MeshRenderer 引入自动 GPU Instancing,通过 UBO 打包 per-instance 数据驱动 instanced draw call
  • InstanceBatch 将 renderer uniform(ModelMat、Layer 等)打包到共享的 std140 UBO 中
  • ShaderFactory.injectInstanceUBO 自动扫描 shader 中的 renderer uniform,替换为 UBO 数组访问 + #define 重映射
  • ModelMat 以 mat3x4 存储(仿射优化,48 字节 vs 64 字节),派生 uniform(MVMat/MVPMat/NormalMat)通过 #define 实时计算
  • InstanceLayout 在 shader 编译时计算并缓存在 ShaderProgram 上,渲染时直接取用,无重复计算
  • Per-pass 独立布局:不同 pass 按各自需要的 uniform 计算 struct 大小,ShadowCaster 等轻量 pass 可获得更高 instanceMaxCount
  • MeshRenderer._canBatch/_batch 实现合批判定(相同 primitive + material + front-face)
  • ShaderProgram._recordLocation 跳过 UBO 成员(location === null),避免无用 ShaderUniform 创建
  • Opaque 排序策略改为按 material/primitive 排序(合批优先),移除距离排序

Performance

测试场景: 2500 glTF 模型(Avocado) + 2500 自定义 shader 立方体,全部动态旋转 + 缩放 + 颜色动画

设备 dev/2.0 (无 instancing) feat/gpu-instancing 提升
iPhone 16 Pro Max 30 FPS 50 FPS +67%

iPhone 实测截图(59 FPS / 21 Draw Calls / 5000 objects):

Future Optimization

  • ShaderLab 预编译元数据: 当前 injectInstanceUBO 通过正则扫描 GLSL 文本获取 renderer uniform 信息。如果 ShaderLab 预编译时提供 uniform 元数据(name, type, group),可以消除正则扫描,改为精确拼接,提升代码健壮性和可扩展性
  • SubRenderElement 层级排序: Opaque 排序应下沉到 SubRenderElement 层级,避免多 submesh 物体打断合批连续性

Key Files

文件 职责
RenderPipeline/InstanceBatch.ts UBO buffer 管理、per-instance 数据 upload
RenderPipeline/RenderQueue.ts Instance-aware 渲染循环
shaderlib/ShaderFactory.ts Uniform 扫描、std140 布局计算、UBO 代码生成注入
shader/ShaderPass.ts Shader 编译入口,InstanceLayout 存储到 ShaderProgram
shader/ShaderProgram.ts 新增 _instanceLayout 字段,跳过 UBO 成员反射
mesh/MeshRenderer.ts _canBatch/_batch 合批逻辑
shader/ShaderProgramMap.ts 多级位掩码 ShaderProgram 缓存

Test plan

  • 多个共享 mesh + material 的 MeshRenderer 正确 instanced 渲染
  • Shadow pass 与 instanced 物体配合正常
  • 非 instancing renderer(SkinnedMeshRenderer、2D sprite)不受影响
  • 无 renderer uniform 时(无 UBO layout)正常回退
  • 性能验证:大量相同物体的 draw call 数量显著减少
  • iPhone 16 Pro Max 实测 30 FPS → 50 FPS(+67%)

…nce data

Introduce automatic GPU instancing for MeshRenderer. The system scans
renderer-group uniforms across shader passes, builds a unified std140
UBO layout, and packs per-instance data (ModelMat, Layer, etc.) each
frame. Key changes:

- InstanceDataPacker: packs renderer data into shared UBO for instanced draw
- ShaderFactory: unified _scanInstanceUniforms, _buildLayout, _injectInstanceUBO
- MeshRenderer._canBatch/_batch: instancing merge logic
- ShaderPass/SubShader: instance-aware compilation with macro cache
- GLSLIfdefResolver: compile-time #ifdef resolution for instance field scanning
- MacroCachePool: pooled ShaderMacroCollection for shader program caching
- RenderQueue: instance-aware draw path with UBO binding
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds WebGL2 GPU instancing: instance UBO management, instance-aware shader compilation and caching, batching API/signature changes, render-queue instanced draw paths, device/GL helpers, examples and E2E tests, and supporting shader/uniform enhancements.

Changes

Cohort / File(s) Summary
GPU Instancing Core
packages/core/src/RenderPipeline/InstanceBatch.ts, packages/core/src/shader/ShaderBlockProperty.ts, packages/core/src/shader/enums/ConstantBufferBindingPoint.ts
New InstanceBatch class managing CPU packing and dynamic UBO allocation; ShaderBlockProperty for uniform block ids; ConstantBuffer binding enum for renderer-instance UBO.
Batching Parameter Refactoring
packages/core/src/RenderPipeline/BatchUtils.ts, packages/core/src/2d/sprite/SpriteRenderer.ts, packages/core/src/2d/sprite/SpriteMask.ts, packages/core/src/2d/text/TextRenderer.ts, packages/core/src/Renderer.ts, packages/ui/src/component/UIRenderer.ts
Renamed batching parameters from (elementA, elementB)(preSubElement, subElement) and adjusted nullability/optional semantics; updated BatchUtils to consume new parameter roles.
SubRenderElement & RenderPipeline
packages/core/src/RenderPipeline/SubRenderElement.ts, packages/core/src/RenderPipeline/BasicRenderPipeline.ts, packages/core/src/RenderPipeline/RenderQueue.ts, packages/core/src/RenderPipeline/BatcherManager.ts
Replaced shaderPasses with subShader; added instancedRenderers on SubRenderElement; BasicRenderPipeline and RenderQueue updated to use SubShader and to perform instanced draw paths; BatcherManager exposes InstanceBatch and calls updated renderer batching signature.
Renderer & Mesh Changes
packages/core/src/Renderer.ts, packages/core/src/mesh/MeshRenderer.ts, packages/core/src/mesh/SkinnedMeshRenderer.ts
Renderer shader-property fields visibility adjusted; batching hooks updated to new signatures; MeshRenderer supports collecting instanced renderers into SubRenderElement; SkinnedMeshRenderer disables batching.
Shader Program & Compilation
packages/core/src/shader/ShaderPass.ts, packages/core/src/shader/ShaderProgram.ts, packages/core/src/shader/ShaderProgramMap.ts, packages/core/src/shader/ShaderBlockProperty.ts, packages/core/src/shaderlib/ShaderFactory.ts, packages/core/src/graphic/TransformFeedbackShader.ts
Switched from ShaderProgramPool → ShaderProgramMap cache; added _compileShaderProgram path that computes instance layouts and injects instance UBOs; ShaderProgram now records uniform block ids and supports bindUniformBlocks; ShaderFactory extended to generate instance-aware GLSL and return instance layout; related compile/cache call sites updated.
Shader Uniforms & Uploads
packages/core/src/shader/ShaderUniform.ts
Added upload methods for Mat2/Mat3 and rectangular matrix types (WebGL2 variants).
Engine & Caching Fields
packages/core/src/Engine.ts, packages/core/src/shader/ShaderProgramMap.ts
Replaced internal _shaderProgramPools with _shaderProgramMaps and adapted lazy accessors; ShaderProgramMap renamed/refactored with destroy() lifecycle.
RHI / WebGL2 Support
packages/rhi-webgl/src/GLBuffer.ts, packages/rhi-webgl/src/WebGLGraphicDevice.ts, packages/core/src/graphic/enums/BufferBindFlag.ts
Added ConstantBuffer bind flag and mapped it to UNIFORM_BUFFER; WebGL device gains bindUniformBufferBase, bindUniformBlock, and getMaxUniformBlockSize helpers; GLBuffer target selection updated.
2D / UI Render Element Updates
packages/core/src/2d/sprite/SpriteMask.ts, packages/ui/src/component/advanced/Image.ts, packages/ui/src/component/advanced/Text.ts
Overlay/2D sub-render elements now assign subShader instead of shaderPasses; small adjustments to subRenderElement setup to match SubShader change.
Examples, E2E & Tests
examples/src/gpu-instancing-auto-batch.ts, examples/src/gpu-instancing-custom-data.ts, e2e/case/gpu-instancing-auto-batch.ts, e2e/case/gpu-instancing-custom-data.ts, e2e/config.ts, examples/package.json, tests/src/shader-lab/*
Added examples and E2E tests for instancing; registered new E2E config entries; examples add custom shaders and animated instanced entities; tests updated to call _compileShaderProgram and benchmark helpers adjusted.
Tooling / Packages
package.json, examples/package.json
Dev tooling upgrades (ESLint, TypeScript ESLint parser/plugin, lint-staged) and added @galacean/engine-toolkit-stats dependency in examples.
Other Render Pipeline Adjustments
packages/core/src/RenderPipeline/BasicRenderPipeline.ts, packages/core/src/RenderPipeline/BatchUtils.ts, packages/core/src/RenderPipeline/InstanceBatch.ts
pushRenderElement now consumes SubShader; BatchUtils.batchFor2D signature adjusted; InstanceBatch integrated into BatcherManager and RenderQueue flows for chunked uploads and instanced draw submission.

Sequence Diagram(s)

sequenceDiagram
    participant RenderQueue as RenderQueue
    participant BatcherManager as BatcherManager
    participant InstanceBatch as InstanceBatch
    participant GPUBuffer as GPU Constant Buffer
    participant ShaderProgram as ShaderProgram

    RenderQueue->>BatcherManager: request instanceBatch (lazy)
    BatcherManager->>InstanceBatch: setLayout(layout)
    InstanceBatch->>GPUBuffer: create/realloc UBO (if needed)
    loop per instanced chunk
        RenderQueue->>InstanceBatch: upload(renderers[], start, count)
        InstanceBatch->>InstanceBatch: pack per-instance fields into CPU buffer
        InstanceBatch->>GPUBuffer: setData(range, Discard)
    end
    RenderQueue->>ShaderProgram: bindUniformBlocks(bindingMap)
    ShaderProgram->>GPUBuffer: uniformBlockBinding(bindingPoint)
    RenderQueue->>RenderQueue: issue drawPrimitive with instanceCount
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I packed the fields in quiet rows,
UBOs hum where instance data flows,
Shaders peep and vary hue,
Batches leap — five thousand, who knew?
A tiny rabbit hops — render goes!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title "feat(core): GPU instancing auto-batching" directly and clearly summarizes the primary feature added in this changeset: GPU instancing with automatic batching for improved rendering performance.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/gpu-instancing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Move _gpuInstanceMacro after _macroMap declaration to fix static
initialization order. Also apply prettier formatting fixes.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 40.70266% with 692 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.02%. Comparing base (a54642f) to head (ee3aa61).

Files with missing lines Patch % Lines
packages/core/src/shaderlib/ShaderFactory.ts 44.05% 174 Missing ⚠️
examples/src/gpu-instancing-auto-batch.ts 0.00% 116 Missing and 1 partial ⚠️
examples/src/gpu-instancing-custom-data.ts 0.00% 82 Missing and 1 partial ⚠️
e2e/case/gpu-instancing-custom-data.ts 0.00% 45 Missing and 1 partial ⚠️
packages/core/src/RenderPipeline/InstanceBatch.ts 50.53% 46 Missing ⚠️
e2e/case/gpu-instancing-auto-batch.ts 0.00% 42 Missing and 1 partial ⚠️
packages/core/src/RenderPipeline/RenderQueue.ts 50.00% 37 Missing ⚠️
packages/core/src/shader/ShaderProgram.ts 52.63% 27 Missing ⚠️
packages/core/src/RenderPipeline/BatchUtils.ts 15.78% 16 Missing ⚠️
packages/core/src/shader/ShaderUniform.ts 50.00% 16 Missing ⚠️
... and 17 more
Additional details and impacted files
@@             Coverage Diff             @@
##           dev/2.0    #2957      +/-   ##
===========================================
- Coverage    77.38%   77.02%   -0.36%     
===========================================
  Files          900      907       +7     
  Lines        98752    99611     +859     
  Branches      9817     9818       +1     
===========================================
+ Hits         76415    76728     +313     
- Misses       22170    22711     +541     
- Partials       167      172       +5     
Flag Coverage Δ
unittests 77.02% <40.70%> (-0.36%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…es layout

The instance UBO is injected at compile time by ShaderFactory, so the
original shader uniform declarations don't need modification.
Bring back normalMat extraction, transform_declare reordering,
trailing whitespace fixes, and VertexPBR indent fix. Only the
renderer_Layer relocation stays reverted.
… 3×vec4

- Fix SubRenderElement.set() not resetting instanceDataPacker, causing
  stale packer references from previous frames to break all batching
- Use whitelist + _group fallback for identifying renderer uniforms in
  _scanInstanceUniforms (fixes _group===undefined for ModelMat)
- Store ModelMat as 3×vec4 (affine rows) instead of mat4 in UBO,
  saving 16 bytes per instance (structSize 80→64, +25% instances/batch)
- Add camera_VPMat to transform_declare.glsl for derived MVP define
- Extract struct definition outside uniform block (GLSL ES 3.00 compat)
- Fix _insertUBOBlock to only scan initial #define preamble
- Pass instanceID to fragment shader via flat varying
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 7, 2026
…lity

Wrap derived NormalMat define with mat4() so instancing and non-instancing
paths both produce mat4, avoiding shader compilation errors.
Add custom instance data example to verify per-renderer uniform batching.
Rename elementA/elementB to preSubElement/subElement across all
renderer subclasses and BatchUtils. Change _batch signature so
preSubElement is nullable (null = batch head, no previous element
to merge with), and subElement is always required.
- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker
- Rename getOrCreate() to get() in InstanceDataPackerPool
- Clear compileMacros in InstanceDataPacker.reset()
Move macro merging, layout computation, and UBO packing from batch
phase to render phase. _batch now only collects renderers into a
pre-allocated list. RenderQueue.render handles macro union, layout
lookup, and splits by maxInstanceCount for sub-batch rendering.

- SubRenderElement: instanceDataPacker → instancedRenderers (pooled array)
- MeshRenderer._canBatch: remove maxInstanceCount check
- MeshRenderer._batch: only push renderers, zero allocation
- InstanceDataPacker: remove compileMacros/addRenderer/instanceCount,
  add packAndUpload(renderers, start, count)
- InstanceDataPackerPool: remove uploadBuffer, simplify reset
- BatcherManager: remove instancing uploadBuffer call
Packer is now stateless (setLayout + packAndUpload + draw), so only
one instance is needed. Discard upload ensures no GPU stall when
reusing the same buffer across shadow and main passes.

- Delete InstanceDataPackerPool.ts
- BatcherManager: instanceDataPackerPool → instanceDataPacker
- Remove resetInstanceDataPackerPool lifecycle
- Saves GPU memory by using single buffer instead of pool
Rename class, file, and all references to better reflect its role as
an instance batch manager rather than a generic data packer.
It's a macro-keyed map, not a pool (no borrow/return semantics).
Align variable and method names with MacroMap rename — these are maps,
not pools.
Callers always pass a valid buffer, no need for null guard.
- nativeBuffer → buffer, _uboData → _data
- Replace separate instanceFields/_structSize with single _layout ref
- setLayout() now takes InstanceLayout directly
- Remove unnecessary null guard on _layout
- Inline uploadElements variable
- Destructure floatView/intView from this
- Improve worldMatrix comment
- Remove unnecessary component→renderer alias, use component directly
- Hoist bindUniformBlocks/bindUniformBufferBase out of sub-batch loop
- Move primitive.instanceCount=0 after loop (only need to reset once)
- Remove redundant let layout = undefined
- Upgrade lint-staged from v10.5 to v16.4.0
- Fix glob from *.{ts} to **/*.ts to match subdirectory files
- Remove redundant git add from tasks
- eslint 8.44 → 8.57
- @typescript-eslint/parser and eslint-plugin 6.x → 8.x
- Eliminates "unsupported TypeScript version" warning
Dispose means the object is permanently released, so null the array
to free memory instead of just clearing length.
Eliminate redundant SubShader._getInstanceLayout / ShaderPass._scanInstanceFields
by reusing the shader compilation chain — _injectInstanceUBO now returns
InstanceLayout directly, stored on ShaderProgram._instanceLayout.
@GuoLei1990 GuoLei1990 self-assigned this Apr 12, 2026
@GuoLei1990 GuoLei1990 added enhancement New feature or request and removed documentation Improvements or additions to documentation labels Apr 12, 2026
@GuoLei1990 GuoLei1990 force-pushed the feat/gpu-instancing branch from 5bf37bd to 9533e2e Compare April 12, 2026 07:21
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 12, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/src/shader-lab/PrecompileBenchmark.test.ts (1)

418-455: ⚠️ Potential issue | 🟡 Minor

Destroy the temporary ShaderPrograms inside benchSplit.

Line 444 allocates a fresh WebGL program on every iteration and never releases it. That leaks GPU objects across scenarios and makes the later timings noisier than the earlier ones.

♻️ Proposed fix
       ): { cpu: number; gpu: number; total: number; vsLen: number; fsLen: number } {
         // Warmup
         for (let i = 0; i < warmup; i++) {
-          compileFn(macroCollection);
+          const { vertexSource, fragmentSource } = compileFn(macroCollection);
+          const program = new ShaderProgram(engine, vertexSource, fragmentSource);
+          program.destroy();
         }

         const cpuTimes: number[] = [];
         const gpuTimes: number[] = [];
         let vsLen = 0;
@@
           vsLen = vertexSource.length;
           fsLen = fragmentSource.length;

           // GPU: create ShaderProgram
           // `@ts-ignore`
-          new ShaderProgram(engine, vertexSource, fragmentSource);
+          const program = new ShaderProgram(engine, vertexSource, fragmentSource);
           const t2 = performance.now();
+          program.destroy();

           cpuTimes.push(t1 - t0);
           gpuTimes.push(t2 - t1);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/shader-lab/PrecompileBenchmark.test.ts` around lines 418 - 455,
benchSplit currently allocates a new ShaderProgram on each iteration without
releasing it, leaking GPU resources; change the instantiation inside the loop to
store the new ShaderProgram in a variable (e.g., const prog = new
ShaderProgram(engine, vertexSource, fragmentSource)) and then immediately
destroy it after measuring the creation time by calling its cleanup method
(e.g., prog.destroy() or prog.dispose() depending on the ShaderProgram API);
ensure you still capture t2 after creation and call the destroy method before
the next iteration so GPU objects are released between runs.
♻️ Duplicate comments (2)
packages/core/src/RenderPipeline/RenderQueue.ts (2)

72-90: ⚠️ Potential issue | 🔴 Critical

Treat program._instanceLayout === null as non-instanced before skipping renderer uploads.

Line 73 takes the instancing fast path before the compiled program is known. When program._instanceLayout later turns out to be null, this code has already skipped per-renderer transform/uniform uploads, then Lines 237-238 submit only one non-instanced draw. That drops the rest of the batch.

Also applies to: 160-165, 189-195, 221-239

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/RenderPipeline/RenderQueue.ts` around lines 72 - 90, The
instancing fast-path uses only instancedRenderers.length to skip per-renderer
uploads, but you must also ensure the compiled program actually supports
instancing; change the instancing check to consider program._instanceLayout
(e.g. compute isInstanced = instancedRenderers.length > 0 &&
program._instanceLayout !== null) before skipping uploads so
component._updateTransformShaderData still runs when program._instanceLayout is
null; apply the same guard wherever similar logic appears (the other occurrences
around the blocks using subElement, instancedRenderers,
component._updateTransformShaderData, rendererUpdateFlag and
ContextRendererUpdateFlag).

72-121: ⚠️ Potential issue | 🟠 Major

Split instanced batches on renderer-level macros and stencil state.

The compile macros and mask behavior are taken only from subElement.component, but the batching rule described in this PR only keys on primitive/material/front-face. A mixed batch with different receiveShadows, _maskInteraction, or _maskLayer values will therefore render every instance with the first renderer's state.

Also applies to: 213-219

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/RenderPipeline/RenderQueue.ts` around lines 72 - 121, The
instanced batching currently only unions material/global macros into
compileMacros and ignores per-renderer macros and
stencil/mask/receiveShadows/front-face state, so mixed renderers in one batch
reuse the first renderer's state; fix by also unioning the renderer-specific
macro collection (rendererData._macroCollection) into compileMacros via
ShaderMacroCollection.unionCollection, and make the instancing decision
(enabling InstanceBatch.gpuInstanceMacro) and mask/stencil handling conditional
on renderer-level properties (component._maskInteraction, component._maskLayer,
component.receiveShadows, primitive.frontFace or equivalent) so batches are
split when those differ; apply the same change in the other identical block
referenced (lines 213-219).
🧹 Nitpick comments (3)
examples/src/gpu-instancing-custom-data.ts (2)

24-24: Remove duplicate ShaderProperty lookup.

renderer_CustomColor is already resolved at Line 24; re-fetching it at Line 122 is redundant.

Use the existing shared property constant
-const _customColorProperty = ShaderProperty.getByName("renderer_CustomColor");
+const _customColorProperty = ShaderProperty.getByName("renderer_CustomColor");
@@
-  const customColorProperty = ShaderProperty.getByName("renderer_CustomColor");
@@
-    renderer.shaderData.setVector4(customColorProperty, initColor);
+    renderer.shaderData.setVector4(_customColorProperty, initColor);

Also applies to: 122-122

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/src/gpu-instancing-custom-data.ts` at line 24, There is a duplicate
ShaderProperty lookup for "renderer_CustomColor" — remove the second call to
ShaderProperty.getByName("renderer_CustomColor") and use the already-declared
constant _customColorProperty instead (replace any re-fetch in the code with
_customColorProperty, e.g., in the code block that currently reassigns or
re-gets the property).

65-65: Avoid per-frame getComponent lookup in this hot path.

Line 65 does a component query every frame per entity; with 5000 entities this is avoidable overhead.

Cache MeshRenderer once and reuse
 class SpiralFlash extends Script {
+  private _renderer: MeshRenderer | null = null;
@@
   onUpdate(deltaTime: number): void {
@@
-    this.entity.getComponent(MeshRenderer).shaderData.setVector4(_customColorProperty, this._color);
+    const renderer = this._renderer ?? (this._renderer = this.entity.getComponent(MeshRenderer));
+    renderer?.shaderData.setVector4(_customColorProperty, this._color);
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/src/gpu-instancing-custom-data.ts` at line 65, Per-frame call to
this.entity.getComponent(MeshRenderer) is expensive; cache the MeshRenderer once
(e.g., in the class constructor/awake/start) into a field like
this._meshRenderer and replace the per-frame lookup with
this._meshRenderer.shaderData.setVector4(_customColorProperty, this._color);
ensure you initialize the cached reference (and guard or throw if missing) so
subsequent frames reuse the MeshRenderer instead of calling getComponent each
frame.
tests/src/shader-lab/PrecompileABTest.test.ts (1)

185-297: Add at least one GLSLES300 compile case for the new instancing path.

All touched compile assertions still run against ShaderLanguage.GLSLES100, so this suite never exercises the WebGL2-only UBO / matNxM path introduced by the PR. A single GLSLES300 case with renderer uniforms would give this A/B suite coverage over ShaderFactory.injectInstanceUBO and the new matrix upload plumbing.

Also applies to: 302-367, 517-576

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/shader-lab/PrecompileABTest.test.ts` around lines 185 - 297, The
test suite never exercises the WebGL2/GLSLES300 instancing/UBO path; add at
least one test case that calls validatePrecompiledWebGL with
ShaderLanguage.GLSLES300 and a macro set that enables renderer
uniforms/instancing (e.g. extend baseMacros with the renderer/instancing macros
used by ShaderFactory.injectInstanceUBO and matrix upload code), so the
precompile + _compileShaderProgram path covers the new matNxM / UBO upload
plumbing; update the simpleShaders loop or add an explicit it("... GLSLES300")
test using PBRSource or a shader that declares UBOs to ensure
buildMacroCollection and ShaderPass._compileShaderProgram run the GLSLES300 code
path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/shader/ShaderPass.ts`:
- Around line 174-183: isGPUInstance is computed solely from the macro
collection which allows UBO-style instancing to be injected on WebGL1; update
the _compileShaderProgram method so isGPUInstance is true only if the macro is
enabled AND the runtime supports WebGL2 (e.g. check engine.webGLVersion === 2 or
the engine property that indicates WebGL2 support). In practice modify the
isGPUInstance assignment near the top of _compileShaderProgram to combine
macroCollection.isEnable(InstanceBatch.gpuInstanceMacro) with the engine WebGL2
check so injectInstanceUBO (called indirectly by
_compileShaderLabSource/_compilePlatformSource) is never used on WebGL1.

---

Outside diff comments:
In `@tests/src/shader-lab/PrecompileBenchmark.test.ts`:
- Around line 418-455: benchSplit currently allocates a new ShaderProgram on
each iteration without releasing it, leaking GPU resources; change the
instantiation inside the loop to store the new ShaderProgram in a variable
(e.g., const prog = new ShaderProgram(engine, vertexSource, fragmentSource)) and
then immediately destroy it after measuring the creation time by calling its
cleanup method (e.g., prog.destroy() or prog.dispose() depending on the
ShaderProgram API); ensure you still capture t2 after creation and call the
destroy method before the next iteration so GPU objects are released between
runs.

---

Duplicate comments:
In `@packages/core/src/RenderPipeline/RenderQueue.ts`:
- Around line 72-90: The instancing fast-path uses only
instancedRenderers.length to skip per-renderer uploads, but you must also ensure
the compiled program actually supports instancing; change the instancing check
to consider program._instanceLayout (e.g. compute isInstanced =
instancedRenderers.length > 0 && program._instanceLayout !== null) before
skipping uploads so component._updateTransformShaderData still runs when
program._instanceLayout is null; apply the same guard wherever similar logic
appears (the other occurrences around the blocks using subElement,
instancedRenderers, component._updateTransformShaderData, rendererUpdateFlag and
ContextRendererUpdateFlag).
- Around line 72-121: The instanced batching currently only unions
material/global macros into compileMacros and ignores per-renderer macros and
stencil/mask/receiveShadows/front-face state, so mixed renderers in one batch
reuse the first renderer's state; fix by also unioning the renderer-specific
macro collection (rendererData._macroCollection) into compileMacros via
ShaderMacroCollection.unionCollection, and make the instancing decision
(enabling InstanceBatch.gpuInstanceMacro) and mask/stencil handling conditional
on renderer-level properties (component._maskInteraction, component._maskLayer,
component.receiveShadows, primitive.frontFace or equivalent) so batches are
split when those differ; apply the same change in the other identical block
referenced (lines 213-219).

---

Nitpick comments:
In `@examples/src/gpu-instancing-custom-data.ts`:
- Line 24: There is a duplicate ShaderProperty lookup for "renderer_CustomColor"
— remove the second call to ShaderProperty.getByName("renderer_CustomColor") and
use the already-declared constant _customColorProperty instead (replace any
re-fetch in the code with _customColorProperty, e.g., in the code block that
currently reassigns or re-gets the property).
- Line 65: Per-frame call to this.entity.getComponent(MeshRenderer) is
expensive; cache the MeshRenderer once (e.g., in the class
constructor/awake/start) into a field like this._meshRenderer and replace the
per-frame lookup with
this._meshRenderer.shaderData.setVector4(_customColorProperty, this._color);
ensure you initialize the cached reference (and guard or throw if missing) so
subsequent frames reuse the MeshRenderer instead of calling getComponent each
frame.

In `@tests/src/shader-lab/PrecompileABTest.test.ts`:
- Around line 185-297: The test suite never exercises the WebGL2/GLSLES300
instancing/UBO path; add at least one test case that calls
validatePrecompiledWebGL with ShaderLanguage.GLSLES300 and a macro set that
enables renderer uniforms/instancing (e.g. extend baseMacros with the
renderer/instancing macros used by ShaderFactory.injectInstanceUBO and matrix
upload code), so the precompile + _compileShaderProgram path covers the new
matNxM / UBO upload plumbing; update the simpleShaders loop or add an explicit
it("... GLSLES300") test using PBRSource or a shader that declares UBOs to
ensure buildMacroCollection and ShaderPass._compileShaderProgram run the
GLSLES300 code path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 24851768-3091-4202-b1a4-22334899b012

📥 Commits

Reviewing files that changed from the base of the PR and between 5bf37bd and 9533e2e.

⛔ Files ignored due to path filters (16)
  • e2e/fixtures/originImage/Animator_animator-crossfade.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/Animator_animator-play.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/Camera_camera-opaque-texture.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/GPUInstancing_gpu-instancing-auto-batch.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/GPUInstancing_gpu-instancing-custom-data.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/Physics_physx-collision.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/Physics_physx-customUrl.jpg is excluded by !**/*.jpg
  • e2e/fixtures/originImage/Physics_physx-mesh-collider-data.jpg is excluded by !**/*.jpg
  • packages/core/src/shaderlib/extra/depthOnly.vs.glsl is excluded by !**/*.glsl
  • packages/core/src/shaderlib/extra/shadow-map.vs.glsl is excluded by !**/*.glsl
  • packages/core/src/shaderlib/extra/skybox.vs.glsl is excluded by !**/*.glsl
  • packages/core/src/shaderlib/normal_vert.glsl is excluded by !**/*.glsl
  • packages/core/src/shaderlib/transform_declare.glsl is excluded by !**/*.glsl
  • packages/shader/src/shaders/Transform.glsl is excluded by !**/*.glsl
  • packages/shader/src/shaders/shadingPBR/VertexPBR.glsl is excluded by !**/*.glsl
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (38)
  • e2e/case/gpu-instancing-auto-batch.ts
  • e2e/case/gpu-instancing-custom-data.ts
  • e2e/config.ts
  • examples/package.json
  • examples/src/gpu-instancing-auto-batch.ts
  • examples/src/gpu-instancing-custom-data.ts
  • package.json
  • packages/core/src/2d/sprite/SpriteMask.ts
  • packages/core/src/2d/sprite/SpriteRenderer.ts
  • packages/core/src/2d/text/TextRenderer.ts
  • packages/core/src/Engine.ts
  • packages/core/src/RenderPipeline/BasicRenderPipeline.ts
  • packages/core/src/RenderPipeline/BatchUtils.ts
  • packages/core/src/RenderPipeline/BatcherManager.ts
  • packages/core/src/RenderPipeline/InstanceBatch.ts
  • packages/core/src/RenderPipeline/RenderQueue.ts
  • packages/core/src/RenderPipeline/SubRenderElement.ts
  • packages/core/src/Renderer.ts
  • packages/core/src/graphic/TransformFeedbackShader.ts
  • packages/core/src/graphic/enums/BufferBindFlag.ts
  • packages/core/src/mesh/MeshRenderer.ts
  • packages/core/src/mesh/SkinnedMeshRenderer.ts
  • packages/core/src/shader/Shader.ts
  • packages/core/src/shader/ShaderBlockProperty.ts
  • packages/core/src/shader/ShaderPass.ts
  • packages/core/src/shader/ShaderProgram.ts
  • packages/core/src/shader/ShaderProgramMap.ts
  • packages/core/src/shader/ShaderUniform.ts
  • packages/core/src/shader/enums/ConstantBufferBindingPoint.ts
  • packages/core/src/shaderlib/ShaderFactory.ts
  • packages/rhi-webgl/src/GLBuffer.ts
  • packages/rhi-webgl/src/WebGLGraphicDevice.ts
  • packages/ui/src/component/UIRenderer.ts
  • packages/ui/src/component/advanced/Image.ts
  • packages/ui/src/component/advanced/Text.ts
  • tests/src/shader-lab/PrecompileABTest.test.ts
  • tests/src/shader-lab/PrecompileBenchmark.test.ts
  • tests/src/shader-lab/ShaderValidate.ts
✅ Files skipped from review due to trivial changes (9)
  • examples/package.json
  • packages/core/src/graphic/enums/BufferBindFlag.ts
  • packages/core/src/shader/enums/ConstantBufferBindingPoint.ts
  • packages/ui/src/component/advanced/Image.ts
  • packages/ui/src/component/advanced/Text.ts
  • packages/core/src/mesh/MeshRenderer.ts
  • examples/src/gpu-instancing-auto-batch.ts
  • packages/core/src/graphic/TransformFeedbackShader.ts
  • packages/core/src/RenderPipeline/BatchUtils.ts
🚧 Files skipped from review as they are similar to previous changes (13)
  • tests/src/shader-lab/ShaderValidate.ts
  • packages/rhi-webgl/src/GLBuffer.ts
  • packages/core/src/2d/sprite/SpriteRenderer.ts
  • package.json
  • e2e/case/gpu-instancing-auto-batch.ts
  • packages/core/src/RenderPipeline/SubRenderElement.ts
  • packages/core/src/shader/ShaderBlockProperty.ts
  • e2e/config.ts
  • e2e/case/gpu-instancing-custom-data.ts
  • packages/core/src/2d/text/TextRenderer.ts
  • packages/rhi-webgl/src/WebGLGraphicDevice.ts
  • packages/core/src/RenderPipeline/InstanceBatch.ts
  • packages/core/src/shaderlib/ShaderFactory.ts

Comment on lines +174 to +183
private _compileShaderProgram(engine: Engine, macroCollection: ShaderMacroCollection): ShaderProgram {
const isGPUInstance = macroCollection.isEnable(InstanceBatch.gpuInstanceMacro);
const { vertexSource, fragmentSource, instanceLayout } =
this._platformTarget != undefined
? this._compileShaderLabSource(engine, macroCollection)
: this._compilePlatformSource(engine, macroCollection);
? this._compileShaderLabSource(engine, macroCollection, isGPUInstance)
: this._compilePlatformSource(engine, macroCollection, isGPUInstance);

return new ShaderProgram(engine, vertexSource, fragmentSource);
const program = new ShaderProgram(engine, vertexSource, fragmentSource);
program._instanceLayout = instanceLayout;
return program;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does WebGL 1.0 / GLSL ES 1.00 support uniform blocks or layout(std140) declarations, or are those only available in WebGL2 / GLSL ES 3.00?

💡 Result:

No, WebGL 1.0 with GLSL ES 1.00 does not support uniform blocks or layout(std140) declarations. These features are only available in WebGL 2.0 with GLSL ES 3.00.

Citations:


🏁 Script executed:

# Check the actual content of ShaderPass.ts around the mentioned lines
cd packages/core/src/shader && wc -l ShaderPass.ts

Repository: galacean/engine

Length of output: 76


🏁 Script executed:

# Read the critical section mentioned in the review (lines 174-250)
sed -n '174,250p' packages/core/src/shader/ShaderPass.ts | cat -n

Repository: galacean/engine

Length of output: 3701


🏁 Script executed:

# Search for isWebGL2 property in the codebase to confirm it exists
rg "isWebGL2" --type ts -B 2 -A 2

Repository: galacean/engine

Length of output: 43704


🏁 Script executed:

# Verify that InstanceBatch.gpuInstanceMacro exists
rg "gpuInstanceMacro" --type ts -B 2 -A 2

Repository: galacean/engine

Length of output: 1316


Gate UBO instancing behind WebGL2 before shader injection.

isGPUInstance is derived only from the macro collection at line 175. On WebGL1, if the gpuInstanceMacro is enabled, injectInstanceUBO() will inject std140 uniform block syntax into the shader while #version 100 is emitted, causing shader compilation to fail instead of falling back to non-instanced rendering.

🔧 Proposed fix
   private _compileShaderProgram(engine: Engine, macroCollection: ShaderMacroCollection): ShaderProgram {
-    const isGPUInstance = macroCollection.isEnable(InstanceBatch.gpuInstanceMacro);
+    const isGPUInstance =
+      engine._hardwareRenderer.isWebGL2 && macroCollection.isEnable(InstanceBatch.gpuInstanceMacro);
     const { vertexSource, fragmentSource, instanceLayout } =
       this._platformTarget != undefined
         ? this._compileShaderLabSource(engine, macroCollection, isGPUInstance)
         : this._compilePlatformSource(engine, macroCollection, isGPUInstance);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/shader/ShaderPass.ts` around lines 174 - 183, isGPUInstance
is computed solely from the macro collection which allows UBO-style instancing
to be injected on WebGL1; update the _compileShaderProgram method so
isGPUInstance is true only if the macro is enabled AND the runtime supports
WebGL2 (e.g. check engine.webGLVersion === 2 or the engine property that
indicates WebGL2 support). In practice modify the isGPUInstance assignment near
the top of _compileShaderProgram to combine
macroCollection.isEnable(InstanceBatch.gpuInstanceMacro) with the engine WebGL2
check so injectInstanceUBO (called indirectly by
_compileShaderLabSource/_compilePlatformSource) is never used on WebGL1.

GuoLei1990

This comment was marked as outdated.

Prevent incorrect batching when renderers have different shader macros
(enableVertexColor, custom macros on renderer.shaderData, etc.)
GuoLei1990

This comment was marked as outdated.

Neither uniform is used in any engine shader. Unity/Unreal/Godot also
don't provide equivalents. Removes per-frame Matrix.invert overhead
GuoLei1990

This comment was marked as outdated.

@GuoLei1990
Copy link
Copy Markdown
Member Author

感谢详细的 review,逐条回复:

Review 1

[P1] ShaderPass.ts:175 — isGPUInstance 未加 WebGL2 守卫

不需要。 上游 _canBatch 已保证 WebGL1 不会启用 gpuInstanceMacro,内部调用链信任上游约束,不做重复防御。只在系统边界(用户输入、外部 API)做防御,内部链路不需要。

[P1] MeshRenderer.ts:177-184 — _canBatch 缺少 renderer 级 shader 状态比较

已修复。 加了 shaderData._macroCollection.isEqual() 检查(commit d58aaa3),系统性覆盖所有 renderer 级宏差异(enableVertexColor、UV flags、用户自定义宏),不需要逐属性枚举。新增了 ShaderMacroCollection.isEqual() 方法,bitmask 数组比较,开销极小。

[P1] ShaderFactory.ts:59-68 — renderer_LocalMat / renderer_MVInvMat derived 但无 #define

已删除这两个 uniform(commit 761bec4)。 引擎内置 shader 没有任何代码使用这两个变量,Unity/Unreal/Godot 也都没有对应的内置变量。删除后还省了每帧一次 Matrix.invert 计算。

[P1] RenderQueue.ts:116 — compileMacros 缺少 renderer 级宏

不存在。 component._globalShaderMacroRenderer._prepareRender 中已经 union 了 camera._globalShaderMacro + renderer.shaderData._macroCollection,renderer 级宏已经包含在内。

[P2] RenderQueue.ts:213-218 — mask 状态未比较

不需要。 SpriteMaskInteraction 是 2D sprite 体系的功能,MeshRenderer 的 _maskInteraction 恒为 None,没有用户 API 可修改。

[P2] ShaderFactory.ts — mat3x4 仿射假设

不需要额外注释。 renderer_ModelMat 是 world matrix,永远是仿射变换(平移+旋转+缩放),第四行恒为 (0,0,0,1)。投影矩阵在 camera 组,不会混到 renderer 的 ModelMat。这是引擎架构保证。

[P2] noise_common.glsl 残留常量

不相关。 本 PR 是 GPU instancing,没有改动 noise 相关文件,与 #2960 不存在文件交叉。

[P2] SubRenderElement.ts dispose — 池复用 NPE

不存在。 dispose() 只在 ObjectPool.garbageCollection() 中调用,调完后 elements.length = 0 清空池子,dispose 过的对象不会被复用,后续 get() 都是 new 新实例。

[P2] GLBuffer.ts — ConstantBuffer 无 WebGL2 检查

不需要。 同 P1 第一条,RHI 层是内部代码,信任上游约束。

[P3] ShaderProgramMap.ts — engine 参数可选

已修复(commit 91e2ce2)。 改为 required。

Review 2

[P0] SubRenderElement.ts:49 — dispose 后池复用 NPE

同上,不存在。 garbageCollection() 后对象被丢弃,不会复用。

[P1] NormalMat inverse() 每顶点计算

确实存在 GPU 开销,但不是本 PR 需要解决的。 非 instancing 路径下 NormalMat 是 CPU 预计算上传的,instancing 路径改为 #define 推导是合理的 tradeoff——避免在 UBO 中多占 64 字节(降低 instanceMaxCount)。对于大多数场景(无 non-uniform scale),可以直接用 mat3(renderer_ModelMat)。这个优化可以作为后续 follow-up。

[P1] 移除 opaque distance sort 影响 overdraw

经过详细分析,不需要恢复距离排序。 移动端 TBDR 架构(Apple HSR、Mali FPK、Adreno LRZ)在硬件层面处理 overdraw,距离排序对 fragment 开销没有收益。顶点开销方面,排不排序都一样(被遮挡物体的顶点仍需处理)。合批优先的排序策略是行业趋势——Unity opaque 队列也是 shader/material 优先,Unreal/Godot 移动端同样不做距离排序。

[P2] InstanceBatch 小批次显存浪费

当前设计合理。 UBO buffer 全局复用(只有一个 InstanceBatch),不是每个 batch 分配一个 buffer。Discard 标记让驱动可以优化小写入。按需分配会引入频繁的 buffer resize,反而更差。

[P2] _scanInstanceUniforms 误匹配注释

概率极低,且 PR 描述已规划 ShaderLab 预编译替代。 不阻塞。

[P2] renderer_MVInvMat derived 无 #define

已删除(同上)。

[P2] _canBatch 中 isWebGL2 每次调用

开销可忽略。 isWebGL2 是一个 boolean 字段读取,不是函数调用。在热路径上这是一次内存读取,远小于后续的 primitive/material 比较开销。

[P3] _derivedDefines 格式

不影响功能,保持现状。

简化建议回复

  1. _canBatch 完整等价检查 — 已通过 macroCollection.isEqual() 系统性解决。
  2. instancedRenderers 可选字段 — 一个空数组的内存开销可忽略,改可选反而在热路径上增加 null 检查。
  3. InstanceBatch.upload modelMatId 分支消除 — 可以做但收益很小(每帧只执行 instanceCount 次),留作后续优化。

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

…tin map

Log error when renderer-group array uniforms are found during instancing
UBO injection — arrays are not supported (consistent with Unity/Unreal/Godot).
Also replace Map with Record<string, boolean> for _builtinRendererUniforms.
@GuoLei1990
Copy link
Copy Markdown
Member Author

Review 5 回复

[P1] NormalMat per-vertex inverse()

同 Review 2 回复。 是已知 trade-off,GPU ALU 换 UBO 空间。大多数场景无 non-uniform scale 时 inverse() 退化为转置,编译器可优化。后续可通过 RENDERER_HAS_NON_UNIFORM_SCALE 宏分两条路径优化。

[P2] _derivedDefines 同时注入 VS 和 FS

需要注入。 renderer_ModelMat(UBO 字段)在 VS 和 FS 中都可能被引用。_derivedDefines 中的三个 define 都依赖 renderer_ModelMat + camera uniforms:

  • renderer_MVMat = camera_ViewMat * renderer_ModelMat
  • renderer_MVPMat = camera_VPMat * renderer_ModelMat
  • renderer_NormalMat = transpose(inverse(mat3(renderer_ModelMat)))

camera uniforms(camera_ViewMatcamera_VPMat)在 VS 和 FS 中都有声明(通过 transform_declare.glsl include),所以 FS 中引用 derived uniform 不会编译失败。用户自定义 shader 如果在 FS 中用 renderer_NormalMat(例如做 bump mapping),不注入会反而出错。

[P2] Opaque 排序移除距离因子

同 Review 2 回复。 移动端 TBDR 硬件处理 overdraw,距离排序无收益。合批优先是行业趋势。

[P2] upload() 中 worldMatrix getter 触发 lazy update

这是 Transform 的公开契约,不是隐式依赖。 transform.worldMatrix getter 保证返回最新值(dirty 时自动更新),这是 Transform 组件的核心设计。引擎内所有需要 worldMatrix 的地方都通过这个 getter 获取。如果未来 Transform 更新策略变化,需要同步修改所有消费者,不限于 instancing 路径。注释已说明。

[P2] _scanInstanceUniforms 正则扫描

同 Review 2 回复。 ShaderLab 预编译路径有结构化元数据,将替代正则方案。

[P3] _builtinRendererUniforms Map

已修复。 最新 commit 已改为 Record<string, boolean>

GuoLei1990

This comment was marked as outdated.

@GuoLei1990
Copy link
Copy Markdown
Member Author

Review 6 回复

所有问题在之前的回复中已覆盖,逐条标注:

[P1] NormalMat inverse() — 已回复(Review 2、5)

已知 tradeoff,GPU ALU 换 UBO 空间。后续可优化。

[P1] opaque distance sort — 已回复(Review 2、5)

移动端 TBDR 硬件处理 overdraw,合批优先是行业趋势。

[P2] _derivedDefines 注入 VS 和 FS — 已回复(Review 5)

camera uniforms 在 VS 和 FS 中都有声明(通过 transform_declare.glsl),FS 中引用 derived uniform 不会编译失败。不注入反而会导致用户自定义 shader 在 FS 中使用 renderer_NormalMat 时出错。

[P2] InstanceBatch setData 参数语义 — 误报

GLBuffer.setDataDiscard 模式下先 gl.bufferData(target, byteLength, usage) orphan 整个 buffer,再 gl.bufferSubData 写入 partial 数据。这是标准的 buffer orphaning 技术,避免 GPU stall。partial write 是正确的。

[P3] _derivedDefines 末行缺 \n — 已回复(Review 2)

不影响功能,保持现状。

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

}

/**
* @internal
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不需要 @internal ,因为没有 export
后面 export 的 interface 应该放前面点

@zhuxudong
Copy link
Copy Markdown
Member

Top 1 — SubRenderElement ObjectPool NPE:

Frame 1: new SubRenderElement() → instancedRenderers = [] ← class field
Frame 1: set() → instancedRenderers.length = 0 ← OK
Frame 1: dispose() → instancedRenderers = null ← 置 null
Frame 2: ObjectPool 复用(不走 new,不重新初始化 field)
Frame 2: set() → this.instancedRenderers.length = 0
💥 TypeError: Cannot set property 'length' of null

Fix: dispose()this.instancedRenderers.length = 0 而非 = null

Top 2 — _buildLayout structSize=0 崩溃:

fieldMap 有 entry 但 type 全不在 _std140TypeInfoMap → addField 全部跳过
→ currentOffset = 0, structSize = 0
→ instanceMaxCount = Math.floor(16384 / 0) = Infinity
→ new ArrayBuffer(Infinity * 0) = new ArrayBuffer(NaN)
💥 RangeError: Invalid array buffer length

Fix:

if (instanceFields.length === 0) return null;
const structSize = Math.ceil(currentOffset / 16) * 16;
const instanceMaxCount = Math.floor(maxUBOSize / structSize);
if (instanceMaxCount < 2) {
  Logger.warn("GPU Instancing: struct too large, falling back");
  return null;
}

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

GuoLei1990

This comment was marked as outdated.

Copy link
Copy Markdown
Member Author

@GuoLei1990 GuoLei1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (Round 13) — GPU Instancing Auto-Batching

已关闭问题清单

前 12 轮提出的所有关键问题已修复或经作者解释关闭:

# 问题 状态
1 _canBatch 缺少 renderer-level state ✅ 已修复 (macroCollection.isEqual)
2 renderer_LocalMat/renderer_MVInvMat 删除 ✅ 已修复
3 ShaderProgramMap 重命名 ✅ 已修复
4 _builtinRendererUniforms Map → Record ✅ 已修复
5 SubRenderElement dispose NPE ✅ 作者解释成立
6 NormalMat per-vertex inverse ✅ 已知 tradeoff
7 Opaque 移除距离排序 ✅ TBDR 硬件处理 overdraw
8-20 其他已关闭问题 ✅ 详见 Round 11/12

总结

UBO-based auto instancing 方向正确,mat3x4 仿射优化、per-pass 独立布局、#define 重映射设计清晰。iPhone 16 PM 实测 +67% 数据有说服力。自 Round 12 以来无新 commit,前轮 P0 仍未修复。

问题

P0

  • [P0] ShaderFactory.ts _buildLayout — structSize=0 导致 Infinity/NaN 崩溃

    前轮已提出但仍未修复。当 fieldMap 有 entry 但所有 type 不在 _std140TypeInfoMap 中时(如用户只声明了 uniform mat3 renderer_Custom;),_scanInstanceUniforms 将 uniform 从 shader 中移除并加入 fieldMaphasField=true),但 addField 全部 skip → instanceFields 为空 → currentOffset=0structSize = Math.ceil(0/16)*16 = 0instanceMaxCount = Math.floor(maxUBOSize/0) = InfinityInstanceBatch.setLayoutnew ArrayBuffer(Infinity * 0) = NaNRangeError 崩溃

    建议修复:在 _buildLayoutstructSize 计算后加守卫:

    if (instanceFields.length === 0) {
      return { instanceFields, instanceMaxCount: 0, structSize: 0 };
    }

    并在 injectInstanceUBO 中检查 instanceMaxCount < 2 时返回 instanceLayout: null

P1

  • [P1] ShaderFactory.ts addField — 不支持的 uniform 类型静默跳过,应报错

    前轮已提出但仍未修复。if (!info) return; 静默跳过不支持的类型。用户的 renderer-group uniform 声明已被 _scanInstanceUniforms 从 shader 中移除,但又没进入 UBO struct,也没有 #define 重映射,shader 引用不存在的符号 → 编译错误,用户无法诊断原因。

    Array uniform 已有 Logger.error,此处应同样加上:

    if (!info) {
      Logger.error(`GPU Instancing does not support uniform type "${type}" for "${name}"`);
      return;
    }
  • [P1] camera_VPMat 在 ShaderLab Transform.glsl 中缺失

    packages/core/src/shaderlib/transform_declare.glsl 新增了 uniform mat4 camera_VPMat;,但 packages/shader/src/shaders/Transform.glsl(ShaderLab 路径)没有对应添加。当 ShaderLab shader 启用 instancing 时,注入的 #define renderer_MVPMat (camera_VPMat * renderer_ModelMat) 会引用未声明的 camera_VPMat → shader 编译失败。

    修复:在 Transform.glslcamera_ProjMat 后添加 mat4 camera_VPMat;

简化建议

_buildLayout 返回 null 路径统一:让 _buildLayoutinstanceFields.length === 0 时返回 null,外层 injectInstanceUBO 只需一个 if (!instanceLayout) 判断,消除当前的 !hasFieldinstanceMaxCount 两个独立 fallback 分支

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants