Add universal noise and optional denoiser noise inputs #9044
Conversation
@dunkeroni I agree that the current node is not very clear. Right now the rule exists in the backend, but the UI does not explain it well.

The transformer is only there so the node can infer the right latent channel count for architectures where that is model-defined. For the others, the noise shape is fixed, so there is nothing to infer.

On the SD3 question: I don't know of any SD3 or SD3.5 variants in our current ecosystem that are not 16-channel. The local SD3 fixtures I checked are all 16-channel. It also seems that CogView4 models are 16 channels, though I don't have that model installed. We could drop it there as well if you can confirm the 16-channel count.

Depending on how those get answered, we could drop the Transformer linkage entirely. Thoughts?
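For the fixed-shape architectures described above, the channel count can be looked up directly instead of inferred from a transformer. A minimal sketch of that idea (the names and values here are assumptions drawn from this discussion, not the PR's actual code):

```python
# Hypothetical mapping of architecture -> standard latent channel count.
LATENT_CHANNELS = {
    "SD": 4,
    "SDXL": 4,
    "SD3": 16,
    "FLUX": 16,
    "CogView4": 16,
    "FLUX.2": 32,
}

def latent_shape(noise_type: str, height: int, width: int) -> tuple[int, int, int, int]:
    """Return the (batch, channels, h, w) noise shape for a fixed-channel architecture."""
    # Assumes an 8x spatial compression factor, common across these VAEs.
    return (1, LATENT_CHANNELS[noise_type], height // 8, width // 8)

print(latent_shape("SD3", 1024, 1024))  # (1, 16, 128, 128)
```

With a table like this, no model input is needed at all for these architectures.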
I think SD3 and CogView4 were made at a time when layerdiffuse and inpainting models were prevalent, so there were theoretically future models that used different channel counts (e.g. SDXL/SD1.5 inpaint models have 9 in channels). But at least in the case of inpaint models, we don't add noise to the excess channels anyway. Noise should match the number of channels of the VAE for the architecture, and we can add special cases if we ever support them. I think we can drop the transformer input and use the standard counts. That being the case, this node would be very similar to the existing noise node. So rather than add a new node, we could add the model type selector to the one we already have. As long as it defaults to SD (4 channel), it shouldn't disrupt any existing workflows.
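The backwards-compatibility point here can be sketched with a hypothetical field default (this is an illustration of the suggestion, not the node's real schema):

```python
from dataclasses import dataclass

# Hypothetical sketch: adding a noise_type selector to the existing noise node,
# defaulting to "SD" (4-channel) so workflows saved before the change behave
# exactly as they did.
@dataclass
class NoiseInvocation:
    width: int = 512
    height: int = 512
    noise_type: str = "SD"  # new field; default preserves old behavior

    def channels(self) -> int:
        # Assumed standard channel counts, per the discussion above.
        return {"SD": 4, "SD3": 16, "FLUX": 16, "FLUX.2": 32}[self.noise_type]

assert NoiseInvocation().channels() == 4  # old workflows: unchanged
assert NoiseInvocation(noise_type="FLUX.2").channels() == 32
```

Because old serialized workflows simply omit `noise_type`, deserialization picks up the default and nothing breaks.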
Fair! I'll switch things up. Stay tuned.
@dunkeroni Give it another shot, please. I think things are aligned with your suggestions.
```python
        return (1, 16, 2 * math.ceil(height / 16), 2 * math.ceil(width / 16))
    if noise_type == "FLUX.2":
        return (1, 32, 2 * math.ceil(height / 16), 2 * math.ceil(width / 16))
    if noise_type == "SD3":
```
Because we have already run `validate_noise_dimensions`, this ceiling math is always identical to `height // LATENT_SCALE_FACTOR`, `width // LATENT_SCALE_FACTOR`; `height` and `width` are already multiples of 16.
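The equivalence the reviewer describes is easy to check numerically, assuming `LATENT_SCALE_FACTOR` is 8 (an assumption based on this comment; verify against the actual constant):

```python
import math

LATENT_SCALE_FACTOR = 8  # assumed value, per the reviewer's comment

# Validation guarantees height/width are multiples of 16, and for any such
# dim, 2 * ceil(dim / 16) == dim / 8 exactly.
for dim in (512, 768, 1024, 1536):
    assert 2 * math.ceil(dim / 16) == dim // LATENT_SCALE_FACTOR
```

For a non-multiple of 16 the two expressions would differ, which is why the simplification is only safe after validation.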
```python
            2 * math.ceil(height / 16),
            2 * math.ceil(width / 16),
            device=device,
            dtype=dtype,
            generator=torch.Generator(device=device).manual_seed(seed),
        ).to("cpu")
    if noise_type == "FLUX.2":
        if use_cpu:
            return get_noise_flux2(num_samples=1, height=height, width=width, device=device, dtype=dtype, seed=seed).to(
                "cpu"
            )
        return torch.randn(
            1,
            32,
            2 * math.ceil(height / 16),
            2 * math.ceil(width / 16),
```
Again, `height` and `width` are already validated as multiples of 16 with these two models.
```python
    def invoke(self, context: InvocationContext) -> NoiseOutput:
        noise = get_noise(
            validate_noise_dimensions(self.noise_type, self.width, self.height)
```
`generate_noise_tensor` begins by running `validate_noise_dimensions`, so we don't need to import or use that again here.
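The pattern the reviewer points at, sketched with hypothetical stand-ins for the real functions (signatures and the multiple-of-16 rule are assumptions from this thread):

```python
def validate_noise_dimensions(noise_type: str, width: int, height: int) -> None:
    # Assumed rule for illustration: dims must be multiples of 16.
    if width % 16 or height % 16:
        raise ValueError(f"{noise_type}: width/height must be multiples of 16")

def generate_noise_tensor(noise_type: str, width: int, height: int) -> tuple:
    validate_noise_dimensions(noise_type, width, height)  # validation lives here
    return (1, 16, height // 8, width // 8)  # placeholder shape, not real noise

# The caller's invoke() can therefore call generate_noise_tensor directly;
# a second validate_noise_dimensions call at the call site is redundant.
print(generate_noise_tensor("FLUX", 1024, 768))  # (1, 16, 96, 128)
```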
```ts
    if (node.type !== 'invocation') {
      continue;
    }
    if (node.data.type !== 'universal_noise') {
```
Node type no longer exists with the latest PR state.
```ts
  it('should migrate universal_noise nodes to noise and drop the removed transformer input', async () => {
    const workflow = getWorkflow();
    const noiseNode = buildNode(noise);
    const noiseTypeInput = noiseNode.data.inputs.noise_type;
    if (!noiseTypeInput) {
      throw new Error('Missing noise_type input');
    }
    noiseNode.data.type = 'universal_noise';
    noiseNode.data.version = '1.0.0';
    noiseTypeInput.value = 'FLUX';
    noiseNode.data.inputs.transformer = {
      name: 'transformer',
      label: '',
      description: '',
      value: { key: 'transformer-key', hash: 'hash', name: 'name', base: 'sd-3', type: 'main' },
    } as never;
    workflow.nodes = [noiseNode];

    const validationResult = await validateWorkflow({
      workflow,
      templates: { noise },
      checkImageAccess: resolveTrue,
      checkBoardAccess: resolveTrue,
      checkModelAccess: resolveTrue,
    });

    expect(validationResult.warnings).toEqual([]);
    const migratedNode = validationResult.workflow.nodes[0];
    expect(isWorkflowInvocationNode(migratedNode)).toBe(true);
    if (!isWorkflowInvocationNode(migratedNode)) {
      throw new Error('Expected invocation node');
    }
    expect(migratedNode.data.type).toBe('noise');
    expect(migratedNode.data.version).toBe('1.1.0');
    expect(get(validationResult.workflow, 'nodes[0].data.inputs.noise_type.value')).toBe('FLUX');
    expect(get(validationResult.workflow, 'nodes[0].data.inputs.transformer')).toBeUndefined();
  });
```
Not a migration that anyone needs to do unless they made workflows from an earlier version of this PR.
```python
        if use_cpu:
            return get_flux_noise(num_samples=1, height=height, width=width, device=device, dtype=dtype, seed=seed).to(
                "cpu"
            )
```
If the device is cpu, we call an imported function's `torch.randn()`, and if it is not cpu we run an identical but local `torch.randn()`? I don't see the difference in this exception case. It seems we could just return `torch.randn(device="cpu")` from here, like the other models do, and not rely on the import. Same for FLUX.2 below.
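The reviewer's point rests on seeded generators being deterministic: two identical seeded calls produce identical noise, so wrapping one of them in an imported helper changes nothing. A NumPy analogue of the situation (`torch.randn` with a seeded `torch.Generator` behaves the same way; the function names here are illustrative stand-ins):

```python
import numpy as np

def noise_via_helper(seed: int, shape: tuple) -> np.ndarray:
    # Stands in for the imported get_flux_noise helper.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

def noise_inline(seed: int, shape: tuple) -> np.ndarray:
    # Stands in for the equivalent local randn call with the same seed.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

a = noise_via_helper(123, (1, 16, 8, 8))
b = noise_inline(123, (1, 16, 8, 8))
assert (a == b).all()  # identical seeded calls -> identical noise
```

Since both branches are equivalent, keeping only the inline call removes the import without changing any outputs.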
@dunkeroni I think all of your comments are addressed. I hope.

Summary
Adds a new `Universal Noise` invocation for architecture-specific latent noise generation and extends supported denoise invocations to accept optional external `noise` inputs while preserving existing behavior when `noise` is not connected. This backwards compatibility means that existing workflows, including those generated by the canvas and linear frontend, continue to work without modification because all new `noise` inputs are optional and default to the existing internal noise-generation path.

Backend changes cover `FLUX`, `FLUX.2`, `SD3`, `CogView4`, `Z-Image`, and `Anima` denoisers, plus focused validation and regression tests. The frontend-facing schema was regenerated so the new invocation and denoise `noise` inputs are available in `invokeai/frontend/web/src/services/api/schema.ts`. Documentation was updated in `invokeai/docs/contributing/NEW_MODEL_INTEGRATION.md` to require extending `Universal Noise` when possible for new architectures that support external noise.

Also added inline denoiser documentation for the existing img2img scheduler-parity limitations, wrote tests for these limitations, and preserved explicit regression coverage by marking the known scheduler-mismatch cases as expected failures rather than dropping those tests.
Related Issues / Discussions
QA Instructions
Try different models using the new `Universal Noise` node as noise input. Verify that it works with different start points and existing input latents, whether using the added noise or leaving it off.

Merge Plan
Checklist
`What's New` copy (if doing a release after this PR)