[Feature]: Add LoRA fine-tuning support for the Qwen3.5 (Gated DeltaNet) model family #2319

### 🎯 Problem Statement

Qwen3.5 inference already works well in qvac-fabric-llm.cpp (the Gated-DeltaNet op is in place as of the 8828 line), but LoRA fine-tuning is not yet supported for Qwen3.5: the fine-tuning path currently covers Qwen3, Gemma3, and BitNet only.

We'd like to train lightweight LoRA adapters on the small dense Qwen3.5 variants (0.8B / 2B / 4B) for language and domain adaptation. The blocker is Qwen3.5's hybrid architecture: roughly 75% of its layers are linear-attention "Gated DeltaNet" layers, mixed with full-attention layers, and the custom backward-pass implementations those linear-attention layers need don't exist yet.

Could you add Qwen3.5 to the supported fine-tuning architectures? Ideally this would also cover applying the resulting adapter at inference time. The standalone LoRA→GGUF adapter export is currently broken upstream for this architecture (ggml-org/llama.cpp#21125), which today forces a merge-and-reconvert workaround and rules out runtime adapter hot-swapping. Support for the small variants would make Qwen3.5 viable for on-device, fine-tuned use cases.

### 💡 Proposed Solution

Two possible tracks, depending on appetite:

Track A: full on-device LoRA training (the proper fix). Implement the backward pass for the Gated-DeltaNet / linear-attention op in the fine-tuning engine. Even though LoRA only targets the standard projection modules (q/k/v/o_proj, gate/up/down_proj), the linear-attention layers make up ~75% of the stack and are interleaved with the full-attention layers, so gradients have to flow through the recurrent op to reach almost every adapter. A forward-only op isn't enough; the op's gradient is required. The chunked gated delta rule has known analytical gradients, and the flash-linear-attention (FLA) library is a reasonable reference implementation of the forward and backward passes to port.
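
To make the "op's gradient is required" point concrete, here is a minimal NumPy sketch of a hand-written backward pass for a single head. It assumes the commonly cited recurrent form of the gated delta rule, `S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T` with `o_t = S_t q_t`, and processes one token at a time rather than in chunks. It is not the FLA or qvac-fabric-llm.cpp implementation: the function names are hypothetical, and everything around the recurrence (for example how `alpha` and `beta` are produced from the layer inputs) is omitted. The analytic gradients are checked against central finite differences.

```python
import numpy as np

def gated_delta_forward(q, k, v, alpha, beta):
    """Naive per-token gated delta rule (assumed form, single head)."""
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    states = [S]                      # states[t] is the state before token t
    out = np.zeros((T, d_v))
    for t in range(T):
        u = S @ k[t]                  # what the current state predicts for k_t
        S = alpha[t] * (S - beta[t] * np.outer(u, k[t])) + beta[t] * np.outer(v[t], k[t])
        states.append(S)
        out[t] = S @ q[t]
    return out, states

def gated_delta_backward(q, k, v, alpha, beta, states, d_out):
    """Reverse-time sweep; dS carries the gradient w.r.t. the recurrent state."""
    T = k.shape[0]
    dq, dk, dv = np.zeros_like(q), np.zeros_like(k), np.zeros_like(v)
    dalpha, dbeta = np.zeros_like(alpha), np.zeros_like(beta)
    dS = np.zeros_like(states[0])
    for t in reversed(range(T)):
        S_prev, S_t = states[t], states[t + 1]
        dS = dS + np.outer(d_out[t], q[t])          # from o_t = S_t q_t
        dq[t] = S_t.T @ d_out[t]
        u = S_prev @ k[t]
        A = S_prev - beta[t] * np.outer(u, k[t])    # S_t = alpha*A + beta*v k^T
        dalpha[t] = np.sum(dS * A)
        dA = alpha[t] * dS
        dbeta[t] = v[t] @ dS @ k[t] - u @ dA @ k[t]
        dv[t] = beta[t] * (dS @ k[t])
        du = -beta[t] * (dA @ k[t])
        dk[t] = beta[t] * (dS.T @ v[t]) - beta[t] * (dA.T @ u) + S_prev.T @ du
        dS = dA + np.outer(du, k[t])                # gradient w.r.t. S_{t-1}
    return dq, dk, dv, dalpha, dbeta

def finite_diff(f, x, eps=1e-6):
    """Central finite differences of scalar f() with respect to array x (in place)."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        hi = f()
        x[idx] = old - eps
        lo = f()
        x[idx] = old
        g[idx] = (hi - lo) / (2 * eps)
    return g

rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3
q = rng.normal(size=(T, d_k))
k = rng.normal(size=(T, d_k))
k /= np.linalg.norm(k, axis=1, keepdims=True)
v = rng.normal(size=(T, d_v))
alpha = rng.uniform(0.5, 1.0, size=T)   # decay gate in (0, 1]
beta = rng.uniform(0.1, 0.9, size=T)    # write strength in (0, 1)
w = rng.normal(size=(T, d_v))           # random upstream gradient

def loss():
    return float(np.sum(gated_delta_forward(q, k, v, alpha, beta)[0] * w))

out, states = gated_delta_forward(q, k, v, alpha, beta)
dq, dk, dv, dalpha, dbeta = gated_delta_backward(q, k, v, alpha, beta, states, w)
for name, x, g in [("q", q, dq), ("k", k, dk), ("v", v, dv),
                   ("alpha", alpha, dalpha), ("beta", beta, dbeta)]:
    print(name, "max abs error:", np.max(np.abs(g - finite_diff(loss, x))))
```

The sketch stores every intermediate state, which is the simplest way to get a correct gradient but costs memory linear in sequence length. A production kernel would more likely recompute or checkpoint states per chunk, which is the part a chunked reference implementation would inform.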

Track B: an interim that unblocks the workflow without new training kernels. Two smaller pieces:

1. Officially support importing a merged, externally fine-tuned Qwen3.5 model: train the LoRA off-device with PEFT/Unsloth, `merge_and_unload()` into a 16-bit base, then convert with `convert_hf_to_gguf.py`. This already produces a working Qwen3.5 GGUF; it just needs to be a documented, supported path.
2. Fix standalone LoRA→GGUF export so adapters can be loaded at runtime, enabling hot-swap instead of shipping a full merged model per adapter. The failure is in `_reorder_v_heads` / `LoraTorchTensor.reshape` (upstream ggml-org/llama.cpp#21125); a previously proposed approach was to permute the LoRA B/A factors rather than reshape them (see the closed ggml-org/llama.cpp#21354).
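
The "permute the factors" idea rests on a simple identity: reordering the output rows of the low-rank update `B @ A` is the same as reordering the rows of `B`, and reordering its input columns is the same as reordering the columns of `A`. The NumPy sketch below illustrates only that identity. The permutations are random stand-ins, not the actual head reordering performed by `_reorder_v_heads`, and the shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 8, 6, 2

# LoRA factors: the weight update is delta_W = B @ A (scaling omitted).
B = rng.normal(size=(out_dim, rank))
A = rng.normal(size=(rank, in_dim))
delta_W = B @ A

# Hypothetical reorderings standing in for a converter's head permutation.
row_perm = rng.permutation(out_dim)
col_perm = rng.permutation(in_dim)

# Reordering output rows of the full update == reordering rows of B only.
rows_match = np.allclose(B[row_perm] @ A, delta_W[row_perm])

# Reordering input columns of the full update == reordering columns of A only.
cols_match = np.allclose(B @ A[:, col_perm], delta_W[:, col_perm])

print("row permutation via B:", rows_match)
print("column permutation via A:", cols_match)
```

Because the permutation touches only one factor, the adapter stays low-rank and never has to be materialized as a full matrix, which is what a reshape of the factored tensor cannot guarantee.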

Track B (1) is essentially free and would help immediately; Track B (2) restores runtime adapters; Track A is the full on-device capability.

### 📋 Use Cases

LoRA fine-tuning for local dialects

### 📊 Expected Impact

High - Improves common workflows

### 🔄 Alternatives Considered

_No response_

### ⚠️ Constraints & Considerations

_No response_

### 🤝 Contribution

- [ ] I would be willing to submit a PR for this feature

### 📎 Additional Context

_No response_
