Commit 812d413

kvaps and claude committed
fix(linstor): preserve TCP ports during toggle-disk operations
Update fix-duplicate-tcp-ports patch to preserve existing TCP ports when DrbdRscData is recreated during toggle-disk operations. Without this, removeLayerData() frees ports and ensureStackDataExists() may allocate different ones, causing port mismatches between controller and satellites if the satellite misses the update.

Also add dh_strip_nondeterminism override in Dockerfile to fix build failures on some JAR files.

Upstream: LINBIT/linstor-server#476 (comment)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
1 parent 5da23f4 commit 812d413
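
The failure mode described in the commit message can be illustrated with a toy port pool. This is a sketch only: `PortPool`, `allocateAny`, `allocateExact`, and the 7000-7009 range are hypothetical stand-ins rather than LINSTOR's number pool API, and the lowest-free allocation policy is an assumption chosen to keep the example deterministic.

```java
import java.util.TreeSet;

// Toy model of delete-then-recreate port handling; NOT the LINSTOR API.
class PortDriftDemo {
    /** Hypothetical pool that hands out the lowest free port in [min, max]. */
    static final class PortPool {
        private final TreeSet<Integer> used = new TreeSet<>();
        private final int min;
        private final int max;

        PortPool(int min, int max) {
            this.min = min;
            this.max = max;
        }

        /** Allocates the lowest free port (assumed policy). */
        int allocateAny() {
            for (int port = min; port <= max; port++) {
                if (used.add(port)) {
                    return port;
                }
            }
            throw new IllegalStateException("port pool exhausted");
        }

        /** Reserves one specific port; returns false if it is out of range or taken. */
        boolean allocateExact(int port) {
            return port >= min && port <= max && used.add(port);
        }

        void free(int port) {
            used.remove(port);
        }
    }

    /** Recreate with an empty payload: the old port is freed and may be claimed by someone else. */
    static int recreateWithoutPreserving(PortPool pool, int oldPort) {
        pool.free(oldPort);
        pool.allocateAny();        // another resource claims a port in the meantime
        return pool.allocateAny(); // the recreated resource gets whatever is left
    }

    /** Recreate with the old port carried over, as the patched payload does. */
    static int recreatePreserving(PortPool pool, int oldPort) {
        int savedPort = oldPort;   // copied into the payload before deletion
        pool.free(oldPort);
        if (!pool.allocateExact(savedPort)) {
            throw new IllegalStateException("saved port no longer available");
        }
        return savedPort;
    }

    public static void main(String[] args) {
        PortPool pool = new PortPool(7000, 7009);
        int original = pool.allocateAny();
        int drifted = recreateWithoutPreserving(pool, original);
        System.out.println("original=" + original + " drifted=" + drifted);

        PortPool pool2 = new PortPool(7000, 7009);
        int original2 = pool2.allocateAny();
        int kept = recreatePreserving(pool2, original2);
        System.out.println("original=" + original2 + " kept=" + kept);
    }
}
```

Under these assumptions, `main` prints `original=7000 drifted=7001` followed by `original=7000 kept=7000`. A satellite that missed the update in the first case would still be listening on 7000 while its peers try 7001.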

3 files changed: +112 -59 lines changed

packages/system/linstor/images/piraeus-server/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -61,6 +61,8 @@ RUN test -d .gradlehome && echo ".gradlehome found in tarball" || (echo ".gradle
 # Build DEB packages from tarball
 # Override GRADLE_FLAGS to remove --offline flag, allowing Gradle to download missing dependencies
 RUN sed -i 's/GRADLE_FLAGS = --offline/GRADLE_FLAGS =/' debian/rules || true
+# Skip dh_strip_nondeterminism to avoid failures on some JAR files (logback-core)
+RUN printf '\noverride_dh_strip_nondeterminism:\n\ttrue\n' >> debian/rules
 RUN LD_LIBRARY_PATH='' dpkg-buildpackage -rfakeroot -b -uc
 
 # Copy built .deb packages to a location accessible from final image

packages/system/linstor/images/piraeus-server/patches/README.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ Custom patches for piraeus-server (linstor-server) v1.32.3.
   - Upstream: [#475](https://github.com/LINBIT/linstor-server/pull/475)
 - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful
   - Upstream: [#474](https://github.com/LINBIT/linstor-server/pull/474)
-- **fix-duplicate-tcp-ports.diff** — Prevent duplicate TCP ports after toggle-disk operations
-  - Upstream: [#476](https://github.com/LINBIT/linstor-server/pull/476)
+- **fix-duplicate-tcp-ports.diff** — Preserve TCP ports during toggle-disk to prevent port mismatch between controller and satellites
+  - Upstream: [#476](https://github.com/LINBIT/linstor-server/pull/476) (superseded by this expanded fix)
 - **skip-adjust-when-device-inaccessible.diff** — Fix resources stuck in StandAlone after reboot, Unknown state race condition, and encrypted resource deletion
   - Upstream: [#477](https://github.com/LINBIT/linstor-server/pull/477)
Lines changed: 108 additions & 57 deletions
@@ -1,87 +1,138 @@
-From 1250abe99d64a0501795e37d3b6af62410002239 Mon Sep 17 00:00:00 2001
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
 From: Andrei Kvapil <kvapss@gmail.com>
-Date: Mon, 12 Jan 2026 13:44:46 +0100
-Subject: [PATCH] fix(drbd): prevent duplicate TCP ports after toggle-disk
- operations
+Date: Fri, 28 Mar 2026 13:00:00 +0100
+Subject: [PATCH] fix(drbd): preserve TCP ports during toggle-disk operations
 
-Remove redundant ensureStackDataExists() call with empty payload from
-resetStoragePools() method that was causing TCP port conflicts after
-toggle-disk operations.
+Prevent TCP port mismatches after toggle-disk operations by preserving
+existing TCP ports when rebuilding DrbdRscData.
 
 Root Cause:
 -----------
-The resetStoragePools() method, introduced in 2019 (commit 95cc17d0b8),
-calls ensureStackDataExists() with an empty LayerPayload. This worked
-correctly when TCP ports were stored at RscDfn level.
+During toggle-disk operations, removeLayerData() deletes DrbdRscData
+(freeing its TCP ports from the number pool), then ensureStackDataExists()
+creates new DrbdRscData. Since the payload has no explicit tcpPorts,
+the controller allocates new ports from the pool -- which may differ from
+the old ports if other resources claimed them in the meantime.
 
-After the TCP port migration to per-node level (commit f754943463, May
-2025), this empty payload results in DrbdRscData being created without
-TCP ports assigned. The controller then sends a Pojo with an empty port
-Set to satellites.
+The controller correctly avoids collisions in its own number pool, but
+the satellite may miss the update (e.g. during controller restart or
+network issues). When this happens, the satellite keeps the old ports
+while peers receive the new ones, causing DRBD connection failures
+(StandAlone/Connecting state).
 
-On satellites, when DrbdRscData is initialized with an empty port list,
-initPorts() uses preferredNewPortsRef from peer resources. Since
-SatelliteDynamicNumberPool.tryAllocate() always returns true (no-op),
-any port from preferredNewPortsRef is accepted without conflict checking,
-leading to duplicate TCP port assignments.
-
-Impact:
--------
-This regression affects toggle-disk operations, particularly:
-- Snapshot creation/restore operations
-- Manual toggle-disk operations
-- Any operation calling resetStoragePools()
-
-Symptoms include:
-- DRBD resources failing to adjust with "port is also used" errors
-- Resources stuck in StandAlone or Connecting states
-- Multiple resources on the same node using identical TCP ports
+Additionally, remove the redundant ensureStackDataExists() call from
+resetStoragePools() -- the caller already invokes it with the correct
+payload.
 
 Solution:
 ---------
-Remove the ensureStackDataExists() call from resetStoragePools() as it
-is redundant. The calling code (e.g., CtrlRscToggleDiskApiCallHandler
-line 1071) already invokes ensureStackDataExists() with the correct
-payload immediately after resetStoragePools().
-
-This fix ensures:
-1. resetStoragePools() only resets storage pool assignments
-2. Layer data creation with proper TCP ports happens via the caller's
-   ensureStackDataExists() with correct payload
-3. No DrbdRscData objects are created without TCP port assignments
+1. Add copyDrbdTcpPortsIfExists() to save existing TCP ports into the
+   LayerPayload before removeLayerData() deletes them.
+2. Call it from copyDrbdNodeIdIfExists() (covers both toggle-disk paths)
+   and from the needsDeactivate path (shared storage pool case).
+3. Remove the redundant ensureStackDataExists() from resetStoragePools().
 
-Related Issues:
----------------
-Fixes #454 - Duplicate TCP ports after backup/restore operations
-Related to user reports of resources stuck in StandAlone after node
-reboots when toggle-disk or backup operations were in progress.
-
-Testing:
---------
-Verified that:
-- Toggle-disk operations no longer create resources without TCP ports
-- Backup/restore operations complete without TCP port conflicts
-- Resources maintain unique TCP ports across toggle-disk cycles
+This ensures the same TCP ports are reused when DrbdRscData is recreated,
+eliminating the window for port mismatch between controller and satellites.
 
 Co-Authored-By: Claude <noreply@anthropic.com>
 Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
 ---
- .../linbit/linstor/layer/resource/CtrlRscLayerDataFactory.java | 2 --
- 1 file changed, 2 deletions(-)
+ .../controller/CtrlRscToggleDiskApiCallHandler.java | 40 +++++++++++++++++++--
+ .../linstor/layer/resource/CtrlRscLayerDataFactory.java | 2 --
+ 2 files changed, 38 insertions(+), 4 deletions(-)
+
+diff --git a/controller/src/main/java/com/linbit/linstor/core/apicallhandler/controller/CtrlRscToggleDiskApiCallHandler.java b/controller/src/main/java/com/linbit/linstor/core/apicallhandler/controller/CtrlRscToggleDiskApiCallHandler.java
+index ccdb0cee5..b0554c2ec 100644
+--- a/controller/src/main/java/com/linbit/linstor/core/apicallhandler/controller/CtrlRscToggleDiskApiCallHandler.java
++++ b/controller/src/main/java/com/linbit/linstor/core/apicallhandler/controller/CtrlRscToggleDiskApiCallHandler.java
+@@ -58,6 +58,7 @@ import com.linbit.linstor.stateflags.StateFlags;
+ import com.linbit.linstor.storage.StorageException;
+ import com.linbit.linstor.storage.data.adapter.drbd.DrbdRscData;
+ import com.linbit.linstor.storage.interfaces.categories.resource.AbsRscLayerObject;
++import com.linbit.linstor.core.types.TcpPortNumber;
+ import com.linbit.linstor.storage.interfaces.categories.resource.VlmProviderObject;
+ import com.linbit.linstor.storage.kinds.DeviceLayerKind;
+ import com.linbit.linstor.storage.kinds.DeviceProviderKind;
+@@ -88,6 +89,7 @@ import java.util.LinkedHashMap;
+ import java.util.List;
+ import java.util.Map.Entry;
+ import java.util.Set;
++import java.util.TreeSet;
  
+ import org.reactivestreams.Publisher;
+ import reactor.core.publisher.Flux;
+@@ -587,8 +589,9 @@ public class CtrlRscToggleDiskApiCallHandler implements CtrlSatelliteConnectionL
+ 
+                 /*
+                  * We also have to remove the currently diskless DrbdRscData and free up the node-id as now we must
+-                 * use the shared resource's node-id
++                 * use the shared resource's node-id. We still need to preserve TCP ports though.
+                  */
++                copyDrbdTcpPortsIfExists(rsc, payload);
+             }
+             else
+             {
+@@ -726,7 +729,7 @@ public class CtrlRscToggleDiskApiCallHandler implements CtrlSatelliteConnectionL
+     /**
+      * Although we need to rebuild the layerData as the layerList might have changed, if we do not
+      * deactivate (i.e. down) the current resource, we need to make sure that deleting DrbdRscData
+-     * and recreating a new DrbdRscData ends up with the same node-id as before.
++     * and recreating a new DrbdRscData ends up with the same node-id and TCP ports as before.
+      */
+     private void copyDrbdNodeIdIfExists(Resource rsc, LayerPayload payload) throws ImplementationError
+     {
+@@ -743,6 +746,37 @@ public class CtrlRscToggleDiskApiCallHandler implements CtrlSatelliteConnectionL
+             DrbdRscData<Resource> drbdRscData = (DrbdRscData<Resource>) drbdRscDataSet.iterator().next();
+             payload.drbdRsc.nodeId = drbdRscData.getNodeId().value;
+         }
++        copyDrbdTcpPortsIfExists(rsc, payload);
++    }
++
++    /**
++     * Preserves existing TCP ports during toggle-disk operations.
++     *
++     * When removeLayerData() deletes DrbdRscData, the TCP ports are freed from the number pool.
++     * If ensureStackDataExists() then allocates different ports, and the satellite misses the update
++     * (e.g. due to controller restart or connectivity issues), the satellite keeps the old ports
++     * while peers get the new ones, causing DRBD connections to fail with StandAlone state.
++     */
++    private void copyDrbdTcpPortsIfExists(Resource rsc, LayerPayload payload) throws ImplementationError
++    {
++        Set<AbsRscLayerObject<Resource>> drbdRscDataSet = LayerRscUtils.getRscDataByLayer(
++            getLayerData(apiCtx, rsc),
++            DeviceLayerKind.DRBD
++        );
++        if (!drbdRscDataSet.isEmpty())
++        {
++            DrbdRscData<Resource> drbdRscData = (DrbdRscData<Resource>) drbdRscDataSet.iterator().next();
++            Collection<TcpPortNumber> tcpPorts = drbdRscData.getTcpPortList();
++            if (tcpPorts != null && !tcpPorts.isEmpty())
++            {
++                Set<Integer> portInts = new TreeSet<>();
++                for (TcpPortNumber port : tcpPorts)
++                {
++                    portInts.add(port.value);
++                }
++                payload.drbdRsc.tcpPorts = portInts;
++            }
++        }
+     }
+ 
+     private List<DeviceLayerKind> removeLayerData(Resource rscRef)
 diff --git a/controller/src/main/java/com/linbit/linstor/layer/resource/CtrlRscLayerDataFactory.java b/controller/src/main/java/com/linbit/linstor/layer/resource/CtrlRscLayerDataFactory.java
 index 3538b380c..4f589145e 100644
 --- a/controller/src/main/java/com/linbit/linstor/layer/resource/CtrlRscLayerDataFactory.java
 +++ b/controller/src/main/java/com/linbit/linstor/layer/resource/CtrlRscLayerDataFactory.java
 @@ -276,8 +276,6 @@ public class CtrlRscLayerDataFactory
-
+
                  rscDataToProcess.addAll(rscData.getChildren());
              }
 -
 -            ensureStackDataExists(rscRef, null, new LayerPayload());
          }
          catch (AccessDeniedException exc)
          {
---
+--
 2.39.5 (Apple Git-154)
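
The core of the added `copyDrbdTcpPortsIfExists()` helper is a copy of port values into the payload, skipped when there is nothing to copy. A standalone approximation might look like the following. The types here are hypothetical stand-ins (`PortNumber` for `TcpPortNumber`, `DrbdPayload` for the DRBD part of `LayerPayload`), and the sketch leaves out the layer-data lookup and access context that the real method uses.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Stand-in types for illustration only; the real LINSTOR classes are richer.
class CopyPortsSketch {
    /** Hypothetical analogue of TcpPortNumber: wraps an int port value. */
    static final class PortNumber {
        final int value;

        PortNumber(int value) {
            this.value = value;
        }
    }

    /** Hypothetical analogue of the DRBD section of LayerPayload. */
    static final class DrbdPayload {
        Integer nodeId;        // null: let the controller choose a node-id
        Set<Integer> tcpPorts; // null: let the controller allocate ports from its pool
    }

    /**
     * Copies existing port values into the payload so that a later recreate
     * step can request the same ports. Leaves the payload untouched when
     * there are no existing ports, so normal allocation still applies.
     */
    static void copyTcpPortsIfExists(Collection<PortNumber> existingPorts, DrbdPayload payload) {
        if (existingPorts == null || existingPorts.isEmpty()) {
            return;
        }
        Set<Integer> portInts = new TreeSet<>(); // sorted, duplicate-free
        for (PortNumber port : existingPorts) {
            portInts.add(port.value);
        }
        payload.tcpPorts = portInts;
    }

    public static void main(String[] args) {
        DrbdPayload payload = new DrbdPayload();
        copyTcpPortsIfExists(Arrays.asList(new PortNumber(7005), new PortNumber(7002)), payload);
        System.out.println(payload.tcpPorts); // [7002, 7005]

        DrbdPayload untouched = new DrbdPayload();
        copyTcpPortsIfExists(null, untouched);
        System.out.println(untouched.tcpPorts); // null
    }
}
```

As in the patch, the copy has to happen before the old layer data is removed; afterwards there is nothing left to read the ports from.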
