[Messaging] Force idle timeout for first message of exchange#72756

Closed
andy31415 wants to merge 2 commits into
project-chip:masterfrom
andy31415:mrp-retrans-first-msg-fix

@andy31415 andy31415 commented Jun 26, 2026

Contributor

Summary

Rework of #72230. That change made GetMRPBaseTimeout() apply to all messages, including the first message of an exchange. For a newly created session, the peer is considered active (the fresh session's last-activity time is still within SAT), so GetMRPBaseTimeout() returns SAI.

According to Matter Core Specification §4.12.2.1:

For the first message of a new exchange, the base interval, i, SHALL be set according to the idle state of the peer node as stored in the Session Context of the session...

Using SAI for the first message on a fresh session is a spec violation.

Change

This PR reworks the logic in ReliableMessageMgr::CalculateNextRetransTime to satisfy both requirements:

  1. First Message of Exchange: Forces the use of the peer's idle retransmission timeout (SII/mIdleRetransTimeout), complying with the spec.
  2. Subsequent Messages: Continues to use GetMRPBaseTimeout() to dynamically switch between SAI and SII based on peer activity and SAT, preserving the fix from [Messaging] Honor peer SAT in MRP retransmit backoff for ICDs #72230.
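The two rules above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `PeerMrpConfig`, `ActivityBasedTimeout`, and `BaseTimeoutForMessage` are hypothetical names, `std::chrono` stands in for the SDK's clock types, and `ActivityBasedTimeout` only approximates what GetMRPBaseTimeout() is described as doing.

```cpp
#include <chrono>

using Ms = std::chrono::milliseconds;

// Hypothetical stand-in for the peer's MRP parameters stored with a session.
struct PeerMrpConfig
{
    Ms idleRetransTimeout;   // SII
    Ms activeRetransTimeout; // SAI
    Ms activeThreshold;      // SAT
};

// Approximates the described behavior of GetMRPBaseTimeout():
// SAI while the peer was heard from within SAT, SII otherwise.
inline Ms ActivityBasedTimeout(const PeerMrpConfig & cfg, Ms sinceLastPeerActivity)
{
    return (sinceLastPeerActivity < cfg.activeThreshold) ? cfg.activeRetransTimeout
                                                         : cfg.idleRetransTimeout;
}

// Rule 1: the first message of an exchange always uses SII.
// Rule 2: later messages switch between SAI and SII based on peer activity.
inline Ms BaseTimeoutForMessage(const PeerMrpConfig & cfg, bool isFirstMessageOfExchange,
                                Ms sinceLastPeerActivity)
{
    if (isFirstMessageOfExchange)
    {
        return cfg.idleRetransTimeout;
    }
    return ActivityBasedTimeout(cfg, sinceLastPeerActivity);
}
```

With illustrative values SII = 500 ms, SAI = 300 ms, SAT = 4000 ms, a first message sent 10 ms after session activity gets 500 ms here, whereas the activity-based choice alone would give 300 ms. That difference is the fresh-session case described in the Summary.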

Testing

Verified that all unit tests in TestReliableMessageProtocol pass, including:

  • CheckPeerRetxUsesIdleBackoffWhenNoMessagesReceived (which verifies that the first message uses idle backoff).

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request refactors the CalculateNextRetransTime method in ReliableMessageMgr.cpp to determine the base retransmission timeout using an immediately invoked lambda expression (IIFE). However, the lambda expression as written will fail to compile because its return statements yield mismatched types (System::Clock::Milliseconds32 and System::Clock::Timeout). It is recommended to replace the IIFE with a standard if-else block, which resolves the compilation error and simplifies the code by leveraging implicit type conversion.
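For readers unfamiliar with the language rule the review refers to: a lambda without an explicit return type must deduce the same type from every `return` statement. The sketch below illustrates that rule with two hypothetical stand-in types, `Ms32` and `Ms64`, which are deliberately distinct. Whether System::Clock::Milliseconds32 and System::Clock::Timeout are actually distinct types in the SDK is not established by this thread; if one is an alias of the other, the mismatch would not arise.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins, chosen to be two different types for illustration.
using Ms32 = std::chrono::duration<std::uint32_t, std::milli>;
using Ms64 = std::chrono::duration<std::uint64_t, std::milli>;

// Would NOT compile: return type deduction sees Ms32 in one branch and Ms64
// in the other.
//   auto base = [&] { if (first) { return idle; } return dynamicTimeout; }();

// Option A: a plain if/else, relying on the implicit Ms32 -> Ms64 conversion.
inline Ms64 PickWithIfElse(bool first, Ms32 idle, Ms64 dynamicTimeout)
{
    Ms64 base{};
    if (first)
    {
        base = idle; // lossless widening, so the conversion is implicit
    }
    else
    {
        base = dynamicTimeout;
    }
    return base;
}

// Option B: keep the immediately invoked lambda but state the return type.
inline Ms64 PickWithLambda(bool first, Ms32 idle, Ms64 dynamicTimeout)
{
    return [&]() -> Ms64 {
        if (first)
        {
            return idle;
        }
        return dynamicTimeout;
    }();
}
```

Both options produce the same value; the if/else form is the one the review recommends.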

Comment thread src/messaging/ReliableMessageMgr.cpp
andy31415 and others added 2 commits June 26, 2026 13:40
According to Matter Core Specification §4.12.2.1, the first message of a
new exchange SHALL use the idle retransmission timeout (SII) of the peer,
as we cannot assume the peer is active yet.

Subsequent messages SHOULD use the active state of the peer (dynamic based
on PeerActiveMode, i.e. using GetMRPBaseTimeout()), unless the sender has
other means to determine if the device is active.

We implement these "other means" via GetMRPBaseTimeout(), which tracks
peer activity against the Session Active Threshold (SAT).

This fixes a spec violation introduced in PR project-chip#72230, where the first message
could use active timeout if the session was considered active (e.g. fresh session).
@andy31415 andy31415 force-pushed the mrp-retrans-first-msg-fix branch from 592905b to 0e8d0a5 Compare June 26, 2026 13:41
@github-actions

github-actions Bot commented Jun 26, 2026

PR #72756: Size comparison from 51d1d76 to 0e8d0a5

Full report (33 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, esp32, nrfconnect, psoc6, qpg, realtek, stm32, telink)
| platform | target | config | section | 51d1d76 | 0e8d0a5 | change | % change |
|---|---|---|---|---|---|---|---|
| bl602 | lighting-app | bl602+mfd+littlefs+rpc | FLASH | 1099176 | 1099198 | 22 | 0.0 |
| | | | RAM | 133418 | 133418 | 0 | 0.0 |
| bl702 | lighting-app | bl702+eth | FLASH | 1085726 | 1085748 | 22 | 0.0 |
| | | | RAM | 109029 | 109029 | 0 | 0.0 |
| bl702l | contact-sensor-app | bl702l+mfd+littlefs | FLASH | 882218 | 882240 | 22 | 0.0 |
| | | | RAM | 108596 | 108596 | 0 | 0.0 |
| cc13x4_26x4 | lighting-app | LP_EM_CC1354P10_6 | FLASH | 777368 | 777384 | 16 | 0.0 |
| | | | RAM | 103404 | 103404 | 0 | 0.0 |
| | lock-ftd | LP_EM_CC1354P10_6 | FLASH | 790120 | 790136 | 16 | 0.0 |
| | | | RAM | 108684 | 108684 | 0 | 0.0 |
| | pump-app | LP_EM_CC1354P10_6 | FLASH | 739376 | 739400 | 24 | 0.0 |
| | | | RAM | 97612 | 97612 | 0 | 0.0 |
| | pump-controller-app | LP_EM_CC1354P10_6 | FLASH | 719548 | 719564 | 16 | 0.0 |
| | | | RAM | 97644 | 97644 | 0 | 0.0 |
| cc32xx | air-purifier | CC3235SF_LAUNCHXL | FLASH | 569654 | 569670 | 16 | 0.0 |
| | | | RAM | 205112 | 205112 | 0 | 0.0 |
| | lock | CC3235SF_LAUNCHXL | FLASH | 597214 | 597230 | 16 | 0.0 |
| | | | RAM | 205272 | 205272 | 0 | 0.0 |
| efr32 | lighting-app | BRD4187C | FLASH | 1094924 | 1094988 | 64 | 0.0 |
| | | | RAM | 135256 | 135256 | 0 | 0.0 |
| | lock-app | BRD4187C | FLASH | 995184 | 995184 | 0 | 0.0 |
| | | | RAM | 131292 | 131292 | 0 | 0.0 |
| | | BRD4338a | FLASH | 799809 | 799857 | 48 | 0.0 |
| | | | RAM | 243432 | 243432 | 0 | 0.0 |
| esp32 | all-clusters-app | c3devkit | DRAM | 99556 | 99556 | 0 | 0.0 |
| | | | FLASH | 1626146 | 1626170 | 24 | 0.0 |
| | | | IRAM | 94776 | 94776 | 0 | 0.0 |
| nrfconnect | all-clusters-app | nrf52840dk_nrf52840 | FLASH | 844772 | 844792 | 20 | 0.0 |
| | | | RAM | 157771 | 157771 | 0 | 0.0 |
| psoc6 | all-clusters | cy8ckit_062s2_43012 | FLASH | 1750756 | 1750788 | 32 | 0.0 |
| | | | RAM | 215492 | 215492 | 0 | 0.0 |
| | all-clusters-minimal | cy8ckit_062s2_43012 | FLASH | 1626548 | 1626596 | 48 | 0.0 |
| | | | RAM | 211604 | 211604 | 0 | 0.0 |
| | light | cy8ckit_062s2_43012 | FLASH | 1470860 | 1470892 | 32 | 0.0 |
| | | | RAM | 197436 | 197436 | 0 | 0.0 |
| | lock | cy8ckit_062s2_43012 | FLASH | 1504308 | 1504356 | 48 | 0.0 |
| | | | RAM | 225268 | 225268 | 0 | 0.0 |
| qpg | lighting-app | qpg6200+debug | FLASH | 843156 | 843188 | 32 | 0.0 |
| | | | RAM | 127908 | 127908 | 0 | 0.0 |
| | lock-app | qpg6200+debug | FLASH | 782976 | 783008 | 32 | 0.0 |
| | | | RAM | 118840 | 118840 | 0 | 0.0 |
| realtek | light-switch-app | rtl8777g | FLASH | 689368 | 689392 | 24 | 0.0 |
| | | | RAM | 101780 | 101780 | 0 | 0.0 |
| | lighting-app | rtl8777g | FLASH | 730304 | 730320 | 16 | 0.0 |
| | | | RAM | 102052 | 102052 | 0 | 0.0 |
| stm32 | light | STM32WB5MM-DK | FLASH | 478976 | 478992 | 16 | 0.0 |
| | | | RAM | 141492 | 141492 | 0 | 0.0 |
| telink | all-devices-app | tl7218x | FLASH | 881716 | 881738 | 22 | 0.0 |
| | | | RAM | 99716 | 99716 | 0 | 0.0 |
| | | tlsr9118bdk40d | FLASH | 673322 | 673344 | 22 | 0.0 |
| | | | RAM | 120848 | 120848 | 0 | 0.0 |
| | bridge-app | tl7218x | FLASH | 734156 | 734178 | 22 | 0.0 |
| | | | RAM | 97700 | 97700 | 0 | 0.0 |
| | light-app-ota-compress-lzma-factory-data | tl3218x | FLASH | 800682 | 800704 | 22 | 0.0 |
| | | | RAM | 42380 | 42380 | 0 | 0.0 |
| | light-app-ota-compress-lzma-shell-factory-data | tl7218x | FLASH | 845822 | 845844 | 22 | 0.0 |
| | | | RAM | 101492 | 101492 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-factory-data | tl7218x_retention | FLASH | 734714 | 734736 | 22 | 0.0 |
| | | | RAM | 57824 | 57824 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-shell-factory-data | tlsr9528a | FLASH | 795802 | 795824 | 22 | 0.0 |
| | | | RAM | 75176 | 75176 | 0 | 0.0 |
| | light-switch-app-ota-factory-data | tl3218x_retention | FLASH | 734630 | 734652 | 22 | 0.0 |
| | | | RAM | 34480 | 34480 | 0 | 0.0 |
| | lighting-app-ota-factory-data | tlsr9118bdk40d | FLASH | 615214 | 615236 | 22 | 0.0 |
| | | | RAM | 118508 | 118508 | 0 | 0.0 |
| | lighting-app-ota-rpc-factory-data-4mb | tlsr9518adk80d | FLASH | 842038 | 842064 | 26 | 0.0 |
| | | | RAM | 97376 | 97376 | 0 | 0.0 |

@codecov

codecov Bot commented Jun 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.60%. Comparing base (fc4a9e8) to head (0e8d0a5).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #72756      +/-   ##
==========================================
- Coverage   56.79%   56.60%   -0.19%     
==========================================
  Files        1642     1642              
  Lines      112757   113141     +384     
  Branches    13139    13245     +106     
==========================================
+ Hits        64041    64049       +8     
- Misses      48716    49092     +376     


Comment on lines +556 to +557
// We use the idle retransmission timeout (SII) from the session.
return sessionHandle->GetRemoteMRPConfig().mIdleRetransTimeout;

Contributor

Why? The "idle state of the peer" is "is the peer idle or active?" and you're supposed to ask the session that.

Member

Are you referring to:

bool IsPeerActive() const
    {
        return ((System::SystemClock().GetMonotonicTimestamp() - GetLastPeerActivityTime()) <
                GetRemoteMRPConfig().mActiveThresholdTime);
    }

This is used to determine sessionHandle->GetMRPBaseTimeout() like below. It could be used here, yes.
The problem with relying only on this check, though, is that GetRemoteMRPConfig().mActiveThresholdTime could be set to 0, in which case IsPeerActive() is always false.
But in reality the ICD device should not go idle while an Exchange Context is open.
(Yes, we did discuss that the spec isn't enforcing this, but this is a real issue and we should look into clearing that up too.)
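The zero-threshold concern can be checked with a small standalone sketch. It is not the SDK's code: `IsPeerActiveSketch` is a hypothetical free function that mirrors the comparison quoted above, with `std::chrono` in place of the SDK clock and all timestamps passed in explicitly.

```cpp
#include <chrono>

using Ms = std::chrono::milliseconds;

// Same comparison as the quoted IsPeerActive(): active only while the time
// since the last peer activity is strictly below the active threshold (SAT).
constexpr bool IsPeerActiveSketch(Ms now, Ms lastPeerActivity, Ms activeThreshold)
{
    return (now - lastPeerActivity) < activeThreshold;
}

// With a threshold of 0, even activity at this very instant does not count
// as active, because an elapsed time of 0 is not strictly less than 0.
static_assert(!IsPeerActiveSketch(Ms(1000), Ms(1000), Ms(0)), "SAT of 0 means never active");

// With a non-zero threshold, recent activity counts as active.
static_assert(IsPeerActiveSketch(Ms(1000), Ms(900), Ms(4000)), "100 ms ago is within 4000 ms");
```

Since elapsed time on a monotonic clock is never negative, a threshold of 0 makes the check false for every input, which matches the concern raised in the comment.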

@andy31415

Contributor Author

Closing for now ... this behavior needs settling in ICD TT ...

@andy31415 andy31415 closed this Jun 26, 2026