DAOS-18889 object: client retry conditional ops for DER_TX_RESTART case - b28 #18273

Open
Nasf-Fan wants to merge 1 commit into release/2.8 from Nasf-Fan/DAOS-18889_b28

@Nasf-Fan
Contributor

@Nasf-Fan Nasf-Fan commented May 18, 2026

On a DTX non-leader, the order in which two conditional modifications against the same object shard are handled is not controlled. If the one with the newer epoch is handled before the older one, the related ilog logic may regard them as a potential conflict and return -DER_TX_RESTART to the caller when handling the older one. In that case, directly restarting the related DTX (on the DTX leader) with a newer epoch may still generate a conflict, because the HLC epsilon boundary covers a relatively large range of epochs. So let's ask the client to retry the operation after a random delay, which greatly reduces the possibility of a subsequent epoch conflict.
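
For illustration only, the retry-with-random-delay idea described above could be sketched roughly as follows. This is not the code in this patch: `EX_TX_RESTART`, `cond_op_fn`, `retry_cond_op`, `flaky_op`, and the retry and delay bounds are all hypothetical names and values chosen for the example, and the placeholder error code is not the real value of -DER_TX_RESTART.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for -DER_TX_RESTART (NOT the real DAOS value). */
#define EX_TX_RESTART (-1)

/* Hypothetical callback: 0 on success, EX_TX_RESTART on epoch conflict. */
typedef int (*cond_op_fn)(void *arg);

/* Sleep for a pseudo-random delay in [1, max_delay_us] microseconds. */
static void random_delay(unsigned int *seed, uint32_t max_delay_us)
{
	struct timespec ts;
	uint32_t        us;

	assert(max_delay_us > 0);
	us         = 1 + (uint32_t)rand_r(seed) % max_delay_us;
	ts.tv_sec  = us / 1000000;
	ts.tv_nsec = (long)(us % 1000000) * 1000;
	nanosleep(&ts, NULL);
}

/*
 * Issue the operation and, while it reports EX_TX_RESTART, wait a random
 * delay and issue it again, up to max_retries extra attempts. The delay
 * spreads out the retries so that competing operations are less likely
 * to collide again. Returns the result of the last attempt.
 */
static int retry_cond_op(cond_op_fn op, void *arg, int max_retries,
			 uint32_t max_delay_us, unsigned int *seed)
{
	int rc    = op(arg);
	int tries = 0;

	while (rc == EX_TX_RESTART && tries < max_retries) {
		random_delay(seed, max_delay_us);
		rc = op(arg);
		tries++;
	}
	return rc;
}

/* Simulated operation: reports a conflict a fixed number of times. */
struct flaky_state {
	int restarts_left; /* conflicts still to report */
	int calls;         /* attempts seen so far */
};

static int flaky_op(void *arg)
{
	struct flaky_state *st = arg;

	st->calls++;
	if (st->restarts_left > 0) {
		st->restarts_left--;
		return EX_TX_RESTART;
	}
	return 0;
}
```

With `flaky_op` set to report two conflicts, `retry_cond_op(flaky_op, &st, 5, 100, &seed)` succeeds on the third attempt. If the conflicts outlast `max_retries`, the caller still sees `EX_TX_RESTART` and has to decide what to do next.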

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

…se - b28

Signed-off-by: Fan Yong <fan.yong@hpe.com>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18889_b28 branch from ae1a63d to 94e555a Compare May 18, 2026 15:58
@github-actions

Ticket title is 'Random timeouts on IOR file creation causing slowdowns'
Status is 'In Progress'
Labels: 'triaged'
https://daosio.atlassian.net/browse/DAOS-18889

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18273/2/execution/node/1325/log

@Nasf-Fan
Contributor Author

> Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18273/2/execution/node/1325/log

osa_online_drain failed because of DAOS-18218; to be retested.

@Nasf-Fan Nasf-Fan marked this pull request as ready for review May 20, 2026 00:53
@Nasf-Fan Nasf-Fan requested review from a team as code owners May 20, 2026 00:53