[9.4] (backport #18970) Optimize DLQ segment directory scans with single-pass logic. #19013

Merged
mashhurs merged 1 commit into 9.4 from mergify/bp/9.4/pr-18970
Apr 16, 2026

@mergify mergify Bot commented Apr 16, 2026

Release notes

Performance improvement that saves ~40% of CPU on DLQ segment file lookup operations.

What does this PR do?

~40% improvement in DLQ segment lookup logic.

Before this change, the logic for listing segment files and finding the max segment ID used a plain Java stream (the UsingStream benchmarks) to list all files, then filter by size and sort.
This PR optimizes DLQ segment file lookups to use single-pass directory scans.
It uses a DirectoryStream with an OS-level glob instead of listing all files, and finds the min or max segment in one pass. Some use cases require size > 0 (WithMinSize in the benchmarks), such as updating the oldest segment file; others need no size check, such as removing the oldest segment file (NoMinSize in the benchmarks). Both are handled by the same logic.
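
To make the single-pass idea concrete, here is a minimal sketch. It is not the code from this PR. The class name, method names, and the `requireNonEmpty` flag are hypothetical, and the sketch assumes segment files are named `<id>.log`. It scans the directory once with a glob filter, tracks the lowest (or highest) segment ID seen so far, and checks the file size only for candidates that would actually win.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.OptionalInt;

public class SegmentScanSketch {

    // Assumption: segment files are named "<id>.log" (for example "42.log").
    private static final String SEGMENT_GLOB = "*.log";

    /** Parses "42.log" into 42; returns -1 when the name is not a non-negative integer. */
    static int extractSegmentId(Path file) {
        String name = file.getFileName().toString();
        try {
            return Integer.parseInt(name.substring(0, name.length() - ".log".length()));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** One directory pass: lowest-ID segment, optionally ignoring empty files. */
    static Optional<Path> oldestSegment(Path queueDir, boolean requireNonEmpty) throws IOException {
        Path oldest = null;
        int oldestId = Integer.MAX_VALUE;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(queueDir, SEGMENT_GLOB)) {
            for (Path candidate : stream) {
                int id = extractSegmentId(candidate);
                if (id < 0 || id >= oldestId) {
                    continue; // not a segment, or not older than the current best
                }
                // The size lookup touches file attributes, so do it only for
                // candidates that already beat the current best ID.
                if (requireNonEmpty && Files.size(candidate) == 0) {
                    continue;
                }
                oldest = candidate;
                oldestId = id;
            }
        }
        return Optional.ofNullable(oldest);
    }

    /** One directory pass: highest segment ID, or empty when there are no segments. */
    static OptionalInt maxSegmentId(Path queueDir) throws IOException {
        int max = -1;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(queueDir, SEGMENT_GLOB)) {
            for (Path candidate : stream) {
                max = Math.max(max, extractSegmentId(candidate));
            }
        }
        return max < 0 ? OptionalInt.empty() : OptionalInt.of(max);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dlq-sketch");
        Files.write(dir.resolve("1.log"), new byte[0]);      // empty segment
        Files.write(dir.resolve("2.log"), new byte[] {1});   // non-empty segment
        System.out.println(oldestSegment(dir, false));
        System.out.println(oldestSegment(dir, true));
        System.out.println(maxSegmentId(dir));
    }
}
```

Nothing is collected or sorted, so the work is linear in the number of directory entries. The sketch does not handle a segment being deleted by another thread mid-scan, in which case `Files.size` would throw.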

Why is it important/What is the impact to the user?

Improves the performance of Logstash pipelines that make heavy use of DLQs.

The old logic and the benchmarks can be seen here: https://github.com/elastic/logstash/compare/main...mashhurs:logstash:dlq-benchmark-test?expand=1

| Segments | NoMinSize | WithMinSize (optimized) | UsingStream |
|---|---|---|---|
| 100 | 11.9 ops/ms | 11.2 ops/ms | 10.4 ops/ms |
| 1000 | 1.70 ops/ms | 1.70 ops/ms | 1.22 ops/ms |
| 10000 | 0.175 ops/ms | 0.176 ops/ms | 0.106 ops/ms |
| 20000 | 0.077 ops/ms | 0.084 ops/ms | 0.046 ops/ms |
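
To relate these throughput numbers to the ~40% figure, the relative gain of the optimized scan over the stream-based one can be computed per segment count. The sketch below is illustrative arithmetic only (the class and method names are hypothetical). It uses the `oldestSegmentPathWithMinSize` and `oldestSegmentPathUsingStream` scores from the raw JMH data and ignores the reported error margins, so treat the results as rough.

```java
public class DlqSpeedupSketch {

    /** Throughput gain of the optimized scan over the stream-based scan, in percent. */
    static double gainPercent(double optimizedOpsPerMs, double streamOpsPerMs) {
        return (optimizedOpsPerMs / streamOpsPerMs - 1.0) * 100.0;
    }

    /** Reduction in time per operation, in percent (time per op is 1 / throughput). */
    static double timeSavedPercent(double optimizedOpsPerMs, double streamOpsPerMs) {
        return (1.0 - streamOpsPerMs / optimizedOpsPerMs) * 100.0;
    }

    public static void main(String[] args) {
        int[] segments = {100, 1000, 10000, 20000};
        // Scores copied from the JMH results in this PR description (ops/ms).
        double[] withMinSize = {11.231, 1.700, 0.176, 0.084};
        double[] usingStream = {10.363, 1.221, 0.106, 0.046};

        for (int i = 0; i < segments.length; i++) {
            System.out.printf("%d segments: +%.1f%% throughput, -%.1f%% time per op%n",
                    segments[i],
                    gainPercent(withMinSize[i], usingStream[i]),
                    timeSavedPercent(withMinSize[i], usingStream[i]));
        }
    }
}
```

By this arithmetic the throughput gain grows with directory size, from under 10% at 100 segments to roughly 80% at 20000. At 10000 segments the time per operation drops by about 40%.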

Raw JMH data:

| Benchmark | segmentCount | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|
| maxSegmentId | 100 | thrpt | 10 | 12.610 | ± 0.614 | ops/ms |
| maxSegmentId | 1000 | thrpt | 10 | 1.778 | ± 0.069 | ops/ms |
| maxSegmentId | 10000 | thrpt | 10 | 0.184 | ± 0.012 | ops/ms |
| maxSegmentId | 20000 | thrpt | 10 | 0.081 | ± 0.015 | ops/ms |
| maxSegmentIdUsingStream | 100 | thrpt | 10 | 12.755 | ± 1.596 | ops/ms |
| maxSegmentIdUsingStream | 1000 | thrpt | 10 | 1.917 | ± 0.075 | ops/ms |
| maxSegmentIdUsingStream | 10000 | thrpt | 10 | 0.196 | ± 0.020 | ops/ms |
| maxSegmentIdUsingStream | 20000 | thrpt | 10 | 0.086 | ± 0.023 | ops/ms |
| oldestSegmentPathNoMinSize | 100 | thrpt | 10 | 11.913 | ± 0.826 | ops/ms |
| oldestSegmentPathNoMinSize | 1000 | thrpt | 10 | 1.696 | ± 0.090 | ops/ms |
| oldestSegmentPathNoMinSize | 10000 | thrpt | 10 | 0.175 | ± 0.010 | ops/ms |
| oldestSegmentPathNoMinSize | 20000 | thrpt | 10 | 0.077 | ± 0.013 | ops/ms |
| oldestSegmentPathUsingStream | 100 | thrpt | 10 | 10.363 | ± 0.411 | ops/ms |
| oldestSegmentPathUsingStream | 1000 | thrpt | 10 | 1.221 | ± 0.042 | ops/ms |
| oldestSegmentPathUsingStream | 10000 | thrpt | 10 | 0.106 | ± 0.007 | ops/ms |
| oldestSegmentPathUsingStream | 20000 | thrpt | 10 | 0.046 | ± 0.002 | ops/ms |
| oldestSegmentPathWithMinSize | 100 | thrpt | 10 | 11.231 | ± 0.944 | ops/ms |
| oldestSegmentPathWithMinSize | 1000 | thrpt | 10 | 1.700 | ± 0.072 | ops/ms |
| oldestSegmentPathWithMinSize | 10000 | thrpt | 10 | 0.176 | ± 0.011 | ops/ms |
| oldestSegmentPathWithMinSize | 20000 | thrpt | 10 | 0.084 | ± 0.006 | ops/ms |

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs


This is an automatic backport of pull request #18970 done by [Mergify](https://mergify.com).

* Optimize DLQ segment directory scans with single-pass DirectoryStream lookups

Before this change, the logic for listing segment files and finding the max segment ID used a plain Java stream to list all files, then filter by size and sort.
This PR optimizes DLQ segment file lookups to use single-pass directory scans.
It uses a DirectoryStream with an OS-level glob instead of listing all files, and finds the min or max segment in one pass. Some use cases require size > 0, such as updating the oldest segment file; others need no size check, such as removing the oldest segment file. Both are handled by the same logic.

* Move the file size check after the segment ID extraction.

* Add unit tests

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Refine the code comment.

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* When removing a segment, track DLQ currentQueueSize incrementally instead of rescanning the filesystem

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java

Apply Javadoc suggestion, which provides a clearer signal.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
(cherry picked from commit 894ca21)
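
The commit about tracking `currentQueueSize` incrementally can be illustrated with a small sketch. It is an assumption-laden illustration, not the code from this PR: the class, constructor, and method names are hypothetical, and an `AtomicLong` counter is assumed. The idea is to read the segment's size before deleting it and subtract that amount from a running total, rather than re-summing every file in the queue directory after each removal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

public class QueueSizeTrackerSketch {

    // Running total of bytes held by the queue's segment files (assumed counter type).
    private final AtomicLong currentQueueSize;

    QueueSizeTrackerSketch(long initialSizeBytes) {
        this.currentQueueSize = new AtomicLong(initialSizeBytes);
    }

    /** Deletes the segment and subtracts its size from the total; returns the bytes freed. */
    long removeSegment(Path segment) throws IOException {
        long freed = Files.size(segment); // must be read before the file is deleted
        Files.delete(segment);
        currentQueueSize.addAndGet(-freed);
        return freed;
    }

    long currentQueueSize() {
        return currentQueueSize.get();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dlq-size-sketch");
        Path segment = Files.write(dir.resolve("1.log"), new byte[128]);
        QueueSizeTrackerSketch tracker = new QueueSizeTrackerSketch(128);
        tracker.removeSegment(segment);
        System.out.println(tracker.currentQueueSize());
    }
}
```

The trade-off is that the counter can drift from the real on-disk size if files change outside this code path, so an initial scan or an occasional resync may still be needed.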
@mergify mergify Bot added the backport label Apr 16, 2026
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.


@mashhurs mashhurs left a comment

clean backport

@elasticmachine

💛 Build succeeded, but was flaky

Failed CI Steps

cc @mashhurs

@mashhurs mashhurs merged commit 5c7f5c9 into 9.4 Apr 16, 2026
13 of 14 checks passed
@mashhurs mashhurs deleted the mergify/bp/9.4/pr-18970 branch April 16, 2026 18:42