Optimize DLQ segment directory scans with single-pass logic. by mashhurs · Pull Request #18970 · elastic/logstash

mashhurs · 2026-04-08T23:57:22Z

Release notes

Performance improvement that saves ~40% of CPU resources on DLQ segment file lookup operations.

What does this PR do?

~40% improvement in the DLQ segment lookup logic.

Before this change, the logic for listing segment files and finding the max segment ID used a plain Java stream (the UsingStream benchmarks) to list all files, then filter by size and sort.
This PR optimizes DLQ segment file lookups to use single-pass directory scans.
It uses DirectoryStream with an OS-level glob instead of listing all files, and finds the min or max segment in one pass. Some use cases require size > 0 (WithMinSize in the benchmarks), such as updating the oldest segment file, while others need no size check (NoMinSize in the benchmarks), such as removing the oldest segment file. Both are handled by the same logic.
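To make the difference concrete, here is a minimal sketch of the two approaches for finding the oldest segment. It is not the Logstash source: the class and method names, the `requireNonEmpty` flag, and the assumption that segment files are named `<id>.log` are all illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Illustrative sketch only: names and the "<id>.log" file layout are assumptions.
final class OldestSegmentSketch {

    // "42.log" -> 42; throws NumberFormatException for non-numeric names.
    static int segmentId(Path p) {
        String name = p.getFileName().toString();
        return Integer.parseInt(name.substring(0, name.length() - ".log".length()));
    }

    static boolean nonEmpty(Path p) {
        try {
            return Files.size(p) > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stream-style approach: list every entry, filter, sort all survivors, take the first.
    static Optional<Path> oldestUsingStream(Path dir, boolean requireNonEmpty) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".log"))
                    .filter(p -> !requireNonEmpty || nonEmpty(p))
                    .sorted(Comparator.comparingInt(OldestSegmentSketch::segmentId))
                    .findFirst();
        }
    }

    // Single pass: glob-filtered DirectoryStream, remembering only the smallest ID seen so far.
    static Optional<Path> oldestSinglePass(Path dir, boolean requireNonEmpty) throws IOException {
        Path oldest = null;
        int oldestId = 0;
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(dir, "*.log")) {
            for (Path p : segments) {
                int id;
                try {
                    id = segmentId(p);
                } catch (NumberFormatException e) {
                    continue; // matches the glob but is not a numbered segment
                }
                // Compare IDs first so the size lookup only runs for better candidates.
                if ((oldest == null || id < oldestId) && (!requireNonEmpty || nonEmpty(p))) {
                    oldest = p;
                    oldestId = id;
                }
            }
        }
        return Optional.ofNullable(oldest);
    }
}
```

Both methods should return the same path for a directory of well-formed segment names; the single-pass version simply never builds or sorts the full list.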

Why is it important/What is the impact to the user?

Improves the performance of Logstash pipelines that make heavy use of DLQs.

The old logic and the benchmarks can be seen here: https://github.com/elastic/logstash/compare/main...mashhurs:logstash:dlq-benchmark-test?expand=1

Segments	NoMinSize	WithMinSize (optimized)	UsingStream
100	11.9 ops/ms	11.2 ops/ms	10.4 ops/ms
1000	1.70 ops/ms	1.70 ops/ms	1.22 ops/ms
10000	0.175 ops/ms	0.176 ops/ms	0.106 ops/ms
20000	0.077 ops/ms	0.084 ops/ms	0.046 ops/ms
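One way to relate this table to the ~40% figure, assuming the figure refers to time per lookup rather than throughput, is to convert each pair of throughputs into a relative time saving: since time per operation is the inverse of throughput, the saving is `1 - baseline / optimized`. The sketch below applies that to the UsingStream and WithMinSize columns; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Worked arithmetic on the summary table; reading "~40%" as time saved is an assumption.
final class SpeedupSketch {

    // Fraction of per-operation time saved when throughput rises from baseline to optimized.
    static double timeSaved(double baselineOpsPerMs, double optimizedOpsPerMs) {
        return 1.0 - baselineOpsPerMs / optimizedOpsPerMs;
    }

    public static void main(String[] args) {
        int[] segments = {100, 1000, 10000, 20000};
        double[] usingStream = {10.4, 1.22, 0.106, 0.046}; // UsingStream column
        double[] withMinSize = {11.2, 1.70, 0.176, 0.084}; // WithMinSize column
        for (int i = 0; i < segments.length; i++) {
            double saved = 100.0 * timeSaved(usingStream[i], withMinSize[i]);
            System.out.println(String.format(Locale.ROOT,
                    "%d segments: %.1f%% less time per lookup", segments[i], saved));
        }
    }
}
```

Under that reading, the saving works out to roughly 7%, 28%, 40%, and 45% for 100, 1000, 10000, and 20000 segments respectively, so the gain grows with the number of segments.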

Raw JMH data:

Benchmark	segmentCount	Mode	Cnt	Score	Error	Units
maxSegmentId	100	thrpt	10	12.610	± 0.614	ops/ms
maxSegmentId	1000	thrpt	10	1.778	± 0.069	ops/ms
maxSegmentId	10000	thrpt	10	0.184	± 0.012	ops/ms
maxSegmentId	20000	thrpt	10	0.081	± 0.015	ops/ms
maxSegmentIdUsingStream	100	thrpt	10	12.755	± 1.596	ops/ms
maxSegmentIdUsingStream	1000	thrpt	10	1.917	± 0.075	ops/ms
maxSegmentIdUsingStream	10000	thrpt	10	0.196	± 0.020	ops/ms
maxSegmentIdUsingStream	20000	thrpt	10	0.086	± 0.023	ops/ms
oldestSegmentPathNoMinSize	100	thrpt	10	11.913	± 0.826	ops/ms
oldestSegmentPathNoMinSize	1000	thrpt	10	1.696	± 0.090	ops/ms
oldestSegmentPathNoMinSize	10000	thrpt	10	0.175	± 0.010	ops/ms
oldestSegmentPathNoMinSize	20000	thrpt	10	0.077	± 0.013	ops/ms
oldestSegmentPathUsingStream	100	thrpt	10	10.363	± 0.411	ops/ms
oldestSegmentPathUsingStream	1000	thrpt	10	1.221	± 0.042	ops/ms
oldestSegmentPathUsingStream	10000	thrpt	10	0.106	± 0.007	ops/ms
oldestSegmentPathUsingStream	20000	thrpt	10	0.046	± 0.002	ops/ms
oldestSegmentPathWithMinSize	100	thrpt	10	11.231	± 0.944	ops/ms
oldestSegmentPathWithMinSize	1000	thrpt	10	1.700	± 0.072	ops/ms
oldestSegmentPathWithMinSize	10000	thrpt	10	0.176	± 0.011	ops/ms
oldestSegmentPathWithMinSize	20000	thrpt	10	0.084	± 0.006	ops/ms

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files (and/or docker env variables)~~
I have added tests that prove my fix is effective or that my feature works

Author's Checklist

[ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

github-actions · 2026-04-08T23:57:31Z

🤖 GitHub comments

Just comment with:

run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

Copilot

Pull request overview

This PR optimizes Dead Letter Queue (DLQ) segment file lookups by avoiding full directory materialization + sorting when only the min/max segment is needed, using a single-pass DirectoryStream scan with OS-level glob filtering.

Changes:

Replace multi-step segment index discovery with DeadLetterQueueUtils.maxSegmentId(...).
Replace sorted segment-path lookups with DeadLetterQueueUtils.oldestSegmentPath(...) for selecting the oldest segment (with optional size filtering).
Remove now-unused sorted-list helper and adjust callers to use the updated utilities.
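As a rough illustration of the first change listed above, a single-pass max lookup could look like the sketch below. It is a hypothetical stand-in, not the actual `DeadLetterQueueUtils.maxSegmentId(...)`: the return type, the `<id>.log` naming, and the `nextSegmentIndex` helper are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;

// Hypothetical stand-in for a single-pass max lookup; not the actual Logstash code.
final class MaxSegmentIdSketch {

    // Scans the directory once and keeps only the largest segment ID seen.
    static OptionalInt maxSegmentId(Path queueDir) throws IOException {
        boolean found = false;
        int max = 0;
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(queueDir, "*.log")) {
            for (Path p : segments) {
                String name = p.getFileName().toString();
                try {
                    int id = Integer.parseInt(name.substring(0, name.length() - ".log".length()));
                    if (!found || id > max) {
                        max = id;
                        found = true;
                    }
                } catch (NumberFormatException e) {
                    // matches the glob but is not a numbered segment; ignore it
                }
            }
        }
        return found ? OptionalInt.of(max) : OptionalInt.empty();
    }

    // Assumed usage: a writer resumes numbering after the highest existing segment.
    static int nextSegmentIndex(Path queueDir) throws IOException {
        return maxSegmentId(queueDir).orElse(0) + 1;
    }
}
```

Returning `OptionalInt` keeps the "no segments yet" case explicit; a caller that prefers a default can use `orElse`, as `nextSegmentIndex` does.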

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java`	Switches writer initialization and oldest-segment selection to new single-pass utility methods.
`logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java`	Adds single-pass `maxSegmentId`/`oldestSegmentPath` implementations using `DirectoryStream` globbing.

… lookups Replace listSegmentPathsSortedBySegmentId (which materialized all paths, sorted O(N log N), then took the first element) with purpose-built maxSegmentId and oldestSegmentPath utilities that use Files.newDirectoryStream with OS-level glob filtering and a single O(N) pass. Also narrow listFiles to compare only the filename component instead of the full path, and consolidate duplicate segment ID parsing in DeadLetterQueueWriter to reuse extractSegmentId.

…ueueUtils.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mashhurs · 2026-04-10T21:09:37Z

        }
    }
-
-    @Test


Moved to the dedicated DeadLetterQueueUtilsTest.java.

andsel · 2026-04-13T09:14:58Z

Hi @mashhurs, which benchmark is the baseline that measures the existing implementation?

andsel

I think the idea is good: it avoids sorting just to find the min and max. I don't know whether the big differentiator is the use of DirectoryStream rather than listing the files. However, I'm in favor of using it.

Left a question in a separate comment to understand which benchmark is the baseline in your performance analysis.

I've suggested using a filter interface and asked for clarification on a Javadoc comment.

Refine the code comment. Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

mashhurs · 2026-04-13T22:54:50Z

Hi @mashhurs, which benchmark is the baseline that measures the existing implementation?

The benchmarks cover the logic "before this PR" and "with this PR". I have placed them in a separate branch of my remote repo (also linked in this PR description):
https://github.com/elastic/logstash/compare/main...mashhurs:logstash:dlq-benchmark-test?expand=1

…nstead of rescanning filesystem

andsel

LGTM

Left a suggestion on the Javadoc and thanks for checking my suggestion about DirectoryStream's filtering.

andsel · 2026-04-14T08:24:26Z

The benchmarks cover the logic "before this PR" and "with this PR".

I mean that, to compare performance, we need a clear definition of the baseline before the changes. The table in the "Why is it important/What is the impact to the user?" section has 3 columns:

NoMinSize
WithMinSize (optimized)
UsingStream

There is no clear indication of the original baseline. I suppose it's "UsingStream", but since the description also mentions DirectoryStream, it's not clear which stream it refers to.

…ueueUtils.java Apply Java doc suggestion, provides clearer signal. Co-authored-by: Andrea Selva <selva.andre@gmail.com>

mashhurs · 2026-04-14T19:10:19Z

The benchmarks cover the logic "before this PR" and "with this PR".

I mean that, to compare performance, we need a clear definition of the baseline before the changes. The table in the "Why is it important/What is the impact to the user?" section has 3 columns:

NoMinSize
WithMinSize (optimized)
UsingStream

There is no clear indication of the original baseline. I suppose it's "UsingStream", but since the description also mentions DirectoryStream, it's not clear which stream it refers to.

Ah, I thought I had added it to the PR description 🤦. Just added it, sorry for that.

elasticmachine · 2026-04-14T19:36:33Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: f0e739f

Failed CI Steps

History

💚 Build #4675 succeeded 01997fd
💚 Build #4674 succeeded 0513bbe
💚 Build #4658 succeeded d858db6
💚 Build #4657 succeeded d1960dc
💔 Build #4655 failed 6a70c43
💛 Build #4636 was flaky 0675c3a

cc @mashhurs

* Optimize DLQ segment directory scans with single-pass DirectoryStream lookups

  Before this change, listing segment files and finding max segment ID logic was using plain Java stream to list all files, then filter by size and sort. With this PR change, we optimize DLQ segment file lookups to use single-pass directory scans. Use DirectoryStream with OS-level glob instead of listing all files, find the min or max segment. There are use-cases which require size > 0 when updating oldest file segment and no size check when removing oldest segment file which will be handled in a single logic.

* Move file size condition after the extract segment ID.

* Add unit tests

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestions from code review

  Refine the code comment.

  Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* When removing the segment, track DLQ currentQueueSize incrementally instead of rescanning filesystem

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java

  Apply Java doc suggestion, provides clearer signal.

  Co-authored-by: Andrea Selva <selva.andre@gmail.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
(cherry picked from commit 894ca21)
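The commit message mentions tracking the DLQ `currentQueueSize` incrementally when a segment is removed, rather than rescanning the filesystem. A minimal sketch of that idea, assuming a counter that is seeded by one scan at startup and then adjusted on each removal, might look like this. The class, constructor, and method names are hypothetical and not taken from `DeadLetterQueueWriter`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of incremental size accounting; names are assumptions.
final class QueueSizeTrackerSketch {
    private final AtomicLong currentQueueSize = new AtomicLong();

    // One full scan at startup establishes the initial size.
    QueueSizeTrackerSketch(Path queueDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(queueDir, "*.log")) {
            for (Path p : segments) {
                total += Files.size(p);
            }
        }
        currentQueueSize.set(total);
    }

    // Read the size before deleting, then subtract it instead of rescanning the directory.
    long removeSegment(Path segment) throws IOException {
        long size = Files.size(segment);
        Files.delete(segment);
        return currentQueueSize.addAndGet(-size);
    }

    long currentQueueSize() {
        return currentQueueSize.get();
    }
}
```

The trade-off is that the counter is only as accurate as the bookkeeping around it, so every code path that adds or removes segment bytes has to update it.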

mashhurs · 2026-04-16T18:07:07Z

@Mergifyio backport 9.4

mergify · 2026-04-16T18:10:36Z

backport 9.4

✅ Backports have been created

Details

#19013 [9.4] (backport #18970) Optimize DLQ segment directory scans with single-pass logic. has been created for branch 9.4

mashhurs self-assigned this Apr 8, 2026

mashhurs added enhancement backport-8.19 Automated backport to the 8.19 branch backport-9.3 Automated backport to the 9.3 branch backport-9.4 labels Apr 8, 2026

robbavey requested a review from Copilot April 9, 2026 21:26

Copilot started reviewing on behalf of robbavey April 9, 2026 21:27

Copilot AI reviewed Apr 9, 2026

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java Outdated

mashhurs added 3 commits April 10, 2026 13:38

Move file size condition after the extract segment ID.

f6c75e8

Add unit tests

d1960dc

mashhurs force-pushed the dlq-file-operations-improvements branch from 661bf42 to d1960dc April 10, 2026 20:38

Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQ…

d858db6

…ueueUtils.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mashhurs marked this pull request as ready for review April 10, 2026 21:08

mashhurs commented Apr 10, 2026

mashhurs requested a review from andsel April 10, 2026 21:11

andsel requested changes Apr 13, 2026

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java Outdated

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java

mashhurs commented Apr 13, 2026

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java Outdated

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java Outdated

Apply suggestions from code review

0513bbe

Refine the code comment. Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

When removing the segment, track DLQ currentQueueSize incrementally i…

01997fd

…nstead of rescanning filesystem

andsel approved these changes Apr 14, 2026

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java Outdated

Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQ…

f0e739f

…ueueUtils.java Apply Java doc suggestion, provides clearer signal. Co-authored-by: Andrea Selva <selva.andre@gmail.com>

mashhurs merged commit 894ca21 into elastic:main Apr 16, 2026
11 checks passed

mashhurs deleted the dlq-file-operations-improvements branch April 16, 2026 17:41

mergify Bot mentioned this pull request Apr 16, 2026

[8.19] (backport #18970) Optimize DLQ segment directory scans with single-pass logic. #19011

Merged

3 tasks

mergify Bot mentioned this pull request Apr 16, 2026

[9.3] (backport #18970) Optimize DLQ segment directory scans with single-pass logic. #19012

Merged

3 tasks

mergify Bot mentioned this pull request Apr 16, 2026

[9.4] (backport #18970) Optimize DLQ segment directory scans with single-pass logic. #19013

Merged

3 tasks
