[9.3] (backport #18970) Optimize DLQ segment directory scans with single-pass logic.#19012
Merged
* Optimize DLQ segment directory scans with single-pass DirectoryStream lookups
  Before this change, listing segment files and finding max segment ID logic was using plain Java stream to list all files, then filter by size and sort. With this PR change, we optimize DLQ segment file lookups to use single-pass directory scans. Use DirectoryStream with OS-level glob instead of listing all files, find the min or max segment. There are use-cases which require size > 0 when updating oldest file segment and no size check when removing oldest segment file which will be handled in a single logic.
* Move file size condition after the extract segment ID.
* Add unit tests
* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestions from code review
  Refine the code comment.
  Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
* When removing the segment, track DLQ currentQueueSize incrementally instead of rescanning filesystem
* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueUtils.java
  Apply Java doc suggestion, provides clearer signal.
  Co-authored-by: Andrea Selva <selva.andre@gmail.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>

(cherry picked from commit 894ca21)
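The incremental size tracking mentioned in the commit list above could look roughly like the following. This is a minimal sketch, not the code from this PR: the class name `QueueSizeTrackerSketch`, the use of an `AtomicLong`, and the `removeSegment` helper are all assumptions made for illustration. The idea is that the size of the deleted segment is subtracted from a running total, so no directory rescan is needed after a removal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names are illustrative and not taken from Logstash.
final class QueueSizeTrackerSketch {

    // Running total of bytes held by all segment files.
    private final AtomicLong currentQueueSize;

    QueueSizeTrackerSketch(long initialSizeBytes) {
        this.currentQueueSize = new AtomicLong(initialSizeBytes);
    }

    // Deletes one segment and subtracts its size from the running total,
    // instead of re-listing the directory and summing every file again.
    long removeSegment(Path segment) throws IOException {
        long size = Files.size(segment); // read the size before the file is gone
        Files.delete(segment);
        currentQueueSize.addAndGet(-size);
        return size;
    }

    long currentQueueSize() {
        return currentQueueSize.get();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dlq-size-sketch");
        Path segment = dir.resolve("1.log");
        Files.write(segment, new byte[5]);

        QueueSizeTrackerSketch tracker = new QueueSizeTrackerSketch(12L);
        System.out.println("released bytes: " + tracker.removeSegment(segment));
        System.out.println("queue size now: " + tracker.currentQueueSize());
    }
}
```

The trade-off in a design like this is that the counter is only as accurate as its updates, so every code path that adds or removes segment bytes has to adjust it.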
💚 Build Succeeded
cc @mashhurs
Release notes
Performance improvement that saves ~40% of CPU resources on DLQ segment file lookup operations.
What does this PR do?
~40% CPU improvement in the DLQ segment file lookup logic.
Before this change, the logic for listing segment files and finding the max segment ID used a plain Java stream (`UsingStream` in the benchmarks) to list all files, then filter by size and sort.
With this PR, DLQ segment file lookups use single-pass directory scans: a DirectoryStream with an OS-level glob replaces listing all files, and the min or max segment is found in the same pass. Some use cases require size > 0 (`WithMinSize` in the benchmarks), such as updating the oldest segment file, while removing the oldest segment file needs no size check (`NoMinSize` in the benchmarks). Both cases are handled by the same logic.
Why is it important/What is the impact to the user?
Improves the performance of Logstash pipelines that make heavy use of DLQs.
Old logic and Benchmarks can be seen here - https://github.com/elastic/logstash/compare/main...mashhurs:logstash:dlq-benchmark-test?expand=1
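For readers who want a picture of the single-pass scan described under "What does this PR do?", here is a minimal sketch. It is not the code from `DeadLetterQueueUtils`: the class name `SegmentScanSketch`, the method names, the `requireNonEmpty` flag, and the assumption that segment files are named `<id>.log` are all hypothetical. It only shows the general shape: iterate a glob-filtered `DirectoryStream` once, extract the segment ID first, apply the optional size check second, and keep only the best candidate seen so far.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;

// Hypothetical sketch; names are illustrative and not taken from Logstash.
final class SegmentScanSketch {

    // Parses "<id>.log" into its numeric ID; returns -1 for any other name.
    static int extractSegmentId(Path file) {
        String name = file.getFileName().toString();
        if (!name.endsWith(".log")) {
            return -1;
        }
        try {
            return Integer.parseInt(name.substring(0, name.length() - ".log".length()));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // One pass over the directory: no intermediate list, no sort.
    // findMax selects the highest ID, otherwise the lowest.
    // requireNonEmpty skips segment files whose size is 0.
    static OptionalInt findSegmentId(Path dir, boolean findMax, boolean requireNonEmpty)
            throws IOException {
        boolean found = false;
        int best = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.log")) {
            for (Path file : stream) {
                int id = extractSegmentId(file);
                if (id < 0) {
                    continue; // not a segment file
                }
                // The size check runs after ID extraction, so files that are
                // not segments never trigger a file size lookup.
                if (requireNonEmpty && Files.size(file) == 0) {
                    continue;
                }
                if (!found || (findMax ? id > best : id < best)) {
                    best = id;
                    found = true;
                }
            }
        }
        return found ? OptionalInt.of(best) : OptionalInt.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dlq-sketch");
        Files.write(dir.resolve("1.log"), new byte[0]);        // empty segment
        Files.write(dir.resolve("2.log"), new byte[] {1, 2});
        Files.write(dir.resolve("10.log"), new byte[] {1});
        Files.write(dir.resolve("notes.txt"), new byte[] {1}); // excluded by the glob

        System.out.println("max id: " + findSegmentId(dir, true, false));
        System.out.println("min non-empty id: " + findSegmentId(dir, false, true));
        System.out.println("min id: " + findSegmentId(dir, false, false));
    }
}
```

In this sketch IDs are compared numerically, so `10.log` ranks above `2.log`, and a single method serves both the size-checked and the unchecked lookups through one flag.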
Raw JMH data:
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files (and/or docker env variables)
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs
This is an automatic backport of pull request #18970 done by [Mergify](https://mergify.com).