Skip to content

Add early return logging to timequeries#28485

Closed
wdberkeley wants to merge 1 commit into
redpanda-data:devfrom
wdberkeley:timequery-robot-logging
Closed

Add early return logging to timequeries#28485
wdberkeley wants to merge 1 commit into
redpanda-data:devfrom
wdberkeley:timequery-robot-logging

Conversation

@wdberkeley
Copy link
Copy Markdown
Contributor

There's a test flake where timequeries against cloud storage sometimes return nothing incorrectly. This extra logging will help pinpoint how this is happening.

Ideally, we can remove or pare down the logging once the issue is resolved.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

  • none

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds debug logging to the timequery function to help diagnose a test flake where timequeries against cloud storage occasionally return incorrect empty results. The logging captures key state information at function entry and at early return points where nullopt is returned.

Key Changes:

  • Added entry-point logging to capture timequery configuration and partition state
  • Added logging at two early return points where the function returns nullopt due to offset range violations

Comment thread src/v/cluster/partition.cc Outdated
Comment thread src/v/cluster/partition.cc Outdated
vlog(
clusterlog.debug,
"[{}] timequery: early return (nullopt) at line 658 - "
"[{}] timequery: early return (nullopt) - "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm mind squashing these? not sure if we have a policy for this, but it seems undesirable to leave them in (e.g. they'll be in git history which is kind of distracting when code spelunking)

Copy link
Copy Markdown
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we can remove or pare down the logging once the issue is resolved.

If you're thinking we don't want these long term, you can try running ci-repeat commands on this PR to repeat the specific test that runs into the issue. Then we can get to the bottom of it without merging these lines to dev

@vbotbuildovich
Copy link
Copy Markdown
Collaborator

vbotbuildovich commented Nov 12, 2025

CI test results

test results on build#76078
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/76078#019a7580-14ab-42a6-9ccb-0537a027d9b4 FLAKY 15/21 upstream reliability is '91.20234604105572'. current run reliability is '71.42857142857143'. drift is 19.77377 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
test results on build#76167
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
TieredStorageTest test_tiered_storage {"cloud_storage_type_and_url_style": [2, "virtual_host"], "test_case": {"name": "(TS_Read == True, TS_Timequery == True)"}} integration https://buildkite.com/redpanda/redpanda/builds/76167#019a7a27-7807-48f4-8ab6-df98152109c4 FLAKY 29/30 upstream reliability is '99.36440677966102'. current run reliability is '95.23809523809523'. drift is 4.12631 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TieredStorageTest&test_method=test_tiered_storage
TieredStorageTest test_tiered_storage {"cloud_storage_type_and_url_style": [2, "virtual_host"], "test_case": {"name": "(TS_Read == True, TS_Timequery == True)"}} integration https://buildkite.com/redpanda/redpanda/builds/76167#019a7a27-7808-4455-8888-a02abb6f0f34 FLAKY 29/30 upstream reliability is '99.36440677966102'. current run reliability is '95.23809523809523'. drift is 4.12631 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TieredStorageTest&test_method=test_tiered_storage
TieredStorageTest test_tiered_storage {"cloud_storage_type_and_url_style": [1, "path"], "test_case": {"name": "(TS_Read == True, TS_Timequery == True)"}} integration https://buildkite.com/redpanda/redpanda/builds/76167#019a7a27-7808-4455-8888-a02abb6f0f34 FLAKY 29/30 upstream reliability is '99.57627118644068'. current run reliability is '95.23809523809523'. drift is 4.33818 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TieredStorageTest&test_method=test_tiered_storage
test results on build#76235
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
TieredStorageTest test_tiered_storage {"cloud_storage_type_and_url_style": [1, "virtual_host"], "test_case": {"name": "(TS_Read == True, TS_Timequery == True, SpilloverManifestUploaded == True)"}} integration https://buildkite.com/redpanda/redpanda/builds/76235#019a7e88-d229-4435-ac62-870ac5566e5d FLAKY 28/30 upstream reliability is '100.0'. current run reliability is '90.47619047619048'. drift is 9.52381 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TieredStorageTest&test_method=test_tiered_storage
TieredStorageTest test_tiered_storage {"cloud_storage_type_and_url_style": [1, "virtual_host"], "test_case": {"name": "(TS_Read == True, TS_Timequery == True, SpilloverManifestUploaded == True)"}} integration https://buildkite.com/redpanda/redpanda/builds/76235#019a7e88-d229-4fe7-818c-b40ad4a14564 FLAKY 29/30 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TieredStorageTest&test_method=test_tiered_storage

@wdberkeley
Copy link
Copy Markdown
Contributor Author

Force push to squash. Forgot to rebase and it pulled in a bunch of old commits. If you're here because you got a random review request please ignore, sorry.

@wdberkeley
Copy link
Copy Markdown
Contributor Author

/ci-repeat 10
debug
skip-units
dt-repeat=10
tests/rptest/tests/tiered_storage.py::TieredStorageTest.test_tiered_storage

@wdberkeley
Copy link
Copy Markdown
Contributor Author

/ci-repeat 5
debug
skip-units
dt-repeat=10
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage

1 similar comment
@wdberkeley
Copy link
Copy Markdown
Contributor Author

/ci-repeat 5
debug
skip-units
dt-repeat=10
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage

@wdberkeley
Copy link
Copy Markdown
Contributor Author

/ci-repeat 10
debug
skip-units
dt-repeat=20
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage

@wdberkeley
Copy link
Copy Markdown
Contributor Author

Never got this logging to pay off but I was able to figure out the source of the bug. See #28642

@wdberkeley wdberkeley closed this Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants