[CORE-14250] Ducktape: Reduce cost of datalake/schema_evolution_test.py #28466

Merged
oleiman merged 4 commits into redpanda-data:dev from oleiman:ci/core-14250/limit-schema-evo-dt on Nov 13, 2025

Conversation

@oleiman

@oleiman oleiman commented Nov 11, 2025

This PR refactors datalake/schema_evolution_test.py to cut down on the work wasted setting up and tearing down datalake services for every test case. The general idea is to remove test-case and schema-language parameterization from the test functions and instead iterate through them manually within a single test run.
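A minimal sketch of that flattening, assuming hypothetical `setup`/`teardown` hooks for the datalake services and a `run_case` callback per (schema language, test case) pair; none of these names come from the actual test file. The expensive environment is created once and every combination runs against it:

```python
from typing import Callable, Iterable, Mapping, TypeVar

Env = TypeVar("Env")
Case = TypeVar("Case")


def run_flattened(
    setup: Callable[[], Env],
    teardown: Callable[[Env], None],
    schema_languages: Iterable[str],
    cases: Mapping[str, Case],
    run_case: Callable[[Env, str, str, Case], None],
) -> list[str]:
    """Run every (schema language, test case) pair against one shared environment.

    Hypothetical helper: it collects failure descriptions instead of stopping
    at the first failing case, so one run reports every broken combination.
    """
    env = setup()  # pay the expensive service setup once, not once per case
    failures: list[str] = []
    try:
        for lang in schema_languages:
            for name, case in cases.items():
                try:
                    run_case(env, lang, name, case)
                except AssertionError as e:
                    failures.append(f"{lang}/{name}: {e}")
    finally:
        teardown(env)  # always tear down, even if a case raised something unexpected
    return failures
```

A ducktape test built this way would typically end with `assert not failures, failures`. Collecting failures rather than failing fast is a choice of this sketch: it keeps some of the per-case diagnostics that separate parameterized test runs used to provide.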

Also changed how we combine query engines and catalogs. Previously we parameterized with the Cartesian product of query engines and catalog types, resulting in 6 combinations (at the time of writing). My claim is that we only really care about running with each query engine (QE) and each catalog at least once, so we can instead generate N combinations, where N = max(n_qe, n_catalog). The effect is less pronounced for docker builds, which run exclusively against S3 (MinIO) due to Azurite issues.
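One way to build those N = max(n_qe, n_catalog) combinations is to cycle the shorter list alongside the longer one. This is a sketch, not the PR's code; the engine and catalog names below are placeholders. The placeholder sizes (two engines, three catalogs) are simply one split consistent with going from 6 Cartesian combinations to the halved set reported below:

```python
from itertools import cycle, islice
from typing import Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def covering_pairs(xs: Sequence[A], ys: Sequence[B]) -> list[tuple[A, B]]:
    """Return max(len(xs), len(ys)) pairs covering every element of xs and ys.

    The shorter sequence is repeated cyclically; an empty input yields no pairs.
    """
    if not xs or not ys:
        return []
    n = max(len(xs), len(ys))
    return list(zip(islice(cycle(xs), n), islice(cycle(ys), n)))


# Placeholder names, for illustration only.
query_engines = ["engine_a", "engine_b"]
catalogs = ["catalog_x", "catalog_y", "catalog_z"]

print(covering_pairs(query_engines, catalogs))
# [('engine_a', 'catalog_x'), ('engine_b', 'catalog_y'), ('engine_a', 'catalog_z')]
```

Compared with `itertools.product(query_engines, catalogs)`, which yields all 6 pairs, this keeps the "each QE and catalog at least once" guarantee while dropping the combinations that add no new component.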

Other adjustments we could make:

  • Don't worry about testing all of avro, proto2, proto3

The reports below show a reduction in run time for schema_evolution_test.py running in isolation in release mode with otherwise default settings, though ducktape parallelism probably has an effect here.

New build

flatten case & schema type

https://buildkite.com/redpanda/redpanda/builds/76066#019a74ff-2497-4da5-b044-58700827e2c5

SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-11-11--001
run time:         27 minutes 25.268 seconds
tests run:        18
passed:           18
flaky:            0
failed:           0
ignored:          0

also combine old schema and base evo tests

https://buildkite.com/redpanda/redpanda/builds/76080#019a7588-0de9-42c1-ba29-d445e28a7462

SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-11-12--001
run time:         21 minutes 49.953 seconds
tests run:        12
passed:           12
flaky:            0
failed:           0
ignored:          0

and then cut the number of query engine / catalog combos in half

https://buildkite.com/redpanda/redpanda/builds/76092#019a76f6-a2dc-47c8-bc52-370bbfdbdf97

SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-11-12--001
run time:         19 minutes 1.410 seconds
tests run:        6
passed:           6
flaky:            0
failed:           0
ignored:          0

Old build

https://buildkite.com/redpanda/redpanda/builds/76067#019a7500-9b42-4fbd-a01f-7eb5d9d1d978

SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-11-11--001
run time:         33 minutes 14.114 seconds
tests run:        180
passed:           180
flaky:            0
failed:           0
ignored:          0
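To compare the session reports above side by side, a small helper can convert the `run time` strings to seconds and relate each new build to the old one. The parsing assumes exactly the "N minutes S.sss seconds" format shown in these reports; the times and test counts are copied from them:

```python
import re


def run_time_seconds(text: str) -> float:
    """Parse a run time such as '33 minutes 14.114 seconds' into seconds."""
    m = re.fullmatch(r"\s*(?:(\d+) minutes?\s+)?([\d.]+) seconds?\s*", text)
    if m is None:
        raise ValueError(f"unrecognized run time: {text!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))


old = run_time_seconds("33 minutes 14.114 seconds")  # old build, 180 tests
new_builds = {
    "flatten case & schema type": ("27 minutes 25.268 seconds", 18),
    "combine old schema and base evo tests": ("21 minutes 49.953 seconds", 12),
    "halve query engine / catalog combos": ("19 minutes 1.410 seconds", 6),
}
for label, (rt, tests) in new_builds.items():
    secs = run_time_seconds(rt)
    print(f"{label}: {secs:.0f}s, {old / secs:.2f}x wall clock, {180 / tests:.0f}x fewer tests")
# flatten case & schema type: 1645s, 1.21x wall clock, 10x fewer tests
# combine old schema and base evo tests: 1310s, 1.52x wall clock, 15x fewer tests
# halve query engine / catalog combos: 1141s, 1.75x wall clock, 30x fewer tests
```

The last row is the gap noted in the review: under 2x wall-clock improvement against a 30x reduction in test count.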

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

  • none

@oleiman oleiman self-assigned this Nov 11, 2025
@oleiman oleiman force-pushed the ci/core-14250/limit-schema-evo-dt branch from 77a40f6 to ca16950 Compare November 11, 2025 21:54
@oleiman

oleiman commented Nov 11, 2025

/ci-repeat 1
release
skip-units
tests/rptest/tests/datalake/schema_evolution_test.py

@oleiman oleiman force-pushed the ci/core-14250/limit-schema-evo-dt branch from ca16950 to 7baf4ed Compare November 12, 2025 00:45
@oleiman

oleiman commented Nov 12, 2025

/ci-repeat 1
release
skip-redpanda-build
skip-units
tests/rptest/tests/datalake/schema_evolution_test.py

1 similar comment
@oleiman

oleiman commented Nov 12, 2025

/ci-repeat 1
release
skip-redpanda-build
skip-units
tests/rptest/tests/datalake/schema_evolution_test.py

@oleiman oleiman marked this pull request as ready for review November 12, 2025 08:02
Copilot AI review requested due to automatic review settings November 12, 2025 08:02

Copilot AI left a comment

Pull Request Overview

This PR refactors datalake/schema_evolution_test.py to improve test efficiency by reducing redundant setup and teardown operations. Instead of parameterizing test functions across schema languages and test cases, the tests now iterate through these parameters within a single test run.

Key changes:

  • Consolidates test cases (LEGAL_TEST_CASES and ILLEGAL_TEST_CASES merged into TEST_CASES)
  • Reduces test matrix by eliminating full cartesian product of query engines and catalogs
  • Combines previously separate tests (test_legal_schema_evolution, test_illegal_schema_evolution, test_old_schema_writer) into single test_schema_evolution
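A sketch of what merging the two dictionaries into one could look like, assuming each case records whether its schema change is expected to be accepted. `EvolutionTestCase` and `TEST_CASES` are named in this PR, but the fields and the placeholder case names below are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvolutionTestCase:
    # Hypothetical fields for illustration; the real class may differ.
    description: str
    legal: bool  # True if the schema change should be accepted


def merge_cases(*groups: dict[str, EvolutionTestCase]) -> dict[str, EvolutionTestCase]:
    """Merge several name -> case dicts into one, rejecting duplicate names."""
    merged: dict[str, EvolutionTestCase] = {}
    for group in groups:
        for name, case in group.items():
            if name in merged:
                raise ValueError(f"duplicate test case name: {name}")
            merged[name] = case
    return merged


# Placeholder cases, not the real ones from the test file.
LEGAL_TEST_CASES = {"example_legal": EvolutionTestCase("placeholder legal change", legal=True)}
ILLEGAL_TEST_CASES = {"example_illegal": EvolutionTestCase("placeholder illegal change", legal=False)}
TEST_CASES = merge_cases(LEGAL_TEST_CASES, ILLEGAL_TEST_CASES)
```

With one dictionary, a single test loop can branch on `case.legal` to decide whether to expect the evolution to succeed or be rejected, and the duplicate-name check guards against one group silently overwriting the other.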

Comment thread tests/rptest/tests/datalake/schema_evolution_test.py Outdated
Comment thread tests/rptest/tests/datalake/schema_evolution_test.py Outdated
@vbotbuildovich

vbotbuildovich commented Nov 12, 2025

CI test results

test results on build#76095
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
SchemaRegistryAutoAuthTest test_restarts {"move_controller_leader": true} integration https://buildkite.com/redpanda/redpanda/builds/76095#019a7729-827f-441f-9ba8-22c631d01d45 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=SchemaRegistryAutoAuthTest&test_method=test_restarts
test results on build#76172
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/76172#019a7a6d-273a-4628-96f1-82e6dd68ae13 FLAKY 19/21 upstream reliability is '91.15969581749049'. current run reliability is '90.47619047619048'. drift is 0.68351 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
src/v/storage/tests/segment_appender_rpbench_test src/v/storage/tests/segment_appender_rpbench_test unit https://buildkite.com/redpanda/redpanda/builds/76172#019a7a35-122d-439a-8d48-914f27ccc532 FAIL 0/1
test results on build#76181
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ShadowLinkingReplicationTests test_replication_with_failures null integration https://buildkite.com/redpanda/redpanda/builds/76181#019a7b75-9d6e-47da-8bf8-de015978eb34 FLAKY 18/21 upstream reliability is '96.36363636363636'. current run reliability is '85.71428571428571'. drift is 10.64935 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_with_failures
ControllerSnapshotTest test_join_restart_catch_up null integration https://buildkite.com/redpanda/redpanda/builds/76181#019a7b75-9d71-46ad-b049-cb2a6eaf0984 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ControllerSnapshotTest&test_method=test_join_restart_catch_up
test results on build#76231
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
RpkDebugBundleTest test_debug_bundle null integration https://buildkite.com/redpanda/redpanda/builds/76231#019a7e66-0313-4fab-9f64-cb37ceabac39 FLAKY 20/21 upstream reliability is '99.75550122249389'. current run reliability is '95.23809523809523'. drift is 4.51741 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RpkDebugBundleTest&test_method=test_debug_bundle

@oleiman

oleiman commented Nov 12, 2025

/ci-repeat 1
release
skip-redpanda-build
skip-units
dt-nodes=3
tests/rptest/tests/datalake/schema_evolution_test.py

@oleiman oleiman changed the title Ci/core 14250/limit schema evo dt [CORE-14250] Ducktape: Reduce cost of datalake/schema_evolution_test.py Nov 12, 2025
@oleiman

oleiman commented Nov 12, 2025

/ci-repeat 1
release
skip-redpanda-build
skip-units
dt-nodes=4
tests/rptest/tests/datalake/schema_evolution_test.py

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the ci/core-14250/limit-schema-evo-dt branch from 66c2885 to 3072ed4 Compare November 12, 2025 22:33
@travisdowns

Yeah, I guess the improvement looks poor (< 2x runtime improvement) compared to the test case reduction (30x!!) because tests got much longer: before, all tests were in the ~1:20 range; now one of the remaining 6 is 19 minutes long.

Still, what matters is total node time, and looking at the detailed results, that was reduced by a factor of ~5x (the 19-minute run had poor utilization since it was running alone). So that's the overall improvement I would expect here, which is great!
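The difference between wall-clock time and node time can be made concrete: node time sums, over every test, its duration multiplied by the number of cluster nodes it holds, so it measures cluster capacity consumed rather than elapsed time. The sketch below uses made-up durations and node counts purely to illustrate the calculation; the real per-test numbers are in the Buildkite reports linked above, and `RunRecord` is a hypothetical name:

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    duration_s: float  # wall-clock duration of one ducktape test
    nodes: int         # cluster nodes the test held while running


def node_seconds(runs: list[RunRecord]) -> float:
    """Total node time: sum of duration x nodes over all tests."""
    return sum(r.duration_s * r.nodes for r in runs)


def utilization(runs: list[RunRecord], cluster_nodes: int, wall_clock_s: float) -> float:
    """Fraction of the cluster's available node time actually used."""
    return node_seconds(runs) / (cluster_nodes * wall_clock_s)


# Toy numbers, not taken from the reports.
many_short = [RunRecord(80.0, 3)] * 30
one_long = [RunRecord(1140.0, 3)]
print(node_seconds(many_short), node_seconds(one_long))  # 7200.0 3420.0
print(utilization(one_long, cluster_nodes=12, wall_clock_s=1140.0))  # 0.25
```

A single long test running alone can finish late on the wall clock while still consuming far fewer node-seconds, which is why the wall-clock ratio understates the saving.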

@oleiman

oleiman commented Nov 13, 2025

@travisdowns yeah that tracks. I think that's about the best we can do without reducing coverage, which might be fine really but maybe a question for another day.

total node time ... was reduced by a factor of ~5x

Sweet, wasn't clear how to calculate that. Where can I find it?

@oleiman

oleiman commented Nov 13, 2025

/ci-repeat 1
debug

nvartolomei
nvartolomei previously approved these changes Nov 13, 2025

@nvartolomei nvartolomei left a comment

couple of nits but lgtm overall

Comment thread tests/rptest/tests/datalake/schema_evolution_test.py
def all_cases(self) -> dict[str, EvolutionTestCase]:
    return TEST_CASES

def cases_by_modes(

nit: had to double check what the function does because it's a class method. I'd move all these to be free-standing functions.

Another reason I double-checked is that it reads as if it filters cases by mode (an argument), whereas it actually groups them, i.e. cases_grouped_by_mode.

nit category too
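Following that suggestion, the grouping could live in a free-standing function whose name says what it does. This is a sketch, not the PR's code; the `mode` field and this `EvolutionTestCase` shape are assumptions made for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvolutionTestCase:
    # Hypothetical shape for illustration only.
    mode: str


def cases_grouped_by_mode(
    cases: dict[str, EvolutionTestCase],
) -> dict[str, dict[str, EvolutionTestCase]]:
    """Group name -> case entries by each case's mode, preserving insertion order."""
    grouped: defaultdict[str, dict[str, EvolutionTestCase]] = defaultdict(dict)
    for name, case in cases.items():
        grouped[case.mode][name] = case
    return dict(grouped)


cases = {"a": EvolutionTestCase("x"), "b": EvolutionTestCase("y"), "c": EvolutionTestCase("x")}
print({mode: list(group) for mode, group in cases_grouped_by_mode(cases).items()})
# {'x': ['a', 'c'], 'y': ['b']}
```

As a module-level function it no longer looks like instance behavior, and the name makes clear that it partitions the cases rather than selecting one mode.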

travisdowns
travisdowns previously approved these changes Nov 13, 2025
We don't actually need a cartesian product here, just to make sure that
we cover every query engine and catalog type at least once.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman dismissed stale reviews from travisdowns and nvartolomei via 75ab0da November 13, 2025 17:46
@oleiman oleiman force-pushed the ci/core-14250/limit-schema-evo-dt branch from 3072ed4 to 75ab0da Compare November 13, 2025 17:46
@oleiman oleiman enabled auto-merge November 13, 2025 17:47
@oleiman oleiman merged commit f933e80 into redpanda-data:dev Nov 13, 2025
17 checks passed
@vbotbuildovich

/backport v25.3.x

@vbotbuildovich

/backport v25.2.x

@vbotbuildovich

/backport v25.1.x

@vbotbuildovich

/backport v24.3.x

@vbotbuildovich

Failed to create a backport PR to v25.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-28466-v25.2.x-786 remotes/upstream/v25.2.x
git cherry-pick -x 707422f85a 8931902ce7 2881d9cf77 75ab0daa13

Workflow run logs.

@vbotbuildovich

Failed to create a backport PR to v25.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-28466-v25.1.x-157 remotes/upstream/v25.1.x
git cherry-pick -x 707422f85a 8931902ce7 2881d9cf77 75ab0daa13

Workflow run logs.

@vbotbuildovich

Failed to create a backport PR to v24.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-28466-v24.3.x-407 remotes/upstream/v24.3.x
git cherry-pick -x 707422f85a 8931902ce7 2881d9cf77 75ab0daa13

Workflow run logs.


6 participants