Skip to content

[AMORO-4229] Refactor synchronizing Hive tables process via ProcessFactory plugin#4230

Merged
zhoujinsong merged 2 commits into
apache:masterfrom
zhangwl9:AMORO_refact-sync-hive-tables-dev
May 25, 2026
Merged

[AMORO-4229] Refactor synchronizing Hive tables process via ProcessFactory plugin#4230
zhoujinsong merged 2 commits into
apache:masterfrom
zhangwl9:AMORO_refact-sync-hive-tables-dev

Conversation

@zhangwl9
Copy link
Copy Markdown
Contributor

Why are the changes needed?

Close #4229.

Brief change log

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions Bot added type:docs Improvements or additions to documentation module:ams-server Ams server module type:build module:common labels May 20, 2026
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 23.09%. Comparing base (38ef381) to head (c5e9d0f).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...src/main/java/org/apache/amoro/IcebergActions.java 0.00% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (38ef381) and HEAD (c5e9d0f). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (38ef381) HEAD (c5e9d0f)
core 1 0
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4230      +/-   ##
============================================
- Coverage     29.91%   23.09%   -6.82%     
+ Complexity     4317     2706    -1611     
============================================
  Files           677      463     -214     
  Lines         55037    42826   -12211     
  Branches       7028     6044     -984     
============================================
- Hits          16464     9891    -6573     
+ Misses        37337    32076    -5261     
+ Partials       1236      859     -377     
Flag Coverage Δ
core ?
trino 23.09% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zhangwl9 zhangwl9 force-pushed the AMORO_refact-sync-hive-tables-dev branch 2 times, most recently from ef170cc to 821cd71 Compare May 21, 2026 02:40
@zhangwl9 zhangwl9 force-pushed the AMORO_refact-sync-hive-tables-dev branch from 821cd71 to 72578cc Compare May 22, 2026 03:06
return Optional.empty();
}

return Optional.of(new HiveCommitSyncProcess(tableRuntime, localEngine));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

triggerHiveCommitSync is missing a table format check before creating the process.

The IcebergProcessFactory.formats list includes ICEBERG, MIXED_ICEBERG, and MIXED_HIVE, so this trigger will be invoked for all three formats. Without an early guard, a HiveCommitSyncProcess is created and submitted to the thread pool for every table, only to discover inside run() that the table is not a Hive table and return early — wasted scheduling overhead on every cycle.

For comparison, triggerAutoCreateTag already does this correctly:

if (localEngine == null
    || tableRuntime.getFormat() != TableFormat.ICEBERG
    || ...) {
  return Optional.empty();
}

Suggested fix — filter at the trigger layer:

private Optional<TableProcess> triggerHiveCommitSync(TableRuntime tableRuntime) {
  if (localEngine == null
      || tableRuntime.getFormat() != TableFormat.MIXED_HIVE) {
    return Optional.empty();
  }
  return Optional.of(new HiveCommitSyncProcess(tableRuntime, localEngine));
}

The TableTypeUtil.isHive() guard inside HiveCommitSyncProcess.run() can remain as a defensive check, but the trigger layer should be the first gate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixup

@zhangwl9 zhangwl9 force-pushed the AMORO_refact-sync-hive-tables-dev branch from 4368de4 to 16d1064 Compare May 25, 2026 05:46
Copy link
Copy Markdown
Contributor

@zhoujinsong zhoujinsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for the work!

@zhoujinsong zhoujinsong merged commit be6663b into apache:master May 25, 2026
6 checks passed
@zhangwl9 zhangwl9 deleted the AMORO_refact-sync-hive-tables-dev branch May 25, 2026 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-server Ams server module module:common type:build type:docs Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask]: Refactor synchronizing Hive tables process via ProcessFactory plugin

3 participants