WPB-25915 add timeout and duration metric for conversation migration #5244

Open
battermann wants to merge 21 commits into develop from WPB-25915-add-timeout-and-duration-metric-for-conversation-migration

Conversation

Contributor

@battermann battermann commented May 28, 2026

https://wearezeta.atlassian.net/browse/WPB-25915

I have tested that it works. However, those tests cannot be committed, because they hook into the production migration code to simulate a blocking conversation migration.

Checklist

  • Add a new entry in an appropriate subdirectory of changelog.d
  • Read and follow the PR guidelines

@zebot added the ok-to-test label (approved for running tests in CI; overrides not-ok-to-test if both labels exist) May 28, 2026
@battermann battermann marked this pull request as ready for review May 28, 2026 07:27
@battermann battermann requested review from a team as code owners May 28, 2026 07:28
@battermann battermann requested a review from Copilot May 28, 2026 07:28
Contributor

Copilot AI left a comment

Pull request overview

This PR adds timeout handling and duration observability for background-worker conversation migrations, so that stuck per-conversation attempts become visible and fail fast.

Changes:

  • Adds optional timeout to MigrationOptions and applies it to per-conversation migration attempts.
  • Registers and records a Prometheus histogram for conversation migration attempt durations by outcome.
  • Documents the new timeout setting and updates package dependencies for Timeout time-unit support.
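
As a rough illustration of the first two points, the sketch below wraps a single attempt in an optional timeout and returns its outcome together with the elapsed time. It uses only `base`; the names `Outcome` and `timedAttempt` are hypothetical and do not come from this PR, and where the real code presumably observes the duration in a Prometheus histogram labelled by outcome, this sketch simply returns the pair.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import GHC.Clock (getMonotonicTime)
import System.Timeout (timeout)

-- | Hypothetical outcome label; the label values used in the PR are not shown here.
data Outcome = Succeeded | Failed | TimedOut
  deriving (Eq, Show)

-- | Run one attempt with an optional timeout (in microseconds) and return
-- the outcome plus the elapsed wall-clock time in seconds.
timedAttempt :: Maybe Int -> IO () -> IO (Outcome, Double)
timedAttempt mTimeout action = do
  start <- getMonotonicTime
  let run = case mTimeout of
        Nothing -> Just <$> action          -- no limit configured
        Just micros -> timeout micros action -- Nothing if the limit elapses
  result <- try run :: IO (Either SomeException (Maybe ()))
  end <- getMonotonicTime
  let outcome = case result of
        Left _ -> Failed
        Right Nothing -> TimedOut
        Right (Just ()) -> Succeeded
  -- A real implementation would record (end - start) in a histogram
  -- labelled by outcome instead of returning it.
  pure (outcome, end - start)

main :: IO ()
main = do
  (fast, _) <- timedAttempt (Just 500000) (threadDelay 10000)
  (slow, secs) <- timedAttempt (Just 50000) (threadDelay 2000000)
  (bad, _) <- timedAttempt Nothing (ioError (userError "boom"))
  print fast         -- Succeeded
  print slow         -- TimedOut
  print bad          -- Failed
  print (secs < 1.0) -- True: the slow attempt was cut off early
```

Measuring the duration around the timeout, rather than inside the action, means timed-out attempts still contribute a sample. Note that `System.Timeout.timeout` relies on asynchronous exceptions, so it may not interrupt an attempt that is blocked in a foreign call.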

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/background-worker/src/Wire/PostgresMigrations.hs | Registers and passes the new conversation migration duration histogram. |
| services/background-worker/src/Wire/BackgroundWorker.hs | Updates existing MigrationOptions constructor calls for the new timeout field. |
| libs/wire-subsystems/src/Wire/Migration.hs | Adds timeout configuration and timeout exception type. |
| libs/wire-subsystems/src/Wire/ConversationStore/Migration.hs | Applies per-conversation timeout logic and records duration metrics. |
| libs/types-common/types-common.cabal | Adds polysemy-time dependency. |
| libs/types-common/src/Util/Timeout.hs | Derives TimeUnit for Timeout. |
| libs/types-common/default.nix | Adds polysemy-time to Nix dependencies. |
| docs/src/developer/reference/config-options.md | Documents the new migration timeout option. |
| changelog.d/5-internal/WPB-25915 | Adds a changelog entry file, but it is currently empty. |

Comment thread docs/src/developer/reference/config-options.md Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread docs/src/developer/reference/config-options.md Outdated
Comment thread docs/src/developer/reference/config-options.md Outdated
Comment thread docs/src/developer/reference/config-options.md Outdated
Comment thread docs/src/developer/reference/config-options.md Outdated
Comment thread docs/src/developer/reference/config-options.md Outdated
Comment thread libs/wire-subsystems/src/Wire/Migration.hs Outdated
Comment thread services/background-worker/src/Wire/BackgroundWorker.hs Outdated
Comment thread services/background-worker/src/Wire/PostgresMigrations.hs
Comment thread changelog.d/5-internal/WPB-25915
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.

Comment on lines +42 to +46
convMigDuration <- register $ vector "outcome" $ histogram (Prometheus.Info "wire_local_convs_migration_duration_seconds_bucket" "Duration of local conversation migration attempts") defaultBuckets
userMigCounter <- register $ counter $ Prometheus.Info "wire_user_remote_convs_migrated_to_pg" "Number of users whose remote conversation membership data is migrated to Postgresql"
userMigFinished <- register $ counter $ Prometheus.Info "wire_user_remote_convs_migration_finished" "Whether the migration of remote conversation membership data to Postgresql is finished successfully"
userMigFailed <- register $ counter $ Prometheus.Info "wire_user_remote_convs_migration_failed" "Whether the migration of remote conversation membership data to Postgresql has failed"
userMigDuration <- register $ vector "outcome" $ histogram (Prometheus.Info "wire_user_remote_convs_migration_duration_seconds_bucket" "Duration of remote conversation membership migration attempts") defaultBuckets
Contributor Author

@akshaymankar can you verify?

Contributor Author

hm, it seems like it is true.

count <- register $ counter $ Prometheus.Info "wire_conv_codes_migrated_to_pg" "Number of conversation codes migrated to Postgresql"
finished <- register $ counter $ Prometheus.Info "wire_conv_codes_migration_finished" "Whether the conversation codes migration to Postgresql is finished successfully"
failed <- register $ counter $ Prometheus.Info "wire_conv_codes_migration_failed" "Whether the conversation codes migration to Postgresql has failed"
duration <- register $ vector "outcome" $ histogram (Prometheus.Info "wire_conv_codes_migration_duration_seconds_bucket" "Duration of conversation code migration attempts") defaultBuckets
count <- register $ counter $ Prometheus.Info "wire_team_features_migrated_to_pg" "Number of team features migrated to Postgresql"
finished <- register $ counter $ Prometheus.Info "wire_team_features_migration_finished" "Whether the team features migration to Postgresql is finished successfully"
failed <- register $ counter $ Prometheus.Info "wire_team_features_migration_failed" "Whether the team features migration to Postgresql has failed"
duration <- register $ vector "outcome" $ histogram (Prometheus.Info "wire_team_features_migration_duration_seconds_bucket" "Duration of team feature migration attempts") defaultBuckets
count <- register $ counter $ Prometheus.Info "wire_domain_registration_migrated_to_pg" "Number of domain registration rows migrated to Postgresql"
finished <- register $ counter $ Prometheus.Info "wire_domain_registration_migration_finished" "Whether the domain registration migration to Postgresql is finished successfully"
failed <- register $ counter $ Prometheus.Info "wire_domain_registration_migration_failed" "Whether the domain registration migration to Postgresql has failed"
duration <- register $ vector "outcome" $ histogram (Prometheus.Info "wire_domain_registration_migration_duration_seconds_bucket" "Duration of domain registration migration attempts") defaultBuckets
Comment thread libs/wire-subsystems/src/Wire/Migration.hs
Comment thread libs/wire-subsystems/src/Wire/CodeStore/Migration.hs Outdated
Comment thread changelog.d/5-internal/WPB-25915 Outdated
battermann and others added 4 commits June 2, 2026 16:29
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
4 participants