Currently, spirit always resumes from a checkpoint when it is possible to do so.
We have seen some cases where a change was stopped for several days and then picked up again, and replaying all of the accumulated binary logs was actually more work than starting over would have been.
It's not always easy to know when resuming is the worse option. Operators usually configure binary log retention so that a fast-changing system holds no more than a week or so of binlogs, but it does look like we need a safety threshold here.
Maybe we can start by checking whether the binary log coordinate in the checkpoint is >= 7 days old, and refuse to resume if that's the case? I am open to other ideas here.