db: Compute post-apply checksum incrementally by saleemrashid · Pull Request #449 · superfly/litefs

saleemrashid · 2026-04-08T14:05:17Z

This change computes the post-apply checksum incrementally in CommitJournal and CommitWAL, so commit times stay proportional to the transaction size rather than to the database size or the WAL size. Scaling with the WAL size quickly causes a performance cliff, because WAL growth invalidates the cached block checksums.

With a 30 GiB database, our commit times have been increasing, and some now exceed 30 seconds when a long-running read transaction (particularly a LiteFS snapshot) causes the WAL to grow beyond 70 MiB.

Since ApplyLTXNoLock hasn't been changed, replicas will still verify the post-apply checksum with the existing checksum method, which should at least allow us to detect any bugs in the incremental checksumming that we haven't caught.
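As a rough illustration of that cross-check, the sketch below maintains a checksum incrementally and then compares it against a full recomputation. It is not the LiteFS code: `pageChecksum`, `fullChecksum`, `applyDelta`, and `verify` are hypothetical names, and the per-page function (CRC-64 over the page number and page data) and the XOR combination are assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"hash/crc64"
)

// pageChecksum is a hypothetical per-page checksum: a CRC-64 over the
// big-endian page number followed by the page data (an assumption).
func pageChecksum(pgno uint32, data []byte) uint64 {
	h := crc64.New(crc64.MakeTable(crc64.ISO))
	var buf [4]byte
	binary.BigEndian.PutUint32(buf[:], pgno)
	h.Write(buf[:])
	h.Write(data)
	return h.Sum64()
}

// fullChecksum XORs the checksum of every page. Its cost grows with the
// size of the database, which is what the primary wants to avoid.
func fullChecksum(pages [][]byte) uint64 {
	var sum uint64
	for i, data := range pages {
		sum ^= pageChecksum(uint32(i+1), data)
	}
	return sum
}

// applyDelta updates a running checksum for one rewritten page by XORing
// out the old page checksum and XORing in the new one.
func applyDelta(sum uint64, pgno uint32, oldData, newData []byte) uint64 {
	return sum ^ pageChecksum(pgno, oldData) ^ pageChecksum(pgno, newData)
}

// verify reports whether an incrementally maintained checksum matches a
// full recomputation, as a replica-side check would.
func verify(incremental uint64, pages [][]byte) bool {
	return incremental == fullChecksum(pages)
}
```

If the incremental path ever drops or double-counts a page, `verify` returns false, which is the kind of mismatch the unchanged replica-side check is expected to surface.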

The increased test coverage also caught a bug where switching between WAL and journal mode results in some pages not being written to the LTX.


Copilot

Pull request overview

This PR updates checksum handling to avoid recomputing the full database checksum by computing post-apply checksums incrementally during WAL and rollback-journal commits.

Changes:

Replace rollback-journal dirty page tracking with a map of previous per-page checksums to support incremental checksum updates.
Simplify WAL per-page checksum tracking to store only the latest checksum per page.
Remove the “in-progress WAL checksums” path from checksum()/pageChecksum() and compute post-apply checksum via XOR-deltas during commit.
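The rollback-journal change in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `Checksum`, `journalTxn`, `markDirty`, and `commit` are hypothetical names, a new page is assumed to have a previous checksum of 0, and a truncated page is assumed to have a current checksum of 0.

```go
package main

// Checksum is a hypothetical stand-in for a per-page checksum value.
type Checksum uint64

// journalTxn records, for each page touched by a transaction, the checksum
// the page had before the transaction first wrote to it.
type journalTxn struct {
	prev map[uint32]Checksum
}

func newJournalTxn() *journalTxn {
	return &journalTxn{prev: make(map[uint32]Checksum)}
}

// markDirty remembers the pre-transaction checksum of a page. Only the
// first write per page is recorded; later writes to the same page must not
// overwrite it, or the delta would be computed against the wrong baseline.
func (t *journalTxn) markDirty(pgno uint32, before Checksum) {
	if _, ok := t.prev[pgno]; !ok {
		t.prev[pgno] = before
	}
}

// commit derives the new database checksum from the previous one by XORing
// out each dirty page's old checksum and XORing in its current checksum.
// The work is proportional to the number of dirty pages.
func (t *journalTxn) commit(prevDB Checksum, current func(pgno uint32) Checksum) Checksum {
	sum := prevDB
	for pgno, before := range t.prev {
		sum ^= before
		sum ^= current(pgno)
	}
	return sum
}
```

Keeping the previous checksum in the map, rather than only a set of dirty page numbers, is what lets the commit skip re-reading the old page contents.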


Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Copilot · 2026-04-09T11:29:13Z

+	// Disable page cache so dirty pages are written immediately.
+	if _, err := db.Exec(`PRAGMA cache_size = 0`); err != nil {
+		t.Fatal(err)
+	}


PRAGMA cache_size = 0 does not disable SQLite's page cache; it resets the cache size to the default. If the intent is to force more frequent page flush/eviction so the same pages are written multiple times within a transaction, consider using a very small cache (e.g., 1–2 pages) or update the comment to match the actual behavior.

Does not seem to be the case? https://github.com/sqlite/sqlite/blob/a5495355c48e65411db52ac88d5b361d4d2c9f35/test/cache.test#L117

saleemrashid · 2026-04-09T13:58:14Z

 // This store will automatically close when the test ends.
 func newStore(tb testing.TB, leaser litefs.Leaser, client litefs.Client) *litefs.Store {
 	store := litefs.NewStore(tb.TempDir(), true)
+	store.StrictVerify = true


This doesn't actually do anything for these tests, so we could drop this commit.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


Copilot · 2026-04-09T14:20:10Z

+	// Update WAL checksums.
 	for pgno, chksum := range newWALChksums {
-		db.wal.chksums[pgno] = append(db.wal.chksums[pgno], chksum)
+		db.wal.chksums[pgno] = chksum
 	}


CommitWAL() stores 0 values in db.wal.chksums (e.g. for truncated pages). Because checksum() ranges over the keys in db.wal.chksums to mark ignored blocks, leaving tombstone entries can (a) keep blocks permanently marked as ignored and (b) create invalid pgno keys (> current PageN()) after a truncating commit, which can lead to out-of-range indexing in checksum(). Prefer deleting entries when the WAL no longer contains a page (e.g. use delete(db.wal.chksums, pgno) when chksum == 0), or filter pgno keys against pageN in checksum().
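The two alternatives suggested in this comment could look roughly like the sketch below. The names `Checksum`, `mergeWALChecksums`, and `walPagesInRange` are hypothetical, and treating a zero checksum as "page no longer in the WAL" is the assumption taken from the comment, not something confirmed against the LiteFS source.

```go
package main

import "sort"

// Checksum is a hypothetical stand-in for a per-page checksum value.
type Checksum uint64

// mergeWALChecksums folds the checksums from a committed transaction into
// the per-page WAL map. A zero checksum is treated as a tombstone and the
// entry is deleted instead of being stored.
func mergeWALChecksums(wal, committed map[uint32]Checksum) {
	for pgno, chksum := range committed {
		if chksum == 0 {
			delete(wal, pgno)
			continue
		}
		wal[pgno] = chksum
	}
}

// walPagesInRange is the alternative: leave the map alone and only return
// page numbers that fall within the current database size, in sorted order.
func walPagesInRange(wal map[uint32]Checksum, pageN uint32) []uint32 {
	var pgnos []uint32
	for pgno := range wal {
		if pgno >= 1 && pgno <= pageN {
			pgnos = append(pgnos, pgno)
		}
	}
	sort.Slice(pgnos, func(i, j int) bool { return pgnos[i] < pgnos[j] })
	return pgnos
}
```

Deleting on write keeps the map itself free of stale keys, while filtering on read is a smaller change that tolerates stale keys without acting on them.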

Copilot · 2026-04-09T14:20:11Z

 	// Remove WAL checksums. These shouldn't exist but remove them just in case.
-	db.wal.chksums = make(map[uint32][]ltx.Checksum)
+	db.wal.chksums = make(map[uint32]ltx.Checksum)
+
+	// Compute new database checksum.
+	postApplyChecksum := prevPos.PostApplyChecksum


CommitJournal() clears db.wal.chksums but leaves db.wal.frameOffsets intact. These two structures both represent WAL overlays; clearing only one can leave the DB in an inconsistent state (e.g. onDiskChecksum(..., walFile=nil) will still try to read from WAL if frameOffsets contains keys). Consider clearing db.wal.frameOffsets (and any other WAL overlay state) alongside db.wal.chksums here to avoid stale WAL references when processing rollback-journal commits or mode switches.
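One way to make it harder to clear only one of the two structures is to reset them through a single helper. The sketch below assumes a hypothetical `walOverlay` type with `frameOffsets` and `chksums` fields shaped like the ones named in the comment; the real field types and any additional overlay state in LiteFS may differ.

```go
package main

// Checksum is a hypothetical stand-in for a per-page checksum value.
type Checksum uint64

// walOverlay groups the state that describes pages currently living in the
// WAL. Both maps are keyed by page number (field types are assumptions).
type walOverlay struct {
	frameOffsets map[uint32]int64
	chksums      map[uint32]Checksum
}

// reset clears every piece of overlay state in one place, so callers
// cannot drop the checksums while leaving stale frame offsets behind.
func (w *walOverlay) reset() {
	w.frameOffsets = make(map[uint32]int64)
	w.chksums = make(map[uint32]Checksum)
}

// isEmpty reports whether no WAL overlay state remains.
func (w *walOverlay) isEmpty() bool {
	return len(w.frameOffsets) == 0 && len(w.chksums) == 0
}
```

A rollback-journal commit would then call `reset` once instead of reassigning individual maps.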

saleemrashid added 2 commits April 8, 2026 23:28

db: Remove duplicate WAL checksum clearing

3cf3564

TruncateWAL already clears the checksums

db: Store only latest checksum for WAL pages

aa5af98

We do not make use of older checksum values

saleemrashid force-pushed the saleem/incremental-chksum branch 4 times, most recently from 766edc5 to df75264 April 8, 2026 22:41

saleemrashid requested a review from Copilot April 8, 2026 22:41

Copilot started reviewing on behalf of saleemrashid April 8, 2026 22:42

Copilot AI reviewed Apr 8, 2026


Comment thread db.go

Comment thread db.go Outdated

Comment thread db.go

saleemrashid force-pushed the saleem/incremental-chksum branch 4 times, most recently from e077010 to ac2bef2 April 9, 2026 11:23

saleemrashid requested a review from Copilot April 9, 2026 11:23

Copilot started reviewing on behalf of saleemrashid April 9, 2026 11:24

Copilot AI reviewed Apr 9, 2026


saleemrashid changed the title from "db: Compute database checksum incrementally" to "db: Compute post-apply checksum incrementally" Apr 9, 2026

saleemrashid marked this pull request as ready for review April 9, 2026 12:22

saleemrashid added 4 commits April 9, 2026 14:54

db: Fix switching between WAL and rollback journal

0e3547f

db: Compute incremental checksum in CommitJournal/CommitWAL

8f99591

store: Enable StrictVerify in tests

129daad

mount: Add tests for CommitJournal incremental checksumming

fa7bad4

saleemrashid force-pushed the saleem/incremental-chksum branch from ac2bef2 to fa7bad4 April 9, 2026 13:55

saleemrashid commented Apr 9, 2026


saleemrashid requested a review from Copilot April 9, 2026 14:11

Copilot started reviewing on behalf of saleemrashid April 9, 2026 14:12

Copilot AI reviewed Apr 9, 2026


saleemrashid requested a review from benbjohnson April 13, 2026 15:56


db: Compute post-apply checksum incrementally #449
saleemrashid wants to merge 6 commits into main from saleem/incremental-chksum


2 participants
