Add a git based sync option #441 #722
Conversation
djmitche
left a comment
This is so cool! I would like to take the time to read and experiment with it, but a few initial reactions to the comments and the PR description:
- Optional configuration for the path to the `git` binary will probably be useful for someone, and should be easy to add.
- It looks like `remote` accepts the string "None"? Let's make that an `Option<String>` instead, if that's the case, or make the empty string the special case.
- For cleanup, two observations:
  - Git keeps all of the data anyway, so "deleting" a version is really just removing the name from the directory. So, I don't think deleting old versions has much impact.
  - When a replica syncs, it needs all versions since the last version it saw. Otherwise, it has to restore from the snapshot and lose any local data since it last sync'd.
  - So, it's beneficial to keep versions around for as long as is practical. The cloud servers keep a half-year's worth of versions (`src/server/cloud/server.rs`, `MAX_VERSION_AGE_SECS`), and I think that's reasonable here, too.
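For illustration, the `Option<String>` change suggested above might look like the following sketch. The type and method names here are hypothetical, not taken from the PR; the point is only that both "None" and the empty string map to a real `None`:

```rust
/// Hypothetical sketch of the suggested config shape: `remote` as an
/// `Option<String>` rather than the magic string "None".
#[derive(Debug, Clone)]
pub struct GitSyncConfig {
    /// Remote in the same format git accepts; `None` means local-only.
    pub remote: Option<String>,
}

impl GitSyncConfig {
    /// Translate a raw config string, treating "None" and the empty
    /// string as "no remote configured".
    pub fn from_raw_remote(raw: &str) -> Self {
        let remote = if raw.is_empty() || raw == "None" {
            None
        } else {
            Some(raw.to_string())
        };
        GitSyncConfig { remote }
    }
}
```

With this shape, downstream code matches on `Option` instead of comparing against a sentinel string.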
I'd be interested to know what @ryneeverett thinks, too!
Thanks for the review! I'll take a look at those. My reason behind the `git rm` was to make the globs when adding new versions faster, but I didn't realize how that interacted with restoration.

Hm, good point, but local globs are probably pretty fast even for hundreds of dentries.
djmitche
left a comment
A bunch of comments here, but all toward improving the implementation rather than show-stopping issues. I think most of what I've suggested is relatively straightforward, but if necessary some of it can be done in followup PRs.
One general observation: despite `Server` being an async trait, this invokes Git synchronously. I think that's fine for the expected use-cases for this sync model, and it's something that can be improved later if desired.
tl;dr: This looks great, and I look forward to merging it after some minor revisions!
```toml
# Support for sync to another SQLite database on the same machine
server-local = ["dep:rusqlite"]
# Support for sync via Git
server-git = ["git-sync"]
```
Why the two levels of features here? I think one would be sufficient. The cloud syncs have `cloud` because it enables some common functionality, but there's no such thing for git. So,

```toml
server-git = ["dep:serde_with", "dep:glob", "encryption"]
```

and update the `cfg(feature = ..)` in the code.
```rust
//!
//! - I haven't done any performance testing, but it seems reasonably quick for manual use.
//! - Currently it uses the same salt for all files. This isn't great security practice,
//!   but does seem to be what the other servers are doing.
```
This is a good point! In defense of the idea, the salt is used in the key derivation, and we use the same key for every file, so in that sense it's only used once. If there's further concern, let's open an issue about it -- I'm sure others would like to chime in too!
```rust
//!
//! Notes for Reviewers
//!
//! - I haven't done any performance testing, but it seems reasonably quick for manual use.
```
This seems fine. We have not focused on performance testing of the sync operations since most of the time is network-related.
```rust
//! create a 'task' branch and let TaskChampion manage that branch.
//! - This does support both defining a remote and having `local_only` mode set at the same
//!   time. The idea is that maybe the remote isn't ready yet, or is either temporarily or
//!   permanently down. Either way, you can use this in local mode in the mean time.
```
What are the risks here? In general, the more user-facing bits of the doc here might be better as docstrings on the configuration enum.
```rust
//!
//! - Since this shells out to git, it assumes that you have a reasonably functional git
//!   setup. I.e. 'git init', 'git add', 'git commit', etc. should just work.
//! - If you are using a remote, 'git push' and 'git pull' should work.
```
I suspect there's some room for more robustness here, such as disabling prompts. Maybe we can address that as we find issues, but it might be worth thinking about ahead of time.
I see that nothing needs to parse the output of a git command, so that simplifies things!
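One concrete way to disable prompts, sketched here as an illustration (the helper name is mine, not from the PR): git honors the `GIT_TERMINAL_PROMPT` environment variable, and setting it to `0` makes credential prompts fail fast instead of blocking a sync indefinitely.

```rust
use std::process::Command;

/// Sketch: build a git command that can never hang waiting for
/// interactive input. GIT_TERMINAL_PROMPT=0 causes git to error out
/// immediately rather than prompt for credentials on the terminal.
fn non_interactive_git(args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.args(args).env("GIT_TERMINAL_PROMPT", "0");
    cmd
}
```

For SSH remotes, a similar effect could come from setting `GIT_SSH_COMMAND` to pass `-oBatchMode=yes`, though that is a separate decision.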
```rust
}

/// Run a git command in a given directory, returning an error if it exits non-zero.
fn git_cmd(dir: &Path, args: &[&str]) -> Result<()> {
```
This ends up putting a lot of git's output in the cargo test output and probably in the taskwarrior output as well. What do you think of amending this function so that it only shows the output on an unexpected error, or logs it all to `log::debug` or something similar?
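The amendment suggested above might look like this sketch. It captures the child's stdout/stderr instead of inheriting them, and only surfaces the output on failure (error types are simplified to `String` here; the PR's `Result<()>` would differ):

```rust
use std::path::Path;
use std::process::Command;

/// Sketch: run a git command quietly, surfacing its output only on failure.
fn git_cmd_quiet(dir: &Path, args: &[&str]) -> Result<(), String> {
    let output = Command::new("git")
        .current_dir(dir)
        .args(args)
        .output() // captures stdout/stderr rather than inheriting them
        .map_err(|e| format!("failed to run git: {e}"))?;
    if output.status.success() {
        Ok(())
    } else {
        // Only on an unexpected error does the captured output reach the caller.
        Err(format!(
            "git {:?} failed: {}",
            args,
            String::from_utf8_lossy(&output.stderr)
        ))
    }
}
```

On success nothing is printed, so `cargo test` and taskwarrior output stay clean; a `log::debug!` call could additionally record the captured output in all cases.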
```rust
fs::create_dir_all(local_path)?;

// Check if path is already a git repo.
let is_repo = Command::new("git")
```
I think this doesn't use `git_cmd` because a nonzero exit status is expected, and similarly for `git checkout` below. Maybe `git_cmd` could be extended to support that situation?
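One way to extend the helper for this case, as a sketch (name and `String` error type are illustrative, not from the PR): return whether the command succeeded, treating a nonzero exit as an ordinary `Ok(false)` rather than an error.

```rust
use std::path::Path;
use std::process::Command;

/// Sketch: a git_cmd variant where a nonzero exit is an expected outcome,
/// reported as Ok(false) instead of an error.
fn git_cmd_check(dir: &Path, args: &[&str]) -> Result<bool, String> {
    let status = Command::new("git")
        .current_dir(dir)
        .args(args)
        .output() // capture output; only the exit status matters here
        .map_err(|e| format!("failed to run git: {e}"))?
        .status;
    Ok(status.success())
}
```

The `is_repo` probe then becomes something like `git_cmd_check(local_path, &["rev-parse", "--git-dir"])?`, and the `git checkout` case can use the same helper.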
```rust
git_cmd(
    local_path,
    &["clean", "-f", "--", "v-*", "snapshot", "meta"],
)?;
```
This happens at least twice and could be a helper function!
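The helper might be as small as this sketch (the function name is mine; the embedded `git_cmd` is a simplified stand-in for the PR's helper, with a `String` error type):

```rust
use std::path::Path;
use std::process::Command;

/// Minimal stand-in for the PR's git_cmd helper (error type simplified).
fn git_cmd(dir: &Path, args: &[&str]) -> Result<(), String> {
    let output = Command::new("git")
        .current_dir(dir)
        .args(args)
        .output()
        .map_err(|e| format!("failed to run git: {e}"))?;
    if output.status.success() {
        Ok(())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}

/// The suggested helper: one place for the repeated cleanup invocation,
/// removing untracked version/snapshot/meta files from the working tree.
fn clean_working_files(local_path: &Path) -> Result<(), String> {
    git_cmd(local_path, &["clean", "-f", "--", "v-*", "snapshot", "meta"])
}
```

Each call site then shrinks to `clean_working_files(local_path)?`, and any future change to the file patterns happens in one place.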
```rust
/// Fetch and fast-forward to the remote branch. No-op in local-only mode.
/// If the remote branch does not yet exist (e.g. fresh bare repo), this is also a no-op.
fn pull(&self) -> Result<()> {
```
This isn't really a pull, since it does a hard reset. Maybe `fn reset_to_remote`?
ryneeverett
left a comment
I don't see any mention of purging old versions or snapshots which suggests they will live on in git history indefinitely. This seems somewhat worse than other implementations which may not purge snapshots but at least truly delete versions. This doesn't necessarily need to be addressed in the initial implementation unless it informs the design. Have we thought about this?
I think there are several options for handling disk size with a git backend.
How confident are we that new replicas are slower to initialize? The assumption is that it is faster to download and index a snapshot than to traverse the additional versions it squashed, right?

Completely untested by me. I'm only basing it on the comments in the other servers' code and the TaskChampion book. But it does make sense that having to walk the entire history will be slower than starting from a reasonably recent point and only walking through updates since then. Though with also having to pull a full git repo, that may not be true.

I think we can push this particular problem into the future, rather than solving it here -- and I think keeping snapshots is the most future-compatible way to do that. Let's wrap this PR up and open an issue for bounding size.

If bounding size is actually going to happen then I agree that snapshots are essential. But if the problem is being pushed into the indefinite (possibly never) future then it might be better to reduce the impact of unbounded size.

That's fair, but I'll be optimistic (which is unusual for me) and say it's better to plan for the future being positive than to assume it will not. And, I worry that something like not including snapshots would leave us with library users that depend on the full history, meaning however well-intentioned future devs are, they can't fix this without breaking things. Realistically, this seems like something bite-sized that any of the three of us, given a bit of time, could do -- as could any motivated contributor. So if we see users suffering from excessive repo sizes, I think we'll see some pressure to fix it and that the fix will happen. Now, the WASM sync issues are another matter entirely...

If it is that easy then great. I haven't come up with any better ideas than the ones @carmiac listed, though, and it isn't clear that there is a robust option. I guess the worst-case scenario is that the client rebases and force-pushes the working tree as the initial commit? I was thinking that unbounded size might actually be tolerable for the vast majority of users who would choose git. I would see this as an alternative positive outcome.

I will say that for my use case, unbounded repo size doesn't really matter. If it ever gets too ridiculous, I could manually clean up the remote and then re-init the locals. It would be interesting to see what the average storage size per 1000 tasks is, to help get a handle on how much bike shedding is worth it. FWIW, in 2020 torvalds/linux took up about 4GB on disk with over a million commits. I think we'll be fine.

Frankly, I think your position supports my point -- we'll probably never be motivated to address this issue. So does it make sense to have the repo size grow many times faster in order to make it easier to stop the growth "someday maybe"?

I can see a path here where we don't store snapshots, but still look for them when initializing a replica. Then if we decide to do the followup work, we can start creating snapshots and ensure there is an active snapshot before deleting old data. However, that seems higher-risk than the alternative of landing this and following up (either soon or when there's user demand) with a mechanism for removing stale data. In particular, I don't think we have enough data to understand the performance of applying many, many versions when initializing a replica. Nor do we have enough data to be confident that storing versions but not snapshots is likely to provide enough space savings to make this a non-issue. Taskwarrior users have surprisingly large and frequently-changing task DBs. I'm inclined to land this PR more-or-less as-is, mark the issue for later followup, and see what happens.

Is there any more work needed for this?

Just rebase or merge to avoid the conflicts and I'm happy to merge to main.

Done!

Woo, we have a Git backend! Do you mind opening issues for the followups?
This adds a git backed sync server to TaskChampion.

Repo Layout

- `v-{parent_uuid}-{child_uuid}`, containing encrypted [HistorySegment] bytes.
- `snapshot`, containing a JSON wrapper around an encrypted full-state blob.
- A metadata file (`meta`) holds the latest version UUID and the encryption salt as JSON.

Writes

After each write (`add_version`, `add_snapshot`) the server stages the changed files, creates a commit, and pushes to the remote. If the push is rejected, the commit is rolled back and the caller receives an [AddVersionResult::ExpectedParentVersion] or an [Error] so it can retry.

After a snapshot is stored, [GitSyncServer::cleanup] automatically removes all version files whose history is now captured by the snapshot, keeping the repository compact.

Options

There are several configuration options.

- `branch` sets the branch to use for TaskChampion. This would let someone keep tasks alongside a project without dealing with history issues.
- `remote` sets the remote, in the same format that git uses.
- `local_only` makes `push` and `pull` no-ops. This could be used if the remote isn't ready yet, or is unavailable for some other reason.

General Notes for Reviewers