cache: Added consistent reads for cache #21428

Merged
serathius merged 1 commit into etcd-io:main from akstron:dev/cache-consistent-read on Mar 23, 2026

@akstron (Contributor) commented Mar 4, 2026

The cache now supports consistent reads: a Get takes the consistent read path when op.IsSerializable() is false. Based on: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2340-Consistent-reads-from-cache

Cache.Get Request Flow

cache.Get(ctx, key, opts...)
│
├── store.LatestRev == 0?
│   └── yes ──► WaitReady(ctx) ── error ──► return error
│
├── validateGet(key, op) ── error ──► return error
│
├── op.IsSerializable()?
│   │
│   ├── YES (serializable / fast local path)
│   │   │
│   │   └───────────────────────────────────────────────────┐
│   │                                                       │
│   └── NO  (non-serializable / consistent read path)       │
│       │                                                   │
│       ▼                                                   │
│   waitTillRevision(ctx, requestedRev)                     │
│       │                                                   │
│       ▼                                                   │
│   serverRevision(ctx)                                     │
│   linearizable kv.Get(prefix, Limit(1), KeysOnly)         │
│       ──► etcd leader                                     │
│       │                                                   │
│       ├── error ──► return error                          │
│       │                                                   │
│       ▼                                                   │
│   rev == 0?                                               │
│       └── yes ──► rev = serverRev                         │
│       │                                                   │
│       ▼                                                   │
│   serverRev < rev?                                        │
│       └── yes ──► return ErrFutureRev                     │
│       │                                                   │
│       ▼                                                   │
│   localRevision() >= rev?                                 │
│       └── yes ──────────────────────────────────────┐     │
│       │                                             │     │
│       ▼                                             │     │
│   progressRequestor.add()                           │     │
│   (wakes background RequestProgress loop)           │     │
│       │                                             │     │
│       ▼                                             │     │
│   ┌─────────────────────────────┐                   │     │
│   │  Poll loop                  │                   │     │
│   │  ticker: 50ms (PollInterval)│                   │     │
│   │  timeout: 3s  (WaitTimeout) │                   │     │
│   │                             │                   │     │
│   │  localRevision() >= rev? ───┼── yes ────────────┤     │
│   │       │ no                  │                   │     │
│   │       ▼                     │                   │     │
│   │  select:                    │                   │     │
│   │  ├─ ticker  ──► loop back   │                   │     │
│   │  ├─ timeout ──► return ErrCacheTimeout          │     │
│   │  └─ ctx.Done ──► return ctx.Err()               │     │
│   └─────────────────────────────┘                   │     │
│       │                                             │     │
│       ▼                                             │     │
│   progressRequestor.remove()  (deferred)            │     │
│   (background loop stops sending RequestProgress    │     │
│    once waiting drops to 0)                         │     │
│                                                     │     │
│   ◄─────────────────────────────────────────────────┘     │
│   ◄───────────────────────────────────────────────────────┘
│
▼
store.Get(startKey, endKey, requestedRev)
│
├── error ──► return error
│
▼
return GetResponse { Header.Revision, Kvs, Count }
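
The poll loop in the consistent read path can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `waitForRevision`, `errCacheTimeout`, and `simulate` are hypothetical names, and the `localRev` callback stands in for the cache's `localRevision()`. The diagram uses a 50ms poll interval and a 3s timeout; the sketch uses shorter values so it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// errCacheTimeout is a hypothetical stand-in for ErrCacheTimeout in the diagram.
var errCacheTimeout = errors.New("cache: timed out waiting for revision")

// waitForRevision polls localRev until it reaches rev, the wait timeout
// elapses, or ctx is cancelled. All names here are assumptions.
func waitForRevision(ctx context.Context, localRev func() int64, rev int64,
	pollInterval, waitTimeout time.Duration) error {
	if localRev() >= rev {
		return nil // fast path: the cache has already caught up
	}
	ticker := time.NewTicker(pollInterval)
	defer ticker.Stop()
	timeout := time.NewTimer(waitTimeout)
	defer timeout.Stop()
	for {
		if localRev() >= rev {
			return nil
		}
		select {
		case <-ticker.C: // loop back and re-check the local revision
		case <-timeout.C:
			return errCacheTimeout
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// simulate advances a fake local revision to rev after delayMs milliseconds
// and reports what waitForRevision returned under a timeoutMs budget.
func simulate(delayMs, timeoutMs int, rev int64) error {
	var local atomic.Int64
	timer := time.AfterFunc(time.Duration(delayMs)*time.Millisecond, func() { local.Store(rev) })
	defer timer.Stop()
	return waitForRevision(context.Background(), local.Load, rev,
		5*time.Millisecond, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	fmt.Println(simulate(20, 1000, 10)) // cache catches up before the timeout
	fmt.Println(simulate(1000, 50, 10)) // cache lags past the timeout
}
```

Checking the revision once before arming the timers keeps the common already-caught-up case free of timer allocations, which matches the `localRevision() >= rev?` shortcut in the diagram.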

Background: conditionalProgressRequestor.run(ctx)

┌──────────────────────────────────────────────────────┐
│  Blocks on cond.Wait() while waiting == 0            │
│                                                      │
│  When waiting > 0:                                   │
│    timer: 100ms (RequestInterval)                    │
│    sends watcher.RequestProgress(ctx)                │
│    ──► etcd responds with progress notification      │
│    ──► watch stream delivers revision update         │
│    ──► store.LatestRev advances                      │
│                                                      │
│  When waiting drops to 0:                            │
│    timer resets to 0, loop re-checks, blocks again   │
│                                                      │
│  ctx.Done ──► return                                 │
└──────────────────────────────────────────────────────┘

The progress loop only sends RequestProgress when at least one
waitTillRevision call is actively waiting, avoiding unnecessary
server round-trips for serializable-only workloads.
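
A rough sketch of this conditional loop, assuming a `sync.Cond`-guarded waiter count as the box above suggests. It is not the PR's implementation: the type `progressRequestor`, its fields, and the `demo` helper are hypothetical, and the `request` callback stands in for `watcher.RequestProgress(ctx)`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// progressRequestor sends progress requests only while at least one caller
// is waiting. Names and fields are assumptions for illustration.
type progressRequestor struct {
	mu       sync.Mutex
	cond     *sync.Cond
	waiting  int
	interval time.Duration
	request  func() // stand-in for watcher.RequestProgress
}

func newProgressRequestor(interval time.Duration, request func()) *progressRequestor {
	p := &progressRequestor{interval: interval, request: request}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// add registers a waiter and wakes the background loop.
func (p *progressRequestor) add() {
	p.mu.Lock()
	p.waiting++
	p.mu.Unlock()
	p.cond.Signal()
}

// remove unregisters a waiter; the loop blocks again once the count hits 0.
func (p *progressRequestor) remove() {
	p.mu.Lock()
	p.waiting--
	p.mu.Unlock()
}

func (p *progressRequestor) run(ctx context.Context) {
	// Wake a blocked cond.Wait when ctx is cancelled so run can return.
	go func() {
		<-ctx.Done()
		p.mu.Lock()
		p.cond.Broadcast()
		p.mu.Unlock()
	}()
	for {
		p.mu.Lock()
		for p.waiting == 0 && ctx.Err() == nil {
			p.cond.Wait()
		}
		p.mu.Unlock()
		if ctx.Err() != nil {
			return
		}
		p.request()
		select {
		case <-time.After(p.interval):
		case <-ctx.Done():
			return
		}
	}
}

// demo runs the loop for about 50ms and returns how many requests were sent.
func demo(withWaiter bool) int64 {
	var sent atomic.Int64
	p := newProgressRequestor(10*time.Millisecond, func() { sent.Add(1) })
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() { p.run(ctx); close(done) }()
	if withWaiter {
		p.add()
	}
	time.Sleep(50 * time.Millisecond)
	if withWaiter {
		p.remove()
	}
	cancel()
	<-done
	return sent.Load()
}

func main() {
	fmt.Println("requests sent with no waiters:", demo(false))
	fmt.Println("requests sent with one waiter:", demo(true))
}
```

The broadcast on cancellation is done while holding the mutex so it cannot slip in between the loop's condition check and its call to `cond.Wait()`.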

Part of #19371

@k8s-ci-robot

Hi @akstron. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@akstron akstron changed the title [WIP] cache: Added consistent read for cache [WIP] cache: Added consistent reads for cache Mar 4, 2026
@akstron akstron changed the title [WIP] cache: Added consistent reads for cache cache: Added consistent reads for cache Mar 4, 2026
@akstron akstron marked this pull request as ready for review March 4, 2026 17:46
@serathius (Member)

Please sign the DCO https://github.com/etcd-io/etcd/pull/21428/checks?check_run_id=65752362584

func revLessThan(n int64) func(int64) bool { return func(r int64) bool { return r < n } }
func revGreaterEqual(n int64) func(int64) bool { return func(r int64) bool { return r >= n } }

func TestCacheConsistentRead(t *testing.T) {

Member

Integrate with TestGet

Contributor Author

getTestCases also checks the revision, which causes problems when I try to integrate. I want to send Put requests outside the prefix to advance the revision before every Get request, but that also advances the server revision, which each test case checks.

example:

=== RUN   TestCacheConsistentRead/fromKey_/foo/b
    cache_test.go:638: revision: got 93, want 8

Member

It's good that it checks the revision. That's the way to go.

Contributor Author

Should I update getTestCases according to the new behaviour?

@serathius (Member) commented Mar 6, 2026

Yes, please.

Contributor Author

Done, thanks. I also updated the testWithPrefixGet test cases.

@serathius (Member) commented Mar 5, 2026

From https://github.com/etcd-io/etcd/pull/21428/checks?check_run_id=65752362584

want: Author: Alok Kumar Singh, Committer: Alok Kumar Singh; Expected "Alok Kumar Singh <alokkumar.singh@alokkum-ltmybqp.internal.salesforce.com>", but got "Alok Kumar Singh <dev.alok.singh123@gmail.com>".

@akstron akstron force-pushed the dev/cache-consistent-read branch from f532d80 to e54ddc6 Compare March 5, 2026 13:46
@serathius (Member)

/ok-to-test

@codecov (bot) commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.44%. Comparing base (84dc6e5) to head (a1cb0a2).
⚠️ Report is 25 commits behind head on main.

Additional details and impacted files

see 24 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21428      +/-   ##
==========================================
- Coverage   68.49%   68.44%   -0.06%     
==========================================
  Files         428      428              
  Lines       35291    35372      +81     
==========================================
+ Hits        24173    24209      +36     
- Misses       9723     9756      +33     
- Partials     1395     1407      +12     

@akstron akstron force-pushed the dev/cache-consistent-read branch from a102232 to 6c698b1 Compare March 22, 2026 17:16
@akstron akstron force-pushed the dev/cache-consistent-read branch 3 times, most recently from 9d89dce to 2d67fa3 Compare March 22, 2026 17:43
@akstron akstron force-pushed the dev/cache-consistent-read branch from 2d67fa3 to dc7cee7 Compare March 22, 2026 18:55
		opts: []clientv3.OpOption{clientv3.WithSerializable()},
	},
	{
		name: "single key /foo/a serializable at rev=latest+1 (future), returns error",

Member

Please add serializable read at rev=latest

Contributor Author

This would be a flaky test: the cache might not have caught up yet, which would cause rpctypes.ErrFutureRev. Should I add the case with baseRev instead, by updating optsFunc to accept baseRev as well?

@akstron akstron force-pushed the dev/cache-consistent-read branch 2 times, most recently from 2240136 to 3f2b88a Compare March 23, 2026 14:52
Signed-off-by: Alok Kumar Singh <dev.alok.singh123@gmail.com>
@akstron akstron force-pushed the dev/cache-consistent-read branch from 3f2b88a to a1cb0a2 Compare March 23, 2026 15:02
t.Run("cache_already_caught_up", func(t *testing.T) {
	c, _ := newCacheForWaitTest(10, 10, newTestProgressRequestor())

	if err := c.waitTillRevision(context.Background(), 10); err != nil {

Member

Nit: if waitTillRevision has a bug, this test will block for a very long time. It would be good to have a timeout based on the expected duration.
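
One way to bound such a call, sketched under assumptions: run it in a goroutine and fail if it has not returned within a limit derived from the expected duration. The helper `withDeadline`, the error `errTooSlow`, and the `check` wrapper are hypothetical names, not part of this PR; in a real test the error would typically be reported with `t.Fatal`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTooSlow reports that the call under test exceeded its time budget.
var errTooSlow = errors.New("call did not return within the expected time")

// withDeadline runs fn in a goroutine and returns errTooSlow if fn has not
// returned within limit. It works even if fn ignores context cancellation.
func withDeadline(limit time.Duration, fn func() error) error {
	done := make(chan error, 1) // buffered so a late fn never blocks on send
	go func() { done <- fn() }()
	select {
	case err := <-done:
		return err
	case <-time.After(limit):
		return errTooSlow
	}
}

// check simulates a call that takes workMs milliseconds against a budget of
// limitMs milliseconds.
func check(limitMs, workMs int) error {
	return withDeadline(time.Duration(limitMs)*time.Millisecond, func() error {
		time.Sleep(time.Duration(workMs) * time.Millisecond)
		return nil
	})
}

func main() {
	fmt.Println(check(200, 1))  // fast call, well within the budget
	fmt.Println(check(20, 500)) // slow call, exceeds the budget
}
```

Passing a `context.WithTimeout` context to the call is simpler, but it only helps if the code under test actually honours cancellation, which is exactly what a bug might break.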

t.Run("timeout", func(t *testing.T) {
	c, _ := newCacheForWaitTest(10, 5, newTestProgressRequestor())

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)

Member

This test waits 10 seconds on every execution. First, that is too long for the happy path. Second, using https://go.dev/blog/synctest might help with the test design here.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akstron, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@serathius serathius merged commit b83b67e into etcd-io:main Mar 23, 2026
33 checks passed
@serathius (Member) commented Mar 23, 2026

Great job! Please address the comments and TODOs in follow-up PRs.

@serathius (Member)

The next step would be to implement waitTillRevision using signals rather than polling.
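
A signal-based wait could look roughly like the sketch below, which is only one possible design and not something this PR or its follow-ups are known to use. The type `revisionNotifier` and the `demo` helper are hypothetical. The idea is a channel that is closed and replaced each time the revision advances, so waiters wake immediately instead of on the next poll tick.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// revisionNotifier tracks the latest revision and wakes waiters on change.
// All names here are assumptions for illustration.
type revisionNotifier struct {
	mu      sync.Mutex
	rev     int64
	changed chan struct{} // closed and replaced each time rev advances
}

func newRevisionNotifier() *revisionNotifier {
	return &revisionNotifier{changed: make(chan struct{})}
}

// advance records a newer revision and wakes every current waiter.
func (n *revisionNotifier) advance(rev int64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if rev <= n.rev {
		return // ignore stale or duplicate updates
	}
	n.rev = rev
	close(n.changed)
	n.changed = make(chan struct{})
}

// wait blocks until the revision reaches rev or ctx is done.
func (n *revisionNotifier) wait(ctx context.Context, rev int64) error {
	for {
		// Snapshot the revision and channel together so no update is missed.
		n.mu.Lock()
		cur, ch := n.rev, n.changed
		n.mu.Unlock()
		if cur >= rev {
			return nil
		}
		select {
		case <-ch: // revision advanced; loop and re-check
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo advances the revision to advanceTo shortly after a waiter starts
// waiting for want, with a timeoutMs budget.
func demo(advanceTo, want int64, timeoutMs int) error {
	n := newRevisionNotifier()
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	go func() {
		time.Sleep(5 * time.Millisecond)
		n.advance(advanceTo)
	}()
	return n.wait(ctx, want)
}

func main() {
	fmt.Println(demo(10, 10, 1000)) // revision reaches the target
	fmt.Println(demo(5, 10, 30))    // revision stays behind; the context expires
}
```

Compared with polling, this removes the latency of waiting for the next tick and avoids waking up when nothing has changed, at the cost of allocating a channel on every revision update.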
