Ext authz caching#44874
Conversation
Enables cooperative caching between L7 ext_authz filter and CONE caching filter. - Updated ext_authz.proto with warning documentation. - Added raw_check_response to internal Response struct. - Implemented tryCacheHit in L7 filter to bypass external call. - Populated raw_check_response in gRPC and HTTP clients (with HTTP synthesis). - Recorded CheckResponse to dynamic metadata in L7 filter onComplete. - Added invalid_cached_response stat. - Added extensive unit tests. Signed-off-by: Todd Greer <tgreer@google.com>
Avoids creating temporary variables raw_check_response and error_response in RawHttpClientImpl::toResponse. Cleaned up trailing whitespace in common BUILD. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Inlined metadata variable, cleaned up make_unique Response construction, and added ASSERT to Denied fallback in L7 filter. Refactored cache test to use compound namespaces and removed empty constructor. Added CacheHitErrorFailClosed and CacheHitErrorFailOpen tests. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Added Cooperative Caching Bypass section to ext_authz_filter.rst describing metadata-based bypass, misses, and fallback. Added invalid_cached_response to statistics table. Addressed review comments regarding CONE mention and redundant security considerations. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Added release note to changelogs/current.yaml under new_features describing ext_authz cooperative caching bypass and the new invalid_cached_response statistic. Refined wording to refer to it as external authorization HTTP filter. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
Added docs and release notes. |
# Conflicts: # changelogs/current.yaml Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
Decoupling concerns is a nice goal, but this is very hard to use or configure. I think this needs a built-in cache implementation that does something sensible. This topic has come up before, and I think we should at least consider having the caching be an extension point on the ext_authz filter. What is the expected flow for inserting something into the cache? How does the message get from the ext_authz response to the caching filter? This requires an integration test. /wait |
|
Thank you for the quick initial review. I clearly didn't adequately explain the motivation--sorry. My team is working on a caching service for Envoy with the following goals:
From here on, I'm just going to talk about caching for ext_authz filters, but caching for ext_proc filters will use the same flows. (It just gets confusing otherwise because the caching filter is itself an ext_proc filter.) Users of this service will add an ext_proc filter downstream of their ext_authz filters. There are two new flows:
I'm leaving out a lot of details on the caching service, including mountains of configuration (various ttls, freshness and validation rules, cache key construction, variant handling, and so much more), because this PR is only about the ext_authz changes. BTW, this feature could be used for purposes other than caching. I could imagine a filter that applies some sort of custom logic to force the ext_authz filter to accept/reject a request. Perhaps there are more use cases. I'll add an integration test. Thank you, |
|
Why even use ext_authz? ext_proc is roughly a superset of what ext_authz can do. Why involve two filters and make this complicated relationship, when you could just have ext_proc send back an unauthorized response. |
Created ext_authz_cache_integration_test.cc with OK and Denied cache hit integration tests. Simulates the caching filter using header_to_metadata to dynamically inject Base64 CheckResponse. Verifies end-to-end gRPC bypass and header mutation/local reply logic in a real Envoy process. Registered in BUILD. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
Some of our proxies have multiple ext_authz filters. Their configurations have been debugged and work. They talk to mature authentication services owned by different groups. We want to put one caching ext_proc filter in front of them that makes one request to a caching service to look for cached responses for all of them. With this proposal, we add a single field to each of their configs (to say where in dynamic metadata the cached responses go), and we add one new ext_proc filter that is configured to talk to all of them. If we just use ext_proc to do the caching and multiple auth checks, we'd have to write a custom filter, because ext_proc can't talk to multiple external services. Additionally, we'd have to combine all of the config from the cache and from all of the original ext_* filters. Our proxy owners would not accept this solution. (In fact, their dislike of compiling/linking/configuring custom filters is why we started on this path.) Alternatively, we could use a single ext_proc filter, and move all the logic about contacting external services into the caching service. This would be worse, because it would move proxy-specific behaviors and config into what is intended to be a proxy-agnostic caching service. We're absolutely open to other ways to do this--this is the only good approach we've found, it's always possible that we may have overlooked something. BTW, when I wrote the design doc and the PR, I was unsure how much to go into these questions. (The ratio of design doc length to code length started to get pretty high.) I may have favored brevity too much. |
|
My concern is that you're adding code and complexity to ext_authz for a very specific use case that it's unlikely anyone else will use. |
Refactored ext_authz_cache_integration_test.cc incorporating your reviews: switched downstream to HTTP2, renamed initializeConfig() to initialize() override, removed redundant comments, and renamed the metadata key to cached_authz_response. Kept createUpstreams() to allocate secondary dynamic ports. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
It's much simpler than any other way to cache the results of ext_authz calls. If your concern is that we're the only ones who want to cache ext_authz, then my response is that there are several very different teams in Google with proxies that would like to use this, and Google isn't all that weird anymore, so something needed by several of our groups is probably also wanted by some non-Google users as well. Added integration tests. |
|
There's been plenty of interest in caching ext_authz response; I believe you on that. But this implementation is very specific and difficult for others to use. I'd like to see an implementation that is in some way usable for others. Maybe make an additional filter that is a cache implementation (with no external calls; just in-memory). I think using untyped dynamic metadata makes the interface harder to use; it would be better to fix ext_proc to support typed metadata. |
|
If we switch from untyped to typed metadata, do you think that would give this wide enough appeal? Seems like a reasonable approach. |
|
|
|
Ok, now that I've read some typed metadata docs, it again sounds reasonable. Please ignore my previous message. To use typed dynamic metadata: and another like it in ProcessingRequest.request, It would be somewhat less efficient, due to sending the type info to the caching service, but that's probably not a big deal. I prefer the existing version, due to its slightly better efficiency, but I'll see if the rest of the team raise any big concerns with using typed metadata. |
Switched L7 HTTP external authorization caching bypass to strongly-typed dynamic metadata. Replaced check_response_metadata_key with check_response_typed_metadata_namespace in API and C++ config. Refactored tryCacheHit() and onComplete() to directly read and write CheckResponse Any payloads under the configured namespace. Updated unit and integration test suites. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
Switched to typed dynamic metadata. You were right--this is better |
Hi @ggreenway , I just wanted to clarify and give a e2e example flow here: This PR is to give ext_authz a way to use cached results from ext_proc callout. In other words, we leverage the ext_proc here for external callout to caching service and other filters can use it as well: Regarding the in-memory cache option: Yes, it is viable option and it is more performant. But it has its own limitation: in-memory cache also is tied with Envoy instance (depends on implementation: per filter, thread local, or process wide). This will lead to inaccurate cache miss. In order to have a global view across envoy/ext_authz instances, I think we will need to have some kinds of external service/storage. With that said, i think we can provide multiple options here : remote vs in-memory for customer to choose based on their use cases and performance requirement. We are currently first implement the remote option due to our deployment requirement and use case. |
|
my 2c: RLQS is actually quite similar to what you'd want from a caching authz. The servers should specify a bucket and an expiration deadline for the decisions. The bucket is a matcher over the request so that you can group. I don't quite understand why you'd want to use |
|
I added some comments in #44852. I'm not sure this is really the best approach, given that this is something we're going to also need to support in gRPC. |
Introduces AuthCache and AuthCacheFactory interfaces to support pluggable caching. Integrates cache lookup, insert, and destroy into the ext_authz L7 filter. Configures the cache via the new cache field (TypedExtensionConfig) in ext_authz.proto. Added unit tests and configuration tests to verify cache hit, miss, and destruction behavior. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
- Removed check_response_typed_metadata_namespace from ext_authz.proto and renumbered cache to 33. - Removed obsolete ext_authz_cache_test and ext_authz_cache_integration_test. - Introduced initiating_lookup_ flag to Filter to correctly handle synchronous cache lookups. - Added CacheHitOkSync test to verify synchronous cache hit behavior. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
TAG=agy CONV=4e84dffc-a952-4624-bf5e-53e7036c5a07 Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Implement a simple in-memory cache extension for ext_authz and use it in integration tests to verify cache miss (populates cache) and cache hit (bypasses ext_authz call) behavior. TAG=agy CONV=4e84dffc-a952-4624-bf5e-53e7036c5a07 Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Signed-off-by: toddmgreer <toddmgreer@gmail.com>
|
@kyessenov The reason for this PR is that we have envoy deployments that currently use ext_authz, and we want to reduce their average latency and server load. We want to do this with minimal changes in the config of those envoys, and without requiring that they link in custom filters, and the many of the auth decisions are highly cacheable in nature. The cache server will indeed be configured with a specification of how to match requests (like RLQS's buckets*) and with expiry information. We considered something like a conditional feature, but that would only work well if the auth decisions were only accept/reject. However, they routinely include header modification, and we don't want to replicate that behavior into a different filter. Instead, we want to avoid the RPC to the auth server, but still let the ext_authz filter apply the decision as it does today.
|
- Fix compilation error in ext_authz_http_impl.cc by removing redundant std::make_unique. - Fix ext_authz_test.cc merge corruption by reordering and rejoining split tests. TAG=agy CONV=69b8b2f4-28b3-4ae7-a4ac-9da477171a74 Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Signed-off-by: Todd Greer <toddmgreer@gmail.com>
| * @param parent_span The parent span for tracing. | ||
| * @param stream_info The stream info. | ||
| */ | ||
| virtual void lookup(const envoy::service::auth::v3::CheckRequest& request, LookupCallback&& cb, |
There was a problem hiding this comment.
I don't think this interface should use CheckRequest as the argument type, because we don't want the data plane to have to construct the ext_authz request proto if it doesn't actually have to send the ext_authz request.
I'm not sure what the right type is to use in Envoy's implementation, but I think what we want here is to pass in the request attributes.
There was a problem hiding this comment.
I'm guessing that you don't want me to pass the AttributeContext messages, because constructing them is most of the cost of constructing a RequestCheck. To minimize cost, I switched to passing the objects that ext_authz uses as inputs to construct the RequestCheck, or the decoder_callbacks for things that can be reached through that. This works well for our internal cache implementation and seems like a reasonable choice for the general case.
There was a problem hiding this comment.
All comments are addressed, and the PR should be ready for further review.
Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Refactor the ext_authz filter to avoid constructing the CheckRequest protobuf message on cache hits. - Introduced `RequestAttributes` struct to hold raw references to Envoy objects needed for cache lookup and request construction. `RequestAttributes` is cached in the `Filter` class. - Modified `AuthCache::lookup` to accept `RequestAttributes` instead of `CheckRequest`. - Updated `Filter::initiateCall` to collect attributes first, attempt cache lookup, and only construct `CheckRequest` on cache miss. - Cached `merged_per_route_config_` in `Filter` to avoid multiple traversals of route configs, resolving test expectation failures. - Updated relevant tests and mocks to match the new signatures. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Rename "Cooperative Caching Bypass" to "ExtAuthz Caching" and update it to describe the pluggable caching mechanism (via the `cache` field). Remove reference to the obsolete `invalid_cached_response` statistic. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
As requested by the user, add the invalid_cached_response statistic back to the documentation. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Remove the `raw_check_response` field from the `Response` struct, as it is no longer needed. Update HTTP and gRPC client implementations and tests to remove usages of this field. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Rename processResponse to onComplete in the HTTP filter. Merge processResponse logic into onComplete and handle cache insertion conditionally based on whether the response came from the cache or a network call. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Remove a comment in collectAttributes that incorrectly stated we would definitely be making a network request. With caching enabled, we might have a cache hit and avoid the network request. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
markdroth
left a comment
There was a problem hiding this comment.
This looks like the right approach! Just one small comment from my side.
Clean up formatting in ext_authz_grpc_impl_test.cc and ext_authz_http_impl_test.cc. Reorder imports in ext_authz.proto to match alphabetical order. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
Add the [#extension-category: envoy.filters.http.ext_authz.cache] annotation to the cache field in ExtAuthz proto, replacing the manual comment description. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
This reverts commit 0125d5e. Signed-off-by: Todd Greer <toddmgreer@gmail.com>
|
/publish |
|
/kick |
|
/lgtm api |
|
@ggreenway, @tyxia, and @yanjunxiang-google, the api review is done so you're up! I think the failure to restore the docker cache in verify_distro must be a flake, but my attempts to trigger a retest didn't work. Maybe you have to be a maintainer? Should I push an empty commit? |
Commit Message: Cache-enable Envoy External Authentication HTTP Filter
Additional Description: Give ext_authz a way to be overridden by cached authentication results and to inform a cache of the same.
Risk Level: Low. The new feature is only active when configured, and is limited in scope.
Testing: See new tests in PR.
Docs Changes: In ext_authz_filter.rst.
Release Notes: In current.yaml
Platform Specific Features: none
Fixes #44852
I used the Gemini generative AI for both code and tests. I have reviewed and understand all generated code, though I'm new to ext_authz, so I could have misunderstandings.