feat(metrics): add optional id and properties fields to metrics output#5983
feat(metrics): add optional id and properties fields to metrics output#5983not4s wants to merge 8 commits into
Conversation
8cdc240 to
e5233da
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5983 +/- ##
==========================================
+ Coverage 83.11% 83.12% +0.01%
==========================================
Files 277 277
Lines 30207 30259 +52
==========================================
+ Hits 25106 25154 +48
- Misses 5101 5105 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
aaebe7b to
dee5388
Compare
dee5388 to
c5ba4ef
Compare
4afe981 to
ec7e4cc
Compare
| /// Maximum number of metrics properties that can be configured. | ||
| const MAX_PROPERTIES: usize = 10; | ||
| /// Maximum length (in bytes) of a metrics property key. | ||
| const MAX_KEY_LEN: usize = 64; | ||
| /// Maximum length (in bytes) of a metrics property value. | ||
| const MAX_VALUE_LEN: usize = 512; |
There was a problem hiding this comment.
Maybe I don't remember, but was there a decision to put these restrictions? imho these look very arbitrary and I don't see a lot of value in them.
There was a problem hiding this comment.
I agree that they are a bit arbitrary, but have communicated with mancio and Nemo on this. Would you say even the cap on the number of properties doesn't have a lot of value? Although the strength for the case is a bit different, I do see that there are cases where we validate input from operator input such as MAX_SUPPORTED_VCPUS or CMDLINE_MAX_SIZE
| /// Configures the metrics as described in `metrics_cfg`. | ||
| pub fn init_metrics(metrics_cfg: MetricsConfig) -> Result<(), MetricsConfigError> { | ||
| if let Some(properties) = &metrics_cfg.properties { | ||
| validate_properties(properties)?; |
There was a problem hiding this comment.
you can move this validate to the lower if check, moves all properties code in one place
| /// Enables emission of the `id` field. Returns an error if already enabled. | ||
| pub fn enable(&self) -> Result<(), MetricsError> { | ||
| if self.enabled.swap(true, Ordering::Relaxed) { | ||
| return Err(MetricsError::AlreadyInitialized); |
There was a problem hiding this comment.
Is this really an error to enable id multiple times?
There was a problem hiding this comment.
You're right, technically speaking this is an AtomicBool not a OnceLock so isn't really an error. But was thinking we might want to be consistent with the other types since the AtomicBool here is intended to be set once.
b45143b to
f5cdd70
Compare
Allow operators to attach custom key-value properties to the metrics config, via PUT /metrics or the boot config file. When set, the properties are emitted under a top-level "properties" field on every metrics line; when omitted, output is unchanged for backwards compatibility. Properties are set once at config time (pre-boot) and fixed for the VM's lifetime, matching how metrics are configured today. They are bounded (max 10 keys, 64-byte keys, 512-byte values) to keep metrics lines from bloating. This lets operators tag a microVM's metrics with their own context, such as the customer or bundle running in it. Signed-off-by: Jay Chung <jaehoc@amazon.com>
The metrics-line validator builds a strict jsonschema (additionalProperties: False), so the optional top-level "properties" map emitted by configurable metrics properties would be rejected as an unexpected key. Register "properties" in the schema as an optional object with string values, without marking it required. Real metrics stay strictly validated; the key is tolerated when present and ignored when absent. Signed-off-by: Jay Chung <jaehoc@amazon.com>
Add an "Optional metrics fields" section describing the operator-defined `properties` map: how to set it via the API or a config file, its size bounds, and example output. Also document it in the swagger Metrics definition. Signed-off-by: Jay Chung <jaehoc@amazon.com>
Operators shipping metrics from many microVMs to a shared sink (e.g. one CloudWatch log group) cannot tell which VM a line came from, since lines carry no identifier. Add an opt-in `emit_id` field to the metrics configuration that emits the microVM instance id (the jailer `--id`) under a top-level `id` field on every metrics line. Keep it independent of the custom `properties` map: either can be enabled without the other. Both default to off, so the line shape is unchanged for existing consumers. Signed-off-by: Jay Chung <jaehoc@amazon.com>
Tolerate the top-level `id` field as an optional string in the metrics schema validator, mirroring the existing `properties` tolerance, while keeping every real metric strictly validated. Signed-off-by: Jay Chung <jaehoc@amazon.com>
Add the `id` field to the "Optional metrics fields" docs and the swagger Metrics definition: how `emit_id` enables it, and example output showing it alongside the properties. Signed-off-by: Jay Chung <jaehoc@amazon.com>
Record the two new opt-in metrics fields, `emit_id` and `properties`, under the Unreleased section as a single entry for the PR. Signed-off-by: Jay Chung <jaehoc@amazon.com>
Cover the emitted metrics line shape, which the unit tests cannot reach since they exercise the serializer in isolation rather than a running microVM. Assert each combination of the two opt-in fields: neither set, `emit_id` alone, `properties` alone, and both together, checking that the `id` and `properties` keys appear (or not) and carry the configured values. Signed-off-by: Jay Chung <jaehoc@amazon.com>
f5cdd70 to
bd6f9f0
Compare
Changes
Add two optional, independent fields to the metrics configuration (
PUT /metricsand themetricsblock of a config file), each emitted as a top-level key on every metrics line and each off by default:emit_id(boolean): emits the microVM instance id (the value passed to the jailer--id) under a top-levelidfield.properties(map of string to string): emits operator-defined key-value pairs under a top-levelpropertiesfield.When neither is configured, the metrics line is byte-for-byte unchanged, so existing consumers are unaffected.
Example line with both enabled:
{"utc_timestamp_ms": 1739000000000, "id": "my-instance", "properties": {"bundle_id": "fn-abc", "customer_id": "1234"}, "api_server": {"...": 0}}Reason
Operators that collect metrics from many microVMs into a single destination (for example, a log agent that globs every VM's metrics file into one log stream) cannot tell which microVM a given line came from, because the lines carry no identifier. The original feature request asked for arbitrary key-value tagging to attribute lines to a VM; subsequent discussion clarified that the primary need is simply a stable, authoritative identifier, with operator tags as an enrichment.
Design decisions
id as a separate top-level field, not a key inside properties. An earlier idea was to auto-populate the instance id inside the properties map. A separate id field was chosen instead, for several reasons:
Two independent opt-in switches: Both default to off so the default line shape is unchanged (backwards compatibility). emit_id is a boolean toggle modeled on the existing LoggerConfig show_level / show_log_origin toggles.
No CLI flag for properties: Structured data is supplied through the API or a config file, not the command line. This matches convention that CLI arguments are scalars, paths, and booleans, with structured input provided by file (as MMDS content already does). --metrics-path continues to configure only the path.
Omitted when unset, not emitted empty: When a field is not configured its key is absent entirely.
Snapshot behavior unchanged: Metrics configuration is not part of the snapshot and is reconfigured on a restored microVM; this change adds nothing to the snapshot surface.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.