Skip to content

feat(ide/config): add unified config support to the static-analyzer server#923

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 9 commits into
mainfrom
rob/feat/support-code-security-unified-config-file-K9CODESEC-2303
Jun 25, 2026
Merged

feat(ide/config): add unified config support to the static-analyzer server#923
gh-worker-dd-mergequeue-cf854d[bot] merged 9 commits into
mainfrom
rob/feat/support-code-security-unified-config-file-K9CODESEC-2303

Conversation

@robertohuertasm-datadog

@robertohuertasm-datadog robertohuertasm-datadog commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Context

As part of the work described in Section 4 — Prerequisite: Unified Config File Support, the static-analyzer server needs to understand both the legacy static-analysis.datadog.yml format and the new unified code-security.datadog.yaml format (schema version v1.0), so the VS Code extension can drive mutations against whichever file the user has.

The dd-source backend already supports schema_version=v1 (unified) on POST /api/v2/static-analysis/config/client. This PR covers only the local static-analyzer server side — the VS Code extension changes will follow in a separate PR.

Problem

The server had two issues that would prevent it from being usable as a unified-config host:

  1. Production parsing was locked to legacy. from_yaml_content called parse_any_schema_yaml only under cfg!(test) and file_legacy::parse_yaml in release builds. A code-security.datadog.yaml sent to the server would silently be parsed as legacy content — or fail without a useful error — making every mutation endpoint incorrect for unified files.

  2. No format signal on mutation/validation requests. Endpoints like ignore-rule, add-rulesets, and can-onboard had no way for the caller to declare what format the file is in. Without that signal, the server could not validate that the content matches the declared filename format (matching CLI behaviour), could not pick the right empty-file default when creating a new config, and could not produce the correct YAML serialization on write-back.

  3. Blank-content edge case. When an existing but empty code-security.datadog.yaml (e.g. just created, content = base64("\n")) was sent to add-rulesets, from_yaml_content's early-return for blank content always yielded a legacy default, which then failed validate_format(Unified) with SchemaMismatch even though there was nothing wrong.

Approach: Option A — schema_version field on existing routes, no new routes

Following the design in the RFC above, format is signalled via an optional schema_version field added to the JSON body of existing endpoints. This avoids proliferating product-scoped routes: the IDE determines format from the filename, sends it as "v1" (unified) or omits it (legacy), and the server validates accordingly.

Backward compatibility is a hard requirement. Old extensions that never send schema_version must continue to work unchanged. This is guaranteed by:

  • schema_version is #[serde(default)] Option<String> on every request struct — missing field deserializes as None.
  • NoneConfigFormat::Legacy everywhere.
  • No #[serde(deny_unknown_fields)] on request structs, so a new extension talking to an old binary simply has its schema_version field ignored.

What changed

static_analysis_config_file.rs

  • Un-gated dual-format parsing. Removed the cfg!(test) guard; production now always calls parse_any_schema_yaml, which detects the actual schema variant and stores it in WithVersion<Legacy, CodeSecurity>.
  • ConfigFormat enum + detected_format() accessor. ConfigFormat::Legacy / ConfigFormat::Unified derived from the WithVersion discriminant. Used everywhere format validation is needed.
  • validate_format(declared) method returns Err(SchemaMismatch) when detected ≠ declared, mirroring CLI filename-strict behaviour.
  • default_unified() constructor. Returns an empty WithVersion::CodeSecurity config — used when add-rulesets is called with no existing content and schema_version=v1.
  • WithHint<T> wrapper + new TryFrom impls. Replaces the old format-unaware TryFrom<String> / TryFrom<Base64String>. WithHint<T>(T, Option<ConfigFormat>) carries the caller's declared format so from_yaml_content can pick the right empty default. Callers with schema_version pass Some(format); deprecated/legacy routes pass None.
  • Bug fix — blank content edge case. from_yaml_content now accepts Option<ConfigFormat> hint; when content is blank it returns default_unified() for Some(Unified) and the legacy default otherwise. This prevents add-rulesets and ignore-rule from failing with SchemaMismatch when a code-security.datadog.yaml exists but is empty.
  • with_ignored_rule and with_added_rulesets updated to forward declared_format via WithHint.

models.rs

Added #[serde(default)] schema_version: Option<String> to:

  • IgnoreRuleRequest
  • AddRuleSetsRequest
  • CanOnboardRequest
  • ParseConfigRequest

endpoints.rs

  • post_parse_config — now accepts schema_version; constructs via WithHint(configuration, Some(format)), then calls validate_format — returns 400 on mismatch.
  • post_can_onboard_v2 — now accepts schema_version; same parse + validate pattern; returns 500 on mismatch (existing error path the client already treats as blocking).
  • post_ignore_rule / add_rulesets — delegate to the updated with_ignored_rule / with_added_rulesets which now carry the format hint.
  • Deprecated routes (get_can_onboard, get_rulesets) updated to WithHint(..., None) — legacy-default, no behaviour change.

error.rs

Added SchemaMismatch variant (code: 4) to ConfigFileError.

mod.rs (IDE integration tests)

Extended the test suite with coverage for:

  • Strict parse: unified content + schema_version=v1 → 200; legacy content + schema_version=v1 → 400 (SchemaMismatch); unified content + no schema_version → 400.
  • Mutation (add-rulesets, ignore-rule) format validation.
  • can-onboard with schema_version.
  • Backward compat: legacy content + no schema_version → behaves as today.
  • Empty content + schema_version=v1 → creates a unified default file.

file_v1.rs

Added Default impl for YamlConfigFile, needed by default_unified().

Validation

cargo test ->? 77 tests, all pass
cargo fmt -- --check -> clean
cargo clippy -- -D warnings -> clean

Retrocompatibility: Tested this binary with the current extension and it works.

What's not in this PR

The VS Code extension changes (unified-first discovery, sending schema_version in request bodies, binary version gate, etc.) are the next step and will be in a separate PR against the datadog-vscode repo.

K9CODESEC-2169
K9CODESEC-2303

…erver

As part of the work described in [Section 4 — Prerequisite: Unified Config File Support](https://docs.google.com/document/d/1L4rdpTwafiVxFn1HIlSw0f6cJ61GihDOzMFvZwuzlMw/edit?tab=t.0#heading=h.mr90lg23mmfd), the static-analyzer server needs to understand both the legacy `static-analysis.datadog.yml` format and the new unified `code-security.datadog.yaml` format (schema version `v1.0`), so the VS Code extension can drive mutations against whichever file the user has.

The dd-source backend already supports `schema_version=v1` (unified) on `POST /api/v2/static-analysis/config/client`. This PR covers only the **local static-analyzer server** side — the VS Code extension changes will follow in a separate PR.

The server had two issues that would prevent it from being usable as a unified-config host:

1. **Production parsing was locked to legacy.** `from_yaml_content` called `parse_any_schema_yaml` only under `cfg!(test)` and `file_legacy::parse_yaml` in release builds. A `code-security.datadog.yaml` sent to the server would silently be parsed as legacy content — or fail without a useful error — making every mutation endpoint incorrect for unified files.

2. **No format signal on mutation/validation requests.** Endpoints like `ignore-rule`, `add-rulesets`, and `can-onboard` had no way for the caller to declare what format the file is in. Without that signal, the server could not validate that the content matches the declared filename format (matching CLI behaviour), could not pick the right empty-file default when creating a new config, and could not produce the correct YAML serialization on write-back.

3. **Blank-content edge case.** When an existing but empty `code-security.datadog.yaml` (e.g. just created, content = `base64("\n")`) was sent to `add-rulesets`, `from_yaml_content`'s early-return for blank content always yielded a legacy default, which then failed `validate_format(Unified)` with `SchemaMismatch` even though there was nothing wrong.

Following the design in the RFC above, format is signalled via an optional `schema_version` field added to the JSON body of existing endpoints. This avoids proliferating product-scoped routes: the IDE determines format from the filename, sends it as `"v1"` (unified) or omits it (legacy), and the server validates accordingly.

**Backward compatibility is a hard requirement.** Old extensions that never send `schema_version` must continue to work unchanged. This is guaranteed by:
- `schema_version` is `#[serde(default)] Option<String>` on every request struct — missing field deserializes as `None`.
- `None` → `ConfigFormat::Legacy` everywhere.
- No `#[serde(deny_unknown_fields)]` on request structs, so a new extension talking to an old binary simply has its `schema_version` field ignored.

- **Un-gated dual-format parsing.** Removed the `cfg!(test)` guard; production now always calls `parse_any_schema_yaml`, which detects the actual schema variant and stores it in `WithVersion<Legacy, CodeSecurity>`.
- **`ConfigFormat` enum + `detected_format()` accessor.** `ConfigFormat::Legacy` / `ConfigFormat::Unified` derived from the `WithVersion` discriminant. Used everywhere format validation is needed.
- **`validate_format(declared)`** method returns `Err(SchemaMismatch)` when detected ≠ declared, mirroring CLI filename-strict behaviour.
- **`default_unified()` constructor.** Returns an empty `WithVersion::CodeSecurity` config — used when `add-rulesets` is called with no existing content and `schema_version=v1`.
- **`WithHint<T>` wrapper + new `TryFrom` impls.** Replaces the old format-unaware `TryFrom<String>` / `TryFrom<Base64String>`. `WithHint<T>(T, Option<ConfigFormat>)` carries the caller's declared format so `from_yaml_content` can pick the right empty default. Callers with `schema_version` pass `Some(format)`; deprecated/legacy routes pass `None`.
- **Bug fix — blank content edge case.** `from_yaml_content` now accepts `Option<ConfigFormat> hint`; when content is blank it returns `default_unified()` for `Some(Unified)` and the legacy default otherwise. This prevents `add-rulesets` and `ignore-rule` from failing with `SchemaMismatch` when a `code-security.datadog.yaml` exists but is empty.
- **`with_ignored_rule` and `with_added_rulesets`** updated to forward `declared_format` via `WithHint`.

Added `#[serde(default)] schema_version: Option<String>` to:
- `IgnoreRuleRequest`
- `AddRuleSetsRequest`
- `CanOnboardRequest`
- `ParseConfigRequest`

- **`post_parse_config`** — now accepts `schema_version`; constructs via `WithHint(configuration, Some(format))`, then calls `validate_format` — returns `400` on mismatch.
- **`post_can_onboard_v2`** — now accepts `schema_version`; same parse + validate pattern; returns `500` on mismatch (existing error path the client already treats as blocking).
- **`post_ignore_rule` / `add_rulesets`** — delegate to the updated `with_ignored_rule` / `with_added_rulesets` which now carry the format hint.
- **Deprecated routes** (`get_can_onboard`, `get_rulesets`) updated to `WithHint(..., None)` — legacy-default, no behaviour change.

Added `SchemaMismatch` variant (`code: 4`) to `ConfigFileError`.

Extended the test suite with coverage for:
- Strict parse: unified content + `schema_version=v1` → 200; legacy content + `schema_version=v1` → 400 (`SchemaMismatch`); unified content + no `schema_version` → 400.
- Mutation (`add-rulesets`, `ignore-rule`) format validation.
- `can-onboard` with `schema_version`.
- Backward compat: legacy content + no `schema_version` → behaves as today.
- Empty content + `schema_version=v1` → creates a unified default file.

Added `Default` impl for `YamlConfigFile`, needed by `default_unified()`.

`cargo test # 77 tests, all pass cargo fmt -- --check # clean cargo clippy -- -D warnings`

The VS Code extension changes (unified-first discovery, sending `schema_version` in request bodies, binary version gate, etc.) are the next step and will be in a separate PR against the `datadog-vscode` repo.

K9CODESEC-2169
K9CODESEC-2303
@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented Jun 17, 2026

Copy link
Copy Markdown

🎯 Code Coverage (details)
Patch Coverage: 98.78%
Overall Coverage: 85.38% (+0.38%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 584828a | Docs | Datadog PR Page | Give us feedback!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the local datadog_static_analyzer_server IDE configuration pipeline to support both legacy static-analysis.datadog.* YAML and the unified code-security.datadog.yaml (v1.0) format, including strict format validation based on an IDE-declared schema_version.

Changes:

  • Parse legacy and unified schemas in production (no longer test-gated), track detected format, and validate it against a declared format hint.
  • Extend IDE server request models/endpoints to accept optional schema_version and propagate format hints through parsing/mutation paths (including correct defaults for blank content).
  • Add a SchemaMismatch error and expand IDE integration/unit tests to cover strict validation + unified mutations.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
crates/static-analysis-kernel/src/config/file_v1.rs Adds a Default unified config value (schema-version: v1.0) to support “create new unified file” flows.
crates/bins/src/bin/datadog_static_analyzer_server/ide/mod.rs Adds integration tests for schema-version strict validation and unified mutation behavior.
crates/bins/src/bin/datadog_static_analyzer_server/ide/configuration_file/static_analysis_config_file.rs Introduces ConfigFormat, format-aware parsing via WithHint, unified defaults, and schema mismatch validation.
crates/bins/src/bin/datadog_static_analyzer_server/ide/configuration_file/models.rs Adds #[serde(default)] schema_version: Option<String> to relevant request structs for backward compatibility.
crates/bins/src/bin/datadog_static_analyzer_server/ide/configuration_file/error.rs Adds SchemaMismatch error variant and assigns it a stable numeric code.
crates/bins/src/bin/datadog_static_analyzer_server/ide/configuration_file/endpoints.rs Wires schema_version through routes, performs strict validation for parse/can-onboard, and updates deprecated route parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Fixed misleading doc comment on `get_can_onboard`: the `content` route parameter is base64-encoded YAML, not a filesystem path.
- Updated `to_response_result` to return `400 Bad Request` for `SchemaMismatch` (a client input error) and keep `500 Internal Server Error` for genuine server failures (`Parser`, `Decoder`, `CommentReconciler`).
- Updated the `ignore_rule_rejects_schema_version_mismatch` integration test to assert `BadRequest` instead of `InternalServerError`.

K9CODESEC-2303

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@robertohuertasm-datadog robertohuertasm-datadog marked this pull request as ready for review June 17, 2026 19:25
@robertohuertasm-datadog robertohuertasm-datadog requested review from a team as code owners June 17, 2026 19:25
@robertohuertasm-datadog robertohuertasm-datadog marked this pull request as draft June 19, 2026 17:17
@robertohuertasm-datadog robertohuertasm-datadog marked this pull request as ready for review June 19, 2026 17:27
…alue

- Added "UNIFIED" as an accepted schema_version string for ConfigFormat::Unified alongside the legacy "v1" value, allowing newer extension versions to use the canonical name
- Kept "v1" mapped to ConfigFormat::Unified for backward compatibility with older extension versions that already send it
- Updated doc comments in static_analysis_config_file.rs and the add_rulesets helper to reflect both accepted values
- Removed the incorrect #[deprecated(note = "IDEs stopped supporting this flow")] annotation from post_add_rulesets_v2 — the VS Code extension actively calls POST /ide/v2/config/add-rulesets for both legacy and unified config mutations

K9CODESEC-2303
…ical

- Reverted schema_version match arm back to only "v1" for ConfigFormat::Unified (removed the "UNIFIED" alias that was added in the prior commit)
- Updated doc comment in ConfigFormat::From impl to reflect that "v1" is the sole canonical wire value, matching what the VS Code extension sends via CONFIG_FORMAT_INFO.UNIFIED.schemaVersion
- Restored doc comments in default_unified and add_rulesets helper to reference "v1" instead of "UNIFIED"

K9CODESEC-2303
- Changed SchemaMismatch error mapping from 400 Bad Request to 422 Unprocessable Entity in to_response_result helper and post_parse_config handler — the content is syntactically valid YAML but semantically wrong (format declared does not match format detected)
- Updated integration tests in mod.rs to assert 422 instead of 400 for all three schema-mismatch scenarios: ignore-rule mismatch, parse-config mismatch, and add-rulesets mismatch
- Updated doc comment on to_response_result to reflect the corrected HTTP status and rationale

K9CODESEC-2303
jasonforal
jasonforal previously approved these changes Jun 23, 2026
content_str = content_str.replace("\\", "/");
}
let config = StaticAnalysisConfigFile::try_from(WithHint(Base64String(content_str), None))
.map_err(|e| Custom(Status::InternalServerError, e))?;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like from the description that this 500 is intentional. Would you please add a comment explaining why? I would otherwise wonder why it's not 400 like elsewhere

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! You're right, it should be 400 and I've corrected it. Any error on this route is a parse failure on client-supplied content (bad base64 or invalid YAML), so 500 was semantically wrong. I initially kept the original 500 to avoid inadvertently breaking the Visual Studio extension (the only known caller of this deprecated route), but after checking its source I confirmed it uses a plain try/catch with no status-code inspection, so the change is safe. A comment has been added to the code explaining this. Thanks for flagging it!


config
.validate_format(format)
.map_err(|e| Custom(Status::InternalServerError, e))?;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe a comment why this isn't 422?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, changed to 422 Unprocessable Entity, consistent with every other SchemaMismatch mapping in the file. I initially kept the original 500 behavior here as well out of caution, to avoid breaking older clients. After verifying that the Visual Studio extension (the only client calling the deprecated v1 route) does not inspect specific HTTP status codes, and that VS Code's canOnboard uses a generic try/catch with no status-code check either, we confirmed the change is safe to make.

config
}
None => match declared_format {
ConfigFormat::Unified => Self::default_unified(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this here made me wonder: should we change the impl Default for StaticAnalysisConfigFile to return a unified file by default and have "default_legacy" be the exception?

Only thinking if we end up having other places to create these structs, it might not be intuitive why the Default impl for a legacy format. Up to you

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, done. Default now returns a unified file, and default_legacy() is the explicitly named fallback used only for the backward-compat paths (blank content with no declared format, and ConfigFormat::Legacy in with_added_rulesets).

- Changed get_can_onboard (v1 GET) parse failure from 500 to 400 Bad Request — errors here are always caused by client-supplied bad base64 or invalid YAML; confirmed safe by inspecting the Visual Studio extension (DataDog/datadog-visual-studio), which wraps the call in a plain try/catch and does not inspect the specific status code
- Changed post_can_onboard_v2 schema mismatch from 500 to 422 Unprocessable Entity — content is syntactically valid but semantically wrong (declared format does not match detected format), consistent with all other SchemaMismatch mappings in the file
- Added explanatory comment on get_can_onboard clarifying why 400 is correct and that SchemaMismatch cannot occur on this route (no format declared)
- Updated integration tests in mod.rs to assert the corrected status codes for both endpoints

K9CODESEC-2303
…n unified

- Changed impl Default to produce a unified (CodeSecurity) file instead of legacy, reflecting that unified is the standard going forward and legacy is the backward-compat exception
- Renamed default_unified() to default_legacy() — Default::default() now serves the unified case, making legacy the explicitly named fallback
- Updated from_yaml_content blank-content path: None/Legacy hint now calls default_legacy(); Unified hint falls through to Default
- Updated with_added_rulesets None-content path: Unified now calls Default; Legacy now calls default_legacy()
- Updated WithHint doc comment to reflect that None hint routes blank content to default_legacy
- Replaced try_from_returns_default_config_file_if_empty_string test with four focused tests that document all blank-content hint combinations (None, Legacy, Unified) and assert Default is unified

K9CODESEC-2303

@albertvaka albertvaka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping so it can be merged, I have not reviewed it though.

@MarshalX MarshalX left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping so it can be merged, I have not reviewed it though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants