feat(fake_data_generator): Add per-record variance to static log generation#2543

Merged
jmacd merged 3 commits into open-telemetry:main from cijothomas:cijothomas/fakegen1
Apr 6, 2026
@cijothomas
Member

@cijothomas cijothomas commented Apr 5, 2026

Summary

The static log generator previously produced batches where all records were
nearly identical — same body, same attribute values, same severity, no trace
context. This made payloads unrealistically compressible by Arrow/columnar
encoders (~57:1 zstd ratio), giving misleading throughput numbers in load tests.

Changes

  • Body: 50 realistic log message templates (~150 chars each) cycled across
    records. When log_body_size_bytes is configured, templates are repeated to
    fill the target size. When 0, body is omitted entirely.
  • Attribute names: unified pool of 80 OTel semconv attribute names (e.g.
    thread.id, http.route, db.query.text). Overflow names use attr_N.
  • Attribute values: 80-entry pool of realistic strings — URLs, UUIDs,
    hostnames, SQL queries, error messages, user agents — cycled per record.
    Special-cased types for thread.id (int), http.response.status_code
    (weighted distribution), etc.
  • Severity: cycles 80% INFO / 15% WARN / 5% ERROR instead of all-INFO.
  • TraceID / SpanID: random unique IDs per log record, matching real
    log-to-trace correlation behavior.
  • Compression test: new test_compression_ratio_is_realistic asserts
    zstd ratio stays in 3:1–50:1 range (currently ~19:1), guarding against
    regression to the old all-identical regime.
  • Config docs: updated log_body_size_bytes doc to reflect pool behavior.
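The body and severity bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `Severity`, `severity_for`, `body_for`, and `TEMPLATES` are hypothetical, the three templates stand in for the 50-entry pool, and truncating the body to exactly the target size is an assumption about how "repeated to fill the target size" might work.

```rust
/// Hypothetical severity levels (the real generator presumably uses OTel severity numbers).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warn,
    Error,
}

/// Deterministic 80% INFO / 15% WARN / 5% ERROR cycle over a 20-record period.
pub fn severity_for(record_index: usize) -> Severity {
    match record_index % 20 {
        0..=15 => Severity::Info,  // 16 of 20 records = 80%
        16..=18 => Severity::Warn, // 3 of 20 records = 15%
        _ => Severity::Error,      // 1 of 20 records = 5%
    }
}

/// Illustrative stand-in for the template pool (ASCII only).
const TEMPLATES: [&str; 3] = [
    "Connection to upstream service timed out after retry budget was exhausted",
    "User session refreshed successfully for tenant with cached credentials",
    "Slow query detected while scanning the orders table for pending items",
];

/// Returns `None` when `target_bytes` is 0 (body omitted); otherwise picks a
/// template by record index and repeats it until the target size is reached.
pub fn body_for(record_index: usize, target_bytes: usize) -> Option<String> {
    if target_bytes == 0 {
        return None;
    }
    let template = TEMPLATES[record_index % TEMPLATES.len()];
    let mut body = String::with_capacity(target_bytes + template.len() + 1);
    while body.len() < target_bytes {
        if !body.is_empty() {
            body.push(' ');
        }
        body.push_str(template);
    }
    // Safe because the templates are ASCII, so every byte index is a char boundary.
    body.truncate(target_bytes);
    Some(body)
}
```

Because both functions depend only on the record index, adjacent records in a batch differ from each other while the output stays reproducible across runs.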

Impact

Both fresh and pre_generated strategies benefit since all variance is
computed at generation time. PreGenerated retains zero runtime allocation
cost — only the pre-built batch content is more realistic.
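One way to picture the pre-generated strategy is a pool of batches built once up front and then handed out by reference. The sketch below is only an assumption about the general shape: `PreGenerated`, `new`, and `next_batch` here are hypothetical names, and plain strings stand in for real log records.

```rust
/// Hypothetical pre-generated pool: all per-record variance is computed in `new`.
pub struct PreGenerated {
    batches: Vec<Vec<String>>,
    next: usize,
}

impl PreGenerated {
    /// Builds every batch up front; this is the only place that allocates.
    pub fn new(num_batches: usize, records_per_batch: usize) -> Self {
        assert!(num_batches > 0, "at least one batch is required");
        let batches = (0..num_batches)
            .map(|b| {
                (0..records_per_batch)
                    .map(|r| format!("batch {b} record {r}"))
                    .collect()
            })
            .collect();
        Self { batches, next: 0 }
    }

    /// Returns the next pre-built batch by reference, cycling through the pool
    /// without allocating.
    pub fn next_batch(&mut self) -> &[String] {
        let idx = self.next;
        self.next = (idx + 1) % self.batches.len();
        &self.batches[idx]
    }
}
```

With this shape, making the record content more varied changes only what `new` builds; the per-call cost of `next_batch` stays the same.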

Compression ratio improved from ~57:1 (before) to ~19:1 (after) with
log_body_size_bytes: 1024, num_log_attributes: 6.
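The guard described for test_compression_ratio_is_realistic amounts to a range check on the ratio of raw to compressed size. A minimal sketch of that arithmetic follows, assuming the byte counts have already been measured; the helper names `compression_ratio` and `ratio_is_realistic` are hypothetical, and the actual zstd call is left out.

```rust
/// Ratio of uncompressed to compressed size, e.g. 19.0 for a 19:1 ratio.
pub fn compression_ratio(raw_bytes: usize, compressed_bytes: usize) -> f64 {
    raw_bytes as f64 / compressed_bytes as f64
}

/// True when the ratio falls inside the 3:1 to 50:1 band quoted in the PR description.
pub fn ratio_is_realistic(ratio: f64) -> bool {
    (3.0..=50.0).contains(&ratio)
}
```

Under these bounds the old ~57:1 output would fail the check and the new ~19:1 output would pass.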

@cijothomas cijothomas requested a review from a team as a code owner April 5, 2026 17:51
@github-actions github-actions bot added the rust Pull requests that update Rust code label Apr 5, 2026
@codecov

codecov bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 98.05825% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.34%. Comparing base (d8e64e0) to head (18c6b95).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2543      +/-   ##
==========================================
- Coverage   88.34%   88.34%   -0.01%     
==========================================
  Files         613      613              
  Lines      222675   222734      +59     
==========================================
+ Hits       196731   196772      +41     
- Misses      25420    25438      +18     
  Partials      524      524              
Components Coverage Δ
otap-dataflow 90.23% <98.05%> (-0.01%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.74% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 52.45% <ø> (ø)
quiver 91.92% <ø> (ø)

@cijothomas cijothomas force-pushed the cijothomas/fakegen1 branch 4 times, most recently from 106c918 to 7688bd1 Compare April 6, 2026 06:06
@cijothomas
Member Author

I considered using the semantic_conventions definitions directly: generate a set of records at startup and then cycle through them. But the existing ones can only produce logs of ~300 bytes, and we are targeting ~1KB, so we would have to add filler anyway. This is the first step in improving fake-gen; the next is to have the perf tests switch to it.

@cijothomas cijothomas force-pushed the cijothomas/fakegen1 branch from 7688bd1 to 152d0c1 Compare April 6, 2026 15:28
Member

@lalitb lalitb left a comment

Nice work! A few minor nits, but nothing blocking. LGTM.

@jmacd jmacd added this pull request to the merge queue Apr 6, 2026
Merged via the queue into open-telemetry:main with commit 092500c Apr 6, 2026
67 of 69 checks passed

Labels

rust Pull requests that update Rust code

Projects

Status: Done

4 participants