feat(fake_data_generator): Add per-record variance to static log generation by cijothomas · Pull Request #2543 · open-telemetry/otel-arrow

cijothomas · 2026-04-05T17:51:31Z

Summary

The static log generator previously produced batches where all records were
nearly identical — same body, same attribute values, same severity, no trace
context. This made payloads unrealistically compressible by Arrow/columnar
encoders (~57:1 zstd ratio), giving misleading throughput numbers in load tests.

Changes

- Body: 50 realistic log message templates (~150 chars each) cycled across
  records. When log_body_size_bytes is configured, templates are repeated to
  fill the target size. When it is 0, the body is omitted entirely.
- Attribute names: unified pool of 80 OTel semconv attribute names (e.g.
  thread.id, http.route, db.query.text). Overflow names use attr_N.
- Attribute values: 80-entry pool of realistic strings (URLs, UUIDs,
  hostnames, SQL queries, error messages, user agents) cycled per record.
  Special-cased types for thread.id (int), http.response.status_code
  (weighted distribution), etc.
- Severity: cycles 80% INFO / 15% WARN / 5% ERROR instead of all-INFO.
- TraceID / SpanID: random unique IDs per log record, matching real
  log-to-trace correlation behavior.
- Compression test: new test_compression_ratio_is_realistic asserts that the
  zstd ratio stays in the 3:1–50:1 range (currently ~19:1), guarding against
  regression to the old all-identical regime.
- Config docs: updated log_body_size_bytes doc to reflect pool behavior.
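
The cycling rules listed above can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: the function names, the three-entry pools, the truncation to the exact target size, and the index-based attr_N naming are all assumptions made for the example (the real pools hold 50 templates and 80 attribute names).

```rust
// Hypothetical stand-ins for the PR's pools (the real ones are much larger).
const BODY_TEMPLATES: [&str; 3] = [
    "Request completed: method=GET route=/api/v1/orders status=200 duration_ms=42",
    "Connection pool exhausted, retrying acquire after backoff attempt=3",
    "Cache miss for key user:session, falling back to database lookup",
];
const ATTR_NAMES: [&str; 3] = ["thread.id", "http.route", "db.query.text"];

#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity {
    Info,
    Warn,
    Error,
}

/// Picks a template by record index and repeats it until `target_bytes` is
/// reached. Returns `None` when the target is 0, i.e. the body is omitted.
/// Truncating to the exact size is an assumption of this sketch.
fn body_for_record(record_idx: usize, target_bytes: usize) -> Option<String> {
    if target_bytes == 0 {
        return None;
    }
    let template = BODY_TEMPLATES[record_idx % BODY_TEMPLATES.len()];
    let mut body = String::with_capacity(target_bytes + template.len() + 1);
    while body.len() < target_bytes {
        if !body.is_empty() {
            body.push(' ');
        }
        body.push_str(template);
    }
    // The templates are ASCII, so truncating at a byte offset is char-safe.
    body.truncate(target_bytes);
    Some(body)
}

/// Deterministic 80% INFO / 15% WARN / 5% ERROR cycle over 20 records.
fn severity_for_record(record_idx: usize) -> Severity {
    match record_idx % 20 {
        0..=15 => Severity::Info,  // 16 of 20
        16..=18 => Severity::Warn, // 3 of 20
        _ => Severity::Error,      // 1 of 20
    }
}

/// Uses the pool while it lasts, then falls back to `attr_N` names.
fn attr_name(attr_idx: usize) -> String {
    match ATTR_NAMES.get(attr_idx) {
        Some(name) => (*name).to_string(),
        None => format!("attr_{attr_idx}"),
    }
}
```

Indexing everything by record number keeps generation deterministic, so two runs with the same configuration would produce the same batch in this sketch.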

Impact

Both fresh and pre_generated strategies benefit since all variance is
computed at generation time. PreGenerated retains zero runtime allocation
cost — only the pre-built batch content is more realistic.
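
To illustrate why both strategies pick up the variance, here is a minimal sketch in which a single batch builder decides all per-record differences and the two strategies only differ in when they call it. The types `LogRecord`, `PreGenerated`, and `Fresh`, and the function `build_batch`, are hypothetical names for this example and do not mirror the crate's actual API.

```rust
/// Hypothetical log record; the real generator builds OTLP/OTAP structures.
#[derive(Debug, Clone, PartialEq)]
struct LogRecord {
    severity_number: u8,
    body: String,
}

/// All per-record variance is decided here, at generation time.
fn build_batch(batch_size: usize) -> Vec<LogRecord> {
    (0..batch_size)
        .map(|i| LogRecord {
            // OTel severity numbers: INFO = 9, WARN = 13, ERROR = 17.
            severity_number: match i % 20 {
                0..=15 => 9,
                16..=18 => 13,
                _ => 17,
            },
            // Placeholder for cycling through a 50-entry template pool.
            body: format!("log template {} payload", i % 50),
        })
        .collect()
}

/// Builds the batch once and hands out borrowed slices afterwards,
/// so serving a batch allocates nothing.
struct PreGenerated {
    batch: Vec<LogRecord>,
}

impl PreGenerated {
    fn new(batch_size: usize) -> Self {
        Self { batch: build_batch(batch_size) }
    }

    fn next_batch(&self) -> &[LogRecord] {
        &self.batch
    }
}

/// Rebuilds the batch on every call, paying the allocation cost each time.
struct Fresh {
    batch_size: usize,
}

impl Fresh {
    fn next_batch(&self) -> Vec<LogRecord> {
        build_batch(self.batch_size)
    }
}
```

In this sketch the pre-generated path returns the same backing storage on every call, while the fresh path returns an equal but newly allocated batch.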

Compression ratio dropped from an unrealistic ~57:1 (before) to a more
realistic ~19:1 (after) with log_body_size_bytes: 1024, num_log_attributes: 6.
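
As a rough reading of these numbers: for the same uncompressed payload, going from ~57:1 to ~19:1 means about three times as many compressed bytes. The helper names below are hypothetical, and the range check only mirrors the 3:1 to 50:1 bounds quoted for the new test, not its actual implementation.

```rust
/// Compressed size in bytes for a payload at a given compression ratio.
fn compressed_bytes(uncompressed_bytes: f64, ratio: f64) -> f64 {
    uncompressed_bytes / ratio
}

/// How many times more compressed bytes result when the ratio drops
/// from `before_ratio` to `after_ratio` for the same input.
fn wire_growth(before_ratio: f64, after_ratio: f64) -> f64 {
    before_ratio / after_ratio
}

/// The 3:1 to 50:1 window described for test_compression_ratio_is_realistic.
fn in_realistic_range(ratio: f64) -> bool {
    (3.0..=50.0).contains(&ratio)
}
```

With the figures above, `wire_growth(57.0, 19.0)` evaluates to 3.0, and only the new ratio falls inside the asserted window.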

codecov · 2026-04-05T17:54:10Z

Codecov Report

❌ Patch coverage is 98.05825% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.34%. Comparing base (d8e64e0) to head (18c6b95).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2543      +/-   ##
==========================================
- Coverage   88.34%   88.34%   -0.01%     
==========================================
  Files         613      613              
  Lines      222675   222734      +59     
==========================================
+ Hits       196731   196772      +41     
- Misses      25420    25438      +18     
  Partials      524      524

Components	Coverage Δ
otap-dataflow	`90.23% <98.05%> (-0.01%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.74% <ø> (ø)`
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`52.45% <ø> (ø)`
quiver	`91.92% <ø> (ø)`

cijothomas · 2026-04-06T14:10:48Z

I was thinking of using the semantic_conventions itself: generate a bunch of them at startup and then cycle through them. But the existing ones can only produce logs of ~300 bytes, and we are targeting ~1 KB, so we have to add fillers anyway. This is the first step in improving fake-gen; the next is to have the perf tests switch to this.

lalitb

Nice work! A few minor nits, but nothing blocking. LGTM.

cijothomas requested a review from a team as a code owner April 5, 2026 17:51

github-project-automation bot added this to OTel-Arrow Apr 5, 2026

github-actions bot added the rust Pull requests that update Rust code label Apr 5, 2026

cijothomas force-pushed the cijothomas/fakegen1 branch 4 times, most recently from 106c918 to 7688bd1, April 6, 2026 06:06

Improve fake generator to produce more realistic workloads

152d0c1

cijothomas force-pushed the cijothomas/fakegen1 branch from 7688bd1 to 152d0c1, April 6, 2026 15:28

Add use_trace_context config option and ignored repro test for durable buffer trace context issue

7c2ffaf

cijothomas commented Apr 6, 2026

rust/otap-dataflow/crates/otap/tests/durable_buffer_processor_tests.rs

albertlockett approved these changes Apr 6, 2026

lalitb reviewed Apr 6, 2026

rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/static_signal.rs

lalitb reviewed Apr 6, 2026

rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/static_signal.rs

lalitb reviewed Apr 6, 2026

rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/static_signal.rs (outdated)

lalitb approved these changes Apr 6, 2026

Fix compression test comments to match actual behavior

18c6b95

jmacd added this pull request to the merge queue Apr 6, 2026

Merged via the queue into open-telemetry:main with commit 092500c Apr 6, 2026
67 of 69 checks passed

github-project-automation bot moved this to Done in OTel-Arrow Apr 6, 2026

jmacd merged 3 commits into open-telemetry:main from cijothomas:cijothomas/fakegen1

4 participants
