feat(fake_data_generator): Add per-record variance to static log generation #2543
Merged
jmacd merged 3 commits into open-telemetry:main on Apr 6, 2026
Codecov Report ❌
Coverage diff, main vs. #2543:
                main     #2543      +/-
Coverage      88.34%    88.34%   -0.01%
Files            613       613
Lines         222675    222734      +59
Hits          196731    196772      +41
Misses         25420     25438      +18
Partials         524       524
Author:
I considered using the semantic_conventions themselves: generate a batch of them at startup and cycle through it. But the existing ones can only produce logs of about 300 bytes, and we are targeting about 1 KB, so we would have to add filler anyway. This is the first step in improving fake-gen; the next is to switch the perf tests over to it.
…e buffer trace context issue
cijothomas
commented
Apr 6, 2026
albertlockett
approved these changes
Apr 6, 2026
lalitb
reviewed
Apr 6, 2026
rust/otap-dataflow/crates/core-nodes/src/receivers/fake_data_generator/static_signal.rs
lalitb
approved these changes
Apr 6, 2026
Nice work! Few minor nits but nothing blocking — LGTM.
Merged via the queue into open-telemetry:main with commit 092500c on Apr 6, 2026.
67 of 69 checks passed.
Summary
The static log generator previously produced batches where all records were
nearly identical — same body, same attribute values, same severity, no trace
context. This made payloads unrealistically compressible by Arrow/columnar
encoders (~57:1 zstd ratio), giving misleading throughput numbers in load tests.
Changes
- … records. When log_body_size_bytes is configured, templates are repeated to fill the target size. When 0, the body is omitted entirely.
- … thread.id, http.route, db.query.text). Overflow names use attr_N.
- … hostnames, SQL queries, error messages, user agents, cycled per record. Special-cased types for thread.id (int), http.response.status_code (weighted distribution), etc.
- … log-to-trace correlation behavior.
- test_compression_ratio_is_realistic asserts that the zstd ratio stays in the 3:1–50:1 range (currently ~19:1), guarding against regression to the old all-identical regime.
- … log_body_size_bytes doc to reflect pool behavior.
Impact
Both fresh and pre_generated strategies benefit, since all variance is
computed at generation time. PreGenerated retains zero runtime allocation
cost; only the pre-built batch content is more realistic.
Compression ratio improved from ~57:1 (before) to ~19:1 (after) with
log_body_size_bytes: 1024, num_log_attributes: 6.
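The guard that test_compression_ratio_is_realistic provides can be sketched as a simple range check. This is an illustration only: the function names are hypothetical, and the compressed size is passed in as a parameter to keep the sketch dependency-free, whereas the real test compresses the generated batch with zstd. Only the 3:1 and 50:1 bounds come from the PR description.

```rust
/// Bounds from the PR description: the ratio must stay within 3:1 to 50:1.
const MIN_RATIO: f64 = 3.0;
const MAX_RATIO: f64 = 50.0;

/// Uncompressed size divided by compressed size.
/// Returns `None` for an empty compressed payload to avoid dividing by zero.
fn compression_ratio(raw_bytes: usize, compressed_bytes: usize) -> Option<f64> {
    if compressed_bytes == 0 {
        return None;
    }
    Some(raw_bytes as f64 / compressed_bytes as f64)
}

/// True when the ratio lies inside the accepted range (bounds inclusive).
fn ratio_is_realistic(raw_bytes: usize, compressed_bytes: usize) -> bool {
    matches!(
        compression_ratio(raw_bytes, compressed_bytes),
        Some(r) if (MIN_RATIO..=MAX_RATIO).contains(&r)
    )
}
```

With these bounds, the old ~57:1 regime (for example 57,000 raw bytes compressing to 1,000) is rejected, while the new ~19:1 figure passes. The lower bound catches the opposite failure, where payloads become so random that they no longer resemble real logs.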