chore: add tracing instrumentation to mem_wal module#6430
chore: add tracing instrumentation to mem_wal module#6430hamersaw wants to merge 1 commit intolance-format:mainfrom
Conversation
Add #[instrument] spans to key functions across the mem_wal subsystem (write, flush, scan, index, manifest) for improved observability. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
626d2f1 to
ee8353c
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
westonpace
left a comment
There was a problem hiding this comment.
This is great.
Generally I try and always use skip_all since skip(...) has dinged us in the past on more than one occasion. The problem is that it is very easy to add a new argument and forget to add it to the skip list and then you never notice until some very difficult to debug perf issue pops up.
That being said, with code being generated by LLMs now, it might not be so easy to "forget to add it to the skip list". However, given that you're already specifying fields manually, I wonder if it would be easy enough to just specify all fields manually (allow list style) and use skip_all everywhere?
| #[instrument(level = "info", skip(self), fields(has_filter = self.filter.is_some(), limit = self.limit, num_shards = self.shard_snapshots.len()))] | ||
| pub async fn try_into_stream(&self) -> Result<SendableRecordBatchStream> { |
There was a problem hiding this comment.
It might be nice to generate something similar to the plan_run events that we have on the normal scanner (
lance/rust/lance-datafusion/src/exec.rs
Line 545 in 6112a34
| @@ -1102,6 +1104,7 @@ impl ShardWriter { | |||
| /// Fencing is detected lazily during WAL flush via atomic writes. | |||
| /// If another writer has taken over, the WAL flush will fail with | |||
| /// `AlreadyExists`, indicating this writer has been fenced. | |||
| #[instrument(level = "info", skip(self, batches), fields(batch_count = batches.len(), shard_id = %self.config.shard_id))] | |||
| pub async fn put(&self, batches: Vec<RecordBatch>) -> Result<WriteResult> { | |||
| if batches.is_empty() { | |||
| return Err(Error::invalid_input("Cannot write empty batch list")); | |||
| @@ -1257,6 +1260,7 @@ impl ShardWriter { | |||
| /// Close the writer gracefully. | |||
| /// | |||
| /// Flushes pending data and shuts down background tasks. | |||
| #[instrument(level = "info", skip(self), fields(shard_id = %self.config.shard_id, epoch = self.epoch))] | |||
| pub async fn close(self) -> Result<()> { | |||
There was a problem hiding this comment.
I wonder if we want to give these traces names like sw_open, sw_put, sw_close? A lot of tracing tools will just give the span name unless you drill down and open/put/close are going to feel like regularly filesystem calls.
Summary
#[instrument]attributes from thetracingcrate to key functions across themem_walmoduleRegionWriter::open,put,close), flush path (MemTableFlusher::flush,flush_with_indexes), WAL operations, manifest store, memtable inserts, scanner/planner, point lookups, and vector searchinfofor high-level operations,debugfor internals) with relevant fields (region_id, epoch, row counts, batch counts)Test plan
cargo checkpasses — no functional changes, only attribute additionsmem_waltests continue to passRUST_LOG=debugshowing instrumented spans🤖 Generated with Claude Code