Fix BaseBucketApi.add incrementing _total per call instead of per value #286

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/bucket-tracker-total-counts-iterations

@Chessing234
Bug

`BaseBucketApi.total`'s docstring is "Return the total number of values added to the tracker", but `BaseBucketApi.add` does:

```python
for value, count in zip(values, counts):
    self._add(value, count)
    self._total += 1
    self._sum += value * count
```

So `tracker.add(values=[v0, v1], counts=[5, 3])` ends up with `_total == 2` instead of `8`, while `_sum` is correctly weighted by `count`.
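
To make the mismatch concrete, here is a minimal sketch that copies the loop above into a toy class. `ToyTracker` and its `Counter`-based bucket store are hypothetical stand-ins, not the library's `BaseBucketApi`, and the values `10.0` and `20.0` are arbitrary.

```python
from collections import Counter


class ToyTracker:
    """Hypothetical stand-in for BaseBucketApi; not the library's class."""

    def __init__(self):
        self._buckets = Counter()  # assumed storage, for illustration only
        self._total = 0
        self._sum = 0.0

    def add(self, values, counts):
        # Mirrors the loop quoted above, including the per-iteration increment.
        for value, count in zip(values, counts):
            self._buckets[value] += count
            self._total += 1
            self._sum += value * count


tracker = ToyTracker()
tracker.add(values=[10.0, 20.0], counts=[5, 3])
print(tracker._total)  # 2, although 5 + 3 = 8 observations were added
print(tracker._sum)    # 110.0, i.e. 10.0 * 5 + 20.0 * 3
```

In this sketch the sum already reflects all eight observations while the total reflects only the two loop iterations.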

Root cause

The increment is per loop iteration rather than per observation. This is exactly the discrepancy that `add_summary` works around:

```python
def add_summary(self, summary: SummaryTuple):
    prev_count, prev_sum = self._total, self._sum
    self.add(summary.bins, summary.counts)
    # override the total and sum with the previous values, thats' because otherwise they are approximate
    self._total = prev_count + summary.total
    self._sum = prev_sum + summary.sum
```

Without the override, the total after rebuilding a tracker via `SummarySpec.to_tracker(...)` (which calls `tracker.add(values=self.bins, counts=self.counts)`) reflects only the number of bins, not the original observation count.

Fix

Increment `_total` by `count`, matching `_sum`'s per-value accumulation and the documented contract. The `add_summary` override remains correct (it now agrees with what `add` produces).
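
A rough sketch of the intended behaviour, again using toy classes rather than the library's own: `FixedToyTracker` and `ToySummary` are hypothetical names, and the summary is assumed to carry exact bin values. With the per-count increment, `add` and `add_summary` arrive at the same total.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class ToySummary(NamedTuple):
    """Hypothetical stand-in for SummaryTuple, reduced to the fields used here."""

    bins: Sequence[float]
    counts: Sequence[int]
    total: int
    sum: float


class FixedToyTracker:
    """Hypothetical stand-in for BaseBucketApi with the proposed increment."""

    def __init__(self):
        self._buckets = Counter()  # assumed storage, for illustration only
        self._total = 0
        self._sum = 0.0

    def add(self, values, counts):
        for value, count in zip(values, counts):
            self._buckets[value] += count
            self._total += count  # proposed change: weight by count, like _sum
            self._sum += value * count

    def add_summary(self, summary):
        # Same shape as the add_summary quoted earlier: add, then override.
        prev_total, prev_sum = self._total, self._sum
        self.add(summary.bins, summary.counts)
        self._total = prev_total + summary.total
        self._sum = prev_sum + summary.sum


summary = ToySummary(bins=[10.0, 20.0], counts=[5, 3], total=8, sum=110.0)

via_add = FixedToyTracker()
via_add.add(summary.bins, summary.counts)

via_summary = FixedToyTracker()
via_summary.add_summary(summary)

print(via_add._total, via_summary._total)  # 8 8
```

If the real bins are approximate representatives of the original values, the `_sum` override in `add_summary` would presumably still be needed; only the total is guaranteed to match in this sketch.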

BaseBucketApi.total's docstring states it returns 'the total number of
values added to the tracker', but BaseBucketApi.add() increments
_total by 1 per (value, count) iteration regardless of count. So
tracker.add(values=[v0, v1], counts=[5, 3]) sets total=2 instead of 8.

This matters when reconstructing a tracker from a SummaryTuple: each
bin in the summary represents 'count' original observations, and the
total should reflect that multiplicity. This is exactly why
add_summary() saves the previous total and sum and overrides both
afterwards; otherwise add() leaves them inaccurate.

Increment _total by count to match _sum's per-value accumulation and
the public docstring.