Fix BaseBucketApi.add incrementing _total per call instead of per value#286
Open
Chessing234 wants to merge 1 commit intoallenai:mainfrom
Open
Fix BaseBucketApi.add incrementing _total per call instead of per value#286Chessing234 wants to merge 1 commit intoallenai:mainfrom
Chessing234 wants to merge 1 commit intoallenai:mainfrom
Conversation
BaseBucketApi.total's docstring states it returns 'the total number of values added to the tracker', but BaseBucketApi.add() increments _total by 1 per (value, count) iteration regardless of count. So tracker.add(values=[v0, v1], counts=[5, 3]) sets total=2 instead of 8. This matters when reconstructing a tracker from a SummaryTuple: each bin in the summary represents 'count' original observations, and the total should reflect that multiplicity (this is exactly why add_summary() saves prev_count and restores total/sum afterwards — otherwise add() leaves them off). Increment _total by count to match _sum's per-value accumulation and the public docstring.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug
`BaseBucketApi.total`'s docstring is "Return the total number of values added to the tracker", but `BaseBucketApi.add` does:
```python
for value, count in zip(values, counts):
self._add(value, count)
self._total += 1
self._sum += value * count
```
So `tracker.add(values=[v0, v1], counts=[5, 3])` ends up with `_total == 2` instead of `8`, while `_sum` is correctly weighted by `count`.
Root cause
The increment is per loop iteration rather than per observation. This is exactly the discrepancy that `add_summary` works around:
```python
def add_summary(self, summary: SummaryTuple):
prev_count, prev_sum = self._total, self._sum
self.add(summary.bins, summary.counts)
# override the total and sum with the previous values, thats' because otherwise they are approximate
self._total = prev_count + summary.total
self._sum = prev_sum + summary.sum
```
Without the override, the total after rebuilding a tracker via `SummarySpec.to_tracker(...)` (which calls `tracker.add(values=self.bins, counts=self.counts)`) reflects only the number of bins, not the original observation count.
Fix
Increment `_total` by `count`, matching `_sum`'s per-value accumulation and the documented contract. The `add_summary` override remains correct (it now agrees with what `add` produces).