Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a few issues of the FixedBucketsValTracker #73

Merged
merged 3 commits into from
Nov 22, 2023

Conversation

peterbjorgensen
Copy link
Contributor

  1. Make the default number of bins of the internal tracker smaller so it does not cause numerical issues and/or memory problems.
  2. Use floor() instead of int() (trunc) for rounding to have the same behaviour for positive and negative numbers.
  3. Add an extra bin in the summarization method such that the number of bins in the summary is always "number of values"+1. This is consistent with the numpy histogram convention.

Fixes #70

peterbjorgensen and others added 2 commits November 13, 2023 12:47
1. Make the default number of bins of the internal tracker smaller
   so it does not cause numerical issues and/or memory problems.
2. Use floor() instead of int() (trunc) for rounding to have the same
   behaviour for positive and negative numbers.
3. Add an extra bin in the summarization method such that the number
   of bins in the summary is always "number of values"+1.
   This is consistent with the numpy histogram convention.
@soldni
Copy link
Member

soldni commented Nov 14, 2023

tnx @peterbjorgensen !! will review it tomorrow.

@soldni soldni merged commit 00504ff into allenai:main Nov 22, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dolma stat crashes because number of bins overflows python integer
2 participants