Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add offset command line option to store-histogram #3296

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dmakarov
Copy link

@dmakarov dmakarov commented Oct 24, 2024

Problem

Boundary of ancient slots is hard-coded and not configurable.

Recent mnb changes make the offset 100k by default. Would be great if histogram allowed specifying the boundary between ancient and non-ancient.

Summary of Changes

Add a command line option to pass a value of offset from which the epoch boundary should be computed.

@dmakarov
Copy link
Author

Not sure about bin widths -- do they have to be adjusted too?

@jeffwashington
Copy link

Not sure about bin widths -- do they have to be adjusted too?

I don't care about bin widths. Bins were just trying to break them up into some smaller groups.

eprintln!("storages: {}", info.len());
eprintln!("lowest slot: {min}");
eprintln!("highest slot: {max_inclusive}");
eprintln!("slot range: {}", max_inclusive - min + 1);
eprintln!(
"outside of epoch: {}",
info.iter()
.filter(|x| x.0 < max_inclusive - 432_000)
.filter(|x| x.0 < max_inclusive - outside_slot)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is no longer an "epoch".
Maybe change it to something like "ancient boundary"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

Copy link

@HaoranYi HaoranYi Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this print says "how many slots are there beyond ancient boundary"? How about this?

Suggested change
.filter(|x| x.0 < max_inclusive - outside_slot)
eprintln!(
"ancient boundary: {}, num_of_slots_beyond_ancient_bondary: {}", outside_slot,
info.iter()
.filter(|x| x.0 < max_inclusive - outside_slot).count()
):

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

HaoranYi
HaoranYi previously approved these changes Oct 24, 2024
Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jeffwashington
Copy link

jeffwashington commented Oct 24, 2024

this is getting there.

here is the output of a master 100k offset machine:

======== Normal Ancient Histogram
storages: 412271
lowest slot: 287203785
highest slot: 297478771
slot range: 10274987
ancient boundary: 332000
number of slots beyond ancient bondary: 96886
overall stats
size 309428191656
count 412271
min size 136
max size 74019928
avg size 750545
bin width 432000
slot age count      min size   max size   sum size   avg size   ,       slot min,          count,       sum size, % size,       min size,       max size,       avg size
       0 ********** ***        ********** *          *          ,      297478771,         335074,    15929251168,     5%,            136,       67691536,          47539
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  432000 ***        ********** ********** ********** ********** ,      297046771,          77197,   293498940488,    94%,            608,       74019928,        3801947
========

======== Normal Ancient 10K Histogram
storages: 412271
lowest slot: 287203785
highest slot: 297478771
slot range: 10274987
ancient boundary: 332000
number of slots beyond ancient bondary: 96886
overall stats
size 309428191656
count 412271
min size 136
max size 74019928
avg size 750545
bin width 432000
slot age count      min size   max size   sum size   avg size   ,       slot min,          count,       sum size, % size,       min size,       max size,       avg size
       0 ********** *          ********** *          *          ,      297478771,         335074,    15929251168,     5%,            136,       67691536,          47539
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  432000 *          ********** *          *          *          ,      297046771,            758,      116690624,     0%,           4744,        5472304,         153945
  442000 ***        **         ********** ********** ********** ,      297036771,          76439,   293382249864,    94%,            608,       74019928,        3838122
========

@dmakarov jw13 has this pulled down from your branch and built.

the goal of this summary middle section is non-ancient, ancient all grouped in 1 bin each. Right now, that offset (at least printed) is 432k.

The last one (10k) can be deleted completely, I think.

@jeffwashington
Copy link

jeffwashington commented Oct 24, 2024

@dmakarov also, I said I didn't care about bin size, but it should end up with a bin boundary at the ancient/non-ancient boundary:

  324000 ********   *          *          *          *          ,      297154771,          15342,      741092680,     0%,            136,        5887104,          48304
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  345600 **         *          *          *          *          ,      297133171,           2643,      331658096,     0%,           1816,        5767976,         125485

In my case, with 100k offset, this should be 332000 as a boundary number here that is 'ancient'

@dmakarov
Copy link
Author

@dmakarov also, I said I didn't care about bin size, but it should end up with a bin boundary at the ancient/non-ancient boundary:

  324000 ********   *          *          *          *          ,      297154771,          15342,      741092680,     0%,            136,        5887104,          48304
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  345600 **         *          *          *          *          ,      297133171,           2643,      331658096,     0%,           1816,        5767976,         125485

In my case, with 100k offset, this should be 332000 as a boundary number here that is 'ancient'

I added logic to split a bin on the ancient boundary.

@dmakarov
Copy link
Author

The last one (10k) can be deleted completely, I think.

maybe should be left for another pr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants