-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ytdata: check for all_data
in particle selection
#4579
ytdata: check for all_data
in particle selection
#4579
Conversation
I think this should close #4565 |
wasn't sure if any new tests were needed here -- locally I did do some rough checks that the arrays were the same (same size, mean, first 10 elements, min/max) on main and this branch, but happy to add new tests if anyone has an idea on what might be useful. |
all_data
in particle selectionall_data
in particle selection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks sound to me, and the performance gain is impressive !
Co-authored-by: Clément Robert <[email protected]>
so with some more context I don't actually think this will close #4565 as I think something else is going on there (possible duplicating of particles?). but this PR will still speed things up for folks loading back in saved data so it's worth keeping. |
I don't feel like a new test is necessary, but others might feel otherwise. Leaving this open for a couple days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is a great improvement. Thanks for doing this. I note that this structure of _read_particle_fields
exists in several other frontends. I'm not asking for them to be fixed similarly here, but perhaps we should open an issue to keep track?
yes, happy to open an issue for that! I'll take a quick look and compile a list to start with. |
For the record: please feel free to self-merge whenever. I assume you'd want to open the tracking issue first but I wouldn't mind otherwise :) |
Ok, will do! And ya, I'll open the issue first. |
OK, issue opened (#4593). Will merge this now! |
The main change in this PR is to avoid unnecessarily building a mask for particle selection when the selector is
all_data()
. This cuts down the load time significantly for particle counts in the range of 1e6+ (and probably lower, but I only tested above 1e6)testing this PR
To test out this PR, I built some test particle datasets with 1e6 to 1e7 particles:
and then reloaded each
and timed the following:
here's a plot
the dashed lines are +/-2*sigma on this PR's branch, didn't calculate for main cause it was too slow... so quite a significant speedup.
other changes
I also slightly refactored the particle selection to override
_read_particle_data_file
rather than_read_particle_fields
to cut out thechunk
todata_file
code duplication.