Improve performance of VLSV format in post-processing #19
It would be nice to have some benchmarks. For example, how much memory is required to fetch all data in one cell? How much of the file has to be read to fetch all data from one cell? How does the required CPU time to fetch M variables from N cells scale? In files written by dccrg, cells are not guaranteed to be in any particular order, but writing a post-processing tool that sorts the cells (and their data, for faster sequential access) by ID to make them faster to find seems almost trivial. |
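As a starting point for the scaling question above, a minimal timing harness might look like the following sketch; `read_cell_data` is a hypothetical stand-in for whichever reader is being benchmarked.

```python
import time

def bench(read_cell_data, cellids, variables):
    # read_cell_data is a hypothetical callable: (variable_name, cellid) -> data.
    t0 = time.time()
    for cid in cellids:            # N cells
        for var in variables:      # M variables
            read_cell_data(var, cid)
    dt = time.time() - t0
    per_fetch_ms = 1e3 * dt / (len(cellids) * len(variables))
    print("%d cells x %d vars: %.2f s total, %.2f ms per fetch"
          % (len(cellids), len(variables), dt, per_fetch_ms))
```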
In general VLSV writes data out so that the data from each process is in order: data from rank 0 comes first, then rank 1, and so on. With dynamic load balancing the data is not in any particular order when looking at the IDs of individual cells. This means that to read the data from a particular cell one first needs to read in all cell IDs so that the location can be found. The overhead when reading data from a single point is thus very large. For example, reading rho from one particular point in a Vlasiator simulation with 1000 files, each with 4000 x 2000 cells, means reading in 64 GB of data, while the actual rho data is only 8 kB in size. There are in general two different solutions, discussed below.
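To make that overhead concrete, here is a minimal sketch of the lookup pattern just described; the offsets and flat layout are assumptions for illustration, not the actual VLSV on-disk format. The arithmetic matches the example: 4000 x 2000 cells x 8 bytes per cell ID is 64 MB of IDs per file, so 64 GB over 1000 files, all to retrieve 1000 x 8 B = 8 kB of rho.

```python
import numpy as np

def read_rho_at_cell(f, cellid, n_cells, ids_offset, rho_offset):
    # All cell IDs (8 bytes each) must be read just to locate one cell:
    f.seek(ids_offset)
    cellids = np.fromfile(f, dtype=np.uint64, count=n_cells)
    idx = int(np.nonzero(cellids == cellid)[0][0])   # linear search, O(N)
    # Only now can we seek to the 8 bytes of rho we actually wanted:
    f.seek(rho_offset + 8 * idx)
    return np.fromfile(f, dtype=np.float64, count=1)[0]
```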
I would probably start by testing what the performance penalty would be when using a custom fileview to write data in order. |
@rjarvinen Any chance you could toss me a sample VLSV file via Dropbox, for example, that is too slow to analyze? Also, check out the pull request. |
@galfthan Yup, a bounding box per domain that limits the cell IDs would indeed make things faster. I have a much bigger update coming for VLSV where I might implement this. |
Here it would save a lot if cell IDs were in order (regardless of the number of processes), so a cell could be found in log(N) reads, and also, without AMR, cell IDs would be in the same spot in all files. Even with AMR the spot in the previous file could serve as a hint on where to start searching for the same cell.
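A minimal sketch of that log(N) lookup, assuming the IDs are stored sorted as contiguous 8-byte integers at a known offset (an assumption for illustration, not the current VLSV layout):

```python
import numpy as np

def find_cell_sorted(f, cellid, n_cells, ids_offset, id_size=8):
    # Binary search directly against the on-disk, sorted cell ID array.
    lo, hi = 0, n_cells - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(ids_offset + mid * id_size)
        val = np.fromfile(f, dtype=np.uint64, count=1)[0]
        if val == cellid:
            return mid                    # position of the cell's data
        elif val < cellid:
            lo = mid + 1
        else:
            hi = mid - 1
    raise KeyError(cellid)

# ~23 reads of 8 bytes each for 8e6 cells, instead of one 64 MB scan.
```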
1) Sort cells while writing or in post-processing. I think it would be best to do it while writing, to avoid an annoying post-processing step that is potentially very slow and requires buffer space. The all-to-all-like communication step, or a complex fileview, would not be free either. How would writing cell data in cell ID order work in parallel? It seems like an all-to-all would still be involved, at least to find out who has which cells.
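For a full regular mesh with no holes (the case noted in a later comment), each rank can compute a cell's target offset directly from its ID, so no all-to-all is needed. A toy mpi4py sketch of that idea, with a made-up round-robin decomposition and dummy data; a real writer would use a fileview or collective writes rather than many small Write_at calls:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
n_global = 1000                                   # toy mesh size
# Hypothetical round-robin stand-in for this rank's dccrg cells:
local_ids = np.arange(comm.rank + 1, n_global + 1, comm.size, dtype=np.uint64)
local_rho = local_ids.astype(np.float64)          # dummy data

fh = MPI.File.Open(comm, "rho_sorted.bin",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
for cid, value in zip(local_ids, local_rho):
    # Target offset follows directly from the cell ID; with holes or AMR
    # this shortcut breaks and communication would be needed.
    fh.Write_at(int(cid - 1) * 8, np.array([value]))
fh.Close()
```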
A converter for dccrg-based file formats that sorts cell IDs would use as much memory as is needed to sort N cell IDs. If only the cell IDs have to be sorted, and not their data (to e.g. make sequential access faster), then the file can be processed in place instead of writing a new one. Sorting the data of all cells in place would require on the order of as much memory as the largest amount of data in any one cell; sorting all data in a file would probably be easiest by writing a new one.
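The core of such a converter could be as small as the following sketch; the flat in-memory arrays are an assumption for illustration, and a real tool would go through the VLSV reader/writer API:

```python
import numpy as np

def sort_cells(cellids, variables):
    """cellids: (N,) uint64; variables: dict of name -> (N, ...) arrays."""
    perm = np.argsort(cellids)         # needs ~N * 8 B of working memory
    sorted_ids = cellids[perm]
    sorted_vars = {name: data[perm] for name, data in variables.items()}
    return sorted_ids, sorted_vars
```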
|
@rjarvinen Actually, just giving the mesh dimensions (xcells, ycells, zcells), the number of domains (roughly), and more information on what kind of data analysis you're doing might be sufficient for me to check what I can do. |
For AMR, yes; for regular meshes each process can calculate the correct offsets in the output file, assuming there are no holes in the mesh. I'm not a big fan of the idea of sorting data in VLSV files; however, it would be possible to add indexing data as a post-processing step to speed up random accesses. |
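A minimal sketch of that indexing idea: build a cellid-to-position index once in post-processing and store it alongside the .vlsv file (the sidecar naming is an assumption), so later single-cell reads do a binary search instead of a full cell ID scan:

```python
import numpy as np

def build_index(cellids, path):
    order = np.argsort(cellids)
    # Two rows: sorted cell IDs and their positions in the unsorted file.
    np.save(path + ".idx.npy",
            np.stack([cellids[order], order.astype(np.uint64)]))

def lookup(path, cellid):
    ids, pos = np.load(path + ".idx.npy")
    i = int(np.searchsorted(ids, cellid))   # binary search in sorted IDs
    if i == len(ids) or ids[i] != cellid:
        raise KeyError(cellid)
    return int(pos[i])                      # index into the data arrays
```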
I will soon prepare a benchmark. |
Here's a quick VLSV/VisIt plugin performance test with a nominal Venus run. Compared are a VTK file from the HYB simulation and a VLSV file from a Corsair/RHybrid run. Both files have the same number of scalar and vector variables and the same grid size of 120x160x160 (+-1 cell). VTK uses the STRUCTURED_POINTS grid structure. Data files are available here (file sizes: VLSV 1.3G, VTK 610M): https://dl.dropboxusercontent.com/u/8446786/vlsv_perf_test_data_files.zip The comparison uses the attached VisIt Python script and a shell script to run it (provided that VisIt is installed). The script opens the VLSV/VTK file, creates plots of 6 different scalar variables and exits. VLSV takes more than twice the time VTK does to complete the script. I don't know if the performance difference comes from the VLSV format itself, the grid type used in the VLSV file, or the plugin code. I didn't test the pull request with the new optimizations for the UCD multimesh reader yet and don't know if it affects this test.
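The attached script isn't reproduced in the thread; a hedged reconstruction of what such a timing script might look like with VisIt's CLI Python API (run with visit -cli -nowin -s script.py; the file name and variable names below are placeholders, not the actual ones used):

```python
import sys
import time

db = "venus_run.vlsv"                       # placeholder; or the .vtk file

t0 = time.time()
OpenDatabase(db)                            # VisIt CLI built-in
print("OpenDatabase: %.1f s" % (time.time() - t0))

# Placeholder names standing in for the six scalars actually plotted:
for var in ["rho", "rhov", "Bx", "Ex", "pressure", "temperature"]:
    t0 = time.time()
    AddPlot("Pseudocolor", var)
    DrawPlots()
    print("%s: %.1f s" % (var, time.time() - t0))
    DeleteAllPlots()

sys.exit(0)
```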
|
@rjarvinen One additional question: how many domains (= MPI procs) do you have in a nominal Venus run? Please test the version in the pull request as it may give a major performance boost in VisIt. |
720 PEs using 60 nodes on Voima. Thanks, I'll check that patch! |
VTK files are still faster compared to the VLSV in the pull request; the speed difference mainly comes from the mesh formats. A structured grid is much easier to generate than a mesh where the cells appear in random order. I'll take a look at whether I can speed things up more, but in the meantime you can also do parallel visualization on Voima. I'm sure @ykempf can help you out if you weren't already using Voima for remote visualization. |
The performance difference seems to come from creating individual plots. The OpenDatabase(db) command runs faster on VLSV (3 seconds) than on VTK (11 seconds). Plotting only one parameter takes roughly the same amount of time for both formats (30 seconds). Additional plots increase the running time almost linearly on VLSV but not considerably on VTK. Maybe VTK has buffering or something that makes it faster to use once the file is opened. |
After checking the memory usage with the resource monitor, it does indeed seem that the VTK plugin caches the whole file in memory, and that's why changing variables is faster. I suppose running an expression in VisIt on a VLSV file may be quite slow if it reads in variable data multiple times (although reading variables shouldn't be that slow), and optimizing this a bit might be a good idea. I'm not sure what the best way to do it for multi-domain data is, though, since there are no guarantees that the same MPI processes read the same domains every time. |
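A minimal sketch of that caching idea; read_variable is a hypothetical stand-in for the plugin's actual read path, and as noted above, a real multi-domain version would have to handle the fact that domains are not pinned to processes:

```python
_cache = {}

def cached_read(filename, variable, domain, read_variable):
    # read_variable is a hypothetical callable doing the actual disk read.
    key = (filename, variable, domain)
    if key not in _cache:
        _cache[key] = read_variable(filename, variable, domain)
    return _cache[key]
```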
Study whether the performance of VLSV can be improved for post-processing. Currently, ~5-10 GB VLSV files are not feasible to analyze on laptop computers. This may not be due to a RAM limit but could be more an issue with the performance of the VLSV reader and the VLSV VisIt plugin. For example, could stored cells be sorted for faster access by a separate post-processing tool?