
Add benchmarks using lindi #53

Merged
merged 26 commits into from
Apr 30, 2024

Conversation

oruebel
Contributor

@oruebel oruebel commented Apr 26, 2024

  • Update core/_streaming.py to add functions for reading an HDF5 NWB file with Lindi and for generating a local LINDI JSON index file (i.e., a LINDI reference store)
  • Add timing benchmarks for lindi in time_remote_file_reading.py. Here we need to cover the cases to:
    - Read a remote file from S3 where the LINDI JSON file is remote in S3
    - Create a new Lindi reference store (JSON index) locally for a remote file
    - Read a remote file from S3 where the LINDI JSON file is local
    - Read a remote file from S3 where the LINDI JSON file does not exist yet (i.e., we need to both create the JSON and then read the file)
  • Add timing benchmarks for lindi in time_remote_slicing.py. Here we only need to consider the case where the remote LINDI JSON reference filesystem already exists. The case where we need to generate the JSON is covered in the time_remote_file_reading.py tests.
  • Add network tracking benchmarks for lindi in network_tracking_remote_slicing.py. Same test as in time_remote_slicing.py but with network tracking.
  • Add network tracking benchmarks for lindi in network_tracking_remote_file_reading.py. These mirror the tests in time_remote_file_reading.py, but some of the cases are removed.
  • Update to new BaseBenchmark class and parametrize scheme
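For context, the "LINDI JSON index" the bullets above refer to is a kerchunk-style reference file system: a JSON document mapping Zarr-style dataset keys to byte ranges inside the remote HDF5 file. A minimal sketch of such a store is below; every key, URL, offset, and size is an illustrative assumption, not a value from this PR.

```python
# Sketch of a kerchunk-style JSON reference store of the kind LINDI generates.
# Every key, URL, offset, and size here is an illustrative assumption.
import json

reference_store = {
    "version": 1,  # kerchunk reference-file-system format version
    "refs": {
        # Zarr metadata can be embedded inline as a string ...
        ".zgroup": "{\"zarr_format\": 2}",
        # ... while chunk entries point at [url, byte_offset, num_bytes]
        # inside the remote HDF5 file, so chunks can be fetched directly.
        "acquisition/accelerometer/data/0.0": [
            "https://example-bucket.s3.amazonaws.com/file.nwb",
            4096,
            360000,
        ],
    },
}

# Writing the index locally is what lets the "LINDI JSON file is local"
# benchmark case skip re-indexing the remote file on every read.
with open("file.lindi.json", "w") as f:
    json.dump(reference_store, f)
```

Reading then amounts to loading this JSON and resolving chunk keys to ranged HTTP requests, which is why the benchmark cases above distinguish remote-index, local-index, and index-must-be-generated scenarios.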

@oruebel oruebel marked this pull request as ready for review April 27, 2024 06:08
@oruebel
Contributor Author

oruebel commented Apr 27, 2024

@CodyCBakerPhD @sinha-r the outline of all the LINDI test cases is here so please take a look. I added parameter configurations so we can test that these work, but we still need to configure these for a proper test benchmark.

Note, I have not had the chance to run these test cases yet, so there may still be some bugs to fix. If you have a chance to give these a spin, please let me know if you come across any errors (or better yet, just push fixes on the PR 😉).

@CodyCBakerPhD
Collaborator

I don't know LINDI well enough yet, so I trust you on this - just let me know when it's running as expected

@oruebel
Contributor Author

oruebel commented Apr 29, 2024

I fixed a few small bugs and the LINDI tests seem to run now. I'm investigating one last issue with the benchmark for slicing with lindi, i.e., LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark. As far as I can tell, the test itself runs fine, but it is still being reported as failed.

@oruebel oruebel requested a review from rly April 29, 2024 23:22
@oruebel
Contributor Author

oruebel commented Apr 29, 2024

I did a full run of the benchmark and overall I think everything looks good. I'm still seeing strange behavior when running LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark but have not been able to figure out what is going wrong. It looks like the test case itself is running fine and I don't get a traceback or timeout error, so I'm not sure where it fails right now.

@CodyCBakerPhD @sinha-r @rly could one of you run nwb_benchmarks run --bench time_remote_slicing.LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark on this PR to see if it is running for you?

@oruebel
Contributor Author

oruebel commented Apr 30, 2024

Thanks @rly for taking a look.

@CodyCBakerPhD @sinha-r aside from the failure for time_remote_slicing.LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark this is good to merge.

@rly
Contributor

rly commented Apr 30, 2024

A few suggestions to clean things up. Otherwise looks good.

@CodyCBakerPhD
Collaborator

I fixed a few things when I ran it on my end. Added explicit skips to network benchmarks too

@oruebel Please delete your .asv folder in your local copy of nwb_benchmarks and try again

If you get that same error again please share your intermediate results file (in that .asv folder, should be easy to find) so I can inspect it for abnormalities

@oruebel
Contributor Author

oruebel commented Apr 30, 2024

@CodyCBakerPhD I deleted my .asv folder and reran the test as you suggested, but I'm still seeing the same issue with the test.

 nwb_benchmarks run --bench time_remote_slicing.LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[ 0.00%] · For nwb_benchmarks commit ef667c95 <main>:
[ 0.00%] ·· Building for existing-py_Users_oruebel_miniforge3_envs_nwb_benchmarks_bin_python
[ 0.00%] ·· Benchmarking existing-py_Users_oruebel_miniforge3_envs_nwb_benchmarks_bin_python
[100.00%] ··· ...ileSystemContinuousSliceBenchmark.time_slice             failed
[100.00%] ··· ============================================================================================================= =============== ============================================ ========
                                                                  s3_url                                                      object_name                   slice_range                          
              ------------------------------------------------------------------------------------------------------------- --------------- -------------------------------------------- --------
               https://kerchunk.neurosift.org/dandi/dandisets/000939/assets/11f512ba-5bcf-4230-a8cb-dc8d36db38cb/zarr.json   accelerometer   (slice(0, 30000, None), slice(0, 3, None))   failed 
              ============================================================================================================= =============== ============================================ ========
              

Traceback (most recent call last):
  File "/Users/oruebel/miniforge3/envs/nwb_benchmarks/bin/nwb_benchmarks", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/oruebel/Devel/nwb/nwb_benchmarks/src/nwb_benchmarks/command_line_interface.py", line 83, in main
    reduce_results(
  File "/Users/oruebel/Devel/nwb/nwb_benchmarks/src/nwb_benchmarks/setup/_reduce_results.py", line 72, in reduce_results
    raise ValueError(
ValueError: The results parser failed to find any succesful results! Please raise an issue and share your intermediate results file.

@CodyCBakerPhD
Collaborator

@CodyCBakerPhD I deleted my .asv folder and reran the test as you suggested, but I'm still seeing the same issue with the test.

Oh, sorry - I did not read your original post closely enough to see that you were only running a single benchmark

In this case the behavior is expected, no? The one test failed, so there are no successful results to parse. Same behavior on my system

Comment on lines 173 to 174
def track_network_activity_during_slice(self, s3_url: str, object_name: str, slice_range: Tuple[slice]):
"""Track network activity for slicing into a Zarr dataset"""
Collaborator

Just caught this one - I assume this was supposed to be timing, not tracking, given the file it's in

@CodyCBakerPhD
Collaborator

OK I can confirm, quite confusingly, that running the relevant pieces of code in a normal ipython console seems to work just fine

However (sorry if I did not document this anywhere; feel free to add it somewhere), for things like this it can help to use the --debug flag to expose the stderr and other spam from ASV

When I do that I get

[50.00%] ··· Running 'C:\Users\theac\anaconda3\envs\nwb_benchmarks\python.exe C:\Users\theac\anaconda3\envs\nwb_benchmarks\Lib\site-packages\asv\benchmark.py run D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\benchmarks time_remote_slicing.LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark.time_slice-0 {} None C:\Users\theac\AppData\Local\Temp\tmpwhqxcyh3'.
[100.00%] ··· time_remote_slicing.LindiFileReadRemoteReferenceFileSystemContinuousSliceBenchmark.time_slice                                                                                                failed
[100.00%] ··· ============================================================================================================= =============== ============================================ ========
                                                                  s3_url                                                      object_name                   slice_range
              ------------------------------------------------------------------------------------------------------------- --------------- -------------------------------------------- --------
               https://kerchunk.neurosift.org/dandi/dandisets/000939/assets/11f512ba-5bcf-4230-a8cb-dc8d36db38cb/zarr.json   accelerometer   (slice(0, 30000, None), slice(0, 3, None))   failed
              ============================================================================================================= =============== ============================================ ========


[100.00%] ···· asv: benchmark failed (exit status 3221225477)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks\Scripts\nwb_benchmarks.exe\__main__.py", line 7, in <module>
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\command_line_interface.py", line 77, in main
    assert len(globbed_json_file_paths) == 1, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: A single intermediate result was not found, likely as a result of a previous failure to reduce the results! Please manually remove these.

And that 'exit status' bit is a direct error code (might be Windows-specific) from the isolated subprocess that ran the timing test

Error code 3221225477 (0xC0000005) is the Windows status for a memory access violation

Since this style of benchmark works fine for other cases, I'm guessing this is an edge case with LINDI when running through subprocesses or something analogous. I'd suggest the LINDI team attempt to reproduce it in an even simpler manner outside of ASV (even a direct timeit API call might reproduce it). This could also be relevant to multiprocessing, since the methodology is similar
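The reproduction idea above can be sketched with the stdlib multiprocessing module, which mimics ASV's isolated-subprocess methodology. The workload below is a local stand-in (an assumption); a real reproduction would perform the lindi read and continuous slice inside the child process and check how it exits.

```python
# Run a workload in a child process and inspect its exit code, mimicking how
# ASV isolates each benchmark run. The workload is a local stand-in; swap in
# the actual lindi-backed read/slice to hunt for the access violation.
import multiprocessing


def workload():
    """Stand-in for the lindi continuous slice (assumed 30000 x 3 range)."""
    data = [list(range(5)) for _ in range(100)]
    _ = [row[0:3] for row in data[0:30000]]


if __name__ == "__main__":
    process = multiprocessing.Process(target=workload)
    process.start()
    process.join()
    # 0 means a clean exit; a negative value is the POSIX signal that killed
    # the child, and on Windows a crash surfaces instead as a nonzero status
    # such as the 3221225477 seen above.
    print(process.exitcode)
```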

@CodyCBakerPhD
Collaborator

Other than finding and fixing the access violation on the LINDI side there are two options here

I can try to hack ASV to suppress this error code when returning results for a particular benchmark; they seem to do this for a couple of specific known error codes. But this diverges us from the upstream codebase, I have no idea how tricky it would end up being, and it might make other problems cascade

Or we can change that test to track_time and use our own timeit (or another methodology) to bypass whatever the issue might be, in an effort to get 'some' timing result out of it. This could, however, lead to unfair comparisons with other timing benchmarks (unless we adjusted those similarly as well), since they were timed using a slightly different method
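The track_time idea could look roughly like the sketch below: ASV treats a method whose name starts with track_ as a benchmark whose return value is the recorded result, so the timing itself can come from our own timeit call. The class name and workload here are stand-in assumptions for the lindi-backed slice.

```python
# Sketch of a track_-style benchmark that does its own timing via timeit,
# bypassing ASV's timing machinery. The workload is a stand-in assumption.
import timeit


class TrackTimeSliceSketch:
    unit = "seconds"  # ASV reports a track_ method's return value in this unit

    def track_time_slice(self):
        workload = lambda: sum(range(1000))  # stand-in for the dataset slice
        # Take the minimum of several repeats, as the timeit docs recommend.
        return min(timeit.repeat(workload, number=10, repeat=3))
```

A result produced this way would not be directly comparable to ASV-timed benchmarks, which is exactly the fairness caveat raised above.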

@oruebel
Contributor Author

oruebel commented Apr 30, 2024

Or we can change that test to track_time and use our own timeit or other methodology to bypass whatever the issue might

I suspect that won't work. The exact same test also exists in network_tracking_remote_slicing, where we use our own track_ function, and that one fails with the same issue.

@oruebel
Contributor Author

oruebel commented Apr 30, 2024

Other than finding and fixing the access violation on the LINDI side there are two options here

I can try to hack ASV to suppress this error code when returning results for a particular benchmark;

I filed issue #57. Hopefully the LINDI team can help with identifying the issue. I would suggest we hold off on making modifications in ASV until we have tracked down the source of the issue a bit more. @CodyCBakerPhD is it possible to attach a PDB debugger when running asv so that we could step through the process?

@oruebel
Contributor Author

oruebel commented Apr 30, 2024

I'll merge this PR since we are tracking the remaining issue in #57.

@CodyCBakerPhD CodyCBakerPhD merged commit fb464f1 into main Apr 30, 2024
2 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the add/lindi branch April 30, 2024 17:05