Replies: 2 comments
-
This is being read as a group. You can check for this with
>>> import uproot4
>>> from skhep_testdata import data_path
>>> f = uproot4.open(data_path("uproot-issue431b.root"))
>>> isinstance(f["E/Evt/hits"].interpretation, uproot4.AsGrouped)
True
and you can get the counts by supplying that interpretation:
>>> f["E/Evt/hits"].array(uproot4.AsDtype(">i4"))
<Array [176, 125, 318, 157, ... 84, 255, 105] type='10 * int32'>
That interpretation won't work in every AsGrouped case, but the only other case I've seen is when there's no data at all associated with the branch-with-subbranches. You can check for that with
>>> f["E/Evt/hits"].num_baskets
1
or catch the failure of the ...
As for counting after the fact, prefer ak.num over ak.count, since the latter is a full-fledged reducer. (The difference becomes observable when ...
-
Thanks Jim, the interpretation to integer is working perfectly! The error handling hints are also useful.
-
Help, I am struggling with the transition again ;)
With uproot3, there was a weird interpretation (glitch) in one of our file formats, which I/we never fully understood, but the outcome was very useful. The data in "E/Evt/hits" below is split up into subbranches, which one can read nicely with array/lazyarray. However, if you access the array of the main branch directly (E/Evt/hits), you get the number of entries per event instead of a large jagged array of structs since, as you can see below, uproot3 recognises it as a vector<int>.
We used this array to quickly identify e.g. empty or large events for further cuts, since it loads extremely fast. The same behaviour is, by the way, observed in E/Evt/mc_hits, E/Evt/trks, E/Evt/mc_trks and in branches of other formats we use as well.
So far so good... The "problem" is that uproot4 is able to correctly figure out how to parse E/Evt/hits as an awkward.Record array (vector<Hit>), and the only way I have figured out so far to get the lengths of each sub-array is to use ak.num or ak.count. Both of them, however, require loading a bunch of data into memory, which takes time and... well, memory ;) Other than that, they count all sub-arrays. I can reduce the footprint by retrieving only one of the subbranches, e.g. ak.count(f["E/Evt/hits/hits.channel_id"].array(), axis=1), but that does not feel good (and still takes much longer than the uproot3 glitch).
It is, by the way, interesting that the interpretation is still showing AsJagged(AsDtype('>i4')) and int32_t[], so I am a bit confused why it works.
f["E/Evt/hits"].num_entries gives the overall number of events, but I have not found any other shortcut to the subbranch lengths.
Do you have a hint how to get to this information?