-
This issue arose in #572 I have difficulties with a recursive function using def mask_startend(arr, start, end):
builder = ak.ArrayBuilder()
_mask_startend(arr, start, end, builder)
return builder.snapshot()
@nb.njit # raises
def _mask_startend(arr, start, end, builder):
if arr.ndim == 2: # recursion stop
for el in arr:
if len(el) > 0 and el[0] == start and el[-1] == end:
builder.boolean(True)
else:
builder.boolean(False)
return
for subarray in arr:
builder.begin_list()
_mask_startend(subarray, start, end, builder)
builder.end_list() This gives without
But gives an error when JITted. It seems that the Any hints or ideas are appreciated!
|
Beta Was this translation helpful? Give feedback.
Replies: 12 comments 12 replies
-
Without knowing your specific problem, I put a note about recursion in the other issue: #572 (comment) I was actually in a Numba developer's meeting, but we ran overtime and I didn't get to ask about recursion. Also, because I hadn't investigated it before, I checked to see if there's any documentation on it. There is: https://numba.pydata.org/numba-doc/latest/proposals/typing_recursion.html The last note on that page about "limitations" seems to be about what I found in the code (see above comment) about the call stack. I'm going to see if I can cast your problem in the form that has been implemented. Failing that, I'll close this issue because it's actually a Numba issue. |
Beta Was this translation helpful? Give feedback.
-
Yes thanks! Saw it already Btw. I also found the NBEP 6, but could not figure out how to use type annotation with complex structures like these awkward arrays. If you have any examples for that, that would be very helpful! I already worked with type annotated Numba-JIT functions before, but only with primitive types and NumPy arrays. |
Beta Was this translation helpful? Give feedback.
-
The first quick thing is that if el[0] == start would be an array comparison, which would return an array of booleans (is the first element of So as a first step, you either want something different from el[0] == start and el[-1] == end or perhaps you instead wanted this case to be if arr.ndim == 1: # recursion stop |
Beta Was this translation helpful? Give feedback.
-
Ah, that's actually a doubly nested So if I understood correctly, the problem is that if a higher dimensional (
|
Beta Was this translation helpful? Give feedback.
-
I should still think about the Numba recursion problem, but you know, this can be solved entirely by Awkward Array primitives, too. Starting with an example of integer endpoints like >>> import awkward as ak
>>> example = ak.Array([[[0, 1, 2], []], [[3, 4]], [], [[5], [6, 7, 8, 9]]])
>>> example
<Array [[[0, 1, 2], []], ... 5], [6, 7, 8, 9]]] type='4 * var * var * int64'> Some of these are empty, so part of the solution will involve avoiding accessing the first or last element of an empty array. >>> ak.num(example, axis=-1)
<Array [[3, 0], [2], [], [1, 4]] type='4 * var * int64'>
>>> nonempty = ak.num(example, axis=-1) > 0
>>> nonempty
<Array [[True, False], ... [True, True]] type='4 * var * bool'> If we used this as a cut, it would reduce the length of the inner lists, which would make the final result hard to apply to arrays of the original length. So instead of a standard boolean slice, we'll use a mask-slice: >>> example.mask[nonempty]
<Array [[[0, 1, 2], None], ... [6, 7, 8, 9]]] type='4 * var * option[var * int64]'> We want to pick out the first and last elements of each remaining list at the deepest level, whatever that is. As in NumPy, we can use ellipsis ( >>> example.mask[nonempty][..., 0]
<Array [[0, None], [3], [], [5, 6]] type='4 * var * ?int64'>
>>> example.mask[nonempty][..., -1]
<Array [[2, None], [4], [], [5, 9]] type='4 * var * ?int64'> For a given >>> start, stop = 6, 9
>>> example.mask[nonempty][..., 0] == start
<Array [[False, None], ... [False, True]] type='4 * var * ?bool'>
>>> example.mask[nonempty][..., -1] == stop
<Array [[False, None], ... [False, True]] type='4 * var * ?bool'> We need both of those to be true: >>> good = ((example.mask[nonempty][..., 0] == start) &
... (example.mask[nonempty][..., -1] == stop))
>>> good
<Array [[False, None], ... [False, True]] type='4 * var * ?bool'> We still have >>> example[good]
<Array [[None], [], [], [[6, 7, 8, 9]]] type='4 * var * option[var * int64]'> We don't want to pick out the empty arrays as >>> ak.fill_none(good, False)
<Array [[False, False], ... [False, True]] type='4 * var * bool'> Putting it all together, >>> def mask_startstop(arr, start, stop):
... nonempty = ak.num(arr, axis=-1) > 0
... good = ((arr.mask[nonempty][..., 0] == start) &
... (arr.mask[nonempty][..., -1] == stop))
... return ak.fill_none(good, False)
...
>>> mask_startstop(example, 0, 2)
<Array [[True, False], ... [False, False]] type='4 * var * bool'>
>>> mask_startstop(example, 5, 5)
<Array [[False, False], ... [True, False]] type='4 * var * bool'> But I'll think about the Numba thing, too, because I want to understand what Numba can and can't do, and why. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the nice example, that's also very helpful. I already had a somewhat similar solution but wanted to squeeze out the last bits of performance since we need to apply this to very large datasets and I wanted to avoid allocation as much as possible. I will review your workflow and compare it with the current implementation, which contains just hardcoded functions for a few dimensions. |
Beta Was this translation helpful? Give feedback.
-
Keep in mind that |
Beta Was this translation helpful? Give feedback.
-
Oh OK, I did not know that. I'll definitely compare the performance with your approach anyways and report back! |
Beta Was this translation helpful? Give feedback.
-
I struggled with Numba recursion for a while and decided that I don't understand their call-stack restriction. But anyway, you're not going to get the most speed from that because of I have an open issue (#516) to make a public interface that would let you write fully general functions like this. Perhaps I should prioritize that. Meanwhile, here's what it would look like, fully written out. The import awkward as ak
import numpy as np
import numba as nb
@nb.njit
def base_case(arr, start, stop):
out = np.empty(len(arr), np.bool_)
for i, el in enumerate(arr):
out[i] = len(el) > 0 and el[0] == start and el[-1] == stop
return out
def mask_startstop(arr, start, stop):
def recurse(layout):
if layout.purelist_depth == 2:
np_array = base_case(ak.Array(layout), start, stop)
return ak.layout.NumpyArray(np_array)
elif isinstance(layout, (
ak.layout.ListArray32,
ak.layout.ListArrayU32,
ak.layout.ListArray64,
)):
content = recurse(layout.content)
return type(layout)(layout.starts, layout.stops, content)
elif isinstance(layout, (
ak.layout.ListOffsetArray32,
ak.layout.ListOffsetArrayU32,
ak.layout.ListOffsetArray64,
)):
content = recurse(layout.content)
return type(layout)(layout.offsets, content)
elif isinstance(layout, ak.layout.RegularArray):
content = recurse(layout.content)
return ak.layout.RegularArray(content, layout.size)
else:
raise NotImplementedError(repr(arr))
layout = ak.to_layout(arr, allow_record=True, allow_other=False)
return ak.Array(recurse(layout)) What #516 would do is replace the This will be the fastest implementation because
It will generally be true that unwrapping/reuse + Numba for the fixed-type things that return NumPy arrays will be the fastest for this combination of reasons, so there isn't really a reason to get Numba recursion to work. |
Beta Was this translation helpful? Give feedback.
-
Here is a quick comparison, it seems that the Numba version ( >>> arr2d = f.events.tracks.rec_stages[0]
>>> arr3d = f.events.tracks.rec_stages
>>> arr2d
<Array [[1, 3, 5, 4], [1, 3, ... [1], [1], [1]] type='56 * var * int64'>
>>> arr3d
<Array [[[1, 3, 5, 4], [1, ... 1], [1], [1]]] type='10 * var * var * int64'>
>>> %timeit km3io.tools.mask(arr2d, startend=(1, 3))
155 µs ± 3.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit km3io.tools.mask(arr3d, startend=(1, 3))
197 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit km3io.tools.mask_alt(arr2d, 1, 3)
3.54 ms ± 65.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit km3io.tools.mask_alt(arr3d, 1, 3)
7.04 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) |
Beta Was this translation helpful? Give feedback.
-
Here is a quick comparison of def mask(arr, start, end):
builder = ak.ArrayBuilder()
# Numba has limited recursion support, so this is hardcoded
if arr.ndim == 2:
_mask2d(arr, builder, start, end)
else:
_mask3d(arr, builder, start, end)
return builder.snapshot()
@nb.njit
def _mask3d(arr, builder, start, end):
for subarray in arr:
builder.begin_list()
_mask2d(subarray, builder, start, end)
builder.end_list()
@nb.njit
def _mask2d(arr, builder, start, end):
for els in arr:
if len(els) > 0 and els[0] == start and els[-1] == end:
builder.boolean(True)
else:
builder.boolean(False) And the def mask_alt(arr, start, end):
nonempty = ak.num(arr, axis=-1) > 0
mask = ((arr.mask[nonempty][..., 0] == start) & (arr.mask[nonempty][..., -1] == end))
return ak.fill_none(mask, False) The benchmark is dead simple: from collections import defaultdict
import awkward as ak
import numpy as np
import uproot
import matplotlib.pyplot as plt
from skhep_testdata import data_path
# This is a large file with 143381 events, each containing a 128 element
# array which consists of up ~10 elements of `int64`
# An example file with 10 events is available via
# f = uproot.open(data_path("uproot-issue431b.root"))
f = uproot.open("datav5.40.jorcarec.aanet.00006614.root")
arr = f["E/Evt/trks/trks.rec_stages"].array() # takes 3-4 minutes and 8GB mem to load
ns = 2 ** np.arange(18)
benchmarks = defaultdict(list)
for n in ns:
print(f"{n}")
bm_numba = %timeit -o km3io.tools.mask(arr[0:n], startend=(1, 3))
bm_awkward = %timeit -o km3io.tools.mask_alt(arr[0:n], 1, 3)
benchmarks["numba"].append(bm_numba)
benchmarks["awkward"].append(bm_awkward)
fig, ax = plt.subplots()
for lib, benchmark in benchmarks.items():
ax.scatter(ns, [b.average for b in benchmark], label=lib)
ax.legend();
ax.set_xlabel("number of arrays")
ax.set_ylabel("average execution time / us")
ax.grid() The overhead seems to be negligible and Numba seems to be ~3 times faster: |
Beta Was this translation helpful? Give feedback.
-
I think we can close this, thanks Jim for the fruitful discussion and lots of input (including the final implementation prototype)! EDIT: ah, I just realised it's now a Q&A ;) |
Beta Was this translation helpful? Give feedback.
I struggled with Numba recursion for a while and decided that I don't understand their call-stack restriction. But anyway, you're not going to get the most speed from that because of
ArrayBuilder
. The absolutely fastest way to do this is to unwrap the array until you're down to just the part that the Numba-compiled function applies to, then wrap the result.I have an open issue (#516) to make a public interface that would let you write fully general functions like this. Perhaps I should prioritize that. Meanwhile, here's what it would look like, fully written out. The
ak.to_layout(x)
andx.layout
parts extract a layout from an array (the latter only works forak.Array
) and theak.Array
co…