
Parallel numba rime #186

Draft · wants to merge 30 commits into master
Conversation

sjperkins (Member)

  • Tests added / passed

    $ py.test -v -s africanus

If the pep8 tests fail, the quickest way to correct
    this is to run autopep8, then fix the remaining issues
    reported by flake8 and pycodestyle.

    $ pip install -U autopep8 flake8 pycodestyle
    $ autopep8 -r -i africanus
    $ flake8 africanus
    $ pycodestyle africanus
    
  • Fully documented, including HISTORY.rst for all changes
    and one of the docs/*-api.rst files for new API

    To build the docs locally:

    $ pip install -r requirements.readthedocs.txt
    $ cd docs
    $ READTHEDOCS=True make html
    

@sjperkins sjperkins marked this pull request as draft May 7, 2020 16:22
@sjperkins (Member, Author)

@JSKenyon Could you take a quick look at this? I'd appreciate general feedback on the approach.

Briefly: using the donfig configuration package, the user can instruct codex-africanus to create parallel implementations of the numba RIME functions.

{ 'rime.feed_rotation.parallel': True }

# or

{
    'rime.feed_rotation.parallel': {
        # number of threads
        'threads': 2,
        # prange axes
        'axes': ['source', 'row'],
    }
}

# or

from africanus.config import config

with config.set({"rime.feed_rotation.parallel": True}):
    from africanus.rime import feed_rotation
    ...

One can place the above config in a YAML file, or even set options via the command line:

$ AFRICANUS_RIME__FEED_ROTATION__PARALLEL="{'threads':2}" python script.py

One issue is brittleness around imports (the third Python approach above): the config import and config.set need to occur before the feed_rotation import, otherwise the module-level import code may not see the appropriate config.
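The import-order brittleness can be illustrated with a small self-contained sketch (the names below are stand-ins, not the actual africanus internals): a function whose definition captures the config value current at "import" time never sees a later config change.

```python
# Hypothetical stand-in for an import-time config read: the "module"
# captures the setting when it is imported, so config changes made
# afterwards have no effect on the already-imported function.
SETTINGS = {"rime.feed_rotation.parallel": False}

def import_feed_rotation():
    # Read once, at "import" time.
    parallel = SETTINGS["rime.feed_rotation.parallel"]

    def feed_rotation(x):
        return ("parallel" if parallel else "serial", x)

    return feed_rotation

SETTINGS["rime.feed_rotation.parallel"] = True  # set *before* the import...
feed_rotation = import_feed_rotation()          # ...so the kernel sees it
```

Had the two final lines been swapped, `feed_rotation` would be permanently stuck on the serial path, which is exactly the hazard with `config.set` after the africanus import.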

@JSKenyon (Collaborator)

JSKenyon commented May 8, 2020

That all looks really good/cool to me. Are you intending to nest pranges? Just wondering based on the ability to specify two axes in the above example.

One thing which might be useful is the ability to set this in a more global fashion if required. What I mean is that it might get tedious to specify this manually per term. Having some sensible default for the axes and the ability to just throw nthreads at the problem would be handy.

@sjperkins (Member, Author)

> That all looks really good/cool to me. Are you intending to nest pranges? Just wondering based on the ability to specify two axes in the above example.

I thought I'd try to support it, although it may not be possible in all cases. I know that in OpenMP two loops can only be collapsed when there is no code between them, and the same may apply to numba's prange.

> One thing which might be useful is the ability to set this in a more global fashion if required. What I mean is that it might get tedious to specify this manually per term. Having some sensible default for the axes and the ability to just throw nthreads at the problem would be handy.

Yeah, I guess a rime.parallel option, or even something more global, might be appropriate. I don't want this to be in the public interface at first, though.
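A per-term lookup with a global fallback could be sketched as follows (a plain dict stands in for donfig here, and the rime.parallel key is hypothetical, not something this PR implements):

```python
# Hypothetical fallback logic: a per-term setting such as
# 'rime.feed_rotation.parallel' wins; otherwise fall back to a
# global 'rime.parallel' setting, then to a hard default.
DEFAULT_PARALLEL = False

def parallel_setting(cfg, term):
    per_term = cfg.get(f"rime.{term}.parallel")
    if per_term is not None:
        return per_term
    return cfg.get("rime.parallel", DEFAULT_PARALLEL)
```

With donfig, the same shape of lookup would go through config.get with a default argument.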

Any thoughts on the use of get_num_threads and set_num_threads within the code block? Numba 0.49.0 disables caching in the parallel case, which means the number of threads won't be baked into the code on the first compile.

@JSKenyon (Collaborator)

I am not sure if I have understood it in sufficient detail, but I believe numba.config.NUMBA_NUM_THREADS defaults to the number of available cores. Do you intend users to override this by setting the environment variable? I might just be misunderstanding how things get set using donfig. In principle I think what you have done seems sensible, and I suspect that simply spinning up a huge pool of threads will work wonders.

@sjperkins (Member, Author)

@JSKenyon In general I've set up the RIME functions' generated_jit decorators as follows:

@generated_jit(nopython=True, nogil=True, cache=not parallel, parallel=parallel)
def fn(...):
    pass

There's something a bit off in the beam_cube_dde function. I've set nopython=False because if I set it to True I run into these recursion errors:


africanus/rime/tests/test_fast_beams.py:77: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../../venv/afr/lib/python3.6/site-packages/numba/core/dispatcher.py:420: in _compile_for_args
    raise e
../../../../venv/afr/lib/python3.6/site-packages/numba/core/dispatcher.py:353: in _compile_for_args
    return self.compile(tuple(argtypes))
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler_lock.py:32: in _acquire_compile_lock
    return func(*args, **kwargs)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/dispatcher.py:794: in compile
    cres = self._compiler.compile(args, return_type)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/dispatcher.py:77: in compile
    status, retval = self._compile_cached(args, return_type)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/dispatcher.py:91: in _compile_cached
    retval = self._compile_core(args, return_type)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/dispatcher.py:109: in _compile_core
    pipeline_class=self.pipeline_class)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler.py:568: in compile_extra
    return pipeline.compile_extra(func)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler.py:339: in compile_extra
    return self._compile_bytecode()
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler.py:401: in _compile_bytecode
    return self._compile_core()
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler.py:381: in _compile_core
    raise e
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler.py:372: in _compile_core
    pm.run(self.state)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler_machinery.py:341: in run
    raise patched_exception
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler_machinery.py:332: in run
    self._runPass(idx, pass_inst, state)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler_lock.py:32: in _acquire_compile_lock
    return func(*args, **kwargs)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler_machinery.py:291: in _runPass
    mutated |= check(pss.run_pass, internal_state)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/compiler_machinery.py:264: in check
    mangled = func(compiler_state)
../../../../venv/afr/lib/python3.6/site-packages/numba/core/typed_passes.py:288: in run_pass
    parfor_pass.run()
../../../../venv/afr/lib/python3.6/site-packages/numba/parfors/parfor.py:2694: in run
    get_parfor_reductions(self.func_ir, p, p.params, self.calltypes)
../../../../venv/afr/lib/python3.6/site-packages/numba/parfors/parfor.py:3318: in get_parfor_reductions
    reduce_nodes = get_reduce_nodes(param, param_nodes[param], func_ir)
../../../../venv/afr/lib/python3.6/site-packages/numba/parfors/parfor.py:3399: in get_reduce_nodes
    rhs = lookup(rhs)
../../../../venv/afr/lib/python3.6/site-packages/numba/parfors/parfor.py:3389: in lookup
    return lookup(val)
../../../../venv/afr/lib/python3.6/site-packages/numba/parfors/parfor.py:3389: in lookup
    return lookup(val)
../../../../venv/afr/lib/python3.6/site-packages/numba/parfors/parfor.py:3389: in lookup
    return lookup(val)
E   RecursionError: Failed in nopython mode pipeline (step: convert to parfors)
E   maximum recursion depth exceeded
!!! Recursion detected (same locals & position)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================== 1 failed, 1 skipped, 281 deselected in 15.41s ===========================================================

There are a couple of scratch numpy arrays allocated outside the prange but used inside it. Moving them inside doesn't fix the recursion error above. Stepping through with pdb, it looks like it's getting very confused by the ncorrs variable.

@sjperkins (Member, Author)

To invoke the parallelism in this PR, the following structure can be used:

from africanus.config import config

cfg = {
    'rime.feed_rotation.parallel': True,
    'rime.predict_vis.parallel': True,
    'rime.phase_delay.parallel': True,
    'model.spectral_model.parallel': True,
    'model.shape.gaussian.parallel': True,
}

with config.set(cfg):
    # All imports **must** happen after the config.set
    from africanus.rime.dask import phase_delay, feed_rotation, predict_vis
    from africanus.model.dask import spectral_model
    from africanus.model.shape import gaussian

I've typed the above from memory, so some of it may be incorrect, but the general idea holds.

@JSKenyon (Collaborator)

I have played with this PR briefly and the components I have used seem to work. It is worth noting that the parallelism is still bound by certain non-Numba tasks such as einsum. Of course, this will become even more powerful with a prange accelerated monolithic RIME, but this is already a useful addition for cases where we cannot afford to run a huge number of predicts in parallel. For anyone else wanting to make use of this functionality, it is worth familiarizing yourself with numba -s. This provides a summary of some useful Numba properties. The important one is whether or not tbb is the active threading layer. For tasks involving nested parallelism, it is VERY important that this is the case.

@sjperkins (Member, Author)

> I have played with this PR briefly and the components I have used seem to work.

Thanks for trying this out.

> It is worth noting that the parallelism is still bound by certain non-Numba tasks such as einsum.

I take it this is in the nested parallelism case where there are a mix of dask and numba threads?

> Of course, this will become even more powerful with a prange accelerated monolithic RIME, but this is already a useful addition for cases where we cannot afford to run a huge number of predicts in parallel.

This PR still needs some work:

  1. It's still difficult to get a parallel nopython version of the beam working.
  2. Re-jitting the numba kernels for each test case parametrization has doubled the test suite run time, or worse.

I hope to find time for this later this week.
