Batch update (#822)
* update per deployment on HPC
jchen6727 authored May 16, 2024
1 parent 6a98ad5 commit 8bb241c
Showing 8 changed files with 743 additions and 13 deletions.
278 changes: 278 additions & 0 deletions doc/source/user_documentation.rst
@@ -2547,3 +2547,281 @@ The code for neural network optimization through evolutionary algorithm used in
.. Adding cell classes
.. --------------------
Running a Batch Job (Beta)
==========================
The NetPyNE batchtools subpackage provides a method of automating job submission and reporting::
batch<-->\ /---> configuration_0 >---\
\ / specs---\
\<--->dispatcher_0 sim_0
\ \ comm ---/
\ \---< results_0 <---/
\
\ /---> configuration_1 >---\
\ / specs---\
\<--->dispatcher_1 sim_1
\ \ comm ---/
\ \---< results_1 <---/
\
\
...
1. Setting up batchtools
------------------------
Beyond the necessary dependency installations for NetPyNE and NEURON, several additional ``pip`` installations are required.
The NetPyNE installation should be handled as a development installation of the repository branch ``batch``::
git clone https://github.com/Neurosim-lab/netpyne.git
cd netpyne
git checkout batch
pip install -e .
The batchtools dependency ``batchtk`` can be installed either directly through ``pip``::
pip install -U batchtk
or a development install (recommended)::
git clone https://github.com/jchen6727/batchtk.git
cd batchtk
pip install -e .
Ray is a dependency for batchtools, and should be installed with the following command::
pip install -U ray[default]
2. Examples
-----------
Examples of NetPyNE batchtools usage can be found in the ``examples`` directory `here <https://github.com/suny-downstate-medical-center/netpyne/tree/batch/netpyne/batchtools/examples>`_.
Examples of the underlying ``batchtk`` package can be found in the ``examples`` directory `here <https://github.com/jchen6727/batchtk/tree/release/examples>`_.
3. Retrieving batch configuration values through the ``specs`` object
----------------------------------------------------------------------
Each simulation can retrieve relevant configuration values through the ``specs`` object and communicate with
the dispatcher through the ``comm`` object.
To import the relevant objects::
from netpyne.batchtools import specs, comm
cfg = specs.SimConfig() # create a SimConfig object
netParams = specs.NetParams() # create a netParams object
``netpyne.batchtools.specs`` behaves similarly to ``netpyne.specs`` except in the following cases:
* ``netpyne.batchtools.specs`` automatically captures relevant configuration mappings created by the ``dispatcher`` upon initialization
* these mappings can be retrieved via ``specs.get_mappings()``
* the ``SimConfig`` object created by ``netpyne.batchtools.specs.SimConfig()`` will update itself with relevant configuration mappings through the ``update()`` method::
from netpyne.batchtools import specs # import the custom batch specs
cfg = specs.SimConfig() # create a SimConfig object
cfg.update() # update the cfg object with any relevant mappings for this particular batch job
The ``update`` method will update the ``SimConfig`` object with the configuration mappings captured in ``specs`` (see ``specs.get_mappings()``).
This replaces the previous idiom for updating the ``SimConfig`` object with mappings from the batched job submission::
try:
from __main__ import cfg # import SimConfig object with params from parent module
except:
from cfg import cfg # if no simConfig in parent module, import directly from tut8_cfg module
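For reference, the mappings applied by ``update()`` can also be inspected directly. A minimal sketch (the printed dictionary is hypothetical; its contents depend on the parameters of the particular batch job)::

    from netpyne.batchtools import specs

    cfg = specs.SimConfig()        # create a SimConfig object
    print(specs.get_mappings())    # e.g. {'foo.bar': 0.5} for this trial
    cfg.update()                   # apply those mappings to the cfg object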
4. Communicating results to the ``dispatcher`` with the ``comm`` object
------------------------------------------------------------------------
Prior batched simulations relied on ``.pkl`` files to communicate data. The ``netpyne.batchtools`` subpackage instead uses a dedicated ``comm`` object to send custom data back to the dispatcher.
The ``comm`` object determines the method of communication based on the batch job submission type.
In terms of the simulation, the following functions are available to the user (a usage sketch follows this list):
* **comm.initialize()**: establishes a connection with the batch ``dispatcher`` for sending data
* **comm.send(<data>)**: sends ``<data>`` to the batch ``dispatcher``
* for ``search`` jobs, it is important to match the data sent with the metric specified in the search function
* **comm.close()**: closes and cleans up the connection with the batch ``dispatcher``
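A minimal end-of-simulation reporting sketch combining these calls (the ``results`` values below are placeholders for whatever the simulation actually measures)::

    import json
    from netpyne.batchtools import specs, comm

    cfg = specs.SimConfig()
    cfg.update()                                    # apply mappings for this trial

    # ... build and run the simulation here ...

    results = {'rate': 12.5}                        # placeholder measurement
    results['loss'] = (results['rate'] - 10.0)**2   # match search(metric='loss')

    comm.initialize()                # establish the connection to the dispatcher
    comm.send(json.dumps(results))   # send the data back as a JSON string
    comm.close()                     # close and clean up the connection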
5. Specifying a batch job
-------------------------
Batch job handling is implemented with methods from ``netpyne.batchtools.search``.
**search**::
def search(job_type: str, # the submission engine to run a single simulation (e.g. 'sge', 'sh')
comm_type: str, # the method of communication between host dispatcher and the simulation (e.g. 'socket', 'filesystem')
run_config: Dict, # batch configuration, (keyword: string pairs to customize the submit template)
params: Dict, # search space (dictionary of parameter keys: tune search spaces)
algorithm: Optional[str] = "variant_generator", # search algorithm to use, see SEARCH_ALG_IMPORT for available options
label: Optional[str] = 'search', # label for the search
output_path: Optional[str] = '../batch', # directory for storing generated files
checkpoint_path: Optional[str] = '../ray', # directory for storing checkpoint files
max_concurrent: Optional[int] = 1, # number of concurrent trials to run at one time
batch: Optional[bool] = True, # whether concurrent trials should run synchronously or asynchronously
num_samples: Optional[int] = 1, # number of trials to run
           metric: Optional[str] = "loss", # metric to optimize (this should match some key: value pair in the returned data)
           mode: Optional[str] = "min", # either 'min' or 'max' (whether to minimize or maximize the metric)
algorithm_config: Optional[dict] = None, # additional configuration for the search algorithm
) -> tune.ResultGrid: # results of the search
The basic search implemented with the ``search`` function uses ``ray.tune`` as the search algorithm backend, returning a ``tune.ResultGrid`` which can be used to evaluate the search space and results. It takes the following parameters (a complete example call follows this list):
* **job_type**: either "``sge``" or "``sh``", specifying how the job should be submitted. "``sge``" will submit batch jobs through the Sun Grid Engine; "``sh``" will submit batch jobs through the shell on a local machine
* **comm_type**: either "``socket``" or "``filesystem``", specifying how the job should communicate with the dispatcher
* **run_config**: a dictionary of keyword: string pairs to customize the submit template; the expected keyword: string pairs depend on the ``job_type``::
=======
sge
=======
queue: the queue to submit the job to (#$ -q {queue})
cores: the number of cores to request for the job (#$ -pe smp {cores})
vmem: the amount of memory to request for the job (#$ -l h_vmem={vmem})
realtime: the amount of time to request for the job (#$ -l h_rt={realtime})
command: the command to run for the job
example:
run_config = {
'queue': 'cpu.q', # request job to be run on the 'cpu.q' queue
'cores': 8, # request 8 cores for the job
'vmem': '8G', # request 8GB of memory for the job
'realtime': '24:00:00', # set timeout of the job to 24 hours
'command': 'mpiexec -n $NSLOTS -hosts $(hostname) nrniv -python -mpi init.py'
} # set the command to be run for the job
=======
sh
=======
command: the command to run for the job
example:
run_config = {
'command': 'mpiexec -n 8 nrniv -python -mpi init.py'
} # set the command to be run
* **params**: a dictionary of config values to perform the search over. The keys of the dictionary should match the keys of the config object to be updated. A list or numpy generator of more than two values will force a grid search over those values; a list of exactly two values will instead define the bounds of a uniform distribution sample space.
**usage 1**: updating a constant value specified in the ``SimConfig`` object ::
# take a config object with the following parameter ``foo``
cfg = specs.SimConfig()
cfg.foo = 0
cfg.update()
# specify a search space for ``foo`` such that a simulation will run with:
# cfg.foo = 0
# cfg.foo = 1
# cfg.foo = 2
# ...
# cfg.foo = 9
# using:
params = {
'foo': range(10)
}
**usage 2**: updating a nested object in the ``SimConfig`` object::
# to update a nested object, the package uses the `.` operator to specify reflection into the object.
# take a config object with the following parameter object ``foo``
cfg = specs.SimConfig()
cfg.foo = {'bar': 0, 'baz': 0}
cfg.update()
# specify a search space for ``foo['bar']`` with `foo.bar` such that a simulation will run:
# cfg.foo['bar'] = 0
# cfg.foo['bar'] = 1
# cfg.foo['bar'] = 2
# ...
# cfg.foo['bar'] = 9
# using:
params = {
'foo.bar': range(10)
}
# this reflection works with nested objects as well...
# i.e.
# cfg.foo = {'bar': {'baz': 0}}
# params = {'foo.bar.baz': range(10)}
* **algorithm**: the search algorithm (supported within ``ray.tune``)
**Supported algorithms**::
* "variant_generator": grid and random based search of the parameter space (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "random": grid and random based search of the parameter space (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "axe": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "bayesopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "hyperopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "bohb": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "nevergrad": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "optuna": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "hebo": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "sigopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "zoopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* **label**: a label for the search, used for output file naming
* **output_path**: the directory for storing generated files, can be a relative or absolute path
* **checkpoint_path**: the directory for storing checkpoint files in case the search needs to be restored, can be a relative or absolute path
* **max_concurrent**: the number of concurrent trials to run at one time; keep the resource usage of each trial in mind to avoid over-scheduling
* **batch**: whether concurrent trials should run synchronously or asynchronously
* **num_samples**: the number of trials to run; for any grid search, each value in the grid will be sampled ``num_samples`` times.
* **metric**: the metric to optimize (this should match some key: value pair in the returned data)
* **mode**: either 'min' or 'max' (whether to minimize or maximize the metric)
* **algorithm_config**: additional configuration for the search algorithm (see the `ray.tune suggestion docs <https://docs.ray.io/en/latest/tune/api/suggestion.html>`_)
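Putting these parameters together, a minimal local search might look like the following sketch (``init.py`` and the ``synWeight`` key are hypothetical stand-ins for an actual trial script and config parameter; the script must report a ``loss`` value via ``comm.send``)::

    from netpyne.batchtools.search import search

    search(job_type       = 'sh',        # submit trials through the local shell
           comm_type      = 'socket',    # trials report back over a socket
           label          = 'demo',
           params         = {'synWeight': [0.01, 0.05, 0.1]},  # 3 values -> grid search
           run_config     = {'command': 'mpiexec -n 4 nrniv -python -mpi init.py'},
           metric         = 'loss',      # must match a key in the data sent back
           mode           = 'min',       # minimize the metric
           max_concurrent = 1,           # one trial at a time
           num_samples    = 1)           # sample each grid value once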
6. Performing parameter optimization searches (CA3 example)
------------------------------------------------------------
The ``examples`` directory `here <https://github.com/suny-downstate-medical-center/netpyne/tree/batch/netpyne/batchtools/examples>`_ shows both a ``grid``-based search and an ``optuna``-based optimization.
In the ``CA3`` example, we tune the ``PYR->BC`` ``NMDA`` and ``AMPA`` synaptic weights, as well as the ``BC->PYR`` ``GABA`` synaptic weight. In ``optuna_search.py`` the search space is defined::
# from optuna_search.py
params = {'nmda.PYR->BC' : [1e-3, 1.8e-3],
'ampa.PYR->BC' : [0.2e-3, 0.5e-3],
'gaba.BC->PYR' : [0.4e-3, 1.0e-3],
}
defining the upper and lower bounds of the search space, while in ``grid_search.py`` the search space is defined::
# from grid_search.py
params = {'nmda.PYR->BC' : numpy.linspace(1e-3, 1.8e-3, 3),
'ampa.PYR->BC' : numpy.linspace(0.2e-3, 0.5e-3, 3),
'gaba.BC->PYR' : numpy.linspace(0.4e-3, 1.0e-3, 3),
}
which defines a ``3x3x3`` grid of specific values to search over.
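For these reflection keys to resolve, the model's ``cfg`` must declare matching dictionaries; a hypothetical excerpt (the attribute names mirror the keys above, the default values are placeholders)::

    # in the model's cfg script
    cfg.nmda = {'PYR->BC': 1.0e-3}    # addressed by the search as 'nmda.PYR->BC'
    cfg.ampa = {'PYR->BC': 0.3e-3}    # addressed as 'ampa.PYR->BC'
    cfg.gaba = {'BC->PYR': 0.7e-3}    # addressed as 'gaba.BC->PYR'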
Note that the ``metric`` argument specifies a particular string (``loss``) to report and optimize on. This value is generated and ``sent`` by the ``init.py`` simulation::
# from init.py
results['PYR_loss'] = (results['PYR'] - 3.33875)**2
results['BC_loss'] = (results['BC'] - 19.725 )**2
results['OLM_loss'] = (results['OLM'] - 3.470 )**2
results['loss'] = (results['PYR_loss'] + results['BC_loss'] + results['OLM_loss']) / 3
out_json = json.dumps({**inputs, **results})
print(out_json)
#TODO put all of this in a single function.
comm.send(out_json)
comm.close()
The ``out_json`` output contains a dictionary which includes the ``loss`` metric (calculated as the mean squared error between observed and expected values).
In a multi-objective optimization, the relevant ``PYR_loss``, ``BC_loss``, and ``OLM_loss`` components are additionally included (see ``mo_optuna_search.py``).
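Since ``search`` returns a ``tune.ResultGrid``, the outcome can be inspected once the search completes. A brief sketch using the standard ``ray.tune`` accessors (available column names depend on the data each trial reports)::

    # assuming `results` is the tune.ResultGrid returned by a search(...) call
    best = results.get_best_result(metric='loss', mode='min')
    print(best.config)             # parameter values of the best trial
    print(best.metrics['loss'])    # the loss it reported
    df = results.get_dataframe()   # one row per trial, for further analysis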
8 changes: 7 additions & 1 deletion netpyne/batchtools/__init__.py
@@ -1,11 +1,17 @@
from netpyne.batchtools.runners import NetpyneRunner
from batchtk.runtk import dispatchers

from netpyne.batchtools import submits
from batchtk import runtk

specs = NetpyneRunner()

from netpyne.batchtools.comm import Comm

dispatchers = dispatchers
submits = submits
runtk = runtk


comm = Comm()

