European green crabs are prolific generalist predators known to cause considerable disruption to the ecosystems they colonize. This has had damaging economic effects, particularly for shellfisheries along the west coast of North America. To counter this, there is growing investment in trapping these crabs and removing them from the estuaries and bays they are in the process of colonizing. A natural question arises: how should limited resources best be allocated during a dynamically evolving colonization event?
Here we approach this question from the point of view of deep reinforcement learning (RL).
RL is a broad class of machine learning algorithms aimed at solving adaptive control/management problems: problems in which an agent acts on an environment based on its observations of that environment. These algorithms have been used, for example, to teach computers to become proficient at board games like Chess and Go. In that setting, the environment is the positions of the pieces on the chessboard, which the agent observes and uses to decide its next move. Other classic uses of RL include playing Atari games and solving physics-based optimal control problems.
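Concretely, the interaction follows a simple observe/act loop. Here is a minimal sketch using the gymnasium API (CartPole is a standard textbook environment, used here only as a stand-in, not our green crab environment):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # a trained agent would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # episode over, start a new one
        obs, info = env.reset()
env.close()
```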
This project is part of a wider research program in which we take a step to extend RL out of its usual "comfort zone," using it for adaptive management problems that arise in environmental science. Our focus in this project is to bring the tools developed in RL to bear on the problem of resource allocation for invasive green crab management.
Our project starts with the question: how can we best use the data we collect on green crab populations to make policy decisions? Traditional methods used in adaptive management problems like this one, e.g. optimal control theory and Bayesian decision theory, struggle to use high-dimensional observations in their decision processes. However, the data collected in our problem is naturally high dimensional: each month a catch per unit effort (CPUE) is recorded, and several such observations are needed to make an informed decision. For example, a sequence of observations is needed to distinguish whether a green crab population has a firm foothold within a bay or estuary.
So that is our setting: how do we efficiently use these high-dimensional observations?
Here RL offers an edge over traditional methods: RL is naturally suited to problems with high-dimensional observations. (Think of chess, for instance, where an observation has 32 components: the location of each individual piece on the board.)
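For instance, an observation in our setting might be a year of monthly CPUE readings stacked into one vector (the numbers below are made up purely for illustration):

```python
import numpy as np

# Hypothetical observation: the last 12 months of catch per unit effort
# (crabs caught per cage), stacked into a single 12-dimensional vector.
# A rising sequence like this one suggests an establishing population;
# no single month's reading could support that inference.
monthly_cpue = np.array([0.0, 0.0, 0.1, 0.2, 0.2, 0.4,
                         0.5, 0.7, 0.8, 1.1, 1.3, 1.6])
```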
Our project aims to use this advantage to generate new, more responsive quantitative policy rules that can complement traditional management approaches.
The agent trains by interacting with an integral projection model that describes the population dynamics of green crabs, together with a model of the agent's observation process. In short, the agent's actions correspond to the number of cages used to trap crabs. This action hampers the growth of the crab population and produces an observation: the number of crabs caught per cage.
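To make the interface concrete, here is a toy sketch of such an environment. The scalar dynamics below are made up for illustration and are NOT the actual integral projection model; they only show the shape of the action/observation exchange:

```python
import numpy as np
import gymnasium as gym

class ToyCrabEnv(gym.Env):
    """Illustrative stand-in with made-up scalar population dynamics."""

    def __init__(self, max_cages=2000):
        self.action_space = gym.spaces.Box(0.0, max_cages, shape=(1,))
        self.observation_space = gym.spaces.Box(0.0, np.inf, shape=(1,))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pop = 100.0  # crab population (arbitrary units)
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        n_cages = float(action[0])
        catch = self.pop * (1 - np.exp(-1e-3 * n_cages))  # toy catch model
        self.pop = 1.2 * (self.pop - catch)               # toy 20% growth
        cpue = catch / n_cages if n_cages > 0 else 0.0    # the observation
        reward = -self.pop - 0.01 * n_cages  # crabs are bad, cages cost money
        return np.array([cpue], dtype=np.float32), reward, False, False, {}
```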
Coming soon: a more detailed explanation of the integral projection model used!
For a hands-on introduction, see the notebooks:

- `notebooks/intro_pt1.ipynb`
- `notebooks/intro_pt2.ipynb`
To install from source:

```bash
git clone https://github.com/boettiger-lab/rl4greencrab.git
cd rl4greencrab
pip install .
```
(Coming soon: we plan to publish our tools on PyPI for an easier installation!)
To train an agent:

```bash
python scripts/train.py -f ../hyperpars/ppo-gcse.yml
```

Alternatively, `bash scripts/train_algos.sh` trains using several algorithms in parallel.
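Once training finishes, the trained agent can be rolled out on the environment. Here is a minimal sketch, assuming (as the `ppo-gcse.yml` hyperparameter file suggests) that training uses stable-baselines3 PPO; the save path and the environment below are placeholders, not the repo's actual API:

```python
from stable_baselines3 import PPO

env = ToyCrabEnv()  # stand-in; substitute the environment you trained on
model = PPO.load("path/to/saved/agent")  # placeholder path to your saved model

obs, info = env.reset()
total_reward = 0.0
for _ in range(50):  # roll out 50 months of management
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break
print("episode return:", total_reward)
```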