fwdpy11_arg_example

This project shows what is possible in terms of tracking the ancestral recombination graph (ARG) during a foward-time simulation implemented using fwdpp and msprime. The interface to the implementation is in terms of a Python package implemented using fwdpy11 and pybind11.

License

GPLv3 or later (see COPYING)

Overview

We define a C++ class called "ancestry_tracker", which stores nodes and edges as they appear forwards in time. These data structures, and their updating, are non-intrusive, meaning that they don't care about any of the fwdpp internals. Rather, we simply have to define a new "iterate a generation" function that uses both fwdpp machinery and updates an ancestry_tracker as appropriate.

Using pybind11, we make ancestry_trackers visible to Python as an AncestryTracker class. The Python class has access to the nodes and edges as NumPy structured arrays, which can be viewed "for free", meaning that no copy from C++ to Python is required to look at them.

We define an ArgSimplifier class to bridge the AncestryTracker and msprime. The __call__ operator of ArgSimplifier will accept an AncestryTracker as an argument and use the msprime API to simplify the input data into a set of trees.

Caution

This is a proof of principle implementation only. It has been tested that it gives the correct sample properties in distribution and that internal data structures are sane during simulation. The API is not guaranteed to be neither ideal nor idiomatic. Further, various safety checks are missing on the C++ side. We simply do not check for integer overflow or other possible things that may happen if garbage collection occurs too infrequently. These limitations, and others, will be dealt with (and tested for) when we port these ideas into fwdpy11.

Usage instructions

This section will get you to a point where you can run the C++ simulations described in the paper. These instructions have been confirmed to work in a clean conda environment using Python3. We strongly recommend that this package is compiled within a clean conda env.

First, install a Python3 environment for your user. It doesn't matter if you use a "full" installer or a "miniconda" installer. At this point, you may need to follow the conda installation instructions.

Note

Linux is the preferred environment here. If you insist on using Apple's OS X, then you must add a --gcc flag to the last command in the following code block. I (KRT) make no guarantees about success on OS X.

Instructions for conda on Linux:

conda create --name ftprime_ms python
source activate ftprime_ms
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install gcc fwdpy11==0.1.4 msprime==0.5.0 pybind11==2.2.1 pandas
git clone https://github.com/molpopgen/fwdpy11_arg_example
cd fwdpy11_arg_example
python setup.py build_ext -i

Test simulation

For example, let's run a simulation with the following parameters:

N=5e4 diploids
a region size of theta = rho = 1e4
deleterious mutation rate equal to one percent of the neutral mutation rate
simplify ("GC", or garbage-collect) every 1,000 generations
apply mutations to a sample of 100 individuals at the end of the simulation
write the timings to a file called timings.txt

The command line for the above is:

python benchmarking.py --popsize 50000 --theta 10000 --rho 10000 --pdel 0.01 --gc 1000 --nsam 100 --outfile1 timings.txt.gz \
--seed $RANDOM

Note

The output file is gzip compressed!

Please be mindful of running these simulations on machines with little RAM! In general, forward simulations are intended to be run on HPC-strength hardware. While tree sequence simplification results in very efficient run times, we are sometimes still using a substantial amount of RAM.

An example of the output is:

prepping    sorting appending       simplifying     fwd_sim_runtime N       theta   rho     simplify_interval
0.05370585599999733 4.384206619999995       0.2173980950000004      2.9446647440000016      5.174604999999977       1000    1000.0  1000.0  100

The fields are:

prepping: cumulative time spent preparing data for a copy from the C++ side to msprime
sorting: cumulative time spent sorting tables, which is a requirement for simplification
simplifying: cumulative time spent simplifying tables
fwd_sim_runtime: The total time spent simulating

The remaining four columns are the command-line parameters.

Source code overview

The package consists of a mix of C++ and Python code. All source code is in the fwdpy11_arg_example subdirectory of thie main repository.

C++ code

We define nodes and edges as simple structs, meaning that they are "C-like", consisting only of POD and no constructors or other C++ stuff. This simple design allows C++ vectors of these structs to be treated as NumPy record arrays visible fom Python without needing to make a copy.

node.hpp defines a node as a simple C-like struct.
edge.hpp defines and edge as a simple C-like struct.
ancestry_tracker.hpp defines a C++ struct/class called ancestry_tracker to accumulate nodes and edges during a simulation.
evolve_generation.hpp handles the details of updating a Wright-Fisher population with an ancestry_tracker.
handle_recombination.cc/.hpp handles the conversion of fwdpp's recombination breakpoints into types use to make edges.
wfarg.cc defines a Python module (called wfarg) implemented in C++ via pybind11. It exposes our C++ back-end to Python. The most important user-facing type defined is AncestryTracker, which wraps the C++ ancestry_tracker.

Python code

argsimplifier.py defines ArgSimplifier, which is the bridge between the C++ code to evolve a population and the msprime functionality to simplify the simulated nodes and edges.
evolve_arg.py defines a function that evolves a population while tracking its ancestry. It integrates concepts from fwdpy11 with the types defined in this package.

Name		Name	Last commit message	Last commit date
Latest commit History 244 Commits
benchmarks/cpp		benchmarks/cpp
fwdpy11_arg_example		fwdpy11_arg_example
practice		practice
profiles		profiles
tests		tests
COPYING		COPYING
README.rst		README.rst
SEEDS		SEEDS
TODO.md		TODO.md
benchmarking.py		benchmarking.py
bulk_evolve.py		bulk_evolve.py
compare.py		compare.py
setup.py		setup.py
test_evolve.py		test_evolve.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

fwdpy11_arg_example

License

Overview

Caution

Usage instructions

Test simulation

Source code overview

C++ code

Python code

About

Releases 2

Packages

Contributors 4

Languages

License

molpopgen/fwdpy11_arg_example

Folders and files

Latest commit

History

Repository files navigation

fwdpy11_arg_example

License

Overview

Caution

Usage instructions

Test simulation

Source code overview

C++ code

Python code

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 4

Languages

Packages