run1
run2
run3
HostFungusAssociations.txt
HostFungusAssociations.txt.tsv
Mbasal_mod1.bt.rescaled.nex
Mbasal_mod1.bt.rescaled.nex.ms.svg
Mbasal_mod1.bt.rescaled.nex.supp.svg
Mbasal_mod1.bt.rescaled.nex.svg
README.md
constrained.qHB.0.txt
constrained.qHD.0.txt
constrained.qHG.0.txt
constrained.qHI.0.txt
dataprep.sh
draw.sh
phyla.tsv
qmatrix.run2.unconstrained.tsv
run1.sh
run2.sh
run3.sh
states.tsv
taxonomy_plant_speies.xlsx
unconstrained.txt

README.md

Over the course of the project, the analysis was developed and the input files were refined into the state in which this directory now stands. What follows is a brief explanation of all the moving parts.

Input files

This directory contains a number of hand-crafted input files, itemized here.

HostFungusAssociations.txt

The original character state matrix by Frida, in a non-standard, pseudo-tabular format.

HostFungusAssociations.txt.tsv

The character matrix recoded for input into BayesTraits. It has undergone the following modifications:

The sequences of four 0/1 switches that code for the absence/presence of associations with the four major groups of mycorrhiza (A, B, G, and M) have been recoded into single-letter codes. The mapping between the two is in states.tsv.
The updated table omits two species (Lobelia dortmanna and Pinus contorta) whose singleton observations (1010 and 1101) needlessly complicated the Q matrix.
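The recoding step described above can be sketched as follows. This is a minimal illustration in Python, not the project's Perl script (make_ms_input.pl). It assumes letters are handed out in order of first appearance, which may well differ from the actual assignment recorded in states.tsv. The function name `recode_states` and the species names are hypothetical.

```python
from string import ascii_uppercase


def recode_states(rows):
    """Map each four-digit 0/1 pattern to a single letter.

    Assumption: letters are assigned in order of first appearance.
    The real mapping used in this project is the one in states.tsv.
    """
    mapping = {}
    recoded = []
    for species, pattern in rows:
        if pattern not in mapping:
            # Four binary switches give at most 16 patterns, so 26 letters suffice.
            mapping[pattern] = ascii_uppercase[len(mapping)]
        recoded.append((species, mapping[pattern]))
    return recoded, mapping


# Made-up species names and patterns, for illustration only.
rows = [
    ("Species_one", "0001"),
    ("Species_two", "0000"),
    ("Species_three", "0001"),
]
recoded, mapping = recode_states(rows)
```

Collapsing the four switches into one multistate character is what lets BayesTraits treat every observed combination as a single state with its own row and column in the Q matrix.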

Mbasal_mod1.bt.rescaled.nex

The consensus tree file of a BEAST run with the preferred rooting (this rooting is coded as Mbasal within this project). This tree has been modified further to remove the two taxa that were also removed from the character state matrix: Lobelia dortmanna and Pinus contorta.

Shell scripts and command files

The following ephemeral shell scripts were created (or generated) within this directory.

dataprep.sh

This shell script invokes a number of Perl scripts that live in the script folder at the top level of this project. The first invocation, of make_ms_input.pl, recodes Frida's input table into the single-character states for BayesTraits and writes out the state mapping (see above). The script then invokes make_restrictions.pl to create command files for BayesTraits (which are simply piped into STDIN of the executable). It is first invoked four times, each time constraining the analysis to disallow one of the following transitions:

0000 => 0001 - constrained.qHG.0.txt
0000 => 0010 - constrained.qHD.0.txt
0000 => 0100 - constrained.qHB.0.txt
0000 => 1000 - constrained.qHI.0.txt

For each of these, the analysis runs a stepping stone sampler to approximate the marginal likelihood given the constrained Q matrix. The final invocation, which produces the command file unconstrained.txt, approximates the marginal likelihood given an unconstrained Q matrix. By elimination we can thus determine which initial transition, from no association to the first one, is best supported by the present evidence: it is the one whose marginal likelihood is most affected by the constraint (spoiler: this appears to be 0000 => 0001). Whether a constraint has a significant effect is indicated by the test statistic, i.e. the log Bayes factor, whose calculation and interpretation for the present case are discussed on page 14 of the BayesTraits manual.
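The comparison described above can be sketched in a few lines of Python. The sketch assumes the usual convention that the log Bayes factor is twice the difference between the log marginal likelihoods of the more complex (here unconstrained) and the simpler (constrained) model. Check this against the BayesTraits manual before relying on it. The function names are hypothetical, and the numbers are placeholders, not results from this analysis.

```python
def log_bayes_factor(log_ml_complex, log_ml_simple):
    """Twice the difference in log marginal likelihoods (assumed convention)."""
    return 2.0 * (log_ml_complex - log_ml_simple)


def most_affected(log_ml_unconstrained, log_ml_constrained):
    """Return the constraint with the largest log Bayes factor, plus all scores."""
    scores = {
        rate: log_bayes_factor(log_ml_unconstrained, log_ml)
        for rate, log_ml in log_ml_constrained.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder log marginal likelihoods, NOT values from run1, run2, or run3.
unconstrained = -100.0
constrained = {"qHG": -106.0, "qHD": -101.0, "qHB": -100.5, "qHI": -102.0}

best, scores = most_affected(unconstrained, constrained)
```

With these placeholder inputs, the constraint that costs the most marginal likelihood gets the largest log Bayes factor, which is the "by elimination" argument in numeric form.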

Finally, the dataprep.sh script generates three simple shell scripts (run1.sh, run2.sh, and run3.sh) that do the actual invocation of BayesTraits (i.e. loading the tree and the data, and piping the command file into its STDIN) in triplicate. This is because the analysis, like any Markov chain, can fail to generate a good posterior sample, so it is generally advisable to run it multiple times to detect artefacts and bad runs. The generated shell scripts ensure that all output files end up in their respective results directories (run1, run2, and run3). For the Bayes factor calculations, the marginal likelihoods are printed on the last lines of the *.Stones.txt files. The (very large, compressed) *.log.gz files contain the posterior samples of all the internal node states and transition rates that BayesTraits encountered.
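To collect the marginal likelihoods for the Bayes factor calculations, something along the following lines could be used. It is a sketch that assumes the last non-empty line of a *.Stones.txt file ends with the numeric log marginal likelihood, e.g. a label followed by the value. Verify this against an actual output file. The function name and the demo file are hypothetical.

```python
import tempfile
from pathlib import Path


def read_marginal_likelihood(stones_path):
    """Return the number at the end of the last non-empty line of a Stones file.

    Assumption: that line looks like '<label> <value>', with the value last.
    """
    text = Path(stones_path).read_text()
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{stones_path} is empty")
    return float(lines[-1].split()[-1])


# Demonstration on a made-up file; the content is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.Stones.txt"
    demo.write_text("header line\nLog marginal likelihood:\t-123.5\n")
    value = read_marginal_likelihood(demo)
```

Reading the value from each of run1, run2, and run3 and comparing the three also gives a quick check that the replicate runs agree.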

2016-12-01
