Thursday, December 8, 2016

How to set multiple calibration points in R's chronos function

As mentioned in this venue before, there are two fundamental ways of producing a time-calibrated phylogeny: either infer a Bayesian or Likelihood tree directly under a clock model, or infer a phylogram first and then force it into ultrametric shape afterwards. The second approach has the obvious advantage that you can use whatever tree you want, including e.g. parsimony trees. It also means that the topology is kept the same.

Some time ago I discussed how to use the program r8s, but as mentioned then it is only easily available on Mac and pretty much impossible to get to work on Windows. A more up-to-date alternative is the chronos function in the R package APE. And as mentioned in an even more recent post, it uses Penalised Likelihood with three different settings (relaxed, correlated or discrete) and an adjustable smoothing parameter lambda to time-calibrate the phylogeny.

The first few steps are well documented in APE's manual or on other online sources. Set your working directory, for example setwd("C:/Users/you/Documents/R stuff") . Load APE with library(ape). It needs to be installed, of course.

Next import your tree, either mytree <- read.tree("phylogram.tre") or mytree <- read.nexus("phylogram.tre")  depending on whether it is saved in Newick or Nexus format.

Now if you want to merely specify the root age of the tree, for example as 15 million years, you have it easy. Just establish a calibration using APE's makeChronosCalibration function as described in the manual:

mycalibration <- makeChronosCalib(mytree, node="root", age.max=15)

You hand your tree, smoothing parameter, model choice and the calibration over to the chronos function:

mytimetree <- chronos(mytree, lambda = 1, model = "relaxed", calibration = mycalibration, control = chronos.control() )

Done! Now you can plot the tree or save it as usual. But what if you have several calibration points? The documentation does not provide an example of how to do that. It mentions an interactive mode in which the phylogram is displayed, you can click on one branch after the other, and each time you are asked to provide its minimum and maximum ages manually.

A very good discussion of the interactive mode was provided by a blogger called Justin Bagley a bit more than a year ago, but the so far first and only commenter raised the obvious issue:
I am trying to place several dozens of calibrations in a very large tree so the "clicking on the tree" step in the interactive mode is kind of tricky as I just cannot see the nodes especially the more derived ones.
Also, it is just plain tedious for that many calibration points. What is more, I tried it myself earlier today and found that even when I provided a minimum age only and skipped on entering a maximum, the resulting calibration table would still have the maximum age identical to the minimum age - that is not how it is supposed to work.

So I spent some time figuring out how exactly we can build a data matrix of calibration points for chronos ourselves. Here is what you do.

Create a vector of nodes that you have calibrations for. I find it easiest to let APE find the node numbers itself by using the get Most Recent Common Ancestor function. For example, if you need to find the node where the ancestors of Osmunda and Doodia diverged, you would use getMRCA(mytree, tip = c("Osmunda", "Doodia")):

node <- c(
        getMRCA(mytree, tip = c("Equisetum","Doodia") ),
        getMRCA(mytree, tip = c("Osmunda","Doodia") ),
        getMRCA(mytree, tip = c("Hymenophyllum","Doodia") )

)

So now we have defined three nodes. Next, minimum and maximum ages. The first is the root of my imaginary tree, so I want to set a maximum age as a wall for the whole tree depth but no minimum. The others are internal nodes calibrated with fossils, so they should be minimum ages but have no maximum. I tried to enter NA or NULL when there was no minimum or maximum but it didn't work, so the easiest thing to do is to specify 0 for no minimum and the root age for no maximum.

age.min <- c(
            0,
            280,
            270
        )

age.max <- c(

            354,
            354,
            354

        )

Finally, chronos expects a last column of the data matrix that is not currently used. Still it needs to have the right dimensions:

soft.bounds <- c(
            FALSE,
            FALSE,
            FALSE

        )

Now simply unite these four vectors into a single matrix:

mycalibration <- data.frame(node, age.min, age.max, soft.bounds)

The names of the vectors turn into the column names that chronos expects, and off we go: three calibration points. I hope some readers find this useful, and that it works as well for them as it did for me.

1 comment:

  1. Thank you for this bit of tutorial! Incredibly helpful.

    ReplyDelete