PhyloBotanist: December 2016

Friday, December 30, 2016

Blue Mountains Holiday Trip, final part 4

On our way back to Canberra we stopped at Jenolan Caves, which I had so far only heard mentioned. They were really worth the admission price.

Above the entrance area, a large tunnel through the mountain. The various cave systems that can be visited branch off from this central cavity.

Before our tour started we spent some time at the amazingly blue lake nearby. It is said to have healing properties, if you will believe it.

We visited the Chifley cave. It is not even the largest of them but already extremely impressive, and it features a lot of interesting crystal and limestone formations...

... as seen above. I found it a bit weird that the guide was constantly apologising for explaining the geology. What is next, a zoo guide apologising for explaining animals?

These formations are apparently called shawls, at least if we understood correctly.

And because there will not be any plants otherwise, I will close with this yellow lichen landscape on tree bark.

Thursday, December 29, 2016

Blue Mountains Holiday Trip, part 3

Today's stations were, in order of appearance, Echo Point and Leura Cascades in Katoomba, Wentworth Falls, and Govet's Leap Lookout in Blackheath.

The obligatory Three Sisters Picture, although from this perspective it looks more like one sister.

Blue Mountains landscape as seen from the lookout behind Leura Cascades.

Leura Cascades are a paradise for various species of ferns. Here the leaf venation of Todea barbara (Osmundaceae) showing through against the light.

Cliff face near Wentworth Falls. If you look very carefully you can see tiny people; we got to that place a bit later.

The surroundings of Wentworth Falls were perhaps botanically the most interesting area today. Above Rimacola elliptica (Orchidaceae), a species restricted to a fairly small range around Sydney.

And here we have Alania endlicheri, of the strange little family Boryaceae. Its sister genus occurs far away in Western Australia.

Wednesday, December 28, 2016

Blue Mountains Holiday Trip, part 2

Today our goals were the stone pagodas of Garden of Stones and Wollemi National Parks as well as Mount Tomah Botanic Gardens.

The landscape of the sandstone-ironstone pagodas. This picture is taken from atop one of them. I had been looking forward to showing this area to my family.

One of the plants in flower on the way there was Goodenia bellidifolia (Goodeniaceae).

The only downside of getting there is that you have to pass through Newnes State "Forest". Forest as in clear-felling, apparently.

We then returned to Lithgow and drove eastwards from there to Mount Tomah. Shortly before reaching the gardens I noticed Calomeria amaranthoides (Asteraceae) on the roadside. Yes, Asteraceae; it is certainly one of the weirder representatives of this family on a continent that has its fair share of morphologically aberrant daisies.

The above is the view over Mount Tomah Botanic Gardens, with the Blue Mountains in the distance.

My daughter was particularly enchanted by the bog garden, as it had a lot of carnivorous plants on display. Here a detail of a sun-dew (Drosera).

She also liked the remnant rainforest. Here a Microsorum fern is climbing up a trunk.

Tuesday, December 27, 2016

Blue Mountains Holiday Trip, part 1

Over the holidays we are spending a few nights in the Blue Mountains. Today was mostly driving, but after our arrival in Lithgow we found the time to visit at least Hassans Wall Lookout, apparently the highest lookout in the area.

Unfortunately, as with my recent work trip, the sky was grey. Photos did not come out too badly considering the circumstances.

A bunch of tourists were going beyond the fences, into areas that are considered out of bounds. Bit too dangerous for my taste, given the landscape.

There were several interesting plants flowering in the area, nearly all of them white. Some of them I do not know the species names of, but the above is Platysace lanceolata (Apiaceae), a common shrub of heath-like habitats.

Thursday, December 22, 2016

Field trip to Moss Vale area

Pictures from a one day field trip I did today:

Landscape as seen from the Fitzroy Falls lookout; only a pity about the grey sky.

A bit further on from the falls we encountered this charming little shrub, Boronia anemonifolia (Rutaceae). The strongly divided leaves are very aromatic, somewhat fruity in their note.

At our second major stop in an area called Barren Ground we saw many interesting plants, although not the ones I was really after - in that sense the previous area was more satisfying. But this one was nonetheless a highlight of the trip: Drosera binata, the forked-leaf sundew. Way back when I was a student in Germany I bought a book on carnivorous plants, and the species is featured heavily in it, presumably because its unique leaf morphology makes it particularly attractive to collectors.

Sunday, December 11, 2016

Cladistics textbook, part 2

Coming back to the textbook

Kitching IJ, Forey PL, Humphries CJ, Williams DM, 1998. Cladistics second edition - the theory and practice of parsimony analysis. The Systematics Association Publication No. 11. Oxford Science Publications.

..., in my previous post I mentioned that I also ran into a section that I find hard to agree with. The chapter on support values opens with the following:

Page 118: The study of phylogeny is an historical science, concerned with the discovery of historical singularities. Consequently, we do not consider phylogenetic inference per se to be fundamentally a statistical question, open to discoverable and objectively definable confidence limits. Hence, we are in diametric opposition to those who would include such a standard statistical framework as part of cladistic theory and practice.

I can only repeat in slightly different words what I wrote some time ago about the same question in the context of biogeographic studies. I find it hard to draw a line between historical science and non-historical science, not least because, to take just one example, any physical experiment, be it ever so reproducible, turns into a singular historical event a split second after it has been conducted.

To me there is really no big difference. We always infer what is most likely to have happened in individual instances in the past and then draw more general conclusions from those instances, no matter whether it is history or social science, archeology or engineering, paleobotany or (extant) plant taxonomy, evolutionary biology or population genetics.

I assume that a big part of the difference in perspective here is about what organismal characters people are thinking of. Reading through the cladistics textbook, the focus is pretty much always on morphology. Reading through works that introduce likelihood or Bayesian phylogenetics, in other words probabilistic and model-based evolutionary analysis, the focus is pretty much always on nucleotide sequence data, with protein sequence data coming a distant second.

It makes sense to me that somebody who thinks predominantly in terms of trait shifts like the evolution of bird feathers from scales or of angiosperm gynoecia from ovules sitting nakedly on a stalk would have reason to favour parsimony analysis. In fact I myself, despite frequently using likelihood and Bayesian phylogenetics for sequence data, would still have to be counted among those who are highly sceptical whether the Mk model works better with morphological traits than parsimony.

These kinds of characters have very low homoplasy, at least if scored correctly; and where they do show homoplasy, I would say that is due to a scoring error that can be rectified (e.g. if double fertilisation has evolved independently in angiosperms and gnetophytes then the two should be scored as separate character states). And it just so happens that parsimony analysis is a better tool for the data the less homoplasy there is. What is more, it seems a bit odd to try and apply the same model to all morphological characters, given how vastly different they are.

It also makes a lot of sense to me that somebody who thinks predominantly in terms of trait shifts like an A in the DNA sequence turning into T would see reason to favour analyses using models of sequence evolution. As Prof. Bromham pointed out during her talk I heard a few weeks ago, if that A has changed into a T in two parallel instances and then all the A-carrying individuals died out there is no way in which we can ever find evidence for that.

In other words, in the case of our four letter soup of DNA sequence characters homoplasy is not a scoring error to be discovered by looking closer but a hard fact of life that we cannot rid ourselves of (except to the degree that we can choose slower-evolving markers). And it just so happens that parsimony analysis is a worse tool for the data the more homoplasy there is, while the right model-based approach can deal with that. (Or at least somewhat better - obviously, once homoplasy is so rampant that all signal is lost no phylogenetic method will work, and likelihood analysis has also been shown to suffer from long branch attraction.) What is more, it seems logical to apply the same model to all DNA sequence characters, given that they are equivalent nucleotides along a chain.

So when I call myself a cladist, what I mean is not that I prefer parsimony analysis for all data, but that I acknowledge Willi Hennig's legacy, the idea that systematists should classify consistently by relatedness.

Thursday, December 8, 2016

How to set multiple calibration points in R's chronos function

As mentioned in this venue before, there are two fundamental ways of producing a time-calibrated phylogeny: either infer a Bayesian or Likelihood tree directly under a clock model, or infer a phylogram first and then force it into ultrametric shape afterwards. The second approach has the obvious advantage that you can use whatever tree you want, including e.g. parsimony trees. It also means that the topology is kept the same.

Some time ago I discussed how to use the program r8s, but as mentioned then it is only easily available on Mac and pretty much impossible to get to work on Windows. A more up-to-date alternative is the chronos function in the R package APE. And as mentioned in an even more recent post, it uses Penalised Likelihood with three different settings (relaxed, correlated or discrete) and an adjustable smoothing parameter lambda to time-calibrate the phylogeny.

The first few steps are well documented in APE's manual or on other online sources. Set your working directory, for example setwd("C:/Users/you/Documents/R stuff") . Load APE with library(ape). It needs to be installed, of course.

Next import your tree, either mytree <- read.tree("phylogram.tre") or mytree <- read.nexus("phylogram.tre") depending on whether it is saved in Newick or Nexus format.

Now if you want to merely specify the root age of the tree, for example as 15 million years, you have it easy. Just establish a calibration using APE's makeChronosCalibration function as described in the manual:

mycalibration <- makeChronosCalib(mytree, node="root", age.max=15)

You hand your tree, smoothing parameter, model choice and the calibration over to the chronos function:

mytimetree <- chronos(mytree, lambda = 1, model = "relaxed", calibration = mycalibration, control = chronos.control() )

Done! Now you can plot the tree or save it as usual. But what if you have several calibration points? The documentation does not provide an example of how to do that. It mentions an interactive mode in which the phylogram is displayed, you can click on one branch after the other, and each time you are asked to provide its minimum and maximum ages manually.

A very good discussion of the interactive mode was provided by a blogger called Justin Bagley a bit more than a year ago, but the so far first and only commenter raised the obvious issue:

I am trying to place several dozens of calibrations in a very large tree so the "clicking on the tree" step in the interactive mode is kind of tricky as I just cannot see the nodes especially the more derived ones.

Also, it is just plain tedious for that many calibration points. What is more, I tried it myself earlier today and found that even when I provided a minimum age only and skipped on entering a maximum, the resulting calibration table would still have the maximum age identical to the minimum age - that is not how it is supposed to work.

So I spent some time figuring out how exactly we can build a data matrix of calibration points for chronos ourselves. Here is what you do.

Create a vector of nodes that you have calibrations for. I find it easiest to let APE find the node numbers itself by using the get Most Recent Common Ancestor function. For example, if you need to find the node where the ancestors of Osmunda and Doodia diverged, you would use getMRCA(mytree, tip = c("Osmunda", "Doodia")):

node <- c(
getMRCA(mytree, tip = c("Equisetum","Doodia") ),
        getMRCA(mytree, tip = c("Osmunda","Doodia") ),
        getMRCA(mytree, tip = c("Hymenophyllum","Doodia") )
)

So now we have defined three nodes. Next, minimum and maximum ages. The first is the root of my imaginary tree, so I want to set a maximum age as a wall for the whole tree depth but no minimum. The others are internal nodes calibrated with fossils, so they should be minimum ages but have no maximum. I tried to enter NA or NULL when there was no minimum or maximum but it didn't work, so the easiest thing to do is to specify 0 for no minimum and the root age for no maximum.

age.min <- c(
      0,
            280,
            270
        )

age.max <- c(
      354,
            354,
            354
        )

Finally, chronos expects a last column of the data matrix that is not currently used. Still it needs to have the right dimensions:

soft.bounds <- c(
      FALSE,
            FALSE,
            FALSE
        )

Now simply unite these four vectors into a single matrix:

mycalibration <- data.frame(node, age.min, age.max, soft.bounds)

The names of the vectors turn into the column names that chronos expects, and off we go: three calibration points. I hope some readers find this useful, and that it works as well for them as it did for me.

Saturday, December 3, 2016

Molecular clock models in three different programs

There are quite a few molecular clock models implemented in various phylogenetic programs. What I find somewhat annoying is that there is generally only very cryptic information on how they work and how they relate to each other.

What we usually get is the manual or documentation merely saying "our software offers models A, B, and C", without any details on what A means. If we are extremely lucky we find a reference to a paper. In that paper there will be a lot of complicated formulas but rarely will we find a sentence that simply says "model A assumes that rates on neighbouring branches are not correlated".

So I just slogged through documentation, references, and some more or less helpful websites to try and figure out the clock models in R's chronos function, which turns a pre-existing phylogram into a chronogram, and in MrBayes 3 and BEAST 2, which produce chronograms directly.

R: ape package: chronos function

Correlated

As far as I understand this is probably the original Penalised Likelihood approach as also implemented in the software r8s. If so, it allows rates to vary across the tree but with neighbouring internodes of the tree having more similar rates than distant branches are allowed to have. How strongly the rate can vary across the tree depends on the smoothness factor lambda of the model. Here a default of 1 is often used, and it has to changed by orders of magnitude to explore better settings (10, 100...).

Relaxed

Each internode of the tree has its own rate of evolution, with the rate distribution drawn from a gamma parameter.

Discrete

Attempts to estimate a number of categories of internodes, where all the branches of the same category have the same rate of evolution.

MrBayes 3

This is a bit more complicated because MrBayes has several strict clock models, and the relaxed models are always based on an underlying strict model to which they add at least one parameter to describe how rates vary. So how can there be different strict clock models, when a strict clock model means nothing but a constant rate? Well, really this is about the prior (expectation) for the distribution of divergence times across the tree.

Uniform (a.k.a. simple)

Basically the expectation is that a divergence is equally likely to have happened at any point in time between the divergences immediately before and after it.

Birth-death

This model has parameters for speciation rate, extinction rate, and sampling probability. It looks at the tree as something that comes in existence through a process of lineage splits and lineage extinctions, forward through time. Crucially, trees can have rather non-uniform branching patterns depending on the speciation and extinction rates. If both are high, for example, there will be many recent splits but long branches deep in time.

Coalescent

Parameters are theta and the ploidy level, to estimate effective population sizes. The perspective is the opposite of the previous one, with the tree being seen as the coalescence of contemporary lineages into common ancestors as we go back in time.

There are also a birth-death based fossilisation model and the species tree model as additional alternatives. But coming now to how the clock can be relaxed, MrBayes has three options for that (clockvarpr):

Independent Gamma Rate (igr)

I assume this is the same as the relaxed model of ape's chronos function, see above, as the MrBayes manual says it is uncorrelated, and they both have the gamma parameter.

Brownian motion / autocorrelated lognormal distribution (tk02)

The terms Brownian motion and autocorrelated suggest that it is a correlated clock model, i.e. neighbouring branches do not vary independently.

Compound Poisson Process (cpp)

The way I read the paper where this was first suggested it seems to me as if rates are correlated. It allows shifts in rate to occur anywhere on the tree.

BEAST 2

My understanding is that BEAST always uses the coalescent model. It then offers the following options for the clock model:

Strict

The same rate across all branches. So this should be equivalent to using the coalescent divergence time prior with a strict model in MrBayes.

Relaxed exponential or relaxed log normal

Two uncorrelated models differing in the shape of the rate distribution. No idea which one could a priori be expected to be more realistic.

Random local

There are different clocks applying to different parts of the phylogeny (reference). This should work well with sudden rate shifts, but I don't know if it will make clock rooting any more reliable. The way I understand it, it also means that the model is uncorrelated.

In other words, it looks as if "strict" is the only setting that is not an uncorrelated model. This brings us to two final points.

What model to use?

First, I have spoken to a senior colleague and browsed a few sources online, and the general thought seems to be that correlated clock models are more biologically realistic than uncorrelated ones. That makes a lot of sense to me, although with the caveat that there seem to be obvious shifts in some phylogenies, generally associated with a change in a crucial trait such as generation time or metabolism.

Second, one of the sources I read opined that only the strict clock model is really appropriate with the coalescent model, because the latter kind of assumes the former. I have no real opinion on this; I assume that the BEAST developers would have a reason to offer several relaxed models in their coalescent-based package.

Summary

In summary, and hoping I didn't get anything wrong, this is how I currently understand the hierarchy of the relevant clock models:

Strict clock (strict in BEAST; default in MrBayes if no clockvarpr added)
Relaxed clocks
    Correlated rates
        Brownian motion (tk02 in MrBayes)
        Compound Poisson Process (cpp in MrBayes)
        Penalised Likelihood (correlated in chronos)
    Uncorrelated rates
        Independent Gamma Rates (relaxed in chronos; igr in MrBayes)
        Discrete rate categories (discrete in chronos)
        Relaxed exponential (just that name in BEAST)
        Relaxed log normal (just that name in BEAST)
        Random local (just that name in BEAST)