Saturday, December 3, 2016

Molecular clock models in three different programs

There are quite a few molecular clock models implemented in various phylogenetic programs. What I find somewhat annoying is that there is generally only very cryptic information on how they work and how they relate to each other.

What we usually get is the manual or documentation merely saying "our software offers models A, B, and C", without any details on what A means. If we are extremely lucky we find a reference to a paper. In that paper there will be a lot of complicated formulas but rarely will we find a sentence that simply says "model A assumes that rates on neighbouring branches are not correlated".

So I just slogged through documentation, references, and some more or less helpful websites to try and figure out the clock models in R's chronos function, which turns a pre-existing phylogram into a chronogram, and in MrBayes 3 and BEAST 2, which produce chronograms directly.

R: ape package: chronos function

Correlated

As far as I understand, this is probably the original Penalised Likelihood approach as also implemented in the software r8s. If so, it allows rates to vary across the tree, but neighbouring internodes of the tree are constrained to have more similar rates than distant ones. How strongly the rate can vary across the tree depends on the smoothing parameter lambda of the model. A default of 1 is often used, and it has to be changed by orders of magnitude (10, 100...) to explore better settings.
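To make the role of lambda a bit more concrete, here is a minimal sketch of the kind of rate-smoothing penalty that penalised likelihood subtracts from the log-likelihood: lambda times the sum of squared rate differences between each branch and its parent branch. This is my own simplification for illustration, not the actual objective function of chronos or r8s (which contain further terms), and the toy tree, the rates, and the log-likelihood value are all made up.

```python
def rate_penalty(rates, parent_of):
    """Sum of squared rate differences between each branch and its parent branch.

    rates: dict branch name -> rate
    parent_of: dict branch name -> name of its parent branch (root branch omitted)
    """
    return sum((rates[child] - rates[parent]) ** 2 for child, parent in parent_of.items())


def penalised_log_likelihood(log_likelihood, rates, parent_of, smoothing):
    """Log-likelihood minus smoothing (lambda) times the rate-roughness penalty."""
    return log_likelihood - smoothing * rate_penalty(rates, parent_of)


# Hypothetical toy tree: branch "b" descends from "a"; "c" and "d" descend from "b".
parent_of = {"b": "a", "c": "b", "d": "b"}
smooth_rates = {"a": 1.0, "b": 1.1, "c": 1.0, "d": 1.2}
rough_rates = {"a": 1.0, "b": 3.0, "c": 0.2, "d": 2.5}

# Both rate sets get the same made-up log-likelihood to isolate the effect of lambda.
for smoothing in (1, 10, 100):
    smooth = penalised_log_likelihood(-100.0, smooth_rates, parent_of, smoothing)
    rough = penalised_log_likelihood(-100.0, rough_rates, parent_of, smoothing)
    print(f"lambda={smoothing}: smooth {smooth:.2f}, rough {rough:.2f}")
```

With the fit held equal, going from lambda = 1 to lambda = 100 makes the rough rate set pay a hundred times the price, so large lambda values push the analysis towards rates that barely change between neighbouring branches.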

Relaxed

Each internode of the tree has its own rate of evolution, with the rates drawn independently from a gamma distribution.
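The defining feature here, as I understand it, is that each branch's rate is drawn on its own, without looking at the parent branch. A minimal sketch under that assumption follows; the function name and the choice to parameterise the gamma by its shape and mean are mine, and the actual parameterisation in chronos (or in MrBayes' igr model) may well differ.

```python
import random


def independent_gamma_rates(branches, shape, mean_rate=1.0, seed=None):
    """Draw one rate per branch, independently of every other branch.

    Rates come from a gamma distribution whose scale is set so that the
    mean (shape * scale) equals mean_rate; a smaller shape gives more
    rate variation among branches.
    """
    rng = random.Random(seed)
    scale = mean_rate / shape
    return {branch: rng.gammavariate(shape, scale) for branch in branches}


# Hypothetical branch names; no branch's rate depends on its parent's rate.
rates = independent_gamma_rates(["a", "b", "c", "d"], shape=2.0, seed=1)
print(rates)
```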

Discrete

Attempts to sort the internodes into a number of rate categories, where all branches of the same category have the same rate of evolution.

MrBayes 3

This is a bit more complicated because MrBayes has several strict clock models, and the relaxed models are always based on an underlying strict model to which they add at least one parameter to describe how rates vary. So how can there be different strict clock models, when a strict clock model means nothing but a constant rate? Well, really this is about the prior (expectation) for the distribution of divergence times across the tree.

Uniform (a.k.a. simple)

Basically the expectation is that a divergence is equally likely to have happened at any point in time between the divergences immediately before and after it.

Birth-death

This model has parameters for speciation rate, extinction rate, and sampling probability. It looks at the tree as something that comes into existence through a process of lineage splits and lineage extinctions, forward through time. Crucially, trees can have rather non-uniform branching patterns depending on the speciation and extinction rates. If both are high, for example, there will be many recent splits but long branches deep in time.
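The forward-in-time perspective can be illustrated with a small simulation, sketched here under simplifying assumptions of my own: only the number of lineages is tracked rather than the full tree, and every surviving lineage is treated as sampled, so the sampling probability parameter is ignored. As far as I know, MrBayes evaluates the prior probability of a given tree with a formula rather than by simulating, so this is only meant to show the process the model describes.

```python
import random


def simulate_birth_death(birth, death, duration, seed=None):
    """Simulate a birth-death process forward in time from a single lineage.

    Each living lineage splits at rate `birth` and goes extinct at rate `death`.
    Returns (number of lineages alive at `duration`, list of split times).
    Only lineage counts are tracked, not the tree itself.
    """
    if birth <= 0 or death < 0:
        raise ValueError("need birth > 0 and death >= 0")
    rng = random.Random(seed)
    t, lineages, split_times = 0.0, 1, []
    while lineages > 0:
        # Waiting time until the next event anywhere among the living lineages.
        t += rng.expovariate(lineages * (birth + death))
        if t > duration:
            break
        if rng.random() < birth / (birth + death):
            lineages += 1
            split_times.append(t)
        else:
            lineages -= 1
    return lineages, split_times


alive, splits = simulate_birth_death(birth=1.0, death=0.9, duration=5.0, seed=3)
print(alive, len(splits))
```

Note that `split_times` also contains splits leading to lineages that later went extinct, so it is not the same as the branching times visible in a tree of the survivors.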

Coalescence

Parameters are theta and the ploidy level, to estimate effective population sizes. The perspective is the opposite of the previous one, with the tree being seen as the coalescence of contemporary lineages into common ancestors as we go back in time.
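The backward-in-time view also has a compact mathematical core. Here is a minimal sketch, assuming the standard neutral coalescent with theta = 4 Ne mu for a diploid population and time measured in expected substitutions per site; exactly how MrBayes parameterises theta and the ploidy level is not something I have checked, and the function names are hypothetical.

```python
import random


def coalescent_waiting_times(n_samples, theta, seed=None):
    """Waiting times between coalescences, going back in time from n_samples lineages.

    Assumes the standard neutral coalescent: while k lineages remain, the next
    coalescence happens after an exponential time with rate k * (k - 1) / theta.
    The first entry is the most recent interval, the last one the oldest.
    """
    rng = random.Random(seed)
    return [rng.expovariate(k * (k - 1) / theta) for k in range(n_samples, 1, -1)]


def expected_tmrca(n_samples, theta):
    """Expected time back to the common ancestor of all samples: theta * (1 - 1/n)."""
    return theta * (1 - 1 / n_samples)


print(coalescent_waiting_times(10, 1.0, seed=1))
print(expected_tmrca(10, 1.0))
```

For ten samples and theta = 1, `expected_tmrca` gives 0.9, and the oldest interval alone, when only two lineages are left, is expected to take theta/2 = 0.5 of that. In other words, under the coalescent the deepest branches tend to be long while the recent coalescences come thick and fast.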

There are also a birth-death based fossilisation model and the species tree model as additional alternatives. But coming now to how the clock can be relaxed, MrBayes has three options for that (clockvarpr):

Independent Gamma Rate (igr)

I assume this is the same as the relaxed model of ape's chronos function, see above, as the MrBayes manual says it is uncorrelated, and they both have the gamma parameter.

Brownian motion / autocorrelated lognormal distribution (tk02)

The terms Brownian motion and autocorrelated suggest that it is a correlated clock model, i.e. neighbouring branches do not vary independently.
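What "autocorrelated" means can be sketched as a simulation in which each daughter branch inherits its parent's rate plus some lognormal noise whose variance grows with branch length, i.e. Brownian motion on the logarithm of the rate. This is my reading of the model, written as a hypothetical Python function; the parameter names and the exact way MrBayes scales the variance are assumptions.

```python
import math
import random


def autocorrelated_lognormal_rates(children, branch_length, root_rate, sigma2, seed=None):
    """Simulate branch rates that drift along the tree (Brownian motion on the log rate).

    children: dict node -> list of daughter nodes, starting from "root"
    branch_length: dict node -> duration of the branch leading to that node
    A daughter's log rate is normal around its parent's log rate with variance
    sigma2 * branch length; subtracting half the variance keeps the expected
    daughter rate equal to the parent rate.
    """
    rng = random.Random(seed)
    rates = {"root": root_rate}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            variance = sigma2 * branch_length[child]
            log_rate = math.log(rates[parent]) - variance / 2 + rng.gauss(0.0, math.sqrt(variance))
            rates[child] = math.exp(log_rate)
            stack.append(child)
    return rates


# Hypothetical tree: root -> (x, y), x -> (x1, x2).
children = {"root": ["x", "y"], "x": ["x1", "x2"]}
branch_length = {"x": 1.0, "y": 2.0, "x1": 0.5, "x2": 0.5}
print(autocorrelated_lognormal_rates(children, branch_length, root_rate=1.0, sigma2=0.1, seed=4))
```

Setting sigma2 to zero recovers a strict clock (every branch keeps the root rate), which fits the point that the relaxed models add parameters to an underlying strict model.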

Compound Poisson Process (cpp)

The way I read the paper in which this was first suggested, rates are correlated. The model allows rate shifts to occur anywhere on the tree.
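Here is a hedged sketch of how I understand a compound Poisson process to act along a single branch: rate shifts arrive at random times, each one multiplies the current rate by a random factor, and the daughter branches start from whatever rate the branch ended with. That last point is why I read it as a correlated model. The lognormal multiplier and all names are assumptions for illustration, not MrBayes' implementation.

```python
import math
import random


def cpp_branch(parent_rate, branch_length, shift_rate, multiplier_sd, rng):
    """Simulate the rate along one branch under a compound Poisson process (sketch).

    Rate shifts happen as a Poisson process with `shift_rate` events per unit
    time. At each shift the current rate is multiplied by a random factor, here
    lognormal with median 1 (an assumption). Between shifts the rate stays
    constant, and the branch starts from its parent's rate.

    Returns (rate at the end of the branch, time-averaged rate, number of shifts).
    The end rate is what the daughter branches inherit.
    """
    rate, elapsed, area, n_shifts = parent_rate, 0.0, 0.0, 0
    while True:
        wait = rng.expovariate(shift_rate) if shift_rate > 0 else math.inf
        if elapsed + wait >= branch_length:
            area += rate * (branch_length - elapsed)
            return rate, area / branch_length, n_shifts
        elapsed += wait
        area += rate * wait
        rate *= math.exp(rng.gauss(0.0, multiplier_sd))
        n_shifts += 1


rng = random.Random(7)
end_rate, mean_rate, shifts = cpp_branch(1.0, 2.0, shift_rate=1.5, multiplier_sd=0.3, rng=rng)
print(end_rate, mean_rate, shifts)
```

To simulate a whole tree, one would call `cpp_branch` again for each daughter branch, passing in the end rate of its parent, so that rates are inherited down the tree until a shift changes them.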

BEAST 2

My understanding is that BEAST always uses the coalescent model. It then offers the following options for the clock model:

Strict

The same rate across all branches. So this should be equivalent to using the coalescent divergence time prior with a strict model in MrBayes.

Relaxed exponential or relaxed log normal

Two uncorrelated models differing in the shape of the rate distribution. No idea which one could a priori be expected to be more realistic.
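One concrete difference between the two shapes is easy to sketch: an exponential distribution has no separate spread parameter (its standard deviation always equals its mean, and its most likely values are close to zero), whereas the spread of a lognormal can be set independently. The sketch below assumes both are parameterised to have the same mean; the function names and default values are mine, not BEAST's.

```python
import math
import random
import statistics


def uncorrelated_rates(n_branches, distribution, mean_rate=1.0, lognormal_sd=0.5, seed=None):
    """Draw independent branch rates with the given mean.

    distribution: "exponential" or "lognormal". For the lognormal, mu is set to
    log(mean_rate) - lognormal_sd**2 / 2 so that its mean equals mean_rate.
    """
    rng = random.Random(seed)
    if distribution == "exponential":
        return [rng.expovariate(1.0 / mean_rate) for _ in range(n_branches)]
    if distribution == "lognormal":
        mu = math.log(mean_rate) - lognormal_sd ** 2 / 2
        return [rng.lognormvariate(mu, lognormal_sd) for _ in range(n_branches)]
    raise ValueError(f"unknown distribution: {distribution}")


def coefficient_of_variation(values):
    """Standard deviation divided by mean: a scale-free measure of rate spread."""
    return statistics.stdev(values) / statistics.mean(values)


exponential = uncorrelated_rates(20000, "exponential", seed=1)
lognormal = uncorrelated_rates(20000, "lognormal", seed=2)
print(round(coefficient_of_variation(exponential), 2), round(coefficient_of_variation(lognormal), 2))
```

With these defaults the exponential rates have a coefficient of variation of about 1 and the lognormal ones of about 0.53. Whether either shape is more realistic remains an empirical question.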

Random local

There are different clocks applying to different parts of the phylogeny (reference). This should work well with sudden rate shifts, but I don't know if it will make clock rooting any more reliable. The way I understand it, it also means that the model is uncorrelated.

In other words, it looks as if "strict" is the only setting that is not an uncorrelated model. This brings us to two final points.

What model to use?

First, I have spoken to a senior colleague and browsed a few sources online, and the general thought seems to be that correlated clock models are more biologically realistic than uncorrelated ones. That makes a lot of sense to me, although with the caveat that there seem to be obvious shifts in some phylogenies, generally associated with a change in a crucial trait such as generation time or metabolism.

Second, one of the sources I read opined that only the strict clock model is really appropriate with the coalescent model, because the latter kind of assumes the former. I have no real opinion on this; I assume that the BEAST developers would have a reason to offer several relaxed models in their coalescent-based package.


In summary, and hoping I didn't get anything wrong, this is how I currently understand the hierarchy of the relevant clock models:

Strict clock (strict in BEAST; default in MrBayes if no clockvarpr added)
Relaxed clocks
    Correlated rates
        Brownian motion (tk02 in MrBayes)
        Compound Poisson Process (cpp in MrBayes)
        Penalised Likelihood (correlated in chronos)
    Uncorrelated rates
        Independent Gamma Rates (relaxed in chronos; igr in MrBayes)
        Discrete rate categories (discrete in chronos)
        Relaxed exponential (just that name in BEAST)
        Relaxed log normal (just that name in BEAST)
        Random local (just that name in BEAST)

Wednesday, November 30, 2016

Hey, since when can RAxML do that?

Maybe I missed something, but I only noticed today that the super-fast likelihood phylogenetics software RAxML can use four different models of sequence evolution.

I seem to remember (?) that it could only do the GTR model. Perhaps they added the three others in a newer version and I missed the change while I was busy using BEAST and otherwise doing non-phylogenetic research? Or did I just never notice before?

Anyway, there is the choice between GTR, JC, K80, and HKY: the most parameter-rich model, the most parameter-poor, and the two standard models with different rates for transitions and transversions. This is good, because I tend to get GTR or HKY suggested for the data I usually use. It seems, however, as if it is not possible to specify different models for the various parts of the partition.
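For reference, these are the standard textbook counts of free parameters for the four models, written as a small Python table. Only the substitution rate matrix and the base frequencies are counted; branch lengths and among-site rate variation are left out, and I am not claiming anything about how RAxML itself estimates these parameters.

```python
# Free parameters of four nucleotide substitution models (rate matrix + base frequencies).
MODEL_PARAMETERS = {
    "JC": {"rate_parameters": 0, "base_frequencies": 0},   # all substitutions equally likely, equal frequencies
    "K80": {"rate_parameters": 1, "base_frequencies": 0},  # one transition/transversion ratio
    "HKY": {"rate_parameters": 1, "base_frequencies": 3},  # K80 plus unequal base frequencies
    "GTR": {"rate_parameters": 5, "base_frequencies": 3},  # six exchange rates, one fixed as reference
}


def free_parameters(model):
    """Total free parameters of the rate matrix plus base frequencies."""
    return sum(MODEL_PARAMETERS[model].values())


for name in sorted(MODEL_PARAMETERS, key=free_parameters):
    print(name, free_parameters(name))
```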

Anyway, I have accordingly updated the posts on what number of schemes to test in jModelTest and on what models are implemented in the various phylogenetics programs I am familiar with.

Tuesday, November 29, 2016

Cladistics textbook

In my office I have two 'proper' phylogenetics textbooks, counting only those that cover the principles and theory as opposed to offering only a practical how-to manual. One is by Felsenstein, who is strongly associated with likelihood phylogenetics, although his book covers all approaches. The second is:

Kitching IJ, Forey PL, Humphries CJ, Williams DM, 1998. Cladistics second edition - the theory and practice of parsimony analysis. The Systematics Association Publication No. 11. Oxford Science Publications.

As the title implies, it is entirely about parsimony phylogenetics.

Having recently looked into Kitching et al., I noticed two short sections that I found interesting enough to discuss here. I will start with the question of ancestors. Proponents of paraphyletic taxa often make claims on the lines of cladists "not accepting the existence of ancestral species", of "ignoring ancestors", or of "treating all species as sister taxa".

Here now we have a textbook written by cladists, in other words the official version, to the degree that an official version exists. It is, of course, not as easy as that, because the only thing that unites cladists, in the sense of what paraphylists argue against, is that supraspecific taxa should be monophyletic. Many other details differ from cladist to cladist, and in that sense the concept of cladist also includes those who use, e.g., Bayesian phylogenetics.

I also do not want to give the impression that I, personally, take what Kitching et al. promote on this or that detailed question to necessarily be The Correct View. It is well possible that I, a cladist, find myself in disagreement with some chapter of that textbook. I am not even arguing here, in this instance, that making taxa monophyletic is the way to go (although of course I do believe that).

No, the point of the post is merely this: if Kitching et al. argue not-XYZ, then this demonstrates decisively that any claim of all cladists arguing XYZ is nonsense.

So, about ancestors, and turning to page 14 of the textbook:
In fact, to date, Archaeopteryx has no recognized autapomorphies. Indeed, if there were, Archaeopteryx would have to be placed as the sister-group to the rest of the birds.
It does not matter here whether more recent analyses have shown Archaeopteryx to have autapomorphies and to have actually been a side branch relative to modern birds. We should simply think of any species that looks exactly like what the ancestral species of a later clade is inferred to have looked like.

It should be clear that the above section is correct. An ancestral species would not have any systematically useful characters relative to its descendants, because that descendant clade would have started out as that species. My view - and here other cladists may differ - is actually that the ancestral species and the clade are one and the same. The ancestral species has over time turned (diversified) into the clade.
In terms of unique characters, Archaeopteryx simply does not exist. This is absurd, for its remains have been excavated and studied. To circumvent this logical dilemma, cladists place likely ancestors on the cladogram as the sister-group to their putative descendants and accept that they must be nominal paraphyletic taxa (Fig. 1.9c). Ancestors, just like paraphyletic taxa in general, can only be recognized by a particular combination of characters that they have and characters that they do not have. The unique attribute of possible ancestors is the time at which they lived.
Here is the reason why paraphylists complain about ancestors being treated as sister to their descendants: they are treated like that, crucially, so that we can do the analysis. It is a practical, not a philosophical reason.

Note also that at least the cladists who wrote the textbook do not have any problem with paraphyletic species. Whether we think that this use of the word paraphyletic makes sense or not (as do I), it is discussions like this one which make me groan in frustration whenever I read a paraphylist claim that cladists only accepted paraphyletic species as a cop-out once they could no longer deny that they existed. No, cladism was founded on the principle that monophyly applies above the species level, so it never had to backpedal like that.
After a cladistic analysis has been completed the cladogram may be reinterpreted as a tree (see below)
What they mean here is that they see a cladogram as such (merely) as a different visualisation of the data from the data matrix, while the "tree" is the cladogram's interpretation in terms of evolutionary relationships, of actual genealogical relatedness of the terminals.
and at this stage some palaeontologists may choose to recognize these paraphyletic taxa as ancestors, particularly when they do not overlap in time with their putative descendants (see Smith 199a for a discussion).
And this is the main point. Here we have a group of senior cladists who wrote, to put it in the simplest possible terms, "we need to treat every species as a terminal to get a cladogram, but then if you wish you can interpret a terminal without autapomorphies as an ancestor".

It is as if the people who claim that cladists do not accept the existence of ancestors haven't even bothered to figure out what any cladists really think.

Next time I will look at a short section of the textbook that I definitely disagree with.

Thursday, November 24, 2016

The political system of the USA

So, about that recent election. I am not an American, so I don't actually have a horse in that race except to the degree that everybody will be impacted by what one of the most influential nations on the planet decides to do.

I don't really want to discuss party politics on this blog either, so what I will focus on is simply what I consider to be certain systematic issues with how elections work in the USA. The point is, as far as I can tell the system is built so that it systematically favours conservatives, whether intentionally or not.

A major concept here is Gerrymandering.

In case it isn't clear what Gerrymandering is, imagine a political system in which the seats in parliament are given to people representing individual electoral districts as opposed to nation-wide party lists. So if you win the plurality of the votes in a district, or perhaps the majority after resolving preferences or a run-off election, you get its seat in parliament. Imagine further you have two electoral districts in your town with a hundred voters each and where voters favour the two major parties as follows*: the Yellows always get 45 votes, the Reds 55. Both seats go to the Reds.

Now assume that the Yellows have control of the state government and can redraw the district boundaries. They cleverly manage to redistrict the voters as follows: one district now has Yellows 55 votes versus Reds 45, and the other makes up the difference with Yellows 35 versus Reds 65. Et voilà, the Yellows have won one seat in parliament without convincing a single voter to switch allegiance. The trick is to concentrate your opponent's voters in a few districts that are super-overwhelmingly safe for them while giving yourself lots of narrower margins.
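The arithmetic of this example can be checked in a few lines of Python. This is just a sketch of plurality voting with the made-up party names from the example; it ignores ties, preferences, and run-offs.

```python
from collections import Counter


def seats_won(districts):
    """Seats per party under plurality voting: each district goes to its top vote-getter.

    districts: list of dicts mapping party -> votes. Ties are not handled.
    """
    return Counter(max(votes, key=votes.get) for votes in districts)


def total_votes(districts):
    """Votes per party summed over all districts."""
    totals = Counter()
    for votes in districts:
        totals.update(votes)
    return totals


# The example from the text: 200 voters, 90 Yellows and 110 Reds, in two districts.
original = [{"Yellow": 45, "Red": 55}, {"Yellow": 45, "Red": 55}]
redrawn = [{"Yellow": 55, "Red": 45}, {"Yellow": 35, "Red": 65}]

assert total_votes(original) == total_votes(redrawn)  # nobody changed their vote
print(dict(seats_won(original)))  # {'Red': 2}
print(dict(seats_won(redrawn)))   # {'Yellow': 1, 'Red': 1}
```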

Now coming to the USA, which of course have district-based representation as opposed to proportional representation.

First, the Electoral College. This is the most obvious and widely discussed, as there have now been two elections within twenty years in which conservative candidates won despite losing the popular vote. If election of presidents had been direct, their opponents would have carried the day. That being said, however, the Electoral College is probably the least Gerrymandered of all bodies, not least because it cannot possibly have been done deliberately. State boundaries are just what they are, they do not get redistricted easily. Still, I assume that urbanisation has an effect here. Many Americans are concentrated in a very small number of states, in huge metropolitan areas, which are strongly leaning progressive. Most states are rural, rural voters lean conservative, and consequently the Electoral College leans conservative.

(By the way, I find it extremely bizarre how these discussions go on American websites. On many sites I have read in the last two weeks there will be commenters who complain about the Electoral College not reflecting the popular vote. And then there will always be somebody replying to the effect of "it is doing exactly what it was meant to do, that is stopping a minority [of states] from dominating the majority [of states]". This is weird, isn't it? There are two main political camps; either the first "dominates" the second, or the second "dominates" the first. You cannot say that when the first doesn't happen but the second does there is suddenly no "dominating" going on. So why should the majority of states count more than the majority of voters? At a minimum I would need more explanation here than is usually forthcoming...)

Second, the US Senate, the upper house of the US Congress, which has a lot of power. It consists of two senators from each state. Immediately the situation should become clear: the Senate is Gerrymandered by default. Most states are rural, rural voters lean conservative, consequently the Senate leans conservative.

Third, the states. The same principle applies. The majority of states is rural, rural voters lean conservative, and consequently conservative politicians control a majority of the states.

But that was just the districting; there are other factors.

Fourth, for some strange reason US citizens are not automatically registered for voting, they have to make a deliberate effort to become registered voters. People who have time on their hands will have an easier time doing that. People who have time on their hands are, in particular, pensioners and the independently wealthy, while the working poor will have it harder. Old and wealthy people lean conservative, consequently registered voters will lean conservative.

Fifth, out of ancient tradition the USA have elections on Tuesdays, a working day. This makes it much easier for pensioners and the independently wealthy to participate in elections, whereas the working poor will find it harder to take one of their very few days of leave to stand in a queue for voting. Old and wealthy people lean conservative, consequently voters will lean conservative.

Sixth, it is my understanding that US citizens have a lot more elections than the citizens of most other countries. They elect people for offices that are filled without formal election campaigns elsewhere, such as judges, sheriffs, school boards, etc. This means that participating in democracy is much more time-consuming for US citizens, and will easily lead to voting fatigue. The people who have lots of time to deal with all that are, in particular, pensioners and the independently wealthy. Old and wealthy people lean conservative, consequently voters will lean conservative.

Now I have read those who argue that this is simply the system that exists, so progressives will have to learn how to win elections in that system instead of e.g. whining about the unfair Electoral College. Fair enough. What is more, it works well for the conservatives, and again, I am not even an American. It just kind of seems to me, personally, that the point of a democracy is that the outcome of an election should kind of reflect the popular will. The Americans will have to know themselves what they want, and there does not appear to be any interest in change. Myself, I prefer proportional representation, party-independent committees drawing district boundaries, automatic voter registration, voting on a Sunday, and fewer elections with much shorter campaigns so that there is less voting fatigue. It seems to work well in many countries. Just my two cents.


*) Of course, one of my major concerns with district-based parliaments is that they distort the popular will even without any Gerrymandering whatsoever. If a smaller party gets 20% in each district of the country they will still get 0% of the seats in parliament, a situation that completely disenfranchises one in five voters.
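The same point in numbers: a hypothetical country of ten identical districts in which a smaller party C gets 20% everywhere. Plurality voting is compared with one common proportional method, Hare-quota largest remainder, chosen here only for illustration; real proportional systems use various other formulas and thresholds.

```python
from collections import Counter


def plurality_seats(districts):
    """One seat per district, to the party with the most votes there (ties not handled)."""
    return Counter(max(votes, key=votes.get) for votes in districts)


def proportional_seats(districts, total_seats):
    """Hare-quota largest-remainder allocation on the nationwide vote totals."""
    totals = Counter()
    for votes in districts:
        totals.update(votes)
    all_votes = sum(totals.values())
    quotas = {party: total_seats * n / all_votes for party, n in totals.items()}
    seats = {party: int(q) for party, q in quotas.items()}
    leftover = total_seats - sum(seats.values())
    # Hand out the remaining seats to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - seats[p], reverse=True)
    for party in by_remainder[:leftover]:
        seats[party] += 1
    return seats


# Hypothetical: ten identical districts; party C gets 20% everywhere.
districts = [{"A": 41, "B": 39, "C": 20} for _ in range(10)]
print(dict(plurality_seats(districts)))   # {'A': 10}: C gets nothing
print(proportional_seats(districts, 10))  # {'A': 4, 'B': 4, 'C': 2}
```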

Monday, November 21, 2016

Just have to share my astonishment here

This morning over breakfast I read an article in the Canberra Times. When I had finished, I first scrolled up again to make sure that I had not accidentally opened The Onion or, perhaps more likely, its Australian counterpart The Shovel. But no, it was indeed the Canberra Times. Then I thought hard if I had somehow missed that it was 1 April, but again, no such luck.

The article in question?

Housing affordability in Canberra: Renting is the ACT's 'biggest issue'. It argues that rents are so high in Canberra that people cannot save up enough to buy property, which is fair enough, ... using as its only example and case study a 23 year old student to whom, and I cite, "the great Australian dream" (of owning a house) "seems just that - a dream".

Maybe I am just weird, but when I was a 23 year old undergraduate back in Germany I would not have had the money to buy a house either. I lived off a mixture of a small competitive stipend, money earned from teaching assistantships, and my parents topping up the rest. Life was nonetheless good, as the student canteen was cheap and rents reasonable. But if I had started whinging about not being able to buy a house my friends and family would have given me a lot of side-eye, to put it mildly.

I would also argue that at 23 I was not mature enough to take on this responsibility, and I think I would have said so myself, even then. It was a time of learning, of studying, of first figuring out where I wanted to go with my life.

Which brings up another point. After finishing my studies and doctorate in that town I moved to a different state of the same country; two years later I moved to a different country on the same continent; and nearly one and a half years after that I moved to the other side of the planet. And really something like that was to be expected, given the way the job market in science works. So even if I had been able to afford a house I would not have wanted to buy one until I was settled. Yes, I guess there are some undergrads who study economics or law and then get into a company or public service in their home town, but that cannot be assumed to be a given.

Don't get me wrong, housing is expensive in Canberra. And clearly there must be some up-bidding of prices going on, because looking at quality and size the flats and houses are objectively not worth what they are going for, so the article seems to have got that right. I am forty now, and if we were to describe our fantastic, pie in the sky dream it would be to one day be able to afford a small two bedroom flat with a little courtyard or, if that is impossible, at least a balcony. A house is totally out of the question. This just for context - and note that I am not depressed about it. Billions of people on this planet live happy, productive and fulfilled lives while renting.

But somebody at the newspaper apparently thinks that the average 23 year old (!) student (!) should be able to buy and own a house. Further, that one's main goal in life, this "great Australian dream", cannot possibly wait until the old age of, I dunno, thirty, but has to be achieved before even having finished education. Somebody looked at this article and went, yes, that looks sensible, let's click "publish". I am really, really astonished.

And I am eagerly waiting to see the next article in the series, "Marriage prospects in Canberra: how a nine year old girl despairs of ever finding Mr Right".

Thursday, November 17, 2016

Clock rooting with strong rate shifts - or not

Today at our journal club we discussed Schmitt 2016, "Hennig, Ax, and present-day mainstream cladistics, on polarising characters", published in Peckiana 11: 35-42.

The point of the paper is that early phylogeneticists discussed various ways of figuring out character polarity (i.e. which character state is ancestral and which is derived) first and then using that inference to build a phylogeny, whereas today nearly everybody does the phylogeny building first and then uses outgroup rooting to polarise the resulting network and infer character polarity.

And... that's it, really. There does not appear to be any clear call to action, although one would have expected something on the lines of "and this is bad because...". The paper does end with an exhortation to use more morphological characters instead of only molecular data, and then there is language that may be meant to identify the author as a proponent of paraphyletic taxa without making it explicit (anagenesis!), but neither of those two conclusions appears to be to the point. There is no actual way forward regarding the question of how to polarise characters without outgroup rooting.

The approaches discussed in the paper are the following:

Palaeontological precedence. The character state appearing first in the fossil record is the ancestral one. The problem is, this only works if we assume that the fossil record is fairly complete.

Chorological progression. The character state found more frequently near the edges of a range is the derived one, whereas the ancestral state dominates at the centre of origin. Problem, this is circular because we first need to figure out where the centre of origin is. I am not too convinced of the principle either.

Ontogenetic precedence. Because organisms cannot completely retool their developmental pathways but only change through (1) neoteny or (2) attaching steps to the end of the process, the earlier states in ontogeny are the ancestral ones. The author mentions the problem of a scarcity of ontogenetic data; I might add that this shows a bit of a zoological bias, as it will rarely work in plants and presumably never in microorganisms.

Correlation of transformation series. I must admit I don't quite understand the logic here, and the author isn't very impressed by it either.

Comparison with the stem lineage of the study group. The state found in the ancestral lineage is ancestral. This is very obviously circular, because we would need to know the phylogeny first, and being able to infer that was the whole point of polarising the character.

Ingroup comparison. The state that is more frequent in the study group is ancestral. I see no reason to assume that this is always true, as there can be shifts in diversification rates.

Finally, outgroup comparison. The state that is found in the closest relative(s) of the study group is ancestral in the study group. It is perhaps not totally correct to call this circular, but it has something of turtles all the way down: to find out what the closest relative of your study group is you need to polarise the larger group around it, and then you have the same problem. Still this is the most broadly useful of all these approaches.

Polarising a phylogeny and polarising characters are two sides of the same coin. I have written a thorough post on the former before, which regularly seems to be found by quite a few people doing Google searches. I hope it is still useful. One of the ways I mentioned there for giving the stack of turtles something to stand on is clock rooting, and I found it surprising that the present paper did not mention it at all. It was this, however, that our journal club discussion dwelt on for quite some time.

Admittedly said discussion was a bit meandering, but here are a few thoughts:

The big problem with clock rooting is that it will be thrown off if there are strong rate shifts. Imagine that the true phylogram consists of two sister groups, one with very long branches (short-lived organisms) and the other with very short branches (their long-lived relatives). If we apply a molecular clock model to the phylogenetic analysis, e.g. in MrBayes, it will try to root the tree so that the branches all end at about the same level, the present. The obvious way to do it is to root the tree within the long-branch group. Et voilà, it has rooted incorrectly.
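A toy version of this failure mode can be sketched without any MCMC. As a crude stand-in for what a clock model does, the function below places the root wherever the root-to-tip distances are most nearly equal (minimum variance). That criterion is my simplification for illustration; MrBayes infers rates, times and topology jointly. The phylogram is made up: clade a has long branches, clade b short ones, and the true root is assumed to lie on the edge between them.

```python
from collections import defaultdict
from statistics import pvariance


def build_adjacency(edges):
    """Unrooted tree as node -> {neighbour: branch length}."""
    adjacency = defaultdict(dict)
    for u, v, length in edges:
        adjacency[u][v] = length
        adjacency[v][u] = length
    return adjacency


def distances_on_side(adjacency, start, excluded):
    """Path lengths from `start` to every node on its side of the edge to `excluded`."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in adjacency[node].items():
            if neighbour != excluded and neighbour not in dist:
                dist[neighbour] = dist[node] + length
                stack.append(neighbour)
    return dist


def min_variance_root(edges, steps=1000):
    """Try root positions along every edge; keep the one where root-to-tip
    distances are most nearly equal (smallest variance), i.e. most clock-like."""
    adjacency = build_adjacency(edges)
    tips = [node for node, nbrs in adjacency.items() if len(nbrs) == 1]
    best = None
    for u, v, length in edges:
        from_u = distances_on_side(adjacency, u, v)
        from_v = distances_on_side(adjacency, v, u)
        for i in range(steps + 1):
            t = length * i / steps  # distance of the root from u along this edge
            depths = [from_u[tip] + t if tip in from_u else from_v[tip] + length - t
                      for tip in tips]
            variance = pvariance(depths)
            if best is None or variance < best[0]:
                best = (variance, (u, v), t)
    return best


# Hypothetical phylogram: long-branch clade (a1, a2), short-branch clade (b1, b2).
edges = [("a1", "A", 10.0), ("a2", "A", 10.0), ("A", "B", 1.0),
         ("B", "b1", 1.0), ("B", "b2", 1.0)]
variance, edge, position = min_variance_root(edges)
print(edge, round(position, 2), round(variance, 2))
```

On this tree the most clock-like root lands on one of the two long terminal branches of clade a, roughly 7.3 units from its tip, rather than on the internal edge where the true root was placed.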

What to do about this?

The first suggestion was to use an outgroup. In my admittedly limited experience that doesn't work so well. It seems that if the rate shift is strong enough the analysis will simply attach the outgroup to the ingroup in the wrong place.

The next idea was to use a very relaxed clock model, in particular the random local clock model available in BEAST (unfortunately not in MrBayes). But this was then described as nice in theory but making it hard to achieve stationarity of the MCMC run. I cannot say.

Nick Matzke suggested that a better model could be developed. The hope is that this would allow the analysis to figure out what is going on, recognise the rate shift in the right place, and then root correctly. It would have to be seen how that would work, but at the moment something like that does not appear to be available.

Finally, another colleague said that if the clock models don't work then simply don't use them. Well, but what if we need a time-calibrated phylogeny, a chronogram, to do our downstream analyses, as in biogeographic modelling, studies of diversification rates, or divergence time estimates?

I guess the only way I can think of at the moment is to infer a phylogram whose rooting we trust and then turn it into a chronogram while maintaining topology, as with the software r8s. Maybe there are other ways around the rooting issue with clock models, but I am not aware of them.

Sunday, November 13, 2016

Black Mountain plants

Although it was windy and cool we went for a walk on Black Mountain Nature Reserve. It is interesting how completely different its flora is compared against that of the Mount Majura - Mount Ainslie range. For example, Black Mountain has many species of orchids, the other two have only very few. Apparently the difference is to a large degree one of soil chemistry, but past land-use is also said to have differed.

Telstra Tower atop Black Mountain, as seen from near the Australian National Botanic Garden's nursery.

Poranthera microphylla, a tiny plant that is widespread and common in south-eastern Australia, but presumably often overlooked. It was traditionally considered to be a member of the spurge family but apparently now belongs to the Phyllanthaceae.

And here is a representative of the type genus of the Phyllanthaceae, Phyllanthus hirtellus. In this case the plant is larger, a dwarf shrub, but the flowers are still minuscule.

Finally, beautiful Grevillea alpina (Proteaceae), or at least so I hope. There is another rather similar species of the same genus in the area, but it is supposed to have glabrous tepals.