Tuesday, April 8, 2014

Character optimisation in parsimony phylogenetics

As mentioned in my last post on parsimony analysis, there are different forms of parsimony that are used in the reconstruction of phylogenetic relationships. We could describe them as different ways of counting the number of character changes needed to explain a given phylogenetic tree.

Wagner parsimony

One of the two most important optimality criteria in parsimony analysis, Wagner parsimony is also simply known as "ordered character states". The idea here is that in multi-state characters, where the possible values of a character may for example be 0, 1 and 2, changes between 0 and 1 or between 1 and 2 are counted as one step, but a change from 0 to 2 or from 2 to 0 is counted as two steps. It is assumed that such a change must take place by passing through the intermediate state 1.
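To make the counting concrete, here is a minimal Python sketch, not taken from any particular program: it scores one ordered character on a small four-taxon tree using Sankoff's dynamic-programming algorithm, with the cost of a change from state i to state j set to |i - j|. The taxon names A to D, their states and the helper names are all hypothetical.

```python
# Minimal sketch: length of one ordered (Wagner) character on a fixed tree,
# computed with Sankoff's dynamic-programming algorithm.
# The tree, taxon names and states below are hypothetical.

STATES = (0, 1, 2)

def wagner_cost(i, j):
    """Ordered states: a change from i to j costs |i - j| steps."""
    return abs(i - j)

def sankoff(tree, leaf_state, cost, states=STATES):
    """Return {state: minimum steps in this subtree if its root has that state}."""
    if isinstance(tree, str):  # a leaf: only its observed state is allowed
        return {s: (0 if s == leaf_state[tree] else float("inf")) for s in states}
    children = [sankoff(child, leaf_state, cost, states) for child in tree]
    return {
        s: sum(min(cost(s, t) + child[t] for t in states) for child in children)
        for s in states
    }

def tree_length(tree, leaf_state, cost, states=STATES):
    """Fewest steps for the whole tree, over all possible root states."""
    return min(sankoff(tree, leaf_state, cost, states).values())

tree = (("A", "B"), ("C", "D"))
leaf_state = {"A": 0, "B": 0, "C": 2, "D": 2}
print(tree_length(tree, leaf_state, wagner_cost))  # 2: the path 0 -> 1 -> 2
```

The only thing that makes this "Wagner" is the cost function; swapping in a different cost between states changes the character type without touching the algorithm.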

This makes intuitive sense in morphological characters that describe sizes, intensities or amounts. If you have species in your group of interest with distinctively large, medium and small leaves, you would probably assume that in the course of evolution from large to small leaves the lineage would have had to pass through the medium-sized state. Similarly, if the fruits of the species in your plant group can variously contain numerous, only two or three, or only one seed, then it is probably fair to assume that in the course of evolution from many seeds to one seed per fruit the lineage would have had to pass through the two-to-three-seed stage.

Fitch parsimony

The second of the two most important optimality criteria is then unsurprisingly the opposite. Fitch parsimony is also known as "unordered character states". Here it is assumed that changes can take place with equal ease and probability between all states of a multi-state character, so that a change from 0 to 2 counts as one step, just like a change from 1 to 2 or from 1 to 0.
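For comparison, here is a minimal sketch of Fitch's set-based algorithm for unordered characters, again in Python and on a hypothetical four-taxon tree. It assumes a strictly bifurcating tree; the function name is invented for illustration. With the same states as in a 0/0/2/2 Wagner example, the jump from 0 to 2 now costs a single change.

```python
# Minimal sketch of Fitch's algorithm for one unordered character
# on a fixed, strictly bifurcating tree. Tree and states are hypothetical.

def fitch(tree, leaf_state):
    """Return (candidate state set at this node, number of changes below it)."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes yet
        return {leaf_state[tree]}, 0
    left, right = tree  # assumes exactly two children per internal node
    left_set, left_steps = fitch(left, leaf_state)
    right_set, right_steps = fitch(right, leaf_state)
    common = left_set & right_set
    if common:  # the two children can agree: no extra change needed
        return common, left_steps + right_steps
    # disagreement: keep all candidate states and count one change
    return left_set | right_set, left_steps + right_steps + 1

tree = (("A", "B"), ("C", "D"))
leaf_state = {"A": 0, "B": 0, "C": 2, "D": 2}
print(fitch(tree, leaf_state)[1])  # 1: any state can change to any other in one step
```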

This makes sense, for example, when we are dealing with discrete character states that have no obvious order, or when we simply cannot be sure whether one exists. Unless we have a clear understanding of the underlying biochemistry, I would code the flower colours white, yellow and orange under Fitch parsimony, and I would do the same for, say, serrate, smooth or crenate leaf margins.

The remaining criteria are used rather more rarely.

Dollo parsimony

This criterion demands that a given character state can be gained only once, so that all homoplasy on the phylogeny has to be explained by losses. Wikipedia gives Dollo's law as "an organism is unable to return, even partially, to a previous stage already realized in the ranks of its ancestors." In that strict form it is surely poppycock; there is no reason to assume that a yellow-flowered plant descended from white-flowered ancestors could not revert to white flowers. The idea behind Dollo parsimony is rather that there are complex structures that we can reasonably assume to have evolved only once. If the same evolutionary problem were solved again, it would be solved in a different, non-homologous fashion.

One example from the plant kingdom involves leaf-like structures. Some plant lineages growing in arid environments have lost their leaves as an adaptation to drought stress and photosynthesise with their stems instead. Cacti are one such group, but there are many others. When these lineages later evolve back into less arid habitats, they generally appear unable to reverse the loss of proper leaves and instead evolve new leaf-like organs by flattening side branches. These flattened, leaf-like twigs are called cladodes.

Perhaps even easier to understand is the example of limbs: the four legs of land vertebrates (tetrapods) evolved precisely once, from fish fins, and it is very unlikely that separately evolved legs would have the same number and internal structure if another group of fish colonised the land in the future. Limbs have, however, been lost several times independently: in various lineages of lizards, in caecilians and, of course, in snakes.
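The single-gain logic can be sketched for a binary character such as "limbs present" (1) versus "limbs absent" (0). This is a minimal Python sketch under two assumptions of my own: absence is the ancestral state at the root, and the one gain sits on the branch leading to the smallest clade that contains every taxon with limbs. The tree is a deliberately simplified, hypothetical sampling of vertebrates for illustration only, and all function names are invented.

```python
# Minimal sketch of Dollo counting for one binary character (1 = present)
# on a fixed rooted tree. Assumptions: the trait is absent (0) at the root,
# it is gained exactly once, and every other mismatch is explained by losses.

def leaves(tree):
    """All taxon names in a (sub)tree written as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def mrca(tree, targets):
    """Smallest subtree that contains every taxon in `targets`."""
    if not isinstance(tree, str):
        for child in tree:
            if targets <= leaves(child):
                return mrca(child, targets)
    return tree

def count_losses(tree, present):
    """Fewest losses inside `tree`, given the trait is present at its root."""
    if not (leaves(tree) & present):
        return 1  # no taxon below has the trait: one loss on the branch above
    if isinstance(tree, str):
        return 0  # a taxon that kept the trait
    return sum(count_losses(child, present) for child in tree)

def dollo_changes(tree, leaf_state):
    """Return (gains, losses) under the single-gain assumption."""
    present = {taxon for taxon, state in leaf_state.items() if state == 1}
    if not present:
        return 0, 0
    clade = mrca(tree, present)  # the single gain is on the branch to this clade
    return 1, count_losses(clade, present)

# Hypothetical, simplified tree and limb coding (1 = limbs present).
tree = ("lungfish", (("frog", "caecilian"), ("mouse", ("gecko", "snake"))))
leaf_state = {"lungfish": 0, "frog": 1, "caecilian": 0,
              "mouse": 1, "gecko": 1, "snake": 0}
print(dollo_changes(tree, leaf_state))  # (1, 2)
```

On this toy tree, limbs are gained once and then lost twice, once in the caecilian and once in the snake.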

Camin-Sokal parsimony

This is, in a way, the opposite of Dollo parsimony. Here a character state, once gained, may never be lost, and all homoplasy has to be explained through parallel gains. It is completely unclear to me what this optimisation is good for.
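The mirror image can be sketched the same way. Again this is a minimal Python sketch with assumptions of my own: a binary character, absence (0) ancestral at the root, and no reversal from 1 back to 0, so every maximal clade in which all taxa have the trait needs its own gain. It reuses the same hypothetical, simplified limb tree for contrast; the function names are invented.

```python
# Minimal sketch of Camin-Sokal counting for one binary character:
# 0 is ancestral at the root and a change 0 -> 1 can never be reversed.

def leaves(tree):
    """All taxon names in a (sub)tree written as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def camin_sokal_gains(tree, leaf_state):
    """Fewest independent gains of state 1 when losses are forbidden."""
    if all(leaf_state[taxon] == 1 for taxon in leaves(tree)):
        return 1  # one gain on the branch leading to this all-present clade
    if isinstance(tree, str):
        return 0  # a taxon that still shows the ancestral state
    return sum(camin_sokal_gains(child, leaf_state) for child in tree)

# Hypothetical, simplified tree and limb coding (1 = limbs present).
tree = ("lungfish", (("frog", "caecilian"), ("mouse", ("gecko", "snake"))))
leaf_state = {"lungfish": 0, "frog": 1, "caecilian": 0,
              "mouse": 1, "gecko": 1, "snake": 0}
print(camin_sokal_gains(tree, leaf_state))  # 3
```

Applied to the same limb data, forbidding losses forces three separate origins of limbs (in the frog, the mouse and the gecko), where the Dollo sketch needed one gain and two losses.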

Of course, one can also define one's own criteria. For example, if you are convinced that in a character with four states (0-3) changes among states 1, 2 and 3 always have to pass through state 0, then some parsimony analysis programs will let you code the character to behave exactly like that.
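One way to think about such a user-defined character is as a table of costs between every pair of states, usually called a step matrix. The minimal Python sketch below derives that table from a list of allowed single-step transitions by taking shortest paths (the Floyd-Warshall algorithm). The helper name and the edge lists are illustrative assumptions, not the input format of any particular program.

```python
# Minimal sketch: build a step matrix from the transitions that count as
# one step each, using Floyd-Warshall shortest paths.
from itertools import product

def step_matrix(n_states, allowed_edges):
    """Cost from state i to state j = fewest single-step transitions."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n_states)] for i in range(n_states)]
    for i, j in allowed_edges:
        d[i][j] = d[j][i] = 1  # symmetric: gains and losses cost the same
    # product(..., repeat=3) varies k slowest, so k is the outer loop
    for k, i, j in product(range(n_states), repeat=3):
        d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

wagner = step_matrix(3, [(0, 1), (1, 2)])          # ordered: a path 0-1-2
fitch = step_matrix(3, [(0, 1), (0, 2), (1, 2)])   # unordered: all pairs
via_zero = step_matrix(4, [(0, 1), (0, 2), (0, 3)])  # 1, 2, 3 only via 0
for row in via_zero:
    print(row)
# [0, 1, 1, 1]
# [1, 0, 2, 2]
# [1, 2, 0, 2]
# [1, 2, 2, 0]
```

Seen this way, Wagner and Fitch parsimony are just two special cases: the path 0-1-2 gives the ordered costs, and allowing every transition gives the unordered ones.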

It is important to think through how the analysis actually works, because it is easy to misunderstand. A typical beginner's mistake would be to believe that an analysis under Dollo parsimony returns "the" phylogenetic tree on which each character was acquired only once. The problem is that character changes can be mapped so that this holds on every imaginable tree. The different forms of parsimony listed above do not, as such, define optimal trees; they merely define how characters are mapped onto any given tree.

So what a Dollo parsimony analysis actually does is to propose various possible phylogenies, map the characters onto each of them under the assumption that every character state was acquired only once, and count the number of changes. Put another way, Dollo parsimony returns the tree requiring the fewest secondary losses, given that every character is acquired only once. The same applies to the others: Wagner and Fitch parsimony return the tree with the fewest overall changes, and Camin-Sokal parsimony returns the tree with the fewest gains, given that gains are irreversible.
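This search-then-count logic can be sketched for the smallest interesting case. With four taxa there are only three unrooted trees, so a minimal Python sketch can simply score each of them with Fitch's algorithm over a small, hypothetical data matrix and keep the shortest. The counting rule is identical on every tree; only the tree changes. The matrix, the taxon names and the helper names are invented for illustration, and real programs need heuristic searches for larger data sets because the number of possible trees grows very quickly.

```python
# Minimal sketch: exhaustive parsimony search over the three unrooted
# four-taxon trees, scoring each with Fitch's algorithm.
# Each unrooted tree is written as ((a, b), (c, d)); under symmetric costs
# the position of the root does not change the tree length.

def fitch_steps(tree, leaf_state):
    """Number of changes for one unordered character (Fitch's algorithm)."""
    def visit(node):
        if isinstance(node, str):
            return {leaf_state[node]}, 0
        (left_set, left_steps), (right_set, right_steps) = map(visit, node)
        common = left_set & right_set
        if common:
            return common, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1
    return visit(tree)[1]

def quartet_topologies(taxa):
    """The three unrooted trees for four taxa."""
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

def total_length(tree, matrix):
    """Sum of Fitch steps over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: states[k] for taxon, states in matrix.items()})
        for k in range(n_chars)
    )

matrix = {  # hypothetical data: three characters per taxon
    "A": (0, 0, 1),
    "B": (0, 0, 0),
    "C": (1, 2, 0),
    "D": (1, 2, 1),
}

trees = quartet_topologies(sorted(matrix))
for tree in trees:
    print(tree, total_length(tree, matrix))
# (('A', 'B'), ('C', 'D')) 4
# (('A', 'C'), ('B', 'D')) 6
# (('A', 'D'), ('B', 'C')) 5
best = min(trees, key=lambda t: total_length(t, matrix))
```

Swapping `fitch_steps` for a Wagner, Dollo or Camin-Sokal counting rule would leave the search loop untouched, which is the point made above: the criterion only defines how characters are mapped onto a tree.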
