Monday, December 15, 2014

'Evolutionary' classifications still do not have any information content (that special issue on paraphyly)

(The following is the fourth part of a series of posts on an Annals of the Missouri Botanical Garden special issue on “Evolutionary Systematics and Paraphyly”. All posts in this series are tagged with “that special issue”.)

(Updated 29 December 2014 to increase clarity and to make the style a bit less strident.)

Next we come to the contribution written by Elvira Hörandl (2014). I could say that the paper is a strange thing, but then again most of them are, seeing as how they generally do not fall into the usual categories of publications in my area, either original research or review articles. In the present case, the paper could perhaps most accurately be described as a review article, but one looking back not, as usual, across a rich and productive field that the author now attempts to summarise, but looking back instead onto a single previous publication: a classification of the buttercup genus Ranunculus also published by herself (Hörandl & Emadzade, 2012).

This means that except for lacking the methods section and extensive supplementary material this paper has pretty much the same content as that earlier publication: it describes the same classification and extols it as superior to one that would accept only monophyletic taxa. It contains much rapid-fire criticism of phylogenetic systematics but mostly revolves around two central claims, and it is those two which I will focus on in this post.

Hörandl's main argument in the paraphyly discussion, and the first core claim of this paper, is that an 'evolutionary' classification, one that allows paraphyletic taxa, has higher information content than phylogenetic systems. Before we examine this claim, it behoves us to stop and consider what is meant by 'information content', because that may not immediately be clear.

In a later piece of the same special issue, Tod Stuessy will quite openly write that when he is talking about paraphyletic groups being informative he means predictiveness of morphological similarity. The problem here is that to be maximally informative and predictive of that, the classification would have to be an entirely phenetic one, and that is not what he and Hörandl want; they want a mixture of phenetics and descent.

The present paper does not express itself quite that clearly, but it does contain the following: “Information is needed on karyology, phenotype, character evolution, and ecology to reflect the diversity of the genus”. That is quite something to chew on. Clearly all that information is useful, but how can it all be expressed in a single classification of nested taxa? Would one not need, and even prefer, at least four different classifications to capture it? If an 'evolutionary' systematist refers to, say, “section Ranunculastrum”, will I therefore know that those plants all have the same base chromosome number? What if she has defined that section based on ecological similarity instead?

At any rate, we can probably accurately summarise the paper's first claim as that of 'evolutionary' classifications providing information on both descent and phenotype, whereas phylogenetic and phenetic classifications provide information on only one of these each. That is what is implied by figure 1, for example. The question is if any classification, 'evolutionary' or not, can actually do that.

The second claim is perhaps even stranger: In various places the paper asserts that 'evolutionary' classification “restricts the number of equally valid options for classification”. It is strange because having only one clear, objective and universal way of classifying is usually considered to be one of the major advantages of cladism, of phylogenetic classification. By allowing only monophyletic groups it removes the subjective and authoritative element from systematics, making it a science instead of a shouting match. The 'evolutionary' approach, on the other hand, recognises monophyletic and paraphyletic taxa, so it should be obvious that the available options have already been multiplied beyond those of cladism, and that is before we examine how many different ways there are to circumscribe a paraphyletic group.

Luckily, the author has provided a very concise summary of her view in figure 1 of the paper, and she uses that figure to argue for both of her claims. It shows a very simple phylogeny of only four species with two character traits. In fact it shows the same phylogeny several times, and next to the individual trees are what she considers to be the available classifications under the various approaches. If she can make her case convincing then it will be easiest in such a clear, simple scenario; and if her argumentation doesn't even make sense in this case, then it is unlikely to be more convincing in messier real life scenarios.

Information content

The tree looks like the one below except I added colours:

To examine information content, I will pick just one possible classification as presented in the paper for each of the three approaches discussed. One option for cladism is (C,D),A,B. If you refer to a group (C,D), and I know that we are using a phylogenetic classification, what does that tell me? It tells me that C and D are each other's closest relatives. So obviously the classification has information content. Yes, it is limited to relatedness, but it is information.

One option for phenetics, as presented in the paper, is (C,D). If you refer to a group (C,D), and I know that we are using a symbol shape-based phenetic classification, what does that tell me? It tells me that C and D have the same state for the character symbol shape. So obviously the classification has information content. Yes, it is limited to similarity, but it is information.

One option for 'evolutionary' classification, as presented in the paper, is (C,D),(A,B). If you refer to a group (C,D), and I know that we are using an 'evolutionary' classification, what does that tell me? Well, here is the problem: it tells me nothing. It could be that C and D are each other's closest relatives. Or it could be that they are a paraphyletic group united by some shared plesiomorphic character state.

But I don't know.

In contrast to the other two approaches, the group (C,D) is meaningless without tracking back to the original reasoning the taxonomist based the circumscription on, and that means that the classification itself, as a set of nested named taxa, is completely useless. Yes, both descent and phenotype went into making the groups, but from the end user perspective the information content of the groupings in the classification is nil. Classifications using only one criterion are clear and useful, but those using a plurality of criteria are muddled and uninformative.

Limiting options for classification

Now to examine whether 'evolutionary' classification is at least the best at limiting options for classification. The supposed options as listed in the paper are these:
  • 3 options for Cladism: (C,D),A,B or (B,C,D),A or (A,B,C,D)
  • 4 options for Phenetics: (A,B) or (C,D) or (A,C) or (B,D)
  • 2 options for 'Evolutionary': (C,D),(A,B) or (C,D),A,B
So 'evolutionary' classification has fewer options than cladism and phenetics, and is consequently more objective and scientific, amirite? Well, if the available options had been correctly represented, then yes. But none of the three approaches is represented correctly.

An important part of the confusion here is the conflation of grouping and ranking. Cladism only has multiple options as far as ranking is concerned, but there are actually four of them instead of three. If we want to recognise monophyletic genera we could make one (A,B,C,D) or two A,(B,C,D) or three A,B,(C,D) or four A,B,C,D, although the first and the last option are of course uninformative at this level. But ranking is always arbitrary, under any classification, so this is not what the comparison is about. What really counts is how many grouping options you have, and here cladism has only a single one. Because taxa must be monophyletic, there is only the sole grouping solution A,(B,(C,D)). That is it.
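To make the single-grouping point concrete, here is a minimal sketch that lists every monophyletic group on the four-species tree. The nested-tuple encoding and the function names are my own illustrative choices, not anything taken from the paper.

```python
# Hypothetical encoding of the tree A,(B,(C,D)) as nested tuples.
TREE = ("A", ("B", ("C", "D")))


def leaves(node):
    """Return the set of terminal names found below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Return every clade (an ancestor plus all its descendants) with >= 2 terminals."""
    if isinstance(node, str):
        return set()
    found = {leaves(node)}
    for child in node:
        found |= clades(child)
    return found


def is_monophyletic(group, tree=TREE):
    """A group is monophyletic if it matches one of the clades exactly."""
    return frozenset(group) in clades(tree)


MONOPHYLETIC = clades(TREE)

if __name__ == "__main__":
    for clade in sorted(MONOPHYLETIC, key=len):
        print(sorted(clade))
    print(is_monophyletic("CD"), is_monophyletic("AB"), is_monophyletic("BD"))
```

The clades found are exactly the nested set behind A,(B,(C,D)); which of them gets called a genus is the ranking decision, not a different grouping.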

Next, phenetics. Here the only misrepresentation is that the present paper made four options out of two. A phenetic classification would have to pick one of the two characters, either symbol shape or fill colour, and then group by that one trait. Consequently, the available options are either (A,B),(C,D) or (A,C),(B,D). Both are equally objective, it just depends on what we want our classification to be informative of. And in fact it is unfair to speak of even two options because we need to treat the two classifications as separate alternatives on the same level as cladism: a phenetic classification based on symbol shape does not have the option of grouping by fill colour.
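As a small sketch of what grouping by a single trait amounts to: the fill colours below follow the groupings discussed in this post, while the shape labels ("circle", "square") are placeholders I made up, since only the fact that A and B share one shape and C and D the other matters here.

```python
from collections import defaultdict

# Fill colours follow the text; the shape names are hypothetical placeholders.
TRAITS = {
    "A": {"shape": "circle", "fill": "blue"},
    "B": {"shape": "circle", "fill": "red"},
    "C": {"shape": "square", "fill": "blue"},
    "D": {"shape": "square", "fill": "red"},
}


def phenetic_groups(traits, character):
    """Group terminals by their state for one chosen character."""
    groups = defaultdict(list)
    for species in sorted(traits):
        groups[traits[species][character]].append(species)
    return sorted(groups.values())


BY_SHAPE = phenetic_groups(TRAITS, "shape")
BY_FILL = phenetic_groups(TRAITS, "fill")

if __name__ == "__main__":
    print("shape:", BY_SHAPE)
    print("fill: ", BY_FILL)
```

Once the character is chosen, the function returns exactly one partition of the four terminals, which is the sense in which each phenetic classification has a single option.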

(At this point I should point out that this contribution was supposedly peer-reviewed; the last item of the special issue is called Acknowledgment of Reviewers. It is clear that they would not send the manuscripts to hardcore cladists, because then they would all be as roundly rejected as creationist submissions to the journal Evolution, but even a reviewer sympathetic to the conclusions of the paper should have remarked that (A,B) and (C,D) are not different options of phenetic classification but make one option when taken together!)

In the case of 'evolutionary', the author's own approach to classification, the paper magically makes several options disappear. If it allows for the option (A,B,C,D) in the cladist case then we have to add it to 'evolutionary' also. Further, one could go full splitter and classify as A,B,C,D. The paper brushes that option off with the remark that it is “rejected” by monophyly of (C,D), but that is intellectually inconsistent. If monophyly were of any concern then the author would be a cladist, and the fact that two species could be grouped into one genus is not yet an argument that this is where we have to apply the rank of genus. In the present case, I will not count these two options because they are uninformative and don't really group the four terminals differentially. But to be consistent one would have had to mention at least one of them because that is how the author inflated her cladist options to three.

More to the point, the author has forgotten to include the other cladist option(s), be they counted as one or several. Because 'evolutionary' classification can group either by descent or by similarity, the entirely descent based cladist solution is also available under that approach. All that is needed is for the paraphylist to consider the phenotypic characters not important enough (whatever that means, but such is the subjectivity of their method). For the sake of consistency, I will add only one option to the tally of the 'evolutionary' approach.

Together with the two admitted already by the paper we now have three options for 'evolutionary' classification, but there is at least one more: If one assumes that the ancestral state of (B,C,D) was "red fill", with C representing a reversal to the "blue fill" state characteristic of the ancestor of all species, then (B,D),A,C would be a valid grouping under her approach.

Now an 'evolutionary' systematist would probably point towards yet another tree in figure 1 where the author has provided the true ancestral states on her hypothetical phylogeny, and it shows that the ancestral state of (B,C,D) was blue fill, making the reds polyphyletic and (B,D) thus an unacceptable grouping under her approach. But that is cheating. In reality, we do not know the ancestral states, we can only infer them. Granted, the same is true for the topology of the tree itself, which was one of the arguments of pheneticists for their school of classification. However, paraphylists generally agree with cladists that we should use our tentative phylogenetic results for classification because we have reason to trust them to some degree.

But there is often much less reason to trust our reconstruction of an individual ancestral character, at least in the absence of fossils, and especially if the character is as homoplasious as it is in this scenario, where at least two changes must have happened either way. Given just what we would usually have in real life, that is only the character states at the tips of the tree, the paraphyly or polyphyly of either the red or the blue fills depends simply on the choice of tracing algorithm, e.g. ACCTRAN or DELTRAN.
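The ambiguity can be shown with a brute-force sketch that tries every assignment of fill colour to the three internal nodes and keeps those needing the fewest changes. This is plain enumeration under simple parsimony, not an implementation of ACCTRAN or DELTRAN themselves; those are rules for choosing among ties of this kind. The node labels "root", "X" (ancestor of B, C, D) and "Y" (ancestor of C, D) are names I introduce for illustration.

```python
from itertools import product

# Tip states as discussed in the text.
TIPS = {"A": "blue", "B": "red", "C": "blue", "D": "red"}

# Branches of A,(B,(C,D)) as (parent, child); internal node names are hypothetical.
EDGES = [("root", "A"), ("root", "X"), ("X", "B"),
         ("X", "Y"), ("Y", "C"), ("Y", "D")]
INTERNAL = ("root", "X", "Y")


def count_changes(states):
    """Count branches on which parent and child differ in fill colour."""
    return sum(states[parent] != states[child] for parent, child in EDGES)


def most_parsimonious():
    """Return the minimum change count and all internal assignments reaching it."""
    scored = []
    for combo in product(("blue", "red"), repeat=len(INTERNAL)):
        states = dict(TIPS)
        states.update(zip(INTERNAL, combo))
        scored.append((count_changes(states), combo))
    best = min(score for score, _ in scored)
    return best, [combo for score, combo in scored if score == best]


BEST, OPTIMAL = most_parsimonious()

if __name__ == "__main__":
    print("minimum changes:", BEST)
    for combo in OPTIMAL:
        print(dict(zip(INTERNAL, combo)))
```

Under these assumptions the minimum is two changes, and it is reached both with X reconstructed as blue (red arising independently in B and D, so the reds are polyphyletic) and with X reconstructed as red (C reverting to blue, so B and D share an ancestral red). The tip data alone cannot decide between them.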

This is not the first time that I see paraphylists arguing from an unrealistically perfect knowledge of evolutionary history. In his Framework, Richard Zander likewise “knows” the “true” phylogeny in the absence of any evidence that either he or a formal (“mechanistic”) analysis could use to infer it, and then faults the analysis for supposedly getting it wrong.

As for counting options for classification, it gets worse from here because in any realistic situation the four terminals A, B, C and D would have more character traits than merely symbol shape and fill colour. They might have dozens of traits, and depending on which we find 'important' they might support any and all non-monophyletic groups under the 'evolutionary' approach, for example C,(A,B,D). And who is to say which character is objectively important unless, well, unless one adopts the cladist approach and says that those characters are important that appear to be informative of relatedness, of common descent?

Now we are in a position to tally the true grouping options up:
  • 1 option for Cladism: A,(B,(C,D))
  • 1 option for fill-based Phenetics: (A,C),(B,D)
  • 1 option for shape-based Phenetics: (A,B),(C,D)
  • 3 or 4 or even more options for 'Evolutionary': A,B,(C,D) or A,(B,(C,D)) or (C,D),(A,B) or A,C,(B,D), with choice between the latter two depending on ancestral character reconstruction. Additional options arise for each additional character we examine as long as it shows a novel distribution of states, and depending merely on which character the taxonomist subjectively considers to be the most important.
In summary, 'evolutionary' classifications do not contain any information whatsoever, and 'evolutionary' systematics is the only approach that does not limit options for classification.

One could now go into the remainder of this paper. One could deal, once more, with the idea that "branch lengths on the phylogram", in reality artefacts of our ignorance, are a justification for paraphyletic taxa. One could marvel at the free admission that accepting paraphyletic taxa comes "at the expense of logical consistency". One could become exasperated at the paraphylists' constant attempts at redefining widely accepted terms (in this case "polyphyly").

But I think the most important points have been made. At least as far as I am concerned, the two central arguments for 'evolutionary' systematics presented in the paper fail by its author's own standards, i.e. higher information content and restriction of the options for classification.


Hörandl E, 2014. Nothing in taxonomy makes sense except in the light of evolution: Examples from the classification of Ranunculus. Annals of the Missouri Botanical Garden 100: 14–31.
Hörandl E, Emadzade K, 2012. Evolutionary classification: A case study on the diverse plant genus Ranunculus L. (Ranunculaceae). Perspectives in Plant Ecology, Evolution and Systematics 14: 310–324.
