Friday, April 4, 2014

An addendum on Zander's Framework

Back in January and February I wrote a few posts on Richard Zander's A Framework for Post-Phylogenetic Systematics while I was reading the book:

Why we don't consider supraspecific taxa to be ancestral

No arguments from authority please, even if it is Charles Darwin

Two possible meanings of the term "pseudoextinction"

Can parsimony analyses be misled by 'budding' speciation?

Can we trust molecular phylogenetics?

Although I admit that my reading got a bit less attentive towards the end, partly due to the rather repetitive style of the work, I considered myself to be done with the book. Today, however, I took it with me on the bus to deposit it on the bookshelf at work, and while stuck in a traffic jam I read over the chapter titled Contributions of Molecular Systematics once more.

On page 58, Zander argues that one should assign a support value to entire phylogenetic trees (which he strangely insists on calling cladograms although most of them have branch lengths):
Whole cladograms are seldom provided with confidence intervals (here including posterior probabilities) that reflect their perceived chance of being correct. In the literature, however, many cladograms are used in their entirety to model broad conclusions, e.g., many genera grouped into multiple families. These cladograms are commonly viewed as "mostly correct." But what does "mostly correct" mean? The binominal confidence interval (BCI) is here advanced to provide a measure of confidence in whole cladograms that are used for broad conclusions. It provides the proportion of nodes (or internodes) with Bayesian support measures that one can expect to be correct all at once than total nodes being correct at once, defining "correct" as joint probability of at least 0.99.
Maybe it is a language issue because I am not a native speaker of English, but my understanding of what the terms "confidence interval" and "to model" mean differs from how they are used here, and the last sentence does not appear to be complete. After mentioning some example numbers, he continues as can be expected:
Therefore, for most cladograms that are published and used for broad conclusions, the confidence in those cladograms, each used as a whole, seldom reaches 0.95, a standard for confidence in statistics.
I have given this issue some thought and I really do not understand why I should care about the overall support for the phylogeny as a whole. A simple thought experiment should get the point across.

Imagine you want to know whether the fantasy genus Daisiella is a natural group in its current circumscription or not. You will have to sample as many species of Daisiella as possible and all relatives that you suspect might potentially be nested within it. You extract DNA, sequence it, align the sequences, and conduct a phylogenetic analysis resulting in a tree. You will either find a clade that contains all species of Daisiella and nothing else, which would be evidence for its monophyly, or you will find that the smallest clade containing all species of the genus also contains samples from other genera, which would be evidence against its monophyly.

There are now generally two ways of looking at the results. The first is to see what bootstrap, decay index or Bayesian posterior probability (PP) value the most relevant branches have. If, for example, there is a clade containing some but not all of Daisiella plus a few non-Daisiella species, and it has a PP of 0.97, that is good support for the non-monophyly of the genus. The second way is to do a Kishino-Hasegawa test or a Templeton test: you force Daisiella into monophyly in a constrained phylogenetic analysis, and then you compare the resulting tree, on which the genus is monophyletic, against the best tree from the unconstrained analysis to see if the former is significantly less likely or less parsimonious than the latter.

Either way, the point is this: Once you have shown that Daisiella as a whole is strongly supported to be monophyletic, all relationships within the genus are entirely irrelevant. For all we care, the relationships between its species could all be unresolved, with no support value above 0.5; it would not matter in the slightest for the conclusion we want to draw about the circumscription of the genus. While it may become relevant if we want to do an infrageneric classification, the question of whether Daisiella vulgaris is more closely related to D. annua than to D. foetida or vice versa has no bearing whatsoever on whether all three of them are more closely related to each other than to anything else.
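To make the thought experiment concrete, here is a minimal Python sketch of the monophyly check. It rests on assumptions of mine: the tree is rooted and written as nested tuples, support values are ignored, and the function names as well as the outgroup name Outgroupia_alba are made up for illustration. It finds the smallest clade containing all Daisiella samples and checks whether anything else sits inside it.

```python
def tips(tree):
    """Return the set of tip labels below a node.
    A tip is a string; an internal node is a tuple of child nodes."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def smallest_clade(tree, taxa):
    """Return the smallest subtree whose tips include every member of taxa."""
    children = () if isinstance(tree, str) else tree
    for child in children:
        if taxa <= tips(child):
            # All taxa sit inside this child, so a smaller clade exists.
            return smallest_clade(child, taxa)
    return tree


def is_monophyletic(tree, taxa):
    """True if the smallest clade containing taxa contains nothing else."""
    taxa = set(taxa)
    return tips(smallest_clade(tree, taxa)) == taxa


daisiella = {"D_vulgaris", "D_annua", "D_foetida"}

# Same genus-level answer, different resolution inside the genus.
resolved = ((("D_vulgaris", "D_annua"), "D_foetida"), "Outgroupia_alba")
unresolved = (("D_vulgaris", "D_annua", "D_foetida"), "Outgroupia_alba")
# Here the hypothetical outgroup is nested inside Daisiella.
nested = ((("D_vulgaris", "Outgroupia_alba"), "D_annua"), "D_foetida")

print(is_monophyletic(resolved, daisiella))    # True
print(is_monophyletic(unresolved, daisiella))  # True
print(is_monophyletic(nested, daisiella))      # False
```

The resolved and the unresolved tree give the same answer, because the check never looks at how the species inside the genus are arranged.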

So why would anybody care about the support value for the phylogeny as a whole? I am afraid we can suspect the answer in this context: Zander's idea is to use some poorly supported clades in a phylogeny - and in a large tree you will always have some - to cast doubt by association on the well supported ones. If a phylogeny shows a genus he wants to recognize as clearly non-monophyletic, he can wave at a few irrelevant but poorly supported branches and thus discard all of it.

Which is, by the way, unnecessary even from his own perspective because he argues throughout the book that molecular data should be ignored anyway. The spirit of his approach becomes clear a bit further down the same page:
In a molecular cladogram of 64 nodes (La Farge et al. (2002) [sic] used to separate groups of genera of mosses belonging to different families, the average Bayesian posterior probability was 0.76 (multifurcations were ignored, unsupported nodes were assigned 0.33 probability); the BCI for the whole cladogram (that minimum percentage of nodes we can expect to be correct) is 40/63, or 0.64. This may seem a low value, yet those clusters that match the groupings of classical taxonomy gain corroboration for those groupings. When classical taxonomy conflicts, on the other hand, then the cladogram cannot support broad conclusions that involve those conflicts.
That just about sums it up. If support from molecular analyses is low, it doesn't matter as long as Zander really, really wants to recognize the genus; and if support for a clade is high, it doesn't matter as long as Zander really, really doesn't like that clade. In other words, his book chapter Contributions of Molecular Systematics could have been much shorter without losing any substance if he had only written "there are none that matter for classification".
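For what it is worth, here is one way to read the BCI numerically, as a short Python sketch. The assumptions are mine and not Zander's stated procedure: I treat every node as an independent trial with the same chance of being correct (the average posterior probability), and I take the BCI to be the largest number of nodes k for which at least k nodes are correct with probability 0.99 or more. The function names are hypothetical, and I make no claim that this reproduces his figures exactly.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_nodes(n, p, confidence=0.99):
    """Largest k such that at least k of n nodes are correct with
    probability >= confidence (assumes independent, equally likely nodes)."""
    k = n
    while k > 0 and prob_at_least(k, n, p) < confidence:
        k -= 1
    return k


def joint_probability(n, p):
    """Probability that all n independent nodes are correct at once."""
    return p**n


# Illustrative values taken from the passage quoted above.
n_nodes, mean_pp = 63, 0.76
k = min_correct_nodes(n_nodes, mean_pp)
print(f"at least {k} of {n_nodes} nodes correct with probability >= 0.99")
print(f"all {n_nodes} nodes correct at once: {joint_probability(n_nodes, mean_pp):.2e}")
```

Under these assumptions the probability of every node being correct at once is vanishingly small for any tree of realistic size, which is precisely why it tells us nothing about the one or two branches that matter for a given question.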

1 comment:

  1. Hey Alexander!
    I just got around to reading your review of this book in the ASBS newsletter, and subsequently stumbled over your blog while searching for more about Zander's philosophies and motivations (an activity that is akin to attending a car race in the hopes of witnessing a wreck). Kudos for slogging through it all and saving the rest of us the trouble, although I still can't quite fathom what his underlying motivation is. Superficially I assumed that it was an attempt to undermine non-god-mediated evolution, but from what you say here it really is just a very elaborate attempt to support 'the authority of expert taxonomists' (self-deification?!). Anyhow, thanks again!