As illustrated with the example of many species of grasses versus fewer species of oaks, lilies, grasses and ferns, the background is that PD is supposed to provide a metric for a form of diversity that we want to conserve. Now many of us would say that this form of diversity is evolutionary distinctness and be happy; we intuitively consider isolated lineages to be worthy of special protection. However, PD is often advertised as a proxy for what Kelly et al. call "feature diversity" (subsequently FD). That is, what we want to conserve in an area is not phylogenetic diversity per se but instead maximum diversity of some (morphological? ecological? genetic?) features. The assumption is that the more distant two species are in their evolutionary relationships, the more different they are likely to be in any given feature, and that is why high PD is taken to mean high diversity of conservation relevant features.
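To make the grasses example concrete, here is a minimal sketch of PD as the summed length of all branches connecting a set of species, here counted down to the root, which is one common variant of Faith's measure. The tree, its node names and its branch lengths are invented purely for illustration.

```python
# Toy rooted tree: child -> (parent, branch length).
# All names and lengths are made up for illustration.
TREE = {
    "grass_a": ("grasses", 1.0),
    "grass_b": ("grasses", 1.0),
    "grass_c": ("grasses", 1.0),
    "grasses": ("monocots", 4.0),
    "lily": ("monocots", 5.0),
    "monocots": ("flowering", 3.0),
    "oak": ("flowering", 8.0),
    "flowering": ("root", 4.0),
    "fern": ("root", 12.0),
}


def path_to_root(tip):
    """Return the branches (named by their child node) between a tip and the root."""
    branches = set()
    node = tip
    while node in TREE:
        branches.add(node)
        node = TREE[node][0]
    return branches


def faith_pd(tips):
    """Sum the lengths of all branches on the tips' paths to the root, each counted once."""
    used = set()
    for tip in tips:
        used |= path_to_root(tip)
    return sum(TREE[branch][1] for branch in used)


three_grasses = faith_pd(["grass_a", "grass_b", "grass_c"])
mixed = faith_pd(["grass_a", "oak", "fern"])
print("PD of three grasses:", three_grasses)
print("PD of grass + oak + fern:", mixed)
```

Both sets contain three species, but the mixed set covers far more branch length. Whether that extra branch length also buys extra feature diversity is exactly the assumption in question.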
Kelly et al. set out to test this assumption. Maybe one of their thoughts was that it may not actually be true because of convergence. One could argue that it doesn't make a lot of difference whether we protect a grass in the Poaceae family as long as there is a very similar looking grass-like Cyperaceae around. They have converged on the same ecological niche, so same thing really, right? But that is already the interpretation; perhaps it behoves me to stay with the methodology for the moment.
What they did was to download numerous data matrices from the TreeBase repository, mostly morphological or restriction site data. They then went and inferred phylogenies directly from these matrices. Many people I have spoken to frown at that because they assume that molecular phylogenies would be more reliable, but Kelly et al.'s justification is actually very clever. They write that these are the conservation relevant characters*, that a phylogeny based on them will by definition be the one with the greatest possible correlation between PD and FD because that is what the phylogeny inference maximises, and that consequently an alternative phylogeny could only ever have an equally good or worse correlation between PD and FD. That at least is actually indisputable.
Now that they had the phylogenies and the characters, they used Spearman rank correlations to examine the correlation between PD and FD, but curiously only between pairs of species. This is odd because in reality we are usually dealing with larger subsets of the phylogeny in any given cell. But my hunch is that this would not really change the results, so onwards. For comparison with the empirical datasets from TreeBase, they also produced artificial datasets with zero to 80% homoplasy, that is noisy or contradictory data, to see how the correlation worked in those.
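The pairwise comparison can be sketched roughly as follows. This is not Kelly et al.'s code; the four-species tree, the binary character matrix and all function names are hypothetical, and I am assuming that "phylogenetic distance" means the path length between two tips and "feature distance" the number of differing characters.

```python
from itertools import combinations

# Hypothetical toy data: child -> (parent, branch length), plus made-up binary characters.
TREE = {
    "A": ("n1", 1.0), "B": ("n1", 1.0),
    "n1": ("n2", 2.0), "C": ("n2", 3.0),
    "n2": ("root", 3.0), "D": ("root", 6.0),
}
FEATURES = {"A": "000000", "B": "000001", "C": "000111", "D": "111110"}


def ancestors(tip):
    """Map each node on the path from tip to root to its distance from the tip."""
    dist, node, out = 0.0, tip, {tip: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b):
    """Path length between two tips through their most recent common ancestor."""
    da, db = ancestors(a), ancestors(b)
    return min(da[n] + db[n] for n in da if n in db)


def hamming(a, b):
    """Number of characters in which two species differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def ranks(values):
    """Average ranks (1-based), so that tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


pairs = list(combinations(sorted(FEATURES), 2))
pd_dist = [patristic(a, b) for a, b in pairs]
fd_dist = [hamming(a, b) for a, b in pairs]
rho = spearman(pd_dist, fd_dist)
print("Spearman rho between pairwise PD and FD:", round(rho, 3))
```

In this invented matrix the characters follow the tree closely, so rho comes out high. The interesting question is what happens once the characters start to contradict each other.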
So, what are the results? They found that
- the correlation was superb in a zero homoplasy data set but got weaker the more homoplasy a data set contained;
- even low amounts of homoplasy led to the correlation breaking down after relatively short phylogenetic distances;
- even in very noisy data sets, there was some correlation across short phylogenetic distances;
- in most real life datasets, the correlation broke down after ca. 40% of the maximum possible phylogenetic distance.
That sounds like a damning conclusion, although they cushion it by saying that "conserving as much evolutionary history as possible is an important goal" regardless. And that is also what I think: I question the entire premise of their study. Yes, PD has been sold as a proxy for other conservation relevant characters - whatever that means. But to me, PD itself is a conservation relevant character, full stop. We should conserve as much of the tree of life as possible even if there is convergence and even if some unrelated species could substitute for each other ecologically.
Still, that leaves the question whether PD is flawed as a tool to infer feature diversity. Is it?
Intuitively - and I know that is not a very scientific way of looking at it, but I will get there - intuitively, it seems just blatantly obvious that phylogenetic distance is correlated with feature dissimilarity. Just pick any species at random, go down through evolutionary time, and compare it successively against other species having diverged from its lineage. Take us humans: apes, monkeys, rodents, marsupials, Monotremes, lizard-like mammals, lizards, Amphibia, bony fish, cartilaginous fish, jaw-less fish, worms, ... I guess we get the picture. There is an obvious correlation between the distance in time from our common ancestor and how dissimilar we are from these other lineages. The same is true in other groups of animals and in plants (perhaps over short distances not for leaf shape and overall habit, but surely for secondary chemistry and reproductive characters).
This leaves us with a potential puzzle. Why did the correlation break down after ca. 40% of maximum possible distance in the empirical datasets from TreeBase when we see what appears to be a clear correlation through deepest time even in our own example? In a way it is even stranger, because while they had somewhat less correlation for higher level phylogenies than for lower level ones, it was still in the same ballpark. That means that if you use a phylogeny of one genus of plants, you would not see correlation between PD and FD when including the most distantly related members of the genus. But you would see it if you used a phylogeny of the entire plant family it is a member of, and if the whole genus from before covered only a depth of 30% of the maximum distance in the new tree. How can the correlation be there for the same group using one phylogeny and gone using another?
Well, probably because you would use less variable characters or sequence markers to infer the higher level phylogeny. But then that means that there is no problem with the principle of PD as such, merely that you have to zoom out of the tree of life a bit, and it works again. The real problem appears to be simply homoplasy in real life datasets. And that, by the way, is also what I meant above with the main point of the paper being surprisingly banal: If there is homoplasy in the dataset then PD doesn't work as well as it could.
Not only had Dan Faith, the inventor of PD, apparently already drawn attention to that issue himself in 1992 ("cladograms based on a small number of characters, or on characters that exhibit large amounts of homoplasy (convergences and reversals in the derivation of features), are probably less reliable"), but we would not have needed an empirical study to gain that insight even if he had not because it is just plainly obvious. Noise weakens signal. Who would have guessed?
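One way to picture "noise weakens signal" is a toy model of saturation. This is my own illustration, not the simulation Kelly et al. ran: I assume binary characters that flip at a constant rate in both directions, in which case the chance that two species differ in a character after a total path length d is 0.5 * (1 - exp(-2 * rate * d)). The rates and the 40% cutoff below are arbitrary choices, the latter picked only to echo the figure from the paper.

```python
import math


def expected_dissimilarity(distance, rate):
    """Chance that two species differ in one binary character after `distance`,
    assuming a symmetric two-state model with the given flip rate."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * distance))


def share_reached_early(rate, cutoff=0.4, maximum=1.0):
    """Share of the dissimilarity at `maximum` that is already present at `cutoff`."""
    return expected_dissimilarity(cutoff, rate) / expected_dissimilarity(maximum, rate)


# Slowly, moderately and rapidly changing characters (arbitrary example rates).
for rate in (0.2, 2.0, 10.0):
    share = share_reached_early(rate)
    print(f"rate {rate:>4}: {share:.3f} of the final dissimilarity is reached by 40% distance")
```

With slowly changing characters, dissimilarity keeps growing across the whole range of distances. With rapidly changing characters, nearly all of it is already there at short distances, so beyond that point distance tells us almost nothing about dissimilarity. That would also fit the idea that less variable characters restore the correlation at deeper levels.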
But from this point on opinions in our journal club diverged most sharply. On the one side, there were those who argued that homoplasy is a hard fact of life, and because there are no datasets without it PD is problematic. On the other side, and that is the one that I lean to, were those who pointed out that homoplasy is not a hard fact of life but instead an indication of human error. I wrote above that homoplasy is noisy or contradictory data. How does that happen?
When we score characters for use in reconstructing evolutionary relationships, we want to score them so that they accurately represent the real life behaviour of those characters. Meaning that if a character state has arisen twice independently, we should score that really as two different states; but in reality we may not always know that in advance. This is crucial to understand. I was thinking about what example I could come up with but luckily Greg Mayer has recently written an awesome post on flying vertebrates on Jerry Coyne's website. Here is the crucial part:
Powered flight is thus an excellent example of convergent evolution - the origin of similar structures as adaptations to similar conditions of existence. The wings, because they evolved independently, are said to be analogous (i.e. not derived from a common ancestor possessing wings), as is evident from the different nature of the air foil, and the different modifications of the bones involved in the wings of the three groups - the similarities are superficial and functional. It also nicely shows the hierarchical nature of homology. The front limbs of bats, birds, and pterosaurs are homologous as limbs (i.e. derived from a common ancestor possessing front limbs), but not as wings. The common structures (humerus, radius, ulna, etc.) are homologous at the level of tetrapods, but the modifications of these structures as wings are separate evolutionary events.
Again, this is crucial to understand. If you want to build a morphology based phylogeny of the vertebrates and you go ahead and score the character "front limbs" as the state "wings" for pterosaurs, bats and birds, then that character will show up as homoplasious on your tree. But it is only so because you have made a scoring mistake and scored as homologous what really isn't; there should have been three different states for the three different, independently evolved wings. Perhaps that is not as easy to see when we are talking about molecular data but it is really the exact same issue: an adenine being replaced with a cytosine in two lineages in parallel should ideally not be scored as the same character state. In reality we have little choice because we don't know, but that is not the fault of the PD metric.
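The scoring argument can be checked on a toy tree. The sketch below uses Fitch parsimony to count the minimum number of changes a character needs on a fixed tree, and the consistency index (minimum conceivable changes divided by changes actually required) as a measure of homoplasy. The heavily simplified tetrapod topology, the taxon sample and the state names are my own illustrative assumptions.

```python
# Heavily simplified, illustrative tetrapod tree as nested pairs; tips are strings.
TREE = (("mouse", "bat"),
        ("lizard", ("crocodile", ("pterosaur", ("tyrannosaur", "bird")))))

# Scoring 1: all wings lumped into a single state.
LUMPED = {"mouse": "leg", "bat": "wing", "lizard": "leg", "crocodile": "leg",
          "pterosaur": "wing", "tyrannosaur": "leg", "bird": "wing"}
# Scoring 2: each independently evolved wing gets its own state.
SPLIT = dict(LUMPED, bat="bat_wing", pterosaur="pterosaur_wing", bird="bird_wing")


def fitch(node, states):
    """Return (candidate ancestral states, minimum number of changes) for a subtree."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: at least one extra change is needed at this node.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(states):
    """Minimum conceivable changes (number of states minus one) over changes required."""
    steps = fitch(TREE, states)[1]
    minimum = len(set(states.values())) - 1
    return minimum / steps


for name, scoring in (("lumped", LUMPED), ("split", SPLIT)):
    steps = fitch(TREE, scoring)[1]
    print(name, "steps:", steps, "consistency index:", round(consistency_index(scoring), 3))
```

Both scorings require the same three changes on the tree. What differs is whether those changes count as homoplasy: lumping the wings into one state makes the character look noisy, while scoring the three wings separately makes it perfectly consistent with the tree.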
Just like there is no paraphyly in nature, only a phylogenetic structure in which a taxonomist erroneously circumscribes taxa to be paraphyletic, there is also no homoplasy in nature but merely character states that look the same to us because we don't know a priori that they arose independently.
In summary, PD works. The problem is our ignorance of how to correctly score the characters.
*) Okay, if you wonder why restriction sites are, in their words, "conservation relevant", you are not alone.
Faith D, 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation 61: 1-10.
Kelly S, Grenyer R, Scotland RW, 2014. Phylogenetic trees do not reliably predict feature diversity. Diversity and Distributions.