PhyloBotanist: Patrocladistics 2: What if we include ancestors?

Thursday, February 11, 2016

Last post we looked at what patrocladistics is and how it works. The example was contrived rather than a real group of organisms, but it was perhaps typical in that the dataset included only extant species.

In this post I want to explore what happens to the results of a patrocladistic analysis if we add all the ancestors. There are two reasons why this is of interest to me:

First, I believe that patrocladistics, like many other ideas for the objective delimitation of paraphyletic taxa, relies on not having intermediate ancestors in the dataset. Not all, perhaps, but many such approaches identify a long branch or gap in variation. The problem is that such a long branch or gap is merely an illusion based on the patchiness of the fossil record. In reality, evolution is gradual. And if somebody claims to have a good approach to classification, it could be argued that it should be able to deal with the discovery of intermediate fossils.

Second, proponents of paraphyletic taxa often criticise phylogenetic systematists for supposedly ignoring ancestors, or for supposedly defining them out of existence. If patrocladistics does, as I suspected, rely on the absence of ancestors, that might at least be seen as a bit ironic.

So back to our artificial phylogeny. It contains two outgroup species and five ingroup species, two of the latter on a very long branch:

Using the patrocladistic approach with single-linkage clustering, I was able to produce a dendrogram that shows the two divergent species outside of the cluster of the other three ingroup species. A paraphyletic group on the phylogeny comes out as a cluster in the dendrogram, supposedly providing a justification for official taxonomic recognition:

Now assume an 'evolutionary' classification has become widely accepted that treats the species aberrans and anomalica in one genus and primitiva, communis and vulgaris in another. Imagine further that we are as lucky as palaeontologists have been with the transition between non-avian dinosaurs and birds. Every year sees another intermediate fossil published, and after a few years our phylogram looks like this:

Every letter is an intermediate ancestor. Note that I do not consider this to be a very realistic scenario. In most real-life cases we would only have some of these letters. I am just saying that a general principle for classification should be able to deal with intermediate ancestors, especially if its proponents claim that supposedly not being able to do so is a major failure of the mainstream approach. And I am personally curious to see what happens if we repeat the patrocladistic analysis. First the new patristic distances:

Now the cladistic distances... aha. First interesting observation: If we have every intermediate 'species' according to the composite species concept, cladistic distances equal patristic distances. Every trait change on the branch has turned into a node. Because the patrocladistic distances are cladistic plus patristic, they are now just 2 × cladistic. Because that multiplication by two comes out in the wash distance-wise (it rescales every distance equally and so cannot change which clusters get joined), we can as well proceed directly from the cladistic distance matrix to the clustering analysis.
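To make the observation above concrete, here is a minimal Python sketch. It is not the analysis from this post: the tree, the names (sp1, sp2, sp3, anc1 to anc3) and the helper function are hypothetical, and it assumes that cladistic distance is counted as the number of edges on the path between two nodes while patristic distance is the summed branch length.

```python
def tree_distances(edges):
    """Return {(a, b): (patristic, cladistic)} for every ordered pair of nodes.

    edges maps (u, v) -> branch length (number of character changes).
    Assumed convention for this sketch: patristic = summed branch length,
    cladistic = number of edges on the path.
    """
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    out = {}
    for start in adj:
        # Depth-first walk from each node, accumulating both distances.
        stack = [(start, None, 0, 0)]
        while stack:
            node, parent, pat, clad = stack.pop()
            out[(start, node)] = (pat, clad)
            for nxt, length in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, pat + length, clad + 1))
    return out

# Hypothetical toy tree without sampled ancestors: sp3 sits on a long branch.
sparse = {("root", "sp1"): 1, ("root", "n1"): 1,
          ("n1", "sp2"): 1, ("n1", "sp3"): 4}

# The same tree once three intermediate ancestors break up the long branch.
dense = {("root", "sp1"): 1, ("root", "n1"): 1, ("n1", "sp2"): 1,
         ("n1", "anc1"): 1, ("anc1", "anc2"): 1,
         ("anc2", "anc3"): 1, ("anc3", "sp3"): 1}

sparse_d = tree_distances(sparse)
dense_d = tree_distances(dense)

# Without ancestors the two measures disagree along the long branch.
sp2_sp3_sparse = sparse_d[("sp2", "sp3")]

# With every change turned into a node they agree for every pair ...
all_equal = all(pat == clad for pat, clad in dense_d.values())

# ... so patrocladistic distance (their sum) is exactly twice the cladistic one.
is_doubled = all(pat + clad == 2 * clad for pat, clad in dense_d.values())

print(sp2_sp3_sparse, all_equal, is_doubled)  # (5, 2) True True
```

In the sparse version sp2 and sp3 are five changes but only two edges apart; once the ancestors are added, the two counts coincide for every pair.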

Using again the single-linkage clustering option in R's hclust function, the new dendrogram looks like this:

This was not quite what I expected; it was even worse! No clusters at all. But when we think about it, it is not surprising. Remember that single-linkage clustering always unites the two clusters that have the shortest distance between any two of their elements, so in a sense the shortest distance between their margins or outliers. But in this case, no matter how you cut the cake, some cluster always has a nearest neighbour outside it that is only one step away. So everything immediately gets lumped into one flat cluster.
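The flat result can be reproduced in a few lines. The following is a naive Python sketch of single-linkage clustering, not R's hclust, run on a hypothetical toy tree (not the one from this post) in which every node is a sampled taxon and every edge is one step long.

```python
from itertools import combinations

def hop_distances(adj):
    """All-pairs path length (number of unit-length edges) on a tree."""
    d = {}
    for start in adj:
        stack = [(start, None, 0)]
        while stack:
            node, parent, steps = stack.pop()
            d[(start, node)] = steps
            stack.extend((n, node, steps + 1) for n in adj[node] if n != parent)
    return d

def single_linkage_heights(labels, d):
    """Naive single linkage: return the height of each successive merge."""
    def gap(a, b):
        # Distance between clusters = distance between their closest members.
        return min(d[(x, y)] for x in a for y in b)

    clusters = [frozenset([x]) for x in labels]
    heights = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: gap(*p))
        heights.append(gap(a, b))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return heights

# Hypothetical tree given as an adjacency list; all names are made up.
adj = {
    "root": ["sp1", "anc0"],
    "sp1": ["root"],
    "anc0": ["root", "sp2", "anc1"],
    "sp2": ["anc0"],
    "anc1": ["anc0", "anc2"],
    "anc2": ["anc1", "sp3"],
    "sp3": ["anc2"],
}

heights = single_linkage_heights(list(adj), hop_distances(adj))
print(heights)  # [1, 1, 1, 1, 1, 1]
```

Every merge happens at height one, because in a connected tree some two clusters are always joined by a single edge. A dendrogram with all joins at the same height has no internal structure left to read.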

In a way I find this quite fitting. As mentioned above, evolution is gradual; clustering its results into paraphyletic taxa never made sense to me in the first place. And of course if we try to cluster by long branches then taking the historical, real life non-existence of long branches into account will make clustering impossible. One could now simply conclude that patrocladistics does indeed only work in the fortuitous absence of intermediate fossils and leave it at that.

But out of interest I tried a different clustering method provided by hclust, "average". Result:

Now we have a garbled version of the original phylogenetic tree back, except primitiva isn't in the right position and the ancestors are sister to their descendants, something that is anathema to many proponents of paraphyletic taxa.
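Why "average" recovers some structure where single linkage does not can be sketched the same way. This is again a naive Python illustration on a hypothetical toy tree, not the data or the R call behind the figure above; it assumes that "average" means the mean of all pairwise distances between the members of two clusters (UPGMA).

```python
from itertools import combinations

def hop_distances(adj):
    """All-pairs path length (number of unit-length edges) on a tree."""
    d = {}
    for start in adj:
        stack = [(start, None, 0)]
        while stack:
            node, parent, steps = stack.pop()
            d[(start, node)] = steps
            stack.extend((n, node, steps + 1) for n in adj[node] if n != parent)
    return d

def average_linkage_heights(labels, d):
    """Naive average linkage (UPGMA): return the height of each merge."""
    def mean_gap(a, b):
        # Distance between clusters = mean over all cross-cluster pairs.
        pairs = [d[(x, y)] for x in a for y in b]
        return sum(pairs) / len(pairs)

    clusters = [frozenset([x]) for x in labels]
    heights = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: mean_gap(*p))
        heights.append(mean_gap(a, b))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return heights

# Hypothetical tree: every node is a sampled taxon, every edge one step.
adj = {
    "root": ["sp1", "anc0"],
    "sp1": ["root"],
    "anc0": ["root", "sp2", "anc1"],
    "sp2": ["anc0"],
    "anc1": ["anc0", "anc2"],
    "anc2": ["anc1", "sp3"],
    "sp3": ["anc2"],
}

heights = average_linkage_heights(list(adj), hop_distances(adj))
print(heights)
```

The first join is still at height one, but later joins are higher, because the mean takes the more distant members of each cluster into account. That is what gives the dendrogram its nested shape, and also why an ancestor ends up paired with a neighbour as if it were a sister.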

Does that make sense? I think representatives of all schools of classification might actually agree that it doesn't. But the question is whether a patrocladistic analysis without ancestors makes any more sense. What is the theoretical background and justification? How do we interpret the results from a biological perspective?

21 comments:

Unknown, February 12, 2016 at 3:39 AM
I won't spoil your last questions again ;)

Concerning your other points, if I were to examine your UPGMA patrocladogram closer, I would say that you have indeed proven that primitiva-vulgaris-communis-F-H-G are a distinct evolutionary grade. From them arises a second grade made of B-C-D-E, which leads to a third evolutionary grade anomalica-aberrans-A.

In fact, I am surprised that patrocladistics works so well with so few species and so many ancestors included. I didn't expect it, and maybe I should revise my opinion that patrocladistics doesn't work with paleontological data. It seems it does.

You've just proven that single-linkage should be avoided.
dwbapst, February 13, 2016 at 8:27 AM
So, this is maybe tangential to your real interest (systems of classification and naming, which, uh, I stay out of), but I'd point out that if you tried tracing actual character evolution (say, morphological), you might find that it's very hard to get much phylogenetic resolution when you have sampled ancestors. And, if you have ancestors that can persist through branching events (i.e. budding, which you don't seem to allow for in your example; it all looks like bifurcation to me), that raises even more problems for trying to resolve relationships among units:

http://www.ncbi.nlm.nih.gov/pubmed/23638034