Friday, January 15, 2016

Aubert's analysis of phylogenetic terminology, part 6: empirical falsification

Continuing the discussion of this paper from here, here, here, here, and here, and working through the main claims of the paper as I see them:
  • The various definitions provided in the paper are in some way better than the ones that are currently accepted.
  • There is no relevant difference between the systematics-relevant relationships and structures existing at any level of the diversity of life. (E.g. mother > daughter is completely equivalent to bony fish > land animals - they can all be drawn as diamonds and arrows, right?)
  • A strictly phylogenetic classification is formally impossible.
  • Cladism is part of structuralism and therefore characterised by "anti-realism and a metaphysical way of thinking".
  • Cladism is built on biologically unrealistic assumptions that have been empirically falsified.
  • There exists an objective approach to delimiting paraphyletic groups.
  • It would be preferable to have two parallel classifications, one of clades and one that includes taxa that are allowed to be non-monophyletic.

Have the assumptions underlying cladism been empirically falsified?

The abstract of the present paper claimed that the "biologically unrealistic assumptions on which cladism is based ... have been empirically falsified". I was curious what this would be about. As far as I can tell, the only subsection of the paper dealing with the claim is 10.4, Uniformitarianism and Punctualism.

Punctuated Equilibrium

There are two separate arguments, the first of which is based on a very selective perception of phylogenetic systematics. A lot of text is spent saying that cladists assumed phenetic similarity to be strongly correlated with relatedness because they assumed that evolutionary change was uniform along all lineages; thus they concluded that a classification by relatedness would maximise phenetic similarity within and dissimilarity between groups. Now Punctuated Equilibrium demonstrates that "the pace of evolution can greatly vary", consequently cladism doesn't maximise phenetic information content, consequently cladism is pining for the fjords.

It is not at all clear to me how varying rates of change mean that a phylogenetic classification isn't predictive of morphological traits. Unless we are talking about regular violations of Dollo's Law, okay, but I would not find that plausible.

But set that aside for the moment. To the best of my understanding, the cladist project was never primarily about maximising phenetic information content anyway, even if some cladists might argue that a phylogenetic classification fortuitously also happens to do that.

There are considerably more important arguments for phylogenetic systematics. For example:
  • The understanding that a clade is what an ancestral species has turned into, and that consequently a clade is just as natural as a species while part of a clade isn't.
  • The realisation that meaningful groups should be defined based on shared traits and not based on the traits that non-members of the group have.
  • Logical consistency with the tree of life.
  • Not wanting to supply the end users of a classification with groups that are evolutionarily and biogeographically misleading.
  • The insight that it merely takes to its logical conclusion what systematists have always done, i.e. classifying specimens by their relatedness even if they are as morphologically divergent as a butterfly and its caterpillar stage. In other words, intellectual consistency.
  • Perhaps that nobody has ever come up with an alternative testable, objective and universal criterion for the delimitation of taxa. Etc.

As an aside, if one were to examine the phenetic information content of a phylogenetic system, one would first have to contemplate very carefully what a cladist of the 1970s would have meant by that as opposed, for example, to a pheneticist from 1960 or a molecular phylogeneticist from 2016. For example, one would have to discuss similarity arising from a shared plesiomorphy, which an 'evolutionary' systematist would count but which a cladist would exclude from consideration as misleading. One would also have to discuss homoplasy, which a pheneticist might perhaps just throw into the mix but which a cladist would have to consider a character state coding error. (E.g. absence of legs in caecilians and in snakes is not homologous.) Finally, a taxonomist of the 1970s would, when saying something like "overall similarity", be thinking in terms of morphology and anatomy, but not in terms of DNA sequence data as seen in the light of coalescent theory.

Taking into account what a cladist a generation ago would have actually meant by phenetic information content, it is easy to see how the argument could indeed be made that a phylogenetic system represents it best, and I do not see how varying rates of evolution or anything evolutionary biologists have learned since then would change that. I guess a pheneticist might consider the treatment of homoplasies to be circular, but that just shows that they would have a different understanding of similarity.

But again, maximising phenetic information content is unlikely to be among the arguments for phylogenetic systematics that any significant number of contemporary systematists would come up with if queried, and it is certainly not an assumption on which the current mainstream practice of classification is based. So it wouldn't even matter if it were true that phylogenetic classifications are only, say, the third best option in terms of phenetic information content. And that was that.

However, there may also be a misunderstanding of Punctuated Equilibrium behind the first argument, because that theory is here somehow used as an explanation for the superficial similarity of crocodiles to squamates as opposed to the more closely related birds. A proper discussion of P.E. would lead too far here, but it just doesn't have anything to do with differences or similarity between such distant groups. The claim underlying Punctuated Equilibrium is that species are pretty much in morphological stasis most of the time and undergo change only at speciation events, which then appears "fast" (on a geological scale).

To the best of my knowledge it doesn't say anything about generation-to-generation changes being anything but minuscule, gradual and slow. And it doesn't say anything about massive acceleration along some branches of the tree of life either; rather, it envisions something like a constant beat along all branches. Even if assumed to be true, P.E. just does not postulate evolutionary patterns that would be a problem for the morphological information content of clades in the first place.

Long Branch Attraction

The second argument implies that cladism (classification by relatedness) is empirically wrong because cladistics (parsimony analysis) suffers from long-branch attraction if data are used that are too homoplasious. Ignore for the moment that other phylogenetic criteria apart from parsimony also suffer the same problem. Doesn't matter for present purposes.

What matters is that this is a conflation of two totally different issues that only share a similar name. And as discussed in the last post, the paper is arguing against the monophyly requirement as such and thus also against all the phylogenetic systematists today who consider parsimony analysis to be outdated and use Likelihood or Bayesian phylogenetics instead. Choice of tree inference method is irrelevant for discussing the merits of rejecting non-monophyletic groups.
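To make the separation of the two issues concrete, here is a minimal sketch in Python of what the monophyly requirement amounts to once a tree is in hand. The toy tree, the taxon names and the function names are my own illustrative assumptions, not anything taken from the paper, and the topology is deliberately simplified. The point is only that the check takes a tree as input and does not care which method produced it.

```python
# Hypothetical toy tree written as nested tuples; illustrative only.
# It encodes crocodiles as closer to birds than to squamates.
TREE = ((("crocodiles", "birds"), ("lizards", "snakes")), "mammals")


def leaves(node):
    """Return the set of leaf names at or below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node of the tree has exactly `group` as its leaf set."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# A clade: an ancestor and all of its descendants in this toy tree.
print(is_monophyletic(TREE, {"crocodiles", "birds"}))
# Not a clade here: birds are left out of their ancestor's descendants.
print(is_monophyletic(TREE, {"crocodiles", "lizards", "snakes"}))
```

Whether `TREE` came out of a parsimony, likelihood or Bayesian analysis makes no difference to `is_monophyletic`; it only asks whether the proposed group corresponds to a node of the tree.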


  1. You skipped two empirical rebuttals of cladism: 1) synchronous species can be ancestral to one another, so cladogenesis alone doesn't accurately portray speciation, so cladification does not reflect a natural process (only a pattern in the data, at best). 2) diachronous ancestral species can be recognized, so classifying them as if they were sisters to their descendants does not make sense.

    You comment only the third empirical falsification, maybe because you already commented the others in your previous posts, but you should have made clear that it is not the only claim in the original paper.

    The third argument is simply as follows: cladism claims that cladist classification is natural, i.e. reflects affinity. Since evolution is far from being uniformitarian (at any level), two closely related species/families/phyla can be a lot more distinct than two distantly related ones. So cladification does not reflect affinity, it is therefore an artificial system. In order to claim that cladification is still natural, you have to hijack the meaning of "affinity" and "naturalness". As I explained, relatedness (in its cladistic meaning) is only a part of affinity.

    You misunderstood my statement about LBA. This was an example showing that evolution is not uniformitarian. If it were, LBA would not be a problem and parsimony would reflect both branching pattern and naturalness (hence the historical/philosophical link between parsimony and the claim of naturalness). Neither is true. I didn't say that LBA was not a problem for likelihood or Bayesian procedures.

    1. Maybe I misunderstand, but your two additional arguments appear to be the same, and they are both mere assertions based on assuming what is pretty much a typological species concept. There is nothing empirical about this.

      A butterfly and a caterpillar are also extremely distinct, yet strangely even 'evolutionary' systematists classify them together if they are closely related.

      As far as I understand, LBA is a result of signal saturation in a very limited set of traditional molecular markers or of what amounts to a scoring error in morphological datasets. It has nothing whatsoever to do with uniformitarianism versus punctuationalism and would be expected under either assumption. Also, it can often be solved by something as simple as better outgroup sampling.

  2. Yes, they are not completely independent arguments.

    Biological species concept, composite species concept, evolutionary species concept, etc. are not typological. Only the Hennigian concept (or similar internodal ones) is contradictory to our knowledge of speciation mechanisms and the palaeontological record, so yes these are empirical falsifications.

    Caterpillar and butterfly are the same individual, so your argument doesn't apply. But even with a true example of polytypic or polythetic species, your argument still doesn't apply since synthetists don't consider there is only one criterion that always works to define a species, their approach is more pluralistic and empirical. Mayr already pointed out that the caterpillar/butterfly example is either misunderstanding or strawmanning (though I can't remember the ref). Furthermore, you should be aware that a "character" can be applied to an individual as well as to a population, therefore variation in a phenotypic character is itself a character (e.g. allele frequencies, polymorphic trait frequencies, etc.). Trying to portray evolutionary systematists as typological thinkers is either gross misunderstanding or gratuitous provocation.

    LBA mechanisms are complex but they are not restricted to your examples. For instance, parasitism is a well known example of accelerated evolution leading to the loss of many characters and the gain of many new ones. The parasitic lineage is thus attracted toward the stem of the tree with a parsimony analysis. Several independent parasitic branches would also be attracted to each other. Improving taxon sampling and character analysis may not always solve the problem. This scenario would never happen, or would be easy to solve, if evolution were uniformitarian. Parasitic specialization would indeed be as slow as the divergence among non-parasitic taxa.

    1. Sometimes I think we are living in different worlds. Of course the composite SC is a form of typological SC because it considers any morphological character difference to make a new species, all other biological or genetic considerations notwithstanding. And the BSC combined with an asynchronous view would mean that all of life is one species. It is among the many concepts that don't work across time.

      No, this is just a reductio ad absurdum to show that systematics, properly understood, is not about superficial similarity but about relatedness. 'Evolutionary' Systematics is merely inconsistent about that.

      AFAIK it would only be attracted to the stem if there is homoplasy, as in your second example. A long branch alone shouldn't matter as long as all changes along it are unique. And in your second example we are talking scoring errors because the loss of those characters isn't homologous.

    2. "Sometimes I think we are living in different worlds." I agree. This is the concrete effect of thinking through distinct paradigms.

      "BSC combined with an asynchronous view would mean that all of life is one species." No. Interbreeding is not necessarily transitive. This means that if A can reproduce with B, and B with C, then A and C may be unable to reproduce with each other. So you must cut somewhere between A and C by using a second criterion.

      Species is a complex phenomenon that cannot be represented by a simple definition, hence a pluralistic approach. You are confusing "reductio ad absurdum" and strawmanning. I disagree that systematics is only about relatedness. Moreover, homologous characters cannot be called "superficial similarity". Evolutionary systematists aren't pheneticists.

      Homoplasy can result from reversal or convergence. In the first case, the loss of many synapomorphies can mimic a basal branch. In the second case, I would like to know how you distinguish non-homologous losses, i.e. nothingness vs nothingness, before drawing the tree. Parsimony is known to be statistically inconsistent, i.e. more and more data can lead to a wrong answer.

    3. If you use a second criterion, then you are not using the BSC. But let's cut this short before we go in circles again. Insisting on a different SC is not empirical refutation.

      Next I started to write about convergence at a superficial level versus differences in the underlying structures and the genome, but then realised that this is yet another red herring. You do not get to reject phylogenetic systematics by arguing against cladistics. Your paper was not about whether to infer the tree of life using parsimony or likelihood analysis, it was about how to classify given the best tree we can come up with. They can use BEAST or RAxML, who cares, but afterwards mainstream systematists circumscribe taxa to be clades. You are still mixing up two completely different issues.

    4. I didn't mix these issues, you did. You misinterpreted what I wrote about LBA. My aim was clearly not to criticize cladistics.

      I still disagree about species concept: demonstrating inconsistency of internodal SC on empirical grounds is indeed empirical refutation. You cannot arbitrarily choose what a species is, at least if you admit that species are real mind-independent entities (i.e. "species realism").

    5. Of course we can't do it arbitrarily, but your choice is entirely based on phenetic clustering.