Sunday, June 12, 2016

Parsimony in phylogenetics again

Just some short observations:

A few days ago I learned that somebody has found my TNT script for the Templeton test useful and is not only using it but also improving on it. A few days before that I found that my post on using TNT made it into the acknowledgements of a publication. That is really nice to see; my expectation was never that this blog would be home to a lot of real-time discussion, but rather that people could find something useful to them even years after I posted it.


I checked that 'parsimonygate' hashtag again, and found a few interesting (or perhaps revealing) tweets. The first comments on one of the graphs from my surprisingly popular post on the popularity of phylogenetic programs over the years with a laconic "TNT is thin green". Now I have no idea what the tweeter meant by that. His profile clarifies that "re-tweet does not equal endorsement", so any comment could be about anything. But in the context of the parsimonygate hashtag, it could be read as an argumentum ad populum, along the lines of: see, hardly anybody uses parsimony these days, those guys are fringe.

That, however, would make little sense regardless of one's position on the silly parsimony versus model controversy. It would be much harder to figure out how often people use methods than how often they cite programs, but it should be obvious that many of the people citing PAUP, PHYLIP or MEGA have also used the parsimony methods implemented in those programs. TNT is just one of the parsimony programs out there, and it is unsurprising that it is not the most popular one, seeing how it uses an idiosyncratic data format and scripting language instead of the more widely used Nexus, Newick and Phylip formats.

The other notable tweets are the series of comments that appeared after the publication of a recent study comparing the performance of Bayesian and parsimony methods on discrete morphological characters. (This is of some interest to me. My preference when faced with DNA sequence data is to use likelihood, but the results of using the Mk model on morphology generally seem nonsensical to me.) Samples:
Bayesian phylogenetic methods outperform parsimony in estimating phylogeny from discrete morphological data (link to paper)
Time to abandon all parsimony? (link to paper)
Wow, that paper must have really shown parsimony to have trouble! Let's look at the paper then:
Only minor differences are seen in the accuracy of phylogenetic topology reconstruction between the Bayesian implementation of the Mk-model and parsimony methods. Our findings both support and contradict elements of the results of Wright & Hillis [5] in that we can corroborate their observation, that the Mk-model outperforms equal-weights parsimony in accuracy, but the Mk-model achieves this at the expense of precision.

Again, I cannot stress enough that I am a methods pragmatist who regularly uses both parsimony and model-based approaches. I also appreciate that there are indeed phylogeneticists who are irrationally opposed to model-based methods. But are these not examples of rather, shall we say, selective perception of what turned out to be a trade-off situation with minor differences either way?


  1. On the contrary, I think the interpretation of the various tweeters of O'Reilly et al.'s results is more fitting than the authors'. Precision (more resolution) without accuracy isn't a neutral trade-off: it's something about a method that we should actively avoid, because it blinds us to an appropriate assessment of the uncertainty we have regarding the data. Who wants a well-resolved tree that is mostly wrong? I'd much rather have a poorly resolved tree that is more accurate.

    1. Okay then, although that does not change the point about "only minor differences". Personally I have yet to do a model-based analysis of morphological data that returns sensible results, be it phylogenetic inference or simple ancestral state reconstruction, so I am not sure how the authors did it.