Puttick et al., 2017. Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data. Proceedings of the Royal Society B: Biological Sciences 284, doi 10.1098/rspb.2016.2290

Puttick et al. constructed maximally balanced and maximally unbalanced phylogenies, simulated sequence data on them under the HKY + G model of nucleotide substitution, converted the data matrices into binary and (presumably unordered) multistate integer characters, and then used equal-weights parsimony, implied-weights parsimony, and Bayesian and likelihood analyses under the Mk model to try to recover the original phylogenies, with an eye on both accuracy (correctness) and tree resolution. In a second approach, they reanalysed previously published morphological datasets to see what happened to controversial taxon placements under the different approaches.
One perennial problem with simulation studies is that they can turn out to be circular: if you simulate data under a model, it is no surprise that the same model performs best at inferring the input to the simulations. In this case Puttick et al. were admirably circumspect, in that not only did they simulate their data under a different model (HKY + G) from the one ultimately used in phylogenetic analysis (Mk), but they also repeated the analyses until they had achieved a distribution of homoplasy mirroring that found in empirical datasets. This is important because morphological datasets for parsimony analysis are scored to minimise homoplasy, whereas uncritically simulated matrices may show much higher levels of homoplasy, thus putting parsimony at a disadvantage.
Still, it should be observed that the HKY + G model is nonetheless unlikely to produce data that realistically represent morphological datasets, especially since the latter would at a minimum also include multistate characters with ordered states. Also, from a cladist's perspective, homoplasy in a morphological dataset is a character-scoring error waiting to be corrected in a subsequent analysis. Then again, using zero-homoplasy datasets would also have been unrealistic, because real-life datasets do contain homoplasy. (And of course parsimony would "win" every time if there were zero homoplasy, pretty much by definition.)
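As an aside on how homoplasy is quantified: a standard yardstick is the consistency index, the minimum conceivable number of steps for a character divided by the number of steps it actually requires on a tree, with values below 1 indicating homoplasy. Here is a minimal Python sketch (illustrative only, not the authors' code) that counts steps for one unordered character with Fitch parsimony and derives the index; the tree, taxa and states are invented for the example:

```python
def fitch_steps(tree, tip_states):
    """Fitch parsimony: minimum number of unordered state changes for one
    character on a rooted binary tree. `tree` is a tip name (str) or a
    (left, right) tuple of subtrees."""
    steps = 0

    def down(node):
        nonlocal steps
        if isinstance(node, str):          # tip: its observed state
            return {tip_states[node]}
        left, right = map(down, node)
        shared = left & right
        if shared:                         # children agree: no change needed
            return shared
        steps += 1                         # disagreement forces one change
        return left | right

    down(tree)
    return steps

# State 1 appears in two separate parts of the tree, so it must have arisen
# twice: 2 observed steps vs. 1 minimum step gives a consistency index of 0.5.
tree = ((("A", "B"), ("C", "D")), "E")
states = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0}
observed = fitch_steps(tree, states)
minimum = len(set(states.values())) - 1
print(observed, minimum / observed)  # -> 2 0.5
```

The simulation study's homoplasy matching amounts to tuning matrices until such indices resemble those of real datasets.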
Now, what are the results? To simplify: Bayesian inference was best at recovering the tree topology, followed by equal-weights parsimony and implied-weights parsimony, with likelihood coming in last. Likelihood always produces fully resolved trees, while Bayesian analysis produces the least resolved ones. The authors argue, as Bayesians would, that this is exactly how it should be: it simply tells us that the data are not strong enough, whereas the other approaches may give us false confidence. (Although of course parsimony and likelihood analyses can likewise involve several different ways of quantifying support or confidence.)
In conclusion, Puttick et al. make the following recommendations:
First, Bayesian inference should be the preferred approach.
Second, future morphological datasets should be scored with model-based approaches in mind. This means the number of characters should be maximised by including homoplasious ones, because that allows a better estimate of rates. As this is the exact opposite of the scoring strategy that parsimony analysis requires, it will be hard to change habits.
What is more, I have to smile at Puttick et al.'s expectations here: they simulated data matrices of 100, 350 and 1,000 characters. Maybe you can get 400 or so characters for some animals (if the fossils are well enough preserved), but for any plant group I have worked on I would struggle to get 30. And wouldn't you know it, the single empirical botanical dataset they reanalysed had only 48.
Third, researchers should lower their expectations and get used to living with unresolved relationships, as Bayesian analysis produces less resolved phylogenies.
Our discussion of the paper was wide-ranging. When I commented that one of the advantages of traditional parsimony software is that it easily allows the implementation of any step matrix that is needed (imagine a character where state 0 can change into states 1, 2 or 3, but states 1-3 cannot change into each other), I was informed that this is in fact possible in BEAST. That came as a pleasant surprise, as I had assumed BEAST was limited to a few simple models, such as standard Mk for unordered states, and nothing more. However, those who have written XML input files for BEAST may want to consider whether that counts as "easy" compared with writing a Nexus file for PAUP. Personally I find BEAST input files very hard to understand.
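To make the step-matrix idea concrete, here is a minimal Python sketch (purely illustrative, not how PAUP or BEAST implement it) of Sankoff's dynamic-programming algorithm scoring one such character on a tiny invented tree. The matrix encodes the character described above: 0 can change into 1, 2 or 3 at cost 1, while changes among 1-3 are forbidden; treating reversals to 0 as also forbidden is an extra assumption of this sketch, since the original example does not specify them.

```python
import math

INF = math.inf

# cost[i][j] = cost of changing state i -> j along a branch.
cost = [
    [0,   1,   1,   1],
    [INF, 0,   INF, INF],
    [INF, INF, 0,   INF],
    [INF, INF, INF, 0],
]

def sankoff(tree, tip_states):
    """Minimum total cost of one character on a rooted binary tree under
    `cost` (Sankoff's dynamic programming). `tree` is a tip name (str)
    or a (left, right) tuple of subtrees."""
    def down(node):
        if isinstance(node, str):  # tip: only the observed state has cost 0
            return [0 if s == tip_states[node] else INF for s in range(4)]
        left, right = map(down, node)
        # cost of the subtree if this node is in state i
        return [sum(min(child[j] + cost[i][j] for j in range(4))
                    for child in (left, right))
                for i in range(4)]
    return min(down(tree))

# On the tree ((A,B),(C,D)), tips A and B in states 1 and 2 force two
# independent changes out of state 0, because 1 -> 2 is forbidden.
tree = (("A", "B"), ("C", "D"))
states = {"A": 1, "B": 2, "C": 0, "D": 0}
print(sankoff(tree, states))  # -> 2
```

Under an unordered (Fitch) treatment the same pattern could be explained by a single 1 -> 2 change; the step matrix is what forces the two independent origins.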
Another concern was that while nucleotide substitution models are based on a fairly good understanding of what can happen to DNA nucleotides which, after all, have a limited number of states and transitions between those states, it is considerably less clear what the most appropriate model for any given morphological character is.
What is more, somebody pointed out that there are essentially two options in a model-based analysis: either the likelihood of state transitions is fixed, which is a difficult decision to make, or it is estimated during the analysis. But in the latter case the probability of, say, changing the number of petals would be influenced by the probability of shifting between opposite and alternate leaf arrangement, an idea that is immediately nonsensical.
In summary, the drumbeat of papers along the lines of "we are the Bayesians; you will be assimilated; resistance is futile" is not going to stop any time soon. I use Bayesian and likelihood analyses all the time for molecular data, no problem. But I am still not convinced that the Mk model would be my go-to approach the next time I have to deal with morphological data. It seems to me that it is much easier to justify one's model selection in the case of DNA than in the case of, say, flower colour or leaf length; that the idea of setting a single model and estimating gamma rate variation across totally incomparable traits is odd; and that I would hardly ever have enough characters for a Bayesian analysis to produce more than a large polytomy.
But I guess all that depends on the study group. I can imagine there would be morphometric data for some groups of organisms for which stochastic models work quite well.