PhyloBotanist: July 2016

Sunday, July 31, 2016

Botany picture #232: Banksia marginata

Banksia marginata (Proteaceae), the silver banksia, ACT, 2016. I went to Tidbinbilla with friends today, and this cone was the only one we saw in flower; really it is not the season. I think this may also be the only Banksia in the Canberra area. If one wants to see more one would have to drive towards the coast.

Saturday, July 30, 2016

Best practices in identification key design

About two weeks ago I gave a lecture about plant identification, which always includes a few comments on what makes a good or a bad key. Not so much because the students are going to write their own keys in the near future, but in the first instance because they are going to use keys next week, and I want to make it clear that if they have problems it may well be the author's fault and not their own.

Among the literature I put on the course website as further reading is a very nice article on Best Practices in identification key design, Walter & Winterton (2007, Annu. Rev. Entomol. 52: 193-208). It already starts with a nice but sadly accurate quote (“Keys are compiled by those who do not need them for those who cannot use them”), but in my eyes the core of the paper is table 2, which enumerates ten recommendations to key writers:

1. Do not write the key to reflect your classification, write it to make identification as easy as possible even if that means having totally unrelated species next to each other.

2. Avoid couplets with only one character, in case that one character is missing or lost on the user's specimen.

3. Use clear, unambiguous characters and avoid technical jargon.

4. Have the same characters in both leads of a couplet. (Really it is shocking that this needs to be said!)

5. Show illustrations of contrasting character states next to each other.

6. Place illustrations next to where they are needed in the key.

7. “Provide a way out of a dead end: Give links to previous couplets or other means of keeping on the path.”

8. Design keys so that the couplets split the remaining species half-half instead of so that they divide them one versus all others. This makes the key shorter.

9. Provide descriptions for the taxa so that the user can check if they arrived at a plausible result when using the key.

10. Ask “naïve” end-users to test your key and provide feedback.

Although I would consider some of these points much more important than others I agree completely with all of them.

Coincidentally I had to key out a plant in the same week that I gave the lecture, and this was the very first (!) couplet of the relevant key:

A. Capitula discoid: all florets bisexual, or all florets female, and the corolla-limb of similar size in all florets, to 1.0 mm diam. at base of lobes OR capitula radiate but with only 1-3 ligules; achenes homomorphic

A*. Capitula radiate or disciform: if disciform, the corolla-limb to 0.5 mm diam. at base of lobes, with corolla-limb of marginal florets significantly smaller than that of central florets; if radiate, ligules 4 or more, sometimes inconspicuous; achenes homomorphic or dimorphic

Okay, how does this one couplet score against the list of recommendations of Walter & Winterton? Solution below the fold.

Botany picture #231: Daphne

Daphne (Thymelaeaceae), perhaps D. mezereum, as seen at Cockington Green in Canberra the previous weekend. I have always liked the genus because it flowers so early in the season and has such an amazing, strong floral scent. Unfortunately both the branches and the fruits of these species are rather poisonous, so they are not ideal for a garden frequented by small children.

The genus is not native to Australia, which has the large genus Pimelea instead.

Tuesday, July 19, 2016

I must be missing something

As I continue to contemplate Ebach and Michael's recent paper From Correlation to Causation: What Do We Need in the Historical Sciences?, I would first like to make clear that I like reading and appreciate papers like this one. If we want to get things right it is crucial to be pushed out of our comfort zone from time to time, so asking a question like "have recent developments in the field gone totally in the wrong direction?" has its value.

That being said, however, it would appear to be a reasonable assumption that >90% of the experts are somewhat unlikely to all overlook a fundamental flaw in what they are doing. It could happen, yes, but especially as a non-expert one would need to see a rather good and clear argument before agreeing that they do.

With this in mind, I will now describe my thoughts about the core of the paper and my understanding of what the authors argue for and why. The first parts of the paper feature a lengthy discussion of the interpretation of ancestry and character evolution in phylogenetics and evolutionary biology, of unstructured and structured representations of the same data, and of the dangers of letting unwarranted assumptions distort the data. There might be quite a bit to be discussed here - for example the claim that "historical sciences, such as taxonomy and palaeontology ... are mostly descriptive and defy testing", which might be news to the discoverers of Tiktaalik, who were able to predict and subsequently test their prediction of where to look for this "missing link" - but the meat of the paper as I understand it starts with the criteria the authors suggest for "comparing ... assumptions against a well-attested set of aspects of causation".

Reference is made to the Bradford Hill Criteria of the medical sciences, and they are then adapted to the presumed needs of historical science, which, as discussed in the earlier post, the authors perceive to be fundamentally different from experimental science. The new Historical Sciences Bradford Hill Criteria are presented under key words that sometimes describe what to look for and sometimes describe what to avoid:

Selection bias. This is an obvious problem in science, although I must say that I do not find the specific examples provided by the authors to be the most convincing.

Temporality. I am afraid I do not really understand what is meant here, so I will quote the relevant paragraph from the paper in full, excepting references. "Present day distributions are the result of past events. Therefore there is the possibility that different taxa alive today may be resultant in different events that occurred at different times. In using a single historical event (e.g., the Oligocene drowning of New Zealand ~30 Ma) to address a larger biogeographical question has resulted in several debates about the age of the New Zealand biota. Single event hypothesis are rare, however little to no scrutiny is typically taken to test their validity."

Especially considering that in the original, i.e. medical science, criteria temporality was apparently about dose-response and reversibility of an effect, I am unsure where the above comes from. It is even less clear to me what is wrong with using a single historical event to address a large question. If New Zealand was indeed completely under water then it quite simply follows that all endemic land-living organisms would at that moment have perished, and that the current land-living organisms would have had to disperse into New Zealand afterwards, when it rose up again. (I am not qualified to assess if it was indeed completely under water, but that is not the point.)

Evidence for a mechanism of action. This again makes more sense to me, although we may have to agree to disagree about how plausible any given assumption of a mechanism of action is. The authors, for example, appear to consider dispersal from one biogeographic region to another to be implausible, but I believe that as long as the probability of that happening is not zero it would have to be a matter of weighing it against the plausibility of alternative explanations (such as those requiring that a family of flowering plants arose before multi-cellular life).

Coherence. In effect the question whether a claim fits what else we are currently confident we know. Makes sense.

Replicability or, as I would call it, reproducibility. The authors argue that historical data are not replicable, so the equivalent is correlation between different datasets.

Similarity. Do two different datasets for the same study group arrive at the same result? It is not entirely clear to me how this is different from the previous, but at any rate the two seem sensible enough.

In summary, some of the above is not clear to me, but other aspects appear immediately reasonable. In fact there would be a trivial interpretation of what the authors want to say: Examine your assumptions; remove indefensible assumptions; all else being equal, use the simpler model instead of an unnecessarily convoluted one.

But one would assume it cannot really be that easy, because there is hardly anybody in science who would disagree with that. Modellers are all aware of the danger of over-parameterisation; the problem is simply that all else is not always equal. Sometimes a more complex model is quite simply the better explanation. If you have a plot of dots forming a straight line as data, a very simple linear model with one parameter will do nicely. If you have a plot of dots forming an S-shaped pattern you will quite simply not be able to explain them with such a simple model; you need more parameters. I would suspect the same applies to biogeography; if there are data that defy the explanation of vicariance then our explanation needs to incorporate more processes than vicariance. I thus find it hard to accept the authors' judgement that "complex models are designed to extrapolate data under highly speculative assumptions" whereas "simple models, with plausible assumptions[,] are more likely to pass the [criteria]". It really depends.
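To make the straight-line versus S-shape point concrete, here is a minimal sketch in Python. The two datasets are made up for illustration (a perfect line and a logistic curve with an arbitrarily chosen steepness), and the helper names are hypothetical; it only shows that the same least-squares line that fits the first dataset perfectly leaves a large residual on the second.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual_sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared differences between the data and the fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data only: 25 points from -6 to 6.
xs = [i / 2 for i in range(-12, 13)]
straight = [0.5 + 0.08 * x for x in xs]               # dots on a straight line
s_shaped = [1 / (1 + math.exp(-2 * x)) for x in xs]   # dots on an S-shaped curve

rss_straight = residual_sum_of_squares(xs, straight, *fit_line(xs, straight))
rss_s_shaped = residual_sum_of_squares(xs, s_shaped, *fit_line(xs, s_shaped))

print(f"line fitted to straight data: RSS = {rss_straight:.6f}")
print(f"line fitted to S-shaped data: RSS = {rss_s_shaped:.6f}")
```

The residual on the S-shaped data does not shrink however carefully the line is fitted, which is the sense in which the simple model is not the better explanation there.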

Similarly, everybody in science will agree that we shouldn't base our conclusions on bad assumptions. Problem is, everybody will argue that their assumptions are the good ones. Perhaps now would be a good moment to turn to the example provided by the authors of the present paper, to see how they use their new criteria. This might also clear up those aspects that I did not understand when reading the criteria themselves.

Interestingly, the four methods compared by the authors under their criteria are at least partly apples and oranges. They are Brooks Parsimony Analysis (BPA), Ancestral Area Analysis (AAA), Ebach's own Area Cladistics (AC), and the Dispersal-Extinction-Cladogenesis model (DEC). To the best of my understanding the point of BPA is to reconstruct how biogeographic regions are related, as in "the rainforests of New Zealand are sister to the temperate rainforests of Australia, and sister to both of them are the temperate rainforests of Patagonia" (this is not a quote but a hypothetical). We might also call this reconstructing the evolutionary history of biogeographic areas. In contrast, my understanding is that the other three are concerned with reconstructing the inverse, the biogeographic history of an evolutionary lineage, either in its entirety or at least to infer where the common ancestor of the lineage was found (although admittedly I was unable to look deeply into AC as the relevant paper was behind a paywall).

Still, all four are biogeographic methods. I found it easiest to proceed once more criterion by criterion.

Selection bias. DEC is criticised for assuming that "areas" are the result of dispersal and extinction, while the criterion is said to be inapplicable to the other three methods because "the type of area is not specified" in any of them. Once more I can only say that I don't get it.

There are two possible interpretations of "area" in this context. The first is that we are talking about the cells or regions defined a priori as the units of the analysis. If this is the case, then all four methods face the exact same problem, because in all cases the user has to define areas a priori. But this doesn't make sense because the cells defined for a DEC analysis are quite simply not "considered as [sic] a result of dispersal and extinction", they are the units of which a potentially larger range considered to be the result of dispersal and extinction consists.

The second possibility is that we are talking about the results. If this is the case, then yes, obviously a Dispersal-Extinction-Cladogenesis model assumes that the present ranges of organisms are the result of dispersal and extinction (and cladogenesis). That's the point. But if this is what we are talking about then we cannot simply say "doesn't apply" for the other three. AC, for example, assumes that current ranges are the result of vicariance, so at the very least it would need a green marker for a plausible assumption, if indeed we find this assumption plausible; realistically, we would have to start discussing whether vicariance as the only process makes sense.

Temporality. As mentioned above I don't understand how the things considered under this name are any more temporal than the ones that aren't. The example does not really clarify the matter for me either. BPA and DEC are criticised as "speculative" because they use "incongruence" (between distribution patterns of different lineages? I believe that is not how DEC works...) to "explain" or "justify" "ad hoc events" or "processes". First, I think what is meant here is the other way around, i.e. that BPA and DEC explain certain patterns by invoking events that the authors consider to be ad hoc assumptions, apparently in practice meaning any biogeographic process except vicariance.

Second and more importantly it is, to say the least, not clear to me why vicariance is less ad hoc than dispersal, extinction and cladogenesis, which just goes back to my earlier point that everybody thinks their preferred explanation is the plausible one. Anyway, AAA is likewise criticised as speculative because "duplicated areas are considered to be part of an original ancestral area". AC, on the other hand, is given a green for entirely plausible assumptions because it only assumes that "geographical congruence is a result of common historical processes". Taken on its own that may sound reasonable, but what about the incongruences? Are they simply ignored? As mentioned above, if there are data that defy a one parameter model then more parameters would appear to be warranted.

There is little to say about evidence for a mechanism of action because none of the four methods is given a clean bill of health. I actually find it rather impressive that the authors call this aspect even of their preferred method speculative for explaining every congruence with vicariance. I do not, however, understand what is meant by "tree topology determines all processes" in the case of DEC. Taken at face value it is plainly wrong because not only the phylogeny but also the present distribution data go into the analysis. What is more, the same necessarily applies in the three other methods, only that some of them use "areagrams" instead of the phylogenetic tree of a group of organisms.

Finally, the treatment of coherence, replicability, and similarity seems even stranger to me. AC is lauded for comparing its results against other data, and with the exception of BPA for similarity the three other methods are criticised for not doing so. But how does the method determine what the end user does with it? What if the user of AC decides not to make any further comparison? What if the user of the DEC model goes on to apply the model to the next four lineages occurring across the same biogeographic areas? How would using DEC exclude such a possibility?

Maybe I am missing something, but it seems to me as if all four methods generally merit at best the same colour, or level of plausibility, on all criteria. If anything I would look somewhat askance at BPA, AAA and AC for simply assuming that the concept of "areagrams" makes sense in the first place, because if there is any significant degree of exchange between biogeographic regions it doesn't.

Either way I am afraid I cannot claim to have understood how to apply these new criteria in an unbiased manner going forward.

Friday, July 15, 2016

Freedom!

In the light of two more recent rounds of the perennial Free Will discussion elsewhere, I think I now finally understand the incompatibilist position. Let's see if I got this right.

Free Speech. The right to express one's opinion without being punished. Generally considered to find its limits in libel and incitement to violence. In the stricter sense limited to the understanding that the government should not be able to punish a person for expressing their political views; on the other hand it can be argued that free speech in this strict sense alone would be hollow, that expressing an unpopular opinion should not be grounds for losing one's job either. Either way, this concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Academic Freedom. Same as free speech but in the context of university employees, particularly tenured professors. Sometimes misunderstood to mean that professors have the right not to do the job they are being paid for without facing any repercussions at all, e.g. when somebody uses what should have been a science course to promote their religious beliefs or political ideology. Most importantly, this concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Degrees of Freedom (Statistics). The number of parameters that can vary, that are not determined by others. In many models or statistical tests this number is one less than the total number of parameters, as the value of the last parameter follows necessarily from the values of the others. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Degrees of Freedom (Mechanics). The number of ways in which a machine can move, counting dimensions and rotations around dimensions. A locomotive for example would have one, a car three (two dimensions and rotation around the third), an aeroplane six. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Freedom of Religion. The right to practice one's religious faith without being punished for it. Sometimes badly confused with the right to also force others to adhere to the rules of one's own religion or to discriminate against members of other religions. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Freedom of Movement. Commonly understood to mean the right to move without restriction through one's own country, including choosing one's place of residence, and to leave the country and return to it. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Free Lunch / Entry / Drinks / etc. Descriptive of receiving a service or item that usually has to be paid for, without having to pay for it in this instance, generally because somebody else pays for it. Funnily enough this concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Free Press. The right of the news media to report what is going on without being punished for it. Generally understood to be reasonably limited by the right to privacy and national security concerns. Generally understood to be an important aspect of a functioning democracy, as only a well informed electorate can make well informed decisions. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Free Range Chickens. Chickens that are, while still obviously fenced in so that they do not escape, given a healthy amount of room to move around, as opposed to "battery" hens. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Free Fall. The situation in which the only significant force acting on a body is gravity, as opposed to being held up by the ground or being slowed down by a parachute. Even this concept does not, despite having the word "free" in it, imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Free Style. Being allowed to conduct an activity without having to follow strict rules or being required to achieve a set goal. This concept does not imply anything magic, is perfectly compatible with a deterministic universe, and we are free to say it.

Free Will. The ability to contemplate different possible courses of action and then decide between them in the absence of external pressure or pathological compulsion, resulting in actions that match one's preferences. Despite its equivalence with most of the other terms above, acceptance of this concept (and only of this concept) implies a belief in magic and a rejection of deterministic rules of cause-and-effect. Although a compatibilist view like the one just described was already promoted by the determinist stoics of Greek Antiquity, this view is actually nothing but goal-post moving by unreasonable contemporary philosophers who don't want to accept that neurophysiology has shown determinism to be true. And of course until a few years ago nobody ever had that determinism idea. What stoics? No idea what you are talking about. While we are at it, please ignore all religious traditions that have promoted determinism for hundreds of years because their gods are omniscient; focus on the traditions that have promoted magical, non-determinist Free Will because they were troubled by the Problem of Evil. Using the term even under the non-magical, compatibilist definition given earlier aids them (somehow), so the term Free Will (and only this term, but none of the other equivalent concepts containing the word "free") should not be used any more. Because that is totally going to happen. And when we need to describe the difference between, say, a coldly calculating thief and a kleptomaniac we will come up with something. Perhaps just use "voluntary" and pretend it is not simply the Latin translation of Germanic "out of one's own free will". 
Or maybe we don't need a word to describe the difference after all, because due to determinism the former had as little choice as the latter; then again, we also believe that there is a difference after all because we would still lock the former up but give the latter treatment, so maybe a term would be useful; then again, due to determinism the former had as little choice as the latter... (Oops, I think I entered an infinite loop there.)

Is that about correct? Bit uncertain about the end, but well, I am not an incompatibilist. There might also simply be different perspectives in the incompatibilist camp.

Wednesday, July 13, 2016

Experimental versus historical science

Long time no blog; it seems to come to me in bursts.

Anyway, a colleague has drawn my attention to a paper that has recently appeared, From Correlation to Causation: What Do We Need in the Historical Sciences?, by Ebach and Michael. It argues that "the integrity of historical science is in peril due [to] the way speculative and often unexamined causal assumptions are being used", and further suggests six criteria to check these supposedly speculative assumptions against.

In effect, the issue appears to be the use of models in phylogenetics and, in particular, in biogeography, and here, in particular and unless my reading between the lines is mistaken, the acceptance of any process except vicariance.

Before even delving into any other parts of the argumentation, it would be interesting to consider one of the underlying premises, which is clear already from the title of the paper: the assumption that historical sciences are fundamentally different from experimental sciences. As the authors write, "any evidence we adduce for some historical event needs must be contemporary evidence from which we make inferences on the basis of auxiliary hypotheses". But is it really any different in experimental sciences like medicine or physics? Do they not also have auxiliary hypotheses and assumptions at every step? Perhaps it is a failing on my part, but I at least cannot clearly see a marked difference.

Yes, of course we have easier access to evidence about things that happen around us every day today, and it is much easier to gather more of it. But that is a question of quantity, not the question of a qualitative, let alone epistemological, difference. To illustrate the point, let us consider an extremely simple case, the textbook statistics example of die throws.

First assume that I give you a die and ask you if you think it is loaded. You will then perhaps roll it twelve times, and get the result 2, 6, 5, 6, 6, 6, 6, 6, 3, 6, 6, and 6. Instinctively you might now conclude that it is, indeed, loaded. If you want to be scientific about it you would do statistics to calculate what the likelihood is of rolling these results with a fair die. It is, after all, possible that a fair die produces nine sixes out of twelve rolls; in fact it could produce a hundred sixes out of a hundred rolls, the question is merely how unlikely that is.
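The statistics for the twelve rolls above can be sketched in a few lines of Python. This is one possible way to do it, assuming a one-sided binomial test on the number of sixes; the function name is made up for the example, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from math import comb

def prob_at_least(successes, trials, p):
    """Binomial tail: probability of at least `successes` hits in `trials` tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

rolls = [2, 6, 5, 6, 6, 6, 6, 6, 3, 6, 6, 6]   # the twelve rolls from the text
sixes = rolls.count(6)

# Chance that a fair die produces at least this many sixes in twelve rolls.
tail = prob_at_least(sixes, len(rolls), Fraction(1, 6))

print(f"{sixes} sixes out of {len(rolls)} rolls")
print(f"P(at least {sixes} sixes | fair die) = {tail} (about {float(tail):.2e})")
```

Under these assumptions a fair die gives nine or more sixes in twelve rolls roughly 1.3 times in 100,000 attempts, so it is possible but very unlikely.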

You have the die in your hand and you just did an experiment. Experimental science, right? Okay, now assume that after the twelve rolls described above, I snatch the die away from you and drop it in the Mariana Trench. It could be argued that from that moment on the research question "is the die loaded?" has turned into historical science. The twelve rolls are all the data we will ever get. From there we can take the next step and consider a scenario where we read about the twelve rolls in a book that is hundreds of years old. Surely now the question is squarely in the realm of historical science.

But has anything changed? I don't see it. The exact same statistical approach that applied before still applies afterwards. There is no difference in how we address the problem in either case.

And of course this situation is what we always face in science, in a certain sense. We don't literally have a die snatched away, but we do have time, money and other resource constraints. At some point we stop collecting data for any given study and analyse them. Consequently I fail to see where the philosophical difference is between being limited by the data that are available due to an accident of history and being limited by the data that are available due to, for example, our luck with DNA sequencing success before the project budget ran out.

The flip side of being limited by the dataset we have in any given situation is whether we can get more data in the future. Again, with experimental science we can get additional data more easily than in, say, palaeontology. But in real life historical research we are usually not reliant on a single die that has been destroyed either, as the most interesting questions are broader than that. So even with historical data we can usually go back and try to acquire more fossils or archaeological artefacts.

What we have considered so far was inferring what process operated in the past (a fair or a loaded die?) from data we have available (the results of twelve rolls). Thinking of biogeography this would be comparable to inferring whether long distance dispersal of plants and animals happened in the past from contemporary patterns of distribution. We can also flip that around now and consider the inference of past one-off events from processes we can still observe today. In biogeography, we can today observe spores, seeds, insects, birds, and ballooning spiders being blown across vast distances and arriving on remote shores. Did Rhipsalis, the only cactus genus naturally occurring outside of the Americas, arrive in Africa through chance dispersal across the ocean or is its current distribution the result of a much older vicariance event?

Of course this was a one-off event, and yes, we will never know the answer for certain. But again I fail to see the difference in principle. I cannot possibly know for certain that the sun will rise again tomorrow morning, but I can have a great deal of confidence in my admittedly tentative conclusion. Going back to the die example, if I give you a die and then ask you, "I rolled it once yesterday evening - what do you think the result was?", you cannot know it for certain either unless I tell you. But you can observe the process - you can roll it a thousand times - and then infer a probability distribution. If you find that it is severely loaded and produces a six 81% of the time, you may be willing to go so far as to suggest that my roll was a six.
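The die thought experiment can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the loaded die is simulated with a fixed random seed, the 81% weight on six is taken from the text, and the remaining 19% is spread evenly over the other faces.

```python
import random
from collections import Counter

def estimate_face_probabilities(roll, n_rolls=1000):
    """Observe the process: roll many times and return relative frequencies."""
    counts = Counter(roll() for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# Hypothetical loaded die: six comes up 81% of the time (assumed weights).
rng = random.Random(2016)
weights = [0.038] * 5 + [0.81]

def loaded_roll():
    return rng.choices(range(1, 7), weights=weights)[0]

estimates = estimate_face_probabilities(loaded_roll)
best_guess = max(estimates, key=estimates.get)

for face, freq in estimates.items():
    print(f"face {face}: estimated probability {freq:.3f}")
print(f"most probable result of yesterday's single roll: {best_guess}")
```

The inference about the unobserved roll remains tentative, but it is an inference from an observable process to a one-off past event, which is all the argument needs.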

In summary, I personally do not at this moment see the big difference between experimental and historical science; at least not a difference that could be used to argue that the latter cannot employ, for example, models of the same complexity as the former. Admittedly I am not a philosopher of science though.

Sunday, July 3, 2016

The Markov k model for discrete morphological data

The most frequently used model of character evolution for morphological data is called the Markov k (Mk) model. It was suggested by Lewis (2001) and is implemented in a few Likelihood or Bayesian phylogenetics programs.
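As a rough illustration of what the model assumes, here is a sketch of the transition probabilities for an unordered character with k states, where every change between two different states happens at the same rate. The parameterisation is my assumption for the example (`rate` is the rate of change from one state to each specific other state, `t` is time along a branch); programs usually express branch lengths in expected changes per character instead.

```python
import math

def mk_transition_probability(k, rate, t, same_state):
    """Probability of ending in the same or in one specific different state
    after time t, for k states with equal rates between all pairs of states."""
    decay = math.exp(-k * rate * t)
    if same_state:
        return 1 / k + (k - 1) / k * decay
    return 1 / k - decay / k

k = 3
for t in (0.0, 0.1, 1.0, 10.0):
    stay = mk_transition_probability(k, rate=1.0, t=t, same_state=True)
    change = mk_transition_probability(k, rate=1.0, t=t, same_state=False)
    print(f"t = {t:5.1f}: P(same) = {stay:.4f}, P(one specific other state) = {change:.4f}")
```

With k = 4 this reduces to the familiar Jukes-Cantor probabilities for nucleotides, and for long branches all states become equally likely at 1/k.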

The idea here is that there are several discrete character states. So for continuous traits like organ lengths one would divide the continuum into categories, e.g. character state 0 for smaller than 5 cm and state 1 for larger than 5 cm. But as that is also how most people build their datasets for parsimony analysis it means that the same data can often be used for both analyses.

Some software allows the states of one character to be ordered, so that to change from state 0 to state 2 a lineage has to pass through state 1, counting as two mutation steps. Some also allow for a gamma parameter, so that the different characters can fall into categories with different rates of change (some faster-evolving and some slower-evolving).
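The difference between unordered and ordered characters can be sketched as follows. The function names are hypothetical and the rates are set to 1 purely for illustration; the point is only which changes are allowed directly and how many steps a change therefore costs.

```python
def rate_matrix(k, ordered=False):
    """Instantaneous rate matrix for k states with unit rates (illustrative).
    Unordered: every state can change directly into every other state.
    Ordered: only changes between neighbouring states are allowed."""
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j and (not ordered or abs(i - j) == 1):
                q[i][j] = 1.0
        q[i][i] = -sum(q[i])   # diagonal is still 0 here, so each row sums to zero
    return q

def min_steps(a, b, ordered=False):
    """Minimum number of changes needed to get from state a to state b."""
    return abs(a - b) if ordered else int(a != b)

for label, ordered in (("unordered", False), ("ordered", True)):
    print(label, "rate matrix:")
    for row in rate_matrix(3, ordered):
        print("   ", row)
    print("    steps from state 0 to state 2:", min_steps(0, 2, ordered))
```

In the ordered case the entry for a direct change from state 0 to state 2 is zero, so the lineage has to pass through state 1.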

Another important consideration with morphological data is the scoring approach. Datasets of sequence regions generally contain all the sequence data that were obtained, i.e. both the ones that are variable and the ones that are entirely constant across the study group. When scoring morphological data, however, people tend not to include characters that are constant. Imagine building a trait list for several species of frogs - would you add a column for "wings" only to have "no" as the only state across the entire group? Probably not. However, some datasets may contain constant characters, and they may or may not contain characters that differ for only one species. The analysis has to be told what to expect so that branch lengths in the resulting phylogeny are modelled well.
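Checking which kinds of characters a matrix contains is easy to automate. The sketch below uses a made-up four-species matrix and hypothetical function names; treating "?" and "-" as missing data is an assumption of the example. A character counts as informative here if at least two states each occur in at least two species.

```python
from collections import Counter

MISSING = {"?", "-"}   # assumed symbols for missing or inapplicable data

def classify_character(states):
    """Return 'constant', 'singleton' or 'informative' for one matrix column."""
    counts = Counter(s for s in states if s not in MISSING)
    if len(counts) <= 1:
        return "constant"
    shared = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared >= 2 else "singleton"

def describe_matrix(matrix):
    """Classify every column and say what kind of characters are present."""
    columns = zip(*matrix.values())
    classes = [classify_character(col) for col in columns]
    if "constant" in classes:
        content = "all"           # constant characters are included
    elif "singleton" in classes:
        content = "variable"      # no constant characters
    else:
        content = "informative"   # neither constant nor single-species characters
    return classes, content

# Made-up example: four frog species, four characters.
frogs = {"sp_A": "0001", "sp_B": "0001", "sp_C": "0010", "sp_D": "0110"}
classes, content = describe_matrix(frogs)
print(classes)
print("dataset contains:", content)
```

The first column plays the role of the "wings" character from the frog example: it is the same in every species.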

After my recent dive into nucleotide substitution models I also looked up how to properly set the Mk model in PAUP and MrBayes.

The Mk model in MrBayes

The Mk model is set automatically for matrices with datatype = standard. These data can have states 0-9, which should generally be enough.

Depending on the coverage, one can then use lset coding = all if the dataset includes constant characters. Alternative options are variable if there are no constant characters, and informative if there are neither constant characters nor characters that differ for only one species. The Mk model with only variable characters is also sometimes called the Mkv model.

If there are no constant characters, equal rates of change for all characters can be assumed with lset rates = equal, variable rates with lset rates = gamma. If constant characters are included, my understanding is that propinv and invgamma should be used instead.

The default is that all characters are unordered. They can be changed to ordered by using the ctype command, as in ctype ordered: 2 4 for characters number two and number four.
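Putting the settings above together, the following Python sketch writes out a small MrBayes block that could be appended to a NEXUS file. The helper mrbayes_block is hypothetical, and it deliberately accepts only the coding and rates values mentioned in this post; MrBayes itself knows more options than these, so please check the manual before relying on it.

```python
def mrbayes_block(coding="variable", rates="gamma", ordered=()):
    """Build a MrBayes block using only the settings discussed above.

    coding  : "all", "variable" or "informative"
    rates   : "equal", "gamma", "propinv" or "invgamma"
    ordered : numbers of the characters to be treated as ordered
    """
    allowed_coding = {"all", "variable", "informative"}
    allowed_rates = {"equal", "gamma", "propinv", "invgamma"}
    if coding not in allowed_coding:
        raise ValueError(f"unexpected coding option: {coding}")
    if rates not in allowed_rates:
        raise ValueError(f"unexpected rates option: {rates}")

    lines = ["begin mrbayes;", f"  lset coding={coding} rates={rates};"]
    if ordered:
        numbers = " ".join(str(number) for number in ordered)
        lines.append(f"  ctype ordered: {numbers};")
    lines.append("end;")
    return "\n".join(lines)


print(mrbayes_block(coding="variable", rates="gamma", ordered=[2, 4]))
```

The data block itself still has to declare datatype = standard for the Mk model to be used in the first place.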

The Mk model in PAUP

I have tried setting the Mk model in one of the new test versions of PAUP, specifically 4a149. To set the model, use lset nst = Mkv. Unfortunately, beyond that the options are rather limited. The model always assumes equal rates, and as that little v at the end indicates it also seems to assume that all constant characters have been excluded.

Mk model versus parsimony: my admittedly anecdotal experience

I have always made clear that I am not really that terribly interested in philosophical foundations or statistical theory when using a phylogenetic method. For me the real questions are pragmatic ones:

Does the method produce sensible results with empirical data, i.e. results that fit information that we have from other data?
Does the method produce the correct results with simulated data?
Is the method computationally feasible? (What good is a robust Bayesian coalescent approach if it takes weeks on a supercomputer even for six species?)
Can the method be misled in certain scenarios? But if so, are these scenarios likely to be frequent, or are there other ways of dealing with them than discarding the method? (E.g. different data or better taxon sampling to deal with Long Branch Attraction.)

For the Mk model, the problem is mostly the first point. Just for the giggles, I have in the past used it on a few morphological datasets from small genera, and the results were generally much less convincing than the ones from parsimony analysis. I have also used it in Mesquite for ancestral character reconstruction along trees obtained from e.g. Bayesian analysis of sequence data, and the results were rather nonsensical.

That being said, after the recent publication claiming that Bayesian phylogenetics outperforms parsimony on simulated data, I tried again with a little dataset I am generating, at that moment only 23 traits for 13 species. I am happy to report that the results of running those data through MrBayes were much more meaningful than what I had seen in the past. So I will definitely keep that in mind as an option.

Another interesting observation, however, is that Likelihood or Bayesian analysis of morphological data tends to produce fully resolved trees where parsimony shows uncertainty clearly as polytomies. This is rather ironic given that one of the main arguments of Bayesians is that their preferred approach better shows uncertainty in the data. Of course one could point at low Posterior Probabilities and say, see, there is your measure of uncertainty, but then again support values are always worse for morphological data than for sequences simply because there are far fewer characters. It is not rare to have a dataset with fifty taxa but only twenty characters; of course you will never see a lot of 100% bootstraps or 1.00 PPs under those circumstances, even in the best cases. Thus a fully resolved tree will look very suggestive even at 0.57 PP where a polytomy tells us that we really don't know.

A final reason why I will not soon drop parsimony analysis for morphological data (even as I will give the Mk model more attention) is that there are numerous well-established ways of doing parsimony according to how a character can be expected to evolve. Assume, for example, that you have four states 0, 1, 2, and 3, and that 1-3 can all arise from 0 but not from each other (meaning that to get from 1 to 2 you have to pass through 0). Or assume that you want to set a character so that a state is gained precisely once and cannot be regained once it is lost.
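The first of these scenarios can be written down as a step matrix, i.e. a table of costs for every possible change. The sketch below, in Python, builds such a matrix for the star-shaped case (every state reachable from 0 in one step, from any other state in two) and scores a character on a small tree with the Sankoff algorithm. The function names, the tree and the tip states are invented for illustration; real parsimony software would of course do this for you.

```python
import math


def star_matrix(n_states=4):
    """Cost 1 for changes to or from state 0, cost 2 between other states."""
    cost = [[0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_states):
            if i != j:
                cost[i][j] = 1 if (i == 0 or j == 0) else 2
    return cost


def unordered_matrix(n_states=4):
    """Cost 1 for any change between two different states."""
    return [[0 if i == j else 1 for j in range(n_states)]
            for i in range(n_states)]


def sankoff(tree, cost):
    """Minimum cost of the subtree for each possible state at its root.

    A tree is either an int (the observed state of a tip) or a tuple of
    subtrees.
    """
    n_states = len(cost)
    if isinstance(tree, int):
        return [0 if s == tree else math.inf for s in range(n_states)]
    totals = [0] * n_states
    for child in tree:
        child_costs = sankoff(child, cost)
        for s in range(n_states):
            totals[s] += min(cost[s][t] + child_costs[t]
                             for t in range(n_states))
    return totals


def parsimony_length(tree, cost):
    """Length of the character on the tree under the given step matrix."""
    return min(sankoff(tree, cost))


# Hypothetical tree of four tips with states 1, 2, 1 and 2.
tree = ((1, 2), (1, 2))
print(parsimony_length(tree, unordered_matrix()))  # 2
print(parsimony_length(tree, star_matrix()))       # 4
```

On this tree the unordered character needs two steps, whereas the star-shaped step matrix charges four, because every change between states 1 and 2 has to go through state 0. The second scenario could be approached in the same way with an asymmetric matrix in which regaining the state is made prohibitively expensive.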

It would be easy to set this up in parsimony. Maybe it is possible to do this in a model-based analysis, but if so then it is at least not part of standard implementations. More generally, the assumption behind a model that there is a general process across all the characters in the analysis makes a lot of sense for molecular data. A base pair is a base pair, and all the sequence positions will be affected by polymerase errors. But does it make nearly as much sense for morphology? A fruit shape is not the same thing as the presence or absence of stipules, and a collar bone shape is not the same thing as the possession of a red patch on the throat.

Again, I am happy to admit that the Mk model in MrBayes surpassed my expectations, and I will use it more often in the future. I am, however, still not ready to do without the option of parsimony, at least for the admittedly rare cases when I want to analyse morphological data.

References

Lewis PO, 2001. A Likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50: 913-925.