Friday, May 22, 2015

How to sample when testing the monophyly of a group

Perhaps I should write more about positive things, do posts on the lines of "look at this amazing plant" etc., but well...

Within the first few months of this year I have already reviewed one paper and am currently reviewing a second that both make the same interesting claim. They present a phylogeny something like this:


And then conclude that their results have confirmed the monophyly of the red group.

Okay, before you go below the fold, can you guess in how many ways this is ... problematic? (Trying to be polite here.)

Let us count the ways:

First, look at the long branch between the red ingroup and the blue outgroup. It is possible that these two blue species are the closest relatives of the ingroup, but that branch doesn't make it look very probable. In the paper I am currently reviewing, the authors themselves happily mention in the discussion that there are three genera that are much more closely related to the ingroup than those silly two outgroup samples they used.

So, part one of our learnings: This doesn't show anything because you would never expect these distant species to sit inside your study group anyway; if anything, you would expect its known closest relatives to sit inside it. Of course the non-avian dinosaurs are monophyletic relative to the fungi, but they are not relative to the birds. If you want to form a conclusion about the monophyly of a group, you need to include its known or suspected closest relatives in your analysis, not only some distant relatives.
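To make that point concrete, here is a minimal sketch of a monophyly check on a rooted tree. The nested-tuple tree representation, the function names `leaves` and `is_monophyletic`, and the toy taxon sample are all assumptions made for illustration; a real analysis would use proper phylogenetics software.

```python
def leaves(tree):
    """Return the set of tip names below a node; a tip is a plain string."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in `group`."""
    if leaves(tree) == set(group):
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical toy sample of non-avian dinosaurs.
non_avian = {"Triceratops", "Tyrannosaurus"}

# Only a very distant relative is sampled: the test cannot fail.
distant_only = (("Triceratops", "Tyrannosaurus"), "Mushroom")

# The closest relatives (here one bird) are sampled as well.
with_bird = (("Triceratops", ("Tyrannosaurus", "Sparrow")), "Mushroom")

print(is_monophyletic(distant_only, non_avian))  # True
print(is_monophyletic(with_bird, non_avian))     # False
```

The check itself is the same in both cases; only the sampling decides whether it can tell you anything.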

Okay, now let us for the moment assume that the two blue samples are really some of the closest relatives of the red group. What is problem number two? It is really a sampling issue. If you take only one or, as in this case, two of the known relatives, and they turn out to be outside of the study group, what are the chances that you missed another close relative that would have turned up inside if only you had sampled it? Pretty good, I'd wager.

Part two of our learnings: If you want to form a conclusion about the monophyly of a group, you need to sample broadly around it, not just one or two relatives.

Finally, the third and probably worst problem. The two blue samples were used for outgroup rooting. Apparently it did not occur to the various authors involved in the two manuscripts I read that this will make the ingroup monophyletic. That is literally what outgroup rooting does: you tell the program to force the ingroup into monophyly. So this is really presenting your assumption as a conclusion. Circular reasoning works because circular reasoning works, and all that.

Admittedly, if one of your outgroup samples is really inside the ingroup and the other is outside, then you will notice. But if your entire chosen outgroup is really nested inside the ingroup, then you will just polarise the phylogeny wrongly. The non-avian dinosaurs, for example, would look totally monophyletic on a tree that was rooted using the birds as an outgroup.

Part three of our learnings: If you want to form a conclusion about the monophyly of a group, you need both to sample broadly around it and to use a more distant outgroup, one that is certain to fall outside, to root your group of interest and its close relatives together. This is what you would want the tree from above to look like:


NOW you can have some confidence in the conclusion that red is monophyletic.
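The rooting problem can be sketched in a few lines as well. The code below is only an illustration under stated assumptions: the unrooted tree is a hypothetical edge list with made-up internal node names (`n0`, `n1`, `n2`), and `rooted_clades` is a hypothetical helper that places the root on the branch separating the chosen outgroup from everything else, then lists the clades that result.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn an edge list of an unrooted tree into an adjacency map."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def tips_beyond(graph, node, parent):
    """Tips reached from `node` when walking away from `parent`."""
    children = graph[node] - {parent}
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= tips_beyond(graph, child, node)
    return tips


def collect_clades(graph, node, parent, clades):
    """Add the clade below `node`, and all clades nested in it, to `clades`."""
    clades.add(frozenset(tips_beyond(graph, node, parent)))
    for child in graph[node] - {parent}:
        collect_clades(graph, child, node, clades)


def rooted_clades(edges, outgroup):
    """Root on the branch leading to `outgroup` and return all clades."""
    graph = build_graph(edges)
    outgroup = set(outgroup)
    for a in list(graph):
        for b in graph[a]:
            if tips_beyond(graph, b, a) == outgroup:
                clades = set()
                collect_clades(graph, a, b, clades)  # ingroup side
                collect_clades(graph, b, a, clades)  # outgroup side
                return clades
    raise ValueError("no single branch separates the outgroup from the rest")


non_avian = frozenset({"Triceratops", "Tyrannosaurus"})

# Birds used as the outgroup, no distant taxon sampled.
no_distant = [("Triceratops", "n1"), ("Tyrannosaurus", "n1"), ("n1", "n2"),
              ("Sparrow", "n2"), ("Pigeon", "n2")]

# Same relationships plus a distant taxon, which is used for rooting instead.
with_distant = [("Mushroom", "n0"), ("Triceratops", "n0"), ("n0", "n1"),
                ("Tyrannosaurus", "n1"), ("n1", "n2"),
                ("Sparrow", "n2"), ("Pigeon", "n2")]

print(non_avian in rooted_clades(no_distant, {"Sparrow", "Pigeon"}))  # True
print(non_avian in rooted_clades(with_distant, {"Mushroom"}))         # False
```

Rooted on the birds, the non-avian dinosaurs come out as a clade simply because of where the root was put. Rooted on the distant taxon, the same relationships show the birds nested among them. If the chosen outgroup does not sit together on one side of a single branch, the sketch raises a `ValueError`, which corresponds to the case where you would notice the problem.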

I strongly doubt that many people who make these mistakes will come across my blog, but I just had to get that off my chest.
