Monday, January 21, 2013

Ranunculus, part 4: Inferring reticulate evolution

This is part of a series on a paper by Hörandl & Emadzade (subsequently H&E) suggesting an "evolutionary", i.e. pre-cladistic, classification of Ranunculus. See the previous installments here: part 1, part 2, part 3.

Going through the introduction took quite a while, but that was to be expected, as it touches on many of the controversial concepts. Some of them will come up again in the discussion, but hopefully they will not take as long the second time. What I still want to get out of the paper is essentially whether H&E have found the elusive universal, objective and testable criterion for the circumscription of supraspecific taxa that could replace the criterion of monophyly.

This question should be answered in the materials and methods section, which comes next. To be honest, apart from that I did not expect much to discuss, because there is no real disagreement about how to conduct phylogenetic and other analyses. For the most part, my expectation was met. H&E use molecular trees from their previous publications, based on maximum parsimony and Bayesian phylogenetic analyses. They score numerous morphological characters and two karyological ones. They use this non-molecular dataset in all three ways introduced earlier: (1) to infer a morphology-based phylogeny, (2) to map the characters onto the molecular phylogeny, and (3) to infer a phylogeny from the combined molecular and morphological datasets.

The two parts of the materials and methods that deserve closer attention are the network analysis and, obviously, the criteria for classification. Let's start with the former, because it is central to one major line of argument in this paper:
  1. Phylogenetic methods assume a tree-like structure of the data even if it is in reality not tree-like.
  2. Consequently we also need to use methods that do not assume tree-like structure of the data to discover reticulate evolution if present.
  3. There is quite a bit of reticulate evolution.
  4. Phylogenetic systematics cannot deal with reticulate evolution.
  5. Consequently we need paraphyletic taxa.
#1 and #2 are certainly true. One should not blindly assume that there is no reticulate evolution, and if, as H&E argue, there are indications that it occurred, it would be good to identify the reticulation events and accommodate them in the classification. I do not know Ranunculus intimately enough to judge #3, although I will gladly take H&E's word for it; but see my comments on the chosen method below. Finally, I partly disagree with #4 and completely disagree with #5.

As long as reticulate evolution is not rampant, i.e. as long as most of the structure we observe is phylogenetic, the theoretical problem is at most that some nodes in the species tree cannot be used to define monophyletic groups because there are reticulations downstream of them. If, on the other hand, reticulate evolution were too frequent to allow the circumscription of monophyla, it is highly doubtful that any other system of classification would fare any better. For starters, if everything freely interbred all the time, there would be no morphological or ecological divergence that could be used to recognize paraphyletic residues in an "evolutionary" classification. And if there are so many reticulation events that what we have is no longer a phylogenetic tree but a network, then there is neither mono- nor paraphyly but no-phyly-at-all: there cannot be paraphyly because there is no phylogenetic structure.

And we can verify that the outcome of evolution is mostly tree-like simply by looking around us at nature. As Jerry Coyne likes to argue, the mere existence of gaps in the variation of biological diversity is evidence for the existence of biological species, i.e. groups that don't interbreed. Extending his observation into systematics, the fact that biological diversity is not a terrible mush but visibly nested demonstrates that phylogenetic systematics is not only possible but the most natural way of classifying that diversity. QED.

This topic also perfectly demonstrates the confusion of evolutionary systematists about tokogeny versus phylogeny: they argue that reticulate evolution produces "paraphyly". That is complete nonsense. Evolution produces tokogenetic or phylogenetic structures; it does not produce paraphyly. We humans produce paraphyly when we circumscribe a paraphyletic group within a phylogenetic structure.
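To make concrete the point that paraphyly is a property of how we circumscribe a group rather than of the tree itself, here is a minimal Python sketch. It assumes a hypothetical four-species tree (((A,B),C),D) stored as a child-to-parent map; the node names "root", "x" and "y" are made up for illustration. The same tree gives a monophyletic verdict for one circumscription and a non-monophyletic one for another. The sketch only separates monophyletic from non-monophyletic groups; telling paraphyly from polyphyly depends on which definition one adopts and is not attempted here.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy tree (((A,B),C),D), stored as child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "x": "root", "D": "root",
    "y": "x", "C": "x",
    "A": "y", "B": "y",
}

def ancestors(node: str) -> List[str]:
    """Path from a node up to the root, the node itself included."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def mrca(group: Set[str]) -> str:
    """Most recent common ancestor: the deepest node shared by all tips' root paths."""
    members = list(group)
    shared = set(ancestors(members[0]))
    for tip in members[1:]:
        shared &= set(ancestors(tip))
    # Walking up from any member, the first shared node is the deepest one.
    for node in ancestors(members[0]):
        if node in shared:
            return node
    raise ValueError("tips do not belong to one tree")

def tips_below(node: str) -> Set[str]:
    """All tips descending from a node (a tip returns itself)."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    result: Set[str] = set()
    for child in children:
        result |= tips_below(child)
    return result

def is_monophyletic(group: Set[str]) -> bool:
    """Monophyletic: the group contains every tip descending from its MRCA."""
    return tips_below(mrca(group)) == set(group)

if __name__ == "__main__":
    for group in ({"A", "B"}, {"A", "B", "C"}, {"C", "D"}):
        print(sorted(group), is_monophyletic(group))
```

With this toy tree, {A, B} and {A, B, C} come out monophyletic, while {C, D} does not: it is what remains of the whole tree once the clade {A, B} is taken out, and it is our choice of circumscription, not the tree, that makes it so.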

Of course, that was only the theoretical side. There is also the practical one: given a certain degree of reticulation in gene phylogenies that would be theoretically unproblematic, can we still infer the species phylogeny? I would say yes, for extant species at least and given enough data, and I will expand on this topic at the end of this post. The important point is that epistemological difficulties are not an argument against the monophyly criterion. The question should be: given the best tree of life that we can construct at the moment, what should be our criterion for the acceptance of supraspecific taxa?

But back to the network analysis itself, which is H&E's approach to demonstrating reticulate evolution, regardless of whether one agrees with the conclusions they draw once it has been demonstrated. They use the Neighbor Net method as implemented in the SplitsTree software. It is a distance-based method, and the result always looks like a cross between an unrooted tree and a spider web.
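To illustrate what "distance-based" means here: such methods never see the individual characters, only a matrix of pairwise distances between taxa. Below is a minimal Python sketch of the simplest such distance, the uncorrected p-distance (the proportion of compared aligned sites that differ). It is an assumption for illustration, not a claim about which distance H&E used; software of this kind typically also offers model-corrected distances. The toy alignment and the choice of "-", "N" and "?" as skipped symbols are invented.

```python
from itertools import combinations
from typing import Dict, Tuple

MISSING = set("-N?")  # gap/ambiguity symbols skipped in comparisons (an assumption)

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of compared sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue  # skip sites with a gap or missing data in either sequence
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else float("nan")

def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric matrix of pairwise p-distances, keyed by (taxon, taxon)."""
    dist: Dict[Tuple[str, str], float] = {}
    for s, t in combinations(sorted(alignment), 2):
        dist[(s, t)] = dist[(t, s)] = p_distance(alignment[s], alignment[t])
    return dist

if __name__ == "__main__":
    toy = {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTTC", "sp3": "ACGAACGTTC"}  # invented
    for pair, d in sorted(distance_matrix(toy).items()):
        if pair[0] < pair[1]:
            print(pair, round(d, 2))
```

For the toy alignment this gives 0.1 for sp1/sp2 and sp2/sp3 and 0.2 for sp1/sp3. Everything a distance method does afterwards works only on these numbers; which taxa share which character states is no longer visible, which is the sense in which the approach is phenetic.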

Two issues here. The first is that there are good reasons why distance methods have fallen out of use for reconstructing sequence phylogenies, and why we generally use parsimony or likelihood-based methods these days. Distance methods are phenetic: they cluster simply by overall similarity and, beyond whatever substitution model may be used to correct the raw distances, have no phylogenetic logic. One could of course argue that this is precisely the point of a network analysis, but that is not quite true. Other types of network analysis, like Union of Maximum Parsimony Trees (UMP) or the parsimony networks produced by the TCS software, are, as their names imply, parsimony-based. Admittedly they are probably not adequate tools for what H&E want to do, but with them I can at least make sense of the resulting networks: the number of steps from one haplotype to the next, missing haplotypes, possible evolutionary pathways, and so on.
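For contrast, here is what "number of steps" means in a parsimony framework: a tree is scored by the minimum number of character-state changes it requires. The sketch below is Fitch's classic algorithm for unordered characters on a rooted binary tree, in Python. The tree and the character matrix are invented, and this is only an illustration of step counting, not how UMP or TCS networks are actually constructed.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree written as nested 2-tuples; tips are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one unordered character.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes (steps) the character needs on that tree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree: no extra change needed at this node.
        return shared, left_steps + right_steps
    # The subtrees conflict: keep both possibilities and count one change.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total steps over all characters; each taxon's row is a string of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

if __name__ == "__main__":
    toy_tree: Tree = ((("A", "B"), "C"), "D")  # hypothetical
    toy_matrix = {"A": "001", "B": "011", "C": "110", "D": "111"}  # invented
    print(tree_length(toy_tree, toy_matrix))
```

Here each of the three toy characters needs one change on this tree, for a total length of 3. Unlike a distance, this count refers to specific characters changing along specific branches, which is what makes such results interpretable.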

And that brings us to the second issue: I cannot make sense of these spider-webby figures produced by SplitsTree, at least not quantitatively. The idea, I guess, is that the more webby a part of the network is, the more reticulation has been inferred. But what exactly does that tell me? It certainly does not tell me that this clade of species is derived from an allopolyploid speciation event between a parental species from approximately here on the phylogeny and another from approximately there; but that is what I would need to know to make a decent classification. As it is, the authors essentially conclude, based on a hunch that a certain part of the network is sufficiently spider-webby, that reticulate evolution must have been involved. If somebody can come up with a more quantitative way of reading the results, I would be glad to hear it. It should also be added, and to be fair the authors mention it later as an afterthought, that incomplete lineage sorting could produce the same kind of pattern, meaning that there may have been less reticulation than it seems.
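One published attempt at a more quantitative reading is the delta score introduced by Holland and colleagues in 2002. For every set of four taxa it checks how far the distances depart from the four-point condition that perfectly tree-like distances satisfy: order the three ways of pairing the quartet by their summed distances, m1 ≥ m2 ≥ m3, and take (m1 − m2)/(m1 − m3), which is 0 for tree-like data and up to 1 for maximal conflict. The Python sketch below assumes a symmetric distance dictionary keyed by taxon pairs; the helper `symmetric` and all toy numbers are invented for illustration. Like the network itself, a high delta only flags conflicting signal: it does not name parental lineages, and incomplete lineage sorting would raise it just as hybridization would.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Iterable, Tuple

Dist = Dict[Tuple[str, str], float]  # symmetric: d[(a, b)] == d[(b, a)]

def quartet_delta(d: Dist, x: str, y: str, u: str, v: str) -> float:
    """Delta score of one quartet (0 = perfectly tree-like, 1 = maximally conflicting)."""
    # The three ways of splitting four taxa into two pairs, largest sum first.
    m1, m2, m3 = sorted(
        (d[(x, y)] + d[(u, v)], d[(x, u)] + d[(y, v)], d[(x, v)] + d[(y, u)]),
        reverse=True,
    )
    # Four-point condition: for tree-like distances the two largest sums are equal.
    return 0.0 if m1 == m3 else (m1 - m2) / (m1 - m3)

def mean_delta(d: Dist, taxa: Iterable[str]) -> float:
    """Average delta over all quartets; needs at least four taxa."""
    return mean(quartet_delta(d, *q) for q in combinations(sorted(taxa), 4))

def symmetric(pairs: Dict[Tuple[str, str], float]) -> Dist:
    """Hypothetical helper: fill in both orientations of each pair."""
    d: Dist = {}
    for (a, b), value in pairs.items():
        d[(a, b)] = d[(b, a)] = value
    return d

if __name__ == "__main__":
    # Invented distances read off the tree ((A,B),(C,D)) with all edges of length 1.
    tree_like = symmetric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 3,
                           ("A", "D"): 3, ("B", "C"): 3, ("B", "D"): 3})
    # Invented distances that fit no tree exactly.
    conflicting = symmetric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 3,
                             ("B", "D"): 3, ("A", "D"): 4, ("B", "C"): 4})
    print(mean_delta(tree_like, "ABCD"), mean_delta(conflicting, "ABCD"))
```

The tree-derived distances score 0 and the conflicting ones 0.5. Such a number at least replaces "this looks spider-webby" with something comparable across datasets, but it still answers a different question from the one a classification needs.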

Don't get me wrong: none of this is an argument against "evolutionary" systematics, but it is not an argument against phylogenetic systematics either. My hope is that with the recent advent of next-generation sequencing we will someday have the data to get a better picture of the species phylogeny than just "this looks quite spider-webby to me". In fact, I was at a conference last year where a colleague from New Zealand gave a talk on just this issue: ancient reticulate evolution in a certain group of plants. He had so far sequenced the transcriptomes of only a very limited number of species, but he had found several genes that had diverged long ago as single copies and were later reunited as separate copies in another species, evidence of ancient allopolyploidy events! With this kind of data, and enough of it, we will at some point be able to build more accurate phylogenies that include the actual historical hybridogenic speciation events, instead of ill-defined reticulations in a distance network of concatenated sequence data.

H&E no doubt did the best they could with the data they had, which after all comprised only chloroplast and ribosomal DNA: in effect only two independent regions, the second with its own issues. And that is what most of us use most of the time. I am just saying that I would like to have something more quantitative and more definite in hand before I believe, against all the evidence around us, that ancient allopolyploidy events were so frequent that phylogenetic systematics is rendered undesirable or even impossible.

