Sunday, March 1, 2015

Parsimony analysis in TNT using the command line version

I guess I can just as well make it a habit to blog some advice whenever I have dealt with a recalcitrant piece of software. Today: Tree analysis using New Technology (TNT).

As I have mentioned before, there are four main ways of inferring phylogenetic trees of evolutionary relationships:
  • Distance/clustering analysis. This is not really a phylogenetic analysis in the strict sense, as it merely clusters terminals by their similarity, but on the plus side clustering is always extremely fast. There are several programs that can do it, including good old PAUP and MEGA.
  • Likelihood analysis. Simplifying a bit one could say it searches for the tree with the best log likelihood score given a model of sequence evolution and the data. Again there are several programs available to do this kind of analysis, including PAUP, MEGA and PHYLIP. Calculating likelihood values across large phylogenetic trees is computationally intensive, and thus such searches can take quite some time for larger datasets. This is why somebody wrote the software RAxML, which is designed to do complex likelihood searches with seemingly ridiculous speed by cutting a few corners.
  • Bayesian phylogenetics. This approach estimates the posterior probability of phylogenetic relationships with a Markov Chain Monte Carlo (MCMC) method. Standard software packages for this are MrBayes and BEAST. If you want a quick answer, you are out of luck though, because MCMC always takes time.
  • Parsimony analysis. The logic here is to find the tree with the lowest number of character changes along the branches, under the assumption that, all else being equal, the simplest explanation is the best. It is often considered less sophisticated than the previous two approaches but it comes with fewer assumptions; I like it that I know where the computer has its hands, so to say. Once more PAUP, MEGA and PHYLIP implement parsimony searches but they are fairly slow for larger datasets.
This is where TNT comes in. Published in 2003 and made freeware through a subsidy of the Willi Hennig Society in 2007, TNT could be called the RAxML of parsimony analysis. It can take a fairly large dataset and finish the tree search before PAUP has got its shoes on. What is more, in addition to the already fast standard search it implements the innovative search strategies that gave it the New Technology part of its name, such as the Parsimony Ratchet. When you use these you will know what speed means!
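To make the parsimony criterion a bit more concrete, here is a minimal sketch in Python of how the number of character changes on one given tree can be counted with Fitch's algorithm for unordered characters. It only scores trees that are handed to it; it is not a tree search, and it says nothing about how TNT works internally. The function names, the nested-tuple tree format and the toy alignment are all made up for illustration.

```python
def fitch_score(tree, states):
    """Return the minimum number of state changes for one character.

    tree   -- nested 2-tuples of terminal names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each terminal name to its observed state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):      # terminal: the state set is what we observed
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                     # the two children agree on at least one state
            return shared
        changes += 1                   # no overlap: one change is needed at this node
        return left | right

    visit(tree)
    return changes


def parsimony_length(tree, alignment):
    """Sum the Fitch scores over all columns of an alignment (dict name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical toy data: four terminals, four aligned sites.
alignment = {"A": "AACG", "B": "AACT", "C": "GACT", "D": "GATT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 3
print(parsimony_length(tree_2, alignment))  # 4
```

With this toy alignment the first tree needs three changes and the second four, so the first would be preferred under parsimony. The hard part, and the part that programs like TNT are fast at, is searching through the astronomical number of possible trees for the shortest one.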

Sadly, the program has a few downsides. First, its input and output formats are rather idiosyncratic. Second, it has a GUI only on the Windows version but not on Mac or Linux, so that you will have to use command line and scripting on the latter two systems. Third, the documentation is unsystematic and unhelpful, making it very hard to figure out how to effectively use the command line and scripting. Actually, that is not quite true; documentation on scripting per se seems to be okay, it is rather the simple standard analyses that aren't explained anywhere.

This is why I am writing this post. I have just done a simple analysis, and I want to spare others the same investment in time and frustration, and I want to be able to look up my own post in the future, especially should some time pass before I use TNT again.

Thursday, February 26, 2015

'Classical' theism and a simple god

Some time ago a visitor to this site called Cale took issue with a post in which I expressed my very personal opinion that many religious believers have got it exactly backwards: In my view, a universe without god is less depressing than a hypothetical one in which we would be the marionettes of a god; thinking standard hopes about life after death, heaven and hell through to their logical conclusion leads to absurdities and horror; and religious faith makes a spectacularly weak foundation for moral behaviour.

Although Cale was somewhat cryptic on what precisely he disagreed with, I believe he believes that I have a wrong concept of god, and that if I had the correct concept in my head the religious perspective (of the universe being run by a god etc.) would appear more desirable than I am so far willing to grant.

To give me something to think about, Cale helpfully pasted several links to other websites into the comment field. The first few are to blog posts by one Edward Feser who advertises himself as a writer and philosopher, where 'philosophy' apparently means following one very specific school of Medieval theology. The remainder have such inviting titles as “Original Sin and its Consequences”. Because something like sin is unlikely to make much impression on somebody who has yet to be convinced that there is something to sin against, it is perhaps more productive to have a look at Feser first. A quick scan shows that Feser, for his part, constantly refers back to what he calls 'classical theism', so the post with that title would probably be the best place to start.

For the following note again that this is all just my personal view, and that I do not claim any official or professional expertise in this area. Furthermore, I am not trying to antagonise anybody needlessly, but I like discussing issues like these and don't see why I shouldn't present my honest opinion, especially here where nobody has to read it if they don't want to.

Tuesday, February 24, 2015

Summary of that special issue

(The following is the tenth part of a series of posts on an Annals of the Missouri Botanical Garden special issue on “Evolutionary Systematics and Paraphyly”. All posts in this series are tagged with “that special issue”.)

Okay, time to wrap this overly long series up. This is what the special issue contributes to the discussion around paraphyletic taxa:

Stuessy & Hörandl, Evolutionary Systematics and Paraphyly: An Introduction, pp. 2-5.

Introduces the special issue and provides some background.

Lockhart et al., We are Still Learning About the Nature of Species and Their Evolutionary Relationships, pp. 6-13.

Does not actually try to make the argument that paraphyletic supraspecific taxa should be recognised but merely points out that there is sometimes ongoing hybridisation between very closely related species. The entire contribution appears to be based on the misunderstanding that phylogenetic systematics (AKA cladism) does not accept paraphyletic species, which is in turn based on the misconception that the term "paraphyletic species" has an actual meaning for sexually reproducing organisms. It hasn't: where phylogenetic structure is absent, things cannot be "-phyletic", be it mono or para.

Hörandl, Nothing in Taxonomy Makes Sense Except in the Light of Evolution: Examples from the Classification of Ranunculus, pp. 14-31. 

Argues that classifications accepting paraphyletic taxa are more informative than any other classification, and that the approach restricts the options for classification more than any other. I consider both claims to be false: Because the end user cannot know if a taxon in an 'evolutionary' classification is monophyletic or defined by some symplesiomorphy, the information content of such a classification is zero.

Or in other words, the misconception underlying the paper is that "using a lot of information when making the classification" translates into "the end-user can get a lot of information out of it", but that is not a given. In the present case, they cannot deduce the meaning of any individual taxon without backtracking to the original rationale of the taxonomist. The thing is, making that laborious backtracking process unnecessary is precisely the point of having a classification in the first place.

As for the second claim, because there are myriad ways in which a taxon can be circumscribed as paraphyletic and myriad characters that one could consider 'important' enough to be used as a defining symplesiomorphy, this approach actually has the largest possible number of options for classification.

George, The Case Against the Transfer of Dryandra to Banksia (Proteaceae), pp. 32-49.

The main argument appears to be that Dryandra should not have been sunk into Banksia because the relevant studies had not sampled 100% of the species, some small mistakes were made by the authors, gene trees were incongruent in some irrelevant details, and so on. In other words, nitpicking to distract from the real issue, which is that Dryandra is conclusively known to be nested in Banksia. The author also suspects that Dryandra is polyphyletic, which if true would make the taxon unacceptable to most of his allies.

Stuessy, Paraphyly and Endemic Genera of Oceanic Islands: Implications for Conservation, pp. 50-78.

Argues that the enforcement of monophyly will make currently endemic genera disappear into more widespread genera, and that this would make it harder to conserve the relevant species. It is hard to interpret this contribution as anything but a 29-page appeal to base taxonomic decisions and the classification of biological diversity on political convenience. If it isn't, I must have missed its point.

Ehrendorfer & Barfuss, Paraphyly and Polyphyly in the Worldwide Tribe Rubieae (Rubiaceae): Challenges for Generic Delimitation, pp. 79-88.

An extremely interesting review of the state of knowledge about phylogenetic relationships in a group that includes such well-known genera as Galium and Asperula. But although the authors want to mentally assign unknown and unavailable ancestors to one of their descendant clades, the classification they propose is nonetheless for all practical purposes an entirely phylogenetic one because those ancestors will not appear in it anyway.

Brummitt, Taxonomy Versus Cladonomy in the Dicot Families, pp. 89-99.

Reiterates an argument that Brummitt made in earlier papers: Linnean ranks and phylogenetic systematics are incompatible because classifying an ancestral species into a genus will make that genus paraphyletic to all the genera its descendants are assigned to except itself. This is true as far as it goes, but of course Brummitt took it for granted that one could ever know that one is faced with an ancestor as opposed to a side lineage, that fossils should be treated as ancestral as opposed to terminals, and that when faced with the incompatibility he argues for one should prefer the pre-Theory-of-Evolution concept of Linnean ranks over phylogenetic systematics. One can certainly challenge all three assumptions, to say the least.

Zander, Support Measures for Caulistic Macroevolutionary Transformations in Evolutionary Trees, pp. 100-107.

Although it also proposes the titular support measures, most of the paper is a mixture of criticism of phylogenetics and explanations of the author's own approach to systematics. The fundamental problem here is that Zander's methodology depends on considering some present-day organisms to be the ancestors of other present-day organisms, but well, they simply aren't. It is like claiming that my brother is my ancestor because he looks more similar to our father than I do. If, however, we sensibly conclude that two contemporary groups have a common ancestor in the past instead of one being the ancestor, the whole argumentation of the paper collapses immediately.

Liu & Viña, Pandas, Plants, and People, pp. 108-125.

I cannot see any connection to the topic of paraphyly. In fact it seems unclear why this paper has been published in that specific journal, and I half suspect that the inclusion of this paper in the special issue is down to some kind of database mix-up.


In summary, not counting the introduction and the Panda article there are seven serious contributions in this special issue advocating the recognition of paraphyletic supraspecific taxa. Of these, two do not actually appear to argue for paraphyletic supraspecific taxa at all and feel as if they are merely based on potentially easily resolved misconceptions. Of the remaining five, one uses a very distinctly non-scientific and political argument, and one argues that you shouldn't make any taxonomic changes the author doesn't like unless you have achieved an unreasonably high level of sampling and of confidence in the results.

Even apart from me disagreeing with the arguments of the final three, this special issue brings to mind an English saying I have heard. Something about barrels and their bottoms. Of course, one good argument would be enough in a case like this, but once more I cannot see it.

As before, the only one who got close is Richard Brummitt, but the problem is that his best argument also contains the seed of destruction for paraphyletic taxa: As our attention is drawn to the problem of classifying ancestors, we start to realise that they break the 'long branches' and significant 'evolutionary divergence' that paraphylists use as cleavage points when circumscribing paraphyletic taxa; evolution is gradual. And then we are soon drawn to the conclusion that a rank-free phylogenetic system is the only solution for a classification across the history of life on this planet. This is not what Brummitt wants, but it is the logical consequence of trying to accommodate ancestors in the classification.

Monday, February 23, 2015

Botany picture #194: Wahlenbergia gloriosa

On the weekend we made a little trip to Mount Franklin Road in the Brindabella Mountains, and for the first time I saw the 'royal bluebell' Wahlenbergia gloriosa (Campanulaceae) in flower. It is the floral emblem of the Australian Capital Territory, and it has been chosen as the logo of the upcoming Australasian Systematic Botany Society conference that my institution is hosting.

Friday, February 20, 2015

Stemmy large-evolutionary changes (that special issue)

(The following is the ninth part of a series of posts on an Annals of the Missouri Botanical Garden special issue on “Evolutionary Systematics and Paraphyly”. All posts in this series are tagged with “that special issue”.)

The final contribution to the special issue advocating paraphyletic taxa is Richard Zander's Support measures for caulistic macroevolutionary transformations in evolutionary trees. There are two ways of addressing it, and with previous papers in this issue I have sometimes taken one and sometimes the other: Either one can go through the paper bit by bit, carefully analyse the argumentation, rebut one claim here but concede another there, and so on; or one can take a step back, point at the fundamental assumption underlying the whole line of argumentation, and explain in a few words why one considers it to be wrong.

Because I am tired and have much else to do, I will mostly use the second approach and then spend just a bit more time addressing other random aspects of the paper that stick out to me.

Really it is very simple: Richard Zander sees groups of organisms that exist today as the ancestors of other groups of organisms that exist today. I, and with me presumably most systematists and evolutionary biologists, believe that a group of organisms that exists today cannot possibly be the ancestor of another group of organisms that exists today.

Unless one were to push them forcibly into a working time machine, today's chimpanzees are not going to become our ancestors, today's fish are not going to become the ancestors of the land animals, and today's ferns are not going to become the ancestors of the flowering plants. Instead, these groups have common ancestors in the past, and thus, no matter how much Zander ridicules the concept, “unknown hypothetical ancestor → (one extant group, another extant group)” remains the most appropriate way of describing evolutionary history. As a group of individuals in a time slice, the ancestral taxon is separate from all of its descendants, and as an evolutionary lineage through time it is identical to all of them, but it does not make sense to equate it with only some.

So again, I do not accept the premise that would enable us to even start thinking in terms of what Zander calls “caulistic macroevolutionary* transformations”, and thus for me the entire argumentation of this paper never even gets onto its feet. Conversely, Richard Zander does not accept the premise that ancestors should actually be ancestral to their descendants, and so nothing I can write would ever convince him. Agree to disagree and all that, I guess.

Tuesday, February 17, 2015

Dated phylogenies: My experience using r8s

In the spirit of my previous posts on species tree software and fastStructure, the following post is to summarise my experience trying to use the software r8s, again so that somebody who also tries to use it for the first time may have the chance of finding these remarks and thus avoid some of the frustrations I had.

First, what is this about? It can do more, but for my present purposes Mike Sanderson's r8s (~rates) is a program that takes a phylogenetic tree and at least one fossil calibration point provided by the user and then dates the other nodes of the tree.

Imagine you have a phylogeny of a group of plants - a tree of their evolutionary relationships, produced with molecular data and using one of the standard phylogenetic software tools, TNT, RAxML or MrBayes perhaps - and you want to know when some of the subgroups evolved, for example because you want to know whether a given climatic or geological event is associated with the diversification of one of the subgroups.

You need some kind of information that allows you to calibrate the phylogeny somewhere; either you know mutation rates in the molecular data you are using, or you have fossils that you can use to assign minimum ages to the groups they belong to (the group can be older but not younger than the fossil), or, weakest of all perhaps, you believe other people's dated phylogenies and use some of their results as calibration points. You also need to assume that branch lengths in your phylogenetic tree - mutations along the branches - have at least some kind of rough relationship with the age of that branch.

Clearly there are a lot of assumptions entering into this kind of analysis, and there are scientists who are highly sceptical of these kinds of methods. Still, the assumptions that a group is at least as old as its oldest fossil and that groups accumulate more differences the longer they are apart are surely reasonable, and so as long as we take precise ages in the results with a bucket of salt we can at least use the broad strokes to address some questions. Conversely, if we get an age of more than a billion years for a group of flowering plants we know that something must be amiss.
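To illustrate the basic logic of calibrating a tree, here is a deliberately naive Python sketch. It assumes a strict molecular clock (one constant rate across the whole tree) and treats a single fossil age as the exact age of one node instead of a minimum age. That is much cruder than the rate-smoothing methods r8s offers, and it is not meant to show how r8s works internally. The tree format, the function names and the numbers are all invented for the example.

```python
def clade_heights(node, out):
    """Record the height (distance to the tips, in substitutions per site) of each clade.

    A node is either a terminal name (str) or a tuple of (child, branch_length) pairs.
    Returns (height, tuple of terminal names) and fills `out` with {terminals: height}.
    """
    if isinstance(node, str):
        return 0.0, (node,)
    results = [(clade_heights(child, out), length) for child, length in node]
    # On a clock-like tree every path to the tips has the same length;
    # averaging merely smooths out small deviations from that.
    height = sum(h + length for (h, _), length in results) / len(results)
    tips = tuple(sorted(t for (_, ts), _ in results for t in ts))
    out[tips] = height
    return height, tips


def date_nodes(tree, calibrated_clade, calibration_age):
    """Convert clade heights into ages using one calibrated clade (strict clock)."""
    heights = {}
    clade_heights(tree, heights)
    rate = heights[calibrated_clade] / calibration_age  # substitutions per site per Myr
    return {clade: height / rate for clade, height in heights.items()}


# Hypothetical tree ((A:0.02, B:0.02):0.03, C:0.05), with the clade (A, B)
# assumed to be 10 million years old on the basis of a fossil.
tree = (((("A", 0.02), ("B", 0.02)), 0.03), ("C", 0.05))
ages = date_nodes(tree, ("A", "B"), 10.0)

for clade, age in ages.items():
    print(clade, round(age, 1))
```

In this toy case the root comes out at about 25 million years, simply because it sits two and a half times as far from the tips as the calibrated node. Real data are rarely this clock-like, which is exactly why dedicated dating programs exist.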

r8s is one of the two principal tools for doing dated analyses; the other one is the Bayesian software package BEAST. Many people, especially religious Bayesians, would probably say that r8s has been made redundant by BEAST. But as I have written here before, all of these methods have their own advantages and disadvantages. One of the major disadvantages of Bayesian phylogenetics is that it rests on an even greater number of assumptions and, specifically, priors than simpler methods. Add to that the often ridiculously long computing time especially for larger datasets or the problems BEAST often has with missing data and it should become clear that there will always be a comfortable niche for other approaches.

With this, we finally arrive at the program r8s itself, which I have tried out over the past few days. The manual does a good job of explaining its functionality and how to set up an analysis, so I will not deal with that here. Rather, I want to focus on the practical details that one usually has to find out the hard way:

Saturday, February 14, 2015

Botany picture #193: Adoxa moschata

Adoxa moschata (Adoxaceae), Germany, 2008. Australia has many weird little ephemeral plants in the arid zone; often closely related to larger and longer lived species, they have evolved to quickly grow and seed after only a few weeks, often self-pollinating in the process, because they are living in a habitat with very unpredictable rainfall. This European herb does not have the same excuse. It is a fairly close relative of several groups of shrubs that used to be in the Caprifoliaceae but for some reason it has evolved to be so tiny that it is easily overlooked on the forest floor.