As I have mentioned before, there are four main ways of inferring phylogenetic trees of evolutionary relationships:
- Distance/clustering analysis. This is not really a phylogenetic analysis in the strict sense, because it merely clusters terminals by their overall similarity. On the plus side, clustering is always extremely fast. Several programs can do it, including good old PAUP and MEGA.
- Likelihood analysis. Simplifying a bit, one could say it searches for the tree with the best log likelihood score given a model of sequence evolution and the data. Again, several programs are available for this kind of analysis, including PAUP, MEGA and PHYLIP. Calculating likelihood values across large phylogenetic trees is computationally intensive, so these searches can take quite some time for larger datasets. This is why somebody wrote the software RAxML, which is designed to do complex likelihood searches with seemingly ridiculous speed by cutting a few corners.
- Bayesian phylogenetics. This approach estimates the posterior probability of phylogenetic relationships with a Markov Chain Monte Carlo (MCMC) method. Standard software packages for this are MrBayes and BEAST. If you want a quick answer you are out of luck, though, because MCMC always takes time.
- Parsimony analysis. The logic here is to find the tree with the lowest number of character changes along the branches, under the assumption that, all else being equal, the simplest explanation is the best. It is often considered less sophisticated than the previous two approaches, but it comes with fewer assumptions; I like that I know where the computer has its hands, so to speak. Once more, PAUP, MEGA and PHYLIP implement parsimony searches, but they are fairly slow for larger datasets.
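To make "lowest number of character changes" a bit more concrete, here is a minimal Python sketch of how the changes can be counted on one given tree, using Fitch's algorithm for unordered characters. This is only an illustration and not how any of the programs above are implemented; the function names, the nested-tuple tree format and the toy alignment are all made up for this example. A real parsimony search would additionally have to explore many alternative trees, which is where the computing time goes.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for ONE character on a rooted binary tree.

    tree   -- nested 2-tuples of terminal names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each terminal name to its character state
    (Both formats are hypothetical, chosen only for this sketch.)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # terminal: its state set is its observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                         # children agree: no change needed here
            return shared
        changes += 1                       # children disagree: count one change
        return left | right

    visit(tree)
    return changes


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over all columns of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy data, invented for illustration only.
alignment = {"A": "AAGT", "B": "AAGC", "C": "ATGC", "D": "TTGC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 3
print(parsimony_length(tree_2, alignment))  # 4
```

On this toy alignment the first tree needs three changes and the second needs four, so parsimony would prefer the first.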
Sadly, TNT has a few downsides. First, its input and output formats are rather idiosyncratic. Second, it has a GUI only in the Windows version but not on Mac or Linux, so you will have to use the command line and scripting on the latter two systems. Third, the documentation is unsystematic and unhelpful, making it very hard to figure out how to use the command line and scripting effectively. Actually, that is not quite true; the documentation on scripting per se seems to be okay. It is rather the simple standard analyses that aren't explained anywhere.
This is why I am writing this post. I have just done a simple analysis, and I want to spare others the same investment in time and frustration, and I want to be able to look up my own post in the future, especially should some time pass before I use TNT again.