Thursday, August 14, 2014

In praise of PAUP*

Hell is freezing over! Pigs are flying! PAUP* is getting updated for the first time in twelve years!

Jokes aside, this is great news. PAUP*, short for Phylogenetic Analysis Using Parsimony (* and other methods), is one of the best-known software tools for phylogenetics. Indeed, to me it is pretty much *the* phylogenetic software tool. Yes, depending on the task at hand I also use TNT, RAxML, Mesquite, MrBayes and BEAST with several of its add-ons, but PAUP* is the one I started out with while writing my thesis, and it is still the one I feel most comfortable using.

Another major issue is what you can and cannot do with the various programs. The downside of PAUP*, or at least of the previous version, is that it is comparatively slow. So if you have a large dataset with many taxa, you are better off using TNT for parsimony and RAxML for likelihood analyses. But PAUP* can do various kinds of analyses that no other software can do; for example, I would not know how to conduct a Templeton test without it.

(My experience with PHYLIP is limited. Maybe it can do some of the same things. The problem is that its combination of rather excessive modularity and a call-centre-style user interface - along the lines of "press 3 for this kind of analysis" - has put me off using it so far.)

So over the past few years I have sometimes worried about the day when PAUP* would suddenly stop working on the newest computers. It is good to know that a new version is coming up!

The idea is that ultimately there will be GUIs for Win and Mac that one has to buy, but that command line versions for Win, Mac and Linux will be free. I guess I will be happy to use the command line myself, but it might be a good idea to get a GUI licence for small student projects where the student cannot necessarily be expected to learn the PAUP* commands.

8 comments:

  1. It is going to be interesting to see how much uptake there is of this. Really, for the hardcore, the platform limitations have been annoying (keeping a dusty old iMac in the corner), but not necessarily a deal breaker. I suspect that the newer people in the field will not have been exposed to/embraced PAUP and this makeover won't do much to bring it forward in the consciousness (and people are animals of habit). For people who just want to produce a tree (that's a whole conversation), and for typically larger data sets, Bayesian (MrBayes/BEAST) analyses have become an almost stand-alone standard. Occasionally there will be a comparative RAxML run, mostly as a concession, and I suspect only because of the ease and speed of online plug-and-play portals. *Maybe* for the more obscure tests you mentioned, and/or if PAUP* was also available via supercomputers online with decent running times, it could regain traction, but I'm doubtful.

    Apparently the beta forms of the GUI are free to download until the pay version goes live. So unless it turns out to be buggy or error prone that might be a perfectly good option. Besides, if all goes at the pace it has to date, it won't be available in a commercial form for at least another decade or so anyway ;)

  2. Well, I certainly agree that with so many programs on the market these days it is highly unlikely that an updated PAUP* will ever, even under the most favourable circumstances, regain its formerly dominating position. Especially because there is a tendency towards larger datasets.

    That being said, I don't think that Bayesian phylogenetics can really be called the standard. Some people are very religiously Bayesian, but many others are wary of the numerous assumptions and made-up priors. Also, Bayesian analyses are really slow. On the same machine and for the same number of taxa, PAUP* would find a tree much more quickly than a Bayesian analysis would, and if speed is the issue then TNT and RAxML are the yardsticks, with the latter seemingly having the edge because it is likelihood-based.

  3. I probably have a jaded view of people looking for the quickest and dirtiest way to make a tree (a NJ tree from MEGA is fine, yeah?) but you're right, MrBayes is not likely to win a desktop race against PAUP* if both are now able to run on the same machine. The one thing that I have found easier in, say, MrBayes is splitting runs over multiple machines and recombining the output. I'm not sure how amenable PAUP* would be to distribution on a supercomputer cluster (I'm suspicious), but that would be a VERY useful function.

  4. No idea. But the point about speed is not that we want the "quickest and dirtiest way" but that some datasets are so large that a Bayesian analysis just does not converge in any realistic amount of time. If you need to run it on a supercomputer for weeks and weeks and weeks you start wondering what it is good for if a different method gets you a topologically pretty much indistinguishable tree after a few hours. Not that I regularly have that problem, but the people who are doing analyses with hundreds of genes across an entire order of insects do.

  5. Not trying to be defensive, but I really have no interest in pushing people to use PAUP* (e.g., have you ever seen me post an advertisement for its use, other than documenting its availability for certain new capabilities such as SVDQuartets?). I certainly am not interested in it regaining its "former dominating position" or even "regaining traction." It's there if you want to use it; if you don't, that's fine too.

    The only reason I am planning to charge for the GUI version is that I have no grant support, and need a small income stream that I can use to update software, occasionally buy a new computer, etc. My thinking was that I would make the "science" part of it free (and open source), but ask people who want the GUI capabilities (which I spend many nights and weekends working on) to help me out a little.

    Dave Swofford (author of PAUP*)

    Replies
    1. Thanks for the comment.

      I did not intend to imply that I thought you were doing this for horse-race purposes. I hope it becomes clear from the OP that I appreciate the versatility of PAUP and am very familiar with how to use it, and consequently just happy that it is being updated.

      There is a large number of highly specialised programs now that can do one thing well but only one thing. PAUP is one of only a small handful of programs that offer a very wide array of phylogenetic methods, tests and settings in one place.

    2. Sorry, I should have been more clear that I was referring more to the comments by Bort than to the OP.

    3. I am very happy to know that PAUP* is still used by others too. It is my "toolbox" and I can't imagine doing my work without it, even in this age of "big data". Time and again, I have smaller datasets that I just need to do a little work on, and there it is, PAUP*, strong and sturdy as ever!
