Saturday, April 26, 2014

Methods overload

The conference I attended this week was a very methods-focused one, with lots of modellers in attendance. Looking back over the talks I heard and the workshops that were offered, two things have occurred to me.

First, while everybody appears to assume that we are in the era of big data, especially thanks to genomic sequencing and the increased availability of biodiversity databases, it seems to me that we are instead drowning in methods.

There are many, many people developing new methods, but few willing to invest the time and money to do the grunt work of generating datasets on this organism here or that area there. Likewise, there are many people who want to be known as the person who coordinated two important databases or standardised a data format, but few willing to pay somebody to do the less glamorous but crucial work of maintaining data quality at the level of the individual natural history collection. Too many chiefs, too few Indians, as the saying goes in German.

In addition, there are quite simply too many analytical approaches being published at the moment, perhaps partly because it is more prestigious to come up with a new simulation test or a new piece of analytical software than to be the third person to use one. Just at this conference, we heard about the nearly neutral theory of evolution and ecology, alignment-free phylogenetics, a new visualisation tool for invasive species, a new molecular phylogenetics workflow, various databases, a new GIS tool, stochastic biogeographic models, new phylogeny visualisation tools, a new genome annotation tool, a species distribution modelling approach that includes competition, and of course a few more that I missed.

Don't get me wrong, most of them will be very useful individually, but the overall effect is overwhelming. You cannot keep track, and if you try it is all "ooh, shiny" and you will never get anything done. To some degree, one will have to ignore it all, keep one's eye on the question one wants to answer, and then carefully search for the best method to address it, as opposed to becoming excited about a cool new toy and then searching for a research question that fits it.

The second thing that occurred to me is that the methods are getting ever more complicated. In one or two talks, I simply zoned out after the first ten minutes, and I had the same feeling with the paper we discussed in journal club this week (which also introduced a new analysis pipeline), because they were just way too complicated.

That complexity causes several problems:
  • They come with too many assumptions that you just have to take on faith. This is particularly obvious with pipelines using several Bayesian MCMC analyses; often you can only guess at the myriad priors you have to set, and some of the underlying assumptions are just plain problematic (see the sketch after this list for how quickly the choices pile up).
  • They have steep learning curves, making it very hard to actually use them unless you become an expert in one and ignore all others.
  • And finally, implementing them is such an effort that the ratio between cost and benefit becomes seriously skewed. If you have to run a dozen assumption tests, two or three subsequent Bayesian coalescent and species tree analyses and two simulations just to maybe sort of answer one single rather unimportant question in one rather unimportant plant genus, maybe your time is better spent on something else.
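To see what I mean by priors you have to take on faith, here is a minimal sketch (in Python, and emphatically not any real published pipeline): a toy Metropolis-Hastings sampler estimating a single rate parameter. The data and all the numbers are hypothetical; the point is how many choices (prior distribution, prior mean, proposal width, starting value, burn-in fraction) even this stripped-down analysis forces on the user, each of them an assumption.

    import math
    import random

    random.seed(1)

    # Hypothetical data: rate estimates (substitutions per site) for one gene.
    observations = [0.12, 0.08, 0.15, 0.11, 0.09]

    # Choices the user must make before seeing a single result:
    PRIOR_MEAN = 0.1    # mean of an exponential prior on the rate
    PROPOSAL_SD = 0.02  # width of the random-walk proposal
    N_STEPS = 20000     # chain length
    BURNIN = 0.5        # fraction of the chain to discard

    def log_prior(rate):
        # Exponential(1 / PRIOR_MEAN) prior; zero density for negative rates.
        if rate <= 0:
            return -math.inf
        return -rate / PRIOR_MEAN - math.log(PRIOR_MEAN)

    def log_likelihood(rate):
        # Toy likelihood: each observation ~ exponential with mean `rate`.
        if rate <= 0:
            return -math.inf
        return sum(-x / rate - math.log(rate) for x in observations)

    def mcmc():
        rate = PRIOR_MEAN  # starting value: yet another choice
        current = log_prior(rate) + log_likelihood(rate)
        samples = []
        for _ in range(N_STEPS):
            proposal = rate + random.gauss(0.0, PROPOSAL_SD)
            candidate = log_prior(proposal) + log_likelihood(proposal)
            # Metropolis rule: always accept uphill moves, sometimes downhill ones.
            if random.random() < math.exp(min(0.0, candidate - current)):
                rate, current = proposal, candidate
            samples.append(rate)
        return samples

    samples = mcmc()
    posterior = samples[int(len(samples) * BURNIN):]
    print("posterior mean rate:", sum(posterior) / len(posterior))

Now multiply these choices by the dozens of parameters in a real coalescent or species tree analysis, chained across several such programs, and it becomes clear why the assumptions of a published pipeline are so hard to scrutinise.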
This is why I will always have a soft spot in my heart for parsimony methods, which have very simple assumptions (mostly that the simplest explanation should be preferred), and why I am most impressed by studies that address a big question with a very elegant test. And luckily I have also seen some of those in the last two weeks.
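For contrast, here is a minimal sketch of what I mean by simple assumptions: the classic Fitch parsimony count for a single unordered character on a fixed tree, again in Python and with a made-up tree and character. The whole method fits in a dozen lines, and its only real commitment is that fewer state changes are better.

    def fitch(tree, states):
        # Returns (possible states at this node, minimum changes below it).
        if isinstance(tree, str):   # leaf: its observed state, zero changes
            return {states[tree]}, 0
        left, right = tree
        left_set, left_cost = fitch(left, states)
        right_set, right_cost = fitch(right, states)
        common = left_set & right_set
        if common:                  # child state sets overlap: no change needed here
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1  # one change here

    # Hypothetical example: one binary character scored for four taxa
    # on the tree ((A,B),(C,D)).
    tree = (("A", "B"), ("C", "D"))
    states = {"A": "0", "B": "0", "C": "1", "D": "1"}
    print(fitch(tree, states))  # ({'0', '1'}, 1): a single change explains the data

Compare the number of decisions the user has to make here with the sampler above.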
