Most of this week I am at the 2015 CBA conference "Species delimitation in the age of genomics".
Although the title suggests that you'd die of alcohol poisoning in short order if you took a drink every time you heard somebody claim that genomic data are going to solve all our problems, the actual talks happily do not fit that stereotype. Maybe people have seen enough genomic data now to dispense with the hyperbole.
Today started off with a talk by the Australian philosopher of science John Wilkins. He argued that the traditional approach taken by most people who are explicitly grappling with the species problem is to declare a theory-based species concept, try to force it onto reality, and then consider the species-ness to be an explanation for what we see in nature: This individual is the way it is because it is a member of that species. In Wilkins' opinion, that is precisely the wrong way around.
In reality, he argued, species are observed phenomena in need of an explanation, which also implies that the explanation can vary instead of being one-size-fits-all. As an analogy, he used national character: it is obviously wrong to say that most citizens of one nation have a certain trait because they are citizens of that nation. Instead that trait is part of the national character because most citizens have it.
So far so good. I will certainly agree that a one-size-fits-all approach to species won't work. However, I do not really feel that the message of this talk will be very helpful in practice. If somebody asks you, "are these two populations the same species?", perhaps because one would be legally required to protect both of them if they aren't, then an academic discussion about how species are phenomena instead of groups with objective necessary and sufficient criteria will not impress them very much. Indeed they might consider everything but a clear, global criterion to be operationally useless.
Interestingly, the second talk was the sharpest contrast one could imagine. Dick Frankham explored the potential conservation consequences of different taxonomic decisions. If species are managed, for example through breeding programs, then over-lumping of what are really distinct biological species would lead to outbreeding depression (i.e. organisms are crossed that are so different genetically that their hybrid offspring are unfit; think mule). Conversely, over-splitting would lead to inbreeding depression, as conservation managers would forbid crossings between isolated populations of what is really the same biological species to maintain their supposedly distinctive genetic heritage.
He directed most of his attention at this latter scenario and repeatedly criticised the Phylogenetic Species Concept and other "separately evolving lineage"-based concepts for leading to over-splitting. The idea here is that if a population of a breeding group is isolated for long enough - and in the case of small populations, that might be as little as two hundred years or even a few decades - then the genetic markers will show it as a separate lineage to be recognised as a distinct species. Yet crossing it with the other populations would produce no outbreeding depression and would in fact often massively increase fitness and save it from extinction.
In other words, arguing from a conservation perspective Frankham broke a lance for the Biological Species Concept and related approaches.
Unfortunately I missed the next few talks due to other responsibilities, but I was back when Bryan Carstens argued that "gene flow should be explicitly modeled (rather than ignored) in species delimitation investigations that utilize genetic data". Although the program lists him as giving two consecutive talks, really one could say that he gave one very long talk interrupted by the lunch break. The first part discussed the practical problems that led to the development of analytical tools whose theory was discussed in the second part.
These tools are two R packages with the names P2C2M and PHRAPL. (Le sigh. Bioinformaticians and their acronyms...) The talks became very technical very quickly, but as the title above implies the basic idea is to not only consider a "gene trees within a species tree" phylogeny that traditionally assumes complete isolation between the lineages but also varying degrees of gene flow between those lineages; in the most extreme opposite case a complete absence of phylogenetic structure with migration between all populations.
I found the talk(s) very well presented, the research humblingly impressive, and the tools potentially very intriguing, but with a few reservations, and a discussion over the lunch break showed that I wasn't alone in having the latter. First, and that is not really a criticism at all, it just has to be kept in mind that this is not really a species delimitation method anyway. It still relies on an a priori assignment of samples to lineages.
Second, and more importantly, PHRAPL is one of those model-based analyses that come with insane computing times. In the example case, Carstens had only four lineages, and PHRAPL already had to test 216 models, and if I remember correctly he said that it took a week or so on a supercomputer cluster. If one is interested in higher numbers of lineages, the number of potential models explodes faster than exponentially - the only solution is to severely limit the number that one even wants to consider.
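To get a feeling for how quickly this gets out of hand, here is a little back-of-the-envelope sketch in Python. To be clear, this is my own toy count and not how PHRAPL enumerates its models, so it does not reproduce the 216 from the talk. The only textbook fact in it is that n lineages can be arranged into (2n-3)!! rooted bifurcating trees; the assumption that every pair of lineages either has or lacks symmetric migration is something I made up for illustration, as are the function names.

```python
from math import comb, prod


def rooted_topologies(n_lineages: int) -> int:
    """Number of rooted, labelled, strictly bifurcating trees: (2n-3)!!"""
    if n_lineages < 1:
        raise ValueError("need at least one lineage")
    # Product of the odd numbers 1, 3, ..., 2n-3 (empty product = 1 for n = 1).
    return prod(range(1, 2 * n_lineages - 2, 2))


def toy_model_count(n_lineages: int) -> int:
    """Toy model space (my assumption, NOT PHRAPL's actual scheme):
    every tree topology combined with every on/off pattern of
    symmetric migration between pairs of tip lineages."""
    migration_patterns = 2 ** comb(n_lineages, 2)
    return rooted_topologies(n_lineages) * migration_patterns


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, rooted_topologies(n), toy_model_count(n))
```

Even under this crude scheme, going from four to eight lineages takes the count from hundreds to tens of billions, which is why one has to restrict the candidate set before starting.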
One of my conversation partners over break time also raised the question why the colleagues inventing new computationally intensive Bayesian approaches for large datasets invariably program them in R or Java, interpreted languages that are kind of, like, REALLY notorious for being the slowest there are. Even Python would already be faster despite being another interpreted language, and if one were to write these things in C and compile them one could perhaps even finish a BEAST run of a large dataset before the beginning of the next ice age. But I digress, and it is not as if I could program something as complicated as that.
The last long-form talk of the day was by Paschalia Kapli, who introduced Poisson Tree Processes (PTP), a new method for species delimitation with a single locus. Although the topic was in theory less technically demanding than the previous one, I found that she went through her slides a bit too fast for me to digest them, so my notes are rather fragmentary here.
The question that might occur to some readers is whether it is ever a good idea to base species delimitation on a single marker. Surely a single introgression event would already throw everything off? But as she stressed repeatedly, these kinds of methods are not usually meant for taxonomists who do systematic studies. The end users are rather the kind of researchers who, for example, sequence entire communities of soil organisms from twenty different locations for one ribosomal gene and want to roughly guesstimate the number of species in each location - without even necessarily knowing what species those are.
Anyway, previously available methods were apparently either based on sequence similarity thresholds, which has always appeared fishy to me, or on the Generalised Mixed Yule Coalescent Model, which requires ultrametric trees and tries to work with the presumed timing of speciation events along the phylogeny. The new PTP method accepts non-ultrametric trees, a great advantage that I can appreciate immediately, and tries to work with the number of nucleotide substitutions along the branches, the logic of which I unfortunately did not quite grasp. At any rate I am reasonably sure that I won't have occasion to try this method myself any time soon.
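For readers who have never seen the similarity-threshold approach, the following Python sketch shows roughly what I understand it to boil down to. It is not PTP or GMYC, and it is not any particular published tool: the uncorrected p-distance, the single-linkage grouping, the 3% default cutoff and the function names are all my own illustrative assumptions, and the sequences are presumed to be already aligned to equal length.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_threshold_clusters(seqs: list[str], threshold: float = 0.03) -> int:
    """Estimate the number of 'species' by single-linkage clustering:
    two sequences end up in the same cluster if a chain of pairwise
    distances <= threshold connects them. The cutoff is arbitrary."""
    parent = list(range(len(seqs)))  # union-find forest, one node per sequence

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    return len({find(i) for i in range(len(seqs))})


if __name__ == "__main__":
    sample = [
        "ACGTACGTACGTACGTACGT",
        "ACGTACGTACGTACGTACGA",  # one difference from the first sequence
        "TTTTACGTACGTACGTTTTT",  # clearly divergent
    ]
    print(count_threshold_clusters(sample, threshold=0.10))
```

The fishy part is plain to see: the answer depends entirely on a cutoff that somebody had to pick, and nothing in the data tells you which value is right.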
The day ended with lightning talks, which mostly presented the plans of Ph.D. students who are still in early stages of their projects, and a poster session.