As I continue to contemplate Ebach and Michael's recent paper From Correlation to Causation: What Do We Need in the Historical Sciences?, I would first like to make clear that I like reading and appreciate papers like this one. If we want to get things right it is crucial to be pushed out of our comfort zone from time to time, so asking a question like "have recent developments in the field gone totally in the wrong direction?" has its value.
That being said, however, it would appear to be a reasonable assumption that >90% of the experts are somewhat unlikely to all overlook a fundamental flaw in what they are doing. It could happen, yes, but especially as a non-expert one would need to see a rather good and clear argument before agreeing that they do.
With this in mind, I will now describe my thoughts about the core of the paper and my understanding of what the authors argue for and why. The first parts of the paper feature a lengthy discussion of the interpretation of ancestry and character evolution in phylogenetics and evolutionary biology, of unstructured and structured representations of the same data, and of the dangers of letting unwarranted assumptions distort the data. There might be quite a bit to be discussed here - for example the claim that "historical sciences, such as taxonomy and palaeontology ... are mostly descriptive and defy testing", which might be news to the discoverers of Tiktaalik, who were able to predict and subsequently test their prediction of where to look for this "missing link" - but the meat of the paper as I understand it starts with the criteria the authors suggest for "comparing ... assumptions against a well-attested set of aspects of causation".
Reference is made to the Bradford Hill Criteria of the medical sciences, and they are then adapted to the presumed needs of historical science, which, as discussed in the earlier post, the authors perceive to be fundamentally different from experimental science. The new Historical Sciences Bradford Hill Criteria are presented under key words that sometimes describe what to look for and sometimes describe what to avoid:
Selection bias. This is an obvious problem in science, although I must say that I do not find the specific examples provided by the authors to be the most convincing.
Temporality. I am afraid I do not really understand what is meant here, so I will quote the relevant paragraph from the paper in full, excepting references. "Present day distributions are the result of past events. Therefore there is the possibility that different taxa alive today may be resultant in different events that occurred at different times. In using a single historical event (e.g., the Oligocene drowning of New Zealand ~30 Ma) to address a larger biogeographical question has resulted in several debates about the age of the New Zealand biota. Single event hypothesis are rare, however little to no scrutiny is typically taken to test their validity."
Especially considering that in the original, i.e. medical science, criteria temporality was apparently about dose-response and reversibility of an effect, I am unsure where the above comes from. It is even less clear to me what is wrong with using a single historical event to address a large question. If New Zealand was indeed completely under water then it quite simply follows that all endemic land-living organisms would at that moment have perished, and that the current land-living organisms would have had to disperse into New Zealand afterwards, when it rose up again. (I am not qualified to assess if it was indeed completely under water, but that is not the point.)
Evidence for a mechanism of action. This again makes more sense to me, although we may have to agree to disagree about how plausible any given assumption of a mechanism of action is. The authors, for example, appear to consider dispersal from one biogeographic region to another to be implausible, but I believe that as long as the probability of that happening is not zero it would have to be a matter of weighing it against the plausibility of alternative explanations (such as those requiring that a family of flowering plants arose before multi-cellular life).
Coherence. In effect the question whether a claim fits what else we are currently confident we know. Makes sense.
Replicability or, as I would call it, reproducibility. The authors argue that historical data are not replicable, so the equivalent is correlation between different datasets.
Similarity. Do two different datasets for the same study group arrive at the same result? It is not entirely clear to me how this is different from the previous, but at any rate the two seem sensible enough.
In summary, some of the above is not clear to me, but other aspects appear immediately reasonable. In fact there would be a trivial interpretation of what the authors want to say: Examine your assumptions; remove indefensible assumptions; all else being equal, use the simpler model instead of an unnecessarily convoluted one.
But one would assume it cannot really be that easy, because there is hardly anybody in science who would disagree with that. Modellers are all aware of the danger of over-parameterisation; the problem is simply that all else is not always equal. Sometimes a more complex model is quite simply the better explanation. If you have a plot of dots forming a straight line as data, a very simple linear model with one parameter will do nicely. If you have a plot of dots forming an S-shaped pattern you will quite simply not be able to explain them with such a simple model; you need more parameters. I would suspect the same applies to biogeography; if there are data that defy the explanation of vicariance then our explanation needs to incorporate more processes than vicariance. I thus find it hard to accept the authors' judgement that "complex models are designed to extrapolate data under highly speculative assumptions" whereas "simple models, with plausible assumptions[,] are more likely to pass the [criteria]". It really depends.
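To make the straight line versus S-shape point concrete, here is a minimal sketch in Python. Everything in it is my own illustration and not taken from the paper: the data are synthetic, the function names are made up, the one-parameter model is assumed to be a line through the origin (y = a * x), the two-parameter model is assumed to be a logistic curve, and the grid search is only a crude stand-in for a proper optimiser.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope for the one-parameter model y = a * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def sse(ys, preds):
    """Sum of squared errors between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

def logistic(x, k, x0):
    """Two-parameter S-shaped curve: steepness k and midpoint x0."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def fit_logistic(xs, ys):
    """Crude grid search over k and x0; returns (error, k, x0)."""
    best = None
    for ki in range(1, 41):            # k from 0.1 to 4.0
        k = ki / 10
        for x0i in range(0, 101):      # x0 from 0.0 to 10.0
            x0 = x0i / 10
            err = sse(ys, [logistic(x, k, x0) for x in xs])
            if best is None or err < best[0]:
                best = (err, k, x0)
    return best

# Synthetic data (assumed for illustration): one straight line, one S-curve.
xs = [i * 0.5 for i in range(21)]      # 0.0, 0.5, ..., 10.0
line_ys = [0.1 * x for x in xs]
s_ys = [logistic(x, 1.5, 5.0) for x in xs]

# The one-parameter model on the straight-line data.
a_line = fit_slope(xs, line_ys)
sse_line_on_line = sse(line_ys, [a_line * x for x in xs])

# The same one-parameter model on the S-shaped data.
a_s = fit_slope(xs, s_ys)
sse_line_on_s = sse(s_ys, [a_s * x for x in xs])

# The two-parameter model on the S-shaped data.
sse_logistic_on_s, k_hat, x0_hat = fit_logistic(xs, s_ys)

print("line on line data:   ", sse_line_on_line)
print("line on S data:      ", sse_line_on_s)
print("logistic on S data:  ", sse_logistic_on_s)
```

The simple model leaves essentially no error on the straight-line data but a substantial one on the S-shaped data, which only the model with the extra parameter removes. That does not make the more complex model better in general; it just shows that which model is preferable depends on the data.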
Similarly, everybody in science will agree that we shouldn't base our conclusions on bad assumptions. Problem is, everybody will argue that their assumptions are the good ones. Perhaps now would be a good moment to turn to the example provided by the authors of the present paper, to see how they use their new criteria. This might also clear up those aspects that I did not understand when reading the criteria themselves.
Interestingly, the four methods compared by the authors under their criteria are at least partly apples and oranges. They are Brooks Parsimony Analysis (BPA), Ancestral Area Analysis (AAA), Ebach's own Area Cladistics (AC), and the Dispersal-Extinction-Cladogenesis model (DEC). To the best of my understanding the point of BPA is to reconstruct how biogeographic regions are related, as in "the rainforests of New Zealand are sister to the temperate rainforests of Australia, and sister to both of them are the temperate rainforests of Patagonia" (this is not a quote but a hypothetical). We might also call this reconstructing the evolutionary history of biogeographic areas. In contrast, my understanding is that the other three are concerned with reconstructing the inverse, the biogeographic history of an evolutionary lineage, either in its entirety or at least to infer where the common ancestor of the lineage was found (although admittedly I was unable to look deeply into AC as the relevant paper was behind a paywall).
Still, all four are biogeographic methods. I found it easiest to proceed once more criterion by criterion.
Selection bias. DEC is criticised for assuming that "areas" are the result of dispersal and extinction, while the criterion is said to be inapplicable to the other three methods because "the type of area is not specified" in any of them. Once more I can only say that I don't get it.
There are two possible interpretations of "area" in this context. The first is that we are talking about the cells or regions defined a priori as the units of the analysis. If this is the case, then all four methods face the exact same problem, because in all cases the user has to define areas a priori. But this doesn't make sense because the cells defined for a DEC analysis are quite simply not "considered as [sic] a result of dispersal and extinction", they are the units of which a potentially larger range considered to be the result of dispersal and extinction consists.
The second possibility is that we are talking about the results. If this is the case, then yes, obviously a Dispersal-Extinction-Cladogenesis model assumes that the present ranges of organisms are the result of dispersal and extinction (and cladogenesis). That's the point. But if this is what we are talking about then we cannot simply say "doesn't apply" for the other three. AC, for example, assumes that current ranges are the result of vicariance, so at the very least it would need a green marker for a plausible assumption, if indeed we find this assumption plausible; realistically, we would have to start discussing whether vicariance as the only process makes sense.
Temporality. As mentioned above I don't understand how the things considered under this name are any more temporal than the ones that aren't. The example does not really clarify the matter for me either. BPA and DEC are criticised as "speculative" because they use "incongruence" (between distribution patterns of different lineages? I believe that is not how DEC works...) to "explain" or "justify" "ad hoc events" or "processes". First, I think what is meant here is the other way around, i.e. that BPA and DEC explain certain patterns by invoking events that the authors consider to be ad hoc assumptions, apparently in practice meaning any biogeographic process except vicariance.
Second and more importantly it is, to say the least, not clear to me why vicariance is less ad hoc than dispersal, extinction and cladogenesis, which just goes back to my earlier point that everybody thinks their preferred explanation is the plausible one. Anyway, AAA is likewise criticised as speculative because "duplicated areas are considered to be part of an original ancestral area". AC, on the other hand, is given a green for entirely plausible assumptions because it only assumes that "geographical congruence is a result of common historical processes". Taken on its own that may sound reasonable, but what about the incongruences? Are they simply ignored? As mentioned above, if there are data that defy a one-parameter model then more parameters would appear to be warranted.
There is little to say about evidence for a mechanism of action because none of the four methods is given a clean bill of health. I actually find it rather impressive that the authors call this aspect even of their preferred method speculative for explaining every congruence with vicariance. I do not, however, understand what is meant by "tree topology determines all processes" in the case of DEC. Taken at face value it is plainly wrong because not only the phylogeny but also the present distribution data go into the analysis. What is more, the same necessarily applies in the three other methods, only that some of them use "areagrams" instead of the phylogenetic tree of a group of organisms.
Finally, the treatment of coherence, replicability, and similarity seems even stranger to me. AC is lauded for comparing its results against other data, and the three other methods are criticised for not doing so (with the exception of BPA under similarity). But how does the method determine what the end user does with it? What if the user of AC decides not to make any further comparison? What if the user of the DEC model goes on to apply the model to the next four lineages occurring across the same biogeographic areas? How would using DEC exclude such a possibility?
Maybe I am missing something, but it seems to me as if all four methods generally merit at best the same colour, or level of plausibility, on all criteria. If anything I would look somewhat askance at BPA, AAA and AC for simply assuming that the concept of "areagrams" makes sense in the first place, because if there is any significant degree of exchange between biogeographic regions it doesn't.
Either way I am afraid I cannot claim to have understood how to apply these new criteria in an unbiased manner going forward.