Saturday, May 27, 2017

Reading up on biogeography part 4: track analysis for bioregionalisation

With two papers left, I was wondering whether there would still be any point to going on. The last two use track analysis and area cladograms, respectively, and those were already used by the first and second paper, so would there be any new insights into the methodology of pan- and vicariance biogeography?

However, the next paper,

Martinez et al., 2017. Biogeographical relationships and new regionalisation of high-altitude grasslands and woodlands of the central Pampean Ranges (Argentina), based on vascular plants and vertebrates. Australian Systematic Botany 29: 473-488.

... uses track analysis at least partly to do something different than the previous instance. There is the question of the "relationship" of a biome, but then there is also bioregionalisation. So that is a new angle.

The idea seems to be relatively simple. As before, the panbiogeographer looks at the occurrences of species, draws minimum-distance lines ("tracks") between them, and then identifies areas where the tracks of several species overlap as "generalised tracks". In the present case, a very short generalised track is then "used to recognise natural areas in terms of their biota because they result from more or less consistent overlapping distributions of two or more endemic taxa".

Okay, same question as always: does this make sense?

Well, more than the claim that generalised tracks are always evidence of vicariance, which this paper kind of only makes in passing (while, weirdly, explaining the panbiogeographic reasoning in words so identical to those used in the Romano et al paper that I wonder if they were in both cases copy-pasted from Croizat). To me the approach just seems part unnecessarily complicated, part not data-rich enough.

As for the first, yes, an area with several endemic taxa may well deserve recognition as a natural unit, a vegetation zone, a biome (whatever) in some area classification. But if the idea is to identify areas defined by endemic species, why do we need a track analysis as an intermediate step? Why not simply plot the occurrences of endemic species? At that point all the information is there, and tracks, generalised or not, do not add anything.
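
To make that concrete, here is a minimal Python sketch of what I mean by simply tabulating endemics. The occurrence records, species names and area labels are invented for illustration and have nothing to do with the data in the paper.

```python
from collections import defaultdict

def endemics_by_area(occurrences):
    """Map each area to the sorted list of species recorded nowhere else.

    occurrences: iterable of (species, area) records.
    """
    areas_of = defaultdict(set)
    for species, area in occurrences:
        areas_of[species].add(area)
    endemics = defaultdict(list)
    for species, areas in areas_of.items():
        if len(areas) == 1:  # recorded from a single area only
            endemics[next(iter(areas))].append(species)
    return {area: sorted(spp) for area, spp in endemics.items()}

# Hypothetical records, not taken from Martinez et al.
records = [
    ("sp_a", "range_1"), ("sp_a", "range_1"),
    ("sp_b", "range_1"),
    ("sp_c", "range_1"), ("sp_c", "lowland"),
    ("sp_d", "lowland"),
]
print(endemics_by_area(records))
```

Areas with several entries in the result are the candidates for recognition, and no lines had to be drawn between any of the points.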

As for the second, as I mentioned before there are several other methods available for bioregionalisation. Some use clustering approaches to group grid cells or other small areas into larger areas based on shared species content or even the relatedness of those species. The newest ones use modularity or map equation analyses to examine networks of species and the grid cells they occur in; in contrast to clustering, where it is the researcher's somewhat subjective choice how many clusters to accept, these network approaches have algorithms for deciding more objectively how many truly distinct units there are.

In other words, in my eyes track analysis seems to be superfluous to requirements if we are merely interested in the simple measure of shared endemics, and it is unable to provide the depth of information that could be obtained from examining other shared distribution patterns.

Sunday, May 21, 2017

Reading up on biogeography part 3: Hopping between islands yes, hopping from continent to island no?

The third vicariance biogeography / panbiogeography paper in the special issue is

Grehan JR, 2017. Biogeographic relationships between Macaronesia and the Americas. Australian Systematic Botany 29: 447-472.

Despite being very long, its gist is easily summarised:

The mainstream explanation for the occurrence of plants and animals on the Macaronesian islands (Canary Islands, Madeira, etc.) is that they must have got there via long-distance dispersal, often from Africa but sometimes from the Americas, because the islands are of relatively young volcanic origin and distant from other land masses. However, the "model-based approaches" that this conclusion is based on cannot be accepted because they supposedly assume dispersal and ignore the possibility of vicariance.

This is followed by many pages of example cases of plants and animals illustrated with maps and phylogenies. It is not clear to me what that is supposed to show, because without a time axis it doesn't move the inference either way; at best it could show that some of the groups have a pattern that is consistent with vicariance, but if a lineage is too young then vicariance is still out, and the same if the lineage is much older than the island.

Finally, there is some speculation, again illustrated with maps, about whether there were always volcanic islands in the same area, all through from the time when the Atlantic started to open. They would have been transient on a geological scale, so the local lineages supposedly produced by vicariance when Africa and the Americas started moving apart would have had to island-hop as new volcanoes rose and older ones eroded away, over more than 100 million years.

In contrast to the previous two papers I did not really gain new insights into the methodologies favoured by vicariance biogeographers. In a sense the present paper is closer to an opinion piece or perhaps a review article than to a research study.

The supposed assumptions of "model-based approaches"

The paper claims
"Model-based approaches to Maccaronesian biogeography assume the that the [sic] sequence of phylogenetic relationships reflects a sequence of chance dispersal. Although often cited as Hennig's progression rule, it is not a rule but an assumption that does not address the equal applicability of sequential differentiation across a widespread ancestor."
And further on:
"Model-based methods use chance dispersal to explain divergence and allopatry, ..."
Unfortunately this claim at least is demonstrably false. There are various models available to do ancestral area inference (see this graphic as an example), and DIVA and the very popular DEC, for example, include vicariance. That's what the V in the acronym DIVA means! If a model-based analysis with a model that allows vicariance infers no vicariance then we can assume it is not because the model does not allow vicariance, but because the data didn't support that conclusion.

I am also reasonably certain that Hennig's progression rule does not only apply to long distance ("chance") dispersal but would just as well apply to a series of range expansions followed by speciation events across a single land mass. It simply applies the principle of parsimony to historical biogeography, arguing that if several lineages along a grade occur in an area then that would probably, all else being equal, have been (at least part of) the ancestral range, because other explanations require more dispersal and/or extinction events.
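
The parsimony argument can be illustrated with a small Python sketch using Fitch's algorithm on a made-up tree. This is only meant to show the logic of counting changes of area; the tree, the area labels and the function name are hypothetical, and real ancestral area analyses use more elaborate models.

```python
def fitch(tree, area_of):
    """Return (candidate ancestral areas, minimum number of area changes).

    tree: a leaf name (str) or a 2-tuple of subtrees.
    area_of: dict mapping leaf name to its single area.
    """
    if isinstance(tree, str):
        return {area_of[tree]}, 0
    left_set, left_cost = fitch(tree[0], area_of)
    right_set, right_cost = fitch(tree[1], area_of)
    shared = left_set & right_set
    if shared:
        # Both descendants can share an area: no extra change needed.
        return shared, left_cost + right_cost
    # Otherwise one change of area is required along one of the branches.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical grade: three successive continental lineages,
# with an island clade nested at the tip.
tree = ("c1", ("c2", ("c3", ("i1", "i2"))))
area_of = {"c1": "continent", "c2": "continent", "c3": "continent",
           "i1": "island", "i2": "island"}
root_areas, changes = fitch(tree, area_of)
print(root_areas, changes)
```

With these made-up inputs the root comes out as continental with a single change of area; placing the ancestor on the island instead would need three changes. Nothing in the calculation cares whether a change of area is a jump across an ocean or a range expansion across a land mass.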

It is interesting, by the way, how the word "model" seems to be used in this context, as if a mathematical description of a system is something bad.

What distribution patterns would we expect under vicariance and long-distance dispersal, respectively?

"The progression rule also assumes that a 'basal' grade is located in the source region or centre of origin, but some Macaronesian clades are basal to large continental clades, and there are also clades with 'reciprocal monophyly' in which a diverse Macaronesian clade is the sister group to a diverse continental clade. These phylogenetic and geographic incongruities do not arise in a vicariance interpretation of phylogeny, because a basal clade or grade marks only the location of the intial phylogenetic break or breaks within a widespread ancestral range."
I don't really understand the reasoning here. The idea seems to be that if an island clade is nested within a continental grade, then it may make sense to conclude dispersal, but if an island clade and a continental clade are sister to each other then it is somehow "incongruent" (with what?) and can only be explained by vicariance. Why?

I would look at the nearest outgroup to get more information, but even if that occurred in neither region then we would still have to ask if additional continental or island lineages may have simply gone extinct. The key questions are whether the lineage split is so recent that it happened considerably after continental break-up and whether an island lineage is older than the island(s). Really I don't see how we can conclude anything with confidence without a time axis.

Perhaps the idea is to equate "distribution of the species along a basal grade is evidence of a centre of origin" with "absence of such a basal grade is evidence of absence of a centre of origin"? If so, that would not be logical; absence of evidence for A is not evidence for not-A.

The paper also discusses other patterns, in this case non-overlapping ranges of related species (allopatry):
"Model-based methods use chance dispersal to explain divergence and allopatry, and yet allopatric divergence requires isolation, which cannot exist if there is effective dispersal."
The second half of this sentence sets up a false dichotomy between dispersal that is so frequent that it makes speciation impossible and no dispersal at all. It seems obvious to me that the excluded middle is dispersal that happens but is too rare to make speciation impossible.
"In the same way that allopatric lineages within Tarentola are incongruent with the expectations of chance dispersal, so too is the allopatry of Tarentola and its New World sister group."
Again this makes no sense to me whatsoever, and again there seems to be some very black-and-white reasoning behind it: if species can disperse to distant islands everything should occur everywhere; but we observe that all species do not occur everywhere, so we have to conclude that dispersal is completely impossible. But this is one-to-one equivalent to the argument that you cannot produce random numbers with a die because when you cast it the second time it came up with a different number than the first time. Really, that seems to be the logic here.
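
The middle ground is easy to put into numbers. The following Python sketch assumes, purely for illustration, that successful colonisation is an independent rare event with a constant probability per year; the rate used is invented.

```python
def prob_at_least_one(p_per_year, years):
    """Probability of at least one successful colonisation in `years`."""
    return 1.0 - (1.0 - p_per_year) ** years

def mean_wait(p_per_year):
    """Expected number of years until the first success (geometric waiting time)."""
    return 1.0 / p_per_year

p = 1e-6  # hypothetical: one success per million years on average
print(mean_wait(p))
print(prob_at_least_one(p, 5_000_000))
```

Under these assumed numbers a colonisation is almost certain over five million years, yet on average a million years pass between arrivals, which leaves plenty of time for populations to diverge in isolation.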

One might also add that there is another fairly obvious reason why one would find patterns of allopatry even if the same region was reached two or three times by the same lineage: competitive exclusion. It is a well established, empirically tested insight of biogeography that islands (and by extension restricted areas in general) have a carrying capacity, both in overall diversity and in the number of species trying to occupy about the same ecological space. In the case of islands in particular, their species diversity is a function of size (the more land, the more species, mostly because lower area increases extinction rate) and distance from the nearest larger land mass (the closer, the more species, mostly because of higher immigration / dispersal rates filling up the species pool).

This makes a lot of intuitive sense. Assume you have a seed of a continental shrub species blown onto an island that so far has only been colonised by mosses, lichen, one species of grass, and a bunch of insects eating the former. Your shrub niche is still free, and there is nothing on the island that is adapted to eating you, so even if at first you are in a bit of trouble genetically (inbreeding) and ecologically (not used to this soil and climate) you have a reasonable chance of establishing. Now fast forward 500,000 years, and the single seed of that shrub has diversified into six species occupying every niche on the island that they could adapt to in that time, forming thick scrubland from coastal dunes to the highest peak. A new seed of a related continental shrub species ends up on the island - but now everything is occupied by relatives that have become well-adapted to this new environment. Are we really surprised that the second comer will have a harder time establishing?
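
The dependence of island species number on area and isolation mentioned above can be written down as a toy model. The sketch below uses the simplest textbook form of the equilibrium idea, with linear immigration and extinction rates; the function name and all parameter values are my own assumptions for illustration.

```python
def equilibrium_species(pool, max_immigration, max_extinction):
    """Species number at which immigration equals extinction.

    Assumed linear rates for an island holding S of `pool` species:
      immigration(S) = max_immigration * (1 - S / pool)
      extinction(S)  = max_extinction * S / pool
    Setting the two equal gives S* = pool * I / (I + E).
    """
    return pool * max_immigration / (max_immigration + max_extinction)

# Hypothetical values: isolation lowers immigration, small area raises extinction.
print(equilibrium_species(100, 2.0, 1.0))  # near, large island
print(equilibrium_species(100, 0.5, 1.0))  # far, large island
print(equilibrium_species(100, 2.0, 4.0))  # near, small island
```

In this toy version both the distant island and the small island settle at half the species number of the near, large one.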

Time-calibration of phylogenies, again

We had that one already in the Ung et al paper, but once more:
"Model-based methods, with rare exceptions, present molecular divergence ages as falsifications of early origins, at or before continental breakup, even though they are calibrated by fossils that can generate only minimal divergence dates. Although it is widely claimed that molecular-clock analyses are generate [sic] evidence of dispersal (Sanmartin et al., 2008), molecular divergence estimates artifically constrain the maximum age of taxa that may be much older than their oldest fossil or the age of the current island they occupy (Heads 2009a, 2012, 2014a, 2014b, 2016)."
I like the little caveat "with rare exceptions", although it is unclear what it refers to. But it is not a method, but the researcher using a method, who would draw the conclusion that a lineage diverging 12 Mya would not have diverged because of a tectonic event that happened 120 Mya. And yes, that conclusion makes a lot of sense to me, and no, "model-based" methods do not magically transform minimum ages into maximum ages. This has been discussed repeatedly in rebuttals to Heads' papers. What is more, people have run analyses using the alternative approach suggested by Heads and in the present paper and found that the results are generally absurd, such as pushing the age of the daisy family back before the origin of multi-cellular life.
"The timing of ancestral differentiation may be assessed either by fossils (including molecular extrapolations) or tectonic-biogeographic correlation."
First, fossil calibration and estimated substitution rates are really two completely different data sources, so the former does not really "include" the latter. Second, using continental breakup to calibrate splits in the phylogeny would, as mentioned before, be circular reasoning. It would build the assumption of vicariance into the analysis to subsequently conclude vicariance as a result. I think that's not how science is supposed to work.
"Fossil data provide only the minimum known-age of taxa and [sic] fossils are often lacking for clades of interest to Macaronesia. In tectonic correlation, the estimate of clade age is more precise, because it refers to a particular, dated event, rather than a minimal (fossil-calibrated) age."
Yes, a fossil provides a minimum age. But unless I severely misunderstand something, a continental break-up could, at best, provide only a maximum age, if we assume that divergence would not have been possible before break-up. (And even that seems fishy to me, given that there are plenty of speciation events on the same landmass.) If it were to be taken as "precise" that would, once more, automatically exclude the possibility that the divergence happened later, after dispersal from one continent to the other, and that would be circular reasoning.

Even the vicariance approach would need long distance dispersal to work

Finally, I am puzzled by the idea of how the lineages would have stayed in place after the supposed vicariance event that would have happened long before the present islands came into existence:
"Island biota survives erosion and subsidence of island habitats by local dispersal onto newer volcanoes"
What I don't get is this: if a vicariance biogeographer can accept that a species hops across the ocean from one volcanic island to another, why can they not accept that it hops across the ocean from Africa onto one of the volcanic islands? What's the difference? Why is this discussion taking place again? I must be missing something very subtle here.

Friday, May 19, 2017

Reading up on biogeography part 2: Panbiogeographic Track Analysis

The second paper in this little series of posts is
Romano MG et al, 2017. Track analysis of agaricoid fungi of the Patagonian forests. Australian Systematic Botany 29: 440-446.
What I appreciated about reading it was first that it was concisely written, and second that it gave me insight into the Panbiogeographic methodology of Track Analysis. It had so far been merely a bunch of arcane terms to me, which of course makes it impossible to judge its meaning. And in contrast to the previous paper, which left out most of the details of its methodology and instead referenced earlier papers, this one gives a clear explanation. This kind of stuff is exactly why I am reading through the journal issue.

So, how exactly does Track Analysis work?

First, you need species with disjunct areas of distribution - or at least species that are poorly enough sampled that they appear to be disjunct. Then you draw a line along the shortest distance between any two of their occurrences. Let's assume we have a species occurring on two islands of this little landscape I just generated in GIMP:

Panbiogeographers call this red line, with the occurrences of the species forming the end points, a Track.
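
For species with more than two occurrences, my understanding is that the individual track is the shortest set of lines connecting all of them, which is a minimum spanning tree. A minimal Python sketch, assuming planar coordinates and invented localities (neither of which comes from the paper):

```python
import math

def individual_track(points):
    """Minimum spanning tree over occurrence points (Prim's algorithm).

    points: dict mapping locality name to (x, y) in arbitrary planar units.
    Returns a list of (locality, locality, length) edges.
    """
    names = list(points)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the shortest link from the tree to a locality not yet in it.
        for a in in_tree:
            for b in names:
                if b in in_tree:
                    continue
                d = math.dist(points[a], points[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Hypothetical occurrences of one species on three islands.
occurrences = {"island_1": (0, 0), "island_2": (3, 0), "island_3": (3, 4)}
for a, b, d in individual_track(occurrences):
    print(a, b, round(d, 2))
```

With only two occurrences this reduces to the single red line described above.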

If you have more than one species showing the same Track, you promote that line on the map to a Generalised Track:

To cite the present paper, in panbiogeographic logic "a generalised track ... allows inference of the existence of an ancestral biota widely distributed and fragmented by vicariance events, suggesting a shared history."
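
The promotion step can be sketched in Python as counting how many species share a given line. I am assuming here that tracks have already been reduced to links between named localities; the species, the localities and the threshold of two are illustrative only.

```python
from collections import Counter

def generalised_tracks(tracks, min_species=2):
    """Return links that occur in the tracks of at least `min_species` species.

    tracks: dict mapping species name to a list of (locality, locality) links.
    """
    support = Counter()
    for links in tracks.values():
        # Treat links as undirected and count each at most once per species.
        support.update({frozenset(link) for link in links})
    return {tuple(sorted(link)): n
            for link, n in support.items() if n >= min_species}

# Hypothetical tracks for three species.
tracks = {
    "fungus_a": [("north", "south")],
    "fungus_b": [("south", "north"), ("south", "east")],
    "fungus_c": [("east", "west")],
}
print(generalised_tracks(tracks))
```

Note that the sketch only records that a link is shared; it says nothing about why, which is exactly the inferential step in question.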

Now you may come up with other tracks in the same study group that do not run parallel. Where generalised tracks cross each other, panbiogeographers draw a circle with an X in it and call that place a Node, like this:

In this case, their interpretation is that this is "a complex area, where different ancestral biotic and geological fragments interrelate in space-time as a consequence of terrain collision, docking or suturing".

Aaaaand... that was it, really. Draw some lines on the map, conclude vicariance and "complexity". The rest of the conclusions in the present paper are largely about the need for more sampling, and that fungi can also be used as a study group.

Does this approach make sense?

Unfortunately, I don't really see it. The logic behind the panbiogeographic interpretation of Generalised Tracks is that patterns of disjunction shared by several taxa are evidence of vicariance, presumably because they assume that chance dispersal would have to be utterly random and create different distributional patterns in each and every species.

But a little contemplation should blow that idea out of the water. There are several other good reasons why disjunct ranges can be shared across taxa. One would be an a priori lack of alternative habitat - if you have two wet patches and otherwise only steppe, then all wetland species will be restricted to those two patches, even if one of the two wetlands was colonised from the other entirely through long distance dispersal. And that restriction alone will produce a shared history, without vicariance. Another option would be prevailing wind or ocean currents, which make long distance dispersal decidedly more probable in some directions even as it is still a stochastic process (dice, but a bit loaded) and, more importantly, not vicariance.

The interpretation of Nodes as showing things like terrain collision also seems to be missing a few crucial steps, at least in my eyes. Don't get me wrong, I am as aware of fossil ranges being an important part of evidence in geology as the next biologist, but still I'd actually prefer to consult a geologist instead of trying to deduce geological history from patterns of distribution alone.

Finally, this whole approach appears to have a weakness that seems quite critical. Science does not proceed by knowing how to confirm, it proceeds by knowing how to reject a hypothesis. Now the question here is this. Yes, panbiogeographic track analysis is apparently designed to conclude vicariance and an area being "complex". But if a disjunction really has not been caused by vicariance, how would a panbiogeographer conclude that? Would they ever do so?

That, alas, is left unexplained, at least in this paper.

Sunday, May 14, 2017

Reading up on biogeography part 1: area cladogram for the southwest Pacific

As mentioned in the previous post, I am hoping to learn more about the reasoning behind panbiogeography and area cladistics by going through the relevant papers in the recent special issue of Australian Systematic Botany. Starting with area cladistics, I have today finished reading

Ung V, Michaux B, Leschen RAB, 2017. A comprehensive vicariant model for Southwest Pacific biotas. Australian Systematic Botany 29: 424-439.

To the best of my understanding the key steps of the study can be summarised in a very bare-bones fashion as follows. The authors...
  1. State that very little is still known about area relationships, as most research focuses on ancestral area inference for individual taxa.
  2. Summarise at length - over nearly four pages - the geological and tectonic history of the region. I cannot judge any of this at all and will consequently take it as given, although it is puzzling that no reference seems to be provided for the claim that the now largely submerged region had much more dry land when it broke away from Gondwana.
  3. Divide the study region into areas - the details don't matter for present purposes.
  4. Compile 76 phylogenies for plant and animal taxa occurring in the study region, and replace the species with the combination of areas in which they occur.
  5. Discuss the 'problems' of incongruence between the area relationships in these individual phylogenies, of terminal taxa occurring in several areas, which they call "taxonomic paralogy", and of the same area occurring in different branches of a phylogeny, which results in what they call "paralogous nodes". They decide to exclude these confounding nodes and to use only "paralogy-free subtrees" by applying a "transparent method" that I had not heard of before.
  6. Turn the trees into "three-item statements" and use those to produce a consensus area cladogram.
  7. Present the consensus area cladogram.
  8. Argue that one larger area that they had hypothesised is not a "real biogeographic entity" because it is paraphyletic on the area cladogram.
  9. Argue that New Caledonia's "highly endemic flora and fauna are ancient" because of its "basal" position on the area cladogram. I am not sure that this follows, and am a bit concerned about the potential of scala naturae thinking here, but that is not the main point here.
  10. Agree with panbiogeographer Michael Heads that any and all time-calibrated phylogenies are unreliable. Then they proceed to a lengthy attempt at time-calibrating their area cladogram based on plate tectonics.
I would like to explore in a bit more depth items #1, #5 and #10.
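
As an aside on step 6, the conversion of a tree into three-item statements can be sketched in Python as follows. This is only the general idea as I understand it, applied to a hypothetical paralogy-free areagram; the authors' own procedure, including how statements are weighted and combined, is more involved.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(sub) for sub in tree))

def three_item_statements(tree):
    """Set of (outsider, (a, b)) statements implied by the tree.

    Each statement says that a and b are closer to each other than
    either is to the outsider. Assumes every area appears only once.
    """
    all_leaves = leaves(tree)
    statements = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        inside = leaves(node)
        if not is_root:  # the root groups everything and excludes nothing
            for a, b in combinations(sorted(inside), 2):
                for out in all_leaves - inside:
                    statements.add((out, (a, b)))
        for sub in node:
            visit(sub, False)

    visit(tree, True)
    return statements

# Hypothetical areagram with four areas.
areagram = ("A", ("B", ("C", "D")))
for s in sorted(three_item_statements(areagram)):
    print(s)
```

For this four-area example the tree yields four distinct statements, which is the kind of unit that is then pooled across all 76 phylogenies.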

Does the concept of area relationships even make sense?

I cannot say that this paper has me convinced. To quote a few sentences where the authors themselves discuss problems:
In real-world situations, individual areagrams are unlikely to be congruent with each other and the problem, therefore, arises as to how best to deal with this incongruency [sic]. The main sources of incongruency [sic] are the occurrence of widespread taxa (multiple areas on a single terminal, or MASTs, for short), redundant areas (resulting in taxonomic paralogy), missing areas and inadequate methods of analysis (dos Santos 2011). Redundancy, the repeated occurrence of the same area in different branches on the areagram is nigh on universal and results in paralogous nodes. [...] [These] yield no information about area relationships and obscure the real relationships between areas.
Honestly, when I read this I am drawn to a very different conclusion than that we have to exclude all "paralogous nodes": maybe there is so much noise because stuff moves around too much. In other words, the concept of an areagram or area cladogram makes exactly as much sense as trying to force members of the same sexually reproducing animal population into a phylogenetic tree. Where there is no phylogenetic structure, phylogenetic trees are not an appropriate representation of the data.

Another issue I wonder about is the use of the term paralogy in this context. The word comes from gene evolution. Imagine a gene has duplicated in a distantly ancestral species, and subsequently both copies A and B evolved to have different functions. (This is, of course, one of the main ways in which new genes come into existence.) All descendant species inherit both genes. If we now look at a bunch of descendant species and want to figure out their relationships, we need to make sure we compare only the A copies or only the B copies. Comparing the A copy from one descendant with the B copy of the other misleads our analysis; the A and B copies are called paralogues of each other, and the A copies from different species are called orthologues of each other.

What I do not understand is how the situation in areagrams is supposed to be equivalent enough to use the same terminology. Areas are not genes that are inherited by species lineages. At best, it is the other way around: if the assumptions of area cladistics are true (which I doubt), then species lineages are comparable to genes inherited by areas. The same mistake as taking two paralogues as orthologous in genetics would then be to treat two species lineages in different areas as orthologues although they already diverged before continental breakup.

But the way the word is used here is in the former sense, when contemplating areas on a phylogeny, not when contemplating lineages in areas. This use of genetic terminology is rather confusing, I have to say.

What is the problem with time-calibrated phylogenies?

The open access de Queiroz paper in the same issue does a good job at discussing Heads' and the present authors' criticism of molecular dating, so just very quickly, there are two arguments here:

First, that
using substitution rates derived from modern taxa and then applying them over evolutionary time, often to groups only distantly related, is not justifiable
This is true as far as it goes, but the problem is that to the best of my understanding for the conclusions favoured by Heads to be realistic, substitution rates would have to be off by an utterly unrealistic factor. We are talking cases here where he sees a divergence as having happened tens of millions of years ago when the molecular data say a few million years. And why would we assume such massive shifts conveniently in just the direction needed to make vicariance a viable explanation, and in the absence of any other argument? Sorry to say, but that looks a bit like ad-hoccery to me.

I hope this is not taken to be too inflammatory, but it reminds me of those young earth creationists who are worried about the starlight problem and then argue that a few thousand years ago the speed of light must have been orders of magnitude higher. There is, indeed, a very practical parallel: just like the creationists in question do not take into account what such a change would do to other physical parameters (E=mc^2, meaning that our planet would have been incinerated), so in this case nobody seems to consider what a massively higher mutation rate would have done to the biology of the affected species.

The second argument is that
the same can be said for dating phylogenies using the age of the oldest fossil, which, despite giving only a minimum age for divergence, becomes a maximum estimate by proxy (Heads 2014b)
As has been discussed at length in rebuttals of Heads, including again in the aforementioned de Queiroz contribution, this is half nonsense and half, let me say, odd. It is nonsense in the sense that fossils are indeed used as minimum ages, not as maximum ages. I have myself recently used the chronos function in the R package ape to time-calibrate trees, and you simply tell the analysis to make a divergence no younger than so and so, and that's that. Admittedly you generally also want to have some realistic maximum age for the entire tree, but that can be way higher than any minimum age you set. In fact, I wrote a blog post about this stuff not too long ago.

In Bayesian analyses, it is true, it is necessarily the case that there will be a limit to how much older than the fossil the results can realistically be because calibration is usually done with priors. The user sets a prior probability distribution where the probability of divergence, which necessarily has to add up to 100% over all possible times, will become so close to zero as to make no difference if we only go far enough back in time. It is, after all, impossible to stretch 100% out over infinity years and still have 10% per million years left.

But here is where the argument also gets distinctly odd. What Bayesian phylogeneticists do in practice is to set a relatively high probability around the time where the fossil was dated, and then have it peter off towards the past. The question is now: what else would one do? Is it not eminently reasonable to assume that the further into the past we go from the known existence of a lineage, the less likely it is that it already existed? Surely it is reasonable to assume that if the oldest known fossil of a plant genus is from 20 Mya, then it is quite likely that the genus already existed around, say, 21 Mya, a bit less likely that it existed 30 Mya, still less likely that it existed 50 Mya, and vanishingly unlikely that it existed as long as 200 Mya?
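
As a toy version of such a prior, take an exponential distribution offset by the fossil age. The 20 Mya fossil follows the example above; the mean of 10 million years for the tail is an arbitrary assumption of mine, and real analyses often use other shapes such as a lognormal.

```python
import math

def prob_older_than(age_mya, fossil_age_mya, mean_excess_myr):
    """Prior probability that the divergence is older than `age_mya`,
    under an exponential prior offset by the fossil (minimum) age."""
    if age_mya <= fossil_age_mya:
        return 1.0  # the fossil rules out anything younger
    return math.exp(-(age_mya - fossil_age_mya) / mean_excess_myr)

for age in (21, 30, 50, 200):
    print(age, prob_older_than(age, 20, 10))
```

The prior never declares an older age impossible, it only makes it increasingly improbable, and the fossil itself still acts strictly as a minimum.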

The problem with time-calibrating a tree based on plate tectonics is, in turn, that it front-loads the analysis with the assumption that there is no dispersal between areas. For the purposes of the discussion around vicariance and dispersal it is circular reasoning.

But to end on a positive note, despite approvingly citing panbiogeographers the authors of the present paper actually do not seem to argue that dispersal between areas is impossible; they merely kick out the data that I would interpret as showing such dispersal to infer the 'real area relationships'. Admittedly that could be seen as equivalent to kicking out all the genes I share with my father to claim that my genetic relationship with my mother is the 'real' one, but well, it still makes more sense to me than hostility to the mere possibility of dispersal!

Thursday, May 11, 2017

Panbiogeography and area cladistics galore

Today the newest issue of Australian Systematic Botany came out, and oh boy is the content interesting. It is the first in a series of special issues on biogeography - but that is not the main point, to which I will come later.

I have tried before to systematise for myself what biogeography is actually about. What is its research program? Trying again, and perhaps in a way that reflects my current thinking:

1a. Inferring ancestral ranges, and closely related to that...
1b. Inferring biogeographic events and their timing.

This kind of research is focused on a given clade and tries to understand how its species came to occupy the ranges they do today. It uses a number of approaches and software tools that attempt to infer ancestral ranges given a generally time-calibrated phylogenetic tree of the study group, contemporary distributions at the tips of the tree, and a model specifying what biogeographic processes are 'allowed' to happen. Examples include originally parsimony-based Dispersal And Vicariance Analysis (DIVA), the Dispersal, Extinction and Cladogenesis model (DEC), and others.

A typical result would be on the lines of, "the ancestral range of this genus was in the south-east of the continent, and we estimate ca. three sympatric speciation events and ca. two vicariance events in its history", often illustrated with a phylogeny whose branches are labelled with the relevant ancestral ranges and biogeographic events.

2. Species distribution modelling

Research in this field tries to estimate where a species can occur, usually given presence data for the species and climatic, soil and other data for those known locations. This can be used, for example, to predict approximately where in Australia an invasive species could spread if it were introduced from its native range in, say, South America. Computationally intensive, species distribution modelling is a relatively recent development. That being said, it was the big hot new thing when I did my first postdoc, so "recent" is to be taken as relative.

Obviously, a typical result would be a map with different colours indicating different probabilities of the species being able to exist in those locations.
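To make the idea concrete, here is a deliberately crude sketch of a rectangular climate envelope, which is far simpler than the methods actually used in the field. All site names, variable names and values are invented for illustration.

```python
# Hypothetical presence records: climate values at sites where the species is known.
presences = [
    {"mean_temp_c": 17.5, "annual_rain_mm": 620},
    {"mean_temp_c": 21.0, "annual_rain_mm": 480},
    {"mean_temp_c": 19.2, "annual_rain_mm": 710},
]


def fit_envelope(records):
    """Record the observed minimum and maximum of every variable."""
    variables = records[0].keys()
    return {
        v: (min(r[v] for r in records), max(r[v] for r in records))
        for v in variables
    }


def is_suitable(envelope, site):
    """A site counts as suitable if every variable lies inside the envelope."""
    return all(low <= site[v] <= high for v, (low, high) in envelope.items())


envelope = fit_envelope(presences)

# Hypothetical candidate sites in the region where the species might spread.
candidate_sites = {
    "site_x": {"mean_temp_c": 18.0, "annual_rain_mm": 600},
    "site_y": {"mean_temp_c": 26.5, "annual_rain_mm": 1500},
}
prediction = {name: is_suitable(envelope, site)
              for name, site in candidate_sites.items()}
print(prediction)
```

Real analyses replace the yes/no answer with a continuous suitability score per map cell, which is where the coloured maps come from.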

3. Spatial studies

This field divides a study region into cells, often equal area grid cells, and attempts to quantify diversity metrics such as species richness, endemism, and phylogenetic diversity. Hotspots of diversity can then be targeted for conservation, or they simply provide information on the evolution of present diversity, especially if they are hotspots of palaeo- or neoendemism. This work has only really become possible with the availability of large biodiversity databases of geo-coded specimens.

A typical result would be on the lines of, "the study group shows the highest endemism scores in the south-west and the tropics".
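As a minimal sketch of what such an analysis computes, the following bins made-up occurrence records into one-degree cells and calculates species richness and weighted endemism (each species contributes one divided by the number of cells it occupies). The records are hypothetical, and one-degree cells are a simplification because they are not equal in area.

```python
import math
from collections import defaultdict

# Hypothetical geo-coded records: (species, longitude, latitude).
records = [
    ("sp1", 115.2, -33.5), ("sp1", 115.7, -33.1),
    ("sp2", 115.4, -33.9), ("sp2", 149.1, -35.3), ("sp2", 130.2, -12.9),
    ("sp3", 149.6, -35.8), ("sp3", 130.8, -12.4),
]

CELL_SIZE = 1.0  # degrees; an assumption made for simplicity


def cell_of(lon, lat, size=CELL_SIZE):
    """Return the grid cell (as integer column and row) containing a point."""
    return (math.floor(lon / size), math.floor(lat / size))


species_in_cell = defaultdict(set)
cells_of_species = defaultdict(set)
for species, lon, lat in records:
    cell = cell_of(lon, lat)
    species_in_cell[cell].add(species)
    cells_of_species[species].add(cell)

# Species richness: number of species recorded in each cell.
richness = {cell: len(spp) for cell, spp in species_in_cell.items()}

# Weighted endemism: narrowly distributed species count for more.
weighted_endemism = {
    cell: sum(1 / len(cells_of_species[s]) for s in spp)
    for cell, spp in species_in_cell.items()
}

for cell in sorted(richness):
    print(cell, richness[cell], round(weighted_endemism[cell], 3))
```

All three cells have the same richness here, but the cell holding the single-cell endemic sp1 scores highest on endemism, which is exactly the kind of contrast these studies are after.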

4. Bioregionalisation

The idea here is to distinguish bioregions across the landscape that are significantly different from each other in their species or lineage content, and to figure out where their approximate borders are. Traditionally this was done very intuitively and based mostly on the presence or absence of key taxa. Today researchers often use computers and grid cell-based approaches similar to those in spatial studies, only that they compute pairwise dissimilarity scores between grid cells. Cells are then clustered into bioregions or, in the most novel approaches, submitted to network analysis.

A result might read: "Our analysis shows four major bioregions, the monsoonal tropics, the Eremaean, the south-west, and the temperate south-east. The border between the monsoonal tropic cluster and the Eremaean cluster is, however, considerably further south than estimated by a previous study..."
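The grid cell-based version of this can be sketched in a few lines. The example below uses invented cells and species, Jaccard dissimilarity, and a simple single-linkage grouping with an arbitrary threshold; published studies typically use other dissimilarity measures and clustering algorithms, so this only illustrates the principle.

```python
from itertools import combinations

# Hypothetical species lists for four grid cells.
cells = {
    "c1": {"sp1", "sp2", "sp3"},
    "c2": {"sp1", "sp2"},
    "c3": {"sp4", "sp5"},
    "c4": {"sp4", "sp5", "sp6"},
}


def jaccard_dissimilarity(a, b):
    """One minus the proportion of shared species among all species present."""
    return 1 - len(a & b) / len(a | b)


# Pairwise dissimilarity scores between all cells.
dissimilarity = {
    (a, b): jaccard_dissimilarity(cells[a], cells[b])
    for a, b in combinations(sorted(cells), 2)
}


def single_linkage(cell_names, scores, threshold):
    """Group cells that are linked by any dissimilarity at or below threshold."""
    group = {c: {c} for c in cell_names}
    for (a, b), d in scores.items():
        if d <= threshold and group[a] is not group[b]:
            merged = group[a] | group[b]
            for member in merged:
                group[member] = merged
    unique_groups = {frozenset(g) for g in group.values()}
    return sorted(sorted(g) for g in unique_groups)


bioregions = single_linkage(cells, dissimilarity, threshold=0.5)
print(bioregions)
```

With these made-up data the cells fall into two groups that share no species at all; real data are of course much messier, which is why the position of the borders is the interesting result.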

5. Area cladograms

And this is where I am leaving my comfort zone, because while I have used #1 and #3 and at least dabbled in #2 and #4, this one is weird to me and will probably remain so.

The idea in this case is to use areas or bioregions as the units of an analysis that is supposed to show how the areas are related. In other words, something like a phylogenetic analysis of areas, using their species content as data, and with a result on the lines of "the Australian temperate rainforests are sister to the New Zealand temperate rainforests, and together they are sister to the Patagonian ones" (not necessarily a true result, just to get the concept across). There are a few methods available for this, and they are generally parsimony based and by now quite dated.

The obvious problem here is that this whole procedure is based on a number of assumptions that I can only consider dubious. Just like phylogenetic reconstruction of the tree of life must assume, in that case rather sensibly I believe, that there is no significant gene flow between, say, cattle and primroses, building a tree of bioregions must assume that there is no significant dispersal or species exchange between the various areas it uses as units of analysis. And that is where it all falls down for me, because of course species disperse happily from area to area. There are no barriers that are remotely as strong as the barriers to gene flow between different species.

The present issue of Australian Systematic Botany

So we arrive at the present issue of Australian Systematic Botany, which is, as mentioned, the first in a planned series on biogeography. My personal perception is that of the above fields of research, the cutting edge is today in ancestral range inference and spatial studies. Species distribution modelling is often more seen as part of ecology rather than systematics; the scope for large numbers of bioregionalisation studies is obviously somewhat limited, given that there are considerably fewer bioregions than species; and I thought that area cladograms were more a thing of the 1980s or so.

But the papers in the present issue show that they are still being done - and so is panbiogeographic track analysis!

This is going to be very interesting, because when I read either of these approaches I have the same feeling as when examining some of the pro-paraphyly literature: intellectual challenge in the sense of having to understand a mode of thinking that is very, very alien to me. But that just makes it more important to try and follow the reasoning, even should it ultimately not be found convincing.

In particular I am looking forward to seeing a track analysis in action when I come to those papers, because so far it really has not clicked for me what they are supposed to show and how their conclusions can possibly be justified.

To summarise, the articles in the issue are:

1.&2. Two very short introductions.

3. Alan de Queiroz rebutting an earlier article by panbiogeographer Michael Heads. This one is open access, and otherwise stands out in that it seems to be the only article by a mainstream biogeographer. As I pretty much agree with everything it says I will not have any comments on it.

4. Ung et al. constructing an area cladogram for "southwest Pacific biotas", with the abstract indeed containing phrases such as "the islands of the Southwest Pacific are more closely related to each other than they are to Australia". Interestingly, they call their results a "model".

5. Romano et al's panbiogeographic track analysis of agaricoid fungi of the Patagonian forests.

6. An extremely long article by panbiogeographer John Grehan on relationships between America and Macaronesia.

7. Martinez et al. conducting a panbiogeographic track analysis on plants and animals of the Argentinean pampas.

8. And with Corral-Rosas & Morrone another area-cladistic analysis, this time with Mexico as the study area.

Ancestral range reconstruction for individual clades or spatial analyses, on the other hand, are clearly MIA. So at a minimum one would have to say that this is, at the moment, still a rather narrow representation of the field of biogeography.

Friday, May 5, 2017

A good read on superhuman artificial intelligence

This essay written by Kevin Kelly must be the most sensible text on superhuman artificial intelligence (AI) and the allegedly imminent "singularity" that I have ever read.

Although it appears to get a bit defensive towards the end, I am in complete agreement with all main points. In my own words, and in no particular order, I would like to stress:

There is no evidence that AI research is even starting to show the kind of exponential progress that would be required for an "intelligence explosion".

There is no evidence that intelligence can be increased infinitely; in fact there are good reasons to assume that there are limits to such complexity. What is more, there will be trade-offs. To be superb in one area, an AI will have to be worse at something else, just like the fastest animal cannot at the same time be the most heavily armoured. Finally, we don't want a general purpose AI that could be called "superhuman" anyway, even if it were physically possible. We want the cripplingly over-specialised ones. That is what we are already doing today.

Minds are most likely substrate-dependent. I do not necessarily agree with those who argue that consciousness is possible only in an animal wetware-brain (not least because I am not sure that the concept of consciousness is well defined), but it seems reasonable to assume that an electronic computer would by necessity think differently than a human.

As for mind-uploading or high-speed brain simulation, Kelly points out something that I had not previously thought of myself, even when participating in relevant discussions. Simulations are caught in a trade-off between being fast because they leave lots of details out on one side, and being closer to reality but slower, because more factors have to be simulated. The point is, the only way to get the simulation of, say, a brain to be truly 1:1 correct is to simulate every little detail; but then - and this is the irony - the simulation must be slower and more inefficient than the real thing.

Now one of the first commenters under the piece asked how that can be true when emulators can simulate, 1:1, the operating system of computers from the 1980s, and obviously run the same programs much faster in that little sandbox. I think the error here is to think of the mind as a piece of software that can be copied, when really the mind is the process of the brain operating. Simulating all the molecules of the brain with 1:1 precision, and faster, on a system that consists of equivalent molecules following the same physical laws seems logically impossible.

Finally, one point that Kelly did not make concerns the idea that a superhuman AI could solve all our problems. He discussed that more than just fast or clever thinking is needed to make progress, experiments for example, and those cannot be sped up very much. But what I would like to add is that of our seemingly intractable problems the really important and global ones are political in nature. We already know the solutions, it is just that most people don't like them, so they don't get implemented. Superhuman AI would merely restate the blatantly obvious solutions that human scientists came up with in the 1980s or so, e.g. "reduce your resource consumption to sustainable levels" or perhaps "get the world population below three billion people and keep it there". And then what?

Friday, April 28, 2017

Arguments for paraphyletic taxa: orchid taxonomy edition

As usual, the following is my personal opinion and not necessarily the official stance of any person or institution that I am affiliated with or related to, and so on.

One of the recurrent topics of this blog is the controversy around the acceptance of paraphyletic taxa. Although I have become a bit jaded over the years, my original stance was, and to a certain degree still is, that I am trying to understand the reasoning offered by colleagues who think that paraphyletic taxa are acceptable or even unavoidable. Because, who knows?, there may be a novel argument that shows cladism to be misguided after all, and I want to keep an open mind.

Sadly, however, it is mostly the same few talking points that lost the discussion in the 1970s and 1980s, resurfacing again and again. It is rare, although not unheard of, that a new and truly interesting argument is presented.

Today's candidate paper freshly online is
Baranow et al. 2017. Brasolia, a new genus highlighted from Sobralia (Orchidaceae). Plant Systematics and Evolution. DOI 10.1007/s00606-017-1413-z
The authors present phylogenetic analyses and change the classification of the titular orchid genus. The only point of interest for present purposes is that they argue for the recognition of Sobralia section Sobralia at the genus level despite that group being paraphyletic, and in what follows I do not want to imply any criticism of any other part of the publication or of the hard work the authors have put into their study. It is only the theory of classification that I like to hash out.

The argumentation in favour of paraphyletic taxa runs across three paragraphs in the discussion section. Let's see if I can learn something new!
In the light of phylogenetic outcomes, the proposed taxon is paraphyletic, which means that its species have a common ancestor, but the taxon does not include all its descendants (e.g., Elleanthus).
Polyphyletic taxa also have a common ancestor, so by the reasoning implied here one could justify any classification whatsoever. I am consequently unsure what the point of this first sentence is.
Monophyly in its broader definition describes groups with a common ancestry, including both paraphyletic and monophyletic groups (sensu Hennig 1950); therefore, Hörandl and Stuessy (2010) advocate returning to this broader definition of monophyly and, adopting Ashlock's term, holophyly for monophyly s.str.
Again I am afraid I must be missing the point. The controversy is really about whether we should consistently classify by relatedness or not. I don't mean to be uncharitable, but this could potentially be taken to mean the authors hope that recognising non-monophyletic taxa would become more palatable to mainstream systematists if one could hoodwink them into forgetting what monophyletic means. It would then be equivalent to hoping that your child will accept a mountain hike instead of the promised trip to the beach if you just said "mountains are also a kind of beach" with enough conviction. Nice try, but there will still be no swimming in the ocean, and little Tommy sees right through it.
Paraphyly is a natural transition stage in the evolution of taxa (Hörandl and Stuessy 2010). According to Brummitt (2002), paraphyletic taxa are ''products of the evolutionary process, which is left behind as evolution moves on to a new level of organization.''
The logic of these quotations appears to be as follows: "We really, really want to recognise paraphyletic taxa. So we draw a paraphyletic taxon onto the phylogenetic tree. Look, cladist, there is a paraphyletic taxon in the evolutionary process! Why are you so unreasonable not to accept it?" Unfortunately, circular reasoning does not become more convincing just because it has been published somewhere and can now be cited.

To clarify, there are no paraphyletic taxa out there in nature; there is only a tree of life, and phylogenetic systematists consistently circumscribe taxa on that tree to be monophyletic, while 'evolutionary' taxonomists circumscribe some taxa on that tree to be paraphyletic.
We realize that this is in conflict with commonly accepted phylogenetic methods which declare that monophyly s.str. should be the only criterion for grouping organisms.
A "phylogenetic method" is what produced the orchid phylogeny, so I assume what is meant here is "approach to classification". But whatever, that is not the point, so onwards.
However, a somewhat analogical situation has been recognized within Coelogyne (Gravendeel et al. 2001). In this case, the authors interpreted the morphology of the studied species as not corresponding to the cladograms, probably as a result of convergent evolution and they decided to maintain polyphyletic Coelogyne. Kolanowska and Szlachetko (2016) postulate to maintain paraphyletic Odontoglossum.
This appears to be an instance of the argumentum ad populum, and not even very much populum at that. Consider: is it a good idea to shoot a stapler into your own foot? Okay, so there will have been at least two people in the history of humanity who have done that, so you could now cite them for support. But does that make shooting a stapler into your foot any more sensible? Exactly; a better argument is needed here.

Also, as I only realised some time after first drafting this, the senior author of the present paper is the same as in one of those two references. So this is apparently also an instance of the rarely seen ipse dixit. (It is, of course, valid to cite one's own prior research results, but in this case we are dealing not with an empirical question but simply with the argument that an action is acceptable because it is not unprecedented.)
Recognition of distinctive characters which have evolved in a group is essential for an understanding its evolution (Brummitt 2006).
Quite the opposite, in my eyes: having an accurate classification is essential for understanding evolution, because paraphyletic taxa mislead us about relationships. In the present case, treating Elleanthus as a subgroup of Sobralia would (correctly) show that Elleanthus evolved out of Sobralia, whereas treating Sobralia and Elleanthus as separate genera implies (wrongly) that they are evolutionarily distinct units, side by side.
This point of view is shared by numerous other authors (Sosef 1997; Dias et al. 2005; Nordal and Stedje 2005) who state that traditional classification is the optimal tool for cataloging biodiversity and requires recognition of paraphyletic taxa.
This reads like more argumentum ad populum, and sadly it is left unmentioned why paraphyletic taxa are supposedly required.
We decided to follow the Darwinian (evolutionary) classification, which requires consideration of two criteria: similarity and common descent.
Leaving aside the obvious argument from name-checking here, which is exactly as relevant as using Newton to reject Einstein (and for the same reasons), the problem remains that trying to classify by two criteria at the same time will lead to a useless classification that is not reliably reflecting either.

Assume I have never heard of Sobralia before, and then it is mentioned to me for the first time. Given a phylogenetic classification, I know that it constitutes a natural group whose members are each other's closest relatives. Given a classification as argued for in the present paper, it could be a natural group... but it could also be a group defined by similarity that includes species more closely related to another genus than to any other species of Sobralia. I just won't know.
The approach will allow us to propose a classification based on the phylogenetic relationships, but at the same time it will be practical--with clearly defined and recognizable units.
No, sorry to say so, but it quite simply will not. First, it will not be based on phylogenetic relationships, because in one crucial instance phylogenetic relationships will be ignored. Second, and again, it will not be practical, because if two criteria are mixed the end user cannot know without going back to the original publications whether a given group was circumscribed based on relatedness or based on 'similarity', see above.

Now obviously I understand that this is not a theory paper arguing for a wholesale shift in our practice of classification. What is more, I know we cannot expect all solutions to be easy or all groups immediately to be circumscribed as monophyletic the moment somebody looks at them. I can happily accept a paper concluding "we know this group is probably paraphyletic, but for the moment we don't have a better solution, let's wait until more data are in", or "the group is clearly polyphyletic, but at this moment we do not want to make hasty taxonomic changes", or something along those lines.

But the three paragraphs quoted above were specifically meant to justify the ultimate recognition of paraphyletic genera, so one would expect to find a convincing justification. Sadly I, personally, have to admit to being anti-convinced by this paper, by which, as previously mentioned, I mean that an argument had the effect of making me even more convinced of the idea it was meant to refute, in this case classification by relatedness.

Sunday, April 23, 2017

The unexpected dangers of rerooting phylogenies

A couple of days ago a colleague circulated the following recently published paper,
Czech L, Huerta-Cepas J, Stamatakis A, 2017. A critical review on the use of support values in tree viewers and bioinformatics toolkits. Molecular Biology and Evolution. DOI: 10.1093/molbev/msx055
The authors found something that, in retrospect, seems glaringly obvious. Phylogenetic trees are nearly always saved in the Newick format of nested brackets, for example as follows:
In this case we are dealing with a rooted tree of only three taxa. A is sister to a clade of B and C. The numbers after the colons indicate branch lengths, and the 99 directly after the brackets is a support value, most likely bootstrap, for the sister group relationship (B,C).
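A string of the following shape fits that description. The branch lengths are invented for illustration, and the regular expressions are only a quick way of showing which numbers are labels and which are lengths; they are not a real Newick parser.

```python
import re

# Illustrative tree: A sister to (B, C); branch lengths are made up.
newick = "(A:0.3,(B:0.1,C:0.2)99:0.4);"

# A label directly after a closing bracket belongs to an internal node.
internal_nodes = re.findall(r"\)([^:,;()]+):([0-9.]+)", newick)

# A label directly after an opening bracket or a comma belongs to a tip.
tips = re.findall(r"[(,]([^:,;()]+):([0-9.]+)", newick)

print(internal_nodes)
print(tips)
```

Note that the format itself gives no hint that 99 is a support value rather than, say, a name for the ancestor of B and C; that is purely a matter of convention.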

The problem explored by Czech et al. is ultimately that under the Newick format branch support values or other branch annotations are not actually attached to branches; they are attached to nodes. In this case, for example, the 99 is attached to the node that is the hypothetical common ancestor of B and C. Logically, because the tree is rooted we can assume that the support value is meant for the branch leading down from the ancestor of B and C towards the root.

But what if we reroot, a posteriori, a tree with node annotations that are really meant to be branch annotations? (My post on the various options for rooting phylogenies can be found here.) Czech et al. found that the behaviour of these values is undefined. For some software they were able to demonstrate that the branch annotation ended up on the wrong branch after rerooting.
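The following sketch tries to illustrate how that can happen. It is my own toy example, not the behaviour of any particular program: a hypothetical five-taxon tree ((A,B)95,(C,D)70,E) is stored with a pseudo-root R, the support values sit on the internal nodes X and Y, and "rerooting" is simplified to viewing the tree from tip C, which stands in for placing the root on the branch leading to C.

```python
from collections import defaultdict

# Hypothetical unrooted tree ((A,B)95,(C,D)70,E); saved with pseudo-root "R".
edges = [("R", "X"), ("R", "Y"), ("R", "E"),
         ("X", "A"), ("X", "B"), ("Y", "C"), ("Y", "D")]
tips = {"A", "B", "C", "D", "E"}
node_support = {"X": 95, "Y": 70}  # values are attached to nodes, not branches

adjacent = defaultdict(set)
for u, v in edges:
    adjacent[u].add(v)
    adjacent[v].add(u)


def parents(root):
    """Map every node to its parent when the tree is viewed from root."""
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbour in adjacent[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                stack.append(neighbour)
    return parent


def bipartition(side):
    """The split of all tips defined by the tips on one side of a branch."""
    return frozenset({frozenset(side), frozenset(tips - set(side))})


def split_of_branch(child, parent_node):
    """Split defined by the branch between child and parent_node."""
    seen = {child, parent_node}
    stack = [child]
    side = set()
    while stack:
        node = stack.pop()
        if node in tips:
            side.add(node)
        for neighbour in adjacent[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return bipartition(side)


def supports_as_read(root):
    """Read each node's value as support for the branch towards root."""
    parent = parents(root)
    return {split_of_branch(node, parent[node]): value
            for node, value in node_support.items()}


intended = supports_as_read("R")  # as the tree was saved
naive = supports_as_read("C")     # same node labels after 'rerooting' on C

print(intended[bipartition({"C", "D"})])  # 70 belongs to the split CD | ABE
print(bipartition({"C", "D"}) in naive)   # that split has lost its value
print(naive[bipartition({"C"})])          # 70 now sits on the branch to C
```

The value for the clade (A,B) survives because its branch still points towards the new root, whereas the 70 stays with node Y and is now read as belonging to a different branch, consistent with the problem being limited to branches between the old pseudo-root and the new root.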

How serious an issue is that? I guess it depends on what one's practice is. The problem should be pretty much limited to analyses producing unrooted trees (e.g. in RAxML, PAUP or MrBayes) under the assumption of reversibility, where the user then uses outgroup rooting to polarise the tree a posteriori. Any analysis using a clock model would avoid it, as would asymmetric step-matrices or, crucially, those analyses specifying the outgroup before the start of the analysis.

In addition, it seems as if the problem would be limited to a few branches between the pseudo-root used to save unrooted trees and the new root after rerooting, so that most relationships should be fine. I may look at one or two of my published phylogenies to see if I ever had that problem, but I am not worried; in the most recent case where support values were a critical part of my argumentation, for example, they are fairly deep inside the tree, because we sampled widely around the ingroup, and I also used Templeton tests and suchlike to demonstrate the non-monophyly of certain taxa.

Apparently Czech et al. have already achieved some success at getting software providers to make changes that will help solve the confusion around where the branch annotations end up. But nonetheless my main take-home from this is to be less blasé about a posteriori rooting. In the future I will make sure to always define an outgroup already when I set up a PAUP or RAxML run, so that the need to reroot does not arise.

Thursday, April 20, 2017

Botany picture #242: Gentianella muelleriana

Gentianella muelleriana (Gentianaceae) as seen today on the ascent to Mount Stillwell, Kosciuszko National Park, New South Wales. One of the few plants still in flower this late in the season.

In the European Alps, gentians are, of course, generally blue and rarely yellow, but here white seems to be the preferred colour.

Friday, April 14, 2017

Back from Queensland

Unfortunately I was unable to transfer the pictures I had taken to a computer until I got back home, so here are the ones I want to put on the blog all in one post. We drove west from Brisbane to Chinchilla with a major stop along the way, had a day trip north to the vicinity of Wandoan, spent half a day around Chinchilla and Kogan the following day, and then returned to Brisbane.

Rainforest of Boombana in D'Aguilar National Park just west of Brisbane.

A fern climbing up a liana that climbs up a tree trunk.

Not many daisy species like rainforests, but this one does: Acomis acoma (Asteraceae). It was the reason for our detour into D'Aguilar. Admittedly it is not found in the darkest and wettest parts.

View from Jolly's lookout, still in D'Aguilar National Park.

In the Chinchilla area ecologists showed us several field sites and conservation management actions. Near Wandoan we happened to see this population of treelets with rather impressive fruits. Still need to figure this species out; we suspected it may be a native Australian lemon (Citrus, Rutaceae). But I have not seen one of those before, only other Rutaceae genera.

We learned more about what is clearly the most problematic weed in the area, buffel grass (Cenchrus ciliaris, Poaceae). As seen in the picture it forms clumps that suppress a lot of other vegetation but are not dense enough to avoid soil erosion from the gaps between individual plants - the worst of both worlds! It also accumulates litter causing very intense bush fires in a local habitat (dry rainforest and vine thicket) whose key species are not fire-adapted. On the other hand, we were told that farmers liked buffel grass due to its drought resistance and high food value for stock.

One of the species the trip was about is this phyllodinous wattle, Acacia wardellii (Fabaceae). Although currently not in flower it is quite attractive due to its straight growth and strikingly white stem. It is locally common after disturbance but has a very restricted range.

Near Kogan we were shown this site, which I found particularly interesting. The habitat is on a ridge with very poor, rocky, shallow soil, and features species that are very localised to those conditions.

Scattered across the ground was Brunoniella (Acanthaceae). I worked on a genus of the Acanthaceae family for my Diplom thesis (roughly equivalent to honours), so that brought back nice memories. However, while my study group then were large shrubs, this species is herbaceous and in fact seems to remain fairly small. I assume it spends most of its life as dormant root-stock underground and then sends these little shoots up if there has been enough rain to be worth the while.

Monday, April 10, 2017

Back to Queensland

Another trip to south-eastern Queensland, only for a few days this time.

First, the most disappointing window seat I have ever had on a flight. It is not even clear to me why this segment was the only one without a window, and only on my side :-)

The skyline of Brisbane as seen from the cultural district.

The Queensland Herbarium, which is located at the Botanic Gardens. I am very grateful to Ailsa Holland and Tony Bean for the kindness they showed us during our visit today.

Friday, April 7, 2017

Parsimony versus models for morphological data: a recent paper

I have written on this blog before about the use of likelihood or Bayesian phylogenetics for morphological data. In our journal club this week we discussed another of the small but growing number of recent papers arguing that parsimony should be dropped in favour of model-based analyses even for morphology:
Puttick et al., 2017. Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data. Proceedings of the Royal Society B: Biological Sciences 284, doi 10.1098/rspb.2016.2290
Puttick et al. constructed maximally balanced and unbalanced phylogenies, simulated sequence data for them under the HKY + G model of nucleotide substitution, turned the data matrices into binary and presumably unordered multistate integer characters, and then used equal weights parsimony, implied weights parsimony, and Bayesian and likelihood analyses under the Mk model to try and get the phylogenies back with an eye on accuracy (correctness) and tree resolution. In a second approach, they reanalysed previously published morphological datasets to see what happened to controversial taxon placement under the different approaches.

One of the problems with simulation studies is always that they can come out as kind of circular: if you simulate data under a model it is no surprise that the same model would perform best when trying to infer the input into the simulations. In this case Puttick et al. were admirably circumspect in that not only did they simulate their data under a different model (HKY + G) than that ultimately used in phylogenetic analysis (Mk), but they also repeated the analyses until they had achieved a distribution of homoplasy that mirrored the one found in empirical datasets. This is important because morphology datasets for parsimony analysis are scored to minimise homoplasy, while uncritically simulating matrices may lead to much higher levels of homoplasy, thus putting parsimony at a disadvantage.

Still, it should be observed that the HKY + G model is nonetheless unlikely to have produced data that are a realistic representation of morphological datasets, especially considering that the latter would at a minimum also include multistate characters with ordered states. Also, from a cladist's perspective homoplasy in a morphological dataset is a character scoring error waiting to be corrected in a subsequent analysis. But well, of course using zero homoplasy datasets would also have been unrealistic because real life datasets do have homoplasy in them. (And of course parsimony would "win" all the time if there was zero homoplasy, pretty much by definition.)

Now what are the results? To simplify, Bayesian was best at getting the tree topology right, followed by equal weights parsimony and implied weights parsimony, with likelihood coming in last. Likelihood always produces fully resolved trees, and Bayesian produces the least resolved ones. The authors argue, as Bayesians would, that this is exactly how it should be, as it simply tells us that the data aren't strong enough; the other approaches may give us false confidence. (Although of course parsimony and likelihood analyses can likewise involve several different ways of quantifying support or confidence.)

In conclusion, Puttick et al. make the following recommendations:

First, Bayesian inference should be the preferred approach.

Second, future morphological datasets should be scored with model-based approaches in mind. This means that the number of characters should be maximised by including homoplasious ones, because that will allow a better estimate of rates. As this is the exact opposite of the scoring strategy that parsimony analysis requires, habits will be hard to change.

What is more, I have to smile at Puttick et al.'s expectations here: they simulated data matrices of 100, 350 and 1,000 characters. Maybe you can get 400 or so for some animals (if the fossils are well enough preserved), but for any plant group I have worked on I would struggle to get 30. And wouldn't you know it, the single empirical botanical dataset they re-analysed had only 48.

Third, researchers should lower their expectations and get used to living with unresolved relationships, as Bayesian analysis produces less resolved phylogenies.

Our discussion of the paper was wide-ranging. When I commented that one of the advantages of traditional parsimony software is that it easily allows the implementation of any step matrix that is needed (imagine a character where state 0 can change into states 1, 2 or 3, but 1-3 cannot change into each other) I was informed that that is in fact possible in BEAST. That is a pleasant surprise, as I had assumed that it was limited to setting a few simple models such as standard Mk for unordered states, nothing more. However, those who have written XML files for BEAST may want to consider if that is "easy" compared with writing a Nexus file for PAUP. Personally I find BEAST input files very hard to understand.
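For readers who have not met step matrices, here is a minimal sketch of how the hypothetical character mentioned above could be scored on a tree with Sankoff's algorithm, which is the general technique behind step-matrix parsimony. The tree, the tip states and the decision to also forbid reversals from states 1-3 back to 0 are my own assumptions for illustration; this is not how PAUP or BEAST are actually driven.

```python
INF = float("inf")

# cost[i][j] is the cost of changing from state i to state j.
# State 0 can change into 1, 2 or 3; states 1-3 cannot change into each other.
# Assumption: changes back to state 0 are forbidden as well.
cost = [
    [0,   1,   1,   1],
    [INF, 0,   INF, INF],
    [INF, INF, 0,   INF],
    [INF, INF, INF, 0],
]
N_STATES = len(cost)

# Hypothetical rooted tree as nested tuples, and observed tip states.
tree = (("A", "B"), ("C", "D"))
tip_states = {"A": 1, "B": 1, "C": 2, "D": 0}


def sankoff(node):
    """Minimum cost of the subtree below node, for each possible state at node."""
    if isinstance(node, str):  # a tip: only the observed state is possible
        return [0 if s == tip_states[node] else INF for s in range(N_STATES)]
    totals = [0] * N_STATES
    for child in node:
        child_costs = sankoff(child)
        for s in range(N_STATES):
            # Cheapest way to get from state s here to any state in the child.
            totals[s] += min(cost[s][t] + child_costs[t]
                             for t in range(N_STATES))
    return totals


root_costs = sankoff(tree)
tree_length = min(root_costs)
print(root_costs, tree_length)
```

Changing the behaviour of the character is just a matter of editing the cost table, which is the flexibility I had in mind.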

Another concern was that while nucleotide substitution models are based on a fairly good understanding of what can happen to DNA nucleotides which, after all, have a limited number of states and transitions between those states, it is considerably less clear what the most appropriate model for any given morphological character is.

What is more, somebody pointed out that there are essentially two options in a model based analysis: either the likelihood of state transitions is fixed, which is a difficult decision to make, or it is estimated during the analysis. But in the latter case the probability of, for example, changing the number of petals would be influenced by the probability of shifting between opposite and alternate leaf arrangement. And clearly that idea is immediately nonsensical.

In summary, the drumbeat of papers on the lines of "we are the Bayesians; you will be assimilated; resistance is futile" is not going to stop any time soon. I use Bayesian and likelihood analyses all the time for molecular data, no problem. But I am still not convinced that the Mk model would be my go-to approach the next time I have to deal with morphological data. It seems to me that it is much easier to justify one's model selection in the case of DNA than in the case of, say, flower colour or leaf length; that the idea of setting one model and estimating gamma across totally incomparable traits is odd; and that I would hardly ever have enough characters for Bayesian analysis to produce more than a large polytomy.

But I guess all that depends on the study group. I can imagine there would be morphometric data for some groups of organisms for which stochastic models work quite well.

Tuesday, April 4, 2017


There is so much science spam these days that a message has to be particularly remarkable to even register; mostly I just mark such messages as junk or report them without even thinking about it. But this one is a beauty.

Let's count the ways:
  1. The message uses four different text colours (counting the links), several different font types, and more different font sizes than anybody in their right mind could consider tasteful.
  2. The title - International Journal of Humanities and Social Science Invention - is likely among the top five most convoluted titles I have ever seen, and given the competition that is saying something.
  3. The title does not make any sense either, but I guess that goes without saying.
  4. The spammer did not even write their script to personalise the message. At least other spammers have it insert the name of the recipient, but this one merely reads "dear author/researcher". Lazy.
  5. The first sentence randomly capitalises "international journal" and is poorly written.
  6. The second sentence claims the journal is indexed in "major indexing" (major indexing what?) and then lists four names none of which I have ever heard of. So whatever they are, they are certainly not "major".
  7. "IJHSSI follows the rapid publication process." So there is a rapid publication process, just one?
  8. Like many other spammers, this one sets arbitrary paper submission deadlines, presumably to create a sense of urgency. Why would a journal, which by definition publishes regular issues, ever do that?
  9. The sentence in bold and red is ungrammatical.
  10. The spammer does not even bother to invent a name for their imaginary editor-in-chief IJHSSI. Remember Robest Pual Ashcraft? That was fun. But no, here we only get a generic title.
  11. Note that there is very conspicuously no mention of the article processing fees in this message.
I think this is another, ahem, "journal" that I will pass on.

Sunday, April 2, 2017

The taxonomic impediment as illustrated by journals' criteria for the acceptance of manuscripts

About two weeks ago I learned from a co-author, who in that case is the corresponding author, that a certain systematic botany journal would consider our manuscript unacceptable no matter how much we improved it simply because it was out of scope. You see, our work was only "revisionary", as in dealing with species delimitation, and it would have to be a phylogenetic study to be acceptable. A few thoughts:

I do understand why higher-profile systematics journals do not accept descriptions of taxonomic novelties that take a qualitative approach like "hey, that looks different to that other species", or papers that merely validate taxonomic changes based on evidence presented elsewhere. But I completely fail to understand what the problem is with papers that, as in our case, use integrative, quantitative analyses of morphological, genetic and environmental data to resolve difficult species complexes. I would love to understand how a phylogenetic study is more serious than that. The conservation impact is, for example, much higher in studies finding a previously unrecognised, rare species than in those that only change the circumscription of a genus.

The journal in question is TAXON. Think about it: a journal literally called "taxon" has decided to accept no more taxonomic studies going forward. No word on when Evolution will stop accepting studies dealing with evolutionary biology, or when Heredity will reject all manuscripts dealing with genetics.

Note also that TAXON is still the go-to journal for nomenclatural suggestions in botany. In the latest issue as of writing, for example, we find Brownsey & Perrie, "Proposal to conserve the name Asplenium richardii with a conserved type" and Dorr & Gulledge, "Request for a binding decision on whether Briquetastrum Robyns & Lebrun (Lamiaceae) and Briquetiastrum Bovini (Malvaceae) are sufficiently alike to be confused". Those papers are important and need a forum, and it is good that TAXON is that forum. But the same is true for revisionary studies, and I cannot help but feel that in terms of editorial policy accepting nomenclatural suggestions like these but not evidence-based revisionary studies is the equivalent of saying, "we don't serve alcohol to minors, but we make an exception if you are under six months old."

The general problem is that there are quite a few systematics journals that have made the same decision over the last few years. I have thought about what journals there are in my field, and I cannot at the moment think of one with an impact factor of more than approximately one that would still accept revisionary studies. Most of the options are local journals published by university or state herbaria, usually named after a 19th century taxonomist or a plant genus, that either do not have an IF or have one that is around 0.3-0.7. As valuable as those outlets are for publishing new species or smaller taxonomic revisions, they just do not seem to be the right venue or to have the right audience for a two-year study using complex analyses of genomic data. Surely if we have molecular phylogenetics journals with IFs of 2 to 5 it should be possible to have journals in that range that publish what might be called molecular taxonomy? If not, why not?

If we do not have journals like that, if the only option for a researcher doing species delimitation with cutting edge, expensive methods is to publish in journals that a job or promotion committee might consider to be a liability to publish in, then it is no wonder that fewer and fewer people will be willing to figure out how many and what species there are on our planet, and that those who are willing to do it will find it hard to get a job in academia. That is known as the taxonomic impediment: There are still many species to be discovered before we are even in a position to know what we need to conserve, but the number of people, institutions and resources assigned to that task is dwindling.

Which brings me to the final point. A year and a half ago I wrote about a study published in Systematic Biology that claimed to have disproved (!) the citation impediment to taxonomy. The authors actually mentioned the non-acceptance of taxonomic papers by high impact journals as one of the arguments underlying the citation impediment, but then argued the latter does not exist. As I wrote at the time, my interpretation of their paper is that they reached their conclusion based on defining phylogenetic studies that happen to include a taxonomic act as taxonomic papers, and then comparing them against phylogenetic studies that do not include a taxonomic act. For example, they had the Botanical Journal of the Linnean Society in their data, which at that point had officially not been accepting taxonomic papers for several years. In other words, the study's approach seems to have been the equivalent of examining discrimination against women by comparing men who grow a beard with men who do not grow a beard.

In the light of my recent experience, that paper now seems even more upsetting.

Saturday, April 1, 2017

People don't understand the value of biodiversity collections

An American university's decision to eliminate its natural history collection to make room for, no joke!, a running track is currently making the news. Apparently, if no other institution takes it by July it will be destroyed; and of course other institutions are likely operating under tight budgets and have no space to accommodate millions of additional specimens at short notice.

To expand on what I commented at another website:

Collection specimens are the basis of research because whenever scientists present data - morphology, anatomy, cytology, chemistry, DNA - they need to refer to the specimen ("voucher") they got them from, and that specimen needs to be deposited at an accessible, curated collection, so that the research is reproducible. I am not talking Arabidopsis, zebra fish or fruit flies here, but if somebody is doing work on non-model organisms serious journals will not publish a paper unless each data point is vouchered.

Collection specimens are the basis of research because more and more of them are databased, resulting in large databases such as GBIF or ALA, which are then used by species distribution modellers, biogeographers, conservation scientists etc. to conduct spatial studies that would have been unthinkable even just 20 years ago. And who knows what people will come up with in another 20 years? Think about it: millions and millions of data points saying "this individual was found at this time of the year in this location so and so many years ago, and according to this expert it belonged to this species". This is an invaluable resource for research.

Collections are, of course, our only access to specimens from the past. I have seen a talk by a researcher who used insect specimens collected over decades to study how pesticide resistance evolved and spread in a population, hoping to gain knowledge that will be useful for pest management in the future. Without broadly and deeply sampled natural history collections such research would be impossible.

Collections are also our only access to specimens of species that have since gone extinct. Just yesterday I handled two specimens of a plant that was last collected in the 19th century and is presumed extinct; but with modern techniques you could now study its genome! Again, who knows what other things we can do with 150 year old herbarium specimens in fifty years, things that we would not have expected to be possible?

Finally, collection specimens represent a massive investment. Even while acknowledging that they are not really replaceable because you will never again be able to collect in 1859 or from an area that is now covered in apartment blocks, natural history collections can be valued based on how much it would cost to replace them, in the sense of collecting the same number of specimens again. This includes work hours, fuel and other transport costs, equipment, specimen processing, databasing, and much more. People should look at that number and realise that this is the value that they have the responsibility to safeguard. It is not only part of our cultural heritage, it is also an investment that should not be thrown away merely to make room for a sports facility.

And make no mistake, the number that comes out of such a valuation is always going to be in "holy s***, no way" territory even for a small university museum, the kind of number that will make the institution's accountants break out in cold sweat. What is more, the specimens do not depreciate - they only become more valuable over time, because, again, you can perhaps go back and replace a specimen that was collected five years ago in the forest next door but not one that was collected two hundred years ago where the forest has since been turned into pasture.

As I have written before, I am constantly astonished that people would even so much as consider destroying a biodiversity collection, not least because the same people would not do the same to a humanities collection. Seriously, can you imagine what would happen if they said, "if you can't find somebody else to take it, we will throw all our Rembrandt and Dali paintings into the trash" or "either find a new building, or our collection of bronze age artifacts goes to landfill"?

Saturday, March 25, 2017

How not to convince a scientist that comic artists make good science communicators

Thanks to RationalWiki I found a blog post by a comic artist on science communication. It left me confused at several levels. As always I write the following not in any official capacity, and my opinion is mine alone and not necessarily shared by any person or institution I am affiliated with.
I don't know much about science, and even less about climate science.
This right here may well be the core problem of what follows.
So as a practical matter, I like to side with the majority of scientists until they change their collective minds. They might be wrong, but their guess is probably better than mine.
On the other hand, this is a very insightful paragraph. It would be helpful if we could all respect each other's expertise a bit more. Unless I have good reason not to, I assume that fully qualified primary school teachers know more about teaching primary school children than I do, plumbers know more about plumbing than I do, and so on.
That said, it is mind-boggling to me that the scientific community can't make a case for climate science that sounds convincing, even to some of the people on their side, such as me. In other words, I think scientists are right (because I play the odds), but I am puzzled by why they can't put together a convincing argument, whereas the skeptics can, and easily do. Shouldn't it be the other way around?
The implication is that it is the climate scientists' fault that there are climate change denialists, because scientists are poor communicators. Fair enough, many of us scientists probably could be better communicators. But in this context the argument only works if one assumes that everybody is rational and open to evidence in the first place. The fact is, it is just a really, really uncomfortable idea that our daily comforts like driving the car to work or cranking up air conditioning might be destroying our collective future. It is understandable that many people would reject such an idea regardless of how good a case could be made.

Whether denialists actually do make a better case than scientists is, of course, yet another matter. I do not think so, but then again, I am also a scientist, so I may not be representative.
As a public service, and to save the planet, obviously, I will tell you what it would take to convince skeptics that climate science is a problem that we must fix. Please avoid the following persuasion mistakes.
A comic book author telling scientists how to communicate science. Next up: a dentist telling comic artists how to draw, followed by a philosopher telling structural engineers how to design a bridge.
1. Stop telling me the "models" (plural) are good. If you told me one specific model was good, that might sound convincing. But if climate scientists have multiple models, and they all point in the same general direction, something sounds fishy. If climate science is relatively "settled," wouldn't we all use the same models and assumptions?

And why can't science tell me which one of the different models is the good one, so we can ignore the less-good ones? What's up with that? If you can't tell me which model is better than the others, why would I believe anything about them?
So as his first point the author assumes that there can only ever be one model in any area of science, and all the rest should be discarded. That is not how this works. That is not how any of this works. I am currently envisioning somebody applying the same logic to molecular phylogenetics: "If evolution was settled, wouldn't you all use the same model of character evolution? Why do you still have GTR, JC, F81, and all those other models?"

And how is it "fishy" if scientists have several models that "all point in the same general direction"? Logically, wouldn't the exact opposite look fishy, if each model led to a different conclusion?
2. Stop telling me the climate models are excellent at hindcasting, meaning they work when you look at history. That is also true of financial models, and we know financial models can NOT predict the future. We also know that investment advisors like to show you their pure-luck past performance to scam you into thinking they can do it in the future. To put it bluntly, climate science is using the most well-known scam method (predicting the past) to gain credibility. That doesn't mean climate models are scams. It only means scientists picked the least credible way to claim credibility. Were there no options for presenting their case in a credible way?

Just to be clear, hindcasting is a necessary check-off for knowing your models are rational and worthy of testing in the future. But it tells you nothing of their ability to predict the future. If scientists were honest about that point, they would be more credible.
This seems more like a personal hang-up than a general problem. How many members of the general public will think "ah, the scientists say that their models work well if tested against past observations, but precisely that is a very good reason not to trust their capacity to predict the future"? Cannot imagine it would be many.

And I find the comparison with investment advisors a bit misguided; we are not talking stock performance here, where one tries to predict the future of one particular investment. We are talking something more comparable to macro-economic modeling, and while there is certainly a lot of motivated reasoning in economics such high-level processes can be predicted with some confidence. It would be hard to say where exactly IBM shares will be in two years, but it should be no problem to provide a prediction on whether inflation will go up or down if the central bank of a country prints a lot more money. (Even I know that increasing the amount of money raises inflation, all else being equal.) Likewise, it might be hard to say exactly how much rain Madrid will have in the year 2100, but it should be no problem to provide a prediction on whether temperature will go up or down if CO2 levels in the atmosphere are doubled, and by how much approximately. (Apparently up by between 1.5 and 4.5C.)
3. Tell me what percentage of warming is caused by humans versus natural causes. If humans are 10% of the cause, I am not so worried. If we are 90%, you have my attention. And if you leave out the percentage caused by humans, I have to assume the omission is intentional. And why would you leave out the most important number if you were being straight with people? Sounds fishy.
This is, again, very strange. If somebody says, "I will now push you over the cliff edge" they have your attention, but if they say "get back, quick, the cliff is crumbling under your feet!", you ignore them? What? I at least would say that even if warming were natural we should not ignore it but still prepare for flooded coastal cities and failed harvests.
There might be a good reason why science doesn't know the percentage of human-made warming and still has a good reason for being alarmed. I just haven't seen it, and I've been looking for it. Why would climate science ignore the only important fact for persuasion?
No idea where the idea comes from that climate science ignores this factor. It is widely agreed among the climate science community that humans are the main factor in what is currently happening, and in turn that expert consensus is widely known to exist.
Today I saw an article saying humans are responsible for MORE than 100% of warming because the earth would otherwise be in a cooling state. No links provided. Credibility = zero.
Why credibility = zero? Does the author not know that the earth underwent some noticeable cooling during the early modern period? Little ice age, anyone? There is also a good argument to be made, based on the timing of previous glacial cycles, that we are due for the start of another ice age, although of course such a change would take hundreds to thousands of years. I haven't looked into it deeply, but the idea that the earth would be cooling a bit if not for the use of fossil fuels is, in fact, at the very least credible to me given these considerations.
4. Stop attacking some of the messengers for believing that our reality holds evidence of Intelligent Design.
What "messengers"? What has any of this to do with Intelligent Design - where does that suddenly come from?
Climate science alarmists need to update their thinking to the "simulated universe" idea that makes a convincing case that we are a trillion times more likely to be a simulation than we are likely to be the first creatures who can create one. No God is required in that theory, and it is entirely compatible with accepted science. (Even if it is wrong.)
Ye gods, the simulated universe... Although I cannot find the link again I once read a very nice analogy for it. "Look, we can do simulations - so probably we are also simulated" is entirely equivalent to some Renaissance philosopher seeing the first paintings that used realistic perspective and concluding that because the real world also has perspective we must be paint pigments on another being's canvas.

It is all about getting caught up in enthusiasm about a new technology, with no evidence being involved anywhere along the chain of reasoning. There is no evidence that something like us could even be simulated, and it seems rather implausible that somebody would be motivated to run such a simulation. I guess one could play the mysterious ways card regarding the simulator's motivations, but then we are deeply in religious apologetics territory.

But still, the main point is that point #4 is completely beside the point.
5. Skeptics produce charts of the earth's temperature going up and down for ages before humans were industrialized. If you can't explain-away that chart, I can't hear anything else you say. I believe the climate alarmists are talking about the rate of increase, not the actual temperatures. But why do I never see their chart overlayed on the skeptics' chart so we can see the difference? That seems like the obvious thing to do. In fact, climate alarmists should throw out everything but that one chart.
Sorry to say, but reading this item I cannot help but think of the term Not Even Wrong. Of course temperatures go up and down naturally, so no scientist is ever going to "explain that away". The implied claim that climate scientists assume no non-anthropogenic climate change has ever taken place is shades of crocoduck, a ridiculous straw-man that would only be brought up by somebody who has not made the slightest effort at understanding the science in question. Scientific publications "produce" the very same charts of natural change, that is where the denialists get them from. The question is, do I have to "explain away" the fact that people die of natural causes all the time before I can object to somebody trying to kill me?

And why rates of increase? Of course a higher rate of change is a problem because it gives us less time to adapt and wildlife less time to move with their climate zone, but ultimately that is not all that "alarmists are talking about". Yes, if Miami is going to turn into Atlantis it may matter whether rates of change are different to, say, the onset of the current interglacial, but first and foremost it matters that the population of Miami will have to move, right?
6. Stop telling me the arctic ice on one pole is decreasing if you are ignoring the increase on the other pole. Or tell me why the experts observing the ice increase are wrong. When you ignore the claim, it feels fishy.
Maybe I missed something, but to the best of my understanding ice is shrinking on both poles. But even if this refers to some reference saying that ice is growing in some part of the Antarctic (a weblink would have been helpful), nobody would claim that every place on earth will experience the same effect with the same effect size. It is, for example, entirely to be expected that it will get drier in one place but wetter in another. In fact, the reason the former place is now drier is most likely that the rain it usually got is now falling in the latter place!
7. When skeptics point out that the Earth has not warmed as predicted, don't change the subject to sea levels. That sounds fishy.
This must either refer to some isolated incident that is not referenced or represent a misunderstanding: It sounds like a garbled version of the observation that the ocean has absorbed some of the warming that was expected to be absorbed by the atmosphere.
8. Don't let the skeptics talk last. The typical arc I see online is that Climate Scientists point out that temperatures are rising, then skeptics produce a chart saying the temperatures are always fluctuating, and have for as far as we can measure. If the real argument is about rate of change, stop telling me about record high temperatures as if they are proof of something.
This is merely a repeat of #5.
9. Stop pointing to record warmth in one place when we're also having record cold in others. How is one relevant and the other is not?
I already touched on this with regard to #6. North America seems to have unusually cold winters precisely because the north pole has unusually warm ones, due to shifting air currents. Truth be told, this objection really astonishes me. Some denialists sound as if they would be surprised by workplaces being empty at the same time as beaches are full of people. "So are there more people or less people? You don't make sense!"
10. Don't tell me how well your models predict the past. Tell me how many climate models have ever been created, since we started doing this sort of thing, and tell me how many have now been discarded because they didn't predict correctly. If the answer is "All of the old ones failed and we were totally surprised because they were good at hindcasting," then why would I trust the new ones?
This is partly a repeat of #1 and partly a severe misunderstanding of how science works. "If Newton's theory of gravity was superseded by Einstein's theory, why should I now trust Einstein?"

Also, this.
11. When you claim the oceans have risen dramatically, you need to explain why insurance companies are ignoring this risk and why my local beaches look exactly the same to me.
To the best of my understanding, even Donald Trump's Irish golf course has lobbied the local government for a sea wall to protect against rising sea levels...
Also, when I Google this question, why are half of the top search results debunking the rise? How can I tell who is right? They all sound credible to me.
Yes, when I google about health, the search results variously suggest certified pharmaceuticals, homeopathy, reiki, acupuncture, chiropractics, and much more. There are quacks on one side and science-based medical research on the other. How can I tell who is right? I am so confused!
12. If you want me to believe warmer temperatures are bad, you need to produce a chart telling me how humankind thrived during various warmer and colder eras. Was warming usually good or usually bad?

You also need to convince me that economic models are accurate. Sure, we might have warming, but you have to run economic models to figure out how that affects things. And economic models are, as you know, usually worthless.
To be fair, the author may not realise that the last time global temperatures underwent several degrees of change we did not have billions of people living in coastal areas that are going to be flooded, or billions of people to be fed by crops that will suddenly find themselves under heat and drought stress.
13. Stop conflating the basic science and the measurements with the models. Each has its own credibility. The basic science and even the measurements are credible. The models are less so. If you don't make that distinction, I see the message as manipulation, not an honest transfer of knowledge.
Once more this probably refers to an unreferenced incident, so it is difficult to address. More generally, every mathematical description of a system is a model. If I say, "every day this plant grows 5 mm" I have formulated an (admittedly simplistic) model. I am not sure how that is so much less credible than a chart showing the plant to have a stem height of 4.3 cm, 4.8 cm, and 5.3 cm on successive days. It is merely a different way of expressing the same pattern.
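
For what it is worth, the plant example can be written out in a few lines. This is only a toy sketch; the function name and the choice to count days from the first measurement are my own assumptions.

```python
# The "chart": stem height in cm on three successive days.
measurements_cm = [4.3, 4.8, 5.3]


def predicted_height_cm(day, start_cm=4.3, growth_cm_per_day=0.5):
    """The 'model': the plant grows 5 mm (0.5 cm) every day from its starting height."""
    return start_cm + growth_cm_per_day * day


# The model reproduces each measurement (allowing for floating point rounding).
for day, observed in enumerate(measurements_cm):
    assert abs(predicted_height_cm(day) - observed) < 1e-9

# Unlike the chart, it also makes a testable statement about the next day.
print(round(predicted_height_cm(3), 1))
```

The model and the measurements describe the same pattern; the only difference is that the model can be checked against a fourth measurement.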
14. If skeptics make you retreat to Pascal's Wager as your main argument for aggressively responding the climate change, please understand that you lost the debate. The world is full of risks that might happen. We don't treat all of them as real. And we can't rank any of these risks to know how to allocate our capital to the best path. Should we put a trillion dollars into climate remediation or use that money for a missile defense system to better protect us from North Korea?
Yet another instance of what was presumably an unreferenced incident experienced by the author. I do not see why any serious climate scientist would ever have to invoke Pascal's Wager, given that the action of CO2 as a greenhouse gas has been established for more than a century and that evidence of rising sea levels, shrinking glaciers, rising atmospheric temperatures, and increasingly extreme weather events is all around us. But then again, I am not a climate scientist myself, so I do not know much about how they generally argue.
Anyway, to me it seems brutally wrong to call skeptics on climate science "anti-science" when all they want is for science to make its case in a way that doesn't look exactly like a financial scam.* Is that asking a lot?
This is a hilariously naive understanding of denialism. Sure, everybody everywhere is totally open to argument and merely "want[s] for science to make its case in a way that doesn't look exactly like a financial scam". Financial and political interests or tribal instincts do not exist. Riiight.

So in summary, I am sure that many scientists, me included, could learn a lot more about how to communicate. This post, however, was the equivalent of "hey medical profession, you could convince people not to use homeopathy if only you admitted that magic works, and you should stop all that double-blind experiment nonsense, because that just looks as if you have something to hide".