Sunday, November 22, 2015

Do half the natural history specimens in the world really have the wrong name?

On Friday a colleague drew everybody's attention to a recently published study from Edinburgh and Oxford: Goodwin et al. 2015, Widespread mistaken identity in tropical plant collections. It is presented in press releases and media as demonstrating that, to quote the University of Oxford's news page, "half the world's natural history specimens may have the wrong name".

As museum and herbarium specimens are used to extract DNA for phylogenetic and evolutionary studies, to draw distribution maps, to inform conservation decisions, and to examine spatial patterns of diversity, this sounds pretty dire. Luckily, in my eyes at least, this interpretation is totally over-hyped.

(Full disclosure: I know one of the authors of the relevant study, as he was involved in selecting my Diplom thesis topic and allowed me to join a field trip he had organised.)

My first reaction to reading the sensationalist headline was that perhaps they are talking about insects, which account for the vast majority of natural history specimens. That would have sounded at least remotely plausible given how difficult insect identification is and how few entomologists there are. It turned out, however, that the study was dealing with plant (herbarium) specimens.

My next thought was simply: no way. Just looking at the herbarium I am working at, the situation is nowhere close to that bad. 5% perhaps, okay. So how do they arrive at these seemingly shocking numbers?

The problems I have with the way these results are discussed fall into three categories: selection of example cases, hasty over-generalisation, and equivocation on the term "wrong".

Selection of example cases

From Oxford's press release:
The team studied 4,500 specimens of the African ginger genus Aframomum, a detailed monographic study [of] which had been completed in 2014, providing an accurate account of all the species and their specimens. The team were surprised to find that prior to this monograph at least 58% of specimens were either misidentified, given an outdated or redundant name, or only identified to the genus or family. As few plant groups have been recently monographed, the team suggests that a similar percentage of wrong names might be expected in many other groups.
I have myself revised two tropical plant genera, for my Diplom and doctorate, respectively. The one and only reason for doing a monograph in 2014 would be that the genus had not been taxonomically revised for the preceding decades. The genus I did my Diplom thesis on had last been examined in 1970, and the one from my doctorate in 1936. The Kew Bibliographic Database does not show any taxonomic work on Aframomum over the last few decades except isolated flora treatments or species descriptions.

The point is, lots of problems in a long neglected genus should not have "surprised" the team but are precisely what is expected. If only 3% of the specimens had been mislabelled before 2014, then of course nobody would have invested the time to complete a detailed monograph of the genus.

But the question is then to what degree the number of mislabelled specimens in a long-neglected genus is indicative of the relevant number in well-studied genera. This is a bit as if the authors went walking through a suburb, deliberately ignoring dozens of well-maintained houses until they found one that had stood empty for twenty years and was scheduled for demolition, and then concluded based on this one 'representative' sample that the whole suburb is decrepit.

Hasty over-generalisation

This is a simple one. The original study concluded that
more than 50% of tropical specimens, on average, are likely to be incorrectly named.
The Oxford communications people must have thought that this was not sensationalist enough and turned it into the headline
Half the world's natural history specimens may have the wrong name
That... is not quite the same conclusion. And of course the sentence from the original study makes much more sense. Even under the worst possible assumptions it can hardly be argued that half of the locally collected specimens in British, French, Canadian or Japanese herbaria will be mislabelled, given how well the local floras are known.

Equivocation on "wrong"

But the most annoying aspect is probably how the word "wrong" is being treated. If you read the phrase "half the world's natural history specimens may have the wrong name", you may naively assume that the authors and their press people are talking about misidentified species.

If all of those 50% "wrong" names are something like specimens of Xerochrysum bracteatum misidentified as Chrysocephalum apiculatum, a totally different species, then that would indeed be very, very bad. It is as if somebody points at a woman, asks you if you know her, and you reply, "yes, that is Jane Smith, she works in HR", only for them to point out that the woman in question is in fact called Samantha Coles and works in research. In that case you really were wrong and did not know her.

However, this is what it says in the news item:
prior to this monograph at least 58% of specimens were either misidentified, given an outdated or redundant name, or only identified to the genus or family
and then later:
they found that 40% of these were outdated synonyms rather than the current name, and 16% of the names were unrecognisable or invalid. In addition, 11% of the specimens weren't identified, being given only the name of the genus.
In other words, they counted synonyms and even plants not identified to species as "wrong". In the former case, I would argue that the name is quite simply not wrong in any reasonable sense but only outdated. If a researcher using specimen data for a study compares them against a well-curated taxonomic index (as they should anyway), they will quickly figure out what is going on. That is what I did when I used data from Australia's Virtual Herbarium for a study of spatial patterns of diversity.
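A minimal sketch of what such a check against a taxonomic index could look like, in Python. The tiny lookup table and the function name classify_label are hypothetical and only for illustration; a real study would query a curated index rather than a hand-written dictionary. The point is simply that "outdated synonym", "genus only" and "misidentified" are separate outcomes that need not be lumped together as "wrong".

```python
# Hypothetical miniature taxonomic index: every known name maps to the
# currently accepted name. The entries are illustrative only.
ACCEPTED_NAME = {
    "Xerochrysum bracteatum": "Xerochrysum bracteatum",
    "Helichrysum bracteatum": "Xerochrysum bracteatum",   # older synonym
    "Bracteantha bracteata": "Xerochrysum bracteatum",    # older synonym
    "Chrysocephalum apiculatum": "Chrysocephalum apiculatum",
    "Helichrysum apiculatum": "Chrysocephalum apiculatum",  # older synonym
}


def classify_label(label, true_species, index=ACCEPTED_NAME):
    """Compare a specimen label with the accepted name of the species
    the specimen actually belongs to, and say how the two relate."""
    name = " ".join(label.split())  # normalise whitespace
    if not name:
        return "unidentified"
    if len(name.split()) < 2:
        # Only a genus (or family) was given: imprecise, not incorrect.
        return "genus only"
    accepted = index.get(name)
    if accepted is None:
        return "unrecognised"
    if accepted != true_species:
        # The label resolves to a different species altogether.
        return "misidentified"
    if name == accepted:
        return "current"
    # The label resolves to the right species, but under an older name.
    return "outdated synonym"
```

Tallying the results for a list of labels, for example with collections.Counter, would then show how much of the headline figure is actual misidentification and how much is merely outdated or imprecise naming.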

To use the same analogy as before, imagine somebody points at a woman, asks you if you know her, and you reply, "yes, that is Jane Smith, she works in HR." If they then reply, "no, Jane married last week and took her husband's name, so she is now known as Jane Hudson; seems you don't know that woman at all!", you would certainly be justified in considering them a bit of an idiot. You do indeed know her, and you did identify her correctly; you just didn't know the most up-to-date name. Similarly, a specimen labelled with a synonym of the currently accepted species name is also correctly identified.

"Being given only the name of the genus" isn't wrong either; it is just imprecise. If you are asked where Jane Hudson lives and you know only that it is in Miller Street, then you will also be miffed if that information is called "wrong" because she lives in Miller Street number 4. No, Miller Street is not wrong in any reasonable sense even if you didn't know the house number.

This equivocation is particularly annoying to me because I notice it happening more and more. For example, the claim of the ENCODE consortium that 80% of the human genome is functional depended on redefining functional - commonly understood to mean "producing an RNA or protein that actually does something necessary for survival" - into "shows some spurious biochemical activity in vitro", while relying on the reader to assume the first definition. Here, the claim of massive problems in the natural history collections depends on redefining wrongly named - commonly understood to mean "labelled with the name of a totally different species" - into "having an outdated name of the correct species, or no species name at all", while likewise relying on the reader to assume the first definition.

Conclusion: nope, no way, no

In summary, while I certainly haven't counted misidentified specimens worldwide myself, even just reading through the press release makes it clear that it relies on several rhetorical tricks to make the situation appear more shocking than it is.

