Showing posts with label biodiversity collections. Show all posts

Saturday, September 2, 2017

Having fun with biodiversity databases

If you have ever professionally used a biodiversity database you will soon have noticed that we still have a long way to go before they are as reliable as we would like them to be.

Today I looked into the Atlas of Living Australia records for Senecio australis (Asteraceae). Except for a rather odd specimen from South Africa the distribution records look like this:


What do we have here? First, the four Australian mainland records all appear to be misapplications of the name. The Flora of New South Wales, for example, does not even mention the species, so I think we can safely assume it does not occur in Australia at all.

Second, the record in the middle of the top of the map is right in the ocean, no matter how closely we zoom in. If we look into its details, we see that it was collected on Norfolk Island, which is the cluster of red dots to its right, so somebody must have got the coordinates rather wrong.

Third, there is a cluster around Auckland, on New Zealand's North Island. I am not sure if Norfolk Island and North Island are a plausible area of distribution for this species, but they may well be. Zooming in closer to Norfolk Island, however, ...


... it looks as if somebody had played darts after having had a few too many beers. ALA informs us dryly under the section data quality tests, "habitat incorrect for species". No kidding. Or as my wife joked, unwilling to believe that the coordinates would be so badly off for such a large percentage of the specimens, "is there a fish that is also called Senecio australis?"

These are the problems that we are dealing with, more generally.
  • Whenever we do a study using data from biodiversity databases, as we increasingly do, we have to be very careful about cleaning the data. The main issues are outdated taxonomy, misidentifications, spatial data entry errors (which are particularly easy to recognise if an outlier record is exactly ten degrees away from a known occurrence), and imprecise spatial data. Just think of what it would do to species distribution modeling if we uncritically accepted all the records for Senecio australis.
  • While we can identify obvious mistakes while using a database, the data are "ground-truthed" in the actual specimens in some herbarium or museum, and the policy is usually (and quite sensibly) that the database won't update until a correction is made to that specimen in its home institution and then filters through from there. But many institutions do not have the resources to update data just because somebody sent them an email pointing out that their specimen is misidentified or that they made a data entry error; many herbaria on the planet are so understaffed that even the word understaffed is a euphemism. What is more, even if a database allows a registered user to annotate a record with corrections, the information may not necessarily flow back to the institution holding it, depending on whether somebody thought to set up such procedures or not.
  • Overall, Australia actually has excellent data quality, the Atlas of Living Australia actually allows annotations to be made, and several important Australian herbaria actually have the staffing to update their data. What I am saying is that this is as good as you can have it at the moment. It is much more difficult in many other parts of the world, and of course it would be good if we could have the same or better data quality for those areas.
Also, perhaps it is best not to think too much about records that are not specimen-based but "human observations" or photos submitted by random people. There are, obviously, non-taxonomists whose knowledge of the flora is extensive, and citizen science can be awesome, but I have also seen several cases along the lines of "aaargh ... this is so misidentified it is not even the right tribe of the family, and now the database is using it as the profile picture of this nationally significant weed species!"
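
To make the remark about outliers sitting "exactly ten degrees away" a bit more concrete, here is a minimal sketch in Python of how one might flag such records before a spatial analysis. Everything in it is an assumption for illustration: the function name, the tolerance, and the coordinates are made up and are not taken from the ALA records discussed above. It only catches this one kind of data entry slip and says nothing about misidentifications or outdated taxonomy.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def looks_like_ten_degree_slip(record: Coord,
                               known: List[Coord],
                               tol: float = 0.05) -> bool:
    """Return True if `record` matches a known occurrence in one coordinate
    (within `tol` degrees) while the other coordinate is off by a non-zero
    multiple of ten degrees (again within `tol`).

    `known` is a hypothetical list of occurrences we already trust.
    """
    lat, lon = record
    for k_lat, k_lon in known:
        d_lat = abs(lat - k_lat)
        d_lon = abs(lon - k_lon)
        # Check both possibilities: latitude shifted or longitude shifted.
        for shifted, other in ((d_lat, d_lon), (d_lon, d_lat)):
            multiple = round(shifted / 10.0)
            if (multiple >= 1
                    and abs(shifted - 10.0 * multiple) <= tol
                    and other <= tol):
                return True
    return False


# Illustrative values only: one trusted point roughly on Norfolk Island,
# one record with the longitude shifted by ten degrees, and one record
# near Auckland that is far away but not a ten-degree slip.
trusted = [(-29.03, 167.95)]
suspect = (-29.03, 157.95)
far_but_plausible = (-36.85, 174.76)

flagged = [r for r in (suspect, far_but_plausible)
           if looks_like_ten_degree_slip(r, trusted)]
```

A flagged record is of course only a candidate for checking against the specimen label; the correction itself would still have to be made at the home institution.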

Sunday, June 18, 2017

To publish or not to publish (locality information for rare species)

This week our journal club discussed Lindenmayer & Scheele, Do not publish (Science 356(6340): 800-801). While acknowledging the trade-offs involved, the paper argues for researchers, journals and data providers to self-censor locality information for rare species to keep them safe.

The problem, in short, is that some rare species are highly valued by professional poachers and private collectors, and they may in short order wipe out a rare species if they know where to find it. The article itself mentions a rare Chinese gecko; participants in our discussion provided other astonishing examples from various parts of the planet. It did not surprise me to learn that there are people digging up cycads to sell them to wealthy home-owners who want to adorn their front gardens, but I was definitely surprised to learn that rare beetles are traded for hundreds of dollars apiece by a demented subculture of beetle enthusiasts.

Nobody really disagreed with the sentiment of the article per se, but obviously people immediately raised scenarios where making the data available actually helped conservation. A particular concern is that it has to be known that a rare species exists in a spot when there is a development proposal; what is the use of keeping the information safe from poachers only to have an open-cut mine wipe out the species?

A comparison was made with medical data. While biodiversity researchers are used to having all data openly available, the medical research community has long had strict procedures for keeping safe medical information of individual people, but they still manage to do research. In other words, biodiversity science should not suffer from more restricted access to locality information if the right procedures are adopted. That being said, some raised the concern that this would simply add another layer of bureaucracy to a field already burdened with often unreasonable procedures around collecting permits and specimen exchange.

What the article and our discussion were mostly about are specimen data typed off the specimen labels and made available through databases such as GBIF or Australia's ALA. The idea would then be to have those data providers make the locality descriptions and GPS coordinates just fuzzy enough that nobody can find the exact spot where a species was seen or collected, while still providing that information to legitimate and trusted researchers. What should not be overlooked, however, is that currently a major push is underway to photograph the actual specimens and make those photos available online. Has anybody thought about systematically blurring out such locality information for rare species on the photographed labels? Not sure I have ever heard that discussed before.
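
As a rough illustration of what "just fuzzy enough" could mean for coordinates, the following Python sketch snaps a point to the centre of a coarse grid cell. This is only one possible approach under my own assumptions; the function name, the cell size and the example coordinates are hypothetical, and I am not claiming that GBIF or ALA do it this way.

```python
import math
from typing import Tuple


def generalise(lat: float, lon: float,
               cell_deg: float = 0.1) -> Tuple[float, float]:
    """Replace a coordinate by the centre of the grid cell it falls in.

    `cell_deg` is the cell size in decimal degrees (an assumed value;
    a real policy would choose it per species and region).
    """
    def snap(value: float) -> float:
        # Index of the cell containing `value`, then move to its centre.
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    # Rounding only tidies up floating point noise in the result.
    return (round(snap(lat), 6), round(snap(lon), 6))


# Two made-up localities in the same 0.1 degree cell become indistinguishable.
public_a = generalise(-35.2809, 149.1300)
public_b = generalise(-35.2101, 149.1999)
```

Snapping to a fixed grid, rather than adding random noise, means that the same locality always yields the same public coordinate, so repeated queries should not allow anybody to average their way back to the exact spot. The precise coordinates would still have to be stored somewhere for release to trusted researchers.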

Finally, there was some agreement that it would be good to have a global policy recommendation on this instead of leaving it up to individuals to self-censor without guidelines. Given that there are working groups agreeing on data formats etc. it should surely be possible to find agreement on this problem.

An off-topic excursus on hobgoblins

In this context it was interesting that somebody said, "consistency is the hobgoblin of small minds", a phrase that I have run into before. Of course, the idea here was that while a rule or recommendation is nice to have, people will still have to weigh trade-offs, and even if the recommendation would be to generally blur the data one may in some cases need to publish it (see a few paragraphs earlier).

And yes, I see where that is coming from. The fundamentalist wants a clear rule and applies it blindly, whether it makes sense or not; the intellectually mature realise that rules were introduced to achieve a good, and if applying the rule hurts that very same good then one should not apply the rule.

But still, throwing a phrase like that around makes me a bit uncomfortable. In most cases consistency is important. When we are talking rules it should be clear that consistency is usually just another word for fairness. People who want to apply rules inconsistently would have to provide a very good reason for why they should not simply be seen as trying to get away with something that they would not let others get away with.

When we are talking argumentation, discussion and logic, intellectual consistency is the very first hurdle somebody has to clear to be taken seriously, and only then is it worth the investment to look into whether they have evidence on their side or not. People who are proud of being inconsistent in this sense (because it makes them Not Small Minds, you see) would have to explain carefully how they are not simply somewhere on the spectrum from slightly confused to totally insane, or alternatively on the spectrum from obfuscating the issue to gaslighting their conversation partner.

Saturday, April 1, 2017

People don't understand the value of biodiversity collections

An American university's decision to eliminate its natural history collection to make room for, no joke!, a running track is currently making the news. Apparently, if no other institution takes it by July it will be destroyed; and of course other institutions are likely operating under tight budgets and have no space to accommodate millions of additional specimens at short notice.

To expand on what I commented at another website:

Collection specimens are the basis of research because whenever scientists present data - morphology, anatomy, cytology, chemistry, DNA - they need to refer to the specimen ("voucher") they got them from, and that specimen needs to be deposited at an accessible, curated collection, so that the research is reproducible. I am not talking Arabidopsis, zebrafish or fruit flies here, but if somebody is doing work on non-model organisms serious journals will not publish a paper unless each data point is vouchered.

Collection specimens are the basis of research because more and more of them are databased, resulting in large databases such as GBIF or ALA, which are then used by species distribution modellers, biogeographers, conservation scientists etc. to conduct spatial studies that would have been unthinkable even just 20 years ago. And who knows what people will come up with in another 20 years? Think about it: millions and millions of data points saying "this individual was found at this time of the year in this location so and so many years ago, and according to this expert it belonged to this species". This is an invaluable resource for research.

Collections are, of course, our only access to specimens from the past. I have seen a talk by a researcher who used insect specimens collected over decades to study how pesticide resistance evolved and spread in a population, hoping to gain knowledge that will be useful for pest management in the future. Without broadly and deeply sampled natural history collections such research would be impossible.

Collections are also our only access to specimens of species that have since gone extinct. Just yesterday I handled two specimens of a plant that was last collected in the 19th century and is presumed extinct; but with modern techniques you could now study its genome! Again, who knows what other things we can do with 150-year-old herbarium specimens in fifty years, things that we would not have expected to be possible?

Finally, collection specimens represent a massive investment. Even while acknowledging that they are not really replaceable because you will never again be able to collect in 1859 or from an area that is now covered in apartment blocks, natural history collections can be valued based on how much it would cost to replace them, in the sense of collecting the same number of specimens again. This includes work hours, fuel and other transport costs, equipment, specimen processing, databasing, and much more. People should look at that number and realise that this is the value that they have the responsibility to safeguard. It is not only part of our cultural heritage, it is also an investment that should not be thrown away merely to make room for a sports facility.

And make no mistake, the number that comes out of such a valuation is always going to be in "holy s***, no way" territory even for a small university museum, the kind of number that will make the institution's accountants break out in cold sweat. What is more, the specimens do not depreciate - they only become more valuable over time, because, again, you can perhaps go back and replace a specimen that was collected five years ago in the forest next door but not one that was collected two hundred years ago where the forest has since been turned into pasture.

As I have written before, I am constantly astonished that people would even so much as consider destroying a biodiversity collection, not least because the same people would not do the same to a humanities collection. Seriously, can you imagine what would happen if they said, "if you can't find somebody else to take it, we will throw all our Rembrandt and Dali paintings into the trash" or "either find a new building, or our collection of bronze age artifacts goes to landfill"?

Monday, May 9, 2016

When do we need voucher specimens?

Yay, in the last few days this blog passed 100k views! And just when I had a whole week of not being able to find the time to add anything...

Anyway, on Friday I was considering voucher specimens. First, in case somebody from outside the field reads this, what are they?

Imagine somebody did a study of essential oils found in the South American mint genus Minthostachys, and they published a paper reporting a pulegone-dominated oil for Minthostachys glabrescens, a menthone-rich oil for M. verticillata, and a carvone-rich oil for M. mollis. Twenty years later, a taxonomist revises the genus and finds, for example, that the name M. glabrescens had for decades been misapplied to a completely wrong species (as per the type), and that the circumscription of M. mollis needed to be changed.

A new taxonomic treatment of the genus is published, and you might now, if you were interested in its ethnobotany, biochemistry or commercial exploitation, be interested in knowing how the old oil data relates to species as currently circumscribed. What those guys who did the oil study called glabrescens definitely wasn't true glabrescens, but what was it instead? Which currently accepted name applies to the sample that had the pulegone-rich oil?

If all there was in the oil paper were names and biochemical data, you would be stuck. This is where voucher specimens come in. For good scientific practice, the authors of that study should have deposited a herbarium specimen of each sample they analysed in an officially recognised and accessible research herbarium (the kind of institution serious enough to be listed in the Index Herbariorum), so that we can examine them even fifty years later and figure out what exactly it was that they had in their study.

So, in short: a voucher specimen is a herbarium / museum / biodiversity collection specimen that is cited in a publication to allow later scientists to verify the taxonomic affiliation of a sample used in a scientific study. It could be a dried and pressed plant, a pinned insect, a fish skeleton or a stuffed bird; it could be connected to a morphological data set, a DNA sequence, a biochemical profile or a new species name. (In the last case it would not be a mere voucher but a type. Although types are even more valuable, the principle is the same.)

Among biodiversity researchers the importance of vouchers is well understood. It is, or should be, virtually impossible to publish a study in a good botanical journal without citing a list of voucher specimens underlying your data either in a table, in an appendix, or as part of the paper's online supplement. And we are often rather exasperated that not all colleagues in related fields have the same approach. The biochemical example from above was chosen deliberately, as I have run into many essential oil or ethnobotanical studies that neglected to cite vouchers, meaning that their results are pretty much unreproducible and scientifically near worthless.

That being said, however, I have started to wonder whether some colleagues don't go a bit overboard with this. The occasion is a discussion about a seed reference collection, about which several people have asked me, "is it vouchered?" So the idea is, we can only use the seed samples that have a herbarium specimen as a reference somewhere.

But we are not talking here about biochemical data, a DNA sequence uploaded to GenBank, or a morphological description. The seeds themselves are biological specimens, aren't they? Can't they be their own voucher?

Granted, there may be many cases where only having the seeds is not good enough to narrow taxonomic affiliation down to species level. But is that really different in principle from a perfectly acceptable herbarium specimen of a flowering plant ... in a genus where you need fruits to identify to species?

So to me the point of a voucher is that it is a lasting, biological reference specimen for a piece of data. But a lasting biological specimen should not necessarily need another lasting biological specimen as its reference. In some cases it may be good enough to be a specimen in its own right. Or to look at it another way, there can be botanical specimens that are not dried and pressed whole plants.