Wednesday, June 28, 2017

In praise of Linux

I am not saying I want to proselytise or anything. I completely understand Windows users; after all, I was a happy Windows user myself until they produced Windows 8. And if somebody is into gaming, for example, then Windows is the obvious choice. But I am really, really happy using Ubuntu now.

For starters, we do not have to worry about the most common cybersecurity issues, such as the Petya attack that is currently making the rounds. Admittedly neither my wife nor I would open a suspicious attachment anyway, but still, it is nice to know that everything that attacks the most common operating system is irrelevant to us.

More to the point of what I did this evening, I program a lot in Python these days. Well, for certain values of "a lot". I am obviously not a programmer, but the language is very useful for many tasks in science, from quickly reformatting a large data file to scripting complex analyses.

And the thing is, Windows makes it unnecessarily hard to use Python (or most programming languages, really). First, I need to install the language. Okay, perhaps understandable. But then I need to figure out how to tell Windows to look for Python in the Python folder whenever I try to run a Python program. Then perhaps I need to install a specialist Python library like BioPython to run certain analyses, and that is where things really go downhill, because the installation usually fails over some missing dependency or whatnot.

Now compare the Linux variant Ubuntu. First, it comes with Python already installed. Second, it is clever enough to automatically access Python from any folder where you start a Python script. Third, Linux makes it really easy to install things on top of Python, because it is usually smart enough to recognise dependencies and install them as well. In fact that is such an obvious advantage that it seems bizarre in retrospect that Windows won't do it.

Anyway, today I decided to spend the evening coding a simple script. I was able to just plop down in front of a computer that had Ubuntu installed two weeks ago, and I did not need to do anything in preparation. So. Nice.

Saturday, June 24, 2017

Botany picture #246: Tanacetum vulgare


No energy to write something substantial at the moment, and instead I find myself thinking about plants. Here, Tanacetum vulgare (tansy, Asteraceae), Germany, 2016. It is in the Chamomile tribe of the daisy family. It has been on my mind because I was recently looking through our herbarium for specimens that have mature fruit on them, and while we have quite a few specimens there aren't any that fulfill that particular criterion.

Daisies are usually collected in flower because they look rather less attractive in fruit, which is ironic given that there are many subgroups where fruits are extremely important for identification, e.g. among the dandelion tribe. But often you will at least get mature fruits as by-catch; the whole plant is collected because one head was in flower, but others lower down are already fruiting. In this case, however, the specimen is generally a single stem, and all its heads in the terminal corymbose panicle are at about the same stage, meaning none of them are fruiting. A bit frustrating.

Monday, June 19, 2017

Botany picture #245: Salvia patens


Today's botany picture is Salvia patens (Lamiaceae), a New World sage species photographed at the Botanic Gardens of Goettingen University while I was doing my PhD there. This plant has the most amazing blue flower colour, and consequently I was rather bemused to find a few years later that plant breeders had selected and were selling a white variant of this species. What is the point of that? It is like breeding an onion without taste, or a rose that doesn't produce flowers.

Okay, cranky get-off-my-lawn mode deactivated.

Sunday, June 18, 2017

To publish or not to publish (locality information for rare species)

This week our journal club discussed Lindenmayer & Scheele, Do not publish (Science 356(6340): 800-801). While acknowledging the trade-offs involved, the paper argues for researchers, journals and data providers to self-censor locality information for rare species to keep them safe.

The problem, in short, is that some rare species are highly valued by professional poachers and private collectors, and they may in short order wipe out a rare species if they know where to find it. The article itself mentions a rare Chinese gecko; participants in our discussion provided other astonishing examples from various parts of the planet. It did not surprise me to learn that there are people digging up cycads to sell them to wealthy home-owners who want to adorn their front gardens, but I was definitely surprised to learn that rare beetles are traded for hundreds of dollars apiece by a demented subculture of beetle enthusiasts.

Nobody really disagreed with the sentiment of the article per se, but obviously people immediately raised scenarios where making the data available actually helped conservation. A particular concern is that it has to be known that a rare species exists in a spot when there is a development proposal; what is the use of keeping the information safe from poachers only to have an open-cut mine wipe out the species?

A comparison was made with medical data. While biodiversity researchers are used to having all data openly available, the medical research community has long had strict procedures for keeping safe medical information of individual people, but they still manage to do research. In other words, biodiversity science should not suffer from more restricted access to locality information if the right procedures are adopted. That being said, some raised the concern that this would simply add another layer of bureaucracy to a field already burdened with often unreasonable procedures around collecting permits and specimen exchange.

What the article and our discussion were mostly about are specimen data typed off the specimen labels and made available through databases such as GBIF or Australia's ALA. The idea would then be to have those data providers make the locality descriptions and GPS coordinates just fuzzy enough that nobody can find the exact spot where a species was seen or collected, while still providing that information to legitimate and trusted researchers. What should not be overlooked, however, is that currently a major push is underway to photograph the actual specimens and make those photos available online. Has anybody thought about systematically blurring out such locality information for rare species on the photographed labels? Not sure I have ever heard that discussed before.
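
To make the idea of "just fuzzy enough" a bit more concrete, here is a minimal sketch of how a data provider might generalise coordinates before publishing them. It is only an illustration under my own assumptions, not a description of what GBIF or the ALA actually do; the function names, the record layout and the 0.1 degree grid size are all made up for the example.

```python
import math


def generalise(lat, lon, cell_deg=0.1):
    """Snap a coordinate to the centre of its grid cell.

    cell_deg is a hypothetical grid size in decimal degrees; every point
    inside the same cell is reported as the same cell centre.
    """
    def snap(value):
        # Find the cell the value falls into, then report the cell centre.
        return round((math.floor(value / cell_deg) + 0.5) * cell_deg, 6)

    return snap(lat), snap(lon)


def public_record(record, trusted=False):
    """Return a copy of a specimen record that is safe to hand out.

    'record' is assumed to be a dict with 'species', 'lat' and 'lon' keys.
    Trusted users get the exact coordinates, everybody else a blurred copy.
    """
    if trusted:
        return dict(record)
    lat, lon = generalise(record["lat"], record["lon"])
    return {**record, "lat": lat, "lon": lon}


specimen = {"species": "Hypothetical rare species", "lat": -35.2789, "lon": 149.1087}
print(public_record(specimen))                 # blurred for the general public
print(public_record(specimen, trusted=True))   # exact for a trusted researcher
```

Snapping to a grid cell rather than adding random noise means that repeated queries always return the same answer, so nobody can average out the blur. How large the cells would have to be for a given species is of course exactly the kind of thing a policy recommendation would need to settle.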

Finally, there was some agreement that it would be good to have a global policy recommendation on this instead of leaving it up to individuals to self-censor without guidelines. Given that there are working groups agreeing on data formats etc. it should surely be possible to find agreement on this problem.

An off-topic excursus on hobgoblins

In this context it was interesting that somebody said, "consistency is the hobgoblin of small minds", a phrase that I have run into before. Of course, the idea here was that while a rule or recommendation is nice to have, people will still have to weigh trade-offs, and even if the recommendation would be to generally blur the data one may in some cases need to publish it (see a few paragraphs earlier).

And yes, I see where that is coming from. The fundamentalist wants a clear rule and applies it blindly, whether it makes sense or not; the intellectually mature realise that rules were introduced to achieve a good, and if applying the rule hurts that very same good then one should not apply the rule.

But still, throwing a phrase like that around makes me a bit uncomfortable. In most cases consistency is important. When we are talking rules it should be clear that consistency is usually just another word for fairness. People who want to apply rules inconsistently would have to provide a very good reason for why they should not simply be seen as trying to get away with something that they would not let others get away with.

When we are talking argumentation, discussion and logic, intellectual consistency is the very first hurdle somebody has to clear to be taken seriously, and only then is it worth the investment to look into whether they have evidence on their side or not. People who are proud of being inconsistent in this sense (because it makes them Not Small Minds, you see) would have to explain carefully how they are not simply somewhere on the spectrum from slightly confused to totally insane, or alternatively on the spectrum from obfuscating the issue to gaslighting their conversation partner.

Monday, June 12, 2017

ANBG impressions

Although it is winter, and although it was foggy the first half of our visit, the Australian National Botanic Gardens always have something to see.


Moss cushion on a tree branch.


Golden everlasting flower-head waiting for the sun to come out.


Shadows cast onto a bridge in the rain forest gully.


Spider's web covered with dew.

Sunday, June 11, 2017

How the sausage is made: peer reviewing edition

One of the aspects of working as a scientist that I find most intriguing is peer reviewing each other's work. The main issue is that, while writing the actual manuscripts is explicitly and formally taught and further supported by style guides, helpful books and journals' instructions to authors, there is much less formal instruction on how to write a reviewer's report.

Essentially one is limited to (1) relatively vague journals' instructions to reviewers, usually along the lines of "be constructive" or "be charitable", (2) deducing what matters to the editor from the questions asked in the reviewer's report form, and (3) emulating the style of the comments one receives on one's own papers. Apart from generic, often system-generated thank you messages there is generally no feedback on whether and to what degree the editors found my reviewer's reports appropriate and helpful or on how they compared with other reports.

In other words, most of it is learning by doing; after years of practice I now have a good overview of what reviewer reports in my field look like, but as a beginner I had very little to go by.

It is then no surprise that the style and tone in which reviewers in my field write their reports can differ quite a lot from person to person. There is a general pattern of first discussing general issues and broad suggestions and then minor suggestions line-by-line on phrasing, word choice or typos, and there is clearly the expectation of being reasonable and civil. But:
  • Some reviewers may summarise the manuscript abstract-style before they start their evaluation, while others assume that the editor does not need that information given that they have the actual abstract of the paper available to them.
  • Some stick to evaluating the scientific accuracy of the paper, while others obsess about wording and phrasing and regularly ask authors who are native speakers of English to have a native speaker of English check their manuscript.
  • Some stick to judging whether the analysis chosen by the authors is suitable to answer the study question, while others see an opportunity to suggest the addition of five totally irrelevant analyses just because they happen to know they are possible. And sometimes they recommend cutting another 2,000 words from the text despite suggesting those additions, as if those would come without text.
  • Some unashamedly use the reviewer's report for self-promotion by suggesting that some of their own publications be cited, relevant or not.
  • Some use a professional tone and make constructive suggestions on the particular manuscript in question, but others apparently cannot help disparaging the authors themselves. Luckily that behaviour is rare.
  • Some write one paragraph even when they recommend major revision (meaning they could have been more explicit about what and how to revise), others write six pages of suggestions even when their recommendation is rather positive.
Certainly then scientists in my field will have very different ways of approaching the task right from the start. Nonetheless, and for what it is worth, this is how I generally find it useful to do it.

First, I like to print the manuscript - I am old-fashioned like that. I try to begin reading it fairly soon after I accept the job, and for obvious reasons I also try to read it through more or less within one day. Often I will read when I need a break from some other task like computer work, on a bus or in the evening at home.

Already on the first read I attempt to thoroughly mark everything I notice. Using a red or blue pen I mark minor issues a bit like a teacher correcting a dictation, while making little notes on the margins where I have more general concerns ("poorly explained", "circular", "what about geographic outliers?").

Usually the following day I order my thoughts on the manuscript and start a very rough report draft by first typing out all the minor suggestions. (I would prefer to use tracked changes on a Word document for that, but unfortunately we generally only get a PDF, and I find annotating those even more tedious than just writing things out.) Then I start on the general concerns, if any, writing only a single sentence on each point without expanding on it just yet.

In particular if the study is valuable but has some weaknesses I prefer to sleep on it at this stage for 2-3 nights or, if the task has turned out to be a bit unpleasant, even a few days more, and then look at it again with fresh eyes. That helps me to avoid being overly negative; in fact I tend to start out rather bluntly and then, with some distance, rephrase and expand my comments to be more polite and constructive.

That being said, if the manuscript is nearly flawless or totally unsalvageable I usually finish my review very quickly. If I remember correctly my record is something like 45 min after being invited to review, because the study was just that deeply flawed. In that case I saw no reason to spend a lot of time on trying to develop a list of minor suggestions.

More generally I have over the years come to the conclusion that it cannot be the role of a peer reviewer to check if all papers in the reference list have really been cited or to suggest language corrections in each paragraph, although some colleagues seem to get a kick out of that. If there are more than a handful of language issues I would simply say that the language needs work instead of pointing out each instance, and if there are issues with the references I would suggest the authors consider using a reference manager such as Zotero, done. Really from my perspective the point of peer review is to check if the science is sound, and everything else is at best a distant secondary concern.

At any rate, after having slept on the manuscript a bit I will return to it and write the general comments out into more fluent text. I aim to do the usual sandwich: start with a positive paragraph that summarises the main contribution made by the manuscript and what I particularly like about it. If necessary, this is followed by something to the effect of "nonetheless I have some concerns" or "unfortunately, some changes are required before this useful contribution can be published".

Then comes the major stuff that I would suggest changing, deleting or adding, in each case with a concrete recommendation of what could be done to improve the manuscript. I follow a logical order through the text but usually end with what I consider most important, or repeat that point if it was already covered earlier. To end the general comments on something positive I will have another paragraph stressing how valuable the manuscript would be, that I hope it will ultimately be published, etc. Even if I feel I have to suggest rejection I try to stress a positive element of the work.

Finally, and as mentioned above, there is the list of minor suggestions. Most other reviewers I have run into seem to structure their reports similarly.

When submitting the report, however, one does not only have to provide the text I have discussed so far, although it is certainly the most useful from the authors' perspective. No, nearly all journals have a field of "reviewer blind comments to the editor", which I rarely find necessary to use, and a number of questions that the reviewer has to answer. The latter are typically along the following lines:
  • Is the language acceptable or is revision required?
  • Are the conclusions sound and do they follow logically from the results?
  • Are all the tables and figures necessary?
And so on. The problem I usually have is that these questions are binary, but I would like to write something like "mostly yes, except for that instance here which really needs to be dealt with".

Thursday, June 8, 2017

Botany picture #244: Primula veris


Perhaps one of the most artsy pictures I have ever taken, this shows a Primula veris (Primulaceae) at the Zurich Botanic Gardens, taken in 2009. I was at that time involved in pollination experiments on the species.

Saturday, June 3, 2017

Reading up on biogeography part 5: time-slices

Today finishes up, at least for the moment and until the next special issue comes out, the little series on panbiogeography and vicariance geography. The last paper is

Corral-Rosas V, Morrone JJ, 2017. Analysing the assembly of cenocrons in the Mexican transition zone through a time-sliced cladistic biogeographic analysis. Australian Systematic Botany 29: 489-501.

It uses area cladograms, as did one of the papers already discussed, but as the title indicates it does so in a way that examines different "time slices".

Before I get to the methodology I would like to establish an analogy.

Imagine you read a recipe for what are supposed to be very amazing pancakes. The instructions are as follows: (1) mix eggs and milk; (2) place the concrete in the deep freezer; (3) pour the mixture into the frying pan. Looking at such instructions you may well wonder first what concrete has to do with anything - not only would you not expect concrete to be part of pancake-making in the first place, but it does not even seem to be used for anything. Next you may notice that there is no mention of flour, although some kind of flour would appear to be necessary to produce pancakes.

There are now at least three possibilities. One is that the authors of this recipe have no idea how to make pancakes and merely pretend they do. Another is that they do in fact know what they are doing in the kitchen but merely wrote very incomplete and confusing instructions. Finally, there is the possibility that we, the readers, are just too ignorant or blinkered to understand the brilliance of the approach.

Some of the papers in cladistic biogeography and panbiogeography are very clear in their methodology, and I can immediately understand what they did and perhaps even what their logic is, even if I may have concerns. But with others I feel as if I am confronted with the above pancake recipe. Either the authors have no idea how to do biogeography, or the methods section could be clearer, or I have no idea how to do biogeography.

In the present case, the authors assembled 49 phylogenies ("cladograms") of various groups of organisms occurring in the Mexican biota that they were interested in. They then, as usual for area cladistics, replaced the names of the terminal taxa in the phylogenies with the areas those terminals occur in, and then extracted "paralogy-free subtrees" for analysis.
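
The name-swapping step, at least, is easy to picture. The following is a minimal sketch of how I imagine it, not the authors' actual procedure or software; the toy tree, the species names and the area codes are all invented, and the function only handles simple Newick strings without spaces or quoted labels.

```python
import re


def taxa_to_areas(newick, area_of):
    """Replace each terminal name in a Newick string with its area.

    'area_of' is a dict mapping terminal names to area labels. Terminals
    are recognised as labels that directly follow '(' or ',', so internal
    node labels (after ')') and branch lengths (after ':') are untouched.
    """
    def swap(match):
        return area_of[match.group(0)]

    return re.sub(r"(?<=[(,])[^(),:;]+", swap, newick)


# Hypothetical four-species phylogeny and the areas the species occur in.
tree = "((sp_a,sp_b),(sp_c,sp_d));"
areas = {"sp_a": "A", "sp_b": "B", "sp_c": "A", "sp_d": "C"}

print(taxa_to_areas(tree, areas))
```

In this toy example area A ends up in two different places of the resulting area cladogram. As far as I understand it, that kind of repetition is what the subtree extraction is meant to remove, which is where my questions start.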

To this point it is the same approach as in the previous area cladistics paper, and once again I am a bit uncertain how precisely it works and, more importantly, how it could possibly be justified. When a molecular phylogeneticist removes paralogous alleles from the analysis they do so because we understand a lot about gene duplication, gene families, pseudogenes and suchlike. When an area cladist picks subtrees out of a larger area cladogram, what is the parallel? What is the theory behind it? How do they explain the existence of what they call paralogy in a way that does not make the whole idea of having area cladograms appear absurd? I cannot help but wonder if it is anything more than "this is too complicated so we will ignore it". Maybe I just haven't seen the proper justification, but the papers I have looked at so far seem to limit themselves to saying that paralogy exists and needs to be removed.

But now for the time-slicing. This is now really the pancake recipe: if it works the way I believe I understand the methods then it doesn't make any sense. But if it works in a different way that actually makes sense then it isn't explained well enough for me, at least, to understand. The way it looks to me is that the authors assigned each group of organisms for which they had a phylogeny in their analysis to a "cenocron", which they defined as a "set of taxa that share the same biogeographical history, which is recognised as a subset within a biota".

They then conducted three different analyses supposedly corresponding to the Miocene, the Pliocene and the Pleistocene, using phylogenies from only one, two and then all three cenocrons, respectively. In other words (again, if I understand correctly), the idea seems to be that only organisms from one of the cenocrons would have been in the study area in the Miocene, with the others arriving later. I think.
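
Spelled out as code, my reading of the design would look something like the sketch below. I stress that this is only my interpretation of the methods section; the tree and cenocron labels are placeholders, and the assumption that each later slice simply adds one more cenocron to the earlier ones is mine.

```python
def time_slices(cenocron_of, cenocron_order, slice_names):
    """Group phylogenies into cumulative time slices.

    'cenocron_of' maps each phylogeny to the cenocron it was assigned to,
    'cenocron_order' lists the cenocrons from oldest to youngest arrival,
    and 'slice_names' names the slices in the same order. The first slice
    uses only the first cenocron, the second the first two, and so on.
    """
    slices = {}
    for i, name in enumerate(slice_names, start=1):
        allowed = set(cenocron_order[:i])
        slices[name] = [tree for tree, c in cenocron_of.items() if c in allowed]
    return slices


# Placeholder assignment of four phylogenies to three cenocrons.
assignment = {
    "tree_1": "cenocron_1",
    "tree_2": "cenocron_2",
    "tree_3": "cenocron_3",
    "tree_4": "cenocron_1",
}
result = time_slices(
    assignment,
    ["cenocron_1", "cenocron_2", "cenocron_3"],
    ["Miocene", "Pliocene", "Pleistocene"],
)
for slice_name, trees in result.items():
    print(slice_name, trees)
```

Note that nothing in this selection step touches the phylogenies themselves; each slice is just a different subset of the same present-day trees.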

The conclusion after all this effort is that "the Mexican Transition Zone is a complex area that differs in delimitation from one analysis to another. The present study showed that the results may depend on the assemblage of the taxa analysed, with time-slicing being an adequate strategy for deconstructing complex patterns in cladistic biogeography". That is not exactly the most concrete conclusion I have ever seen. What is more, the second paragraph of the introduction already explained that "the Mexican transition zone, as defined by Halffter (1987), is a complex area", so at this moment I am not really on top of what new insights the analysis produced.

But my more important question is this one: How does the claim work that the "Miocene analysis" examines the Miocene time-slice when the authors appear to have used phylogenies of contemporary (that is Pleistocene) species? The Miocene was 5 to 23 million years ago. The species in the phylogeny would not have existed yet, only their distant ancestors would have, with potentially very different geographic ranges. We are talking the time of our common ancestor with the chimps and waaay beyond!

Do the authors assume that all contemporary species existed 20 Mya and have remained utterly static since that time? Where is the flour that one would absolutely need to get a pancake out of this? (Time-calibration of all the phylogenies they used followed by ancestral area estimation, in case that isn't immediately clear.)

Again: maybe I am missing something, perhaps even something that will be obvious to many others, which would make sense of this approach. But at the moment I pointlessly have a lump of concrete sitting in the freezer, and the promised pancake looks very much like a watery omelet to me.