Wednesday, July 30, 2014

Botany picture #167: Anthyllis vulneraria subsp. praepropera


Anthyllis vulneraria subspecies praepropera (Fabaceae), France, 2014. This picture was taken in the grounds of the Chateau de Peyrepertuse. Not only is it an amazingly beautiful castle ruin, not only does it have an awesome view over the landscape of the Corbieres mountains, but it also has a diverse and colourful rock-garden-like flora. Definitely worth a visit, but due to the steep climb it takes a bit of stamina.

Tuesday, July 29, 2014

Bracketed and indented keys

As mentioned before, the identification keys published by taxonomists and used by all manner of end-users of taxonomic literature to identify organisms to species are traditionally analytic, single entry and dichotomous. That means they consist of a series of nested questions that have to be answered one after the other in a fixed order to arrive at the correct species name.

In technical terminology, the individual questions are called couplets, and the two alternative answers of each are called leads. A good key should have leads describing a limited number (perhaps ca. three) of clear, distinctive, and easily understood characters.

However, even for the exact same content there are two different ways of formatting these keys. The first option is a bracketed key. It looks as follows:

1     Leaves 0.5-2.0 cm long ... 2
      Leaves 3.0-10.0 cm long ... 3
2(1)  Corolla red ... Planta australis
      Corolla white or yellow ... Planta latifolia
3(1)  Fruit a capsule ... 4
      Fruit a berry ... Planta vulgaris
4(3)  Flowers solitary ... Planta palustris
      Flowers in axillary clusters ... Planta debilis


Here the two leads of the same couplet are always directly below each other. This has the advantage that it is very easy to compare the two alternatives, but on the other hand the couplets need to be numbered in some way so that we know where we have to go next. In larger keys, it may also be harder to find the next or, if back-tracking, the previous couplet because one has to jump around so much.

In the above case, each couplet except the first also has a number in brackets directly after its own. This indicates where the user had to be coming from to arrive at this place, making back-tracking in large keys easier. But of course not all large keys have them.

The second option is called an indented key. Here is exactly the same key as above in an indented form:

Leaves 0.5-2.0 cm long
   Corolla red ... Planta australis
   Corolla white or yellow ... Planta latifolia
Leaves 3.0-10.0 cm long
   Fruit a capsule
      Flowers solitary ... Planta palustris
      Flowers in axillary clusters ... Planta debilis
   Fruit a berry ... Planta vulgaris

In this case, the follow-up, nested couplets are directly after the lead that, well, leads to them. On the one hand, this makes navigation easier, and couplet numbers are not strictly necessary. On the other hand, it is harder to compare the two leads of each couplet. Worse, indentation does not work very well to indicate which couplets belong together if the key is long enough to spread over several pages. Imagine trying to find the second lead two or three pages further down the book and then deciding that the first one was right after all! Unfortunately, some authors do not seem to see a problem with that.
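To make the point that both layouts carry exactly the same content, here is a minimal Python sketch that stores the example key once as a tree and prints it in either form. The class and function names (Couplet, Lead, render_bracketed, render_indented) are made up for this illustration and do not come from any existing key-building software; the sketch also assumes that couplets are numbered breadth-first, which happens to reproduce the numbering used above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Couplet:
    """One question of the key, holding its alternative leads."""
    leads: List["Lead"]


@dataclass
class Lead:
    """One alternative: its text, and either a species name or a follow-up couplet."""
    text: str
    outcome: Union[str, Couplet]


def render_indented(couplet, depth=0, step=3):
    """Indented layout: a follow-up couplet is printed right below the lead that reaches it."""
    lines = []
    pad = " " * (depth * step)
    for lead in couplet.leads:
        if isinstance(lead.outcome, Couplet):
            lines.append(f"{pad}{lead.text}")
            lines.extend(render_indented(lead.outcome, depth + 1, step))
        else:
            lines.append(f"{pad}{lead.text} ... {lead.outcome}")
    return lines


def render_bracketed(root):
    """Bracketed layout: leads of a couplet stay together, couplets are numbered."""
    order = [(root, None)]      # (couplet, number of the couplet we came from)
    number = {id(root): 1}
    i = 0
    while i < len(order):       # breadth-first walk assigns the couplet numbers
        couplet, _ = order[i]
        for lead in couplet.leads:
            if isinstance(lead.outcome, Couplet):
                number[id(lead.outcome)] = len(order) + 1
                order.append((lead.outcome, number[id(couplet)]))
        i += 1

    lines = []
    for couplet, parent in order:
        n = number[id(couplet)]
        label = f"{n}" if parent is None else f"{n}({parent})"
        for j, lead in enumerate(couplet.leads):
            prefix = label if j == 0 else ""
            if isinstance(lead.outcome, Couplet):
                target = number[id(lead.outcome)]
            else:
                target = lead.outcome
            lines.append(f"{prefix:<6}{lead.text} ... {target}")
    return lines


# The example key from this post, entered once.
key = Couplet([
    Lead("Leaves 0.5-2.0 cm long", Couplet([
        Lead("Corolla red", "Planta australis"),
        Lead("Corolla white or yellow", "Planta latifolia"),
    ])),
    Lead("Leaves 3.0-10.0 cm long", Couplet([
        Lead("Fruit a capsule", Couplet([
            Lead("Flowers solitary", "Planta palustris"),
            Lead("Flowers in axillary clusters", "Planta debilis"),
        ])),
        Lead("Fruit a berry", "Planta vulgaris"),
    ])),
])

print("\n".join(render_bracketed(key)))
print()
print("\n".join(render_indented(key)))
```

Because the bracketed form has to refer to couplets by number, it needs the extra numbering pass, while the indented form falls straight out of the recursion. That mirrors the trade-off described above: numbering is the price paid for keeping the two leads side by side.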

Reading between the lines, it may have come across that I prefer bracketed keys, and that is indeed the case. It seems that bracketed keys are more popular in continental Europe while indented ones are traditionally more popular in Anglo-Saxon countries. Be that as it may, with the increased availability of digital keys the bracketed variant appears to be on the rise.

Of course internet and other computer based keys allow designs that go far beyond the limitations of analytic single entry keys. But the latter are still widely used, and they do have their advantages. At a conference two years ago some colleagues pointed out that working through a single entry key makes it much easier to learn about the whole diversity of and character distribution in the group, whereas using a table-based key is much more of a 'black box' experience.

The thing is, when single entry keys are used on a digital medium, there is pretty much no point in separating the two leads, especially if the key is supposed to work on small displays. In effect digital keys therefore tend to be increasingly bracketed.

Monday, July 28, 2014

Botany picture #166: tiny Western Australian sundew


Today (or to-die, as the Australians pronounce it) I lectured about leaves and indumentum, that is the hairs, scales or glands that can be found on plant leaves and stems. One of the pictures I showed in the presentation was the above, a really really tiny Drosera or sundew from Western Australia.

It nicely demonstrates not only two of the very many functions that leaves have evolved to fulfil - in this case, photosynthesis and capturing and digesting insects to improve nutrition - but also, obviously, glandular hairs.

Thursday, July 24, 2014

The Doomsday Argument

There exists something called the Doomsday Argument, and it is considered to be one of the most controversial probabilistic arguments that have been advanced.

Randall Munroe has given a very good summary of the argumentation:
Humans will go extinct someday. Suppose that, after this happens, aliens somehow revive all humans who have ever lived. They line us up in order of birth and number us from 1 to N. Then they divide us into three groups--the first 5%, the middle 90%, and the last 5%:
Now imagine the aliens ask each human (who doesn't know how many people lived after their time), "Which group do you think you're in?"
Most of them probably wouldn't speak English, and those who did would probably have an awful lot of questions of their own. But if for some reason every human answered "I'm in the middle group", 90% of them will (obviously) be right. This is true no matter how big N is.
Therefore, the argument goes, we should assume we're in the middle 90% of humans. Given that there have been a little over 100 billion humans so far, we should be able to assume with 95% probability that N is less than 2.2 trillion humans. If it's not, it means we're assuming we're in 5% of humans--and if all humans made that assumption, most of them would be wrong.
To put it more simply: Out of all people who will ever live, we should probably assume we're somewhere in the middle; after all, most people are.
If our population levels out around 9 billion, this suggests humans will probably go extinct in about 800 years, and not more than 16,000.
He goes on to state that most people immediately conclude that the idea is obviously wrong, but "the problem is, everyone thinks it's wrong for a different reason. And the more they study it, the more they tend to change their minds about what that reason is."

Well, there are two reasons why that could be so. One is that the argument is really quite clever but most people don't realise it. The other is that there is so much wrong with it that people discover new layers of wrongness every time they look at it.

I guess I would have to be counted among those who think that the Doomsday Argument is, indeed, idiotic. Admittedly I cannot come up with a super-deep Bayesian counter-argument such as are referenced in the linked Wikipedia article. But I don't think that is necessary because this does not look like a job for probabilistic reasoning anyway.
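For reference, the numbers in the quoted summary can be reproduced with a few lines of arithmetic. The following is only a sketch under stated assumptions: "a little over 100 billion" is taken as 110 billion, and a steady population of 9 billion is assumed to mean roughly 130 million births per year. Neither figure is given in the quote; both are guesses chosen because they land close to the quoted results, and the function names are invented for this illustration.

```python
BORN_SO_FAR = 110e9       # assumption: "a little over 100 billion" humans so far
BIRTHS_PER_YEAR = 130e6   # assumption: rough birth rate at a steady 9 billion people


def total_if_at_fraction(born_so_far, fraction):
    """Total number of humans N if we sit at the given fraction of the birth order."""
    return born_so_far / fraction


def years_left(total, born_so_far=BORN_SO_FAR, births_per_year=BIRTHS_PER_YEAR):
    """Years until the remaining humans have all been born at a constant birth rate."""
    return (total - born_so_far) / births_per_year


# If we are not in the first 5%, then N is at most born_so_far / 0.05.
upper_total = total_if_at_fraction(BORN_SO_FAR, 0.05)
# "Somewhere in the middle": as many humans still to come as have lived so far.
median_total = total_if_at_fraction(BORN_SO_FAR, 0.5)

print(f"upper bound on N: {upper_total:.3g}")
print(f"years left, median guess: {years_left(median_total):.0f}")
print(f"years left, upper bound:  {years_left(upper_total):.0f}")
```

With these assumptions the upper bound on N comes out at 2.2 trillion, the median guess at about 850 years, and the upper bound at about 16,000 years, close to the figures in the quote. The calculation only shows where the numbers come from; it says nothing about whether the reasoning behind them is sound.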

Wednesday, July 23, 2014

Another brilliant piece of science spam

Lately I get the feeling that, on average, scientific spam trolling for paper submissions to poor quality, for-profit journals gets somewhat sleeker and more professional looking. And perhaps I have now blocked so many of these spammers that I get fewer of the really obvious ones.

But sometimes a real howler still gets through. Look at this beauty:

Tuesday, July 22, 2014

Botany picture #165: Blackstonia perfoliata


Blackstonia perfoliata (Gentianaceae), south-western France, 2014. This little yellow Mediterranean gentian was all over the place when we visited family in June. Strangely we never noticed it during previous visits to the area.

Monday, July 21, 2014

How to make a bad identification key

Today was my first lecture of the year, and after an introduction to the course as a whole I started with identification tools.

For the non-biologists, I have blogged before about identification keys; the most common type consists of a series of nested questions asking about characters of the organism you are trying to identify. In the style of a choose-your-own-adventure book, answering the questions correctly will ultimately lead to your happy ending, in this case the name of the species you are interested in.

Looking over my lecture brought to mind some of my own painful experiences trying to identify plants in the past. (The willow key in the Flora Europaea, argh argh argh...) I may write something more positive soon, for example explain different ways of designing a key or point towards really good examples, but for the moment let's brainstorm a list of what a taxonomist should do if they want to produce a really atrocious key.

To make them as difficult to use as possible, one should:
  1. Use arcane terminology. Anybody can confound an untrained lay user of the key by writing "adaxial leaf side" instead of "upper leaf side". The real prize is to find expressions that are so specialised to your group of organisms or so local to your geographic area that only three other taxonomists on the planet will be able to understand what you mean. Prime example in Australian daisies: the "claw", which is the petiole of an involucral bract. (And non-botanists will of course not know either expression.)
  2. Instead of writing questions that divide the number of remaining species neatly into two approximately equal halves, have a string of questions that divide between one single species and all others. Bonus points if the first answer to each question is a long list of characters and the alternative is always a laconic "characters not found in this combination".
  3. Divide the species so that the 'distinguishing' characters are strongly overlapping, for example "glomerule diameter 1.5-2.7 mm versus glomerule diameter 2.2-3.5 mm". When faced with categorical characters, the ideal solution is something like "hairy versus naked or hairy", although a similar effect can be achieved through liberal use of "usually", "sometimes", or "except in [species that really shouldn't be in that part of the key]".
  4. In a similar vein, distinguishing characters can be made less useful through the simple expedient of adding the prefix "sub-", which means kind of but not really. Examples: subglabrous = not quite entirely without hairs, subacute = a bit pointy but not really pointy, subspicate = somehow like a spike-like inflorescence but, I dunno, also a bit different? No idea really.
  5. Make liberal use of characters that cannot be observed in the field, for example because you'd need a microscope. This will particularly endear you to all the park rangers and bushwalkers trying to use your key.
  6. Another option is to design the key so that the first few questions are all about characters that most users will not have available. The classic examples are keys to some subgroups of the daisy family (Asteraceae) or to the parsley family (Apiaceae) that focus obsessively on fruit characters. Collected a Calotis in full bloom but before fruits start to develop? Well, you are out of luck - even if in that particular genus half the species have yellow flowers and the other half have purple ones, that character is never mentioned in the key.
  7. If all else fails, write a long and convoluted key that contains the same question twice, such as the key to European willows mentioned above. So you want to identify a dwarf willow from the alpine zone of the Pyrenees, and the very first question in the entire key is whether you are (a) dealing with a dwarf willow from an Arctic or alpine area or (b) a large shrub usually at least 2 m tall when mature from a lowland temperate area. You chose (a)? Silly you, of course you should have gone down the path (b) because ten or so questions in you will again be asked if you have an alpine dwarf willow, and that will then lead you to Salix pyrenaica.
The sad thing is, I am fairly sure that the colleagues in these cases I am thinking of actually thought they were writing simple and useful keys. Nobody would do these things on purpose, right?
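
To put a rough number on point 2 of the list, the following Python sketch compares how many questions a user has to answer on average in a key that halves the remaining species at every step with one that splits off a single species each time. It assumes that every species is equally likely to be the one in hand, which is of course a simplification, and all function names are made up for this illustration.

```python
def couplets_per_species(n, split):
    """Number of couplets a user answers to reach each of n species.

    split(n) says how many of the n remaining species follow the first lead;
    the rest follow the second lead.
    """
    if n == 1:
        return [0]          # a single species left: nothing more to ask
    k = split(n)
    first = couplets_per_species(k, split)
    second = couplets_per_species(n - k, split)
    return [depth + 1 for depth in first + second]


def balanced(n):
    """Each couplet divides the remaining species into two equal halves."""
    return n // 2


def one_at_a_time(n):
    """Each couplet separates one single species from all the others."""
    return 1


def mean_couplets(n, split):
    """Average number of couplets, assuming every species is equally likely."""
    depths = couplets_per_species(n, split)
    return sum(depths) / len(depths)


for n in (8, 64):
    print(n, mean_couplets(n, balanced), mean_couplets(n, one_at_a_time))
```

For 64 species the balanced key needs six questions for every species, whereas the one-at-a-time key needs more than thirty on average, and every extra question is another chance to take a wrong turn.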