Saturday, July 30, 2016

Best practices in identification key design

About two weeks ago I gave a lecture about plant identification, which always includes a few comments on what makes a good or a bad key. Not so much because the students are going to write their own keys in the near future, but in the first instance because they are going to use keys next week, and I want to make it clear that if they have problems it may well be the author's fault and not their own.

Among the literature I put on the course website as further reading is a very nice article on best practices in identification key design, Walter & Winterton (2007, Annu. Rev. Entomol. 52: 193-208). It opens with a nice but sadly accurate quote (“Keys are compiled by those who do not need them for those who cannot use them”), but in my eyes the core of the paper is Table 2, which enumerates ten recommendations to key writers:

1. Do not write the key to reflect your classification, write it to make identification as easy as possible even if that means having totally unrelated species next to each other.

2. Avoid couplets with only one character, in case that one character is missing or lost on the user's specimen.

3. Use clear, unambiguous characters and avoid technical jargon.

4. Have the same characters in both leads of a couplet. (Really it is shocking that this needs to be said!)

5. Show illustrations of contrasting character states next to each other.

6. Place illustrations next to where they are needed in the key.

7. “Provide a way out of a dead end: Give links to previous couplets or other means of keeping on the path.”

8. Design keys so that the couplets split the remaining species half-half instead of so that they divide them one versus all others. This makes the key shorter.

9. Provide descriptions for the taxa so that the user can check if they arrived at a plausible result when using the key.

10. Ask “naïve” end-users to test your key and provide feedback.

Although I would consider some of these points much more important than others, I agree completely with all of them.
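To put rough numbers on recommendation #8, here is a minimal Python sketch of why half-half splits make a key shorter. It assumes an idealised key in which every couplet either halves the remaining species exactly or splits off exactly one species; real keys fall somewhere in between. The function names are my own and do not come from the paper.

```python
def couplets_balanced(n_species: int) -> int:
    """Worst-case couplets if every couplet halves the remaining species.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without
    floating-point rounding.
    """
    return (n_species - 1).bit_length()


def couplets_one_vs_rest(n_species: int) -> int:
    """Worst-case couplets if every couplet splits off a single species."""
    return max(n_species - 1, 0)


if __name__ == "__main__":
    for n in (8, 64, 100):
        print(n, couplets_balanced(n), couplets_one_vs_rest(n))
```

Under these assumptions, a key to 64 species needs at most 6 couplets when it splits half-half, but up to 63 when it peels off one species at a time.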

Coincidentally I had to key out a plant in the same week that I gave the lecture, and this was the very first (!) couplet of the relevant key:
A. Capitula discoid: all florets bisexual, or all florets female, and the corolla-limb of similar size in all florets, to 1.0 mm diam. at base of lobes OR capitula radiate but with only 1-3 ligules; achenes homomorphic

A*. Capitula radiate or disciform: if disciform, the corolla-limb to 0.5 mm diam. at base of lobes, with corolla-limb of marginal florets significantly smaller than that of central florets; if radiate, ligules 4 or more, sometimes inconspicuous; achenes homomorphic or dimorphic

Okay, how does this one couplet score against the list of recommendations of Walter & Winterton? Solution below the fold.

It follows recommendations #4 and #8. I would further argue that #7 and #9 do not apply because we are only considering one couplet in isolation (although there are descriptions later). And as there are no illustrations in the medium in which I am currently viewing the key, #5 and #6 do not apply either. But the rest...

How many non-specialists would know what disciform versus radiate means? Yes, one can perhaps work it out from context, but why not simply ask whether ray florets are present or not? That would apply to the whole family, but why have an extra term only for this one genus? What is more, the character used is far from unambiguous. Not only is the actual question “ray florets 3 or fewer versus 4 or more” phrased in an unnecessarily convoluted way, the little addition “sometimes inconspicuous” also shows that it is just not a good character, full stop. Especially for the very first question in a key! So much for recommendation #3.

Because the couplet is written in such a convoluted way, it may at first not be clear that it also fails on the criterion of recommendation #2. It cannot be answered at all if the plants are past flowering, as by then the ray florets will have fallen off. There is a second character but it is A versus A or B, so it is useful only in the lucky case of having B. (Admittedly not as bad as the Lomandra example I once wrote about.)
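The “A versus A or B” problem can be spelled out in a few lines of Python. This is only a sketch: the function and its return labels are hypothetical, and the character states are the achene states quoted from the couplet above.

```python
def resolve(lead_a: set[str], lead_a_star: set[str], observed: str) -> str:
    """Say which lead an observed character state points to, if any."""
    fits_a = observed in lead_a
    fits_a_star = observed in lead_a_star
    if fits_a and fits_a_star:
        return "undecided"  # the state occurs in both leads
    if fits_a:
        return "A"
    if fits_a_star:
        return "A*"
    return "no match"


# Achene states as given in the two leads of the couplet.
achenes_a = {"homomorphic"}
achenes_a_star = {"homomorphic", "dimorphic"}

print(resolve(achenes_a, achenes_a_star, "homomorphic"))
print(resolve(achenes_a, achenes_a_star, "dimorphic"))
```

Homomorphic achenes fit both leads and decide nothing; only dimorphic achenes settle the couplet, and then only in favour of A*.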

The biggest, and really the underlying, problem relates to recommendation #1. The entire structure of the key was designed to accommodate groups of supposedly related species, not to facilitate fast and secure identification. That is the reason why the first question came out so convoluted. To maximise ease of use it should have been something like “ray florets present versus ray florets absent”. Yes, that would have made it impossible to arrive at larger species groups because presence of ray florets is a very homoplasious trait, easily lost from one closely related species to another. But who cares? Again, reflecting classification is NOT the point of an identification key. Identification is the point of an identification key. It's in the name!

Finally, recommendation #10. If this key was tested on non-specialist users and their feedback taken into account, I will eat my hat. I am a specialist in that plant family and even my feedback would have been to start over from scratch.

Note that I am not writing this to be nasty. I just genuinely want to be able to identify plants efficiently. I have been told that next week I will be given a specimen of the very same genus and asked to identify it. As we are dealing with a weed it would be good to get it right. That is all this is about.

Anyway, the Walter & Winterton paper is great. If only more people would read it!
