Although I had been aware of its existence for quite some time, I have only recently seriously tried out KeyBase. Under the motto "teaching old keys new tricks", this website hosted by the Royal Botanic Gardens Victoria is a repository of dichotomous identification keys. The following post is a quick note on my experiences so far.
What is in KeyBase?
Coverage so far seems to be mostly Australian plants, but there are also some keys to plants of California, Sri Lanka and New Zealand, and very few non-botanical projects. Maybe coverage will expand over time; it all depends on whether taxonomists working on other groups will take it up.
At least for Australia, many of the keys appear to be either previously published in traditional print floras or draft keys meant to be published in upcoming floras, then together with full descriptions, drawings and suchlike. Having them on KeyBase is of course great. In the former case, it makes identification aids freely available to people who cannot afford to buy, say, a four-volume state flora for several hundred dollars, or to people who are currently in the field and did not bring those books along, as long as they have an internet connection. In the latter case, it makes keys available that one would otherwise have to wait several years for, and in the ideal case they could be tested and improved in the meantime.
How does it look from the end-user perspective?
If you want to use the key repository, you can enter the scientific name of the group you are interested in into the search field in the upper right corner. Usually it works well. If, for example, you were to enter "Asteraceae", you would be presented with links to keys covering the states of Victoria and New South Wales, all of Australia, and California. Sometimes, however, it doesn't. I know that there is a separate subkey to only the Gnaphalieae tribe in Australia, but searching for "Gnaphalieae" currently draws a blank. You have to go into the Australian Asteraceae key, search the web page for that name, and then click on the little triangle next to it.
This brings me to the next point. Many keys are very helpfully inter-connected. You can use a family key to get to a genus, and then you may find such a little triangle leading you to the genus key. At the moment a species may link to other, external resources, for Australian plants often the Atlas of Living Australia page of the species, which is very helpful as they usually feature a photo and a distribution map.
Each key can be presented in three different ways. The obvious ones are bracketed and indented, terms that I have explained in a separate post. Unfortunately, Firefox has a tendency to freeze on me when I try to have a very large key displayed as indented, but bracketed always seems to work, and indented still works well for smaller keys.
Given that all the keys in KeyBase are dichotomous, it is a bit surprising to find that the third display option is "interactive". As far as I can tell, there is nothing interactive about it though, and it is certainly not multi-access. It just seems to walk through the questions in the usual pre-determined order, displaying only a single question at a time. It thus seems to have only downsides compared to the other two options, and I see no point in using it.
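To illustrate what such a one-question-at-a-time walk through a dichotomous key amounts to, here is a minimal sketch. The data layout, the function name `identify` and the placeholder taxa ("Genus A" and so on) are made up for illustration and have nothing to do with how KeyBase stores or serves its keys.

```python
# A dichotomous key as a dictionary of couplets. Each couplet has exactly
# two leads; a lead points either to another couplet (int) or to a taxon (str).
# All names below are hypothetical placeholders.
key = {
    1: [("Plants annual", 2), ("Plants perennial", "Genus A")],
    2: [("Pappus bristles feathery", "Genus B"),
        ("Pappus bristles smooth", "Genus C")],
}


def identify(key, choices, start=1):
    """Follow a sequence of lead choices (0 or 1) from the first couplet."""
    target = start
    for choice in choices:
        _lead_text, target = key[target][choice]
        if isinstance(target, str):
            # A string means we have keyed out to a taxon.
            return target
    raise ValueError("ran out of answers before reaching a taxon")


print(identify(key, [0, 1]))
```

The order of questions is fixed by the structure of the key itself, which is exactly why such a walk is not multi-access: the user can never choose which character to score first.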
As for the keys themselves, there is obviously very little quality control going on, but that is to be expected in a crowd-sourced project, even if the crowd is largely professional taxonomists. For example, I recently found that a daisy genus containing perennial species can only be reached by answering a couplet as "annual", and then there are the usual problems of hard to judge characters and user-unfriendly, arcane terminology.
How does it look from the contributor perspective?
That is a very good question. I have two small genus keys that I might consider contributing, so I got myself an account today and started looking around for information on how to submit a key and what format it needs to have. The experience was a bit disappointing.
The Help page reads "coming your way soon". Under Manage Account I found the following: "Sorry, we ran out of time and have not been able to create this page yet. In the not too distant future, you'll be able to reset your password and apply to become a contributor to a KeyBase project here." Yeah, that is not very helpful. That way the repository will not really grow very much.
In conclusion
Although more and more online keys in the future will be multi-access, I see a market niche for KeyBase, especially for collecting in one place lots of previously published, otherwise hard to obtain dichotomous keys from paper floras and obscure taxonomic publications. And there are also people who actually prefer dichotomous keys.
Already I have made some use of the keys that are available in the repository, so that is good. On the other hand, it definitely needs to become easier to contribute, otherwise it will largely remain restricted to Australian plants, as it is now.
Thursday, April 28, 2016
Thursday, April 21, 2016
The joys of single-character taxonomy
Time for a little rant. Two days ago I tried to identify an Australian native Asteraceae. I already knew that it had to belong to one of two genera, and had always wondered why those two genera were recognised as distinct in the first place. If you put a randomly chosen species from the first next to one from the second you will be hard pressed to see any difference beyond hair cover or suchlike.
I assumed there would be some fruit character, for example feathery versus smooth pappus bristles. That would be bad enough because it would probably still mean that one genus is phylogenetically nested within the other, as is usually the case when there is only a difference in one trait. This is because then one genus is defined by having the trait and the other merely by lacking it; in systematics, we call that an 'apomorphic segregate'. But okay, such a fruit character, even if evolutionarily irrelevant and phylogenetically uninformative, is at least user-friendly. You can look at the pappus (or beak, or whatever) and quickly conclude: ah yes, it must be this genus.
What was the difference in the present case? "Florets homogamous" or "florets heterogamous". Before we consider the trait itself, hands up everybody who knows what that means! Yes, that's what I thought. The identification key in question was apparently written for an end user group of about half a dozen fellow taxonomists in Australia or so, but certainly not for conservation managers, community ecologists, or plant-enthusiastic non-scientists.
Now the trait itself. It means whether the flowers in the daisy flower-head are all of the same type or if there are different types present; and here we are not talking about the presence or absence of petal-like ray florets or anything easy to see like that. We are talking about one of the two genera sometimes having a few female flowers at the edge of the flower-head in addition to the normal, bisexual flowers. In other words, get out the anatomy grade tweezers and a dissecting microscope!
And as expected we are dealing with a single character difference. It is extremely unlikely that the two genera are reciprocally monophyletic, so they probably don't make sense in modern systematics. But even from a so-called 'evolutionary' taxonomy perspective this is weird. Again, if you place species from the two genera next to each other, you will not see any significant difference; and surely having or not having a few female flowers in the head is not going to put a species into a different 'adaptive zone' or something. So what is the idea?
What is weirder is that this criterion is not even applied consistently. Another closely related genus has got several homogamous and one heterogamous species.
Of course this is not the first time I have seen a situation like that. The genus I did my honours on was Suessenguthia (Acanthaceae), a group of (now) six species with four fertile stamens and little hooks on the anthers. Some of its species are pretty similar to those of the larger genus Sanchezia except that the latter has only two fertile stamens. In addition, there is a monotypic genus Trichosanchezia that looks exactly like certain hairy, northern Peruvian Sanchezias but has four fertile stamens without the little hooks. Even better, there was once a likewise monotypic genus called Steirosanchezia characterised by two fertile stamens without hooks; that one, however, has already been put out of its misery and sunk back into Sanchezia.
So once there were four genera based merely on minutiae of the androecium, for species that are so similar that they constantly get misidentified to each other's genera, and obviously all forming one tight natural group. How is that helpful? How did that ever make sense even before Phylogenetic Systematics, even before the Theory of Evolution?
Sunday, April 17, 2016
Phylogenetics software: My own journey
This is just a personal note to conclude the ruminations on the popularity and user friendliness of phylogenetics software. Thinking back to my time as a student, and looking at my own publications, what software did I actually use myself, and why?
As I mentioned in the popularity through the years post, my very first exposure to phylogenetics software was through an undergraduate course in Systematic Botany in my third year at Göttingen University, which should have been in 1998/99. We had parsimony analysis explained to us and were shown how to use Hennig86. I remember thinking that its terminal interface was rather clunky, and the commands hard to intuit. Later in the same semester I also had a course in Phycology, and it is possible that we were shown PAUP, but if so then not in any depth.
The first time I had to do a professional phylogenetic analysis was for my Diplomarbeit, which would here be called an honours project, and one of the two resulting papers, which only came out in early 2005. According to the PDF, I used PAUP (already v4) to do parsimony analysis of the morphological data and distance analysis of the molecular (AFLP) data. Why? Well, there was no informed decision making going on, no careful comparison of the merits of the different programs available. Instead, this is simply what others were using and what the institute had licenses for. So you learned how to write a data file from an older student and ran the analysis on an old Mac, because that is how it was done.
Accordingly I also did not, at this stage at least, question the choice of methods. You analysed morphological data with parsimony, sequence data with likelihood, and restriction site type binary data with distance, because that is what everybody did and what the peer reviewers expected. I was rather proud of discovering Farris' successive reweighting approach for myself though, so it is not as if I didn't do my own method-finding.
Friday, April 15, 2016
Modern art or something
I am currently re-counting some photos of root tip squashes for a paper I am preparing, and I stumbled across this photo taken by the student who was working with me. Seems like the exposure was somewhat off, but I find the result actually quite interesting.
The thin area was air caught under the cover; the surrounding larger areas were liquid.
Unfortunately it seems as if the photos we haven't counted yet were largely left over for a good reason: they can't really be counted with confidence. Sigh.
Sunday, April 10, 2016
User-friendliness, or lack thereof, in scientific software
Having now posted about all those phylogenetics programs and written negative-sounding things like "not the most user-friendly", one might wonder (and I even did so myself) what exactly are my criteria for user-friendly, not only with regard to phylogenetics software but for special purpose scientific analysis tools in general, be they for population genetics or whatever.
This is actually not as easy as it seems, because it soon becomes obvious that there is not just one type of user, even in the same narrow field of science. That means that different interfaces have different advantages and disadvantages depending on what use we are thinking of. Let's start with how a user interacts with a program, using phylogenetics software as example cases.
I think there are three main solutions that people implement most of the time, and then there is a fourth rarely used option:
Friday, April 8, 2016
Botany picture #227: Gossypium sturtianum
Gossypium sturtianum (Malvaceae), the floral emblem of the Northern Territory, from our recent weekend visit to the Australian National Botanic Gardens. This is in the still very young Red Centre Garden, with the typical red sand specifically imported to design this section as genuinely as possible.
The question was always how many of the plants originally planted out would like the Canberra climate. Some have not done so well, but others are thriving, among them some Solanum, Acacia, Calandrinia, several Asteraceae including the poached egg daisy, and this species.
Botanically it is basically a native Australian cotton. Typical for many Malvaceae is the fusion of many stamens into a filament tube around the female organs.
Saturday, April 2, 2016
Species tree method update: iGTP
iGTP - short for Gene Tree Parsimony - is a highly specialised program for the inference of species trees from gene trees using parsimony criteria. It offers the criterion Minimising Deep Coalescences (MDC) to deal with incomplete lineage sorting, and the criteria Minimising Gene Duplications and Minimising Gene Duplications and Losses for dealing with gene families.
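To make the MDC criterion a bit more concrete, here is a minimal sketch of how the deep coalescence cost of one gene tree on one candidate species tree can be counted, following the usual "extra lineages" formulation: for every branch of the species tree, count how many gene lineages have to pass through it, and subtract one. Trees are written as nested tuples, all function names are made up for illustration, and this is not meant to represent iGTP's actual implementation, which additionally has to search across species trees. It assumes that every species occurs in the gene tree at least once.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lineages_entering(gene_tree, cluster):
    """Count maximal gene tree clades whose leaves all belong to `cluster`.

    This is the number of gene lineages that leave the species tree branch
    sitting above `cluster`.
    """
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_entering(child, cluster) for child in gene_tree)


def branch_clusters(species_tree, is_root=True):
    """Yield the leaf set below every branch of the species tree."""
    if not is_root:
        yield leaf_set(species_tree)
    if not isinstance(species_tree, str):
        for child in species_tree:
            yield from branch_clusters(child, is_root=False)


def deep_coalescences(species_tree, gene_tree):
    """Sum the extra lineages (lineages minus one) over all branches."""
    return sum(lineages_entering(gene_tree, cluster) - 1
               for cluster in branch_clusters(species_tree))


species_tree = (("A", "B"), "C")
print(deep_coalescences(species_tree, (("A", "B"), "C")))  # concordant gene tree
print(deep_coalescences(species_tree, (("A", "C"), "B")))  # discordant gene tree
```

A gene tree that agrees with the species tree costs nothing, while each discordance forces extra lineages through some branch. The species tree preferred under MDC is the one that minimises this cost summed over all gene trees.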
Input data are all the gene trees in Newick format in one text file. Several alleles or individuals in the same species should have the exact same name, which is rather unusual. Most other species tree programs use allele assignment tables.
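If the gene trees come with individual allele names, they need to be relabelled before they can be used this way. The following is a rough sketch of how that could be done, assuming a hypothetical naming convention in which everything before the last underscore of a tip label is the species name (e.g. "sp1_a"). It also assumes plain, unquoted Newick labels; the function names are made up for illustration.

```python
import re

# A tip label directly follows "(" or ","; internal node labels follow ")"
# and are therefore left alone. Quoted labels are not handled.
LEAF = re.compile(r"([(,]\s*)([^(),:;\s]+)")


def allele_to_species(label):
    """Hypothetical convention: 'sp1_a' -> 'sp1'; labels without '_' stay as is."""
    return label.rsplit("_", 1)[0]


def relabel(newick, rename=allele_to_species):
    """Replace every tip label in a Newick string with its species name."""
    return LEAF.sub(lambda m: m.group(1) + rename(m.group(2)), newick)


gene_trees = ["((sp1_a:0.1,sp1_b:0.1):0.2,(sp2_a:0.1,sp3_a:0.3):0.1);"]
for tree in gene_trees:
    print(relabel(tree))
```

With a different naming scheme, only `allele_to_species` would need to change, for example to a lookup in a dictionary built from an allele assignment table.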
Sadly it failed to work on Ubuntu at home; it started but did not actually do anything after importing the gene trees. Yesterday I finally gave iGTP a try on Windows. It worked fine with exactly the same gene tree file as I tried on Ubuntu, further demonstrating that there is something wrong with the Linux version I downloaded.
So how did it go? The program is fast and simple to use, but there are two little problems. First, it produced a very different tree from the same dataset than *BEAST and ASTRAL did, and the solutions found by the latter two make considerably more sense in the light of the morphology and biogeography of the species in question. I am afraid in this case MDC may have been misled.
Second, iGTP has an integrated tree viewer that I found both unnecessarily complicated and, sorry to be so frank, ugly. Complicated in that it is some kind of fancy 3d viewer where a simple FigTree/TreeView style representation would be more helpful, and ugly in that the semi-transparent white font on an intensely blue background hurt my eyes. Seriously, I could barely make out the species names.
MDC is known to find the wrong answer under a very specific set of circumstances (just like parsimony on a single character matrix has a problem with long branch attraction), and maybe that is what is happening here. I am sure there are many cases where it will work better; in fact in the past I have often found that MDC and *BEAST would produce meaningful results where algorithmic/distance based approaches similar to ASTRAL would produce nonsense. But in this specific case I will leave it at having tried out how iGTP works.
Will update the big species tree post to include these new observations.
Friday, April 1, 2016
The popularity of phylogenetic programs over the years
Some time ago a colleague commented on this blog that "Bayesian (MrBayes/Beast) analyses have become almost stand-alone standard". My feeling is that the situation is much more mixed, with many palaeontologists still using parsimony, people with very big datasets using RAxML, some fields sticking to MEGA, and so on.
I got curious: What is the actual market share of different phylogenetics programs? How has it developed over time?
The following are the programs I considered: