Saturday, May 30, 2015

Non-hierarchical clustering of mixed data in R

I had wondered for some time how one could do an analysis similar to STRUCTURE on morphological data. Not quite like STRUCTURE, of course, in the sense of using a population-genetic model, but in the sense of having a method that gives you the best phenetic clusters of your specimens for a given number k of clusters.

The problem was always that I didn't know of a method that can handle all data types. You may have numerical variables such as leaf length, where it appears obvious that 5 cm is twice as far away from 1 cm as from 3 cm; but in the same dataset you may also have categorical variables such as hairy or not hairy. These could be coded as 0 and 1 for Euclidean distances, but what if they are multi-state? What is more, you can have categorical variables where there is a clear ordered relationship, e.g. subglobose is somewhere between globose and hemispherical, and others where there is no clear relationship, e.g. white, yellow and purple flower colour.

The R package homals can do a homogeneity analysis, a generalised form of PCA-type ordination analysis, with such mixed datasets. But that is all: it will give you a 2D or 3D plot. It will not do clustering.

This week some concerted googling finally produced the solution: the R package cluster (unsurprising, I know) and its functions daisy and pam (neither of which I had heard of before). The name daisy has nothing to do with the Asteraceae family; the functions in cluster rather seem to have been given female names. When presented with mixed data, daisy defaults to the Gower metric as a distance measure, which is exactly what I wanted.
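For readers who want to see what such a mixed-data distance does, here is a minimal sketch of Gower's coefficient. It is written in Python rather than R and is not the code of daisy itself. In this sketch each variable contributes a dissimilarity between 0 and 1. Numeric variables contribute their absolute difference divided by the variable's range. Nominal variables contribute 0 for matching states and 1 otherwise. Ordinal variables are assumed to be replaced by their rank in a user-supplied ordering and then treated like numeric ones. The overall distance is the unweighted mean over all variables. Missing values and variable weights are ignored. The function name `gower_matrix` and the three specimens are hypothetical, made up for illustration.

```python
"""Minimal sketch of Gower's dissimilarity for mixed morphological data.

Assumptions (illustrative only, not a reimplementation of R's daisy):
- numeric variables: |a - b| divided by the variable's range
- ordinal variables: replaced by their position in a given ordering,
  then treated like numeric variables
- nominal variables: 0 if the states match, 1 otherwise
- no missing values, all variables weighted equally
"""


def gower_matrix(specimens, kinds, orders=None):
    """Return the n x n matrix of Gower dissimilarities (hypothetical helper).

    specimens: list of records, one per specimen
    kinds: "numeric", "ordinal" or "nominal" for each variable
    orders: dict mapping an ordinal variable's index to its ordered states
    """
    orders = orders or {}
    n, p = len(specimens), len(kinds)
    # Replace ordinal states by their rank within the supplied ordering.
    coded = [
        [orders[f].index(row[f]) if kinds[f] == "ordinal" else row[f]
         for f in range(p)]
        for row in specimens
    ]
    # Ranges of numeric and ordinal variables, used to scale them to [0, 1].
    ranges = {}
    for f in range(p):
        if kinds[f] in ("numeric", "ordinal"):
            values = [row[f] for row in coded]
            ranges[f] = max(values) - min(values)

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for f in range(p):
                a, b = coded[i][f], coded[j][f]
                if kinds[f] == "nominal":
                    total += 0.0 if a == b else 1.0
                elif ranges[f] > 0:
                    total += abs(a - b) / ranges[f]
                # a variable with zero range adds 0 but still counts in p
            dist[i][j] = dist[j][i] = total / p
    return dist


# Hypothetical specimens: leaf length (cm), fruit shape, flower colour.
specimens = [
    [5.0, "globose", "white"],
    [3.0, "subglobose", "white"],
    [1.0, "hemispherical", "purple"],
]
kinds = ["numeric", "ordinal", "nominal"]
orders = {1: ["globose", "subglobose", "hemispherical"]}

dist = gower_matrix(specimens, kinds, orders)
print([[round(x, 3) for x in row] for row in dist])
# [[0.0, 0.333, 1.0], [0.333, 0.0, 0.667], [1.0, 0.667, 0.0]]
```

The leaf-length component alone contributes 1.0 between the 5 cm and 1 cm specimens and 0.5 between the 5 cm and 3 cm specimens. That is the expected "twice as far" relationship, now on the same 0 to 1 scale as the categorical characters.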

Because others might be interested in doing non-hierarchical clustering with morphological data, and because getting this right was quite a struggle (as so often with R), I thought I would post a how-to.
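The clustering step that pam ("partitioning around medoids") performs is k-medoids clustering. It chooses k specimens as cluster centres, the medoids, so that the summed dissimilarity of every specimen to its nearest medoid is as small as possible. The sketch below illustrates that idea in Python, assuming a precomputed dissimilarity matrix such as a Gower matrix. It is simplified: a greedy BUILD phase is followed by first-improvement swaps. It is therefore not a port of cluster::pam and may not always return the same medoids. The function names and the six "specimens" are hypothetical. The specimens are points on a line forming two obvious groups.

```python
"""Minimal k-medoids sketch (PAM-style BUILD + SWAP) on a precomputed
dissimilarity matrix. Simplified for illustration; not R's cluster::pam."""
from itertools import product


def total_cost(dist, medoids):
    """Sum over all objects of the dissimilarity to their nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Return (medoids, labels, cost) for k clusters (hypothetical helper)."""
    n = len(dist)
    # BUILD: greedily add the object that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: exchange a medoid for a non-medoid whenever that lowers the cost,
    # restarting the search after each accepted swap (first improvement).
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for m, o in product(list(medoids), range(n)):
            if o in medoids:
                continue
            trial = [o if x == m else x for x in medoids]
            trial_cost = total_cost(dist, trial)
            if trial_cost < cost - 1e-12:
                medoids, cost, improved = trial, trial_cost, True
                break
    # Assign each object to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels, cost


# Hypothetical data: six objects on a line, forming two obvious groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
medoids, labels, cost = pam(dist, 2)
print(medoids, labels, cost)  # [1, 4] [0, 0, 0, 1, 1, 1] 4
```

Because the function needs only the dissimilarity matrix, it works the same way on distances computed from mixed data.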

Tuesday, May 26, 2015

Botany picture #202: Allium (Nectaroscordum) siculum

Allium siculum (Amaryllidaceae), photo taken presumably in the Botanic Garden of Zurich, 2010. This wild garlic species is both an ornamental and a culinary plant.

Saturday, May 23, 2015

An interesting congruence of objectivist and singularitarian beliefs

In the latest instalment of his dissection of Ayn Rand's Atlas Shrugged, Adam Lee of Daylight Atheism discusses, and quotes at length somebody else who discusses, the enormous complexity of the production and supply chains needed to make items as simple as a pencil, let alone an engine. This exposes the absurdity of Rand's belief that "the only thing that's essential to build a tractor, a railroad or an airplane is a rational mind".

I couldn't agree more; the Randian tenet promoted in her books, that all that matters is to be a rational capitalist, and that all company employees and public servants are merely superfluous parasites, falls apart the moment one tries to fit it against the reality of any economy more complex than early Middle Ages subsistence agriculture. And that is also all that needs to be said about those who seriously believe that they shouldn't have to pay taxes because they built all they have by themselves - I'd believe that if they had spent all their life on a lonely island and started by fashioning their own crude stone tools, but not if they are running a company in an industrial age society.

But what really only just occurred to me is that this tenet - if you are only rational and talented enough you can achieve anything, regardless of resource limits and laws of physics - is pretty much identical to a central assumption underlying singularitarianism:

Singularitarians believe that within the next few decades humanity will create a self-improving artificial intelligence which will then quickly achieve an unimaginable level of intelligence. Depending on their general outlook, they are then either hopeful that this event will usher in paradise on Earth, with space colonisation, inexhaustible wealth and immortality for all, or worried that the resulting god-like intelligence will squash us like insects.

In either case a necessary assumption is the same as Rand's: This self-improving supercomputer only needs to be intelligent enough, and then it will be able to achieve anything. Survivable space flight - laws of physics don't matter any more because it is just that intelligent. Solution of all the world's economic and ecological problems - resource limits somehow don't matter any more because it is just that intelligent. Immortality for all - biology doesn't matter any more because it is just that intelligent. Extinction of humanity - and we are helpless and cannot just take an axe to its power supply because it is just that intelligent.

Apparently quite a few Californian information technology entrepreneurs, who are of course the primary support base of the singularitarian movement, are also libertarians in their political outlook. So perhaps that shouldn't have surprised me, but I had just never before made the connection between these two belief systems.

Friday, May 22, 2015

How to sample when testing the monophyly of a group

Perhaps I should write more about positive things, do posts along the lines of "look at this amazing plant" etc., but well...

Within the first few months of this year I have already reviewed one paper and am currently reviewing a second, both of which make the same interesting claim. They present a phylogeny something like this:

And then conclude that their results have confirmed the monophyly of the red group.

Okay, before you go below the fold, can you guess in how many ways this is ... problematic? (Trying to be polite here.)

Wednesday, May 20, 2015

'Monophyletic species' once more

Originally I had decided not to write anything on the "open letter to scientific community" [sic] from about two weeks ago.

Because the author does not so much refute as reject the logic of the arguments of those he criticises, a response could at best rephrase in different words what I wrote in the paper that so incensed him. It consequently seemed pointless to write a reply, and indeed I would simply refer back to that paper anybody who is interested in arguing about the information content of 'evolutionary' classifications, the feasibility of delimiting taxa based on long branches, and the relevance of a distinction between paraphyletic and polyphyletic taxa.

However, a few days later I picked up the manuscript attached to the letter again and paid more attention than before to its second half, where the author's ire is directed at the publications of Frank Zachos. There I found a section that motivated me to write something after all; not much, but something, simply because the section in question is so depressingly typical of much of the opposition to phylogenetic systematics:
Another subterfuge is to exempt species from the ban on paraphyly ("the concept of paraphyly does not apply to the species category"), so the "actual common ancestor is (or was) a species, but it does (did) not belong to any supraspecific subdivision of the descendant group" - again a desctructive [sic] (making classification cripple [sic], with millions - one for each "accepted" non-monotypic taxon! - of species "not belonging anywhere") and illogical "convention" designed only to defend the indefensibly harmful dogma
It contains two claims of interest.

Wednesday, May 13, 2015

Botany picture #201: Achillea crithmifolia

No energy left to write anything in the evenings these days, so I will just post a botany picture. I have recently gone through my photo collection to pick out representative examples of Asteraceae (sunflower family) to decorate a phylogenetic tree, and I came across this photo of Achillea crithmifolia that I took in 2010, presumably at the Botanic Garden of Zurich.

I am still as impressed now as I was then by the large number of small capitula making up the corymbose panicles of this species, because the species I had been most familiar with since childhood, Achillea millefolium, has a rather poorer capitulescence.

Undergrads at my old university often confused Achillea of the Asteraceae family with superficially similar members of the Apiaceae family, which in Germany are usually about the same size, have similarly divided leaves, and sport double umbels of nearly always white flowers. (The ones here in Australia are morphologically much more diverse.)

In oral exams we always had a bouquet of flowers on the table, and the student's first task was to pick two or three, say what plant family they belong to, and explain why. I remember a case when I was co-examiner; the student confidently picked Achillea millefolium and started as follows: "this is an Apiaceae because it has double umbels, flowers with four free petals..." Around that point the main examiner interrupted her and suggested she take another look.

Sunday, May 10, 2015

A new classification of all living organisms

Of course, the day after I write that there is a near-unanimous consensus that taxa should be monophyletic, I get an alert that a new classification of all of life has been suggested, and it turns out to be proudly non-phylogenetic (Ruggiero et al. 2015, A higher level classification of all living organisms, PLOS One 10: e0119248).

Interestingly, the authors describe their classification as "neither phylogenetic nor evolutionary". There are two ways to read this. Either they don't know what they are talking about, because 'evolutionary' classification is what the proponents of paraphyletic taxa call theirs, and Ruggiero et al.'s classification has paraphyletic taxa and is consequently 'evolutionary' in that twisted sense; or they mean to indicate that their classification is pragmatic and theory-free.

The latter interpretation would fit with the repeated mention of "serving ... database providers" and "its immediate use as a management tool". In other words, this is about archiving, not science, which is fine as far as it goes. Weirdly, however, the abstract also claims that the new classification would be "immediately valuable as a reference for taxonomic and biodiversity research, as a tool for societal communication, and as a classificatory 'backbone' for biodiversity databases, museum collections, libraries, and textbooks". But that is precisely what a non-phylogenetic classification is not useful for. Naming incomplete, non-natural groups is downright misleading to subsequent taxonomic and biodiversity research, it misinforms the public, and it would misinform students if used in textbooks.

So, how much non-monophyly is there in this system? Lots. They recognise the prokaryotes, although it seems fairly clear now that the archaea are at least more closely related and more similar to the eukaryotes than to the bacteria, if not paraphyletic to the eukaryotes. They recognise various groups of algae that are paraphyletic to the land plants; the crustaceans, which are paraphyletic to the insects; and the paraphyletic reptiles. And that is just scratching the surface. If somebody were to interpret this as a summary of our knowledge of the tree of life, they would be seriously misled.

Thursday, May 7, 2015

The hegemony (or not) of cladism

The opponents of phylogenetic systematics ("cladism"), that is the proponents of non-monophyletic taxa, usually seem to have one of two views of the current situation in systematic biology:

Some of them feel that they are part of a silent majority that is increasingly asserting itself, that there is a sea change, a growing realisation that cladism doesn't make sense and that non-monophyletic taxa need to be accepted. They perceive an increasing number of scientific publications arguing for their position.

Others feel that they are a suppressed minority, whose warnings about the wrong turn that systematics has supposedly taken over the past few decades are ignored and unfairly kept out of the major journals in the area.

These two perspectives are not entirely mutually exclusive - it is possible that a cabal of influential editors is maintaining the dominance of cladism but that their days are numbered as the next generation moves in - but they still make for very different views of where things are going at the moment. Does phylogenetic systematics have a clear hegemony, and will it last?

My own feeling is that it depends on what you count, what you particularly care for.

Sunday, May 3, 2015

Ancestral character state reconstruction in Mesquite 3: parsimony versus likelihood

One of the more curious recent developments in my area is that some journals now make all reviewer reports available to all of the peer reviewers of a given manuscript. I like it because it allows me to get a better feeling for whether I have been too lenient or too critical, see other colleagues' style of making comments, and so on.

Very recently I have reviewed a manuscript, and just two days ago I saw what the second reviewer thought. Our recommendations turned out to be generally the same, but one sentence of theirs really annoyed me. When discussing ancestral character state reconstruction, they complained that all reconstructions in the present study were done "only" with parsimony.