Tuesday, April 30, 2013

Botany picture #59: Ruellia longipedunculata

Ruellia longipedunculata (Acanthaceae), Bolivia, 2007. Ruellia is one of the largest genera of the Acanths. The American species that I am familiar with are mostly butterfly- or, as in this case, hummingbird-pollinated. The species shown here is a shrub and grows in eastern Bolivia.

Monday, April 29, 2013

"Monophylogenetic" species

This continues a series on species. The previous episodes introduced the topic, provided an intuitive classification of species concepts, and dealt with biological, genotypic cluster and "typological" species.

The term "phylogenetic" became so popular after phylogenetic systematics gained ascendancy in the systematic and taxonomic community that several quite unrelated species concepts were published under that label. In the previous post you may have noticed that I call something the autapomorphic species concept following this list compiled by a philosopher of science although it was really published as “The phylogenetic species concept (sensu Wheeler and Platnick)”. Not only does that clarification in the brackets nicely demonstrate the problem of homonymy here, but I am also unsure what exactly is so phylogenetic about a concept considering species to be groups of samples with a unique character combination.

I will therefore limit this post to discussing the so-called phylogenetic species concepts that demand species be monophyletic (i.e. the Phylogenetic Taxon Species in the list mentioned above), although that necessarily means that I will in part reiterate what I already wrote before.

Friday, April 26, 2013

Botany picture #58: Scutellaria ventenatii

Scutellaria ventenatii (Lamiaceae), Botanic Garden of Bogota, Colombia, 2007. The skullcap genus Scutellaria is perhaps the easiest to recognize of all Lamiaceae because of its very distinct calyx. It is two-lipped, both lips are smooth (i.e. the five calyx teeth are not really distinct any more), and there is generally a scale-like protrusion on the upper lip. The calyx closes around the fertilized ovule; when the seeds are ripe, it dries out and the upper calyx lip falls off, like a lid. This species is, like many other of its striking South American congeners, clearly hummingbird-pollinated, but the northern hemisphere species are generally bee-pollinated.

Wednesday, April 24, 2013

Comparison of species tree methods

Update 10 June 2013: This post, originally from 24 April 2013, has been updated extensively because I have since tried out a set of new species tree methods, got STEM to run and gained a bit more experience with some others. I have also promoted the post to one of the "recommended phylogenetic systematics" posts on this site despite it not being about theory of classification.

Update 27 March 2016: Added ASTRAL and iGTP, restructured the post to be more software-focused.


I spent part of the last few days trying out different species tree methods, partly to help a colleague produce an example tree that he can use in a workshop he is planning and partly because I want to infer a species tree for one of my own projects in the next few weeks. This post was written for two reasons: as a note to myself for future reference and as a pointer for somebody who might want to infer a species tree and does not know which of the many programs to choose. A person like myself a few days ago, one could say; if they find this post via a search engine, it might save them some of the frustrations I experienced.

Note that this is not a post for a methods wonk or for somebody who wants to learn about the theoretical or methodological background. It is strictly from the end user perspective, directed at those who want to know what is available, how user friendly the tools are and where to get them.

If you don't know what this is about you might want to refer to my earlier post on the topic. To summarize: these days we mostly use molecular data, in particular the DNA sequences of genes or intergenic spacers, to infer the evolutionary relationships of species. However, any individual gene phylogeny may or may not be congruent with the species phylogeny or with other gene phylogenies because each species inherits a random subset of the pre-existing allele diversity of its ancestral species. Alternatively, discrepancies between a gene tree and the species tree or between different gene trees may also arise from introgression, i.e. rare gene flow between distinct species.
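To make the first source of discordance a bit more tangible, here is a minimal sketch for the simplest possible case. It assumes three species with one sampled gene copy each, no gene flow, and the standard coalescent result that the gene tree matches the species tree with probability 1 - (2/3)e^(-T), where T is the length of the internal species tree branch in coalescent units (generations divided by twice the effective population size for diploids). The function name is made up for illustration and is not taken from any of the programs discussed in this post.

```python
from math import exp


def prob_gene_tree_matches_species_tree(internal_branch_length):
    """Probability that a three-species gene tree has the species tree topology.

    internal_branch_length: length of the internal species tree branch in
    coalescent units (assumption: diploids, so generations / (2 * Ne)).
    """
    if internal_branch_length < 0:
        raise ValueError("branch length must not be negative")
    # With probability exp(-T) the two sister lineages fail to coalesce on the
    # internal branch; in that case all three topologies are equally likely,
    # so two thirds of those cases produce a discordant gene tree.
    return 1.0 - (2.0 / 3.0) * exp(-internal_branch_length)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
        p = prob_gene_tree_matches_species_tree(t)
        print(f"T = {t:3.1f}  P(congruent) = {p:.3f}")
```

The shorter the internal branch (rapid successive speciation or large populations), the closer the probability gets to one third, which is no better than picking a topology at random. Introgression is not covered by this sketch at all.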

Tuesday, April 23, 2013

Botany picture #57: Arthropodium milleflorum

Arthropodium milleflorum (Anthericaceae), the "vanilla lily", Australia, 2012. I took this picture on a weekend trip this summer up in the mountains. The flowers are intricately beautiful but rather small as you can see from the size of my fingers.

Sunday, April 21, 2013

Watson Woodlands Open Day

Today we spent the morning at the Watson Woodlands Open Day at Justice Robert Hope Park. This is a patch of bush next to the much better known Mt Majura Nature Reserve. It was protected after some controversy in the 1990s but unfortunately still has development creeping up onto its borders.

At the entrance the event looked deceptively small but it had much more going on than you would think from the picture above. The stall on the right had information booklets, an exhibit of local fossils, cookies and drinks. The stall on the left was that of Reptiles Inc., a group offering reptile exhibitions. In this case, visitors could pet a constricting snake from Queensland and admire several blue tongue and shingleback lizards and turtles. Finally, there was some very enjoyable didgeridoo music.

This is one of the fossils that were exhibited. Unfortunately, I do not know from what geological epoch it comes but considering that there is a trilobite it must be pretty damn old.

Our greatest interest, however, was in the guided tour offered by botanist John Briggs. He related the local efforts to improve habitat quality, keep weeds at bay and reestablish rarer native plants, and he named native and weedy species and explained how to recognize them. The above picture shows a specimen of Eucalyptus blakelyi that he estimated to be ca. 400 years old.

Finally, this is a detail of the local dominant 'yellow box gum' Eucalyptus melliodora in flower. I am still not very good at naming eucalypts, so it is good to have a picture of a local species with a name on it. I may use it in a lecture later this year...

Saturday, April 20, 2013

Botany picture #56 1/2: Hottonia palustris

Hottonia palustris (Primulaceae), Germany, 2008. Called Wasserfeder in German and Featherfoil in English due to its feather-shaped leaves, this beautiful member of the primrose family is a water plant. While the roots and leaves are submerged, the large inflorescences are held stiffly erect above the water surface.

Thursday, April 18, 2013

Typological species (?) and autapomorphic species

This post continuing my series on species will perhaps not do its topic full justice, but unfortunately I currently cannot seem to find the time to write as much.

The term "Typological Species Concept" (TSC) is one that you can hear rather often from taxonomists, generally accompanied by a sneer, but it appears surprisingly ill-defined. Googling around a bit one can, for example, encounter these teaching materials from Southern Illinois University which describe it as follows: "typological species are defined by similarity to a type specimen or ideal type." This is then dismissed as biologically unjustified and followed by several alternative species concepts.

Defining the TSC like this is in my opinion rather strange because it confuses systematics and nomenclature. No matter how a systematist circumscribes species and no matter what species concept they use to do that, afterwards a name has to be assigned to each of these species, and that is always done with reference to a type specimen. A species that contains the type specimen of, say, Bellis perennis will be called Bellis perennis regardless of whether it is a biological species, a phenotypic cluster species or a phylogenetic species*. Those are simply the rules of nomenclature and not another distinct species concept.

So, be the TSC formally defined as such somewhere or not, I will now describe how we in the business appear to use the term. As mentioned above, it is generally used as an accusation, as in "yeah, that guy used a typological species concept, his treatment is completely useless". What the detractors mean in that case is that the taxonomist in question has used a very unbiological and schematic approach to circumscribing species. Typical criticism would include:
  • Excessive splitting, especially "single character taxonomy" in which a specimen showing any morphological deviation whatsoever is immediately given taxonomic recognition as a variety, subspecies or even species
  • Failure to take the plasticity of the organisms into account, basing taxonomic decisions on characters that are variable under varying environmental conditions; this is often due to the taxonomist not having seen the study group in the field
  • Failure to reflect the hypothesized relatedness of the samples in the classification, in extreme cases assigning varieties that are admitted to be most closely related to each other to separate species because they differ in a character that was arbitrarily considered "important enough" to define the species level
As an example, we might consider a breeding group of daisies in which some populations have a pappus and others don't. Now although there are such cases this happens rarely - usually an entire species of daisies either has a pappus or it hasn't. A competent taxonomist using the Phenotypic Cluster Concept would look at overall variation and conclude that the character "pappus presence" does not correlate with anything else and that the overlap in morphological variability between the populations with and those without a pappus is otherwise so huge that they should be one species. A colleague using the Biological Species Concept would find that the two forms interbreed and make them one species. In contrast, a taxonomist with a typological approach (as commonly understood) might well consider them two different species. Their gut feeling would be that presence or absence of a pappus has always been a "taxonomically important character" in daisies, and that is that.

Wednesday, April 17, 2013

Botany picture #56: Lomandra filiformis

Lomandra filiformis (Lomandraceae), Australia, 2010. Lomandras come in various sizes and leaf widths but still all look more or less the same: grass-like with tough wiry leaves, relatively indistinct flowers and often unpleasantly spiny infructescences. In other words, they are uninteresting and ugly, which makes it all the more surprising that they are very popular with gardeners and landscape architects. The solution to that puzzle is, as my colleagues assure me, that these plants are so easy to grow. They cover a flower bed, suppress weeds and need hardly any attention for the next few years. Okay then...

Tuesday, April 16, 2013

Occasionally irritating aspects of paper writing

Writing scientific papers is very much a collaborative effort. Even when you are a single author, you still usually have to go through peer review to see your manuscript published. At a minimum, you have to satisfy one or two reviewers and the journal editor that it is acceptable; in other cases, it might be three or four reviewers, a managing editor and the chief editor. And all of them may have ideas about how you have to change or rewrite some part of the manuscript.

If you have one or several co-authors it becomes even more difficult because they surely have even more to say than an anonymous reviewer about what exactly the manuscript should look like when their name is on it. How well the collaboration on a manuscript works out is very dependent on the compatibility of the co-authors' personalities.

Friday, April 12, 2013

Combined botany and zoology picture

Sulfur-crested cockatoo enjoying the fruits of the season just across from our balcony. The tree it is sitting in, a Callitris, is called "native pine" but is not a Pinaceae but really a member of the Cupressaceae family of the conifers, together with the cypresses (as one might expect, given the family name), thujas, junipers and sequoias. I have seen Callitris forests during field work, and they are much poorer in biodiversity than Eucalyptus forests.

The cockatoo was pulling off Callitris cones with its claws and then cracking them with its strong bill to eat the developing seeds inside. These birds are highly intelligent but they also share a less desirable feature with us humans: they are wasteful and needlessly destructive, often defoliating half a tree just for fun. In this case, they have a tendency to drop the cones when they are only half eaten.

Thursday, April 11, 2013

Botany picture #55: Digitalis parviflora

Digitalis parviflora (Scrophulariaceae), Germany, 2008. Another plant that I had established in the garden of my parents, although unfortunately it did not prove to be very resilient. The foxglove genus Digitalis is one of the few groups left in the Scrophulariaceae after they were disintegrated based on phylogenetic studies showing them to be, in their traditional circumscription, a wastebasket family. Most species of Digitalis have considerably larger flowers than this one, and strikingly pink to white D. purpurea is of course a well known ornamental. But this small species with its unusual brown, presumably wasp-pollinated flowers is attractive in its own way.

Tuesday, April 9, 2013

This science spam is getting ridiculous

There was a time when all the spam e-mails I received were the normal, mundane kind, advertising sildenafil citrate, websites supposedly showing pictures or movies of people without their clothes, bargain watches, and of course deals on the lines of "I am the widow of a Nigerian banker and I will send you $4 million if you send me $10,000 first".

That changed, if I am not mistaken, when I participated in one specific conference in 2011. The organizers must have put the details of all participants, including their work e-mail addresses, onto the web and from there they must have ended up in the mailing lists of scientific spammers. This type of spam has been mentioned on this blog before - e-mails inviting the recipient to register for some for-profit conference or a predatory for-profit online only journal. The latter are more often than not willing to "publish" whatever nonsense you send them in exchange for a fee. The blog Scholarly Open Access helpfully documents their dubious practices, including a recent not entirely surprising case in which an open access journal simply disappeared, deleting all the articles authors had paid for to get "published" there. Really, don't waste your manuscripts on outfits like these.

Anyway, I feel that these spam e-mails are getting ever more frequent and stupider. Here are just three from the last few days; indeed, two of them appeared in my inbox only today.

Monday, April 8, 2013

Genotypic cluster species (and similar)

This post continues a small series on species that started here and continued with this post.

When I wrote about the Biological Species Concept and its relatives, I wrote that it is what most non-scientists would spontaneously suggest when asked for a definition of species, and that it is also very popular especially with zoologists. At least in theory - it is not very practical to conduct crossing experiments every time you write a monograph or describe a novelty you found as a single specimen on a field trip. It is, however, clearly the concept that evolutionary biologists and everybody studying speciation have in their minds. In contrast, I would argue the concept to be discussed today is the one that most competent taxonomists use in actual practice, often intuitively but increasingly explicitly and supported by quantitative analysis.

The Genotypic Cluster Concept of species (subsequently GCC) was formalized by Mallet (1995). It sees species as groups of individuals that form genetic or morphological clusters with few or no intermediates to other such clusters.

Sunday, April 7, 2013

Botany picture #54: Papaver atlanticum

Papaver atlanticum (Papaveraceae), cultivated in Germany, 2005. Together with willows and Allium, the poppy genus is one of the few plant groups I sometimes really miss in this country. I don't know why, because Australia surely has much more biodiversity to make up for it; especially the local Asteraceae, Lamiaceae and Proteaceae are very much to my liking.

This particular species from NW Africa is not as intense in its color as many other poppies, more orange than red, but it has the advantage of being perennial. I established it in my parents' garden where it has been growing happily for many years now.

Friday, April 5, 2013

Biodiversity Genomics Conference, last day

Today I participated in a workshop on phylogenomics. It was a somewhat mixed bag; on the one hand, there were a few really useful elements, on the other hand one of the presenters merely repeated, virtually one to one, a talk that he had already given on Wednesday. Ah well.

The conference was very rewarding and clearly a great success. Apparently significantly more people wanted to register than there were places. The participants hailed from various continents and represented many different fields of research - from US American evolutionary biologists and German entomologists to Australian soil researchers and New Zealand conservation biologists. I have heard from many colleagues how much they enjoyed it and how much they got out of it, and several people have even suggested holding a conference like that every year, especially considering how fast the genomics field is evolving.

I cannot help, however, but end on a somewhat skeptical note considering the advances in genomics and Next Generation Sequencing from the perspective of my research interest in phylogenetics. One of the presentations today drove home the point just how mind-bogglingly, unmanageably and ridiculously huge the amounts of data are that are being produced. Again, the 1KITE project sequences the transcriptomes of 1,000 insect species, that is, all genes that the insects had expressed at the moment they were sampled. And then they want to use these data to produce a better phylogeny of the insects. Admittedly, they can probably use the same data to do a couple dozen other things in addition, but from a phylogenetics perspective and considering that many other people are doing the same, does sequencing entire whatever-omes really, if we come right down to it, make any sense?
  • Nobody appears to know even where to put all these data. That is true on the level of the individual researcher, who buys a cutting edge external hard drive only to run out of space one project later, but also on the level of the scientific community as a whole. One participant actually asked that question on Wednesday: Genbank accepts annotated traditional Sanger sequences of individual genes and they are already struggling to keep up, but where do I submit a terabyte of genomic DNA sequences to fulfill the requirement that it is publicly available to colleagues who want to be able to reproduce my results? This is getting out of hand pretty quickly.
  • Nobody seems to know how to analyze them appropriately (for phylogenetic purposes). A major topic discussed controversially in today's workshop was the data analysis. I don't want to go into details, but for massive amounts of genomic data for numerous samples the only chance currently seems to be to concatenate all of it and to use phylogenetic tools that trade sophistication for insane speed. At the same time the people generating all these data are keenly aware that what they really want and need are complex models of DNA evolution, complicated partitioning of the data, and species tree methods, but the software that could do those analyses falls over if you try to do them with even just 10% of the available data.
  • And here is the kicker: For phylogenetic purposes, nobody actually needs that much data. Alan Lemmon was entirely correct when he said that 400-500 independent loci are more than enough for phylogenetic analysis. But even that might already be overkill, as Leaché & Rannala (2011) found that 10-100 loci are generally sufficient even in difficult cases, and as few as 10 in simple ones. In other words, do we really need to expend a lot of effort and money on sequencing entire genomes and produce reams and reams of data if using 0.5% of those data would already give us precisely the same result? Don't get me wrong, this all makes sense if you are interested in exploring signals of adaptation in the genome and suchlike, but at the moment it appears the phylogeneticist is presented with a shiny new nuclear bomb and told that this is a good way to kill the flies in their house. What we need would be a labor- and cost-efficient way of capturing, say, 40-50 loci for a few hundred samples (preferably from low amounts of extracted DNA) instead of laborious ways of producing insane amounts of useless data for a few samples. The same goes for population geneticists, by the way.

Thursday, April 4, 2013

Biodiversity Genomics Conference, third day

Again a lot of talks today, but again I could not catch all of them. However, this time I heard the session of so-called "lightning talks", seventeen talks in one and a half hours, each of them about five minutes long and without discussion time. The experience has left me doubtful about the idea of lightning talks. Such a short time is not at all adequate to present scientific results. It may work for putting ideas out there, for presenting projects still at their beginning to ask for feedback or collaborations, or for advertising new methods or software tools as long as they are not too complex. But in those cases a question or discussion time afterwards is all the more important.

Most speakers managed to finish their talks in time but one of them had a ridiculously large number of slides prepared which looked as if they would have been hard to cover in a standard twelve minute talk. The topics varied widely - from identifying prey items by genotyping predator feces or stomach contents to the exploration of the invertebrate fauna of Antarctica - and were mostly interesting despite few of them being directly related to my own research interests.

Wednesday, April 3, 2013

Biodiversity Genomics Conference, second day

Lots of talks today. After the opening words, Leo Joseph of the Australian National Wildlife Collection gave an inspiring talk about the potential of combining the resources of biological collections (animal skins, needled insects, herbaria, etc.) with Next Generation Sequencing. Specimens in these collections provide not only genomic data that can be extracted from them but also usually spatial data (where collected), temporal data (when collected) and phenotypic data that could be combined with the genotype. He also pointed out that conserved specimens with their degraded DNA are actually better usable with genomic tools than with traditional Sanger Sequencing because the former are designed to read short DNA pieces. Next, Justin Borevitz (Austr. Natl. Univ.) presented a very dense talk on his research in Arabidopsis and Australian Pelargonium, focusing on attempts to correlate genetic variation with phenotype (such as flowering time) and geographic locality to understand local adaptation.

The next session started with Graham Coop (UC Davis) who presented recent progress in providing null models of genetic variation. Admittedly much of it was too mathematical and went over my head, but I got the general idea. When you want to know whether some genetic data signals local adaptation, the null hypothesis cannot simply be that the genes are distributed randomly. Instead, you have to account for covariance or in other words the possibility that the relevant gene copies are locally frequent due to the shared history of the local populations. For example, are Europeans light-skinned because they just happened to descend from a light-skinned ancestral population or because it is an adaptation to something? The solution is to produce a population by population covariance matrix with lots of genetic data which can be assumed to be mostly neutral, and then to compare the genetic variants of interest against it.
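As far as I understood the general idea, the first step could be sketched roughly as below. This is only a toy illustration and certainly not the model presented in the talk: it assumes that we already have allele frequencies per population for a set of presumably neutral loci, and it simply standardizes each locus and averages the products across loci. The function name and the example numbers are invented.

```python
from math import sqrt


def population_covariance(freqs):
    """Population by population covariance matrix of allele frequencies.

    freqs[p][l] is the frequency of one allele in population p at locus l
    (assumption: loci are mostly neutral and roughly independent).
    """
    n_pops = len(freqs)
    n_loci = len(freqs[0])
    standardized = [[] for _ in range(n_pops)]
    for l in range(n_loci):
        mean = sum(freqs[p][l] for p in range(n_pops)) / n_pops
        scale = sqrt(mean * (1.0 - mean))
        if scale == 0.0:
            continue  # locus is fixed everywhere and carries no information
        for p in range(n_pops):
            standardized[p].append((freqs[p][l] - mean) / scale)
    used = len(standardized[0])
    if used == 0:
        raise ValueError("no variable loci")
    # Average the products of standardized deviations across loci
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[j])) / used
         for j in range(n_pops)]
        for i in range(n_pops)
    ]


if __name__ == "__main__":
    # Three made-up populations; the first two share recent history
    example = [
        [0.10, 0.80, 0.30, 0.60],
        [0.15, 0.75, 0.35, 0.55],
        [0.60, 0.20, 0.70, 0.10],
    ]
    for row in population_covariance(example):
        print(["%.3f" % value for value in row])
```

A variant of interest would then only count as a candidate for local adaptation if its frequencies differ between populations more than this background covariance leads one to expect.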

John Novembre (Univ. of Chicago) gave a really clear and well presented talk on four methods of increasing sophistication for spatial assignment of samples. The idea is that you have genetic information for a representative number of samples of a species across its range, e.g. all African elephants. Now customs seize a shipment of ivory from a black market in South East Asia. What analyses could you use to infer the country of origin from genetic information extracted from the seized tusks? The four methods he presented are PCA based, logistic frequency surfaces, Gaussian process based ("SCAT") and a joint analysis of genomic and isotopic data with SCAT. The first are simple but imprecise, SCAT is sophisticated but computer intensive.

Rose Andrew (Univ. of British Columbia) presented her research on incipient speciation in sunflowers. She used genomic data to examine patterns of divergence and what they mean for local selection, gene flow and isolation. The last speaker of the session was Ke Bi (UC Berkeley) who used genomic data from ca. 100 year old and contemporary chipmunk specimens to test the impact of climate change on their genetic structure; a very nice example of how museum specimens can provide the basis for cutting edge research!

In the afternoon session, Sonal Singhal (UC Berkeley) showed the pattern of genetic variability across five contact zones between species that have diverged at different times, with a clear negative correlation of introgression and time since divergence. Alan Lemmon and Emily Moriarty Lemmon (Florida State Univ.) advertised their recently published "anchored hybrid enrichment" technique for rapidly producing data from hundreds of loci for ca. 100 samples for phylogenetic analysis. This was probably the talk that was most relevant for my own future research interests but, well, they have only done it for vertebrates and insects so far. Bernhard Misof (Univ. of Bonn) provided an update of the 1KITE project which is close to reaching its goal of sequencing the transcriptomes of 1,000 species of insects. The amounts of data produced in the course of the project are just mind-boggling.

Finally, Tariq Ezaz (Univ. of Canberra) presented some of his work on the evolution of sex chromosomes in lizards. Sex determination is really odd if you think about it. There are at least three systems in land vertebrates: XY, as we humans have it, where the females are the homogametic sex (XX); ZW, where the males are the homogametic sex (ZZ); and temperature-dependent sex determination. In other groups of organisms, it gets even weirder, with some animals changing sex as they age, some choosing a sex depending on what the current sex ratio in their population is, and hymenopterans having haploid males. After that session ended, there were fifteen very short "lightning talks", but I had to get a few things done and missed them.

To be honest, I feel pretty overwhelmed by all the lab method wonkishness going on here. I am more interested in the phylogenetic analyses and, well, the actual organisms themselves, and my ideal situation would be to outsource the lab work part. On the other hand, I am probably the only person at the conference who could tell Pogonolepis stricta apart from Angianthus micropodioides, or Odixia angusta from Ozothamnus rodwayi, so there is that. We all have our areas of expertise and it would be pretty boring if we all tried to be sequence capture experts...

Finally, from a purely technical perspective, the talks were generally at a very high level. Certainly variable in quality - the major problems I saw were some very overcrowded slides and the odd overly rushed presentation - but there were definitely no bad or uninteresting ones. It was also interesting to note that while one usually sees much more variability at conferences, nearly all of the speakers today used slides with a very simple black font on white background template and without any fancy logos or corporate identity nonsense. I appreciate that. One talk was white font on black background, which is harder on the eyes, but in this particular case it might be considered somewhat justified because most of the figures it showed were microscopy images with a black background.

Tuesday, April 2, 2013

Biodiversity Genomics conference, day one

Today begins the CBA Biodiversity Genomics Conference here in Canberra. Its topic is the use of genomic data and Next Generation Sequencing (NGS) in biodiversity studies. Today only the welcoming reception and a public lecture took place. Tomorrow will mostly consist of talks about applications to evolutionary biology, landscape genomics and systematics, Thursday will then feature talks about conservation issues and the conference dinner. Friday is for workshops; I will participate in one on phylogenomics. Let's see how it goes.

This is the first time since 2008 that I am going to a conference in my city of residence, so that feels a bit odd. Cheaper, of course - no airfares and hotel necessary. It is also the first time since the exact same year that I am going to a conference without giving a talk. Not having done any NGS in my research so far, I am really participating only to learn.

One thing I always wonder about, by the way, is the curious English phrase "to present a conference paper" meaning really "to give a talk at a conference". Generally the only paper involved is a few square centimeters in the abstract book, otherwise it is all sound waves.

Monday, April 1, 2013

Botany picture #53: Tolpis umbellata

Tolpis umbellata (Asteraceae), Australia, 2010. This daisy is not native to Australia but as weeds go it appears to be a fairly unproblematic one. The plants are very thin and ephemeral and do not seem to be able to outcompete native vegetation as far as I can tell. As shown in this picture, the flower heads are also fairly attractive once you give them a closer look - note the three different flower colors in the same head despite the fact that it is a member of the Cichorieae and thus has only the ligulate type of florets. Unfortunately, other daisies are much more aggressive invaders on this continent.