Thursday, August 28, 2014
Botany picture #172: Acacia decurrens
I am reasonably optimistic that this is indeed Acacia decurrens (Fabaceae), one of many wattles that are currently in flower here in Canberra. This was a small tree in Mount Majura Nature Reserve. It is the first time that I have noticed an Acacia with the styles of the individual flowers sticking out of the heads like that.
Wednesday, August 27, 2014
Final update on using fastStructure and similar software
After my somewhat mixed experience trying to use fastStructure, I have recently found the time to throw my data at two other programs for inferring population structure.
To recap, I have thousands of SNPs for two groups of species, in one case from 91 individuals and in the other from 224 individuals and I want to know how best to group the individuals into separate 'populations', in the present case potential species. I originally used fastStructure because it was new and supposedly written specifically for large numbers of SNPs, but the results were ultimately odd. The clusters didn't make very much sense and the program found virtually no admixed individuals, that is hybrids, although there really should have been some.
Earlier this week I then tried the R package adegenet. On the plus side, it turned out to be very simple and user-friendly. Of course you need to know how to use R, but the manual of the package is well written, and adegenet has a straightforward "read" function for importing datasets. It easily imported my Structure file without any hiccups, and after that it was a simple matter of handing my data over to adegenet's "find.clusters" function.
However, I tried different settings and did not get reasonable populations with any of them. One problem in my dataset is missing data, and I found that setting allele frequencies to zero for those cases produced the most meaningful results, but still there were several populations with no samples in them and the populations that had samples didn't make a lot of sense.
Yesterday I finally tried my luck with good old Structure itself - somewhat hesitantly because I feared it would be very slow with such a big dataset. Yes, even for my smaller dataset what I wanted to do ran overnight, but that is still faster than I feared, and the results are worth it. The populations make sense, and in marked contrast to fastStructure it finds evidence of admixture. My larger dataset will probably need several days to be analysed, but if that is necessary so be it.
There is probably a reason why that program is the most popular in the area...
A new low in science spam
Science spammers sometimes use scripts to mine journals or article databases for authors' contact details and then generate automatic spam. That saves them a lot of work, of course, but one has to wonder about the efficiency of that approach because the results are so bizarre and off-putting.
Behold:
Dear [my name]; [name of co-author]; [name of co-author]; [name of co-author]; [name of co-author]; [name of senior author],
Once you published a paper titled [name of our paper] in [name of the journal] . With such attractive theme [what follows are the keywords of our article] Cell size; chromosome; flow cytometry; genome size; guard cell size; ploidy, the article is so outstanding. It shows your professional and rigorous attitude. On behalf of the academic world, we appreciate your contribution to the research filed [sic!] very much. And we wonder if you have any new progress or you are doing any new study about your interested field.
Science Publishing Group [link], who [sic!] publishes Journals, Special Issues, Books and Conference Proceedings, now sincerely invites you to contribute your new articles to the website.
[another link]
Submit Your Latest Research
New progress of your latest research
New study in your research field
A view for the new research trends
Please submit your new papers via: [another link]
Advantages to Publish with SciencePG
Peer Review: Effective and professional
High Visibility: Up to 40,000 visitors per day
Open Access: Open to the public free of charge
Low APC: Article Processing Charge ranges from 70 to 270USD
Abstracting and Indexing: WorldCat, CrossRef, JournalSeek, CASSI, etc.
If you are doing some new study, please kindly notify us. And we are looking forward to your participation.
It is even worse because all the fields that were filled in by the script are in italics in the original (like the keywords above), making it even easier to see what is going on. So sad.
Monday, August 25, 2014
Categorical Analysis of Neo- and Paleo-Endemism
So, after laying the groundwork by looking into biodiversity metrics such as (Species) Richness, Corrected Weighted Endemism (CWE), Phylogenetic Diversity (PD) and Phylogenetic Endemism (PE), we can come to the first of two recent papers that I wanted to discuss. A few weeks ago several friends and colleagues, some of them from my own institution, published a new, quantitative method for locating hotspots of endemism called Categorical Analysis of Neo- and Paleo-Endemism (CANAPE; Mishler et al., 2014).
Saturday, August 23, 2014
Botany picture #171: Banksia coccinea
Banksia coccinea (Proteaceae), Western Australia, 2012. To my great surprise I found this species being sold as a cut-flower in our little local supermarket just yesterday. Because my wife likes the genus so much I bought one of them and it is now sitting on our dinner table.
Also, we are rather proud that our five year old daughter was able to immediately identify it as a Banksia although she had never seen this particular species before.
Wednesday, August 20, 2014
Diversity metrics
I want to blog about two recently published papers, one on keys and one on a method for spatial analyses of biodiversity, but for the latter some groundwork is necessary. This post will provide that groundwork so that I can then cunningly link back to it.
The last 25 years or so have seen the rise of spatial studies of patterns of biodiversity. They have been made possible by the increased availability of large databases with specimen occurrence records such as Australia's Virtual Herbarium, for example. Where a generation ago most information on the occurrence of species came from distribution maps drawn by specialists on the various groups of organisms, we can now enter a species name into a database search and are rewarded by a large list of geocoded specimens ready for use in our analyses.
Over the same time, several new diversity metrics have been developed to allow ever more sophisticated analyses. What is a diversity metric? It is a numerical value that tells us how diverse the organisms of our study group are in a particular part of our study area.
The study area as a whole is divided into cells; ideally these are equal area cells of for example 100 km x 100 km, alternatively they are biogeographical or political units. We can then look at our diversity metric and say, aha, in this cell there is particularly high diversity, and that might influence our decisions about what areas to prioritise for conservation. Okay, now what metrics are there?
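To make the cell-based idea a bit more concrete, here is a minimal Python sketch that computes Species Richness and Corrected Weighted Endemism per cell. It rests on assumptions of mine: the species names, cell labels and occurrence records are invented toy data, and the function name is hypothetical. Weighted endemism is taken as the sum, over the species in a cell, of one divided by the number of cells in which that species occurs, and CWE as that value divided by richness; this is how the metric is usually described, but it is not a reproduction of any particular software.

```python
from collections import defaultdict

# Toy occurrence records (species, cell). Purely hypothetical data; in a real
# study these would be geocoded specimens assigned to grid cells.
records = [
    ("species_1", "A"),
    ("species_1", "B"),
    ("species_2", "C"),
    ("species_3", "A"),
    ("species_3", "B"),
    ("species_3", "C"),
    ("species_4", "A"),
]


def diversity_by_cell(records):
    """Return {cell: (richness, weighted_endemism, corrected_weighted_endemism)}."""
    species_in_cell = defaultdict(set)
    cells_of_species = defaultdict(set)
    for species, cell in records:
        species_in_cell[cell].add(species)
        cells_of_species[species].add(cell)

    result = {}
    for cell, species_set in species_in_cell.items():
        # Richness: number of distinct species recorded in the cell.
        richness = len(species_set)
        # Weighted endemism: each species counts 1 / (number of cells it occupies),
        # so narrowly distributed species contribute more.
        we = sum(1.0 / len(cells_of_species[s]) for s in species_set)
        # Corrected weighted endemism: weighted endemism divided by richness.
        result[cell] = (richness, we, we / richness)
    return result


metrics = diversity_by_cell(records)
for cell in sorted(metrics):
    richness, we, cwe = metrics[cell]
    print(f"cell {cell}: richness={richness}, WE={we:.2f}, CWE={cwe:.2f}")
```

In this toy example cell A has the highest richness, but cell C scores highest on CWE because one of its two species occurs nowhere else.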
Tuesday, August 19, 2014
Botany picture #170: Banksia integrifolia
Banksia integrifolia (Proteaceae), Jervis Bay, 2014. This may be the best known Banksia, for a variety of reasons. First, it occurs on the coast near Sydney where a lot of foreign tourists can see the species. Second, that also means that it was one of the first species of the genus ever to be collected by a European scientist, Joseph Banks himself. Third, it is an easily cultivated plant found as an ornamental tree or even Bonsai in various parts of Australia. And fourth, it has even been introduced to other countries, and there is slight concern that it might turn into an invasive weed in New Zealand.
As can be seen in the above picture, the flower spikes as well as the fruiting cones are very attractive.
Monday, August 18, 2014
A metric of Twitter and blog hits for scientific articles
On this blog I have sometimes mentioned, and complained about, the influence that the obsession with article citations and journal impact factors (IF) has on scientific publishing, scientific careers, and the hiring choices of scientific institutions.
In short, instead of actually reading and understanding scientific papers, many people 'assess' their value by looking at how often they have been cited. Instead of reading and understanding their work, many people, even members of search committees or advisors of funding agencies, 'assess' scientists by looking at how often their papers have been cited. And instead of reading and understanding the articles published in them, many people 'assess' scientific journals by looking at how often their average article is cited within the first two years after publication.
And as mentioned before, this approach systematically favours areas of science that have a quick turn-around and lots of practitioners able to cite each other whereas it systematically disadvantages areas of science where many publications are written for long term use, published in books as opposed to journals, and while very useful to many people may not even be meant to be cited. Such as the floras and monographs produced by taxonomists, for example.
But of course, once you think you know what bad looks like somebody will introduce you to something worse.
Saturday, August 16, 2014
Botany picture #169: Corymbia gummifera
Corymbia gummifera (Myrtaceae), Jervis Bay, 2014. Only slowly am I beginning to really appreciate the Myrtaceae family. I must admit that when I came here I decided to focus on learning other groups first, especially Asteraceae, Lamiaceae, Acacia and Proteaceae, but now I am starting to get a handle on the generic concepts in that family.
Corymbia is generally easily recognised by the attractive, urn-shaped capsules, as seen in this picture.
Thursday, August 14, 2014
In praise of PAUP*
Hell is freezing over! Pigs are flying! PAUP* is getting updated for the first time in twelve years!
Jokes aside, this is great news. PAUP*, short for Phylogenetic Analysis Using Parsimony (* and other methods), is one of the best known software tools for phylogenetics. Indeed to me it is pretty much the phylogenetic software tool. Yes, depending on the task at hand I also use TNT, RAxML, Mesquite, MrBayes and BEAST with various of its add-ons, but PAUP* is the one I started out with while writing my thesis and it is still the one I feel most comfortable using.
Another major issue is what you can and cannot do with the various programs. The downside of PAUP*, or at least of the previous version, is that it is comparatively slow. So if you have a large dataset with many taxa, you are better off using TNT for parsimony and RAxML for likelihood analyses. But PAUP* can do various kinds of analyses that no other software can do; for example, I would not know how to conduct a Templeton test without it.
(My experience with PHYLIP is limited. Maybe it can do some of the same things. The problem is that its combination of rather excessive modularity and a call centre style user interface - on the lines of "press 3 for this kind of analysis" - has put me off using it so far.)
So over the past few years I have sometimes worried about the day when PAUP* would suddenly stop working on the newest computers. It is good to know that a new version is coming up!
The idea is that ultimately there will be GUIs for Win and Mac that one has to buy, but that command line versions for Win, Mac and Linux will be free. I guess I will be happy to use command line myself, but it might be a good idea to get a GUI licence for small student projects where the student cannot necessarily be expected to learn the PAUP* commands.
Wednesday, August 13, 2014
Effective altruism: earning to give
Until recently, I was only marginally aware of the Effective Altruism Movement, but after reading a somewhat odd blog post from one of its proponents, Chris Hallquist, I decided to at least look up the Wikipedia article. It summarises the principles of the movement as follows (accessed 13 August 2014):
- Cost-effectiveness: Effective Altruists (EAs) aim to make donations where they will work the greatest good per unit of currency spent. Although I found Wikipedia's remark that "many effective altruists have backgrounds in philosophy, economics, or mathematics, fields that involve rational and quantitative thinking" rather puzzling because I don't think that economists are necessarily rational (or good at empiricism for that matter), working out where the greatest difference can be made is surely something that everybody should be able to get behind.
- Cause prioritisation: This is similar to the first point, in that EAs think hard about what charitable causes are the worthiest. Again, in principle this should make sense to anybody, but the obvious problem is that opinions about what should be prioritised differ. Wikipedia tells us, "most effective altruists think that the most important causes to focus on are currently poverty in the developing world, the suffering of animals on factory farms, and humanity's long term future." In my eyes, the second one should be a couple dozen levels of priority behind the first one, and if it is indeed seen as a serious problem then it could quite simply be addressed by making a law that forbids factory farms. The third cause needs clarification: what is meant here? For example, Hallquist appears to be one of those who believe that the greatest risk humanity faces is that we develop an artificial superintelligence that will kill us all; I, on the other hand, consider the people who collect donations to "work on that problem" to be charlatans, and the money donated to them to be wasted. If, on the other hand, the money were put towards solving a real problem such as how to develop cheaper solar cells, then we'd be talking... But maybe that's just me.
- Impartiality: all human lives have equal value, no matter how distant they are from us. That is a noble sentiment and should, of course, be a fundamental rule of every civilised society. But applied as an ethical guideline to individuals, as it would have to be in a decentralised charitable movement, it is unrealistic to the degree of being inhumane. Nobody can expect me to care as much about somebody I will never meet as about my own daughter, nor would I expect, say, a Norwegian teacher to care as much about me as she cares about her own sister. An ethical system that is utterly incompatible with human nature is a dubious proposition.
- Donating to charity is morally required as opposed to merely laudable. Hm. I would rather prefer to construct society in such a way that everybody is cared for by the state, and thus not in need of charity in the first place. You can tell people that charity is a duty all you want but if times get tough they will still look after their own, and then a system that relies on charity will fail to help the weak.
- Counterfactual reasoning: This is a bit of an odd name for what is going on here, and more importantly this is the kind of argument that got me hooked in the first place, so more below.
That logic leads then to considerations such as this one where Hallquist weighs the hypothetical benefits of working in a high-paying job at Google against contributing to a technological start-up company and frames it as a question of charity.
This is where it occurred to me that the EA movement might actually not be that new a concept. Basically, it is like any rich comfortable people throughout history trying to soothe their conscience, with the only difference that EAs start working on that before they even got rich.
The question is, of course, whether trying to get rich does not contribute to precisely the things that they will afterwards have to put right with their donations. Of course they will not necessarily do anything as crass as investing in a company that is very profitable because it exploits its workers to the point where they commit suicide and then donating money to the widows and orphans, or working for a company that poisons a lake and then donating to the clean-up efforts. But even if their efforts at generating the greatest possible profit for charity are not as directly destructive as that, there is possibly some truth to the following poem from Bertolt Brecht:
Reicher Mann und armer Mann
Standen da und sahn sich an.
Und der Arme sagte bleich:
Wär ich nicht arm, wärst Du nicht reich.
(Rich man and poor man stood and looked at each other. And the poor one said, if I wasn't poor you wouldn't be rich.)
After all, if the amount of money in an economy is kept constant, then for somebody to become richer somebody else must become poorer. If the amount is increased by money printing, as it must be if the economy is supposed to grow without experiencing destructive price deflation, then relative wealth still works as a zero sum game. For everybody who manages to become part of the 1% top earners somebody else has to drop out of that percentile.
I guess an Effective Altruist could tell themselves that if they get rich, and the person who gets less wealthy was somebody who did not give to charity, then the net effect is positive. But as this post will have shown, so far the logic of the movement does not entirely convince me. Its principles seem to be an odd combination of no-brainers, noble but misguided ethics, and thin justifications for careerism and profiteering.
All this, obviously, assuming that Wikipedia does a good job of summarising these principles. That is not a given, but one would hope that EAs have contributed to the article.
Monday, August 11, 2014
Bad systematics
On the recent field trip, I saw the weird pesticide below in the station:
As most readers will know, this is wrong in two distinct ways: First, flies are insects, so this is a bottle of insect and insect killer. Second, spiders are NOT insects. So why not rename the pesticide into "spider & insect killer"?
The little insect icons are pretty weird too. Note how all of them are merely sitting there - except the flea, which is depicted as dead with cracked eggs above it. I imagine the latter are supposed to indicate that the eggs are also killed, not only the imagines, and then the designers probably thought that drawing a sitting flea in the same box would somehow imply that the imagines are not killed.
Anyway, this "fly & insect" business reminds me of an anecdote from the time of my postgraduate studies. The head of our department, a liverwort specialist, wanted to hire a student to work on a database. From what I was told his approach to job interviews was as follows: He showed each candidate the database website which, at that moment, he had deliberately given the title "bryophytes and liverworts". Apparently most candidates said merely that it looked nice. He hired the only one who immediately remarked that the title didn't make sense because one of those is a subgroup of the other.
Also on the field trip, one student collected a Lomandra. When I helped her with the identification of the species, I quickly despaired of the key we were using. Don't want to mention names here, but the questions were often unhelpful; the worst of them was probably this one:
3 Male inflorescence usually unbranched; female inflorescence unbranched or rarely branched.
3* Male inflorescence branched; female inflorescence branched or unbranched.
So basically, unless you have an unbranched male inflorescence on your specimen (and she didn't) the couplet is useless; if it is branched, you have to try both ways, and the text about the female inflorescences is entirely uninformative. When the taxonomist wrote that, didn't they realise what they were doing?
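The problem with the couplet can be spelled out mechanically. The following Python sketch is my own toy encoding, not anything from the key itself: the dictionary and function names are hypothetical, and I assume that "usually" and "rarely" mean the other state is still permitted. It lists, for each combination of character states, which leads a specimen could follow.

```python
from itertools import product

# Hypothetical encoding of the couplet: for each lead, the character states it
# allows. "Usually unbranched" and "rarely branched" still admit both states.
LEADS = {
    "3": {
        "male": {"unbranched", "branched"},
        "female": {"unbranched", "branched"},
    },
    "3*": {
        "male": {"branched"},
        "female": {"unbranched", "branched"},
    },
}


def compatible_leads(male, female):
    """Return the leads that a specimen with these states could follow."""
    return [
        lead
        for lead, allowed in LEADS.items()
        if male in allowed["male"] and female in allowed["female"]
    ]


for male, female in product(["unbranched", "branched"], repeat=2):
    leads = compatible_leads(male, female)
    print(f"male {male:10s} female {female:10s} -> {leads}")
```

Only an unbranched male inflorescence narrows the choice to a single lead; the state of the female inflorescence never changes the result.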
Sunday, August 10, 2014
Jervis Bay
Over the weekend I was on a field trip to Jervis Bay with students of the Australian National University. Jervis Bay is on the eastern coast of the continent, in New South Wales, although through an accident of history the southern part of the land around the bay is part of the Australian Capital Territory. When Canberra was founded, it was apparently thought that every decent capital city needed a harbour (even if the city itself was far inland), and so this area was assigned to the ACT. The harbour never happened except for a naval base of the Australian defence forces, and so most of it is now a national park.
The purpose of the trip was to learn about the rich local flora, and to give the students the opportunity to collect specimens for their herbarium project. Each of them has to hand in six mounted specimens so they know how to collect and prepare plant specimens for research.
There are several distinct types of vegetation in the area depending on the shallowness of the soil and how well it is drained. The above picture shows the wet sclerophyll forest in which we started our walk on Saturday. It is dominated by Eucalyptus with a rich Proteaceae and Ericaceae understorey.
This picture taken on the coastal heathland shows Jervis Bay to the left and the Pacific Ocean to the right. The little island left of the centre is a breeding ground for penguins but access is obviously restricted so that the birds are not disturbed.
And this shows the beach at Green Patch, with two students braving the cold waters during a well-deserved lunch break. Of seventeen students on the trip they were the only ones who jumped into the ocean. Although certainly not as chilly as in Canberra, it is still winter after all.
On the way back we stopped at this alleged waterfall which, however, does not really have any significant amount of water at this particular moment. It seems as if there has been little rain in the area recently. Still, the view was well worth a stop.
Note to self: Next time, perhaps don't mention that the green drupes of Persoonia (Proteaceae) are edible and were consumed by Aboriginal people. After hearing it, a few adventurous students went and picked the superficially similar green fruits of a completely different plant instead, Leptomeria acida (Santalaceae, pictured above), giving me a bit of a shock because I did not immediately know whether they were also edible or, for example, poisonous to humans.
Fortunately it turns out that they are not only edible but actually very healthy, long known to constitute a good source of vitamin C (we checked in a book on bush tucker that my wife bought some time ago). Still, perhaps the students should have looked a bit closer and realised that in contrast to Persoonia this plant does not have conspicuous leaves...
Thursday, August 7, 2014
Classification by internet poll?
Today two colleagues independently drew my attention to the fact that the Angiosperm Phylogeny Group (APG) is conducting an internet survey on the classification of various groups of vascular plants. (Thanks, Bort and Jim!)
It is really a somewhat peculiar idea that scientists would gather input on how to decide a scientific question by conducting a poll. Surely science would not get very far if the age of the earth or the efficacy of homoeopathy were decided by public vote as opposed to based on the evidence.
However, viewing it like that is somewhat missing the point. This is, after all, the Angiosperm Phylogeny Group, and they make clear right at the beginning that the recognition of natural groups in classification is non-negotiable. It can be safely assumed that they put the question whether stability or monophyly should be prioritised in there to see where the individual poll participant is coming from, not because they will say, hey, perhaps we should accept non-monophyletic groups after all now that 52% have voted for that. (In actual fact, at the moment 75% of the participants prioritise a natural classification over a stable one, as they should if they are scientists.)
No, what this survey is really about is what to give second priority, so to say. Apart from describing biological diversity correctly, a classification of vascular plants can aim to fulfil several other criteria, and often one will have to be traded off against the others:
- Stability, already mentioned above, means that one tries to minimise disruptive changes in the classification even as our knowledge advances. When faced with the decision whether to reclassify many species or few species, one would prefer the second option.
- Recognisability of the taxa. In the present case, there is clearly a concern that plant families be defined by some clear, preferably exclusive characters that make it easier for students to learn about the groups and for end-users of the classification to understand it. Uniting the Orobanchaceae with the Lamiaceae, for example, would pretty much make it impossible to explain how to recognise the family.
- Conversely, one would like to avoid making many small groups that don't really differ from each other by any obvious character.
But that is also one of two possible sources of frustration with the APG's poll: the family rank is taken entirely too seriously by many of us botanists. Instead of obsessing about where to draw the line for this category perhaps we should acknowledge that there are simply clades inside of clades. I have the feeling that most zoologists are way ahead of us in this regard (although admittedly entomologists tend to think in terms of insect orders).
The second source of frustration is the one mentioned by Bort in his comment. The APG solicits opinions on questions without providing sufficient information to the participants of their poll. How am I supposed to know the clade support and character distribution in, say, Dioscoreaceae, Tecophilaeaceae or Restionaceae? Ultimately, unless more data are provided, the only people qualified to have an opinion in those cases are a handful of phylogeneticists working on those groups which, as Bort pointed out, kind of defeats the purpose of the poll.
In some cases, however, the description of the problem itself makes clear that there is currently insufficient support in the phylogeny to decide how to circumscribe natural groups. I find it strange that such cases even need to be discussed; it seems obvious that one should not make any taxonomic changes until more data are available.
Tuesday, August 5, 2014
Botany picture #168: Aphyllanthes monspeliensis
Aphyllanthes monspeliensis (Asparagaceae), France, 2014. One of the plants that are very characteristic of the area in south-western France where my in-laws are living. Apparently the species is phylogenetically very isolated in its plant family.
Monday, August 4, 2014
Inflorescences
Today I lectured on the shoot, and one of the topics was inflorescence structures. Admittedly there is a seemingly large number of terms, but I am often puzzled why even so many professional plant taxonomists who otherwise know the most obscure details of plant morphology get so confused about inflorescences, because really the definitions there are much more straightforward than, say, in fruit morphology.
So here is a short overview, in the form of an informal key:
Inflorescence is simple, that is unbranched
    Flowers stalked, inflorescence long ... raceme
    Flowers stalked, inflorescence compressed ... umbel
    Flowers sessile, inflorescence long ... spike
    Flowers sessile, inflorescence compressed ... head/capitulum
Inflorescence is branched
    Main axis dominant, many side branches along its length ... panicle
        Panicle overall cone-shaped ... still a panicle
        Flowers +/- on same level ... corymb(ose panicle)
        Lower branches overtopping upper ... anthela
    Each axis has only one node with bracts, then ends in flowers ... cyme
        One side axis per axis ... monochasium
        Two side axes per axis ... dichasium
Higher-order inflorescences, that is inflorescences consisting of inflorescences
    Raceme or spike of cymes ... thyrse
    Higher order inflorescence of heads ... capitulescence
        Head of heads ... glomerule
    Umbel of umbels ... compound/double umbel
There are perhaps a few more that I have missed, but those should be the most important terms. The take home messages here are as follows: First, plants really don't have that many options. They can have unbranched or branched inflorescences, and if they are branched they can be monopodial (dominant main axis) or sympodial. And then they can stack one type onto another, but that's about it.
Second, because there are so few possibilities, several of the inflorescence types are variants of the same fundamental theme. Racemes, umbels, spikes and heads are all very similar and easily transformed into one another, which is also why the same plant family may easily exhibit several of them. In the pea family Fabaceae, for example, there are many groups with racemes (Vicia, Lathyrus) but also with umbel- or head-like inflorescences (clover genus Trifolium). The latter are just the former with shorter axes.
Third, to communicate systematically relevant morphological information it is important to use the term that actually describes the structure as opposed to the overall impression. Just searching for those links I added above I found a depressing number of misconceptions. I'd say that upwards of 80% of the hits I found for panicle and corymb were wrong, often to the point that the diagram does not fit the actual structure found in the plant it is supposed to depict, such as when somebody used Achillea as an example of a corymb (correct) and then drew... an umbel (ouch; I mean, just look at Achillea and you'll see that its capitulescence is richly and irregularly branched).
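For those who think more easily in code, the informal key above can be sketched as a small decision function in Python. This is only an illustration of how few decisions are involved: the class, the field names and the function names are all made up for this sketch, and real inflorescences will of course throw up intermediate cases that no handful of true/false questions can capture.

```python
from dataclasses import dataclass


@dataclass
class Inflorescence:
    """Hypothetical description of one inflorescence (all field names invented)."""
    branched: bool = False            # simple (unbranched) vs. branched
    flowers_stalked: bool = True      # simple types: stalked vs. sessile flowers
    axis_compressed: bool = False     # simple types: long vs. compressed axis
    main_axis_dominant: bool = True   # branched types: dominant main axis or not
    flowers_level: bool = False       # panicle with flowers +/- on the same level
    lower_overtop: bool = False       # panicle with lower branches overtopping upper
    side_axes_per_axis: int = 2       # cymes: one or two side axes per axis


def classify(inf: Inflorescence) -> str:
    """Walk through the first two parts of the key in order."""
    if not inf.branched:
        if inf.flowers_stalked:
            return "umbel" if inf.axis_compressed else "raceme"
        return "head/capitulum" if inf.axis_compressed else "spike"
    if inf.main_axis_dominant:
        if inf.flowers_level:
            return "corymb"
        if inf.lower_overtop:
            return "anthela"
        return "panicle"  # cone-shaped or not, still a panicle
    if inf.side_axes_per_axis == 1:
        return "monochasium"
    if inf.side_axes_per_axis == 2:
        return "dichasium"
    return "cyme"


CYMES = {"cyme", "monochasium", "dichasium"}


def classify_compound(outer: str, unit: str) -> str:
    """Third part of the key: an inflorescence (outer) made of inflorescences (unit)."""
    if outer in {"raceme", "spike"} and unit in CYMES:
        return "thyrse"
    if unit == "head/capitulum":
        return "glomerule" if outer == "head/capitulum" else "capitulescence"
    if outer == "umbel" and unit == "umbel":
        return "compound umbel"
    return f"{outer} of {unit}s"  # no special term in the key above


print(classify(Inflorescence()))                        # raceme
print(classify_compound("corymb", "head/capitulum"))    # capitulescence
```

The second call mirrors the Achillea case: heads arranged in a corymb give a capitulescence, not an umbel.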
Friday, August 1, 2014
Trying to use fastStructure
Updated 16 August 2014 and 27 Nov 2014
A well established software tool in ecology and systematics is Structure. Using population level molecular data such as Amplified Fragment Length Polymorphisms (AFLPs), microsatellites or Single Nucleotide Polymorphisms (SNPs), it tries to find the underlying population structure. In practice, you run the program with a range of values for the number of populations (K) and then compare the resulting likelihood values for each.
Apart from the best K value, that is the number of populations or clusters that your samples are best divided into, Structure also outputs how much of the genome of each of your samples is derived from each population. This is then often depicted in a graph such as the one seen here. Each line in that figure represents the results of one Structure run, from K = 2 to K = 6. Each colour is one of the clusters or populations. Each pixel column is one sample, with its colour showing to what population it belongs. So you may get many samples that belong fully or nearly fully to one population but also some that are 'admixed' between two, perhaps representing hybrids.
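To make the two kinds of output a bit more concrete, here is a minimal Python sketch of how one might summarise them. It does not call Structure or read its output files. The likelihood values and membership proportions are invented for illustration, the function names are hypothetical, and the 0.9 cut-off for calling a sample admixed is an arbitrary assumption. Simply taking the K with the highest likelihood is also a simplification made here for brevity.

```python
def best_k(likelihoods):
    """Return the K whose run has the highest (log-)likelihood.

    likelihoods: dict mapping K -> likelihood value of that run.
    """
    return max(likelihoods, key=likelihoods.get)


def summarise_sample(proportions, threshold=0.9):
    """Label one sample by its dominant cluster, or as admixed.

    proportions: share of the sample's genome assigned to each cluster,
                 one value per cluster, summing to 1.
    threshold:   hypothetical cut-off above which a sample is treated as
                 belonging fully to a single cluster.
    """
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("membership proportions should sum to 1")
    top = max(range(len(proportions)), key=proportions.__getitem__)
    if proportions[top] >= threshold:
        return f"cluster {top + 1}"
    # List every cluster that contributes a non-trivial share.
    contributing = [i + 1 for i, p in enumerate(proportions) if p >= 1 - threshold]
    return "admixed: clusters " + "+".join(str(c) for c in contributing)


# Invented example values, not real results.
runs = {2: -5200.0, 3: -4800.0, 4: -4950.0}
print(best_k(runs))

samples = {
    "sample_A": [0.97, 0.02, 0.01],   # nearly fully one population
    "sample_B": [0.55, 0.45, 0.00],   # split between two, perhaps a hybrid
}
for name, q in samples.items():
    print(name, summarise_sample(q))
```

Each row of proportions corresponds to one pixel column in the kind of graph described above.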
This methodology is used to examine population structure below the species level but also, and this is what is more interesting to a systematist like myself, to study the delimitation of species. I have such a project going with a colleague from a different herbarium, and we just got our data. We have more than 9,000 SNPs for 91 samples, and unfortunately Structure is rather slow, especially when analysing such large amounts of data.
So you can imagine I was very happy to find that a few months ago the same lab released a program called fastStructure, which is designed specifically to deal with large numbers of SNPs and promises to be one or two orders of magnitude faster than Structure. In fact it is so new at this point that the paper announcing it has only been cited twice - once in the editorial of the same journal (which doesn't really count) and once in a minor review article. In a few months papers will start coming out by people who have actually used it, but at the moment there is little practical experience to build on except the comments and questions of people on the Structure Google Group.
After our high performance computing staff kindly installed the program on a supercomputer, I spent most of today trying fastStructure out. I learned a lot but so far the results have been mixed. I write this partly to spare other people some of the frustrations I experienced.