PhyloBotanist

Did Edward Gibbon blame Christianity for the decline of the Roman Empire?

2020-06-20T11:22:00.001+10:00

I have now finally read Edward Gibbon's classic History of the Decline and Fall of the Roman Empire. Although certainly opinionated in a way that one might not consider up to the standards of historical research today, it is considered to be a ground-breaking work for its time (1776-1788) in the way that it used primary sources to tell its story.

But the most important reason I became interested in it is the pejorative way in which Gibbon is referred to in critiques of New Atheism. There is a general impression that Gibbon laid the blame for the collapse of the Roman Empire and the loss of its technology and learning at the feet of Christianity. In some circles, "Gibbonian fantasy" or "Gibbonian fiction" seems to be a short-hand for the belief that Christianity is singularly responsible for retarding scientific progress and causing the Dark Ages.

Having now read the book I have absolutely no idea where this is coming from. Maybe this idea is more clearly developed in some other work by the same author, but in the Decline and Fall I search for it in vain.

Don't get me wrong - it is clear that Gibbon had no love for Christianity. He argued in at least two sections of his book that early Christianity was more intolerant than the paganism that preceded it, and that Christianity wasted resources on piety that could have been better used for other, more practical purposes. (Nobody can seriously doubt the first claim, but the second seems a bit silly. Wasting resources on piety is not a Christian characteristic, it is a religious one; every pagan priest offering sacrifices to the gods and every Vestal Virgin performing a pointless ceremony could have been more productively employed as an engineer, scholar, teacher, navigator, trader, or in a variety of other professions.)

Due to his visible aversion to religion it is unsurprising that Christian apologists don't like Gibbon and want to cast aspersions on his work. But that does not mean that he actually made the argument that Christianity brought down the Roman empire and destroyed ancient learning. I really don't see where he did.

Even the Wikipedia page on the book as of today 20 June 2020 quotes a section that makes clear that despite Gibbon's dislike of Christianity he did not see it as a decisive factor: "if superstition had not afforded a decent retreat, the same vices would have tempted the unworthy Romans to desert, from baser motives, the standard of the republic" - in other words, the same decline would have taken place without Christianisation.

To any open-minded reader of his book it should become clear that his main culprit is an institutional process that can perhaps be usefully summarised as follows:

(1) As the republic expanded, the military was professionalised to increase its efficiency and flexibility of operation. What used to be a citizen army made up of free men expected to serve to defend the republic turned into an army of professional soldiers rewarded with property after a period of service.

(2) One of the consequences was that the average Roman citizen did not need, and ultimately did not want, to risk his life fighting to defend the republic. As the republic became an empire and grew even further, more and more of the armed forces were recruited from non-citizens, increasingly barbarian mercenaries or foederati, foreign tribes bribed to defend the empire from other, closely related tribes. It should be immediately obvious that these kinds of soldiers have considerably less loyalty to Rome than Roman citizen soldiers, and that mercenaries are only useful to you as long as you can guarantee their pay and nobody out-bids you. That alone could have been the empire's death knell, but...

(3) In addition, the empire had a severe institutional weakness in that there was never a clear rule of succession. There were phases where the next emperor was the previous emperor's son, whatever his ability, and others where the previous emperor would adopt a successor to ensure that the empire would be left in competent hands. But what if the emperor was a tyrant and got assassinated, with either no successor in place or his plan for succession as discredited as he was? Although technically an emperor needed to be recognised by the senate, imperator was a military title, and at any rate having control of a lot of swords is more of a, shall we say, practical argument than being endorsed by a bunch of elderly guys in togas. In practice the senate did not want those swords to be turned against themselves. It thus happened more and more that the next emperor was selected by the army and merely ratified by the senate. This again had two important consequences.

(4) First, the way for an ambitious officer to be elected emperor by the army was obviously to promise his fellow mercenaries a lot of money. In several parts of his book Gibbon is quite explicit about this tendency: the constant need to bribe the army, either to get elected or to make it tolerate an emperor that the soldiers had not elected themselves, drained the taxpayers. Gibbon claims it also weakened the military vigour of the soldiers, who were at times living the good life spending their bribes while neglecting their training and insisting they shouldn't have to carry heavy armour.

(5) Second, frequently different parts of the army would elect different officers to be the new emperor. If both of them felt strong enough to give it a serious try, the empire would immediately be plunged into another short civil war. Just to decide whether the guy nominated by the Gaulish legions or the guy nominated by the Syrian legions gets to be the new ruler, they wasted the lives of thousands of soldiers who might more productively have been used to keep barbarian invaders or the Sassanian empire at bay.

So there we have it: the two key problems were the decreasing loyalty and increasing corruption of the armed forces and the institutional weakness of the republic. And both of them were probably entirely unavoidable. You cannot conquer and control an empire with an army made up of free farmers who have to travel back to northern Italy to bring in the harvest just when the enemy attacks in Mesopotamia, so you need a professional army. And even if you have very nice institutional arrangements they won't be of any use against a large army that has no loyalty to those institutions. The only alternative would have been not to have an empire in the first place.

I am not a historian. I do not know if this is accurate in all details. I do not know if this is really why the Roman empire declined, and I understand at least that plagues may have been another factor. The point is: this is Gibbon's argument, not that Christianity caused the decline.

Overpopulation

2020-02-02T20:14:00.000+11:00

Recently the famous primatologist Jane Goodall attracted criticism for saying that environmental issues "wouldn't be a problem" if our numbers were at the levels they were 500 years ago. I assume what she meant is that they would be much more manageable, not absent, as even mere hundreds of millions of people would produce waste, use non-renewable resources, etc. The point, however, is that whenever somebody brings up population pressures, the immediate reaction in much of the social media that I frequent is an evidence-free rejection of the idea combined with personal attacks on the person who made the statement. The same happened in this case.

I am really quite puzzled by this. While a discussion could be had about whether the planet is overpopulated or not, depending on what resource constraints we assume, Goodall's statement is irrefutable. Of course 500 million people would have less of an environmental footprint than seven billion. One could just as well try to reject the idea that four people will fit into a car more easily than twelve.

Consequently I was quite interested to see an article in The Conversation titled "why we should be wary of blaming 'overpopulation' for the climate crisis". It was written by an academic and of some length, so the argument can be expected to have been developed more clearly than in a rage-tweet. After citing Goodall the author, Heather Alberro, begins her argumentation as follows:

This might seem fairly innocuous, but it's an argument that has grim implications and is based on a misreading of the underlying causes of the current crises. As these escalate, people must be prepared to challenge and reject the overpopulation argument.

So we are promised here two different arguments:

First, nobody should say that the planet is overpopulated because it has "grim implications". This is not about whether a statement is wrong, it is about whether somebody else could misunderstand or deliberately exploit the statement to justify something terrible.

I am always uncomfortable with this stance, because the logical end-point is that nobody should be allowed to state a demonstrable fact if there is an interest group that will use this fact to promote a harmful agenda - and there will always be such a group if a topic is controversial at all. Or in other words, the logical end-point of this stance is to restrict your ability to use evidence in decision making towards your own, hopefully benevolent agenda, which means that your decisions will be ill-informed and less likely to solve the problem you are dealing with.

Therefore any argument along those lines must, in my eyes, meet a fairly high standard. It cannot simply be, "you aren't allowed to state this fact because somebody somewhere could do something bad". It would have to provide a clear, convincing causal chain from stating the fact to the probable occurrence of a harm that would be significantly less likely to happen if the fact were kept confidential. A causal chain like:

Somebody says the world is overpopulated -> ???? -> genocide

What needs to be in the middle of this chain to make genocide plausible?

Second, Alberro also argues that the world is not in fact overpopulated, that this would be a "misreading" of the situation. This claim is what I find particularly interesting, because as mentioned in the beginning the relevant conversation on social media can be comfortably summarised as "everybody who says there is overpopulation wants to commit genocide against the developing world", in other words the previous argument plus character assassination. I have yet to see anybody addressing the question whether the world is, actually, overpopulated; it simply gets ignored.

I will therefore start with the examination of this factual argument, which also makes up the majority of the article.

In reality, the global human population is not increasing exponentially, but is in fact slowing and predicted to stabilise at around 11 billion by 2100.

I do not quite understand what it means that "population is slowing". I assume what is meant is that population growth is slowing. That is great... but. First, it is like wanting praise for promising to only stuff eleven people into your five seater car, instead of infinity people. Yes, okay, but you already have seven people in there right now, and even that is not safe. So will you please stop putting the eighth in there, like, right now please, instead of ignoring the problem? Second, slowing growth is clearly irrelevant to Goodall's statement, which implied that we are already too many.

More importantly, focusing on human numbers obscures the true driver of many of our ecological woes. That is, the waste and inequality generated by modern capitalism and its focus on endless growth and profit accumulation.

This is the key argument of this article, which is subsequently detailed in various ways: inequality is the problem, not number of people. Now in what way would ending inequality solve ecological woes? First we need to consider how it would be ended, as there are at least two ways to do so.

First, raise the living standards of the poor. This would be the humane approach, but I do not see how it helps with carbon levels in the atmosphere, waste, soil erosion, groundwater overuse, whatever. It would only make things worse. Second, reduce the living standards of everybody to that of the poorest people on the planet. That would help reduce emissions etc., but spelling it out like this should reveal the obvious problem: nobody is going to accept this immiseration, neither the wealthiest nor the poorest, who desperately want a better life too.

Now one could reply to this that we could all live sustainably in equality if we just made our economy carbon-neutral. That is correct, but the whole controversy is about whether we turn the population-growth dial or the economic equality dial. (Why we should only do one of those is left unexplained.) Goodall said that we would have less of a problem if there were fewer people using fossil fuels, and Alberro (somehow) tries to argue that this specific statement is false, that we should not turn that dial but only the other. The third dial, carbon-neutrality, is orthogonal to this discussion.

The industrial revolution that first married economic growth with burning fossil fuels occurred in 18th-century Britain. The explosion of economic activity that marked the post-war period known as the "Great Acceleration" caused emissions to soar, and it largely took place in the Global North. That's why richer countries such as the US and UK, which industrialised earlier, bear a bigger burden of responsibility for historical emissions.

That is true and an important point but also completely irrelevant for the question of whether the planet is overpopulated. "Yes, I am trying to stuff eleven people into my five-seater, but that is totally safe because last week that other guy was speeding."

In 2018 the planet's top emitters - North America and China - accounted for nearly half of global CO2 emissions. In fact, the comparatively high rates of consumption in these regions generate so much more CO2 than their counterparts in low-income countries that an additional three to four billion people in the latter would hardly make a dent on global emissions.

Well, clearly adding more people will not make a dent in emissions, it would increase them, but I get what is meant here: adding more poor people to the planet will have less of an impact than the rich increasing their consumption even further. I find it ironic, however, that China is mentioned as one of the two worst offenders, because if we should be worried about how racists spin concerns about overpopulation then we should also be worried about somebody concluding "see, it is China's fault, so we Europeans don't need to do anything".

Consumption levels in China have risen, of course, but they are still much lower per person than in the USA. A key reason why China rivals North America in emissions is that it accounts for about one fifth of the world population all by itself. And we are back at Goodall's point, which applies across all consumption levels.

There's also the disproportionate impact of corporations to consider. It is suggested that just 20 fossil fuel companies have contributed to one-third of all modern CO2 emissions, despite industry executives knowing about the science of climate change as early as 1977.

This is another argument that frequently comes up on social media, very often citing the number twenty, so the meme must ultimately go back to a single statistic somewhere. Now I will agree immediately that more powerful people bear more moral responsibility, because a CEO has more influence than a single supermarket customer or a single assembly line worker.

But still, what rarely seems to be considered is that these corporations produce the emissions to offer products and services that we seven billion humans buy and consume. It is not the case that these industry executives run polluting factories and refineries at a loss and for the giggles because they like pollution, like some cartoon villain. It is not the case that they are doing their thing over there, and we normal people are doing our entirely unrelated thing here. They run their factories because we buy stuff, and they run them unsustainably because many of us insist on buying stuff as cheaply as possible. Conversely, that also means that if there were only one billion of us these corporations would produce only one seventh of the emissions that they are producing now.

Inequalities in power, wealth and access to resources - not mere numbers - are key drivers of environmental degradation. The consumption of the world's wealthiest 10% produces up to 50% of the planet's consumption-based CO2 emissions, while the poorest half of humanity contributes only 10%.

I have already mentioned how likely the poorer half are to accept that they should never see the wealth of the wealthier half themselves (i.e. not very likely), but let's assume that we reduce everybody's standards of living to those of the poorer half: even those who are currently billionaires will live in crowded little, non-AC'd apartments with only a bicycle and the bus as transport options. I am a frugal person myself, so I could happily live with that. But what exactly does that do for us, carbon-wise? According to the numbers cited above, the poorer half contributes 10%, so everybody living at that level would contribute about 20%; it would logically reduce "consumption-based" emissions to a fifth of what they currently are.

Now it is a bit unclear, at least to me, what exactly is meant by "consumption-based". Googling for some guidance I find that, according to the EPA, US greenhouse gas emissions in 2017 were distributed as follows: transportation 29%, electricity 28%, industry 22%, commercial & residential 12%, agriculture 9%. What is consumption? I assume all of agriculture and at least part of industry, so at a minimum perhaps 20%. The rest is less easy - is electricity consumption in Alberro's sense? Maybe yes, maybe no, but transport probably not.

So we are talking about reducing to a fifth what may now be somewhere between 20% and 50% of the total emissions. That means tackling inequality by impoverishing everybody would reduce our overall emissions by 16-40%, or to 60-84% of what they currently are. That is hardly going to stave ecological and societal collapse off by many years, much less solve the problem. Reducing inequality by the more desirable approach of lifting everybody out of poverty or an approach in the middle between those extremes would, of course, have the opposite effect.
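For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It only uses the shares quoted above; the function name and the assumption that nothing outside the "consumption-based" part changes are mine, not Alberro's.

```python
# Back-of-the-envelope check of the figures discussed above.
# Quoted figure: the poorest half of humanity contributes about 10% of
# consumption-based CO2 emissions.
poor_half_share = 0.10

# Assumption: if everybody lived like the poorest half, the whole population
# would emit twice what that half emits now, i.e. a fifth of current levels.
levelled = 2 * poor_half_share
cut_within_consumption = 1 - levelled


def overall_reduction(consumption_fraction):
    """Cut in total emissions if only the consumption-based part shrinks.

    consumption_fraction is my guess at how much of total emissions counts
    as "consumption-based" (somewhere between 0.20 and 0.50 in the text).
    """
    return consumption_fraction * cut_within_consumption


for fraction in (0.20, 0.50):
    cut = overall_reduction(fraction)
    print(f"consumption-based = {fraction:.0%}: "
          f"cut {cut:.0%}, remaining {1 - cut:.0%}")
```

With the lower guess of 20% this gives a 16% cut, with the upper guess of 50% a 40% cut, matching the range stated above.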

With a mere 26 billionaires now in possession of more wealth than half the world, this trend is likely to continue.

I do not like inequality either because of how unfair it is and how it distorts democracy, and I certainly do not believe anybody morally deserves to have a billion dollars, but purely in terms of greenhouse gases I doubt that somebody with a hundred thousand times my money contributes a hundred thousand times the emissions that I do. There are only so many private jets or yachts any single billionaire can use at a time and only so much a single person can "consume". Much of their wealth is invested or used for gambling at the stock exchange as opposed to buying a hundred thousand people's worth of food, for example.

Developing regions in Africa, Asia and Latin America often bear the brunt of climate and ecological catastrophes, despite having contributed the least to them.

Again true, but again this does not contradict Goodall's statement that environmental problems would be less pressing if there were fewer humans.

The problem is extreme inequality, the excessive consumption of the world's ultra-rich, and a system that prioritises profits over social and ecological well-being. This is where we should be devoting our attention.

See above. Again, I would prefer a much more equal distribution of wealth. But that does not mean that overpopulation is not an issue. It is simply an undeniable fact that all else being equal, reducing our numbers by half reduces our ecological impact by half. Just as it is a fact that all else being equal, reducing our consumption would also reduce our ecological impact. There is no either-or, and we can devote attention to several problems at the same time.

Coming now to the political or strategic argument:

The idea that there were simply too many people being born - most of them in the developing world where population growth rates had started to take off - filtered into the arguments of radical environmental groups such as Earth First! Certain factions within the group became notorious for remarks about extreme hunger in regions with burgeoning populations such as Africa - which, though regrettable, could confer environmental benefits through a reduction in human numbers.

Here the implied causal chain is: Somebody says the world is overpopulated -> fringe group without any political influence whatsoever thinks that famine is beneficial -> genocide. Not sure I am convinced. Then towards the end of the text:

Issues of ecological and social justice cannot be separated from one another. Blaming human population growth - often in poorer regions - risks fuelling a racist backlash...

Somebody says the world is overpopulated -> racist backlash -> genocide. There might be a few steps missing here. Also, see above regarding the backlash that could result from blaming China for half the problem even from an inequality angle.

...and displaces blame from the powerful industries that continue to pollute the atmosphere.

Somebody says the world is overpopulated -> fossil fuel industries can claim it isn't their fault. I do not see this logic at all, sorry. They still sell the fossil fuels, and the only difference is that they sell them to even more people.

Is it really that difficult to keep three factors in one's head at the same time? Imagine a three-dimensional graph, a cube. In one corner are very few humans, all getting along with very little, and what little they need is produced using renewable energies. In the opposite corner are billions of humans, each of them consuming massive amounts of throw-away goods, and these goods are produced burning the dirtiest coal you can imagine. To change the emissions we produce we can move along all three axes of this graph, along all three edges of the cube.

Not only is it entirely unclear to me how the idea that we would have fewer emissions if we moved down on the population axis is refuted by arguing that we should move down on the consumption-per-person axis; it is not even clear whether Alberro argues for that, because again, inequality could also be resolved by moving up that axis, making our ecological impact worse.
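The cube can be written down as a toy model. The sketch below assumes, as a deliberate simplification of my own, that total emissions are simply the product of the three factors; the units are arbitrary and the numbers are made up purely for illustration.

```python
def emissions(population, consumption_per_person, emissions_per_unit):
    """Toy model: total emissions as the product of the three 'dials'.

    Assumption: the three factors are independent and multiply, which is the
    "all else being equal" reading used in the text. Units are arbitrary.
    """
    return population * consumption_per_person * emissions_per_unit


# Illustrative baseline: seven billion people, consumption and emission
# intensity both normalised to 1.
baseline = emissions(7e9, 1.0, 1.0)

# Turning any one dial down by half halves the total; the dials do not
# exclude one another.
print(emissions(3.5e9, 1.0, 1.0) / baseline)  # population halved
print(emissions(7e9, 0.5, 1.0) / baseline)    # consumption per person halved
print(emissions(7e9, 1.0, 0.5) / baseline)    # emission intensity halved
print(emissions(3.5e9, 0.5, 0.5) / baseline)  # all three dials at once
```

Each of the first three ratios comes out at one half, and the last at one eighth, which is the whole point: arguing for one dial says nothing against the other two.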

So in summary, I do not see how this article refutes Goodall, and nor could it have, because the relationship between population size and ecological footprint is obvious. It does not appear to make an argument that the planet is not overpopulated either - as always the question is merely deflected. And finally, it does not even provide a plausible causal chain leading from the discussion of this question to "grim implications", leaving the intermediate steps up to the imagination of the reader, who, however, could just as well say, "what is so terrible about empowering people to be able to do family planning?"

Arguments for paraphyletic taxa, part 543,997 or so

2019-10-19T18:31:00.000+11:00

Although I have largely moved on from blogging, I found myself writing another post on the most frequent topic of this blog, arguments for the acceptance of paraphyletic taxa and whether they make sense. A paper has recently appeared that describes a new species of flowering plant (Carnicero et al 2019, Bot J Linn Soc: boz052). The first paragraph of its introduction argues for paraphyletic taxa as follows:

From a cladistic perspective, monophyly of taxa is desirable, but important evolutionary processes such as hybridization, anagenetic and anacladogenetic speciation (budding sensu Mayr & Bock, 2002) unavoidably result in non-dichotomous branching patterns (Hörandl, 2006; Hörandl & Stuessy, 2010).

I am afraid I already find this first bit confused in several details. First, from a cladistic perspective, monophyly is not merely desirable but required. That is the entire point of cladism.

Second, non-dichotomous branching patterns are polytomies, meaning the branch splits into more than two sub-branches. Polytomies are no problem for making supraspecific taxa monophyletic, so on the face of it, it is not clear what the argument is. But none of the mentioned processes necessarily produce polytomies anyway, and some of them do not even produce any branching at all.

Hybridisation - presumably the authors mean hybridogenic speciation, e.g. by allopolyploidy, and not actually hybridisation per se, which is usually a dead end - is not branching, it is the opposite. The problem for the argument here is that reticulation does not just mean there is no monophyly, it also means there is no paraphyly either, as there is no phyletic (tree-like) structure. It makes no sense to argue for paraphyly in a situation where there is no paraphyly. (More on that below.)

'Budding' speciation is dichotomous, just like any other lineage split, unless an ancestral species fractures into three or more descendant species at the exact same moment, just as could happen with a non-'budding' lineage split. It is no problem whatsoever for making supraspecific taxa monophyletic.

Third, anagenetic means that something happens along a lineage without a lineage split, so it is again odd to speak of a "non-dichotomous" branching pattern. If anagenesis is happening there is by definition no branching pattern, dichotomous or otherwise. Nor is there any problem for making supraspecific taxa monophyletic. So yes, the observation that there is no dichotomy is correct, but merely in the same trivial sense as the observation that a book isn't a car. You can go around saying that, but book authors or publishers will simply say, "we know, so what?" Cladists likewise when told that anagenesis happens.

Anacladogenesis is a case of peripatric speciation, in which a population or a group of populations from a species diverge, resulting in a derivative monophyletic species (Stuessy, Crawford & Marticorena, 1990). Unlike in cladogenetic processes, the ancestral species remains essentially unchanged and often becomes paraphyletic (Mayr & Bock, 2002; Crawford, 2010).

With this the two closely related misconceptions at the heart of the paper's argumentation become clear. The first is that the cladist approach requires making species monophyletic. It doesn't. The second is that it makes sense to call species monophyletic or paraphyletic in the first place. It doesn't. (Although this is a very, very common and widespread misconception.)

As already indicated above, the concepts ending in -phyly apply in tree-like structures, such as the tree of life. The individuals of sexually reproducing species, however, do not form a tree-like but instead a net-like structure. Consequently, -phyly does not apply inside sexually reproducing species. Another attempt at an analogy: I can be asleep, but the molecules I consist of do not sleep. The concept "asleep or awake" does not apply to individual molecules, just as monophyly does not apply to individuals of the same sexually reproducing species. Fallacy of division is the keyword here.

This is not a new idea that cladists came up with only as a rearguard action, as frequently claimed by paraphyletists. We can go back all the way to the inventor of cladism, Willi Hennig. The central and best known figure in his book illustrates the different relationships that species, individuals, and life stages have to each other. Phylogenetic systematics ('cladism') is the approach to take when classifying species into supraspecific taxa, but not when classifying individuals into species. The claim that a species is monophyletic or paraphyletic is a category error.

Over time, the ancestral species may converge to monophyly through gene flow and lineage sorting (Baum & Shaw, 1995).

Same as above, but in addition it is unclear what is meant by 'gene flow', as on the face of it such flow would work against lineage sorting. It is possible that the authors meant to say 'restriction of gene flow'.

This sentence also makes clear where the conceptual error is located that leads a surprising number of people to the idea that species can be something or other-phyletic. Lineage sorting happens to alleles, and yes, the alleles of a gene occurring inside a sexually reproducing species can be paraphyletic to the alleles occurring inside a different sexually reproducing species. But taxonomists do not classify alleles into species, they classify individuals into species, so this would be another category error.

Far from an exception, anacladogenetic speciation has been considered to be of main importance in plant evolution (Rieseberg & Brouillet, 1994; Anacker & Strauss, 2014). As integrative taxonomy advocates that taxa should reflect evolutionary processes (Stuessy, 2009; Schlick-Steiner et al., 2010), it may be necessary to recognize certain paraphyletic entities.

The argument that Integrative Taxonomy requires paraphyly was not familiar to me. My understanding has always been that Integrative Taxonomy is about combining diverse kinds of evidence to support taxonomic decisions in species delimitation, e.g. a combination of ecological niche, population genetics, and morphology. The seminal Schlick-Steiner paper, for example, was clearly about alpha taxonomy, i.e. species delimitation. Searching it for the snippet "paraph" brings up only one entry in its reference list. (Stuessy is a different story, as he is one of the two or three most vociferous botanists still arguing for paraphyletic taxa; but then again he is not to my understanding a founding figure of Integrative Taxonomy.)

Again the central problem is, however, not what Schlick-Steiner et al may have thought about paraphyletic taxa, but that Integrative Taxonomy is about species delimitation, where paraphyly applies just as much as decibels apply to colours, and not about supraspecific taxa, where the concept properly applies.

The paragraph ends with something like an argumentum ad populum.

Indeed, examples of recognized paraphyletic taxa exist at various taxonomic levels (e.g. class Reptilia: Mayr & Bock, 2002; Pozoa coriacea Lag.: López et al., 2012; Helichrysum Mill.: Galbany-Casals et al., 2014; Plethodon wehrlei Fowler & Dunn: Kuchta, Brown & Highton, 2018; Columnea strigosa Benth.: Smith, Ooi & Clark, 2018).

The individual species used as examples are irrelevant for the reasons outlined above, because unless they are reproducing clonally, in which case they should have been circumscribed to be monophyletic, they are not paraphyletic but instead tokogenetic (net-like), and cladism does not apply inside tokogenetic structures. That leaves two supraspecific taxa that the taxonomic community has long recognised as ill-circumscribed due to their paraphyly: reptilia and Helichrysum.

One might point out that Mayr, for example, remained opposed to phylogenetic classification even as he saw it being adopted by the scientific community around him, and that recognition of reptilia as a paraphyletic taxon is not state of the art in zoology today. The vast majority of animal systematists today classify animals consistently by relatedness.

But more importantly, there is no way to base the acceptance of paraphyletic reptilia or Helichrysum on the argumentation presented in this paper, which argues entirely from the existence of hybridogenic and 'budding' speciation. This illustrates an extremely common pattern in papers arguing for paraphyletic taxa: an argument is made that applies inside a species (although even that only if we misconstrue the conceptual basis and actual practice of phylogenetic systematics), and then the entirely unwarranted jump is made to the conclusion that paraphyly should be accepted at a much higher level of classification, where the argument would not apply even if it were correct.
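For readers less familiar with the terms, here is what monophyly means at the supraspecific level, where it does apply, as a small Python sketch. The tree is a deliberately simplified, hypothetical four-tip example written as nested tuples, and the function names are my own; it is not taken from any of the cited papers.

```python
def tips(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def smallest_clade(node, group):
    """Tips of the smallest clade under `node` containing all of `group`."""
    if not isinstance(node, str):
        for child in node:
            if group <= tips(child):
                # The whole group sits inside this child, so descend into it.
                return smallest_clade(child, group)
    return tips(node)


def is_monophyletic(tree, group):
    """A group is monophyletic if it equals the smallest clade containing it."""
    group = set(group)
    return smallest_clade(tree, group) == group


# Simplified illustration only: birds nested inside the traditional reptiles.
tree = ("mammals", ("lizards", ("crocodiles", "birds")))

print(is_monophyletic(tree, {"lizards", "crocodiles"}))           # False
print(is_monophyletic(tree, {"lizards", "crocodiles", "birds"}))  # True
```

The first group fails the test because the smallest clade containing lizards and crocodiles also contains birds; leaving the birds out is what makes the traditional reptiles paraphyletic. Note that the check only makes sense because the input is a tree, which is exactly what is missing inside a sexually reproducing species.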

Incongruence Length Difference test in TNT

2019-09-26T20:26:00.003+10:00

Because I am fed up with figuring it out anew every time I need to use the Incongruence Length Difference (ILD) test (Farris et al., 1994) in TNT, I will post it once and for all here:

Download TNT and the script "ildtnt.run" from PhyloWiki. In the script, you may have to replace all instances of "numreps" with "num_reps" to make it functional. I at least get the error "numreps is a reserved expression", suggesting that the programmer should not have used that as a variable name.
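If you would rather not do the search-and-replace by hand, the renaming can be sketched in a few lines of Python. This is only a minimal sketch: the function name is made up for illustration, and it assumes that "numreps" appears in the script only as that variable name. It replaces whole-word matches, so running it a second time on an already fixed script changes nothing.

```python
import re
from pathlib import Path


def rename_reserved_variable(script_text, old="numreps", new="num_reps"):
    """Replace whole-word occurrences of `old` with `new` in the script text."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return pattern.sub(new, script_text)


if __name__ == "__main__":
    # Assumes ildtnt.run sits in the current working directory.
    path = Path("ildtnt.run")
    if path.exists():
        path.write_text(rename_reserved_variable(path.read_text()))
```

It is probably wise to keep a copy of the original script before overwriting it.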

Open TNT, increase memory, set the data type to DNA, and treat gaps as missing data. Then load your data matrix, which should of course be in TNT format:

mxram 200 ;
nstates DNA ;
nstates NOGAPS ;
proc (your_alignment_file_name) ;

Look up how many characters your first partition has, then run the test with:

run ildtnt.run (length_of_first_partition) (replicates) ;
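To look up the length of the first partition without counting columns by eye, something like the following Python sketch may help. It assumes that the first partition still exists as its own aligned FASTA file from before the matrices were concatenated; the function name and the example sequences are made up for illustration.

```python
def alignment_length(fasta_text):
    """Return the number of aligned columns in FASTA-formatted text.

    Raises ValueError if no sequences are found or if they differ in length,
    since either would mean the input is not a usable alignment.
    """
    sequences = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            sequences[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines.
            sequences[name].append(line)
    lengths = {n: len("".join(parts)) for n, parts in sequences.items()}
    if not lengths:
        raise ValueError("no sequences found")
    if len(set(lengths.values())) != 1:
        raise ValueError(f"sequences differ in length: {lengths}")
    return next(iter(lengths.values()))


# Hypothetical two-taxon alignment, with the first sequence wrapped over two lines.
example = ">taxon_A\nACGT-A\nCG\n>taxon_B\nACGTTACG\n"
print(alignment_length(example))
```

The number returned is what goes in place of (length_of_first_partition) in the command above.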

There is an alternative script for doing the test called Ild.run, but I have so far failed to set the number of user variables high enough to accommodate my datasets. They seem to be limited to 1,000?

Perhaps this guide will also be useful to somebody besides me.
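For orientation, this is roughly what the test computes. The ILD statistic is the length of the most parsimonious tree for the combined data minus the summed lengths of the most parsimonious trees for the individual partitions, and its significance is judged against random partitions of the same sizes. The Python sketch below takes the tree lengths as given, because they have to come from parsimony searches such as those TNT runs. The function names are made up, and the p-value uses one common permutation-test convention that counts the original partition as one of the replicates; the script's exact counting may differ.

```python
def ild_statistic(length_combined, partition_lengths):
    """Combined-data tree length minus the summed per-partition tree lengths."""
    return length_combined - sum(partition_lengths)


def ild_p_value(observed_sum, random_sums):
    """Share of partitions whose summed tree length is <= the observed sum.

    The combined length is the same for every partition of the same data,
    so a small summed length corresponds to a large ILD. The original
    partition is counted as one replicate, hence the +1 in both places.
    """
    as_extreme = sum(1 for s in random_sums if s <= observed_sum)
    return (as_extreme + 1) / (len(random_sums) + 1)


# Hypothetical tree lengths, purely for illustration.
print(ild_statistic(120, [50, 60]))
print(ild_p_value(110, [115, 118, 110, 120]))
```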

Reference

Farris JS, Källersjö M, Kluge AG, Bult C, 1994. Testing significance of incongruence. Cladistics 10: 315-319.

Still not convinced by Vicariance Biogeography

2019-04-05T21:24:00.000+11:00

When reading recent methodological papers, review articles, or publications on my study group I sometimes add to the mix the odd paper that is not directly relevant for my work and maybe not even very recent but which is relevant to my broader interests. In this case I decided to take a look at Heads 2009, Inferring biogeographic history from molecular phylogenies, Biol J Linn Soc 98: 757-774.

Michael Heads is perhaps the most published proponent of Vicariance Biogeography, the school of biogeography that rejects speciation following long-distance dispersal (LDD) because... and that is where it gets interesting, because I still find that rejection puzzling. To the best of my understanding at least some vicariance biogeographers consider the conclusion of LDD to be unscientific because they believe it can explain any possible contemporary range, on the lines of 'if your hypothesis can explain every observation it explains nothing'. This does not make sense to me, because LDD would still be more or less plausible depending on the dating of cladogenesis events relative to tectonic events or island ages, prevailing wind and water currents, dispersal ecology, and many other factors. It also seems rather more unscientific to reject a possible explanation a priori, regardless of any evidence in its favour. But to get a better understanding of the arguments of vicariance biogeographers is precisely my reason for picking up this paper. So, on with it.

In a section titled "critique of founder dispersal in population genetic studies", Heads first describes the concept as "the founder individual has been isolated from its parent population by dispersing over a barrier (an apparent contradiction)". Right out of the gate this seems odd. I may be missing something, but it appears as if Heads would accept only extremes: either there is a barrier, meaning zero dispersal, or there is none, meaning panmixis. I have previously observed similar arguments in other papers from the vicariance school.

Assume I have a garden with a fence around it, and then one day a cat jumps over it. Does this mean I have no barrier around the garden? Of course not, it may still have kept various stray dogs and neighbours' children out. On the other hand, it was never a barrier to birds or insects. The same in biogeography. No barrier on this planet is absolute, and each barrier has a different force for different groups of organisms. A channel that is near-insurmountable to a monkey may be crossed by insects if blown over by a strong enough storm, and it may be no barrier at all to fern spores. Perhaps even more importantly, dispersal is a stochastic process. The Atlantic Ocean did not keep all cacti from crossing (Rhipsalis made it over to Africa), but it kept the seeds of >99.9% of them away, so it is still a barrier even if not an absolute one.
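The stochastic point can be made concrete with a little arithmetic. If each propagule independently has a tiny probability p of crossing the barrier and establishing, then the probability that at least one out of n attempts succeeds is 1 - (1 - p)^n. The Python sketch below uses entirely made-up numbers and assumes that the attempts are independent, which is a simplification.

```python
import math


def prob_at_least_one_crossing(p_single, n_attempts):
    """Probability that at least one of n independent attempts succeeds.

    Computes 1 - (1 - p)^n via log1p/expm1 so that very small p stays accurate.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if p_single == 1.0:
        return 1.0 if n_attempts > 0 else 0.0
    return -math.expm1(n_attempts * math.log1p(-p_single))


# Hypothetical values: a one-in-a-million chance per propagule.
for n in (1_000, 1_000_000, 10_000_000):
    print(n, prob_at_least_one_crossing(1e-6, n))
```

A barrier that stops 99.9999% of individual attempts is clearly still a barrier, but under these assumptions it becomes more likely than not to be crossed at least once as soon as the number of attempts reaches the order of a million.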

Beyond that the argument of the section relies on citing five papers that "failed to corroborate predictions of founder effect speciation", of which one is missing from the reference list. I checked three of the remaining four papers, and in all cases they are experiments on fruit flies limited to time frames on the order of ten years and designed to test the very narrow question whether severe population bottlenecks will cause pre-mating isolation. Now I may completely have misunderstood the claim made by mainstream biogeographers regarding founder speciation, but I believe it was not "ten years after an organism has dispersed to an island it will have achieved biological pre-mating isolation". The way I understand it the claim is more on the lines of the large distance from the parental population producing geographic pre-mating isolation, which enables speciation to take place subsequently. The point is not the speed with which the new population evolves (although that is an exciting research question in itself) but rather that it has become geographically isolated.

The argument consequently seems to miss the point. If there is a problem for founder speciation then it would be whether a single pregnant female or a single seed can establish a viable population. Potential problems are inbreeding and, in plants that have such features, self-incompatibility systems that cause failure to set seed. But if a population establishes, helped perhaps by herbivore release and lack of competition, subsequent speciation is not an extraordinary claim. It really does not matter if isolation has been achieved by vicariance or by LDD, the subsequent process of divergence is the same except the latter will also cause a genetic bottleneck.

The section "critique of founder dispersal in biogeographic studies" points out that there is good evidence for similar vicariance patterns in many taxa. I am unaware of anybody who denies that vicariance is an important process - but it does not logically follow that LDD is therefore implausible. I can agree that a lot of white swans exist without therefore having to believe that black ones cannot possibly exist.

This is followed by "founder dispersal and new ideas on rift tectonics", where the idea seems to be that seemingly young oceanic islands do not require LDD to be colonised because they kind of have always been there. It is not entirely clear to me if the claim is that the individual islands are all much older than the oldest still observable lava flows or if, as implied by the reference to "seamounts", the local species would have constantly hopped from one short-lived and now submerged island to the next. If the first, it seems rather ad-hoc; if the second, one wonders why species that can so easily jump ten times from one disappearing island to the next island in the chain cannot simply jump a single time from continent to island. What is the more parsimonious conclusion here?

Next, molecular clocks and time calibration of phylogenies are rejected. All calibrations, whether from fossils or, in particular, from geological events such as the formation of the Isthmus of Panama, are dismissed as unreliable, yet present distributions are apparently considered reliable evidence of ancestral distributions. Unfortunately I remain anti-convinced.

To quote the following paragraph in full:

"In Ronquist's (1997) method of dispersal-vicariance analysis, inferences of dispersal events are minimized as they attract a 'cost'. Extinction also attracts a cost but vicariance does not. It was not explained why this approach was taken and it appears to be based on a confusion of the two different concepts of 'dispersal'. Ecological dispersal in the sense of ordinary movement should not attract any cost in any model; founder dispersal would attract no cost in a traditional dispersalist model, but, in a vicariance model of speciation or evolution, it is rejected a priori."

What Heads does here is reject a formal parsimony-based inference of ancestral ranges in favour of, to judge from the second half of the paper, an informal, intuitive, pencil-on-a-map deduction process. What does he not like about Dispersal-Vicariance Analysis (DIVA)? Apparently primarily that dispersal events have a parsimony cost. It may be that he did not contemplate how such an analysis would work, or whether it could work at all, if the only process with a cost were extinction. Dispersal would then of course be much too 'cheap', and every single ancestral species would always be inferred to have occupied the union of the ranges of its two descendants.

The great irony here is that even with a dispersal cost DIVA is well known for mercilessly (and implausibly) favouring vicariance as a process. I ran that analysis on two or three data sets a few years ago, and unless one restricts the maximum range size of ancestral species to something biologically plausible one pretty much always ends up with the vicariance biogeographers' preferred conclusion: the ancestor of the study group was already everywhere where any of its descendants occur today.

The second part of the paper is taken up by a large number of case studies, taxa which have sometimes been suggested to have undergone LDD but for which Heads presents a vicariance explanation instead. Some of these I find more plausible than others, but I do not want to go into each of them in detail. Instead, it seems more efficient to discuss what I see as three problems running through the entire argumentation:

First, there seems to be a lot of ad-hoccery going on. Where necessary to arrive at the conclusion of vicariance, for example to explain the overlapping distributions of African Arctotideae, 'normal ecological' range expansion is invoked as common and easy. But where necessary to arrive at the conclusion of vicariance, for example when distantly related subclades of a taxon occur right next to each other in Tasmania or New Zealand (suggesting relatively recent LDD from elsewhere), they are assumed to have been sitting in these narrow localities for tens of millions of years, apparently unable to move at all, so that a very ancient vicariance event can have taken place between their present ranges. Is that not rather convenient?

Which brings me to the second point. The text presenting the case studies certainly uses words like "may" and "might" a lot. To be honest, I sometimes found myself reminded of Erich von Däniken, whose style was to the effect of "the traditional explanation is that the pyramids were built by the ancient Egyptians - but could it not have been extra-terrestrials?" Yes, in each of these cases vicariance (or extra-terrestrials) could be the explanation. But mere possibility is a low hurdle to clear; the real question is, is that the most plausible explanation?

Third, as always with vicariance- or panbiogeography the problem is that dispersal is still required. Somehow this taxon here must have reached this volcanic island, somehow that taxon there must have spread all over the world. How does the vicariance biogeographer arrive at contemporary ranges without invoking jumps across oceans? Partly by hiding the dispersal away before the start of the analysis. To quote the present paper, "assuming a worldwide ancestor..." Well, if we can just assume that at our leisure it becomes easy to conclude few dispersal events, long distance or otherwise.

Now quite apart from the question whether a single species occurring worldwide is biologically realistic for all groups of organisms (I'd say it isn't), the problem remains that we have a lot of nested groups that would all have to have been ancestrally cosmopolitan, requiring several global range expansions in between. The daisy family is an excellent example. With reference to them, Heads writes that "through the history of the family as a whole, only a small number of widespread ancestors may have existed (groups such as Senecioneae and Astereae each require their own global ancestor)." I think that is a wee bit of an underestimate.

To walk through just one example in order of containing taxon to subordinate taxon: The Asteraceae family is cosmopolitan. The Asteroideae subfamily is cosmopolitan. The Astereae tribe is cosmopolitan. And the genus Conyza is cosmopolitan. If vicariance is the explanation for all speciation events we still need at least four consecutive cases of spreading across all continents. The same applies to a large number of the other tribes in the family: yes, that includes the aforementioned Senecioneae, but also Gnaphalieae, Anthemideae, Heliantheae, Cichorieae, Cardueae, Inuleae, and Vernonieae. And several of these include genera occurring across several continents or even (as with Senecio) all of them except Antarctica.

There is certainly a lot of dispersal required to explain that even in a vicariance approach, and unless we assume that most speciation in these groups took place before the breakup of Pangaea 175 million years ago (meaning the early dinosaurs would have known many of the same daisies as we do now, tens of millions of years before the oldest estimates for the origin of the daisy family) we will have to assume that some of that dispersal was long-distance.

Why not simply accept that organisms can sometimes, rarely but often enough to matter, cross an ocean and establish on the other side, followed by speciation? What is so extraordinary about that conclusion, really? What is so different about it compared to being separated by vicariance, followed by speciation? I am still puzzled.

Review of the Aachen Memorandum

2019-02-20T21:59:00.000+11:00

I picked this book up at a book fair after having read that it was a satire on bureaucracy and 'political correctness'. Although I am not the kind of person who believes that not being able to use sexist and racist insults is the end of the world, and am thus unlikely to agree with the author politically, I nonetheless thought I might find this kind of book interesting. I can, for example, read the original Conan novels through to the end without believing myself, as their author did, that all civilisation is corrupt and deserves to be destroyed.

Unfortunately, Robert E. Howard was a master of wit and subtlety compared to Andrew Roberts, and I only made it halfway through the Aachen Memorandum before giving up. Roberts took everything he dislikes - immigration, high taxes on the rich, animal protection, weed, speed limits, feminism, anti-racism, grade inflation, concern for healthy nutrition, and so much more - stuffed it all into one pot, and then scrawled 'Europe' onto it.

The results are, unfortunately, not even intellectually coherent. The book has all European nations dissolved into a Euro-superstate, but somehow France is still able to buy the Channel Islands off England. The dominant culture is depicted as a caricature of feminist prudery, while the protagonist is constantly lecherous and voyeuristic, but he also complains that advertisements are all using sex to sell products. Europe is a total dictatorship with complete surveillance of communications, no free press, and continental armies stationed in England to forcefully squash nationalist protests, but (what follows is the only minor spoiler here) somehow the entire edifice collapses the moment somebody finds evidence that a referendum a generation ago was manipulated. The ruling ideology is clearly supposed to be left-wing and cosmopolitan, but at the same time Adolf Hitler is venerated in the schools.

How does any of that even start to make sense? It seems as if the author believed that everybody who is not part of his own political sect is interchangeable and in cahoots with each other.

Underneath the visceral hatred of everybody outside of Britain oozing from the pages it is just about possible to see the outline of a potentially amusing thriller, but the problem is that I cannot maintain willing suspension of disbelief. Yes, the reader will soon understand that the author despises the European Union in general and Germany and Polish taxi drivers in particular, so well done communicating that, but novels also need an at least somewhat plausible and logically coherent setting, otherwise they don't work. And that is before even mentioning how blatant a wish-fulfillment self-insert the protagonist is.

I assume there was, and still is, a very particular audience for this book in one particular country, but at least in my eyes everybody else would be better served by doing something more entertaining than reading it, such as watching paint dry or counting how many grains there are in one kg of sugar.

TreeBASE and Dryad

2018-06-29T22:13:00.001+10:00

It is now generally expected that scientists, unless working on commercial or otherwise confidential projects, make the data underlying their scientific publications freely and publicly available, so that the studies can be replicated if necessary and so that others can use the data for further research.

Sometimes the data are submitted as supplementary material to be published on the journal website, together with the article itself. Some research organisations have their own data repositories. In many cases, however, specialised databases are used. GenBank, for example, is a repository of DNA sequence data. Further down the analysis pipeline, I have in the past used TreeBASE to make available sequence alignment matrices and phylogenetic trees, and in one case I have reanalysed other people's data after obtaining them from there.

Recently I had reason to submit another such set of data matrices and phylogenetic trees to a database, and I thought I would go back to TreeBASE. Somehow it did not work out as well as it did a few years ago.

I was able to log in, I created a new submission, I submitted my files, and I described our analysis. The latter process is rather clunky, but okay, it works. Then it turned out that we needed to redo one of the phylogenetic analyses minus one sequence, so I had to delete one of the matrices and one of the trees and replace them with updated versions. That is when the fun started.

Although googling around a bit suggests that other people can do so, I find it impossible to delete anything in TreeBASE. There is no delete button next to anything except co-authors and submissions (i.e. the entire studies). Being unable to change data in a submission, I decided to delete the entire submission and start from scratch. That is surely not how it is meant to work, and it is a lot of extra effort, but what can I do?

As it turns out, not even that. When I ask that a submission be deleted, the web interface thinks for a bit and then throws a Java error at me. I now have three submissions under identical names and cannot delete the first two. Hurray.

At some point I thought I could maybe try out the alternative data repository Dryad. Perhaps that would work more reliably? At least I have seen it used in several publications lately. I have now twice submitted my eMail address on their 'sign up for a new account' form, been told twice that a confirmation eMail has been sent, and days later neither I nor my spam folder have received any such message.

Perhaps the journal will accept our manuscript without us having the matrix and trees in a public repository? This process is becoming somewhat off-putting.

Update: After a mere four days I have now finally been sent a confirmation link by Dryad. Will see how that repository works.

A particularly striking example of how paraphyletic taxa confuse our thinking about evolution

2018-06-09T18:07:00.001+10:00

I recently reread Jason Rosenhouse's Among the Creationists and came across the following extended quote from Stephen Jay Gould, a widely admired and famous evolutionary biologist.

If mammals had arisen late and helped to drive dinosaurs to their doom, then we could legitimately propose a scenario of expected progress. But dinosaurs remained dominant and probably became extinct only as a quirky result of the most unpredictable of all events - a mass dying triggered by extraterrestrial impact. If dinosaurs had not died in this event, they would probably still dominate the domain of large-bodied vertebrates, as they had for so long with such conspicuous success, and mammals would still be small creatures in the interstices of their world. [...] Since dinosaurs were not moving toward markedly larger brains, and since such a prospect may lie outside the capabilities of reptilian design, we must assume that consciousness would not have evolved on our planet if a cosmic catastrophe had not claimed the dinosaurs as victims. (Gould 1989, 318)

The context is the controversy around convergence and contingency in evolution. Rosenhouse discusses convergence as one of the hopes of Christians trying to reconcile evolution and Christian teachings, citing various proponents of the idea that their god set up the universe in a way that human-like intelligence was guaranteed to arise, thus producing beings that can have a "relationship" with said god.

Convergence is, of course, not only an observation considered helpful by the proponents of one variant of theistic evolution. To what degree the organisms that evolved on our planet would again turn out to be kind of similar if we replayed the tape or if organisms on other planets can be expected to look very similar to those on ours are very interesting questions of broad interest. Even an atheist may ask if we can expect lots of other planets where life arose to produce land plants, something a bit like insects, and perhaps even sentient beings given enough time, or if the vast majority of them will, for example, remain populated only by bacteria, because even evolving as much as multicellularity was a rare fluke.

Rosenhouse cites Gould as a well-known proponent of the importance of contingency. Although I tend much more towards the opposite view, I understand Gould's position. I believe the strongest argument for the contingency side is that while there are many impressive cases of convergence there are also quite a few crucial events in the history of life on this planet that appear to have happened only once: complex eukaryotic cells; colonisation of dry land by multicellular plants; vertebrates; and of course human-like intelligence.

If, for example, the independent evolution of wings by insects, pterosaurs, birds and bats is counted as evidence for the importance of convergence, should something happening only once not be counted as evidence for the importance of contingency? My response would be competition, or in other words the change in the adaptive landscape caused by the first organisms to settle on a new peak. Where there may have been a ridge connecting the niches "kelp" and "large land-living plant" when nobody had occupied the latter, the first lineage to do so quickly became so good at being large land-living plants that the ridge crumbled away and became a canyon. If all land plants were wiped out, however, I would expect the land to be colonised anew, this time perhaps by red or brown algae.

But that is not actually about the main argument Gould is quoted as making in the above excerpt, and not what I found interesting about the quote. To take it in smaller pieces:

If mammals had arisen late and helped to drive dinosaurs to their doom, then we could legitimately propose a scenario of expected progress.

"Expected progress" is a bit of an odd term here. I am not sure if that is what is meant, but it could be read as if any group of animals that does not evolve towards large brains and intelligence is a refutation of the possibility that one group on each planet might evolve towards larger brains. But I do not think that this works as a refutation. And few proponents of the importance of convergence would argue that it is all about one linear progression towards large brains anyway. There are also progressions, for example towards body shapes that work well for swimming, towards parental care for the young, towards powered flight, etc., and all of these happen at the same time but only in those lineages for which they solve relevant problems or create new opportunities.

If I understand the argument correctly, it is like pointing at a hole in the ground and saying, "if I now throw a pebble into the air and it does not end up in this specific hole, gravity is refuted", whereas the argument for convergence is that, what with evolution throwing thousands of pebbles into the air every year, we are very likely to find a few of them at the bottom of this hole as opposed to half way up its wall.

But dinosaurs remained dominant and probably became extinct only as a quirky result of the most unpredictable of all events - a mass dying triggered by extraterrestrial impact. If dinosaurs had not died in this event, they would probably still dominate the domain of large-bodied vertebrates, as they had for so long with such conspicuous success, and mammals would still be small creatures in the interstices of their world.

Although this is not my field, and I understand that it is an active area of research, I believe it can already be said with some confidence that mass extinction is not random. There are generally some reasons for why an extinction event claims this lineage here but leaves that other one over there largely intact. If a mass extinction of marine life is caused, for example, by a massive drop in the oxygen content of the oceans, then we would expect lineages that can survive under low oxygen conditions to come out in relatively good shape, all things considered, while those with a high oxygen need would be hammered.

In the present case, if we hypothesise that the impact of a large meteorite would have caused massive shockwaves followed by a few years of something like nuclear winter, we could expect the following: Species of small animals may find it easier to survive because they need less food per number of individuals. Bonus points if you have a burrow to hide in when the devastation sweeps across your area (small mammals) or if you can move easily to other areas where a bit more food is left (flight-capable birds). Large animals that can go with little food for long times may also have a good chance, in other words being cold-blooded may help to survive several bad years (crocodiles). If, however, you are large and (!) at the same time you have a high rate of metabolism then you might be in trouble, as you constantly need lots of food per number of individuals. As far as I understand, that describes the non-avian dinosaurs: large and warm-blooded.

The point is, catastrophes do happen from time to time, and once one happened it would probably have decimated the largest animals, even if it had come ten million years later than it did. Their niches are filled up again by small animals evolving to be large (another good example of convergence). What killed off the pterosaur lineage, for example, may well have been that the birds had already out-competed all small pterosaurs, leaving only the very large species when the meteorite struck. But again, this is not my area of expertise really.

Since dinosaurs were not moving toward markedly larger brains, and since such a prospect may lie outside the capabilities of reptilian design, we must assume that consciousness would not have evolved on our planet if a cosmic catastrophe had not claimed the dinosaurs as victims.

And this last part is really what I find the most interesting, because it illustrates so nicely how paraphyletic taxa can confuse the thinking even of the smartest of us, even of experts in evolutionary biology. What is the problem with the argument here?

First, and most obviously, birds are dinosaurs. Second, corvids (crows and ravens) and parrots are highly intelligent. Not quite human-level intelligence, but in some experiments corvids have proved to be smarter even than chimpanzees, our closest relatives. It follows that dinosaurs have actually "moved toward markedly larger brains", meaning here relative to the size of the body as a whole and, crucially, in terms of actual intelligence. Gould's premise is simply false, but his mistake is understandable, because at fault is really a misleading, i.e. non-phylogenetic, classification.

"Outside the capabilities of reptilian design" is, by the way, the same mistake at a deeper phylogenetic level. Mammals were not created fully formed, as mammals. Some of our ancestors were "reptiles", and here we are, having human-like intelligence by definition, what with us being humans and all that, so apparently there was a way of evolving human-like intelligence from a reptilian starting point. And from a fish starting point, and from a worm starting point, and from a bacterial starting point. All it took was lots of time and open niches waiting to be filled.

But I am not saying that anything here decisively refutes the idea that our sentience is a very rare fluke, unlikely to happen again should we go extinct. Maybe it is. The point is really how corrosive paraphyletic taxa are to reasoning about evolutionary processes.

Reference

Gould SJ, 1989. Wonderful Life: The Burgess Shale and the Nature of History. W.W. Norton, New York.

Manuscript submission then and now

2018-06-06T21:50:00.000+10:00

When I started in science, back in the dark ages, submitting a manuscript to a journal was still quite simple, if perhaps a bit inefficient:

Print the manuscript in triplicate.
Write a cover letter and print it.
Put everything into an envelope and send it off to the editor.

And that was that.

The first innovation was that you only had to send the manuscript to the editor as an eMail attachment, which was actually faster and saved a lot of paper. Unfortunately, however, things have changed again since then.

This is how it works today:

Log into the editorial management software of the journal of my choice. If I do not have an account with that journal yet, create one first.
Go to the author interface, click new submission.
Select the type of article.
Paste the title and abstract into an online form.
Select key words or topics that supposedly help the journal to assign editors and/or reviewers. Click 'save and continue'.
Upload main manuscript file, generally as an MS Word document.
Upload all the figures as separate files, generally as TIF or EPS, although JPGs may be acceptable at the review stage. Paste figure legends and write 'link texts' into the form fields.
Upload all the supplementary data files. If necessary, update the order of the files. Click 'save and continue'.
Next, the authorship page. As the corresponding author with an account at that journal I am already in, but I may be asked to link my ORCID. (I have no idea if anybody actually uses it for anything - I only ever look people up with their ResearcherID or Google Scholar.)
Search for my co-authors by name or eMail. I find the second co-author, great. The first and third co-authors aren't in the system, so I create entries for them. Click 'save and continue'.
Error: No telephone number provided for second co-author. But he was in the system, so you accepted him before! Also, will any editor really ever want to use it? Argh. Let's look up his number. Okay, edited. Click 'save and continue'.
Suggesting an editor for the manuscript. Oh dear, that's a long list. Hm. I know this guy hates one of the methods we used, he is out. This one is highly qualified but he will probably require us to add this other analysis that he likes. Ah well, worse things could happen. This one is also very qualified, but she works at a university I have a connection to - is that already a conflict of interest? Well, they can always choose somebody else, done.
Okay, suggesting peer reviewers and providing their contact information. This guy is an obvious choice as he is the expert for one of the analyses we used, but darn, he is currently between institutions. Let's google his name. No, that's outdated. This one too. Ah, I'm lucky: he has an updated CV on this third page I found, complete with the new phone number and eMail address. Okay, now for reviewer suggestion number two. She is another obvious choice as one of the world experts on our study group. Easy to find her information on a staff page, so that's good. Who else? Maybe two more experts on the study group? Ah yes, she would be interested in this, and I have her contact details. And then this other guy from Europe. Google. Darn, nothing, despite the unique name. Perhaps there is contact info on recent papers. No, he is too senior, the corresponding authors are always others. Ah, wait, here? No, an eMail address from 2012 going "director@institution.org" sounds fishy, most likely somebody else is director now. More Google. Ah, finally, was able to click myself through to a staff website, well hidden and not in English. Ye gods. Four qualified reviewers, that should be enough to get going. Click 'save and continue'.
Long, complicated page with miscellaneous information and declarations. First, write or upload cover letter. Done.
Next, declare that we have not submitted this manuscript elsewhere. Okay.
Is this a resubmission? No.
Declare that we have followed protocol so-and-so on ethical collection practices. Yes.
Declare that we have added a section on data availability. Wait, was that in the instructions to authors? Don't remember that. Argh. Save. Open manuscript file. Add data availability section. Back to file upload. Delete manuscript file. Re-upload manuscript file. Reorder files. Click 'save and continue'. Back to declaration. Yes, we now have a section on data availability.
Declare no conflicts of interest. Okay. Click 'save and continue'.
Large summary page. Check everything I entered so far. Down at the bottom: have to check PDF proofs before being allowed to submit. Click button, wait while the editorial manager bundles everything into a PDF.
Open PDF. One of the EPS figures does not display. Argh. Argh. Argh. Back to file upload. Delete offending figure. Re-upload figure - as a TIF this time, that should be foolproof. Reorder files. Click 'save and continue'. Back to summary page.
Re-check everything I entered so far. Click button, wait while the editorial manager generates a new PDF. Looks good this time.
The big moment has arrived: click here to submit. "Are you certain? This will submit your manuscript." Yes!

Yay, progress?

What are monotypic genera good for?

2018-05-12T22:16:00.000+10:00

There are a lot of monotypic genera around. In the group I am currently working on the most, the daisy family Asteraceae in Australia, there are an awful lot of monotypic genera indeed. Why do we need so many of them?

I would argue that there are two different scenarios to be considered. First, however, we need to keep in mind that:

We should classify organisms by their degree of relatedness, meaning that supraspecific taxa (including genera) should be monophyletic, and
while this previous rule tells us how we should group, it does not tell us how we should rank. There is no genusness to be discovered in nature. Whether it is here in the phylogeny where we call a clade a genus or four nodes deeper down the tree is ultimately an arbitrary human decision.

This may at first suggest that there is no good argument to be had against monotypic genera either. If ranking is arbitrary then a classification consisting entirely of monotypic genera - each species in the tree of life gets its own genus - is just as valid as the current one, so why not?

It is true that this is one of many possible ranking solutions compatible with phylogenetic systematics, but to decide between those many possible ranking solutions we can bring other criteria to bear. And here I would argue that it would be useful to minimise the number of monotypic genera as far as possible. Why? Because I would consider the genus level 'wasted' in many of those cases.

The entire point of a classification is that each taxon provides a piece of information. That information is: The members of this taxon are more closely related to each other than they are to non-members of this taxon. If we have a species, the species-taxon provides this information for all the members of that species. If we now have that species classified in a monotypic genus, the genus-taxon provides... the exact same information over again. It doesn't add anything. It is wasted.

Consequently, I believe that the proper use of monotypic genera is for when they are actually required for phylogenetic classification, but that there is a good argument for sinking them into larger genera whenever things could be made monophyletic without them. Two examples may illustrate the argument.

The above presents a case where the monotypic genus in red is actually needed. There are two genera marked in blue and green, and so obviously the phylogenetically isolated lineage in red cannot be lumped into either of them without making them paraphyletic. It is 'left over' and needs its own genus.

A perfect example of this is the ginkgo tree, Ginkgo biloba, which is a phylogenetically isolated living fossil. It is here photographed as an alley tree in front of our apartment block in Zürich, back when I was a postdoc there.

In the above phylogeny, however, the monotypic genus in red is sister to another genus in blue, and that latter genus isn't very large either. Now I can understand why it might perhaps be desirable to recognise the two as different genera if their divergence happened many tens of millions of years ago and they are morphologically quite distinct. Unfortunately, however, the world is full of monotypic genera that are very young and look exactly like the slightly larger sister genus, but differ from it in a single morphological character.

In those cases, do we really need that kind of taxonomic inflation? What then is the use of the genus rank?

The species that occasioned these ruminations in me is the above Tasmanian daisy tree Centropappus brunonis, which is clearly just a Bedfordia without hairs on the leaves; otherwise the two genera are pretty much indistinguishable. And Bedfordia itself has a mere three species, so it is not as if it would get unmanageably large if they were united.

There are many, many similar cases.
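The two scenarios can also be checked mechanically. The following is only a minimal sketch with made-up toy trees (the tip names red, b1, b2, g1, g2 are hypothetical stand-ins for the coloured lineages in the figures), using is.monophyletic() from ape to ask whether lumping the monotypic genus into the blue genus would give a monophyletic group:

```r
library(ape)

# Scenario 1: the red lineage is sister to everything else (toy topology)
isolated <- read.tree(text = "(red,((b1,b2),(g1,g2)));")
# Scenario 2: the red lineage is sister to the small blue genus (toy topology)
sister <- read.tree(text = "((red,(b1,b2)),(g1,g2));")

# Would a genus consisting of red plus the blue species be monophyletic?
lump_isolated <- is.monophyletic(isolated, c("red", "b1", "b2"))
lump_sister   <- is.monophyletic(sister, c("red", "b1", "b2"))

print(c(isolated = lump_isolated, sister = lump_sister))
```

In the first case the lumped group is not monophyletic, so the monotypic genus is needed; in the second it is, so the separate genus adds no grouping information.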

Time-calibrated or at least ultrametric trees with the R package ape: an overview

2018-04-20T23:32:00.000+10:00

I had reason today to look into time-calibrating phylogenetic trees again, specifically trees that are so large that Bayesian approaches are not computationally feasible. It turns out that there are more options in the R package APE than I had previously been aware of - but unfortunately they are not all equally useful in everyday phylogenetics.

In all cases we first need a phylogram that we want to time-calibrate or at least make ultrametric to use in downstream analyses that require ultrametricity. As we assume that our phylogeny is very large it may for example have been inferred by RAxML, and the branch lengths then represent the expected number of character changes per site along each branch. For present purposes I have used a smaller tree (actually a clade cut out of a larger tree I had floating around), so that I could do the calibrations quickly and so that the figures of this post look nice. My example phylogram has this shape:

We fire up R, load the ape package, and import our phylogeny with read.tree() or read.nexus(), depending on whether it is in Newick or Nexus format, e.g.

mytree <- read.tree("treefilename.tre")
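Before going further it may be worth checking what kind of tree was actually imported. This is only a small sketch; the Newick string is a made-up toy tree standing in for the real file, and the three checks are standard ape functions:

```r
library(ape)

# Toy Newick string standing in for the real tree file (hypothetical data)
mytree <- read.tree(text = "((A:0.1,B:0.3):0.2,(C:0.05,D:0.4):0.1);")

rooted      <- is.rooted(mytree)       # dating methods need a rooted tree
binary      <- is.binary(mytree)       # some methods require a fully resolved tree
ultrametric <- is.ultrametric(mytree)  # a phylogram normally is not ultrametric

print(c(rooted = rooted, binary = binary, ultrametric = ultrametric))
```

If the tree already reports as ultrametric there is nothing left to do for analyses that only need ultrametricity.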

Now to the various methods.

Penalised Likelihood

I have previously done a longer, dedicated post on this method. I did not, however, go into the various models and options then, so let's cover the basics here.

Penalised Likelihood (PL) is, I think, the most sophisticated approach available in APE, allowing the comparison of likelihood scores between different models. It is also the most flexible. It is possible to set multiple calibration points, as discussed in the linked earlier post, but here we simply set the root age to 50 million years:

mycalibration <- makeChronosCalib(mytree, node="root", age.min=50, age.max=50)

We have three different clock models at our disposal, correlated, discrete, and relaxed. Correlated means that adjacent parts of the phylogeny are not allowed to evolve at rates that are very different. Discrete models different parts of the tree as evolving at different rates. As I understand it, relaxed allows the rates to vary most freely. Another important factor that can be adjusted is the smoothing parameter lambda; I usually run all three clock models at lambdas of 1 and 10 and pick the one with the best likelihood score. For present purposes I will restrict myself to lambda = 1.

Let's start with correlated:

mytimetree <- chronos(mytree, lambda = 1, model = "correlated", calibration = mycalibration, control = chronos.control() )

When plotted, the chronogram looks as follows.

Next, discrete. The command is the same as above except for the text in the model parameter. The branch length distribution and likelihood score turned out to be very close to those for the correlated model:

Finally, relaxed. Very different branch length distribution and a by far worse likelihood score compared to the other two:

Today was the first time I considered testing a strict clock model with chronos. It turns out that you get it by running it as a special case of the discrete model, which by default is set to assume ten rate categories. You simply set the number of categories to one:

mytimetree <- chronos(mytree, lambda = 1, model = "discrete", calibration = mycalibration, control = chronos.control(nb.rate.cat=1) )

In my example case this looks rather similar to the results from the correlated model and from discrete with ten categories:

The problem with PL is that it seems to be a bit touchy. Even today we had several cases of an inexplicable error message, and several cases of the analysis being unable to find a reasonable starting solution. We finally found that it helped to vastly increase the root age (we had played around with 15, assuming that it doesn't matter, and it worked when we set it to a more realistic three-digit number). It is possible that our true problem was short terminal branches.

PL is also the slowest of the methods presented here. I would use it for trees that are too large for Bayesian time calibration but where I need an actual chronogram with a meaningful time axis and want to do model comparison. If I just want an ultrametric tree the following three methods would be faster and simpler alternatives. That being said, so far I had no use case for them.
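The routine of running all three clock models at lambdas of 1 and 10 can be wrapped into a loop. What follows is a rough sketch rather than a recommended workflow: the random tree from rtree() is merely a stand-in for a real phylogram, the data frame runs is my own bookkeeping, and I am assuming that the penalised log-likelihood is stored in the "ploglik" attribute of the object returned by chronos(). Because PL can be touchy, each run is wrapped in tryCatch() so that a failed run simply leaves an NA.

```r
library(ape)

set.seed(1)
mytree <- rtree(20)  # stand-in for the real phylogram (random toy tree)
# Fix the root at 50 (age.max defaults to age.min when not given)
mycalibration <- makeChronosCalib(mytree, node = "root", age.min = 50)

# All combinations of clock model and smoothing parameter
runs <- expand.grid(model = c("correlated", "discrete", "relaxed"),
                    lambda = c(1, 10), stringsAsFactors = FALSE)
runs$ploglik <- NA_real_

for (i in seq_len(nrow(runs))) {
  fit <- tryCatch(
    chronos(mytree, lambda = runs$lambda[i], model = runs$model[i],
            calibration = mycalibration, quiet = TRUE),
    error = function(e) NULL)  # a failed run keeps its NA
  if (!is.null(fit)) runs$ploglik[i] <- attr(fit, "ploglik")
}

# Highest penalised log-likelihood first, failed runs last
print(runs[order(-runs$ploglik), ])
```

On a tree of realistic size each of the six runs can take a while, so it may be sensible to save every fitted chronogram instead of only its score.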

A superseded but fast alternative: chronopl()

This really came as a surprise as I believed that the function chronopl() had been removed from ape. I thought I had tried to find it in vain a few years ago, but I saw it in the ape documentation today (albeit with the comment "the new function chronos replaces the present one which is no more maintained") and was then able to use it in my current R installation. I must have confused it with a different function.

chronopl() does not provide a likelihood score as far as I can see, but it seems to be very fast. I quickly ran it with default parameters and lambda = 1, again setting root age to 50:

mytimetree <- chronopl(mytree, lambda = 1, age.min = 50, age.max = NULL, node = "root")

The result looks very similar to what chronos() produced with the (low likelihood) relaxed model:

Various parameters can be changed, but as implied above, if I want to do careful model comparison I would use chronos() anyway.

Mean Path Lengths

The chronoMPL() method time-calibrates the phylogeny with what is called a mean path lengths method. The documentation makes clear that multiple calibration points cannot be used; the idea is to make an ultrametric tree, pick one lineage split for which one has a credible date, and then scale the whole tree so that the split has the right age. Command is simply:

mytimetree <- chronoMPL(mytree)

The problem is, the resulting chronogram often looks like this:

Most of the branch length distribution fits the results for the favoured model in the analysis with chronos(), see above. That's actually great, because chronoMPL() is so much faster! But you will notice some wonky lines in particular in the top right and bottom right corners of this tree graph. Those are negative branch lengths. Did somebody throw the ancestral species into a time machine and set them free a bit before they actually evolved?

Some googling suggests that this happens if the phylogram is very unclocklike, which, unfortunately, is often the case in real life. That limits rather sharply what mean path lengths can be used for.
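Both the check for negative branch lengths and the subsequent scaling step are easy to do by hand. This is a minimal sketch on a random toy tree; the target root age of 50 is an arbitrary assumption for illustration, and I scale by the root here although any split with a credible date could be used in the same way.

```r
library(ape)

set.seed(42)
mytree <- rtree(30)            # random toy tree, deliberately not clock-like
mpltree <- chronoMPL(mytree)

# Count the branches that ended up with negative length
n_negative <- sum(mpltree$edge.length < 0)
cat("negative branches:", n_negative, "\n")

# Scale the whole tree so that the root gets the desired age
target_root_age  <- 50                                   # assumed calibration
current_root_age <- unname(branching.times(mpltree)[1])  # first entry is the root
scaled <- mpltree
scaled$edge.length <- mpltree$edge.length * target_root_age / current_root_age

new_root_age <- unname(branching.times(scaled)[1])
cat("root age after scaling:", new_root_age, "\n")
```

Scaling does not repair negative branches, of course; if n_negative is larger than zero the tree is better discarded in favour of one of the other methods.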

The compute.brtime() function

Another function that I have now tried out is compute.brtime(). It can do two rather different things.

The first is to transform a tree according to what I understand has to be a full set of branching times for all splits in the tree. The use case for that seems to be if you have a tree figure and a table of divergence times in a published paper and want to copy that chronogram for a follow-up analysis, but the authors cannot or won't send it to you. So you manually type out the tree, manually type out a vector of divergence times (knowing which node number is which in the R phylo format!), and then you use this function to get the right branch length distribution. May happen, but presumably not a daily occurrence. What we usually have is a tree for which we want the analysis to infer biologically realistic divergence times that we don't know yet.
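For illustration, this first use might look as follows. The four-taxon topology and the three node ages are invented for the example; the one thing to get right is the order of the ages, which has to follow the node numbering of the phylo object (the root comes first, numbered as the number of tips plus one).

```r
library(ape)

# Hypothetical published topology typed in by hand, without branch lengths
tr <- read.tree(text = "((A,B),(C,D));")

# Node numbering: tips are 1 to 4, the root is 5, (A,B) is 6, (C,D) is 7
node_ages <- c(10, 4, 6)  # invented divergence times for nodes 5, 6, 7

chrono <- compute.brtime(tr, method = node_ages)

print(branching.times(chrono))
print(is.ultrametric(chrono))
```

Plotting the tree with nodelabels() beforehand is a simple way to see which node number belongs to which split.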

The second thing the function can do is to infer an ultrametric tree without any calibration points at all but under the coalescent model. The command is then as follows.

mytimetree <- compute.brtime(mytree, method="coalescent", force.positive=TRUE)

It seems that the problem of ending up with negative branch lengths was, in this case, recognised and solved simply by giving the user the option to tell the function PLEASE DON'T. I assume they are collapsed to zero length (?). My result looked like this:

Note that this is more along the lines of "one possible solution under the coalescent model" instead of "the optimal solution under this here clock model", so that every run will produce a slightly different ultrametric tree. I ran it a few times, and one aspect that did not change was the clustering of nearly all splits close to the present, which I (and PL, see above) would consider biologically unrealistic. Still, we have an ultrametric tree in case we need one in a hurry.
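That run-to-run variation is easy to demonstrate. A small sketch, again with a random toy tree as a stand-in for real data:

```r
library(ape)

set.seed(7)
mytree <- rtree(10)  # random toy tree standing in for the real phylogram

# Two independent runs on the same tree
run1 <- compute.brtime(mytree, method = "coalescent", force.positive = TRUE)
run2 <- compute.brtime(mytree, method = "coalescent", force.positive = TRUE)

both_ultrametric <- is.ultrametric(run1) && is.ultrametric(run2)
none_negative    <- all(run1$edge.length >= 0) && all(run2$edge.length >= 0)
runs_differ      <- !isTRUE(all.equal(run1$edge.length, run2$edge.length))

print(c(ultrametric = both_ultrametric, non_negative = none_negative,
        differ = runs_differ))
```

If a reproducible tree is needed for a downstream analysis, calling set.seed() immediately before the run fixes the result.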

It is well possible that I have still missed other options in APE, but these are the ones I have tried out so far.

Something completely different: non-ultrametric chronograms

Finally, I should mention that there are methods to produce very different time-calibrated trees in palaeontology. The chronograms discussed in this post are all inferred under the assumption that we are dealing with extant lineages, so all branches on the chronogram end flush in the present, and consequently a chronogram is an ultrametric tree. And usually the data that went into inferring the topology was DNA sequence data or similar.

Palaeontologists, however, deal with chronograms where many or all branches end in the past because a lineage went extinct, making their chronograms non-ultrametric and look like phylograms. And usually the data that went into inferring the tree topology was morphological. This is a whole different world for me, and I can only refer to posts like this one and this one which discuss an R package called paleotree.

There also seems to be a newer addition to APE called node.dating, which is introduced with the following justification:

Our software, node.dating, uses a maximum likelihood approach to perform divergence-time analysis. node.dating is written in R v3.30 and is a recent addition to the R package ape v4.0 (Paradis et al., 2004). Previously, ape had the capability to estimate the dates of internal nodes via the chronos function; however, chronos requires ultrametric trees and is thus unable to incorporate information from tips that are sampled at different points in time.

This suggests that the point is the same, to allow chronograms with extinct lineages, but in this case aimed more at molecular data. Their example case is virus sequence data.

Monophyletic species, kind of

2018-04-13T18:35:00.001+10:00

A paper by bryologist Brent Mishler and philosopher of biology John Wilkins has just come out, with the title The Hunting of the SNaRC: A Snarky Solution to the Species Problem. It is open access in the journal Philosophy, Theory, and Practice in Biology, so anybody with internet access can check it out.

Many bloggers have issues that they return to again and again even if they are not necessarily the nominal topics of their blogs - for example, Jerry Coyne frequently posts about Free Will and about students trying to shut down talks by speakers they don't like, and Larry Moran regularly takes apart papers claiming that junk DNA has been disproved. This much less widely known blogger can reliably be coaxed out from behind the oven by at least two such recurring issues: bad arguments for the acceptance of paraphyletic taxa, and the in my eyes incoherent concept of "monophyletic species".

As the title indicates, Mishler & Wilkins present a solution for the species problem, i.e. the perennial question in biology of what 'a species' even is. Especially as the paper is freely accessible it would serve no purpose to summarise its introduction, so I will move immediately to what I find most interesting: their views on how to view species and some pointers on how to do classification at the lowest levels in practice.

Note that I say "their views", plural, deliberately, because this is one aspect of the paper that I have not quite understood yet:

Wilkins has argued in the past that the popular approach of developing a theoretical species concept and then applying it to a potentially recalcitrant reality is a dead end. What biologists should do is the opposite, i.e. consider species as empirical phenomena in need of individual explanations. And here in this paper, Wilkins' argument is reiterated concisely in section 3, A Way Forward: Species Are at Least Initially Phenomena.

What I like about this flip in perspective is that it allows much more flexibility; obviously the empirical phenomena that we generally identify as species, be it popularly or as biologists - generally gaps in morphological or genetic variation - need a different scientific explanation for example in asexual than in sexual species, making one-size-fits-all species concepts difficult to apply.

Mishler, in turn, has argued in the past that species are not a special biological category different from e.g. monophyletic genera and families. The species category is arbitrary, and we should just classify all organisms into nested monophyletic groups, AKA clades, all the way down to the individual specimens. And here in this paper, Mishler's argument is reiterated in sections 4, Rankless Taxonomy, 5, Capturing the SNaRC, and 6, Using SNaRCs in Systematic, Evolutionary, and Ecological Studies.

The thing is, while there is perhaps technically no direct contradiction between those two arguments to the degree that there is a contradiction between "all taxa should be monophyletic" and "taxa should be allowed to be paraphyletic", they appear to be two rather different prescriptions. If I understand correctly, the first says,

We should treat species as empirical phenomena in need of explanation instead of indiscriminately applying a given theoretical concept to them.

The second says,

It makes no sense to even talk of species, we should stop doing so, and here is a single theoretical concept (everything is clades) that we should indiscriminately apply to all specimens.

In fact I am currently unable to see how sections 4-6 and the conclusions of this paper would have to change if section 3 were to be deleted in its entirety. What am I missing?

What I found most useful about this paper was that it has some thoughts on how to do classification into nested clades all the way down to the individual specimens in practice, because that was completely unclear to me in all past instances when this approach was suggested. There are some apparent problems with it, particularly that we need items forming a tree structure to even have clades. It is sometimes difficult to illustrate the issue, but it can perhaps be presented as follows:

The prescription is, as mentioned above, that a classification should be clades (= monophyletic groups) all the way down to individual specimens.
A clade is a complete branch in a tree structure, and usually understood to be specifically a complete branch of a species phylogeny.
In other words, the way the term clade is defined, it applies only in a tree-structure but is inapplicable in a net-like structure.
Sexually reproducing species are systems consisting of individual specimens that have net-like relationships with each other, because they share numerous ancestors instead of one ancestor in each sufficiently earlier generation.
It follows necessarily from the previous two points that the term clade cannot be applied to describe the relationship between specimens if what we are looking at includes multiple specimens from the same sexually reproducing species.
It follows then that it is logically impossible to classify into clades all the way down to these specimens, unless the meaning of the word clade is changed to a degree that the whole purpose of having that word is defeated.

To my understanding this is why Hennig spent so much time discussing the different ways that specimens (or snapshots of them, which he called semaphoronts) can be related to each other. The relationship between four (non-hybridogenic) species is tree-like, so they can, and should, be classified into clades. But relationships between individuals within a sexually reproducing species are net-like, so they cannot possibly be classified into clades, as the word does not even have a meaning in that structure.

The point at which approaches to classification change is approximately at the species level. Phylogenetic systematics applies only above it, and it uses species as the units that it groups into clades, because if it used any smaller units there would not be clades. This is also why in my opinion one cannot coherently reject the reality of species and be a phylogenetic systematist and, conversely, coherently accept the reality of species and promote paraphyletic taxa, because clades are species that have diversified. Many others, of course, disagree.

Now, what is the practical approach suggested by the present paper? It argues that the terminal units of classification should be "the finest-scale clades that can be convincingly demonstrated with current data", here called Smallest Named and Registered Clades (SNaRCs). Obviously such a 'clade' cannot be based on information from a single gene, as it may show a different history than other genes, for example because of introgression or incomplete lineage sorting. The solution is to use as evidence for monophyly "the preponderance of gene lineages making up a clade", or in other words "congruence among the majority of gene trees and other types of phylogenetic characters available".

On the plus side, this is a very empirical and testable prescription. But consider two thought experiments. First, take three samples A, B and C, look at, say, 100 gene trees, and if 51 of them show ((A,B),C) then A and B form a 'clade', even if all three of them are members of the same sexually reproducing species. Again, that is doable, empirical and testable, and we get a clear answer.

Nonetheless this approach does not convince me at the moment, nor will it even if we assume a scenario of 100 gene trees supporting (A,B), simply because no matter what the gene trees say, in reality there is no tree-structure inside the species. Yes, we can easily sequence for example the DNA of three siblings and run an analysis that will produce a phylogenetic tree for each gene, but in reality these three people just don't have a tree-relationship with each other, so it does not make sense to me to use terminology or a classification that implies there is one.

For the second thought experiment, take three samples D, E, and F, and if 33 gene trees say ((D,E),F), 33 say (D,(E,F)), and 34 say (E,(D,F)), we are inside a SNaRC and should not delimit any more narrowly, even if D is a specimen from an arid zone ephemeral, E from an alpine perennial, and F from a narrow endemic of the northwestern Blue Mountains that only occurs on ironstone-sandstone outcrops, and all three of them are geographically isolated from each other.

This hypothetical case has three very distinct entities that show a lot of gene tree discordance for the genes we used for our analysis. This is a much weaker problem than the previous one because Mishler & Wilkins argue that SNaRCs are, as all scientific hypotheses, tentative and await revision after the examination of more data. Maybe the next 100 gene trees will clinch it for (D,(E,F)), and then at least we could separate out D; more realistically, sampling more individuals of all three species will presumably resolve the three species as three SNaRCs, even if we cannot figure out the relationship of those three SNaRCs with each other (they may even form a true polytomy, and that's fine).
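To make the two thought experiments concrete, here is how a literal majority vote could be written down. This is only my reading of the prescription, not the authors' procedure: the function majority_clade() is hypothetical, each gene tree is reduced to the pair of samples it shows as sisters, and the 25/24 split of the remaining 49 trees in the first experiment is an assumption, as the text above only specifies the 51.

```r
# Return the sister pair supported by more than half of the gene trees,
# or NA if no pair reaches a majority (hypothetical helper)
majority_clade <- function(sister_pairs) {
  counts <- table(sister_pairs)
  best <- counts[which.max(counts)]
  if (best > length(sister_pairs) / 2) names(best) else NA_character_
}

# First thought experiment: 51 of 100 gene trees show ((A,B),C)
first <- c(rep("A+B", 51), rep("B+C", 25), rep("A+C", 24))
# Second thought experiment: a 33 / 33 / 34 split
second <- c(rep("D+E", 33), rep("E+F", 33), rep("D+F", 34))

result_first  <- majority_clade(first)   # "A+B"
result_second <- majority_clade(second)  # NA, i.e. no finer delimitation

print(result_first)
print(result_second)
```

Note that the vote only ever sees the gene trees; whether A, B and C belong to one sexually reproducing species does not enter into it anywhere.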

Still it bothers me that in a situation where we unfortunately have only one sample per species available for analysis the approach promoted in the present paper might lead to the tentative lumping of clearly distinct entities. And unless something is added to the approach, or unless I am missing something, it would have to, because it does not seem to include a way of recognising single-specimen SNaRCs except in the case of one being left alone as sister to another SNaRC that, in turn, would still consist of two potentially vastly different specimens. But maybe I am taking this too literally.

On top of that there is perhaps another methodological issue, or again maybe just something I don't understand. It seems to me as if "majority vote of the gene trees" is not actually how multi-locus phylogenetic analyses generally work. To the best of my understanding they reconcile gene trees in rather more complex ways, even in the case of such a simple approach as Gene Tree Parsimony, let alone the multi-gene coalescent model. Many of these approaches actually presuppose the existence of species or populations, and for the same reason as I argued above: what happens within a sexually reproducing lineage is rather different from what happens between such lineages.

More than anything what I find uncomfortable about the approach presented here is that it seems to care not so much about the actual patterns of common descent of what it classifies as about character or gene tree distribution. The difference may come across as subtle, admittedly. What I am trying to say is that I believe phylogenetic systematics should be about classifying organisms by relatedness, by exclusivity of common descent.

I do not, for example, care very much about the fact that most of the ancestral chloroplast genome has been moved over into the nucleus of the host cell, because the chloroplasts are directly descended in an unbroken line from the first cyanobacterium that colonised a plant cell, and the plant species we have today are descended in an unbroken line from that plant cell. To me chloroplasts are a subclade of cyanobacteria and plants are a subclade of eucaryotes, all regardless of what happened to the individual genes.

To use an example from within a species, I have mentioned in the past that it is possible, although statistically unlikely, that I have inherited no genetic material whatsoever from my maternal grandfather, if it just so happened that all the chromosomes my mother gave me were those she got from her mother (the Y chromosome is of course always from the paternal grandfather, by necessity). But even if that were the case we would nonetheless consider it to be an important piece of information that I descended from my maternal grandfather, and I would nonetheless not exist without his involvement. So yes, we use the genes to infer common descent, but the point is really the common descent itself, and the genes are just a data source that can potentially mislead us. Sometimes the right answer may be (A,(B,C)) even if most genes say ((A,B),C).

The "majority vote of the gene trees" approach, however, feels as if its practical concern starts and ends at the pattern shown by the genes, regardless of what the patterns of descent are. To me that feels the wrong way around.

Another way of looking at the issue may be this: If we truly accept the argument made in section 3, that we should look at natural phenomena, consider them to be explananda, and find the most appropriate scientific explanation for each of them, would the logical result not be Hennig's original approach? The phenomenon that a beetle specimen shares more traits with a bee specimen than either shares with a slug specimen has an explanation, and that is that the former two share a much more recent common ancestor from which they inherited the shared traits. We express that reality by grouping the former two into a taxon called 'insects' while leaving the slug out.

The fact that I may easily in some cases share more genetic similarity with somebody born in Italy than with another northern German, however, would most likely be due to the stochastic nature of allele inheritance inside our sexually reproducing species. There is no clade wherein two specimens of humanity - the hypothetical Italian and I - share one and only one most recent common ancestor. Instead, beyond some point in the past we share thousands of ancestral 'specimens' in each generation. Because this is a different biological phenomenon than ((beetle,bee),slug), we need a different approach to classification at that level.

Botany picture #257: Gentianella aspera

2018-04-11T14:34:00.000+10:00

Has it been that long since I posted the last botany picture? With my mind still on the mountains, here is a European gentian, Gentianella aspera (Gentianaceae), European Alps, 2004. Although sometimes split off into their own genus, the Australian gentians are phylogenetically also Gentianella.

One thing that I found strange about the Australian ones, by the way, is that they are generally white, because the European gentians are rather famously blue, violet, or very rarely yellow. There is even an obnoxious German Schlager song making that point, with the first line of the chorus translating as "blue, blue, blue blooms the gentian".

WARNING: follow that link at your own risk.

Sam Harris and Ezra Klein on intelligence and race

2018-04-10T20:01:00.000+10:00

Recently atheist activist Sam Harris and journalist Ezra Klein had a discussion about intelligence and race. The background is that Harris had Charles Murray, the author of The Bell Curve, as a guest on his podcast, Klein's Vox site published an article critical of that interview, and Harris felt that that article was unfair.

Having read through the transcript of Harris' and Klein's conversation, I must say that it went reasonably well, considering the topic. Harris' discussion with Noam Chomsky, for example, was much worse, as Chomsky's first argument went completely over Harris' head, and they just went in circles from that moment on.

The frustrating thing is that at the bottom of what Harris is trying to argue there are quite a few ideas that are valid. Yes, scientific results should be accepted for what they are instead of being pushed aside for fear of being politically incorrect. But his otherwise reasonable points are completely overshadowed by his tendency to make it all about how mean his critics are to him for calling him biased. He is unable to see that doing so, while bracketing out how this discussion fits into its historical and political context in the United States, is his own unacknowledged bias at work.

What is in my eyes particularly ironic, however, is that while Harris makes it all about how unfair his critics are, he argues at the same time that the science should be the focus. So I tried to have an eye on how the scientific evidence was discussed, and as far as I can tell it seemed to go as follows:

Klein sometimes brings up evidence that shows that intelligence (as measured by IQ or similar tests, which is another whole can of worms) is strongly influenced by the environmental conditions under which somebody grows up, e.g. when children from disadvantaged backgrounds are adopted by affluent families, and cites, by name, relevant scientists who argue that at the very least there is at this moment no evidence yet for any significant genetically determined IQ difference between groups. (And I have no idea where such evidence could even potentially come from, unless there is behind this the usual misunderstanding of what heritability means.) Harris never addresses those arguments, as far as I can tell. His counter-arguments appear to be:

(1) "genes are involved for basically every[thing]". But that is so trivially true as to be meaningless. Genes are involved in the development of fingers, yet there are no differences in the number of fingers between different populations. And even if we are talking about traits that vary, it gets us nowhere, because it doesn't necessarily follow that the genes determine more than, say, 5% of the variation. And even if intelligence is strongly heritable, that says nothing about significant differences between groups either, as he readily admits that variation within groups is much stronger than between them.

(2) Then there is Harris' sports example, where he says that West Africans dominate certain running sports. He argues "if you have populations that have their means slightly different genetically, 80 percent of a standard deviation difference, you’re going to see massive difference in the tail ends of the distribution, where you could have 100-fold difference in the numbers of individuals who excel at the 99.99 percent level". Now I get that this might be a valid argument to explain the underrepresentation of a group with a hypothetically slightly lower mean at excelling at the >99.9% level under the Utopian assumption of complete equality of opportunity, but then we would be talking about Fields Medal winners or Nobel laureates. As an explanation for lower societal achievement on average, i.e. why members of a group are vastly overrepresented in prisons and have vastly lower household wealth than the majority, it is a non-starter and thus irrelevant to the discussion from the get-go.

(3) Harris cites unnamed scientists who, he says, do not want to have their names published because of fear of being called racist, but who are said to agree with him. Not knowing who they are one is, of course, unable to confirm what they said or meant as well as to assess their qualifications, their potential agendas and biases, and if they are even from a relevant field of research. (Note that according to Wikipedia Charles Murray, with whom that whole discussion started, is a "political scientist, author, and columnist" working for a conservative think tank. That is, he is not an expert in the areas of population genetics, human cognitive development, comparative assessments, or any other field of relevance.)

I find that a bit disappointing. For all Harris' claims that the science is clearly on Charles Murray's side, it rather looks to me as if his argumentation runs simply as follows: There are differences in IQ between groups, and these differences must obviously have a genetic component, because everything has a genetic component. And that's it, at least as far as one can tell from the conversation with Klein.

How problematic is the jump dispersal parameter in ancestral area inference?

2018-04-02T22:08:00.000+10:00

I recently read an article in the Journal of Biogeography titled "Conceptual and statistical problems with the DEC + J model of founder-event speciation and its comparison with DEC via model selection". Its authors are Richard Ree, the developer of the original DEC model, and Isabel Sanmartin.

The main problem with discussing the paper here is that it would probably take 5,000 words to properly explain what it is even about. I will try to provide the most superficial introduction to the topic and otherwise assume that of the few people who will read this blog most are at least somewhat familiar with it.

The area of research this is about is the estimation or inference of ancestral areas and biogeographic events. Say we have a number of related species, the phylogeny showing how they are related, a number of geographic areas in which each species is either present or absent, and at least one model of biogeographic history. For the purposes of what I will subsequently call ancestral area inference (AAI) we assume that we know the species are well-defined and that the phylogeny is as close to true as we can infer at the time, so that they will simply be accepted as given. How to objectively define biogeographic areas for the study group is another big question, but again we take it as given that that has been done.

The idea of AAI is to take these pieces of information and infer what distribution ranges the ancestral species at each node of the phylogeny had, and what biogeographic events took place along the phylogeny to lead to the present patterns of distribution. What model of biogeographic events we accept matters a lot, of course. Imagine the following simple scenario of three species and three areas, with sister species occurring in areas A and B, respectively, and their more distant relative occurring in both areas B and C:

Assuming, for example, that our model of biogeographic history favours vicariant speciation and range expansions, we may consider the scenario on the left to be a very probable explanation of how we ended up with those patterns of distribution. First the ancestral species of the whole clade occurred in all areas, and vicariant speciation split it into a species in area A and one in areas B and C. The former expanded to occur in both A and B and then underwent another vicariant speciation event, done.

If we have reason to assume that this is unlikely, for example because area A is an oceanic island, we may favour a different model. In the right hand scenario we see the ancestral species occurring in areas B and C and producing one of its daughter species via subset sympatry in area B. At least one seed or pregnant female of that new lineage is then dispersed to island A. An event such as this last one, where dispersal leads to instant genetic isolation and consequent speciation, is in this context often called 'jump dispersal' or, as in the title of the paper, 'founder-event speciation', to differentiate it from the much slower process of gradual range expansion followed by vicariant or sympatric speciation*.

I am not saying that either of these scenarios is the best one to explain how the hypothetical three species evolved and dispersed. In fact I would say that three species are too small a dataset to estimate biogeographic history with any degree of confidence, but it provides an idea of what ancestral area inference is about.

Perhaps the best established approaches to AAI are Dispersal and Vicariance Analysis (DIVA) and the Dispersal, Extinction and Cladogenesis model (DEC). The former was originally implemented as parsimony analysis in software of the same name, and it has a tendency to favour vicariance, as the name suggests. Likelihood analysis under the DEC model became popular in its implementation in the software Lagrange, and in my limited experience and to the best of my understanding it is designed to have daughter species inherit part of the range of the ancestor, often leading to subset sympatry. And there are other approaches, of course.

As the result of his PhD project, Nick Matzke introduced the following two big innovations in AAI: First, the addition of a parameter j, for jump dispersal, to existing models. This allows the kind of instantaneous speciation after dispersal to a new area that I described above, and which can be assumed to be particularly important in island systems. Second, the idea that the most appropriate model for a study group should be chosen through statistical model selection, as in other areas of evolutionary biology. He created the R package BioGeoBEARS to allow such model selection. It originally implemented likelihood versions of DIVA, DEC and a third model called BayArea, all assuming the operation of slightly different sets of biogeographic processes. Each of them can be tested with and without the j parameter and, after another update, with or without an x parameter for distance-dependent dispersal.

Now I come finally (!) to Ree & Sanmartin. Their eight page paper, as the title implies, is a criticism of these two innovations. What do they argue? I hope I am summarising this faithfully, but in my eyes their three core points are as follows:

(1) A biogeographic model with events happening at the nodes of the tree as opposed to along the branches, as is the case with jump dispersal, is not a proper evolutionary model because such events are then "not modeled as time-dependent". In other words, only events that have a per-time-unit probability of occurring along a branch are appropriate.

(2) Under certain conditions the most probable explanation provided by a model including the j parameter is that all biogeographic events were jump dispersals. The j parameter gets maximised and explains everything by itself. They call this scenario "degenerate", because the "true" model must "surely" include time-dependent processes.

(3) DEC and DEC + j (and, I assume, by extension any other model and its + j variant) cannot be compared in the sense of model selection.

I must, of course, admit that model development is not my area. Consequently I am happy to defer regarding points one and three to others who have more expertise, and who will certainly have something to say about this at some point. I can only at this moment state that these claims do not immediately convince me. Certainly it is often the case that models with very different parameters are statistically compared with each other?

Is it not possible that the best model to explain an evolutionary process may sometimes indeed have a parameter that is not time-dependent but dependent on lineage splits? In the present case, if it is a fact that jump dispersal caused a lineage split, then both events quite simply happened instantaneously (at the relevant time scale of millions of years); in a sense, they were the same event, as the dispersal itself interrupted gene flow.

Perhaps more importantly, however, I am not at all convinced by the second point. Generally I am more interested in practical and pragmatic considerations than theory of statistics and philosophy. In phylogenetics, for example, I am less impressed by the claim that parsimony is supposedly not statistically consistent than by a comparison of the results produced by parsimony and likelihood analysis of DNA sequence datasets. Do they make sense? What can mislead an analysis? What software is available? How computationally feasible is what would otherwise be the best approach, and can it deal with missing data?

So in the present case I would also like to consider the practical side. Is the problem of j being maximised so that everything is explained by jump dispersal at all likely to occur in empirical datasets? In the paper Ree and Sanmartin illustrate a two species / two area example. That is clearly not a realistic empirical dataset, as it is much too small for proper analysis. But if we understand to some degree how the various model parameters work we can deduce under what circumstances j is likely to be maximised.

Unless I am mistaken, the circumstances appear to be as follows: We need a dataset in which all species are local endemics, i.e. all are restricted to a single area, and in which sister species never share part of their ranges. This is because other patterns cannot be explained by jump dispersal. If a species occupies two or more areas, it would have had to expand its range, so the analysis cannot reduce the d parameter for range expansion to zero. If sister species share part of their ranges, likewise; if they share the same single area, they must have diverged sympatrically, which again is not speciation through jump dispersal.
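To make the two conditions a bit more concrete, here is a minimal sketch in base R. The presence/absence matrix and the list of sister pairs are invented for illustration (they encode the three species, three areas example from further above), and the check reflects only my reading of the argument, not anything that BioGeoBEARS or any other package does internally.

```r
# Hypothetical presence/absence matrix: rows are species, columns are areas.
ranges <- matrix(c(1, 0, 0,
                   0, 1, 0,
                   0, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("sp1", "sp2", "sp3"), c("A", "B", "C")))

# Hypothetical list of sister-species pairs read off the phylogeny.
sisters <- list(c("sp1", "sp2"))

# Condition 1: every species is restricted to a single area.
all_endemic <- all(rowSums(ranges) == 1)

# Condition 2: no pair of sister species shares any area.
sisters_disjoint <- all(vapply(sisters, function(p) {
  !any(ranges[p[1], ] == 1 & ranges[p[2], ] == 1)
}, logical(1)))

# Only if both hold could every event be explained by jump dispersal alone.
all_jump_possible <- all_endemic && sisters_disjoint
```

In this toy case the third species occupies two areas, so the first condition already fails and an 'all jump dispersal' solution would not be expected.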

This raises the question, how likely are we to find datasets in which these two conditions apply? In my admittedly limited experience such datasets do not appear to be very common. If we are dealing, for example, with a small to medium sized genus on one continent, we will generally find partly overlapping ranges, and often at least one very widespread species. The j parameter will not be maximised. If we are doing a global analysis of a large clade, we will need rather large areas (because if you use too many small areas the problem becomes computationally intractable). This means, among other things, that entire subclades will share the same single-area range, and j will not be maximised.

In other words, the problem of 'all-jump dispersal' solutions appears to be rather theoretical. But what if we actually do have such a dataset? Is it not a problem then? To me the next question is under what circumstances such a situation would arise. Again, we have all species restricted to single areas, meaning that they apparently find it hard to expand their ranges across two areas. Why? Perhaps geographic separation to the degree that they rarely disperse? Geographic separation to the degree that when they disperse gene flow is interrupted, leading to immediate speciation? Again, we never have sister species sharing an area. Why? A good explanation would be that each area is too small for sympatric speciation to be possible.

Now what does that dataset sound like? To me it sounds like an archipelago of small islands, or perhaps a metaphorical island system such as isolated mountain top habitats. The exact scenario, in other words, in which all-jump dispersal seems like a very probable explanation. Because your ancestral island is too small for speciation, the only way to speciate is to jump to another island, and if you jump to another island you are immediately so isolated from your ancestral population that you speciate.

Again, I am not a modeler, and I have not run careful simulation experiments before writing this, but based on this thought experiment it seems to me as if the + j models would work just as they should: j would not be maximised under circumstances where the other processes are needed to explain present ranges, but it would be maximised under precisely those extremely rare circumstances where 'all jump dispersal' is the only realistic explanation.

Footnote

*) Sympatric meaning here at the scale of the areas defined for the analysis. If one of the areas in the analysis is all of North America, for example, it is likely that the 'sympatric' events inside that area would in truth mostly have been allopatric, parapatric or peripatric at a smaller spatial scale.

Weekend in the mountains

2018-04-01T17:25:00.001+10:00

We just came back from a nice weekend in the Australian 'Alps', making use of what may have been the last period of nicely warm weather. It was still rather cold camping during the night; it is definitely not summer anymore.

Turns out we may finally have to buy a new tent. On the plus side, the belt of the plant press served well to keep the tent pole in shape; not perfectly, but sufficiently to give us just enough structural integrity for two nights.

The main attraction this time was Yarrangobilly Caves. The last time we passed by it was too late in the day, so we were unable to visit them. This time we bought passes for two of the caves.

Although probably weird looking enough, this photo does not do reality justice - the entrance area to South Glory Cave is massive and awe-inspiring.

We camped at our favourite spot in the area, Three Mile Dam. I have posted photos of the lake before, but here is one showing the moon reflected in the water during the night.

Morning mist above the camp site penetrated by the first rays of the sun.

And to conclude, something botanical: Golden everlasting daisies (Xerochrysum subundulatum, Asteraceae) fruiting on the Old Kiandra Gold Fields.

Science spammers constantly reaching for new lows

2018-03-23T07:27:00.002+11:00

I received the following two messages on the same day, in fact they were sitting right next to each other in the inbox.

Surely this is sad. Is there no such thing as taking pride in one's work, even among spammers? Recently some people tried to defraud me, and of course that is annoying, but at least they put a lot of effort into it. I was impressed by how much information they had to accumulate to seem half-convincing. These guys, on the other hand, use such a simplistic bot to produce their mass-emails that they are immediately recognisable as such.

Really the only thing sadder than these messages is that my spam filter is apparently still unable to understand that the keyword "greetings!!!!" is a certain indicator of spaminess.

Bioregionalisation part 6: Modularity Analysis with the R package rnetcarto

2018-03-21T07:43:00.001+11:00

Today's final post in the bioregionalisation series deals with how to do a network or Modularity Analysis in R. There are two main steps here. First, because we are going to assume, as in the previous post, that we have point distribution data in decimal coordinates, we will turn them into a bipartite network of species and grid cells.

We start by defining a cell size. Again, our data are decimal coordinates, and subsequently we will use one degree cells.

cellsize <- 1

Note that this may not be the ideal approach for publication. The width of one degree cells decreases towards the poles, and in spatial analyses equal area grid cells are often preferred because they are more comparable. If we want equal area cells we first need to project our data into meters and then use a cellsize in meters (e.g. 100,000 for 100 x 100 km). There are R functions for such spatial projection, but we will simply use one degree cells here.
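To get a feeling for how much one degree cells shrink, the following rough sketch computes their approximate east-west width at different latitudes. The function name cell_width_km is made up for this illustration, and it assumes a perfectly spherical Earth with a radius of 6371 km, which is a simplification.

```r
# Approximate east-west width (km) of a grid cell at a given latitude,
# assuming a spherical Earth with a radius of 6371 km (a simplification).
cell_width_km <- function(lat_deg, cellsize = 1, radius_km = 6371) {
  radius_km * (cellsize * pi / 180) * cos(lat_deg * pi / 180)
}

# Latitudes roughly spanning the Australian continent from north to south.
widths <- cell_width_km(c(-10, -25, -44))
round(widths)
```

Under these assumptions a one degree cell is roughly 110 km wide at 10 degrees south but only about 80 km wide at 44 degrees south, which is why equal area cells are easier to compare.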

We make a list of all species and a list of all cells that occur in our dataset, naming the cells after their centres in the format "126.5:-46.5". I assume here that we have the data matrix called 'mydata' from the previous post, with the columns species, lat and long.

allspecies <- unique(mydata$species)

longrounded <- floor(mydata$long / cellsize) * cellsize + cellsize/2

latrounded <- floor(mydata$lat / cellsize) * cellsize + cellsize/2

cellcentre <- paste(longrounded,latrounded, sep=":")

allcells <- unique(cellcentre)
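As a quick sanity check of the rounding logic, here is the same calculation applied to three made-up records. The coordinates are invented; the point is merely that floor() always rounds towards the south-west corner of a cell, including for negative latitudes, so that adding half a cell size gives the cell centre.

```r
cellsize <- 1

# Three invented records; the first two fall into the same one degree cell.
long <- c(126.3, 126.9, 148.01)
lat  <- c(-46.2, -46.8, -35.99)

# Same formulas as above: lower-left corner of the cell plus half a cell.
longrounded <- floor(long / cellsize) * cellsize + cellsize / 2
latrounded  <- floor(lat / cellsize) * cellsize + cellsize / 2

cellcentre <- paste(longrounded, latrounded, sep = ":")
cellcentre
```

The first two records both end up in the cell "126.5:-46.5", and the third in "148.5:-35.5".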

We create a matrix of species and cells filled with all zeroes, which means that the species does not occur in the relevant cell. Then we loop through all records to set a species as present in a cell if the coordinates of at least one of its records indicate such presence.

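mynetw <- matrix(0, length(allcells), length(allspecies))
# Loop over all records; seq_len() also behaves sensibly if there are no records at all
for (i in seq_len(nrow(mydata)))
{
  # Row = cell of this record, column = species of this record
  mynetw[ match(cellcentre[i], allcells), match(mydata$species[i], allspecies) ] <- 1
}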

It is also crucial to name the rows and columns of the network so that we can interpret the results of the Modularity Analysis.

rownames(mynetw) = allcells
colnames(mynetw) = allspecies
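For large datasets the loop can become slow. A possible alternative, sketched here with tiny made-up stand-ins for the objects built above, is to index the matrix with a two-column matrix of row and column positions so that all records are set in one step. This is just one way of doing it, not necessarily the best.

```r
# Toy stand-ins for the objects built above (hypothetical values).
mydata <- data.frame(species = c("Aa bb", "Aa bb", "Cc dd"),
                     stringsAsFactors = FALSE)
cellcentre <- c("126.5:-46.5", "148.5:-35.5", "148.5:-35.5")
allspecies <- unique(mydata$species)
allcells <- unique(cellcentre)

# Empty cell-by-species matrix, named right away.
mynetw <- matrix(0, length(allcells), length(allspecies),
                 dimnames = list(allcells, allspecies))

# A two-column matrix of (row, column) positions indexes all records at once.
idx <- cbind(match(cellcentre, allcells), match(mydata$species, allspecies))
mynetw[idx] <- 1
mynetw
```

Duplicate records of a species in the same cell do no harm, as they simply set the same entry to one again.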

Now we come to the actual Modularity Analysis. We need to have the R library rnetcarto installed and load it.

library(rnetcarto)

The command to start the analysis is simply:

mymodules <- netcarto(mynetw, bipartite=TRUE)

This may take a bit of time, but after talking to colleagues who have got experience with other software it seems it is actually reasonably fast - for a Modularity Analysis.

Once the analysis is done, we may first wonder how many modules, which we will subsequently interpret as bioregions, the analysis has produced.

length(unique(mymodules[[1]]$module))
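It can also be useful to see how many cells each module contains, because very small modules may be artifacts. The sketch below does not call rnetcarto; it uses an invented data frame that mimics only the two columns used in this post (name and module), so the values and the cut-off of two cells are purely illustrative.

```r
# Mock of the first element of the netcarto() result, restricted to the two
# columns referred to in this post; all values are invented.
modtable <- data.frame(name = c("126.5:-46.5", "127.5:-46.5", "148.5:-35.5",
                                "148.5:-36.5", "145.5:-17.5"),
                       module = c(0, 0, 1, 1, 2))

# Number of cells per module.
modsizes <- table(modtable$module)

# Modules with fewer than two cells (an arbitrary threshold for illustration).
small <- names(modsizes)[modsizes < 2]
```

With real results one would replace modtable with mymodules[[1]] and choose a threshold that makes sense for the study.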

For publication we obviously want a decent map, but that is beyond the scope of this post. What follows is merely a very quick and dirty way of plotting the results to see what they look like, but of course the resulting coordinates and module numbers can also be used for fancier plotting. We split the latitudes and longitudes back out of the cell names, define a vector of colours to use for mapping (here thirteen; if you have more modules you will of course need a longer vector), and then we simply plot the cells like some kind of scatter plot.

allcells2 <- strsplit( as.character( mymodules[[1]]$name ), ":" )

allcells_x <- as.numeric(unlist(allcells2)[c(1:(length(allcells)))*2-1])

allcells_y <- as.numeric(unlist(allcells2)[c(1:(length(allcells)))*2])

mycolors <- c("green", "red", "yellow", "blue", "orange", "cadetblue", "darkgoldenrod", "black", "darkolivegreen", "firebrick4", "darkorchid4", "darkslategray", "mistyrose")

plot(allcells_x, allcells_y, col = mycolors[ as.numeric(mymodules[[1]]$module) ], pch=15, cex=2)
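A slightly more defensive variant of the same quick and dirty plot is sketched below with invented cell names and module numbers. It binds the split names into a two-column matrix instead of picking odd and even elements, turns the module labels into consecutive indices via factor() in case they do not start at one (an assumption on my part about what the labels may look like), and generates as many colours as there are modules.

```r
# Invented stand-ins for mymodules[[1]]$name and mymodules[[1]]$module.
cellnames <- c("126.5:-46.5", "148.5:-35.5", "145.5:-17.5")
module <- c(0, 1, 1)

# One row per cell: first column longitude, second column latitude.
parts <- do.call(rbind, strsplit(cellnames, ":", fixed = TRUE))
cell_x <- as.numeric(parts[, 1])
cell_y <- as.numeric(parts[, 2])

# Consecutive indices 1, 2, ... regardless of how the modules are labelled.
modindex <- as.integer(factor(module))

# As many colours as modules, so the vector can never be too short.
mycolors <- rainbow(max(modindex))

if (interactive()) {
  plot(cell_x, cell_y, col = mycolors[modindex], pch = 15, cex = 2)
}
```

The automatically generated colours are not necessarily pretty, but they avoid having to count modules by hand.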

There we are. Modularity analysis with the R library rnetcarto is quite easy; the main problem was building the network.

As an example I have done an analysis with all Australian (and some New Guinean) lycopods, the dataset I mentioned in the previous post. It plots as follows.

There are, of course, a few issues here. The analysis produced six modules, but three of them, the green, orange and light blue ones, consist of only two, one and one cells, respectively, and they seem biologically unrealistic. They may be artifacts of not having cleaned the data as well as I would for an actual study, or represent some kind of edge effect. The remaining three modules are clearly more meaningful. Although they contain some outlier cells, we can start to interpret them as potentially representing tropical (red), temperate (yellow), and subalpine/alpine (dark blue) assemblages of species, respectively.

Despite the less than perfect results I hope the example shows how easy it is to do such a Modularity Analysis, and if due diligence is done to the spatial data, as we would do in an actual study, I would also expect the results to become cleaner.

Botany picture #256: Solenostemon presumably

2018-03-18T20:46:00.000+11:00

In spring we bought three types of Sempervivum (Crassulaceae) and planted them in a large bowl. Two little seedlings spontaneously came up in the succulent soil and, recognising them as members of my other favourite plant family Lamiaceae, I transferred them to a different pot where they would get more water.

I was curious to see what they would grow into - perhaps a useful aromatic herb? Well, they grew and grew and grew, but they did not flower until just now. Although it had become clear to me some time ago that they must be some kind of Solenostemon or relative and are presumably cultivated as ornamentals rather than as kitchen herbs I was hoping that they would at least have nice flowers. The reality, alas, is a bit of a let-down. Not terrible but not exactly stunning either. It is unlikely that they will survive winter anyway, as they are probably tropical plants.

In other news, Canberra was covered by dust blown in from western New South Wales today. The sky was of an otherworldly grey and only returned to its customary blue colour late in the afternoon.

Bioregionalisation part 5: Cleaning point distribution data in R

2018-03-17T17:15:00.000+11:00

I should finally complete my series on bioregionalisation. What is missing is a post on how to do a network (Modularity) analysis in R. But first I thought I would write a bit about how to efficiently do some cleaning of point distribution data in R. As so often, I write this because it may be useful to somebody who finds it via a search engine, but also because I can then look it up myself if I need it after not having done it for months.

The assumption is that we start our spatial or biogeographic analyses by obtaining point distribution data by querying e.g. for the genus or family that we want to study on an online biodiversity database or aggregator such as GBIF or Atlas of Living Australia. We download the record list in CSV format and now presumably have a large file with many columns, most of them irrelevant to our interests.

One problem that we may find is that there are numerous cases of records occurring in implausible locations. They may represent geospatial data entry errors such as land plants supposedly occurring in the ocean, or vouchers collected from plants in botanic gardens where the databasers for some reason entered the garden's coordinates instead of those of the source location, or other outliers that we suspect to be misidentifications. What follows assumes that this kind of clean-up at least has been done already (and it is hard to automate anyway), but we can use R to help us with a few other problems.

We start up R and begin by reading in our data, in this case all lycopod records downloaded from ALA. (One of the advantages of that group is that very few of them are cultivated in botanic gardens, and I did not want to do that kind of data clean-up for a blog post.)

rawdata <- read.csv("Lycopodiales.csv", sep=",", na.strings = "", header=TRUE, row.names=NULL)

We now want to remove all records that lack any of the data we need for spatial and biogeographic analyses, i.e. identification to the species level, latitude and longitude. Other filtering may be desired, e.g. of records with too little geocode precision, but we will leave it at that for the moment. In my case the relevant columns are called genus, specificEpithet, decimalLatitude, and decimalLongitude, but that may of course be different in other data sources and require appropriate adjustment of the commands below.

rawdata <- rawdata[!(is.na(rawdata$decimalLatitude) | rawdata$decimalLatitude==""), ]
rawdata <- rawdata[!(is.na(rawdata$decimalLongitude) | rawdata$decimalLongitude==""), ]
rawdata <- rawdata[!(is.na(rawdata$genus) | rawdata$genus==""), ]
rawdata <- rawdata[!(is.na(rawdata$specificEpithet.1) | rawdata$specificEpithet.1==""), ]
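The four filtering steps can also be condensed, for example with complete.cases(). The following is a minimal sketch on an invented four-record data frame; the column names follow the ones mentioned in the text and would need adjusting to whatever the downloaded file actually uses.

```r
# Invented example records; only the first one has all required data.
rawdata <- data.frame(genus = c("Huperzia", NA, "Lycopodium", "Lycopodiella"),
                      specificEpithet = c("prima", "secunda", "", "tertia"),
                      decimalLatitude = c(-35.1, -36.2, -37.3, NA),
                      decimalLongitude = c(148.1, 149.2, 150.3, 151.4),
                      stringsAsFactors = FALSE)

needed <- c("genus", "specificEpithet", "decimalLatitude", "decimalLongitude")

# Treat empty strings as missing values in the columns we need.
rawdata[needed] <- lapply(rawdata[needed], function(x) {
  x[!is.na(x) & x == ""] <- NA
  x
})

# Keep only the rows that are complete in all needed columns.
cleaned <- rawdata[complete.cases(rawdata[needed]), ]
```

The advantage is mostly that the list of required columns sits in one place and is easy to extend.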

All the records missing those data should be gone now. Next we make a new data frame containing only the data we are actually interested in.

lat <- rawdata$decimalLatitude
long <- rawdata$decimalLongitude
species <- paste( as.character(rawdata$genus), as.character(rawdata$specificEpithet.1), sep=" " )
mydata <- data.frame(species, lat, long)
mydata$species <- as.character(mydata$species)

Unfortunately at this stage there are still records that we may not want for our analysis, but they can mostly be recognised by having more than the two usual name elements of genus name and specific epithet: hybrids (something like "Huperzia prima x secunda" or "Huperzia x tertia") and undescribed phrase name taxa that may or may not actually be distinct species ("Lycopodiella spec. Mount Farewell"). At the same time we may want to check the list of species in our data table with unique(mydata$species) to see if we recognise any other problems that actually have two name elements, such as "Lycopodium spec." or "Lycopodium Undesignated". If there are any of those, we place them into a vector:

kickout <- c("Lycopodium spec.", "Lycopodium Undesignated")

Then we loop through the data to get rid of all these problematic entries.

myflags <- rep(TRUE, nrow(mydata))
for (i in seq_along(myflags))
{
  if ( (length(strsplit(mydata$species[i], split=" ")[[1]]) != 2) || (mydata$species[i] %in% kickout) )
  {
    myflags[i] <- FALSE
  }
}
mydata <- mydata[myflags, ]
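The same filter can be written without a loop. Here is a small sketch using the example names mentioned above as an invented species vector; lengths() counts the name elements of every record in one go.

```r
# Invented vector of names, built from the examples mentioned in the text.
species <- c("Huperzia prima", "Huperzia prima x secunda", "Huperzia x tertia",
             "Lycopodiella spec. Mount Farewell", "Lycopodium spec.",
             "Valida fastigiata")
kickout <- c("Lycopodium spec.", "Lycopodium Undesignated")

# Number of space-separated name elements per record.
nelements <- lengths(strsplit(species, " ", fixed = TRUE))

# Keep records with exactly two name elements that are not on the kickout list.
keep <- nelements == 2 & !(species %in% kickout)
species[keep]
```

Applied to the real data this would be something along the lines of mydata <- mydata[keep, ], with keep computed from mydata$species.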

If there is no 'kickout' vector for undesirable records with two name elements, we do the same but adjust the if command accordingly to not expect its existence.

Check again unique(mydata$species) to see if the situation has improved. If there are instances of name variants or outdated taxonomy that need to be corrected, that is surprisingly easy with a command along the following lines:

mydata$species[mydata$species == "Outdatica fastigiata"] = "Valida fastigiata"

In that way we can efficiently harmonise the names so that one species does not get scored as two just because some specimens still have an outdated or misspelled name.
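If there are many names to correct, a lookup table may be more convenient than one command per name. A possible sketch, with invented names and a hypothetical named vector called synonyms (old name as the name, accepted name as the value):

```r
# Invented records, including an outdated name and a misspelling.
species <- c("Outdatica fastigiata", "Valida fastigiata",
             "Valida fastigata", "Huperzia prima")

# Hypothetical lookup table: names are the variants, values the accepted names.
synonyms <- c("Outdatica fastigiata" = "Valida fastigiata",
              "Valida fastigata" = "Valida fastigiata")

# Replace only those records whose name appears in the lookup table.
hit <- species %in% names(synonyms)
species[hit] <- unname(synonyms[species[hit]])
species
```

Names that are not in the table are left untouched, so the table only needs to list the problem cases.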

Although we assume that we had checked for geographic outliers, we may now still want to limit our analysis to a specific area. In my case I want to get rid of non-Australian records, so I remove every record outside of a box of 9.5 to 44.5 degrees south and 111 to 154 degrees east around the continent. It turns out that this left parts of New Guinea in, but that is fine with me for present purposes; we don't want to over-complicate this now.

mydata <- mydata[mydata$long<154, ]
mydata <- mydata[mydata$long>111, ]
mydata <- mydata[mydata$lat>(-44.5), ]
mydata <- mydata[mydata$lat<(-9.5), ]

At this stage we may want to save the cleaned up data for future use, just in case.

write.table(mydata, file = "Lycopodiales_records_cleaned.csv", sep=",")

And now, finally, we can actually turn the point distribution data into grid cells and conduct a network analysis, but that will be the next (and final) post of the series.

Reading The Varieties of Religious Experience: Lecture 2

2018-03-10T11:16:00.001+11:00

In his second lecture, James defines what he would consider 'religion' to be for the purposes of the lecture series.

He stresses right at the beginning that religion is such a complex phenomenon that anybody who thinks they can come up with a clear and simple definition is fooling themselves. He then mentions two aspects, the organisational structure (the church with its office holders and buildings) and the personal beliefs and feelings of each believer, and he excludes the former from consideration to focus his efforts on the latter.

That is unsurprising, given his psychological approach, and fair enough. A historian would perhaps be most comfortable addressing religion as an organised body while excluding personal psychology from their considerations. What I find interesting to observe, however, is that one aspect of religion as I see it is not even mentioned. To me, schools of thought that make truth claims, be they ideologies, religions, or scientific, philosophical, scholarly, and engineering communities, have three main components:

The people who adhere to the school of thought; they are the focus of James' lectures,
The institutional framework (research institutions, churches, political parties, think tanks, journals, internet fora, conferences, etc.); this James mentioned but excluded from consideration, and
The actual body of knowledge or belief system; it appears to remain unexamined so far.

Because 90% of the lectures are still to follow I don't want to dwell on this too much, but I find it interesting even at this stage that James appears curiously incurious about the first question that would come to my mind when faced with a school of thought: are its beliefs true? I guess I will see if he will go there later or if he will remain completely uninterested in that question throughout.

After having settled on the personal relationship of an individual human to the divine as his focus, James clarifies that believing in an actual personal god is not a criterion for him. He mentions 'Emersonianism' and Buddhism as examples of systems that work to produce religious feelings without having personalised deities. I had never heard of Emersonianism, but it appears to be a variant of pantheism, seeing the whole universe as divine and (believe it or not) benign.

Finally, James spends an astonishingly large part of his second lecture on discussing what mindsets he considers truly religious and what mindsets he does not. Again and again he negatively contrasts the philosophical, Stoic acceptance of the way the world is with the Christian ideal of a joyous embrace of whatever happens, no matter how terrible. Although he sometimes calls the ascetic or highly spiritual Christian 'extreme', the language he uses leaves no doubt that he considers mindless exultation in the face of, say, seeing a loved one dying terribly to be an admirable state of mind, as evidence that religion is a positive force for humanity.

Again I hesitate to immediately reject his argumentation given how little I have progressed into this book, but even here I cannot help wondering whether this view relies quite a bit on a conflation of many different injustices or tribulations to which, really, we would be justified to react in very different ways. We are not merely talking about "the universe is unfair, and a truly wise person will accept that they can only do their best and be happier for it". No, depending on what we are talking about and if we assume gods to exist we may reasonably take very different stances - and I would actually say that religious bliss is the appropriate stance in none of the various cases.

We cannot always get all we want. Some things are unachievable, and sometimes we have to compromise with other people. Accepting that is just a sign of maturity. (Embracing such compromises joyously would seem to be a bit twee, though.)

Then there are the evils we do to each other, such as theft, bullying, rape, murder, etc. Really one of the most frustrating facets of human existence is how much needless misery we cause each other, both deliberately and accidentally, given that we would have quite enough misery left to deal with even if we were all perfectly nice to each other (see next point). Point is, in this case the perpetrators generally have a moral responsibility to do better, and joyously accepting their bad deeds is both unreasonable and counterproductive, as it will set perverse incentives and reward bad actors.

What James must really be talking about, however, would have to be 'natural evils', harm to us that is no other human's fault, ranging from having to die of old age through natural disasters to people being born with a genetic disorder. Under the (atheist) assumption that there is no god behind these phenomena, that they just happen, James' preferred stance of a joyous embrace would be ridiculous. Stoic acceptance of what cannot be undone while trying one's best to undo these evils is a more sensible approach.

But what if we assume that natural evils are caused or at least allowed to happen by an omnipotent god who could, with the snap of their metaphorical finger, deliver us from such needless suffering? Does it make sense, under this assumption, to write, "dear superior intelligence running the universe, please accept my heartfelt thanks for making me slowly die of an untreatable, incredibly painful disease; and while on that topic, thanks also for that landslide that crushed my best friend when we were twelve years old"?

I can't say that this would feel sane to me. I would have some very serious questions about the moral character and motivations of such gods, if I believed for a moment that they existed. But then again, James acknowledges himself that there are some people who are unable to have religious feelings as he defined them. I assume I am one of those people, for better or for worse.

And note also that there are presumably many people who would consider themselves religious but who do not feel what James considers to be the religious impulse at its most pure.

Alpha diversity and beta diversity

2018-03-08T21:36:00.000+11:00

At today's journal club meeting, we discussed Alexander Pyron's opinion piece We don't need to save endangered species - extinction is part of evolution. I mentioned it in passing before and still think that his core argument, which is also reflected in the title, is logically equivalent to saying that murder is okay because all humans are going to die of natural causes one day anyway. But reading his piece more thoroughly than before, I now notice a few other, um, problems. The highlights:

Species constantly go extinct, and every species that is alive today will one day follow suit. There is no such thing as an "endangered species," except for all species.

What weirds me out here is the lack of a phylogenetic perspective in a piece written by a systematist - species are discussed as individuals that pop out of thin air and then disappear again. Of course, in the very long run every species will one day go extinct when the sun expands and boils off the oceans. But until then, in the time frame that Pyron discussed, no, not every species will go extinct, quite a few of them will diversify and survive as numerous descendant species, as did the ancestor of all land vertebrates or the ancestor of all insects in the past. They thus become effectively immortal (until, once more, the sun explodes anyway, etc.).

Yet we are obsessed with reviving the status quo ante. The Paris Accords aim to hold the temperature to under two degrees Celsius above preindustrial levels, even though the temperature has been at least eight degrees Celsius warmer within the past 65 million years. Twenty-one thousand years ago, Boston was under an ice sheet a kilometer thick. We are near all-time lows for temperature and sea level; whatever effort we make to maintain the current climate will eventually be overrun by the inexorable forces of space and geology.

This is sadly a classic of climate change denialism. Yes, there was change in the past too, but there are some major differences. First, there is the rate of change - the impacts we are having are coming much faster than most natural changes (excepting e.g. meteorite strikes and similarly sudden events), so that animals and plants have less of a chance to migrate or to adapt than they had in past cycles of warm and ice ages. Second, they have even less of a chance to migrate because we have fragmented their available habitats by putting roads, towns, croplands and pastures in their way. Third, past changes did not affect a highly urbanised human population of more than seven billion people; the potential of global change producing catastrophic results even just for us is much greater now than when we were just a few million widely dispersed hunter-gatherers. So yes, it is true that we cannot freeze the status quo in place forever, but I think we would do well to slow the rate of change as far as possible.

Infectious diseases are most prevalent and virulent in the most diverse tropical areas. Nobody donates to campaigns to save HIV, Ebola, malaria, dengue and yellow fever, but these are key components of microbial biodiversity, as unique as pandas, elephants and orangutans, all of which are ostensibly endangered thanks to human interference.

I just don't even. What is the logic here? "Nobody cares about conserving diseases that horribly kill us humans, so we should not care about conserving harmless pandas either?" How does that follow?

And if biodiversity is the goal of extinction fearmongers, how do they regard South Florida, where about 140 new reptile species accidentally introduced by the wildlife trade are now breeding successfully? No extinctions of native species have been recorded, and, at least anecdotally, most natives are still thriving. The ones that are endangered, such as gopher tortoises and indigo snakes, are threatened mostly by habitat destruction. Even if all the native reptiles in the Everglades, about 50, went extinct, the region would still be gaining 90 new species -- a biodiversity bounty. If they can adapt and flourish there, then evolution is promoting their success. If they outcompete the natives, extinction is doing its job.

And this is perhaps what frustrates me most, because while this is not an uncommon argument against biosecurity measures, one would expect a biologist to know about the different types of biodiversity instead of confusing them. To explain more clearly what is going on, consider the following diagrams. First, we have three areas, roundland, squareland, and hexagonland, with two endemic species each.

Then humans recklessly move species between the areas, allowing them to invade each other's natural ranges. It turns out that three of the species are particularly competitive and prosper at the cost of the other three, driving them to extinction.

Now there are three types of diversity to consider. The first is alpha-diversity, which means simply the number of species in a given place. As we see it has gone up by 50% in all three areas, from two to three species. Yay, more diversity! This is what Pyron proudly points at in Florida.

What is lost, however, is beta-diversity or turnover, that is the heterogeneity you observe as you move between areas. It was very high originally, as every area had its unique species, but now it has been wiped out entirely. Beta-diversity in the second diagram is precisely zero. Under the first scenario a squarelander can go on a holiday trip to roundland and admire the unique flora of that part of the world; under the second scenario they will travel to roundland and merely see the same few weeds that they have growing in their own front yard back home. And the endemic plants of hexagonland have all gone extinct, a 100% loss of that area's irreplaceable evolutionary history.

(Note that beta-diversity would also be zero if all six species survived everywhere. But that is clearly not a realistic assumption, as it would require each area to have such a high carrying capacity that they should each have evolved more than two species to begin with. We would not expect that all the plant species of the world could survive next to each other in, say, Patagonia, even if they were all introduced there.)

Finally, in our example global diversity has of course also been reduced, by 50%. So yeah, great to have more alpha-diversity in Florida, but does that make up for a massive net loss in both beta-diversity and global diversity? The argument seems rather misguided.
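The arithmetic of this toy example can be checked with a few lines of Python. This is only a minimal sketch: the species labels A to F and the choice of which three species survive are my own assumptions (picked so that hexagonland loses both of its endemics), and Whittaker's beta and the mean pairwise Jaccard dissimilarity are just two of several common ways to put a number on turnover.

```python
from itertools import combinations

# Hypothetical labels: the six species in the diagrams are unnamed.
before = {
    "roundland": {"A", "B"},
    "squareland": {"C", "D"},
    "hexagonland": {"E", "F"},
}
# Assumed outcome: A, B and C spread everywhere; D, E and F go extinct.
after = {area: {"A", "B", "C"} for area in before}


def alpha(areas):
    """Mean number of species per area (alpha-diversity)."""
    return sum(len(species) for species in areas.values()) / len(areas)


def gamma(areas):
    """Total number of species across all areas (global diversity)."""
    return len(set().union(*areas.values()))


def beta_whittaker(areas):
    """Whittaker's beta: gamma / mean alpha - 1; zero if all areas are identical."""
    return gamma(areas) / alpha(areas) - 1


def mean_jaccard_dissimilarity(areas):
    """Average of 1 - |shared| / |combined| over all pairs of areas."""
    pairs = list(combinations(areas.values(), 2))
    return sum(1 - len(x & y) / len(x | y) for x, y in pairs) / len(pairs)


for label, areas in (("before", before), ("after", after)):
    print(label, alpha(areas), gamma(areas),
          beta_whittaker(areas), mean_jaccard_dissimilarity(areas))
```

Under these assumptions alpha rises from two to three species per area, the global count drops from six to three, and both turnover measures fall to zero, whichever three species happen to be the survivors.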

Reading The Varieties of Religious Experience: Lecture 1

2018-03-04T17:37:00.001+11:00

I have started reading William James' The Varieties of Religious Experience. Published first in 1902, this collection of twenty lectures is considered to be a classic of the study of religion. It approaches the subject with a psychological as opposed to theological, historical, or apologetic angle, but appears to remain rather charitable towards religious beliefs.

This becomes clear already in the first lecture, much of which is spent assuring the believing reader that they have no reason to be offended by a psychological examination of religious experience.

James calls 'medical materialism' the idea that religion originated as the hallucinations and ravings of 'psychopaths' and 'degenerates' and can therefore be dismissed. (His words; see e.g. the interpretation of Saint Paul's vision of Jesus as the result of an epileptic seizure.) He argues that the value of a phenomenon, here religious truth claims, cannot be deduced from its origins; as an argumentum ad absurdum he points out that a scientific insight would be judged on its own merits even if the scientist who gained it was suffering from some mental disorder. By their fruits ye shall know them, not by their roots.

Well, fair enough, one might say. But while I cannot tell what the state of the discussion was around the year 1900, it seems as if this argument would miss the point of 'medical materialism' as it is applied today. Taking the position of an atheist, it is not the case that they attempt to answer the question of what to think of religious truth claims by looking at how they originated. They would most likely argue that that particular question has already been answered by applying the same criteria as James would (or at least the empirical one, see further down). They already take it as given that religious claims are largely false, and true only by lucky accident:

There is no evidence that there is something to us that lives on after death, and indeed the study of brain damage suggests that all there is to our personality is an emergent property of the physical. There is no evidence that the universe was created by a higher intelligence, and indeed it looks very much as if it wasn't. There is no evidence that the universe was created for our benefit, and indeed it looks very much as if it wasn't. There is no evidence that prayer works; and so on. There is also the small matter that hundreds of religions made and continue to make contradictory claims, meaning that only such a small percentage of them could be true as to be too close to zero percent to matter.

So given that background, the atheist now asks not what to think of a religious claim, but instead: How and why would people come up with something as wrong as that? And here hallucinations are a decent explanation for divine visions. That is why I feel that James' central argument in the first lecture misses its mark. But then again, he seemed to be more interested in reassuring religious readers than in criticising atheist ones anyway.

In this context it is also fascinating to examine what 'fruit' criteria James accepts as valid for judging spiritual and theological claims, now that he has rejected the 'root' criterion. He names three: immediate luminousness, philosophical reasonableness, and moral helpfulness.

Immediate luminousness is also described as based on 'our immediate feeling' upon being exposed to the claim. This seems rather oddly subjective and emotional, and at least in my eyes falls flat as a useful criterion.

Philosophical reasonableness is to be understood as based on how the claim relates to 'the rest of what we hold as true'. This is the most sensible of the three criteria, because that is also how we do it in science. If, for example, somebody presents us with the theories underlying homeopathy, such as water memory, we may consider in comparison what we believe we already understand about physics and chemistry. We then find that either large bodies of scientific knowledge supported by numerous experiments and empirical observations must all be utterly, mind-bogglingly wrong, or that, alternatively, homeopathy must be nonsense. At this stage it should be easy to figure out which of the two options strains our credulity less.

Still, in the context of religious truth claims, this approach appears unsatisfactory. How, after all, are any religious truth claims justified? If they are justified based on fitting into our body of scientific knowledge they are simply more scientific truth claims. If they are not, as is of course the case, then each religion constitutes a network of beliefs that may (or may not) be internally consistent but that is completely unmoored from other such networks and from observable reality. The philosophical reasonableness criterion will have a Christian accept a vision of Jesus in heaven as true and reject a vision of the imminent death of the sun as false, and it will have a pre-Columbian Aztec reject the former as false and accept the latter as true, with exactly the same justification. How useful.

Finally, moral helpfulness suffers from exactly the same flaw as the previous criterion does in a religious context. Unless the belief system is at some point anchored in empirical, observable reality, it is turtles all the way down.

Botany picture #255: Exocarpos nanus

2018-02-05T22:03:00.001+11:00

Currently we are back in Kosciuszko National Park for field work, and for the first time I have consciously seen Exocarpos nanus (Santalaceae), although it is so tiny that I may have previously stepped on it without noticing. Like its larger congeners it is a hemiparasite.

Bioregionalisation part 4: networks

2018-02-04T22:00:00.000+11:00

Having examined a clustering approach to bioregionalisation, today I will try to illustrate the increasingly popular alternative of network analysis.

Consider again our hypothetical study area of five cells with five taxa, where we want to know how to delimit bioregions (or phytoregions, given that the taxa are plant species) in an objective way:

The first step in the analysis is to interpret these data as a network. Specifically, as we have two different types of elements, what we are dealing with is called a bipartite network. Each type of element is connected directly only to elements of the other type, and to elements of its own type only via the other. In this case, the plant species are connected to all cells they occur in, and cells are connected to all plant species occurring in them:

Once we have scored this kind of network structure in a way that the software of our choice understands (either a list of connections or a matrix with 0s and 1s), we can use an algorithm that divides the network into modules. This algorithm tries to maximise connections within a module and to minimise the connections between modules, which in bioregion terms again means to maximise endemism.
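To illustrate the two input formats just mentioned, here is a minimal sketch in Python. The occurrence data are invented for illustration (five cells and five species with hypothetical names, not the ones from the figure), and the exact format a given network analysis program expects will differ.

```python
# Hypothetical occurrence data: which plant species were recorded in which cell.
occurrences = {
    "cell1": {"sp1", "sp2"},
    "cell2": {"sp1", "sp2"},
    "cell3": {"sp2", "sp3", "sp4"},
    "cell4": {"sp4", "sp5"},
    "cell5": {"sp4", "sp5"},
}

cells = sorted(occurrences)
species = sorted(set().union(*occurrences.values()))

# Format 1: a list of connections. Every link joins one cell to one species,
# never two cells or two species, which is what makes the network bipartite.
edge_list = [(cell, sp) for cell in cells for sp in sorted(occurrences[cell])]

# Format 2: a presence/absence matrix with cells as rows and species as columns.
matrix = [[1 if sp in occurrences[cell] else 0 for sp in species]
          for cell in cells]

for cell, sp in edge_list:
    print(cell, sp)

print("     ", " ".join(species))
for cell, row in zip(cells, matrix):
    print(cell, "  ".join(str(value) for value in row))
```

Both formats carry the same information: each 1 in the matrix corresponds to exactly one line in the list of connections.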

As indicated in the posts on clustering, network analysis has the great advantage that it not only produces groups but also provides a reproducible and objective answer to the question of the optimal number of groups, whereas in clustering analysis the user still has to make a subjective decision.

That being said, it is always possible to take a large module by itself and explore its internal structure, if so desired, although of course the answer may be that there are no meaningful subdivisions any more.

Either way, any such algorithm will return modules, and what we are mostly interested in is what cells belong to what module. Nonetheless we would also be able to infer what species belong to what module, and depending on the type of network analysis we may be able to get other statistics that may be of interest for the network and for each individual module or even each element.

There are two main approaches to network analysis that have been explored in bioregionalisation. The first is called the Map Equation, developed by Rosvall et al. (2009) and promoted with a sleek, eponymous website. It was first applied to bioregionalisation by Vilhena & Antonelli (2015). One of its advantages is that it is the faster of the two, which may be particularly attractive if one's dataset is large and complex.

The second is Modularity Analysis (Newman, 2006). This is the approach that I prefer personally, after colleagues at my institution conducted a study comparing the two network approaches and clustering with each other (Bloomfield et al., 2017). It is slower than the Map Equation, but it seems to be better at recognising the transitional nature of cells situated between two 'pure' modules, which the Map Equation appears to group into distinct modules in their own right.
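To give a feeling for what is being optimised, the following Python sketch computes the standard modularity score Q for a given division of a small network into modules. Several things here are assumptions on my part: the edge list is invented, the module assignments are chosen by hand rather than found by an optimisation algorithm, and the network is treated like an ordinary undirected graph, which ignores the bipartite structure in the null model. It should not be read as the procedure used in any of the cited studies.

```python
from collections import defaultdict

# Hypothetical cell-species links (five cells, five species).
edges = [
    ("cell1", "sp1"), ("cell1", "sp2"),
    ("cell2", "sp1"), ("cell2", "sp2"),
    ("cell3", "sp2"), ("cell3", "sp3"), ("cell3", "sp4"),
    ("cell4", "sp4"), ("cell4", "sp5"),
    ("cell5", "sp4"), ("cell5", "sp5"),
]


def modularity(edges, module_of):
    """Q = sum over modules of (within-module edges / m) - (module degree sum / 2m)^2."""
    m = len(edges)
    within = defaultdict(int)      # edges with both ends in the same module
    degree_sum = defaultdict(int)  # summed node degrees per module
    for u, v in edges:
        degree_sum[module_of[u]] += 1
        degree_sum[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            within[module_of[u]] += 1
    return sum(within[mod] / m - (degree_sum[mod] / (2 * m)) ** 2
               for mod in degree_sum)


nodes = {node for edge in edges for node in edge}

# Candidate 1: everything in a single module.
one_module = {node: 0 for node in nodes}

# Candidate 2: a hand-picked split into two modules.
first = {"cell1", "cell2", "sp1", "sp2"}
two_modules = {node: 0 if node in first else 1 for node in nodes}

print(modularity(edges, one_module))
print(modularity(edges, two_modules))
```

Lumping everything into one module scores zero by construction, while the two-module split scores about 0.39, because only one of the eleven links (cell3 to sp2) crosses the module boundary. A modularity algorithm searches over such divisions for the one with the highest score.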

Next time, how to do modularity analysis in practice.

References

Bloomfield NJ, Knerr N, Encinas-Viso F, 2017. A comparison of network and clustering methods to detect biogeographical regions. Ecography 41: 1-10.

Newman MEJ, 2006. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, USA 103: 8577-8582.

Rosvall M, Axelsson D, Bergstrom CT, 2009. The map equation. arXiv: 0906.1405 [physics.soc-ph]

Vilhena DA, Antonelli A, 2015. A network approach for identifying and delimiting biogeographical regions. Nature Communications 6: 6848.