Friday, March 23, 2018

Science spammers constantly reaching for new lows

I received the following two messages on the same day, in fact they were sitting right next to each other in the inbox.

Surely this is sad. Is there no such thing as taking pride in one's work, even among spammers? Recently some people tried to defraud me, and of course that is annoying, but at least they put a lot of effort into it. I was impressed by how much information they had to accumulate to seem half-convincing. These guys, on the other hand, use such a simplistic bot to produce their mass-emails that they are immediately recognisable as such.

Really the only thing sadder than these messages is that my spam filter is apparently still unable to understand that the keyword "greetings!!!!" is a certain indicator of spamminess.

Wednesday, March 21, 2018

Bioregionalisation part 6: Modularity Analysis with the R package rnetcarto

Today's final post in the bioregionalisation series deals with how to do a network or Modularity Analysis in R. There are two main steps here. First, because we are going to assume, as in the previous post, that we have point distribution data in decimal coordinates, we will turn them into a bipartite network of species and grid cells.

We start by defining a cell size. Again, our data are decimal coordinates, so we will use one degree cells.
cellsize <- 1
Note that this may not be the ideal approach for publication. The width of one degree cells decreases towards the poles, and in spatial analyses equal area grid cells are often preferred because they are more comparable. If we want equal area cells we first need to project our data into meters and then use a cellsize in meters (e.g. 100,000 for 100 x 100 km). There are R functions for such spatial projection, but we will simply use one degree cells here.

We make a list of all species and a list of all cells that occur in our dataset, naming the cells after their centres in the format "126.5:-46.5". I assume here that we have the data frame called 'mydata' from the previous post, with the columns species, lat and long.
allspecies <- unique(mydata$species)

longrounded <- floor(mydata$long / cellsize) * cellsize + cellsize/2

latrounded <- floor(mydata$lat / cellsize) * cellsize + cellsize/2

cellcentre <- paste(longrounded,latrounded, sep=":")

allcells <- unique(cellcentre)
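
As a quick sanity check of the rounding, here is a minimal sketch with three made-up coordinate pairs (they are not from the lycopod dataset). It shows that floor() rounds towards minus infinity, so negative coordinates also end up in the cell whose centre is named.

```r
# Toy check of the cell-centre rounding; the coordinates below are made up.
cellsize <- 1
long <- c(126.2, 126.9, -0.3)
lat  <- c(-46.8, -46.1, 0.3)

# Same formulas as in the post, applied to the toy vectors.
longrounded <- floor(long / cellsize) * cellsize + cellsize / 2
latrounded  <- floor(lat / cellsize) * cellsize + cellsize / 2
cellcentre  <- paste(longrounded, latrounded, sep = ":")

print(cellcentre)
```

The first two records fall into the same cell, and the third one lands in the cell centred on -0.5 and 0.5 rather than being rounded towards zero.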
We create a matrix of species and cells filled with all zeroes, which means that the species does not occur in the relevant cell. Then we loop through all records to set a species as present in a cell if the coordinates of at least one of its records indicate such presence.
mynetw <- matrix(0, length(allcells), length(allspecies))
for (i in 1:length(mydata[,1]))
  mynetw[ match(cellcentre[i],allcells) , match(mydata$species[i], allspecies) ] <- 1
It is also crucial to name the rows and columns of the network so that we can interpret the results of the Modularity Analysis.
rownames(mynetw) = allcells
colnames(mynetw) = allspecies
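
For large datasets the loop can become slow. A possible alternative, sketched here on made-up toy records, is to fill the matrix in a single assignment by indexing it with a two-column matrix of row and column positions. This is only one way of doing it and gives the same presence/absence matrix as the loop.

```r
# Toy records (made up); the columns mirror 'mydata' from the post.
mydata <- data.frame(
  species = c("A a", "A a", "B b", "A a"),
  lat  = c(-35.2, -35.7, -35.4, -12.1),
  long = c(149.1, 149.6, 149.3, 130.8),
  stringsAsFactors = FALSE
)
cellsize <- 1
cellcentre <- paste(floor(mydata$long / cellsize) * cellsize + cellsize / 2,
                    floor(mydata$lat / cellsize) * cellsize + cellsize / 2,
                    sep = ":")
allcells <- unique(cellcentre)
allspecies <- unique(mydata$species)

# Empty cell-by-species matrix, named right away.
mynetw <- matrix(0, length(allcells), length(allspecies),
                 dimnames = list(allcells, allspecies))

# A two-column index matrix addresses (row, column) pairs in one assignment.
mynetw[cbind(match(cellcentre, allcells),
             match(mydata$species, allspecies))] <- 1
print(mynetw)
```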
Now we come to the actual Modularity Analysis. We need to have the R library rnetcarto installed and load it.
library(rnetcarto)
The command to start the analysis is simply:
mymodules <- netcarto(mynetw, bipartite=TRUE)
This may take a bit of time, but after talking to colleagues who have got experience with other software it seems it is actually reasonably fast - for a Modularity Analysis.

Once the analysis is done, we may first wonder how many modules, which we will subsequently interpret as bioregions, the analysis has produced.
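
One way to count the modules might look like the sketch below. It assumes, as the plotting code in this post does, that the first element of the result is a data frame with the columns name and module. So that it runs without rnetcarto, the object here is a hand-made stand-in, not real netcarto output.

```r
# Mock stand-in for the netcarto() result, so this runs without rnetcarto.
# Assumption: the first list element is a data frame with 'name' and 'module'.
mymodules <- list(
  data.frame(name = c("149.5:-35.5", "130.5:-12.5", "145.5:-17.5"),
             module = c(0, 1, 1)),
  0.42  # placeholder standing in for the modularity value
)

# Number of distinct modules, and how many cells each module contains.
nmodules <- length(unique(mymodules[[1]]$module))
print(nmodules)
print(table(mymodules[[1]]$module))
```

With a real result, the same two lines at the end should work unchanged as long as the column is indeed called module.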
For publication we obviously want a decent map, but that is beyond the scope of this post. What follows is merely a very quick and dirty way of plotting the results to see what they look like, but of course the resulting coordinates and module numbers can also be used for fancier plotting. We split the latitudes and longitudes back out of the cell names, define a vector of colours to use for mapping (here thirteen; if you have more modules you will of course need a longer vector), and then we simply plot the cells like some kind of scatter plot.
allcells2 <- strsplit( as.character( mymodules[[1]]$name ), ":" )
allcells_x <- as.numeric(unlist(allcells2)[c(1:(length(allcells)))*2-1])

allcells_y <- as.numeric(unlist(allcells2)[c(1:(length(allcells)))*2])

mycolors <- c("green", "red", "yellow", "blue", "orange", "cadetblue", "darkgoldenrod", "black", "darkolivegreen", "firebrick4", "darkorchid4", "darkslategray", "mistyrose")

plot(allcells_x, allcells_y, col = mycolors[ as.numeric(mymodules[[1]]$module) ], pch=15, cex=2)
There we are. Modularity analysis with the R library rnetcarto is quite easy; the main problem was building the network.

As an example I have done an analysis with all Australian (and some New Guinean) lycopods, the dataset I mentioned in the previous post. It plots as follows.

There are, of course, a few issues here. The analysis produced six modules, but three of them, the green, orange and light blue ones, consist of only two, one and one cells, respectively, and they seem biologically unrealistic. They may be artifacts of not having cleaned the data as well as I would for an actual study, or represent some kind of edge effect. The remaining three modules are clearly more meaningful. Although they contain some outlier cells, we can start to interpret them as potentially representing tropical (red), temperate (yellow), and subalpine/alpine (dark blue) assemblies of species, respectively.

Despite the less than perfect results I hope the example shows how easy it is to do such a Modularity Analysis, and if due diligence is done to the spatial data, as we would do in an actual study, I would also expect the results to become cleaner.

Sunday, March 18, 2018

Botany picture #256: Solenostemon presumably

In spring we bought three types of Sempervivum (Crassulaceae) and planted them in a large bowl. Two little seedlings spontaneously came up in the succulent soil and, recognising them as members of my other favourite plant family Lamiaceae, I transferred them to a different pot where they would get more water.

I was curious to see what they would grow into - perhaps a useful aromatic herb? Well, they grew and grew and grew, but they did not flower until just now. Although it had become clear to me some time ago that they must be some kind of Solenostemon or relative and are presumably cultivated as ornamentals rather than as kitchen herbs I was hoping that they would at least have nice flowers. The reality, alas, is a bit of a let-down. Not terrible but not exactly stunning either. It is unlikely that they will survive winter anyway, as they are probably tropical plants.

In other news, Canberra was covered by dust blown in from western New South Wales today. The sky was of an otherworldly grey and only returned to its customary blue colour late in the afternoon.

Saturday, March 17, 2018

Bioregionalisation part 5: Cleaning point distribution data in R

I should finally complete my series on bioregionalisation. What is missing is a post on how to do a network (Modularity) analysis in R. But first I thought I would write a bit about how to efficiently do some cleaning of point distribution data in R. As so often, I write this because it may be useful to somebody who finds it via a search engine, but also because I can then look it up myself if I need it after not having done it for months.

The assumption is that we start our spatial or biogeographic analyses by obtaining point distribution data by querying e.g. for the genus or family that we want to study on an online biodiversity database or aggregator such as GBIF or Atlas of Living Australia. We download the record list in CSV format and now presumably have a large file with many columns, most of them irrelevant to our interests.

One problem that we may find is that there are numerous cases of records occurring in implausible locations. They may represent geospatial data entry errors such as land plants supposedly occurring in the ocean, or vouchers collected from plants in botanic gardens where the databasers for some reason entered the garden's coordinates instead of those of the source location, or other outliers that we suspect to be misidentifications. What follows assumes that this at least has been done already (and it is hard to automate anyway), but we can use R to help us with a few other problems.

We start up R and begin by reading in our data, in this case all lycopod records downloaded from ALA. (One of the advantages of that group is that very few of them are cultivated in botanic gardens, and I did not want to do that kind of data clean-up for a blog post.)
rawdata <- read.csv("Lycopodiales.csv", sep=",", na.strings = "", header=TRUE, row.names=NULL)
We now want to remove all records that lack any of the data we need for spatial and biogeographic analyses, i.e. identification to the species level, latitude and longitude. Other filtering may be desired, e.g. of records with too little geocode precision, but we will leave it at that for the moment. In my case the relevant columns are called genus, specificEpithet, decimalLatitude, and decimalLongitude, but that may of course be different in other data sources and require appropriate adjustment of the commands below.
rawdata <- rawdata[!(is.na(rawdata$decimalLatitude) | rawdata$decimalLatitude==""), ]
rawdata <- rawdata[!(is.na(rawdata$decimalLongitude) | rawdata$decimalLongitude==""), ]
rawdata <- rawdata[!(is.na(rawdata$genus) | rawdata$genus==""), ]
rawdata <- rawdata[!(is.na(rawdata$specificEpithet.1) | rawdata$specificEpithet.1==""), ]
All the records missing those data should be gone now. Next we make a new data frame containing only the data we are actually interested in.
lat <- rawdata$decimalLatitude
long <- rawdata$decimalLongitude
species <- paste( as.character(rawdata$genus), as.character(rawdata$specificEpithet.1), sep=" " )
mydata <- data.frame(species, lat, long)
mydata$species <- as.character(mydata$species)
Unfortunately at this stage there are still records that we may not want for our analysis, but they can mostly be recognised by having more than the two usual name elements of genus name and specific epithet: hybrids (something like "Huperzia prima x secunda" or "Huperzia x tertia") and undescribed phrase name taxa that may or may not actually be distinct species ("Lycopodiella spec. Mount Farewell"). At the same time we may want to check the list of species in our data table with unique(mydata$species) to see if we recognise any other problems that actually have two name elements, such as "Lycopodium spec." or "Lycopodium Undesignated". If there are any of those, we place them into a vector:
kickout <- c("Lycopodium spec.", "Lycopodium Undesignated")
Then we loop through the data to get rid of all these problematic entries.
myflags <- rep(TRUE, length(mydata[,1]))
for (i in 1:length(myflags))
  if ( (length(strsplit(mydata$species[i], split=" ")[[1]]) != 2) || (mydata$species[i]) %in% kickout )
    myflags[i] <- FALSE
mydata <- mydata[myflags, ]
If there is no 'kickout' vector for undesirable records with two name elements, we do the same but adjust the if command accordingly to not expect its existence.
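
The same filter can also be written without a loop. The following is a minimal sketch on a made-up list of names, reusing the placeholder names from above. It counts the name elements with lengths(strsplit()) and combines the two conditions into one logical vector.

```r
# Toy species column (names are placeholders, as in the text above).
mydata <- data.frame(
  species = c("Huperzia prima", "Huperzia prima x secunda",
              "Lycopodium spec.", "Lycopodiella spec. Mount Farewell",
              "Lycopodium deuterodensum"),
  stringsAsFactors = FALSE
)
kickout <- c("Lycopodium spec.", "Lycopodium Undesignated")

# Number of space-separated name elements per record.
nparts <- lengths(strsplit(mydata$species, split = " "))

# Keep records with exactly two name elements that are not in 'kickout'.
keep <- nparts == 2 & !(mydata$species %in% kickout)
mydata <- mydata[keep, , drop = FALSE]
print(mydata$species)
```

If there is nothing to kick out, setting kickout <- character(0) makes the second condition TRUE for every record, so the command itself does not need to be changed.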

Check again unique(mydata$species) to see if the situation has improved. If there are instances of name variants or outdated taxonomy that need to be corrected, that is surprisingly easy with a command along the following lines:
mydata$species[mydata$species == "Outdatica fastigiata"] = "Valida fastigiata"
In that way we can efficiently harmonise the names so that one species does not get scored as two just because some specimens still have an outdated or misspelled name.

Although we assume that we have checked for geographic outliers, we may now still want to limit our analysis to a specific area. In my case I want to get rid of non-Australian records, so I remove every record outside of a box of 9.5 to 44.5 degrees south and 111 to 154 degrees east around the continent. It turns out that this leaves parts of New Guinea in, but that is fine with me for present purposes; we don't want to over-complicate this now.
mydata <- mydata[mydata$long<154, ]
mydata <- mydata[mydata$long>111, ]
mydata <- mydata[mydata$lat>(-44.5), ]
mydata <- mydata[mydata$lat<(-9.5), ]
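
The four steps can also be combined into a single logical condition, which some may find easier to read. A minimal sketch with four made-up records, two inside the box and two outside:

```r
# Toy records (made up): two inside the bounding box, two outside.
mydata <- data.frame(
  species = c("A a", "B b", "C c", "D d"),
  lat  = c(-35.3, -5.0, -42.9, -20.0),
  long = c(149.1, 145.0, 147.3, 160.0),
  stringsAsFactors = FALSE
)

# TRUE only for records inside 111 to 154 degrees east and 9.5 to 44.5 degrees south.
inbox <- mydata$long > 111 & mydata$long < 154 &
         mydata$lat > -44.5 & mydata$lat < -9.5
mydata <- mydata[inbox, ]
print(mydata)
```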
At this stage we may want to save the cleaned up data for future use, just in case.
write.table(mydata, file = "Lycopodiales_records_cleaned.csv", sep=",")
And now, finally, we can actually turn the point distribution data into grid cells and conduct a network analysis, but that will be the next (and final) post of the series.

Saturday, March 10, 2018

Reading The Varieties of Religious Experience: Lecture 2

In his second lecture, James defines what he would consider 'religion' to be for the purposes of the lecture series.

He stresses right at the beginning that religion is such a complex phenomenon that anybody who thinks they can come up with a clear and simple definition is fooling themselves. He then mentions two aspects, the organisational structure (the church with its office holders and buildings) and the personal beliefs and feelings of each believer, and he excludes the former from consideration to focus his efforts on the latter.

That is unsurprising, given his psychological approach, and fair enough. A historian would perhaps be most comfortable addressing religion as an organised body while excluding personal psychology from their considerations. What I find interesting to observe, however, is that one aspect of religion as I see it is not even mentioned. To me, schools of thought that make truth claims, be they ideologies, religions, or scientific, philosophical, scholarly, and engineering communities, have three main components:
  • The people who adhere to the school of thought; they are the focus of James' lectures,
  • The institutional framework (research institutions, churches, political parties, think tanks, journals, internet fora, conferences, etc.); this James mentioned but excluded from consideration, and
  • The actual body of knowledge or belief system; it appears to remain unexamined so far.
Because 90% of the lectures are still to follow I don't want to dwell on this too much, but I find it interesting even at this stage that James appears curiously incurious about the first question that would come to my mind when faced with a school of thought: are its beliefs true? I guess I will see if he will go there later or if he will remain completely uninterested in that question throughout.

After having settled on the personal relationship of an individual human to the divine as his focus, James clarifies that believing in an actual personal god is not a criterion for him. He mentions 'Emersonianism' and Buddhism as examples of systems that work to produce religious feelings without having personalised deities. I had never heard of Emersonianism, but it appears to be a variant of pantheism, seeing the whole universe as divine and (believe it or not) benign.

Finally, James spends an astonishingly large part of his second lecture on discussing what mindsets he considers truly religious and what mindsets he does not. Again and again he negatively contrasts the philosophical, Stoicist acceptance of the way the world is with the Christian ideal of a joyous embrace of whatever happens, no matter how terrible. Although he sometimes calls the ascetic or highly spiritual Christian 'extreme', the language he uses leaves no doubt that he considers mindless exultation in the face of, say, seeing a loved one dying terribly to be an admirable state of mind, as evidence that religion is a positive force for humanity.

Again I hesitate to immediately reject his argumentation given how little I have progressed into this book, but even here I cannot help wondering whether this view does not rely quite a bit on a conflation of many different injustices or tribulations to which, really, we would be justified to react in very different ways. We are not merely talking about "the universe is unfair, and a truly wise person will accept that they can only do their best and be happier for it". No, depending on what we are talking about and if we assume gods to exist we may reasonably take very different stances - and I would actually say that religious bliss is the appropriate stance in none of the various cases.

We cannot always get all we wanted. Some things are unachievable, and sometimes we have to compromise with other people. Accepting that is just a sign of maturity. (Embracing such compromises joyously would seem to be a bit twee, though.)

Then there are the evils we do to each other, such as theft, bullying, rape, murder, etc. Really one of the most frustrating facets of human existence is how much needless misery we cause each other, both deliberately and accidentally, given that we would have quite enough misery left to deal with even if we were all perfectly nice to each other (see next point). Point is, in this case the perpetrators generally have a moral responsibility to do better, and joyously accepting their bad deeds is both unreasonable and counterproductive, as it will set perverse incentives and reward bad actors.

What James must really be talking about, however, would have to be 'natural evils', harm to us that is no other human's fault, everything ranging from having to die of old age across natural disasters to people being born with a genetic disorder. Under the (atheist) assumption that there is no god behind these phenomena, that they just happen, James' preferred stance of a joyous embrace would be ridiculous. Stoicist acceptance of what cannot be undone while trying one's best to undo these evils is a more sensible approach.

But what if we assume that natural evils are caused or at least allowed to happen by an omnipotent god who could, with the snap of their metaphorical finger, deliver us from such needless suffering? Does it make sense, under this assumption, to write, "dear superior intelligence running the universe, please accept my heartfelt thanks for making me slowly die of an untreatable, incredibly painful disease; and while on that topic, thanks also for that landslide that crushed my best friend when we were twelve years old"?

I can't say that this would feel sane to me. I would have some very serious questions about the moral character and motivations of such gods, if I believed for a moment that they existed. But then again, James acknowledges himself that there are some people who are unable to have religious feelings as he defined them. I assume I am one of those people, for better or for worse.

And note also that there are presumably many people who would consider themselves religious but who do not feel what James considers to be the religious impulse at its most pure.

Thursday, March 8, 2018

Alpha diversity and beta diversity

At today's journal club meeting, we discussed Alexander Pyron's opinion piece "We don't need to save endangered species - extinction is part of evolution". I mentioned it in passing before and still think that his core argument, which is also reflected in the title, is logically equivalent to saying that murder is okay because all humans are going to die of natural causes one day anyway. But reading his piece more thoroughly than before, I now notice a few other, um, problems. The highlights:
Species constantly go extinct, and every species that is alive today will one day follow suit. There is no such thing as an "endangered species," except for all species.
What weirds me out here is the lack of a phylogenetic perspective in a piece written by a systematist - species are discussed as individuals that pop out of thin air and then disappear again. Of course, in the very long run every species will one day go extinct when the sun expands and boils off the oceans. But until then, in the time frame that Pyron discussed, no, not every species will go extinct, quite a few of them will diversify and survive as numerous descendant species, as did the ancestor of all land vertebrates or the ancestor of all insects in the past. They thus become effectively immortal (until, once more, the sun explodes anyway, etc.).
Yet we are obsessed with reviving the status quo ante. The Paris Accords aim to hold the temperature to under two degrees Celsius above preindustrial levels, even though the temperature has been at least eight degrees Celsius warmer within the past 65 million years. Twenty-one thousand years ago, Boston was under an ice sheet a kilometer thick. We are near all-time lows for temperature and sea level; whatever effort we make to maintain the current climate will eventually be overrun by the inexorable forces of space and geology.
This is sadly a classic of climate change denialism. Yes, there was change in the past too, but there are some major differences. One is the rate of change - the impacts we are having are coming much faster than most natural changes (excepting e.g. meteorite strikes and similarly sudden events), so that animals and plants have less of a chance to migrate or to adapt than they had in past cycles of warm and ice ages. Second, they have even less of a chance to migrate because we have fragmented their available habitats by putting roads, towns, croplands and pastures into their way. Third, past changes did not affect a highly urbanised human population of more than seven billion people; the potential of global change producing catastrophic results even just for us is much greater now than when we were just a few million widely dispersed hunter-gatherers. So yes, it is true that we cannot freeze the status quo in place forever, but I think we would do well to slow the rate of change as far as possible.
Infectious diseases are most prevalent and virulent in the most diverse tropical areas. Nobody donates to campaigns to save HIV, Ebola, malaria, dengue and yellow fever, but these are key components of microbial biodiversity, as unique as pandas, elephants and orangutans, all of which are ostensibly endangered thanks to human interference.
I just don't even. What is the logic here? "Nobody cares about conserving diseases that horribly kill us humans, so we should not care about conserving harmless pandas either?" How does that follow?
And if biodiversity is the goal of extinction fearmongers, how do they regard South Florida, where about 140 new reptile species accidentally introduced by the wildlife trade are now breeding successfully? No extinctions of native species have been recorded, and, at least anecdotally, most natives are still thriving. The ones that are endangered, such as gopher tortoises and indigo snakes, are threatened mostly by habitat destruction. Even if all the native reptiles in the Everglades, about 50, went extinct, the region would still be gaining 90 new species -- a biodiversity bounty. If they can adapt and flourish there, then evolution is promoting their success. If they outcompete the natives, extinction is doing its job.
And this is perhaps what frustrates me most, because while this is not an uncommon argument against biosecurity measures one would expect a biologist to know about different types of biodiversity instead of confusing them. To explain more clearly what is going on, consider the following diagrams. First, we have three areas, roundland, squareland, and hexagonland, with two endemic species each.

Then humans recklessly move species between the areas, allowing them to invade each other's natural ranges. It turns out that three of the species are particularly competitive and prosper at the cost of the other three, driving them to extinction.

Now there are three types of diversity to consider. The first is alpha-diversity, which means simply the number of species in a given place. As we see it has gone up by 50% in all three areas, from two to three species. Yay, more diversity! This is what Pyron proudly points at in Florida.

What is lost, however, is beta-diversity or turnover, that is the heterogeneity you observe as you move between areas. It was very high originally, as every area had its unique species, but now it has been wiped out entirely. Beta-diversity in the second diagram is precisely zero. Under the first scenario a squarelander can go on a holiday trip to roundland and admire the unique flora of that part of the world; under the second scenario they will travel to roundland and merely see the same few weeds that they have growing in their own front yard back home. And the endemic plants of hexagonland have all gone extinct, a 100% loss of that area's irreplaceable evolutionary history.

(Note that beta-diversity would also be zero if all six species survived everywhere. But that is clearly not a realistic assumption, as it would require each area to have such a high carrying capacity that they should each have evolved more than two species to begin with. We would not expect that all the plant species of the world could survive next to each other in, say, Patagonia, even if they were all introduced there.)

Finally, in our example global diversity has of course also been reduced, by 50%. So yeah, great to have more alpha-diversity in Florida, but does that make up for a massive net loss in both beta-diversity and global diversity? The argument seems rather misguided.
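
For those who like to see the arithmetic, here is a minimal sketch in R of the three-area example. The species names sp1 to sp6 are made up, and which three species survive is an assumption on my part for illustration (here the two from roundland and one from squareland). As a measure of beta-diversity I use the mean pairwise Jaccard dissimilarity between areas, which is only one of several possible turnover measures.

```r
# Presence/absence by area (rows) and species (columns); names are made up.
before <- rbind(roundland   = c(1, 1, 0, 0, 0, 0),
                squareland  = c(0, 0, 1, 1, 0, 0),
                hexagonland = c(0, 0, 0, 0, 1, 1))
# Assumed outcome: sp1, sp2 and sp3 survive and occur everywhere.
after  <- rbind(roundland   = c(1, 1, 1, 0, 0, 0),
                squareland  = c(1, 1, 1, 0, 0, 0),
                hexagonland = c(1, 1, 1, 0, 0, 0))
colnames(before) <- colnames(after) <- paste0("sp", 1:6)

# Alpha-diversity: number of species per area.
alpha_div <- function(m) rowSums(m > 0)

# Global diversity: number of species present in at least one area.
global_div <- function(m) sum(colSums(m) > 0)

# Beta-diversity as mean pairwise Jaccard dissimilarity between areas.
beta_div <- function(m) {
  pairs <- combn(nrow(m), 2)
  mean(apply(pairs, 2, function(p) {
    a <- m[p[1], ] > 0
    b <- m[p[2], ] > 0
    1 - sum(a & b) / sum(a | b)
  }))
}

print(rbind(alpha_before = alpha_div(before), alpha_after = alpha_div(after)))
print(c(global_before = global_div(before), global_after = global_div(after)))
print(c(beta_before = beta_div(before), beta_after = beta_div(after)))
```

Under these assumptions alpha-diversity rises from two to three species per area, while global diversity drops from six to three and the dissimilarity between areas drops from one to zero.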

Sunday, March 4, 2018

Reading The Varieties of Religious Experience: Lecture 1

I have started reading William James' The Varieties of Religious Experience. Published first in 1902, this collection of twenty lectures is considered to be a classic of the study of religion. It approaches the subject from a psychological as opposed to theological, historical, or apologetic angle, but appears to remain rather charitable towards religious beliefs.

This becomes clear already in the first lecture, much of which is spent assuring the believing reader that they have no reason to be offended by a psychological examination of religious experience.

James calls 'medical materialism' the idea that religion originated as the hallucinations and ravings of 'psychopaths' and 'degenerates' and can therefore be dismissed. (His words; see e.g. the interpretation of Saint Paul's vision of Jesus as the result of an epileptic seizure.) He argues that the value of a phenomenon, here religious truth claims, cannot be deduced from its origins; as an argumentum ad absurdum he points out that a scientific insight would be judged on its own merits even if the scientist who gained it was suffering from some mental disorder. By their fruits ye shall know them, not by their roots.

Well, fair enough, one might say. But while I cannot tell what the state of the discussion was around the year 1900, it seems as if this argument would miss the point of 'medical materialism' as it is applied today. Taking the position of an atheist, it is not the case that they attempt to answer the question of what to think of religious truth claims by looking at how they originated. They would most likely argue that that particular question has already been answered by applying the same criteria as James would (or at least the empirical one, see further down). They already take it as given that religious claims are largely false, and true only by lucky accident:

There is no evidence that there is something to us that lives on after death, and indeed the study of brain damage suggests that all there is to our personality is an emergent property of the physical. There is no evidence that the universe was created by a higher intelligence, and indeed it looks very much as if it wasn't. There is no evidence that the universe was created for our benefit, and indeed it looks very much as if it wasn't. There is no evidence that prayer works; and so on. There is also the small matter that hundreds of religions made and continue to make contradictory claims, meaning that only such a small percentage of them could be true as to be too close to zero percent to matter.

So given that background, the atheist now asks not what to think of a religious claim, but instead: How and why would people come up with something as wrong as that? And here hallucinations are a decent explanation for divine visions. That is why I feel that James' central argument in the first lecture misses its mark. But then again, he seemed to be more interested in reassuring religious readers than in criticising atheist ones anyway.

In this context it is also fascinating to examine what 'fruit' criteria James accepts as valid for judging spiritual and theological claims, now that he has rejected the 'root' criterion. He names three: immediate luminousness, philosophical reasonableness, and moral helpfulness.

Immediate luminousness is also described as based on 'our immediate feeling' upon being exposed to the claim. This seems rather oddly subjective and emotional, and at least in my eyes falls flat as a useful criterion.

Philosophical reasonableness is to be understood as based on how the claim relates to 'the rest of what we hold as true'. This is the most sensible of the three criteria, because that is also how we do it in science. If, for example, somebody presents us with the theories underlying homeopathy, such as water memory, we may consider in comparison what we believe we already understand about physics and chemistry. We then find that either large bodies of scientific knowledge supported by numerous experiments and empirical observations must all be utterly, mind-bogglingly wrong, or that, alternatively, homeopathy must be nonsense. At this stage it should be easy to figure out which of the two options strains our credulity less.

Still, in the context of religious truth claims, this approach appears unsatisfactory. How, after all, are any religious truth claims justified? If they are justified based on fitting into our body of scientific knowledge they are simply more scientific truth claims. If not, as is of course the case, then each religion constitutes a network of beliefs that may (or may not) be internally consistent but that is completely unmoored from other such networks and from observable reality. The philosophical reasonableness criterion will have a Christian accept a vision of Jesus in heaven as true and reject a vision of the imminent death of the sun as false, and it will have a precolumbian Aztec reject the former as false and accept the latter as true, with exactly the same justification. How useful.

Finally, moral helpfulness suffers from exactly the same flaw as the previous does in a religious context. Unless the belief system is at some point anchored on empirical, observable reality, it is turtles all the way down.