Wednesday, August 30, 2017

Botany picture #251: Schizaea bifida


I have been meaning to post something more substantive, but I have been very busy with other things. So here is another plant, and a strange-looking one: Schizaea bifida (Schizaeaceae), a little fern from the Blue Mountains, 2011. Yes, except for the rootstock that is it, that's the whole fern. Some species of Schizaea have finely divided sterile leaves, but as the Flora of NSW Online remarks, in this species they are "rarely present", so what you see is a cluster of green lines with sporangia on top, the fertile leaves in all their splendour. But who is complaining? I love weird plants.

This was one of the plant groups I had only read about before coming to Australia, so when I saw it during a family holiday I was very happy. Sadly I have not again run into the genus since that one time six years ago.

Friday, August 25, 2017

Phylogenetic trees in XML format

I recently had the extremely frustrating experience of having to look into how phylogenetic trees are coded in XML format. To illustrate why this was frustrating, let us start by considering a small phylogenetic tree as an example. I got one sequence each for the genera Nassauvia, Erigeron, Xerochrysum, Matricaria, Lactuca, Senecio, Ursinia, Calycera, Kippistia, and Synedrella from Genbank, and produced a likelihood tree in PAUP. In graphical representation it looks as follows.



So how is it generally saved? The most concise way of storing a phylogenetic tree in plain text files is the venerable and widely accepted Newick standard. It consists of OTU names separated by commas and grouped into clades by round brackets. There may be numbers after colons, which are branch lengths, and if there is a number directly after a closing bracket it indicates some kind of support value, such as bootstrap or Bayesian posterior probability. The Newick representation of my little example tree is as follows.



Again, very concise. If we want just a tiny bit more bells and whistles we can use the Nexus format. In the context of phylogenetic trees it is just the Newick format plus "#nexus [line break] begin trees;" at the beginning and "end;" after the trees, and then each Newick tree has "tree [name of tree] =" in front of it and another semicolon at the end. The main advantage is that multiple trees in the same file can now have informative names, whereas in a Newick file they cannot.
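To show how little the Nexus wrapper adds, here is a minimal sketch in Python that builds a trees block as described above. The function name `newick_to_nexus` and the toy tree are my own inventions for illustration, and real Nexus files may contain further blocks and commands that this ignores.

```python
def newick_to_nexus(named_trees):
    """Wrap Newick strings in a minimal Nexus trees block.

    named_trees maps a tree name to a Newick string (hypothetical input format).
    """
    lines = ["#nexus", "begin trees;"]
    for name, newick in named_trees.items():
        # Drop any trailing semicolon so each tree statement ends with exactly one.
        newick = newick.strip().rstrip(";")
        lines.append(f"tree {name} = {newick};")
    lines.append("end;")
    return "\n".join(lines)


nexus_text = newick_to_nexus({"example": "(A,(B,C));"})
print(nexus_text)
```

Four short lines of overhead per file, plus a name per tree, and that is all.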

If we want to find out how this would look in XML format, we can head over to the nexml.org website, where we will find an online tool that can transform our boring old Newick or Nexus trees into shiny, exciting, newfangled NeXML trees (for Nexus-inspired XML I guess, although as we will soon see there isn't really any similarity at all). Of course for this post I have done that with the example tree.



So, what do we see? As the name XML implies, the format is similar to HTML in that it consists largely of nested sets of tags starting with is-smaller-than signs and ending with is-larger-than signs. But those are just the optics. What about functionality?

As a Newick file, my phylogenetic tree was 448 bytes in size. After transformation into NeXML, the new tree file is 2645 bytes in size, an increase of 490%. This has several obvious benefits, in particular for the results of Bayesian analyses where thousands of trees have to be saved and may take up megabytes even in Newick format; for example, I can't think of any right now.
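For the record, the arithmetic behind that percentage, using only the two file sizes given above (the variable names are mine):

```python
newick_bytes = 448    # size of the example tree as Newick
nexml_bytes = 2645    # size of the same tree after conversion to NeXML

# Relative increase in percent, and the plain size ratio.
increase_percent = (nexml_bytes - newick_bytes) / newick_bytes * 100
size_ratio = nexml_bytes / newick_bytes

print(f"increase: {increase_percent:.0f}%, ratio: {size_ratio:.1f}x")
# prints "increase: 490%, ratio: 5.9x"
```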

And I am not even going to go into how NeXML scores data matrices beyond observing that it appears to require a tag assigning character type for every individual character. In other words, instead of saying something like "characters 1-9000 are anonymous genome-wide SNPs with the possible states 0, 1, 2 and ?", as in Nexus files, you would have 9000 lines of code (!) each saying "character 4306 is a SNP character" and then "character 4307 is a SNP character", and so on, wasting enormous amounts of disk space and/or bandwidth. Efficiency!

More generally, the structure of the tree coded as NeXML is extremely convoluted compared to what it looks like in Newick format. Newick is, as mentioned above, a set of nested brackets indicating clades; consequently it can be examined and read relatively easily, and even allows the user to copy subtrees in or out in manual editing (it helps if you have a text editor like SciTE that shows which brackets belong together). In fact I have often produced hypothetical example trees to illustrate a point on this blog by typing them out in Newick format and then opening them in a tree viewer. NeXML, however, has a list of nodes and edges that refer to each other via obscure identifiers, making it virtually impossible to read, type out and edit manually, especially for larger trees. But I am sure XML makes life easier for the end user because please insert reasons here.
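To illustrate the difference in structure, here is a small Python sketch that takes a tree stored as a node list and an edge list and turns it back into nested brackets. The identifiers (`n1`, `n2`, ...) and the function name `edges_to_newick` are made up for this example; this is plain Python data mimicking the node-and-edge idea, not actual NeXML.

```python
def edges_to_newick(labels, edges, root):
    """Convert a node/edge description of a rooted tree into a Newick string.

    labels: dict mapping tip node ids to taxon names
    edges:  list of (source id, target id) pairs
    root:   id of the root node
    """
    # Collect the children of every node, in the order the edges are listed.
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    def render(node_id):
        kids = children.get(node_id, [])
        if not kids:
            return labels[node_id]  # a tip: just its name
        return "(" + ",".join(render(kid) for kid in kids) + ")"

    return render(root) + ";"


labels = {"n3": "A", "n4": "B", "n5": "C"}
edges = [("n1", "n2"), ("n1", "n5"), ("n2", "n3"), ("n2", "n4")]
print(edges_to_newick(labels, edges, "n1"))  # prints "((A,B),C);"
```

Note how the edge list has to be followed from identifier to identifier before the clade (A,B) becomes visible, whereas the Newick string shows it at a glance.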

Next, imagine writing a program that should be able to read a phylogeny. If you want it to read a Newick tree, you merely need to parse nested brackets, recognise taxon names, and deal with branch length and support value annotations; this is relatively straightforward. If you want it to be able to read NeXML trees, on the other hand, it needs to be able to handle a large number of possible tags in varying order, plus various parameters in each tag that can appear in varying order (<node id="ne16" otu="ou27" label="Senecio_vulgaris"/> could just as well be <node otu="ou27" label="Senecio_vulgaris" id="ne16"/>, for example). This makes life easier for programmers because I'm sorry I really have no idea. But I mean, the nexml.org website says that this format is "more easily validated and processed", so that must be true, right? Otherwise they wouldn't claim so, would they?
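To make the Newick half of that comparison concrete, here is a minimal recursive parser in Python. It is a sketch under simplifying assumptions: no quoted names, no comments, no whitespace inside the tree, and a bare number after a closing bracket is treated as a support value. The names `parse_newick` and `tip_names` and the dictionary layout are my own choices, not any program's actual API.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos].strip()

    def read_node():
        nonlocal pos
        node = {"name": None, "support": None, "length": None, "children": []}
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip the opening bracket
            while True:
                node["children"].append(read_node())
                if pos >= len(text):
                    raise ValueError("unbalanced brackets")
                symbol = text[pos]
                pos += 1
                if symbol == ")":
                    break
                if symbol != ",":
                    raise ValueError(f"unexpected {symbol!r} at position {pos - 1}")
            label = read_label()  # support value or internal node name
            if label:
                try:
                    node["support"] = float(label)
                except ValueError:
                    node["name"] = label
        else:
            node["name"] = read_label()  # a tip
        if pos < len(text) and text[pos] == ":":
            pos += 1
            node["length"] = float(read_label())
        return node

    root = read_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return root


def tip_names(node):
    """List the taxon names at the tips, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in tip_names(child)]


tree = parse_newick("((A:0.1,B:0.2)95:0.05,C:0.3);")
print(tip_names(tree))  # prints "['A', 'B', 'C']"
```

A production parser would need to handle the cases excluded above, but the core really is just nested brackets, names, and two kinds of numbers.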

While on the topic of phylogenetics software, to the best of my knowledge none of the programs that I currently use or have seriously used in the past can read or write phylogenies in XML format. BEAST, PAUP, and MrBayes produce Nexus files, TNT exports its own idiosyncratic format or Nexus, and RAxML produces Newick files. (BEAST uses famously convoluted XML input files, but even here the assumption is that most users import Nexus data matrices into the GUI BEAUTi. At any rate it does not save its output as NeXML.) Mesquite, which uses Nexus as its default format, is supposed to be able to export into NeXML format once we install a certain add-on library, but when I tried to do such a conversion I merely got an incomprehensible crash report.

Perhaps more to the point, if NeXML phylogenies produced by some obscure phylogenetics software that I never employ myself are supposed to be of use they have to be displayed, so how are we doing for tree viewers? The very popular cross-platform software FigTree expects Nexus or Newick phylogenies, and as far as I know the same is true for TreeView. DendroScope claims to read NeXML files but then only gave me an error message when I tried to import the simple example phylogeny after conversion by the official nexml.org website. To quote from that same website, "the future data exchange standard is here!"

While on that topic, standardisation is one of the main benefits claimed by NeXML or by XML more generally. As Simon St. Laurent wrote already in 1998:
XML allows developers to set standards defining the information that should appear in a document, and in what sequence. XML, in combination with other standards, makes it possible to define the content of a document separately from its formatting, making it easy to reuse that content in other applications or for other presentation environments. Most important, XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organizations without needing to pass through many layers of conversion.
I guess at this stage it should come as no surprise at all that there are already at least two different XML standards for phylogenetic trees, which is another way of saying that there is no XML standard for phylogenetic trees. In addition to NeXML, which I have discussed in detail above, there is phyloXML. Where NeXML describes trees using lists of nodes and edges, phyloXML uses nested clade tags, which I find more intuitive and useful because it allows easier parsing and easier manual editing, and which is also more similar in spirit to Newick and Nexus and would thus be more deserving of a name like NeXML than NeXML. Otherwise it appears to be just as inefficient and convoluted, though.
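To show what nesting clade tags means in practice, here is a rough Python sketch that writes a tree given as nested tuples into nested clade elements. It is deliberately stripped down to clade and name tags only and is not checked against the phyloXML schema, so treat the output as an illustration of the idea rather than a valid phyloXML file. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_clade(parent, tree):
    """Append a clade element for `tree` (a tip name or a tuple of subtrees)."""
    clade = ET.SubElement(parent, "clade")
    if isinstance(tree, str):
        ET.SubElement(clade, "name").text = tree  # a tip
    else:
        for subtree in tree:
            add_clade(clade, subtree)  # nested clades mirror nested brackets
    return clade


def nested_clades(tree):
    """Return a simplified nested-clade XML string for a tuple tree."""
    phylogeny = ET.Element("phylogeny")
    add_clade(phylogeny, tree)
    return ET.tostring(phylogeny, encoding="unicode")


xml_text = nested_clades((("A", "B"), "C"))
print(xml_text)
```

The nesting follows the brackets of the Newick string ((A,B),C); one to one, just with a great many more characters.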

So concerning standardisation I guess the reality is that XML is flexible enough that anybody could come up with a new, XML-based standard. Just think of a few words, put is-smaller-than and is-larger-than signs around them, convince a handful of colleagues to adopt this standard, and off you go. Yes, if it is so easy to do then everybody will do it, and then we achieve the exact opposite of standardisation, but I guess that is where XML proponents can switch to touting its "flexibility". Heads XML wins, tails all other data standards lose.

As far as I can see Newick and Nexus work just fine. Compared to XML phylogenies they are easier to parse, are already standardised, are accepted by virtually every phylogenetics software and tree viewer, and take up a fraction of the disk space. Why fix what isn't broken?

Sunday, August 20, 2017

I still don't get area cladistics, and 'geographic paralogy' in particular

Since I started looking into panbiogeography and area cladistics, I have been curious about the concept of geographic paralogy. The word is used by area cladists (in the widest sense), and I have so far been doubtful about whether the analogy to gene paralogy fits.

To recap, area cladistics attempts to infer biogeographic area relationships from the patterns that species' areas of distribution show on a phylogenetic tree. If, for example, several plant or animal groups show distributions on a phylogeny that are ( Africa, ( South America , Australia ) ), i.e. sister lineages are endemic to South America and Australia, and more distantly related lineages are endemic to Africa, then an area cladist would conclude that South America and Australia are "more closely related" biogeographically than either is to Africa, or even that they form a "monophyletic biogeographic area".

Whatever that is supposed to mean, given that the word monophyletic only applies if we presuppose tree-like relationships. But I am getting ahead of myself.

The problem is now that phylogenies do not necessarily show such a simple pattern. Some species may be widespread and occur in several of the areas in the analysis, and of course the same area may occur repeatedly in different parts of the phylogeny. This is what area cladists call 'geographic paralogy', and they 'solve' the problem it poses for their analyses by selecting 'paralogy-free' subtrees from a phylogeny.
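As a small illustration of the pattern itself, here is a Python sketch that lists the areas occurring more than once among the terminals of a tree. It only encodes the description given in this paragraph, with a made-up tree notation (nested tuples, and "&" joining the areas of a widespread species); it is not the subtree-selection procedure that area cladists actually use.

```python
from collections import Counter


def repeated_areas(tree):
    """Return the areas that appear at more than one terminal of the tree.

    tree: a terminal is a string of areas joined by "&"; an internal node
    is a tuple of subtrees (hypothetical notation for this example).
    """
    counts = Counter()

    def visit(node):
        if isinstance(node, str):
            counts.update(area.strip() for area in node.split("&"))
        else:
            for child in node:
                visit(child)

    visit(tree)
    return sorted(area for area, n in counts.items() if n > 1)


simple = ("Africa", ("South America", "Australia"))
messy = (("Africa", "South America"), ("South America&Australia", "Australia"))
print(repeated_areas(simple))  # prints "[]"
print(repeated_areas(messy))   # prints "['Australia', 'South America']"
```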

Again, two questions: Does it make sense to call this geographic paralogy, in analogy to gene paralogy? And does it make sense to do area cladistics by cherry-picking 'paralogy-free' subtrees, effectively ignoring these patterns?

I started a conversation with a colleague at the IBC, and he recommended I read Ladiges (1998, "Biogeography after Burbidge", Australian Systematic Botany 11: 231-242) as an introduction to the relevant concepts and approaches. So this I have now done. Unfortunately, the paper did not really solve my conceptual problems. I will start with a few quotes:
In cladistic biogeography, nodes of a cladogram for organisms (1,2 and 3) are potentially informative about the geographic areas (A, B and C) in which they occur: node 2 in Fig. 3 shows that areas B and C are related more closely to each other than to area A.

Such statements of relationship, the nodes of the cladogram, are explained by a variety of historical theories. One is dispersal from a restricted ancestral area, for example from area A to areas B and C, a pattern that may match fossil ages and distribution. An alternative explanation is vicariance of a widespread ancestral species coincident with physical breakup or climatic differentiation of the general area. A vicariance explanation is favoured by evidence of biogeographic congruence: finding the same pattern for other groups of organisms.
So far so good, although I do wonder whether the concept of area relationships makes sense if dispersal is the right answer. It seems to me that even calling it relationships only makes sense if there is no frequent floristic or faunal exchange, if near-everything is due to vicariance. And as I have mentioned before, there are good alternative explanations for congruence that do not imply vicariance, in particular prevailing directions of wind or ocean currents, common routes of migratory birds, etc.

Now come the complications:
Data for any one group of organisms are rarely as simple as the example shown (...). Some taxa are widespread, and some areas have more than one taxon. When combining data for different groups of organisms, not all areas are represented in each taxonomic group. Such complications are obstacles to development of analytical methods for determining area cladograms and general area cladograms.
Well yes, either that or, alternatively, they prove that the concept of an area cladogram is as incoherent as a 'species-level phylogeny' with only human populations as the terminals, and that the research program of area cladistics is a non-starter. Two pages on, the term at the centre of this post is introduced.
I offer two conclusions: (1) that evidence of historical geographic relationship is associated with nodes (not the distribution per se of terminal taxa) and (2) that some nodes of cladograms of organisms are paralogous. (...)

What is geographic paralogy? It is evidenced by duplication or overlap in geographic distribution of taxa related at a node (references). The term has its origin in molecular biology, geographic paralogy being analogous to gene duplication, with each gene copy subsequently tracking a separate evolutionary history.

(...) There is duplication of biogeographic regions across the clades (e.g. South America is in three), which is evidence of geographic paralogy. In other words, the major lineages shown in the cladogram existed prior to the breakup of Gondwana and each potentially reflects that geological history.
Consider what is claimed here. First, as we have seen earlier, simple area relationships that are congruent across lineages are claimed as evidence for vicariance. Now the fact that the same area shows up in several parts of a phylogeny is seen as evidence for paralogy; and this paralogy is also seen as evidence for vicariance and against dispersal. I cannot say that this makes a lot of sense to me.

Having gone through these quotes, I now want to carefully examine the analogy between gene paralogy and geographic paralogy. Let's start with the former. It works like this:



In this and the following figures, we see a grey species tree with species 1, 2 and 3. Within it we see the gene trees, as genes evolve inside the species. Here an originally single gene lineage (blue) was duplicated in the common ancestor of all three species, creating a red gene and a black gene. We now call the alleles A and Y paralogues of each other, because while they are distantly related they are not really the same gene anymore. In contrast, A and B are orthologues of each other. They are really the same gene, only in two different species.



The above figure now shows the problem that gene paralogy can cause in phylogeny reconstruction. If in this case Z is wrongly assumed to be an orthologue of A and B, we will infer the wrong species relationships, i.e. ((1,2),3) instead of the true (1,(2,3)). However, there are also other reasons why we may get conflicting or complicated patterns.



In the above case we have the gene tree contradicting the species tree, but nonetheless there is no paralogy because there is only one gene involved. What has happened here is that two versions of the gene arose in an ancestral population, and that subsequent populations were large enough and/or speciation events happened so close after each other that both copies were carried through to the ancestor of 2 and 3. We call this incomplete lineage sorting (ILS) or ancestral polymorphism. We could also still find all gene variants in all three species. Point is, this is not paralogy.



Something different has happened in the above scenario. We get the same pattern of a gene tree showing ((1,2),3) despite the species phylogeny of (1,(2,3)), but this time because of a hybridisation or introgression event between 1 and 2. Of course, we could also still find the original gene variant in species 2 along with the introgressed one. Again, this is not paralogy.



Now the same for biogeography. Above is the scenario where I think the analogy works: there are two clades that arose before continental breakup, and they both independently trace the 'area relationships'. In this case it makes sense to use the two clades or subtrees as independent data points for inference in area cladistics.



Here is the same problem for area cladistics as for phylogenetic inference. If we do not realise that we are treating paralogues as orthologues, we may get species phylogenies and, by analogy, area relationships wrong. So in the case of phylogenetics, people have developed methods for orthologue inference and to exclude paralogues from the data.

What I don't really see is how area cladists do the same. They claim they pick 'paralogy-free subtrees', but that merely means that they search for a statement like ((1,2),3) and remove statements like (1&2&3,(1&2,2&3)). It does not mean that they actually have any way of recognising that ((1,2),3) is an instance of paralogy while (1,(2,3)) isn't. They can merely hope that it comes out in the wash because the true relationship will be more frequent than the wrong ones.

This appears to be rather problematic, unless I am missing something equivalent to orthology inference in phylogenetics. But on top of that we have the other scenarios, those where there really is no paralogy.



The above is the biogeographic equivalent of incomplete lineage sorting. We could imagine here that species C stayed endemic to a part of South America while its sister species was more widespread. If we now also had some species occurring in two areas, area cladists would speak of paralogous nodes, but again, there does not appear to be any paralogy involved.



But really crucial is the biogeographic parallel to gene introgression: dispersal. The above scenario shows what area cladists call paralogy and, as we saw in the quotes above, consider evidence of vicariance, but what reason is there to exclude dispersal as a possible explanation? This is, of course, precisely the pattern that dispersal would produce!

And it is clearly not in any way comparable to gene paralogy anyway, because there are no paralogues involved. It makes no sense to use a term that assumes the existence of two genes independently tracing the species phylogeny (and, by analogy, two species-lineages independently tracing 'area relationships') to refer to any difficult pattern, even where there are no such two deep species-lineages.

In summary, I am still not exactly convinced that area cladistics makes sense. The assumption that pretty much any pattern - congruence as well as the contradictory data from paralogy! - is evidence of vicariance seems particularly hard to swallow.

Tuesday, August 15, 2017

This is not how this works, religious freedom edition

Sometimes it is interesting what raises one's hackles. Today one could get upset about domestic terrorism in the USA or the level of discourse regarding dual citizenships in Australian politics, but what really annoyed me was an opinion piece on the ABC website with the title "same-sex marriage is more complex than the Yes campaign admits".

Basically, the author, one Peter Kurti identified as "a research fellow at the Centre for Independent Studies and the author of The Tyranny of Tolerance: Threats to Religious Liberty in Australia", argues that any changes to laws that will allow same-sex marriage will have to ensure that those who don't like that change can still discriminate against homosexuals:
Freedom of religion extends far beyond the walls of a church or a synagogue.

Schools, charities, and other faith-based not-for-profits, as well as ordinary business people such as bakers, florists, and photographers who wish to uphold the traditional meaning of marriage need to be protected from discrimination and attack if the law on marriage does change. [...]
   
If the law is eventually changed to allow same-sex couples to marry, it should not create an additional entitlement enabling some citizens to force other citizens to act against their religious beliefs or conscience by making them help celebrate same-sex marriages.
The usual disclaimer applies here as to how what I write is my private, non-professional opinion and not necessarily that of any other person or institution that I may be associated with, but I believe I am not stating anything particularly controversial or revolutionary when I now write:
This is not how freedom of religion works.

Let's consider how this would have to work in other cases, if it wasn't complete nonsense.
  • If the law is eventually changed to allow mixed-race couples to marry, it should not create an additional entitlement enabling some citizens to force other citizens to act against their religious beliefs or conscience by making them help celebrate mixed-race marriages.
  • If the law is eventually changed to allow women to seek employment, it should not create an additional entitlement enabling some citizens to force other citizens to act against their religious beliefs or conscience by making them hire women.
  • If the law is eventually changed to allow people to wear yellow shirts, it should not create an additional entitlement enabling some citizens to force other citizens to act against their religious beliefs or conscience by making them provide services to customers wearing yellow shirts.
The logic, and I am using this word loosely because any more appropriate alternative would be impolite, is exactly the same in all four cases. There is not one iota of difference between them.

Freedom of religion, unless it is intended to destroy all personal freedom and tolerance, cannot mean that people get to discriminate against whatever they personally don't like. It means that they are allowed to follow their religious rules, for example by not marrying somebody of the same sex themselves, but it cannot mean that they are allowed to discriminate against others who do not follow those rules.

What really frustrates me is that this is not some random dude who has never looked into how rights and freedoms work and made some thoughtless off-the-cuff remark during lunch break. This is somebody who has carefully written an opinion piece and got it published by Australia's public broadcasting company on the strength of their authority as a scholar. I assume it would be silly to ask if the piece underwent peer review.

Monday, August 14, 2017

Botany picture #250: Aristolochia clematitis


Aristolochia clematitis (Aristolochiaceae), the only member of its genus in Germany, 2008. I rather like this genus with its weird flowers, but unfortunately I very rarely see them in the wild. It took me years before I ran into this one in Europe, and otherwise I have only seen the odd one or two species in South America. There are certainly none near where I live now.

Friday, August 11, 2017

Does philosophy produce knowledge?

Recently I jumped, rather rashly, into a discussion about the purpose of philosophy. On his blog, philosopher Daniel Kaufman commented on the suggestion by a different philosopher that philosophy PhD students should be banned from publishing and took the opportunity to argue that there were too many PhDs hunting too many jobs, partly because graduate students were exploited as cheap labour, everybody was publishing too much, many of the leading philosophers weren't doing enough undergraduate teaching, and the field had gone down a dangerous and misguided path by assuming that it was a knowledge-generating exercise like science.

In Kaufman's telling, (a) philosophy does not generate knowledge ("these are not the sorts of questions that will ever admit of conclusive answers") but its purpose is to enrich our lives, (b) universities should fund things that enrich our lives even when they do not generate knowledge, and (c) admitting that philosophy does not generate knowledge is the best strategy for minimising future funding cuts, while trying to play scientist will backfire.

I could clearly have expressed myself better, especially in my first comment, but my position is that (a) philosophy can generate knowledge, (b) I do not see why universities should teach things that are self-admittedly non-academic, and (c) I strongly doubt that pitching philosophy as intellectually futile is going to work in its favour.

Note that I am not saying in any way whatsoever that universities should only train people for jobs, quite the opposite. I am merely saying that they are there to produce, manage and transfer knowledge (e.g. history of music or theory of music), while mere amusement or practical skills (e.g. appreciating music or learning to play the piano) are better accommodated in other ways, for example by buying a music CD or paying a private piano teacher. I am also not saying that anything that doesn't produce, manage or transfer knowledge is useless, merely that such a non-academic activity could perhaps better be accommodated outside of the university.

The main point I want to discuss now is, however, the first: can philosophy produce knowledge? The example that I would like to use is that of divine command theory and the Euthyphro dilemma. It may be said that that is very low-hanging fruit, but well, if somebody wanted to show how science can produce knowledge they would also choose something simple like the shape of the earth as opposed to the minutiae of population genetics or quantum physics.

As most people will know, divine command theory is the claim that "what is moral is determined by what God commands, and that for a person to be moral is to follow his commands" (quoted from Wikipedia, 11 Aug 2017). As most people will also know, Plato challenged this idea with the Euthyphro dilemma, which in modern terms is perhaps best summarised as follows:

There are two possibilities. Either the gods command that some action is moral because it is moral by an independent, objective standard. If that is the case, then we can cut out the middleman and conclude that morality does not actually flow from the gods. Alternatively, whatever random thing the gods declare moral is moral merely because the gods say so. If that is how it works, what if the gods commanded you to torture an innocent person to death? Clearly the first option is incompatible with divine command theory, but if the alternative is accepted then the theory is shown to have absurd consequences.

Religious people have, of course, tried to find answers to this dilemma. They seem to fall mostly into two categories, either stating that god would never command something evil, which even if they do not realise it grants that there is an independent standard and thus divine command theory is false, or claiming that the answer is "both", that there is no dilemma. As I have written on this blog before, that latter rebuttal does not work because you can't avoid a bad outcome by accepting both bad options. If a judge asks whether you have murdered your neighbour or whether you got him killed through recklessness, replying "both, your honour" won't clear you either.

I would consequently argue that Plato's philosophy has in this case generated a piece of knowledge: divine command theory does not work. And it was generated through philosophy as opposed to science, as no empirical data were involved.

What possible objections could be raised?

First, this is merely what we might call 'negative' knowledge. We still don't know what to base our moral reasoning on, merely that we cannot base it on "but the gods said so". To this I would respond that there is a clear parallel in science, which can also test and reject hypotheses and models but only ever tentatively (!) keep the ones that are currently not superseded and rejected.

Second, it could be argued somebody could find a solution to the dilemma or perhaps already has found a solution to the dilemma. Again the parallel to science should be clear: knowledge is always tentative until somebody comes along and disproves it and/or suggests an even better idea. The fact that we are never omniscient does not mean that we are as ignorant after somebody has thought through a problem as we were before.

Third, it could be observed that there are still plenty of people working as philosophers who accept divine command theory. And once more I would like to point towards the example of science. There are plenty of creationists, even some (if few) biologists; does that mean that biology does not produce knowledge? The only difference is that even the professionals in philosophy share less consensus on what is right than professional biologists.

Here I would argue that how much of a consensus can be achieved in a field depends on two main factors: whether the knowledge produced by the field is important in some kind of practice, and whether there is a lot of motivation to continue accepting a falsehood. Engineering, for example, has immediate and crucial practical applications. If an engineer accepts nonsense, they may construct something that fails embarrassingly, and consequently engineers are very likely to reject nonsense in their field of expertise (this qualifier is obviously important).

Economists, on the other hand, work in a field where things do not just work or fail, but they generally work in favour of either this interest group or that interest group. Even if raising wages would be "better" for the economy as a whole, it might still not be in the interest of individual investors; and even if lowering wages would be "better" for the competitiveness of an economy, it might still not be in the interest of an individual employee who wants more money now. It therefore seems entirely unsurprising to me that there would be a lot of motivated reasoning in economics, making it hard to discard false beliefs.

Philosophy has no immediate applications on the lines of keeping a bridge up, but it certainly deals with a lot of questions that are dear to people or affect closely held beliefs, for example ethics or epistemology. It therefore seems entirely unsurprising to me that the field would find it harder to discard false beliefs than chemistry or geography, for example. The take-home message here is that individual practitioners disagreeing does not demonstrate that there is no knowledge to be had; it may merely indicate that some practitioners reject that knowledge due to personal biases.

In conclusion, I remain convinced that philosophy does, or at least can, generate knowledge. It does so, among other approaches, by thought experiment, showing claims to lead to absurd consequences, or showing claims to involve a self-contradiction. Much of that may be rejection as opposed to proving of claims, but again, science is also mostly rejection of false ideas. The (always tentative) understanding we have now is what remained after myriads of mistakes were corrected.

Wednesday, August 9, 2017

Discussions of diversity and equality are generally very depressing

Somebody at Google circulated an opinion piece on Google's diversity efforts, which was ultimately published by Gizmodo. A public discussion ensued. And as always with what are called "cultural" issues I find the way it goes very depressing. Perhaps surprisingly that is not because of some particularly backwards or intolerant position taken by this or that participant (although that too, see #6 below), but rather because much of what goes on in these kinds of discussions seems so futile.

One of the most fundamental problems is that there is not actually one controversy; there are numerous controversies going on at the same time, and people mix them all up. Just checking out two articles or posts and following their links to perhaps another three, it seems to me as if at least all of the following is being discussed at the same time, in no particular order:

1. Whether there are psychological differences between men and women.

2. If such differences exist, to what degree they are genetic/developmental or socially conditioned.

3. Whether there are cognitive differences between men and women to the degree that the average man is objectively better at abstract problem solving and thus more suited for being a software engineer than the average woman.

4. Whether there are cognitive differences between men and women to the degree that the average man is objectively better at abstract problem solving than the average woman, but because software engineering is really a collaborative and thus people-oriented activity, at which women are said to excel, the average woman makes a better software engineer than the average man.

5. Whether different levels of representation of men and women in different fields of work are now largely due to job preferences as opposed to discrimination, meaning that trying to achieve parity in all fields is futile.

6. Whether women are, and I quote, "inferior" in sports. Yeah, I have no idea what that has to do with anything either, but I believe the choice of terminology speaks volumes.

7. Whether Google (and by extension many other companies) now has been captured by "the left" and has adopted "political correctness" to the degree that nobody dares to speak their mind for fear of being shamed, ostracised, and fired.

8. Whether Google was justified in firing the author of the memo for being disruptive and/or violating its code of conduct.

9. Whether circulating this memo to colleagues falls under the Free Speech guarantee of the US constitution.

And I am sure I have missed some. For what it is worth, the way I understand the original memo it was clumsily trying to argue mostly #5 and #7 and potentially #3, or at least it is widely read as arguing the latter.

In light of this it is unsurprising that so little is achieved and that so many people are at each other's throats. Of course there are many other topics where people will have heated discussions, but there it is because their opinions differ very strongly (e.g. economic policy, environment, energy), not merely because they are completely talking past each other.

But with these equality / diversity issues I regularly see people go ballistic at each other who seem to pretty much agree on policy goals (e.g. better representation of currently underrepresented groups), general political outlook and acceptance of empirical reality (e.g. differences in mean innate cognitive abilities between groups of humans are negligible compared to variance within those groups) and should consequently be able to hash their differences out in a more rational manner.

One person says "maybe it is mostly job preferences now" but the other hears it as "I want to excuse under-payment and harassment of women"; or one person says "what he wrote could be read as if women don't make good engineers, and that creates a hostile work environment" and the other hears it as "nobody is allowed to have a different opinion than me; burn, heretic!" Makes me despair of political discourse.

Tuesday, August 8, 2017

Botany picture #249: Nothofagus of Patagonia


Deciduous Nothofagus (Nothofagaceae) trees near Puerto Blest, Argentina, in 2009. Or whatever the current genus name for this subgroup of southern beeches is after Nothofagus has been split up. In this case the reason for taxonomic changes was not phylogenetic systematics, because Nothofagus in its wider circumscription was also monophyletic. If I understand correctly, the idea was to make the age of the genera more comparable to Quercus, Fagus and suchlike. Either way, I like how this picture came out.

Sunday, August 6, 2017

Undergraduate resumes / CVs

I don't seem to have one of those files on my current computers any more, but I know that my CV as an undergraduate looked something like this:
Name
Address

Picture taken by professional photographer while I was wearing a formal jacket and perhaps a tie

Formation

Studying biology at [university], 1996 - now
Non-military service, 1995 - 1996
[Public grammar school] (high school & college in one), 1988 - 1997
[Yet another public school], 1986 - 1988
[Public primary school], 1982 - 1986

Undergraduate scholarship of [foundation], 1997 - now
And... that was that. Black on white, Times New Roman size 11 point, 1.2 line spacing, one page, easy to see all relevant information at a glance.

Now, an Australian undergraduate's CV today appears to look something like the following:
Name
Address, e-mail

Either no picture (which is what is expected in Australia) or a selfie taken at a party

Personal details

My name is [name], I am 24 years old and live in Woolalla, New South Wales. I am currently in my third year at Ned Kelly University studying a combined degree Bachelor of Science / Bachelor of Arts majoring in biology and journalism. I hope to pursue a career in science and apply what I learned in university to better the world.

Personal attributes

Effective communicator
Reliable and trustworthy
Ability to work in team as well as independently
Hard worker
Leadership skills demonstrated by frying burgers at McDonalds
Organisation talent demonstrated during waitressing by correctly taking customer's orders

Skills

Word, Excel, PowerPoint, Internet Explorer, Google

Employment history

Sales assistant at some supermarket, 2009 - now
Waitressing at Happy Hogan's bar, 2008-2012
Frying burgers at McDonalds, 2013 - now

Volunteer work and leadership

Friends of the State Zoo, 2011 - now
Church Youth, 2007 - 2009

Education

Ned Kelly University, Bachelor of Science / Bachelor of Arts majoring in biology and journalism, 2014 - now
Catholic College of South-eastern Western North Sydney, 2012 - 2014
Little Sisters of Perpetual Misery Private Catholic High School, 2008 - 2012

Other activities

Raising money for YUZN charity
Wildlife rescue
Greening Australia
Surfing
Blood donor for Red Cross
Debate club

Achievements

Consistently excellent marks in university*
Award for high placement
Mentor for other students
Talent Award
Award for outstanding job as house head
President of debate club
Dean's letter of recommendation
They are often carefully formatted in a fancy sans serif font with about 50% white space, perhaps a red bar at the top or a blue bar along the left margin of the page. They are often three to four pages long.

A few thoughts. First, it is not as if we didn't have extracurricular activities and hobbies back then in Germany. It just would never have occurred to most of us that a potential employer or scholarship provider would care the least bit about our participation in a badminton club. And as far as I can tell they wouldn't have, and I certainly don't. This is wasted space that merely makes it harder to find the truly relevant information.

Second, the personal attributes also seem a bit pointless. Will anybody actually truthfully write "I am lazy" or "I am a poor communicator"? Presumably not, everybody will claim the positives, honestly or not. So this is wasted space that merely makes it harder to find the truly relevant information.

Third, I assume somebody tells Australian students to put all their work experience in there to demonstrate ... well, this is where it breaks down for me. That they will show up for work if you give them a contract? That's kind of a low hurdle to clear. But beyond that, how is flipping burgers or waiting tables a relevant qualification for a job or scholarship in science? I don't get it. This is wasted space that merely makes it harder to find the truly relevant information.

Fourth, all those achievements? When I was a school or university student in Germany, we did not have even just a tenth of those awards. Here half the students seem to have lists of awards that look seriously impressive; but given how many of them have lists like that I do wonder how easy they are to get. If there is no term like award inflation (in analogy to grade inflation) then we need to create it.

Of course, given how long it has been since I left, I also wonder how German undergraduate students' CVs look these days. Do they now also mention every little thing the students did, no matter how irrelevant to the job or scholarship they are applying for? Do the CVs now also try to look as if they had been written by a graphic design graduate?

Footnote

*) From what I can tell the likelihood of somebody explicitly claiming to have consistently high marks in the achievement list seems negatively correlated with the actual quality of their marks. The people who actually have near-straight high distinctions tend to have only an understated line in the CV providing their point average.

Thursday, August 3, 2017

Basal and transitional taxa

Shortly before I left for China I received an alert on an interesting paper:

Bronzati M, 2017. Should the terms 'basal taxon' and 'transitional taxon' be extinguished from cladistic studies with extinct organisms? Palaeontologia Electronica 20.2.3E: 1-12.

As can be expected from this title, Bronzati argues that the terms are misleading and confusing, and that they should not be used. I find myself tending to disagree, at least in part, and not only because of an allergic reaction to being told what words I am not supposed to use because it might confuse 'the public' (cf. free will debate). Before I go over the arguments, however, I would like to clarify where I agree:

First, there are clearly cases where it would be desirable not to use a concept or term because it is really wrong or incoherent, and in some cases even because it is misleading. At the recent conference I flinched at a speaker who said "this individual is paraphyletic". Although I understand what they meant (the utterly trivial and commonplace observation that an individual had two different alleles at a gene locus they had sequenced) such a sentence is Not Even Wrong and has to be based on confusion about, well, pretty much everything that matters in molecular and phylogenetic systematics beyond perhaps how to hold a pipette the right way and click "run analysis" in a few programs. But it is not necessarily the case that the terms basal and transitional suffer from the same problems.

Second, I obviously agree that supraspecific taxa should be monophyletic.

Third, I also agree that evolution is not teleological (with a caveat I will go into below) and that terms such as primitive or advanced are to be avoided, in particular when talking about organisms that live(d) in the same time-slice. And in fact there are very few people left who still think that e.g. mosses are primitive compared to seed plants. Both lineages as they exist today have evolved for precisely the same time. The mosses are certainly not more primitive as mosses than seed plants would be as mosses, they just went completely elsewhere in terms of morphospace and adaptive peaks.

Evolution is a story not of progress but of diversification, and it only looks to us as if there was progress from morphospace position A to position B because life necessarily had to start in some position, and even after a pure random walk some extant organism may still (or again) occupy that starting position or something close to it. A good analogy I once read is to imagine a bunch of people all starting in front of a wall and then milling about aimlessly. Although their movement is random the group will still expand in one direction, away from the wall, because they cannot go in the opposite direction; conversely then, the fact that they are now further away from the wall does not mean that they meant to move in that direction specifically.
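
The wall analogy is easy to try out in a few lines of Python. What follows is only a minimal sketch under stated assumptions: the wall sits at position 0, every person takes steps of one unit forwards or backwards with equal probability, and a step that would pass through the wall simply leaves the person standing at it. The function name drift_from_wall, the number of walkers and the number of steps are all made up for illustration.

```python
import random

def drift_from_wall(walkers=500, steps=1000, seed=1):
    """Unbiased random walk of several walkers next to a wall at position 0."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    positions = [0] * walkers  # everybody starts directly at the wall
    for _ in range(steps):
        for i in range(walkers):
            step = rng.choice((-1, 1))  # no preferred direction
            # The wall blocks any move below position 0.
            positions[i] = max(0, positions[i] + step)
    return positions

positions = drift_from_wall()
mean_distance = sum(positions) / len(positions)
print(f"mean distance from the wall: {mean_distance:.1f}")
```

Although no individual step favours either direction, the mean distance from the wall ends up well above zero, and some walkers are still standing at or near the wall, which is the whole point of the analogy.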

It is consequently important to keep in mind the "studies with extinct organisms" part of the paper title, because on the question of sorting extant organisms into a ladder of progress all competent evolutionary biologists are agreed anyway. Okay, but what now of extinct taxa, which had in their time not yet undergone the same amount of evolution as the taxa we have today? Are they basal or transitional to the latter?

Bronzati starts by examining whether basal taxa are those that are older than the non-basal ones and observes that fossil ages do not necessarily reflect the ages of the lineages they belong to. He instead suggests using "'early' and 'late' in an explicit comparative framework". That is very clear, but I do not think that this is how the word basal is meant by most people anyway, and it is certainly not how I would use it. As Bronzati soon observes himself, "'basal' is a relative term regarding the base of the tree" and thus refers to the relative age of lineages, not to the age of fossils.

I am not quite sure I understand the next part, where he writes that "different people certainly have different assumptions of what a 'basal' taxon is" and discusses whether something outside of clade A can be a basal A or not. I'd say not, but again, I think basal is a relative term along a tree topology and not something that I would use in this way.

Now Bronzati turns to the question that I consider the most relevant: "Basal taxa are closer to the root" - precisely that is how I understand the word - "but how to measure it?" But this is also where I think the argumentation becomes a bit odd, because he argues against the use of the term by comparing apples and oranges, and then throwing incomplete sampling into the mix. This will now need an illustration. Consider the following phylogeny:



Bronzati argues against the use of 'basal' by looking at species A, which people would supposedly (?) consider to be basal because they read the tree like a ladder from left to right, and then observing that this species is actually more distant from the root in terms of internal tree nodes than species B. I hope the problem is immediately obvious: species A is not the unit we would be talking about when saying "more basal than B". Would anybody ever actually say that A is basal in the tree? It is clearly fairly nested. Instead, the only use of such terminology that makes sense would be to say that the entire genus Ales (the red box) is basal in the entire family also consisting of the other genera Beles, Celes and Deles (the other three boxes), or more basal than Beles.

And this is where I am willing to be convinced otherwise but at the moment happy to continue using the term basal: if and only if we are talking about the branching order along a phylogeny backbone, along a grade. I will be the first to agree that all supraspecific ranks are arbitrary, but we also have to appreciate that we are using them, or alternatively unranked clade names, nonetheless. This is not so much about evolutionary theory as about having at our disposal non-atrocious language to describe a tree topology. When talking about these genera, what is so problematic about saying "Ales is basal in its family" compared with "Ales is sister to the rest of its family"? At least in my eyes the two statements are equivalent and neither is more misleading than the other. Making it about species A feels like a red herring.

And this is then also all that needs to be said about sampling, because it is based on a similar argument. Bronzati describes a hypothetical tree of all dinosaurs with all of the huge bird clade represented only by the chicken and then jokes he "would hope that no one would suggest that the bird is a basal dinosaur ... based on the number of intervening nodes to the root". No, I don't think anybody would. But maybe we would say the birds (!) are. If, hypothetically, part of the topology were (birds, (dinos2, (dinos3, dinos4) ) ) then yes, I would not have any problem saying that the birds as a whole are more basal in the tree than that other named clade dinos2, for example, because the birds as a whole are quite simply branching off one more inclusive ancestor closer to the root than dinos2.
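
The counting behind that statement can be sketched in Python. This is only an illustration under stated assumptions: the hypothetical topology is written as nested tuples instead of being parsed from a Newick string, each terminal stands for a whole named clade and not for a single species, and the helper node_depths is made up for this example. "Depth" here means the number of branching points passed on the way from the root down to a clade, counting the root itself.

```python
def node_depths(tree, depth=0, out=None):
    """Map each named clade to the number of branching points above it."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = depth  # a named clade, treated as one unit
    else:
        for child in tree:  # an internal node, i.e. a hypothetical ancestor
            node_depths(child, depth + 1, out)
    return out

# Hypothetical backbone from the text: (birds, (dinos2, (dinos3, dinos4)))
topology = ("birds", ("dinos2", ("dinos3", "dinos4")))
depths = node_depths(topology)
print(depths)  # {'birds': 1, 'dinos2': 2, 'dinos3': 3, 'dinos4': 3}
```

The result depends entirely on which clades are collapsed into terminals, which is why the comparison only makes sense for named clades along a backbone and not for an arbitrary, deeply nested species.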

What is really puzzling to me is that Bronzati himself makes the same point two paragraphs later: "it is not terminal taxa (...) that can be 'more basal' in relation to other terminal taxa, but the nodes (i.e. hypothetical ancestors) of the tree in relation to other nodes".

Concluding his discussion of 'basal', Bronzati examines the question whether basal taxa have more plesiomorphic traits and concludes no, but again this is based on considering in isolation a very derived descendant of the entire clade I would call 'basal'.

He then turns to the term 'transitional'. Here he appears to make two main arguments against its use. First, that evolution has no goal, and second, that phylogenetic trees are branching diagrams instead of ladders.

I have already mentioned above that I agree completely that evolution is non-teleological, but with one caveat, which is this: lineages may discover, for the first time, a new peak in the adaptive landscape, and when that happens we can expect them to evolve up that peak, so that earlier forms would be more poorly adapted to the new situation than their later descendants. Bronzati himself mentions the colonisation of dry land, focusing of course on vertebrates, his specialty. Using the group that I am more familiar with as an example, it seems clear that the early vascular plants started out without roots, and that the lineages that descended from them evolved roots because having those was a pretty good idea on dry land. In fact there are none left that are primarily without roots, presumably because they were out-competed (although there are a few secondary losses under unusual circumstances, e.g. Cuscuta).

I would argue that this, and only this, and only along a time axis, is where we can perhaps meaningfully speak of primitive and advanced, but that is not even the point here, because the term we are dealing with is transitional. More important seems the second argument. Yes, phylogenetic trees are branching diagrams, but they do not merely consist of terminals, they also consist of hypothetical ancestors. It is a bit unclear to me where Bronzati stands on the question of those; on page 6 of his paper, as mentioned, he talks about hypothetical ancestors himself, but here he spends considerable time arguing in a way that suggests that he does not want to identify actual species or fossils as ancestral:
It is important to stress that the absence of autapomorphies in taxa [sic] B does not indicate that it is transitional between A and C-F. Firstly, this might be just a reflex [sic] of the lack of ability to translate different morphologies into phylogenetic characters. Furthermore, the study of living species shows us that even if there is no recognisable morphological difference between [sic], they can differ at the genetic level.
Of course they can, but it remains unclear to me what should keep us from tentatively concluding that some fossil may represent an ancestor until we get additional evidence that shows otherwise, just like pretty much every other conclusion in science is also tentative. And if we have a presumed ancestor we can say that it is transitional between an even earlier presumed ancestor and descendants further down the line. There is no teleology involved here, but the internal nodes of a phylogeny can indeed be read as a ladder of ancestor-descendant relationships.



I am sorry to say I just don't see the problem here either.

Bronzati ends with making four recommendations:

Tree topologies should be described with sister-group statements, avoiding terms like basal or early diverging. My concern is that this will lead to very ugly and repetitive language when describing anything but a very small phylogeny: "our results indicate that A is sister to the rest of the study group. B is then sister to the rest of that rest, and then C is sister to the rest of that rest we just mentioned; now D is sister to the rest of that last rest ...", and so on for another four clades. That is just not very aesthetic. So why not a much more concise "the earliest diverging lineage is A, followed by B, C, and D"?

Instead of calling a terminal taxon a basal member of clade A, we should say it is a non-A member of the next larger named clade around it, as in non-avian dinosaurs. That makes sense, but again, I would never have used basal for a deeply nested terminal anyway but only to discuss the relative position of several clades along a grade.

We should say "this taxon fills a gap in the fossil record" instead of "this taxon is transitional". As mentioned above, I don't see it, perhaps because I have a different approach to internal nodes and species without autapomorphies.

Finally, we should avoid teleological language. No disagreement from me on this one!