Maybe I missed something, but I only noticed today that the super-fast likelihood phylogenetics software RAxML can use four different models of sequence evolution.
I seem to remember (?) that it could only do the GTR model. Perhaps they added the other three in a newer version and I missed the change while I was busy using BEAST and otherwise doing non-phylogenetic research? Or did I just never notice before?
Anyway, there is the choice between GTR, JC, K80 and HKY: the most parameter-rich model, the most parameter-poor one, and the two standard models with different rates for transitions and transversions. This is good, because GTR or HKY usually get suggested for the data I work with. It seems, however, that it is not possible to specify different models for the different subsets of a partitioned dataset.
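To show how these four models nest inside each other, here is a minimal pure-Python sketch. It is my own illustration, not RAxML code, and the function names are hypothetical. It builds the normalised instantaneous rate matrix Q of GTR from six exchangeabilities and four base frequencies, and then obtains the other models by constraining it. HKY keeps one rate for transitions (A/G, C/T) and one for transversions, with free base frequencies. K80 additionally fixes all frequencies at 0.25. JC also sets the transition/transversion ratio kappa to 1.

```python
BASES = "ACGT"
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]  # order of the six GTR exchangeabilities


def gtr_q(rates, freqs):
    """Normalised GTR rate matrix as a list of lists (row = from, column = to)."""
    ex = dict(zip(PAIRS, rates))
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                # Rate i -> j = exchangeability of the pair times target frequency.
                q[i][j] = ex["".join(sorted(a + b))] * freqs[j]
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    # Scale so that one unit of branch length = one expected substitution per site.
    mu = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mu for x in row] for row in q]


def hky_q(kappa, freqs):
    # Transitions (AG, CT) get rate kappa, all transversions rate 1.
    return gtr_q([1.0, kappa, 1.0, 1.0, kappa, 1.0], freqs)


def k80_q(kappa):
    # HKY with equal base frequencies.
    return hky_q(kappa, [0.25] * 4)


def jc_q():
    # K80 with no transition/transversion bias.
    return k80_q(1.0)


if __name__ == "__main__":
    for row in k80_q(2.0):
        print(" ".join(f"{x:7.3f}" for x in row))
```

Counting free parameters, this gives 8 for GTR (5 relative rates, 3 frequencies), 4 for HKY, 1 for K80 and 0 for JC. That is the sense in which GTR is the most and JC the least parameter-rich of the four.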
I have accordingly updated the posts on what number of schemes to test in jModelTest and on what models are implemented in the various phylogenetics programs I am familiar with.
Wednesday, November 30, 2016
Tuesday, November 29, 2016
Cladistics textbook
In my office I have two 'proper' phylogenetics textbooks, counting only those that cover principles and theory rather than merely offering a practical how-to manual. One is by Felsenstein, who is strongly associated with likelihood phylogenetics, although his book covers all approaches. The second is:
Kitching IJ, Forey PL, Humphries CJ, Williams DM, 1998. Cladistics second edition - the theory and practice of parsimony analysis. The Systematics Association Publication No. 11. Oxford Science Publications.
As the title implies, it is entirely about parsimony phylogenetics.
Having recently looked into Kitching et al., I noticed two short sections that I found interesting enough to discuss here. I will start with the question of ancestors. Proponents of paraphyletic taxa often make claims along the lines of cladists "not accepting the existence of ancestral species", "ignoring ancestors", or "treating all species as sister taxa".
Here we now have a textbook written by cladists, in other words the official version, to the degree that an official version exists. It is, of course, not as easy as that, because the only thing that unites cladists, in the sense of the people paraphylists argue against, is that supraspecific taxa should be monophyletic. Many other details differ from cladist to cladist, and in that sense the term cladist also includes those who use, e.g., Bayesian phylogenetics.
I also do not want to give the impression that I, personally, take what Kitching et al. promote on this or that detailed question to necessarily be The Correct View. It is quite possible that I, a cladist, find myself in disagreement with some chapter of that textbook. I am not even arguing here, in this instance, that making taxa monophyletic is the way to go (although of course I do believe that).
No, the point of the post is merely this: if Kitching et al. argue not-XYZ, then this demonstrates decisively that any claim of all cladists arguing XYZ is nonsense.
So, about ancestors, and turning to page 14 of the textbook:
"In fact, to date, Archaeopteryx has no recognized autapomorphies. Indeed, if there were, Archaeopteryx would have to be placed as the sister-group to the rest of the birds."
It does not matter here whether more recent analyses have shown Archaeopteryx to have autapomorphies and to have actually been a side branch relative to modern birds. We should simply think of any species that looks exactly as the ancestral species of a later clade is inferred to have looked.
It should be clear that the above section is correct. An ancestral species would not have any systematically useful characters relative to its descendants, because that descendant clade would have started out as that species. My view - and here other cladists may differ - is actually that the ancestral species and the clade are one and the same. The ancestral species has over time turned (diversified) into the clade.
"In terms of unique characters, Archaeopteryx simply does not exist. This is absurd, for its remains have been excavated and studied. To circumvent this logical dilemma, cladists place likely ancestors on the cladogram as the sister-group to their putative descendants and accept that they must be nominal paraphyletic taxa (Fig. 1.9c). Ancestors, just like paraphyletic taxa in general, can only be recognized by a particular combination of characters that they have and characters that they do not have. The unique attribute of possible ancestors is the time at which they lived."
Here, then, is the reason behind the paraphylists' complaint that ancestors are treated as sister to their descendants: they are indeed treated like that, but crucially only so that the analysis can be done. It is a practical, not a philosophical, reason.
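A small worked example may help with the "no unique characters" point. The Python sketch below runs Fitch's parsimony algorithm on a made-up four-taxon matrix. O is an outgroup, A is an ancestor-like species carrying only states that are ancestral relative to D1 and D2, and D2 has one autapomorphy. The matrix, the taxon names and the helper functions are hypothetical illustrations of mine, not taken from Kitching et al.

```python
# Hypothetical binary matrix: one string of character states per terminal.
MATRIX = {
    "O": "0000",   # outgroup
    "A": "1100",   # ancestor-like: no states of its own relative to D1 + D2
    "D1": "1110",
    "D2": "1111",  # the last character is an autapomorphy of D2
}


def fitch(tree, char):
    """Fitch (1971) downpass for one character: return (state set, steps).

    `tree` is either a terminal name or a (left, right) tuple.
    """
    if isinstance(tree, str):
        return {MATRIX[tree][char]}, 0
    left_set, left_steps = fitch(tree[0], char)
    right_set, right_steps = fitch(tree[1], char)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No overlap: take the union and count one extra step.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree):
    """Total parsimony length summed over all characters."""
    n_chars = len(next(iter(MATRIX.values())))
    return sum(fitch(tree, c)[1] for c in range(n_chars))


without_a = ("O", ("D1", "D2"))
with_a = ("O", ("A", ("D1", "D2")))  # A as sister to its putative descendants
print(tree_length(without_a), tree_length(with_a))  # 4 4
```

Placing A as sister to (D1, D2) leaves the tree length at 4 steps. A's states coincide with those reconstructed for the node it attaches to, so its own branch adds nothing: it has no unique characters that the analysis could "see".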
Note also that at least the cladists who wrote the textbook do not have any problem with paraphyletic species. Whether we think that this use of the word paraphyletic makes sense or not (as do I), it is discussions like this one which make me groan in frustration whenever I read a paraphylist claim that cladists only accepted paraphyletic species as a cop-out once they could no longer deny that they existed. No, cladism was founded on the principle that monophyly applies above the species level, so it never had to backpedal like that.
"After a cladistic analysis has been completed the cladogram may be reinterpreted as a tree (see below) ..."
What they mean here is that they see a cladogram as such (merely) as a different visualisation of the data in the data matrix, while the "tree" is the cladogram's interpretation in terms of evolutionary relationships, of actual genealogical relatedness of the terminals.
"... and at this stage some palaeontologists may choose to recognize these paraphyletic taxa as ancestors, particularly when they do not overlap in time with their putative descendants (see Smith 199a for a discussion)."
And this is the main point. Here we have a group of senior cladists who wrote, to put it in the simplest possible terms, "we need to treat every species as a terminal to get a cladogram, but then if you wish you can interpret a terminal without autapomorphies as an ancestor".
It is as if the people who claim that cladists do not accept the existence of ancestors haven't even bothered to figure out what any cladists really think.
Next time I will look at a short section of the textbook that I definitely disagree with.
Thursday, November 24, 2016
The political system of the USA
So, about that recent election. I am not an American, so I don't actually have a horse in that race except to the degree that everybody will be impacted by what one of the most influential nations on the planet decides to do.
I don't really want to discuss party politics on this blog either, so what I will focus on is simply what I consider to be certain systematic issues with how elections work in the USA. The point is, as far as I can tell the system is built so that it systematically favours conservatives, whether intentionally or not.
A major concept here is Gerrymandering.
In case it isn't clear what Gerrymandering is, imagine a political system in which the seats in parliament go to people representing individual electoral districts as opposed to nation-wide party lists. If you win a plurality of the votes in a district, or perhaps the majority after resolving preferences or a run-off election, you get its seat in parliament. Imagine further that you have two electoral districts in your town with a hundred voters each, where voters favour the two major parties as follows*: the Yellows always get 45 votes, the Reds 55. Both seats go to the Reds.
Now assume that the Yellows control the state government and can redraw the district boundaries. They cleverly redistrict the voters as follows: one district now has Yellows 55 votes versus Reds 45, and the other makes up the difference with Yellows 35 versus Reds 65. Et voilà, the Yellows have won one seat in parliament without convincing a single voter to switch allegiance. The trick is to concentrate your opponent's voters in a few districts that are super-overwhelmingly safe for them while giving yourself lots of narrower margins.
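The arithmetic of the Yellows/Reds example can be checked in a few lines of Python. This is a toy sketch: the party names and vote counts are the hypothetical ones from the example, and `seats` assumes a simple plurality rule in each district.

```python
from collections import Counter

# The two districting plans from the example: two districts, 100 voters each.
BEFORE = [{"Yellow": 45, "Red": 55}, {"Yellow": 45, "Red": 55}]
AFTER = [{"Yellow": 55, "Red": 45}, {"Yellow": 35, "Red": 65}]


def seats(districts):
    """First-past-the-post: each district's seat goes to its plurality winner.

    Ties are not handled; max() would simply pick the first party listed.
    """
    return Counter(max(votes, key=votes.get) for votes in districts)


def total_votes(districts):
    """Sum each party's votes over all districts."""
    total = Counter()
    for votes in districts:
        total.update(votes)
    return total


for label, plan in (("before", BEFORE), ("after", AFTER)):
    print(label, dict(total_votes(plan)), dict(seats(plan)))
```

Both plans give the Yellows 90 and the Reds 110 votes in total, but the seat count moves from 0-2 to 1-1 purely through where the boundaries are drawn.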
Now to the USA, which of course has district-based rather than proportional representation.
First, the Electoral College. This is the most obvious and most widely discussed issue, as there have now been two elections within twenty years in which conservative candidates won despite losing the popular vote. Had presidents been elected directly, their opponents would have carried the day. That being said, the Electoral College is probably the least Gerrymandered of all these bodies, not least because it cannot possibly have been done deliberately: state boundaries are just what they are and do not get redrawn easily. Still, I assume that urbanisation has an effect here. Many Americans are concentrated in a very small number of states, in huge metropolitan areas that lean strongly progressive. Most states are rural, rural voters lean conservative, and consequently the Electoral College leans conservative.
(By the way, I find it extremely bizarre how these discussions go on American websites. On many sites I have read in the last two weeks there will be commenters who complain about the Electoral College not reflecting the popular vote. And then there will always be somebody replying to the effect of "it is doing exactly what it was meant to do, that is stopping a minority [of states] from dominating the majority [of states]". This is weird, isn't it? There are two main political camps; either the first "dominates" the second, or the second "dominates" the first. You cannot say that when the first doesn't happen but the second does there is suddenly no "dominating" going on. So why should the majority of states count more than the majority of voters? At a minimum I would need more explanation here than is usually forthcoming...)
Second, the US Senate, the upper house of the US Congress, which has a lot of power. It consists of two senators from each state. Immediately the situation should become clear: the Senate is Gerrymandered by default. Most states are rural, rural voters lean conservative, consequently the Senate leans conservative.
Third, the states. The same principle applies. The majority of states is rural, rural voters lean conservative, and consequently conservative politicians control a majority of the states.
But that was just the districting; there are other factors.
Fourth, for some strange reason US citizens are not automatically registered to vote; they have to make a deliberate effort to become registered voters. People who have time on their hands will have an easier time doing that. People who have time on their hands are, in particular, pensioners and the independently wealthy, while the working poor will have it harder. Old and wealthy people lean conservative, consequently registered voters will lean conservative.
Fifth, by long tradition the USA holds its elections on Tuesdays, a working day. This makes it much easier for pensioners and the independently wealthy to participate, whereas the working poor will find it harder to take one of their very few days of leave to stand in a queue for voting. Old and wealthy people lean conservative, consequently voters will lean conservative.
Sixth, it is my understanding that US citizens have a lot more elections than the citizens of most other countries. They elect people for offices that are filled without formal election campaigns elsewhere, such as judges, sheriffs, school boards, etc. This means that participating in democracy is much more time-consuming for US citizens, and will easily lead to voting fatigue. The people who have lots of time to deal with all that are, in particular, pensioners and the independently wealthy. Old and wealthy people lean conservative, consequently voters will lean conservative.
Now I have read those who argue that this is simply the system that exists, so progressives will have to learn how to win elections in that system instead of e.g. whining about the unfair Electoral College. Fair enough. What is more, it works well for the conservatives, and again, I am not even an American. It just kind of seems to me, personally, that the point of a democracy is that the outcome of an election should kind of reflect the popular will. The Americans will have to know themselves what they want, and there does not appear to be any interest in change. Myself, I prefer proportional representation, party-independent committees drawing district boundaries, automatic voter registration, voting on a Sunday, and fewer elections with much shorter campaigns so that there is less voting fatigue. It seems to work well in many countries. Just my two cents.
Footnote
*) Of course, one of my major concerns with district-based parliaments is that they distort the popular will even without any Gerrymandering whatsoever. If a smaller party gets 20% in each district of the country they will still get 0% of the seats in parliament, a situation that completely disenfranchises one in five voters.
Monday, November 21, 2016
Just have to share my astonishment here
This morning over breakfast I read an article in the Canberra Times. When I had finished, I first scrolled up again to make sure that I had not accidentally opened The Onion or, perhaps more likely, its Australian counterpart The Shovel. But no, it was indeed the Canberra Times. Then I thought hard about whether I had somehow missed that it was 1 April, but again, no such luck.
The article in question?
Housing affordability in Canberra: Renting is the ACT's 'biggest issue'. It argues that rents are so high in Canberra that people cannot save up enough to buy property, which is fair enough, ... using as its only example and case study a 23 year old student to whom, and I quote, "the great Australian dream" (of owning a house) "seems just that - a dream".
Maybe I am just weird, but when I was a 23 year old undergraduate back in Germany I would not have had the money to buy a house either. I lived off a mixture of a small competitive stipend, money earned from teaching assistantships, and my parents topping up the rest. Life was nonetheless good, as the student canteen was cheap and rents reasonable. But if I had started whinging about not being able to buy a house my friends and family would have given me a lot of side-eye, to put it mildly.
I would also argue that at 23 I was not mature enough to take on this responsibility, and I think I would have said so myself, even then. It was a time of learning, of studying, of first figuring out where I wanted to go with my life.
Which brings up another point. After finishing my studies and doctorate in that town I moved to a different state of the same country; two years later I moved to a different country on the same continent; and nearly one and a half years after that I moved to the other side of the planet. And really something like that was to be expected, given the way the job market in science works. So even if I had been able to afford a house I would not have wanted to buy one until I was settled. Yes, I guess there are some undergrads who study economics or law and then get into a company or public service in their home town, but that cannot be assumed to be a given.
Don't get me wrong, housing is expensive in Canberra. And clearly there must be some up-bidding of prices going on, because looking at quality and size the flats and houses are objectively not worth what they are going for, so the article seems to have got that right. I am forty now, and if we were to describe our fantastic, pie in the sky dream it would be to one day be able to afford a small two bedroom flat with a little courtyard or, if that is impossible, at least a balcony. A house is totally out of the question. This just for context - and note that I am not depressed about it. Billions of people on this planet live happy, productive and fulfilled lives while renting.
But apparently somebody at the newspaper thinks that the average 23 year old (!) student (!) should be able to buy and own a house. Further, that one's main goal in life, this "great Australian dream", cannot possibly wait until the old age of, I dunno, thirty, but has to be achieved before even finishing one's education. Somebody looked at this article and went, yes, that looks sensible, let's click "publish". I am really, really astonished.
And I am eagerly awaiting the next article in the series, "Marriage prospects in Canberra: how a nine year old girl despairs of ever finding Mr Right".
Thursday, November 17, 2016
Clock rooting with strong rate shifts - or not
Today at our journal club we discussed Schmitt 2016, "Hennig, Ax, and present-day mainstream cladistics, on polarising characters", published in Peckiana 11: 35-42.
The point of the paper is that early phylogeneticists discussed various ways of figuring out character polarity (i.e. which character state is ancestral and which is derived) first and then using that inference to build a phylogeny, whereas today nearly everybody does the phylogeny building first and then uses outgroup rooting to polarise the resulting network and infer character polarity.
And... that's it, really. There does not appear to be any clear call to action, although one would have expected something along the lines of "and this is bad because...". The paper does end with an exhortation to use more morphological characters instead of only molecular data, and there is language that may be meant to identify the author as a proponent of paraphyletic taxa without making it explicit (anagenesis!), but neither of those conclusions appears to be to the point. There is no actual way forward regarding the question of how to polarise characters without outgroup rooting.
The approaches discussed in the paper are the following:
Palaeontological precedence. The character state appearing first in the fossil record is the ancestral one. The problem is, this only works if we assume that the fossil record is fairly complete.
Chorological progression. The character state found more frequently near the edges of a range is the derived one, whereas the ancestral state dominates at the centre of origin. The problem is that this is circular, because we first need to figure out where the centre of origin is. I am not too convinced of the principle either.
Ontogenetic precedence. Because organisms cannot completely retool their developmental pathways but only change through (1) neoteny or (2) attaching steps to the end of the process, the earlier states in ontogeny are the ancestral ones. The author mentions the problem of a scarcity of ontogenetic data; I might add that this shows a bit of a zoological bias, as it will rarely work in plants and presumably never in microorganisms.
Correlation of transformation series. I must admit I don't quite understand the logic here, and the author isn't very impressed by it either.
Comparison with the stem lineage of the study group. The state found in the ancestral lineage is ancestral. This is very obviously circular, because we would need to know the phylogeny first, and being able to infer that was the whole point of polarising the character.
Ingroup comparison. The state that is more frequent in the study group is ancestral. I see no reason to assume that this is always true, as there can be shifts in diversification rates.
Finally, outgroup comparison. The state that is found in the closest relative(s) of the study group is ancestral in the study group. It is perhaps not totally correct to call this circular, but it has something of turtles all the way down: to find out what the closest relative of your study group is you need to polarise the larger group around it, and then you have the same problem. Still this is the most broadly useful of all these approaches.
Polarising a phylogeny and polarising characters are two sides of the same coin. I have written a thorough post on the former before, which regularly seems to be found by quite a few people doing Google searches. I hope it is still useful. One of the ways I mentioned there for giving the stack of turtles something to stand on is clock rooting, and I found it surprising that the present paper did not mention it at all. It was this, however, that our journal club discussion dwelt on for quite some time.
Admittedly said discussion was a bit meandering, but here are a few thoughts:
The big problem with clock rooting is that it will be thrown off if there are strong rate shifts. Imagine that the true phylogram consists of two sister groups, one with very long branches (short-lived organisms) and the other with very short branches (their long-lived relatives). If we apply a molecular clock model to the phylogenetic analysis, e.g. in MrBayes, it will try to root the tree so that the branches all end at about the same level, the present. The obvious way to do it is to root the tree within the long-branch group. Eh voilĂ , it has rooted incorrectly.
What to do about this?
The first suggestion was to use an outgroup. In my admittedly limited experience that doesn't work so well. It seems that if the rate shift is strong enough the analysis will simply attach the outgroup to the ingroup in the wrong place.
The next idea was to use a very relaxed clock model, in particular the random local clock model available in BEAST (unfortunately not in MrBayes). But then it was called nice in theory but said to make it hard to achieve stationarity of the MCMC run. I cannot say.
Nick Matzke suggested that a better model could be developed. The hope is that this would allow the analysis to figure out what is going on, recognise the rate shift in the right place, and then root correctly. It would have to be seen how that would work, but at the moment something like that does not appear to be available.
Finally, another colleague said that if the clock models don't work then simply don't use them. Well, but what if we need a time-calibrated phylogeny, a chronogram, to do our downstream analyses, as in biogeographic modelling, studies of diversification rates, or divergence time estimates?
I guess the only way I can think of at the moment is to infer a phylogram whose rooting we trust and then turn it into a chronogram while maintaining topology, as with the software r8s. Maybe there are other ways around the rooting issue with clock models, but I am not ware of them.
The point of the paper is that early phylogeneticists discussed various ways of figuring out character polarity (i.e. which character state is ancestral and which is derived) first and then using that inference to build a phylogeny, whereas today nearly everybody does the phylogeny building first and then uses outgroup rooting to polarise the resulting network and infer character polarity.
And... that's it, really. There does not appear to be any clear call to action, although one would have expected something along the lines of "and this is bad because...". The paper does end with an exhortation to use more morphological characters instead of only molecular data, and there is also language that may be meant to identify the author as a proponent of paraphyletic taxa without making it explicit (anagenesis!), but neither of those two conclusions seems to be to the point. There is no actual way forward regarding the question of how to polarise characters without outgroup rooting.
The approaches discussed in the paper are the following:
Palaeontological precedence. The character state appearing first in the fossil record is the ancestral one. The problem is, this only works if we assume that the fossil record is fairly complete.
Chorological progression. The character state found more frequently near the edges of a range is the derived one, whereas the ancestral state dominates at the centre of origin. The problem is that this is circular, because we first need to figure out where the centre of origin is. I am not too convinced of the principle either.
Ontogenetic precedence. Because organisms cannot completely retool their developmental pathways but can only change through (1) neoteny or (2) adding steps to the end of the process, the earlier states in ontogeny are the ancestral ones. The author mentions the problem of a scarcity of ontogenetic data; I might add that this shows a bit of a zoological bias, as it will rarely work in plants and presumably never in microorganisms.
Correlation of transformation series. I must admit I don't quite understand the logic here, and the author isn't very impressed by it either.
Comparison with the stem lineage of the study group. The state found in the ancestral lineage is ancestral. This is very obviously circular, because we would need to know the phylogeny first, and being able to infer that was the whole point of polarising the character.
Ingroup comparison. The state that is more frequent in the study group is ancestral. I see no reason to assume that this is always true, as there can be shifts in diversification rates.
Finally, outgroup comparison. The state that is found in the closest relative(s) of the study group is ancestral in the study group. It is perhaps not totally correct to call this circular, but it has something of turtles all the way down: to find out what the closest relative of your study group is, you need to polarise the larger group around it, and then you have the same problem. Still, this is the most broadly useful of all these approaches.
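To make the contrast between ingroup and outgroup comparison concrete, here is a small bash/awk sketch. The character matrix is entirely hypothetical: a study group of ten species in which a derived state ("fused") has become the majority, say through a burst of diversification in one clade, while both outgroups show "free". Ingroup comparison then calls the derived state ancestral, whereas outgroup comparison does not. The function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical toy data, one taxon per line: "group taxon state".
matrix='ingroup sp1 fused
ingroup sp2 fused
ingroup sp3 fused
ingroup sp4 fused
ingroup sp5 fused
ingroup sp6 fused
ingroup sp7 fused
ingroup sp8 free
ingroup sp9 free
ingroup sp10 free
outgroup og1 free
outgroup og2 free'

# Ingroup comparison: the commonest state in the study group is called ancestral.
# Ties (or no ingroup taxa at all) leave the polarity equivocal.
ingroup_polarity() {
  awk '$1 == "ingroup" { n[$3]++ }
       END {
         max = 0; tie = 0
         for (s in n) {
           if (n[s] > max)       { max = n[s]; best = s; tie = 0 }
           else if (n[s] == max) { tie = 1 }
         }
         print ((tie || max == 0) ? "equivocal" : best)
       }' <<< "$1"
}

# Outgroup comparison: the state shared by the outgroup(s) is called ancestral;
# if the outgroups disagree (or are missing), the polarity stays equivocal.
outgroup_polarity() {
  awk '$1 == "outgroup" { seen[$3] = 1 }
       END {
         k = 0
         for (s in seen) { k++; only = s }
         print ((k == 1) ? only : "equivocal")
       }' <<< "$1"
}

echo "ingroup comparison:  $(ingroup_polarity "$matrix")"
echo "outgroup comparison: $(outgroup_polarity "$matrix")"
```

With these made-up data the two criteria disagree ("fused" versus "free"), which is exactly the situation where a shift in diversification rates would mislead ingroup comparison.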
Polarising a phylogeny and polarising characters are two sides of the same coin. I have written a thorough post on the former before, which regularly seems to be found by quite a few people doing Google searches. I hope it is still useful. One of the ways I mentioned there for giving the stack of turtles something to stand on is clock rooting, and I found it surprising that the present paper did not mention it at all. It was this, however, that our journal club discussion dwelt on for quite some time.
Admittedly said discussion was a bit meandering, but here are a few thoughts:
The big problem with clock rooting is that it will be thrown off if there are strong rate shifts. Imagine that the true phylogram consists of two sister groups, one with very long branches (short-lived organisms) and the other with very short branches (their long-lived relatives). If we apply a molecular clock model to the phylogenetic analysis, e.g. in MrBayes, it will try to root the tree so that all the branches end at about the same level, the present. The obvious way to achieve that is to root the tree within the long-branch group. Et voilà, it has rooted incorrectly.
What to do about this?
The first suggestion was to use an outgroup. In my admittedly limited experience that doesn't work so well. It seems that if the rate shift is strong enough the analysis will simply attach the outgroup to the ingroup in the wrong place.
The next idea was to use a very relaxed clock model, in particular the random local clock model available in BEAST (unfortunately not in MrBayes). The objection was that this is nice in theory but makes it hard to achieve stationarity of the MCMC run. I cannot say.
Nick Matzke suggested that a better model could be developed. The hope is that this would allow the analysis to figure out what is going on, recognise the rate shift in the right place, and then root correctly. It would have to be seen how that would work, but at the moment something like that does not appear to be available.
Finally, another colleague said that if the clock models don't work then simply don't use them. Well, but what if we need a time-calibrated phylogeny, a chronogram, to do our downstream analyses, as in biogeographic modelling, studies of diversification rates, or divergence time estimates?
I guess the only way I can think of at the moment is to infer a phylogram whose rooting we trust and then turn it into a chronogram while maintaining its topology, as with the software r8s. Maybe there are other ways around the rooting issue with clock models, but I am not aware of them.
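To illustrate the rate-shift problem with clock rooting described above, here is a toy bash/awk sketch. It does not reproduce what MrBayes actually does (a strict clock is fitted within the likelihood and MCMC machinery); instead it uses minimum-variance rooting as a simple stand-in, placing the root wherever the root-to-tip distances are most nearly equal. The four-taxon phylogram and its branch lengths are hypothetical: A and B have long branches, C and D short ones, and the true root lies on the internal branch X-Y.

```bash
#!/usr/bin/env bash
# Minimal sketch: clock-like rooting approximated by minimum-variance rooting.
# Hypothetical toy phylogram, one branch per line: "node node length".
tree='A X 10
B X 10
X Y 2
C Y 1
D Y 1'

clock_root() {
  awk -v steps=1000 '
    { u[NR] = $1; v[NR] = $2; len[NR] = $3 + 0
      deg[$1]++; deg[$2]++
      d[$1, $2] = $3 + 0; d[$2, $1] = $3 + 0 }
    END {
      # All pairwise path lengths (Floyd-Warshall; fine for a handful of nodes).
      for (n in deg) d[n, n] = 0
      for (k in deg) for (i in deg) for (j in deg)
        if ((i, k) in d && (k, j) in d &&
            (!((i, j) in d) || d[i, k] + d[k, j] < d[i, j]))
          d[i, j] = d[i, k] + d[k, j]
      # Slide a candidate root along every branch; tips are nodes of degree 1.
      found = 0
      for (e = 1; e <= NR; e++)
        for (s = 0; s <= steps; s++) {
          t = len[e] * s / steps              # distance of the root from u[e]
          sum = 0; sumsq = 0; ntip = 0
          for (tip in deg) if (deg[tip] == 1) {
            # A tip lies on the u side of the branch if it is nearer to u than to v.
            if (d[u[e], tip] < d[v[e], tip]) x = t + d[u[e], tip]
            else                             x = len[e] - t + d[v[e], tip]
            sum += x; sumsq += x * x; ntip++
          }
          variance = sumsq / ntip - (sum / ntip) ^ 2
          if (!found || variance < best) {
            found = 1; best = variance; bu = u[e]; bv = v[e]; bt = t
          }
        }
      printf "root on %s-%s, %.2f from %s, variance %.2f\n", bu, bv, bt, bu, best
    }' <<< "$1"
}

clock_root "$tree"
```

If I have done the arithmetic right, the best position is about 7.67 units along the terminal branch of A (or, equivalently, B), with a variance of about 8.17, whereas the best spot on the true root branch X-Y (right at X) only manages 12.25. In other words, the clock-like criterion prefers to root inside the long-branch group.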
Sunday, November 13, 2016
Black Mountain plants
Although it was windy and cool we went for a walk on Black Mountain Nature Reserve. It is interesting how completely different its flora is compared with that of the Mount Majura - Mount Ainslie range. For example, Black Mountain has many species of orchids, the other two have only very few. Apparently the difference is to a large degree one of soil chemistry, but past land-use is also said to have differed.
Telstra Tower atop Black Mountain, as seen from near the Australian National Botanic Garden's nursery.
Poranthera microphylla, a tiny plant that is widespread and common in south-eastern Australia, but presumably often overlooked. It was traditionally considered to be a member of the spurge family but apparently now belongs to the Phyllanthaceae.
And here is a representative of the type genus of the Phyllanthaceae, Phyllanthus hirtellus. In this case the plant is larger, a dwarf shrub, but the flowers are still minuscule.
Finally, beautiful Grevillea alpina (Proteaceae), or at least so I hope. There is another rather similar species of the same genus in the area, but it is supposed to have glabrous tepals.
Tuesday, November 8, 2016
Botany picture #237: Allium karataviense
As mentioned with previous botany pictures I really like the onion genus. The above picture, which I took in 2008 either in the botanic garden of Halle or in Gatersleben, Germany, shows one of the many impressive Asian species, Allium karataviense. Like several others of its large-headed relatives it is an ornamental species, not so much a kitchen herb.
Saturday, November 5, 2016
New Laptop, and how to get science / phylogenetics crucial software onto Ubuntu
About a week ago I finally bit and bought a new laptop, a Dell Inspiron 11 3162, as my old netbook had grown old and slow, and its battery life short.
Yes, this is not exactly high-end, but the point is, I don't want high-end. A really good, high performance, cutting edge laptop would come with two downsides. First, it would be optimised for being high performance and not for being light and small; and I really want something that travels easily. Second, it would be expensive; and I want something cheap - we are talking less than AUD400 here - so that it will not be too painful if it gets damaged or stolen during field work or a conference trip.
The new machine came with Windows 10. I think it is a psychological defect on my part, but whenever I try to use Windows 8 or 10 even just for a few minutes I get really upset. Not trying to proselytise here, just my personal problem. A real problem, however, is that this model of laptop comes with storage of only 32 G on a card, no hard-drive. I assume the idea is that many people use the cloud these days (I don't), but still this is a tad on the ridiculous side. Windows 10 took up very nearly half of that space, so install a few things and get a few security updates and you hit the wall.
Having considered these two issues, I carpet-bombed Windows and installed Ubuntu 16.04. By itself this operating system takes up about 3.3 G, and even with the various programs and packages I have since installed on top of it the total is only c. 8 G, still only half of what Windows took up by itself. So, yeah. Also, I can now click on something and the computer does not have to think for 3-4 seconds before it reacts. As a colleague sardonically commented on the performance issues of Windows, "Intel giveth, and Microsoft taketh away." I also bought a micro card to fit into the little slot on the left side of the laptop, and it works nicely as additional storage, contrary to some comments I have seen on the web. It merely had to be formatted for Ubuntu.
Mostly I use my laptop for simple things, like Skype, checking eMails, writing on a manuscript, etc., but generally not to run time-consuming analyses. That being said, I like to have some analysis software on there too in case it becomes necessary during travel. It has to be admitted that one of the disadvantages of Ubuntu is still that it is not always trivial to find and install specialised software. As I just had to do exactly that, I thought I would use this post to collect my recent experiences. Perhaps somebody will find it useful before it is too far out of date.
First, the software centre was weirdly empty. Here I found a post on the ask ubuntu forum helpful.
If you have the same problem, open terminal and run:
sudo apt update
sudo apt upgrade -y
Now for my personal must-haves and how I got them:
Inkscape
(Vector graphics program, e.g. for editing figures for a manuscript.)
No problem installing from Software Centre.
GIMP
(Pixel graphics editor, for photos.)
No problem installing from Software Centre.
R
(Statistics software.)
There are probably different ways of doing it, but I followed the instructions from digitalocean.
rstudio
(GUI for R.)
Download the relevant .deb file from the program website, right-click and select to open it with the software centre.
Java
(Required to run several of the other programs here.)
I got JDK instead of merely JRE, just to be on the safe side. Open terminal and run:
sudo apt-get update
sudo apt-get install default-jdk
Source of information: digitalocean.
Acrobat PDF Reader
(Sadly it seems to be the only PDF reader on Linux that can edit complex PDFs such as those used for grant proposals by some funding agencies. Only a very old version is available, as Adobe has discontinued Linux support.)
Open terminal and run:
sudo apt-get install libatk1.0-0 libc6 libfontconfig1 libgcc1 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk2.0-0 libidn11 libpango1.0-0 libstdc++6 libx11-6 libxext6 libxml2 libxt6 zlib1g lsb-release debconf
wget http://archive.canonical.com/pool/partner/a/acroread/acroread-bin_9.5.5-1raring1_i386.deb
sudo dpkg -i acroread-bin_9.5.5-1raring1_i386.deb
Source of information: ask ubuntu forum.
In my case, however, I experienced some errors. Apparently Acrobat Reader requires some outdated packages to run, and Ubuntu does not want to install them because it has got the newer versions. The system itself then kindly suggested to me to run a command to fix the problem. I hope I remember correctly, but I think it was simply:
sudo apt-get install -f
The -f (--fix-broken) option tells apt to try to correct broken dependencies. That command (or one very much like it) solved the problem for me, and I was able to start the reader.
AliView
(Sequence alignment editor.)
Following the instructions on the program website should have worked, in principle. However, I realised only then that Java was not yet installed, and AliView obviously wouldn't work without it. Download the aliview.install.run file, change its file preferences to make it executable, open terminal, go to relevant folder, run:
sudo ./aliview.install.run
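For anyone who prefers the terminal to file preferences, here is a small bash sketch of the AliView steps with a Java check added up front, since a missing Java was what tripped me up. It assumes that aliview.install.run sits in the current folder; the function names are made up for this example, and default-jdk is the package from the Java entry above.

```bash
#!/usr/bin/env bash
# Sketch: the AliView installation from the terminal, checking for Java first.
# Assumes aliview.install.run was downloaded into the current folder;
# the function names are hypothetical.
require_java() {
  if ! command -v java >/dev/null 2>&1; then
    echo "Java not found; install it first, e.g.: sudo apt-get install default-jdk" >&2
    return 1
  fi
}

install_aliview() {
  require_java || return 1
  chmod +x ./aliview.install.run   # same as ticking "executable" in the file preferences
  sudo ./aliview.install.run
}

# Usage, from the download folder:
#   install_aliview
```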
MAFFT
(In my eyes the best sequence alignment tool, can also be called by AliView.)
After experiencing some problems trying to install from the rpm file that is available on the program homepage I found an entry on howtoinstall.co that simplified things. Open a terminal and run simply:
sudo apt-get update
sudo apt-get install mafft
That was easy.
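Once installed, MAFFT can also be run directly from the terminal. A minimal sketch with hypothetical file and function names: --auto asks MAFFT to choose an alignment strategy appropriate to the size of the dataset, and the alignment is written to standard output, hence the redirect.

```bash
#!/usr/bin/env bash
# Sketch: align a FASTA file with MAFFT; file and function names are hypothetical.
align_fasta() {
  local in=$1 out=$2
  command -v mafft >/dev/null 2>&1 || { echo "mafft not found (sudo apt-get install mafft)" >&2; return 1; }
  [ -s "$in" ] || { echo "missing or empty input: $in" >&2; return 1; }
  # MAFFT writes the alignment to stdout and its progress messages to stderr.
  mafft --auto "$in" > "$out"
}

# Usage:
#   align_fasta my_sequences.fasta my_sequences_aligned.fasta
```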
PAUP test versions
(Phylogenetics software implementing various methods.)
This comes as an executable. Obtain the Linux distribution from the program website, unpack it, set the file preferences to allow the program to be executed, done.
TNT
(Parsimony phylogenetics software.)
Obtain the Linux distribution from the program website, unpack it, set the file preferences to allow the program to be executed, done. You will have to accept the license agreement the first time you run the program, rather than during an installation.
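The unpack-and-make-executable routine for PAUP and TNT can also be done in a terminal, where allowing a file to be executed corresponds to chmod +x. This is a sketch under assumptions: the function name is hypothetical, and the archive format each program actually ships in should be checked on its download page, so the case statement simply covers common ones (unzip may need to be installed first).

```bash
#!/usr/bin/env bash
# Sketch: unpack a downloaded archive and mark the program in it as executable.
# The function name and the archive/binary names in the usage note are placeholders.
unpack_and_enable() {
  local archive=$1 binary=$2
  case "$archive" in
    *.tar.gz|*.tgz) tar -xzf "$archive" ;;
    *.zip)          unzip -o "$archive" ;;
    # A single gzip-compressed file, e.g. a bare binary: strip the .gz suffix.
    *.gz)           gunzip -c "$archive" > "${archive%.gz}" ;;
    *) echo "unknown archive type: $archive" >&2; return 1 ;;
  esac || return 1
  chmod +x "$binary"
}

# Usage (hypothetical names):
#   unpack_and_enable tnt-linux.zip ./tnt
#   ./tnt
```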
MrBayes
(Bayesian phylogenetics software.)
Available on github and sourceforge. I downloaded and unpacked it, opened a terminal, navigated to the relevant folder, and followed the instructions for compiling that are given in the appropriately named text file. Worked beautifully, only I had to disable Beagle, as prompted during compilation.
FigTree
(Phylogenetic tree viewer.)
Java program, so simply get it from the program website, unpack, and set the JAR file to be executable in its preferences. It should then be run by Java whenever it is opened. I find it useful to create a link on the desktop for easier access.
Tracer
(To examine the results of MrBayes runs.)
Java program, so simply get it from the program website, unpack, and set the JAR file to be executable in its preferences. It should then be run by Java whenever it is opened. I find it useful to create a link on the desktop for easier access.
jModelTest
(Model testing for likelihood and Bayesian phylogenetics. For larger datasets I would not use the laptop, of course, as the analysis takes forever and benefits greatly from parallel processing.)
Java program, so simply get it from the program website, unpack, and set the JAR file and the PhyML program (!) to be executable in their preferences. It should then be run by Java whenever it is opened. I find it useful to create a link on the desktop for easier access.
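The Java programs above (FigTree, Tracer, jModelTest) can also be started from a terminal with java -jar, which sidesteps the question of whether the desktop will hand the JAR file to Java. A sketch with a hypothetical function name; all paths are placeholders for whatever the unpacked download actually contains.

```bash
#!/usr/bin/env bash
# Sketch: start a Java program from the terminal. Paths are placeholders.
launch_jar() {
  local jar=$1
  shift
  command -v java >/dev/null 2>&1 || { echo "java not found" >&2; return 1; }
  [ -f "$jar" ] || { echo "no such JAR file: $jar" >&2; return 1; }
  java -jar "$jar" "$@"   # any further arguments are passed on to the program
}

# jModelTest additionally needs its bundled PhyML binary to be executable:
#   chmod +x path/to/bundled/phyml-binary     (placeholder path)
# and can then be started with, for example:
#   launch_jar path/to/jModelTest.jar         (placeholder path)
```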
WINE
(Compatibility layer for running Windows programs, just in case.)
No problem installing from Software Centre.
SciTE
(Text editor with many useful functions, for data files etc.)
No problem installing from Software Centre.
Skype
(Video calls.)
Download the .deb file from the program website, right-click and select to open it with the software centre.
ClamTK
(Virus scanner.)
No problem installing from Software Centre.
Friday, November 4, 2016
CBA seminar on molecular phylogenetics
Today I went to a Centre of Biodiversity Analysis seminar over at the Australian National University: Prof. Lindell Bromham on Reading the story in DNA - the core principles of molecular phylogenetic inference. This was very refreshing, as I have spent most of the year doing non-phylogenetic work such as cytology, programming, species delimitation, and building identification keys.
The seminar was packed, the audience was lively and from very diverse fields, and the speaker was clear and engaging. As can be expected, Prof. Bromham started with the very basics but had nearly two hours (!) to get to very complicated topics: sequence alignments, signal saturation, distance methods, parsimony analysis, likelihood phylogenetics, Bayesian phylogenetics, and finally various problems with the latter, including choice of priors or when results merely restate the priors.
The following is a slightly unsystematic run-down of what I found particularly interesting. Certainly other participants will have a different perspective.
Signal saturation or homoplasy at the DNA level erases the historical evidence. Not merely: makes the evidence harder to find. Erases. It is gone. That means that strictly speaking we cannot infer or even estimate phylogenies, even with a superb model, we can only ever build hypotheses.
Phylogenetics is a social activity. The point is that fads and fashions, irrational likes and dislikes, groupthink, the age of a method, and quite simply the availability and user-friendliness of software determine the choice of analysis quite as much as the appropriateness of the analysis. Even if one were able to show that parsimony, for example, works well for a particular dataset one would still not be able to get the paper into any prestigious journal except Cladistics. And yes, she stressed that there is no method that is automatically inappropriate, even distance or parsimony. It depends on the data.
Any phylogenetic approach taken in a study can be characterised with three elements: a search strategy, an optimality criterion, and a model of how evolution works. For parsimony, for example, the search strategy is usually heuristic (not her words, see below), the optimality criterion is the minimal number of character changes, and the implicit model is that character changes are rare and homoplasy is absent.
The more sophisticated the method, the harder it gets to state its assumptions. Just saying out loud all the assumptions behind a BEAST run would take a lot of time. Of course that does not mean that the simpler methods do not make assumptions - they are merely implicit. (I guess if one were to spell them out, they would then often be "this factor can safely be ignored".)
Nominally Bayesian phylogeneticists often behave in very un-Bayesian ways. Examples are use of arbitrary Bayes factor cut-offs, not updating priors but treating every analysis as independent, and frowning upon informative topology priors.
Unfortunately, in Bayesian phylogenetics priors determine the posterior more often than most people realise. This brought me back to discussions with a very outspoken Bayesian seven years ago; his argument was "a wrong prior doesn't matter if you have strong data", which if true would kind of make me wonder what the point is of doing Bayesian analysis in the first place.
However, Prof. Bromham also said a few things that I found a bit odd, or at least potentially in need of some clarification.
She implied that parsimony analysis generally used exhaustive searches. Although there was also a half-sentence to the effect of "at least originally", I would stress that search strategy and optimality criterion are two very different things. Nothing keeps a likelihood analysis from using an exhaustive search (except that it would not stop before the heat death of the universe), and conversely no TNT user today who has a large dataset would dream of doing anything but heuristic searches. Indeed the whole point of that program was to offer ways of cutting even more corners in the search.
Parsimony analysis is also a form of likelihood analysis. Well, I would certainly never claim, as some people do, that it comes without assumptions. I would say that parsimony has a model of evolution in the same sense as the word model is used across science, yes. I can also understand how and why people interpret parsimony as a model in the specific sense of likelihood phylogenetics and examine what that means for its behaviour and parameterisation compared to other models. But calling it a subset of likelihood analysis still leaves me a bit uncomfortable, because it does not use likelihood as a criterion but simply tree length. Maybe I am overlooking something, in fact most likely I am overlooking something, but to me the logic of the analysis seems to be rather different, for better or for worse.
One of the reasons why parsimony has fallen out of fashion is that "cladistics" is an emotional and controversial topic; this was illustrated with a caricature of Willi Hennig dressed up as a saint. I feel that this may conflate Hennig's phylogenetic systematics with parsimony analysis, in other words a principle of classification with an optimality criterion. Although the topic is indeed still hotly debated by a small minority, phylogenetic systematics is today state of the art, even as people have moved to using Bayesian methods to figure out whether a group is monophyletic or not.
The main reasons for the popularity of Bayesian methods are (a) that they allow more complex models and (b) that they are much faster than likelihood analyses. The second claim surprised me greatly because it does not at all reflect my personal experience. When I later discussed it with somebody at work, I realised that it depends greatly on what software we choose for comparison. I was thinking BEAST versus RAxML with fast bootstapping, i.e. several days on a supercomputer versus less than an hour on my desktop. But if we compare MrBayes versus likelihood analysis in PAUP with thorough bootstrapping, well, suddenly I see where this comes from.
These days you can only get published if you use Bayesian methods. Again, that is not at all my experience. It seems to depend on the data, not least because huge genomic datasets can often not be processed with Bayesian approaches anyway. We can see likelihood trees of transcriptome data published in Nature, or ASTRAL trees in other prestigious journals. Definitely not Bayesian.
In summary, this was a great seminar to go to especially because I am planning some phylogenetics work over summer. It definitely got the old cogs turning again. Also, Prof. Bromham provided perhaps the clearest explanation I have ever heard of how Bayesian/MCMC analyses work, and that may become useful for when I have to discuss them with a student myself...
The seminar was packed, the audience was lively and from very diverse fields, and the speaker was clear and engaging. As might be expected, Prof. Bromham started with the very basics, but she had nearly two hours (!) to work up to very complicated topics: sequence alignments, signal saturation, distance methods, parsimony analysis, likelihood phylogenetics, Bayesian phylogenetics, and finally various problems with the latter, including the choice of priors and cases where the results merely restate the priors.
The following is a slightly unsystematic run-down of what I found particularly interesting. Certainly other participants will have a different perspective.
Signal saturation, or homoplasy at the DNA level, erases the historical evidence. It does not merely make the evidence harder to find. It erases it. It is gone. That means that, strictly speaking, we cannot infer or even estimate phylogenies, even with a superb model; we can only ever build hypotheses.
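To make the saturation point concrete, here is a minimal sketch assuming the simplest substitution model, Jukes-Cantor (JC69). Under JC69 the expected proportion p of differing sites between two sequences separated by d substitutions per site is p = 3/4 (1 - e^(-4d/3)), which levels off at 0.75 however large d becomes. Near that ceiling very different amounts of change produce practically the same observed p, so the correction back to d has nothing left to work with.

```python
import math

def jc69_p_distance(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p: float) -> float:
    """Invert the JC69 formula: estimate d from an observed proportion p of differing sites.
    Returns infinity once p reaches the 0.75 ceiling, i.e. the sequences look random."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        p = jc69_p_distance(d)
        print(f"true d = {d:5.1f}  expected observed p = {p:.4f}")
```

For d = 5 and d = 10 the expected p differs by less than 0.001, far less than the sampling noise in any real alignment, which is one way of seeing that the evidence for the extra change is not just hidden but gone.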
Phylogenetics is a social activity. The point is that fads and fashions, irrational likes and dislikes, groupthink, the age of a method, and quite simply the availability and user-friendliness of software determine the choice of analysis quite as much as its appropriateness does. Even if one were able to show that parsimony, for example, works well for a particular dataset, one would still not be able to get the paper into any prestigious journal except Cladistics. And yes, she stressed that no method is automatically inappropriate, not even distance or parsimony. It depends on the data.
Any phylogenetic approach taken in a study can be characterised by three elements: a search strategy, an optimality criterion, and a model of how evolution works. For parsimony, for example, the search strategy is usually heuristic (not her words, see below), the optimality criterion is the minimal number of character changes, and the implicit model is that character changes are rare and homoplasy is absent.
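The parsimony optimality criterion can be made concrete with Fitch's algorithm, which counts the minimum number of changes a single unordered character needs on a given rooted binary tree; summing over characters gives the tree length. This is a minimal sketch assuming a nested-tuple tree representation chosen purely for illustration (not the input format of any particular program). Note that it only scores one fixed tree: finding the best tree is the separate job of the search strategy.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree as nested pairs; leaves are strings (taxon names or states).
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree) -> Tuple[FrozenSet[str], int]:
    """Fitch's algorithm for one unordered character whose leaves hold states.
    Returns the state set at this node and the minimum number of changes below it."""
    if isinstance(tree, str):
        return frozenset([tree]), 0
    left, right = tree
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change at this node.
        return common, left_cost + right_cost
    # No state in common: keep either option and count one change.
    return left_set | right_set, left_cost + right_cost + 1

def label_tree(topology: Tree, states: Dict[str, str]) -> Tree:
    """Replace taxon names in the topology by their states for one alignment column."""
    if isinstance(topology, str):
        return states[topology]
    left, right = topology
    return (label_tree(left, states), label_tree(right, states))

def tree_length(topology: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony tree length: sum of Fitch change counts over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(label_tree(topology, column))[1]
    return total

# Hypothetical four-taxon alignment with three sites.
alignment = {"w": "AAG", "x": "AAT", "y": "CGT", "z": "CGT"}
print(tree_length((("w", "x"), ("y", "z")), alignment))  # 3
print(tree_length((("w", "y"), ("x", "z")), alignment))  # 5
```

Under the optimality criterion the first topology is preferred because it is shorter; nothing in the scoring says anything about how the candidate topologies were found.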
The more sophisticated the method, the harder it gets to state its assumptions. Just saying out loud all the assumptions behind a BEAST run would take a lot of time. Of course, that does not mean that the simpler methods make no assumptions; theirs are merely implicit. (I guess if one were to spell them out, they would often be "this factor can safely be ignored".)
Nominally Bayesian phylogeneticists often behave in very un-Bayesian ways. Examples are the use of arbitrary Bayes factor cut-offs, treating every analysis as independent instead of updating priors, and frowning upon informative topology priors.
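What "updating priors" means can be shown with a deliberately non-phylogenetic toy: a Beta prior on a single probability, updated with binomial data (the conjugate Beta-Binomial case, chosen only because the arithmetic is exact). In a consistently Bayesian workflow the posterior of one study becomes the prior of the next, and updating in two steps gives exactly the same answer as analysing all the data at once. Starting every analysis afresh from the same flat prior throws that accumulated information away. The study counts below are invented for illustration.

```python
from fractions import Fraction

def update(prior, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives Beta(a + s, b + f)."""
    a, b = prior
    return (a + successes, b + failures)

def beta_mean(params):
    """Exact mean a / (a + b) of a Beta(a, b) distribution."""
    a, b = params
    return Fraction(a, a + b)

flat = (1, 1)     # uniform prior
study1 = (7, 3)   # hypothetical (successes, failures)
study2 = (2, 8)

sequential = update(update(flat, *study1), *study2)   # posterior of study 1 as prior of study 2
pooled = update(flat, study1[0] + study2[0], study1[1] + study2[1])
independent = update(flat, *study2)                   # ignores study 1 entirely

print(sequential, pooled)                              # (10, 12) (10, 12)
print(beta_mean(sequential), beta_mean(independent))   # 5/11 1/4
```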
Unfortunately, in Bayesian phylogenetics the priors determine the posterior more often than most people realise. This brought me back to discussions with a very outspoken Bayesian seven years ago; his argument was "a wrong prior doesn't matter if you have strong data", which, if true, would kind of make me wonder what the point of doing a Bayesian analysis is in the first place.
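Both sides of that argument show up in a deliberately non-phylogenetic toy model: a Beta prior on one probability, updated with binomial data. With little data, two analysts starting from strongly opposed priors end up with very different posteriors; with a lot of data their posteriors converge, which is the sense in which "a wrong prior doesn't matter". The priors and counts are invented for illustration, and in a phylogenetic analysis with many parameters it is much less obvious which of the two regimes applies to any given parameter.

```python
def posterior_mean(a: float, b: float, successes: int, trials: int) -> float:
    """Posterior mean of a probability under a Beta(a, b) prior after binomial data."""
    return (a + successes) / (a + b + trials)

# Two hypothetical analysts with strongly opposed priors (prior means 0.9 and 0.1).
optimist = (45, 5)
pessimist = (5, 45)

# Hypothetical data with an observed frequency of 0.5, at two sample sizes.
for successes, trials in [(5, 10), (5000, 10000)]:
    m1 = posterior_mean(*optimist, successes, trials)
    m2 = posterior_mean(*pessimist, successes, trials)
    print(f"n = {trials}: {m1:.3f} vs {m2:.3f}")
```

With 10 observations the posterior means are about 0.833 and 0.167, essentially restating the priors; with 10,000 observations they are about 0.502 and 0.498.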
However, Prof. Bromham also said a few things that I found a bit odd, or at least potentially in need of some clarification.
She implied that parsimony analysis generally used exhaustive searches. Although she added a half-sentence to the effect of "at least originally", I would stress that search strategy and optimality criterion are two very different things. Nothing keeps a likelihood analysis from using an exhaustive search (except that it would not finish before the heat death of the universe), and conversely no TNT user with a large dataset today would dream of doing anything but heuristic searches. Indeed, the whole point of that program was to offer ways of cutting even more corners in the search.
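The heat-death remark is barely an exaggeration, and the reason has nothing to do with the optimality criterion. For n taxa there are (2n-5)!! = 3 × 5 × ... × (2n-5) distinct unrooted, fully bifurcating topologies, so the size of the search space alone rules out exhaustive searches for parsimony and likelihood alike. A minimal sketch of that count:

```python
def n_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted, fully bifurcating topologies for n_taxa labelled taxa,
    i.e. the double factorial (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # A tree with k - 1 taxa has 2(k - 1) - 3 = 2k - 5 branches,
    # and the k-th taxon can be attached to any one of them.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count

if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(n, n_unrooted_trees(n))
```

Ten taxa already give 2,027,025 topologies and twenty taxa about 2.2 × 10^20, which is why heuristic searches are the norm whatever criterion is used to score the trees.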
Parsimony analysis is also a form of likelihood analysis. Well, I would certainly never claim, as some people do, that it comes without assumptions. I would say that parsimony has a model of evolution in the same sense as the word model is used across science, yes. I can also understand how and why people interpret parsimony as a model in the specific sense of likelihood phylogenetics and examine what that means for its behaviour and parameterisation compared to other models. But calling it a subset of likelihood analysis still leaves me a bit uncomfortable, because it does not use likelihood as a criterion but simply tree length. Maybe I am overlooking something (in fact, most likely I am), but to me the logic of the analysis seems rather different, for better or for worse.
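The difference in criterion can be made concrete for a single site. Parsimony scores the pattern AACC on the topology ((1,2),(3,4)) as exactly one change, whatever the branch lengths. A likelihood analysis instead sums over all possible ancestral states and depends on branch lengths and a substitution model. Below is a minimal sketch of Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model, assuming a rooted binary tree written as nested tuples of (subtree, branch length) pairs; the representation and branch lengths are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in the same base, or in one particular different base, after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partials(node):
    """Conditional likelihoods of the data below `node`, for each possible state at `node`.
    A node is either a leaf base such as "A" or a tuple of (child, branch_length) pairs."""
    if isinstance(node, str):
        return {x: 1.0 if x == node else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        child_partials = partials(child)
        for x in BASES:
            result[x] *= sum(jc69_prob(x == y, t) * child_partials[y] for y in BASES)
    return result

def site_likelihood(root) -> float:
    """Sum over root states, each weighted by the JC69 equilibrium frequency of 1/4."""
    root_partials = partials(root)
    return sum(0.25 * root_partials[x] for x in BASES)

def quartet(states: str, internal: float, terminal: float):
    """Hypothetical rooted quartet ((s0, s1), (s2, s3)) with the given branch lengths."""
    s0, s1, s2, s3 = states
    left = ((s0, terminal), (s1, terminal))
    right = ((s2, terminal), (s3, terminal))
    return ((left, internal), (right, internal))

if __name__ == "__main__":
    # The same site pattern and topology (one change under parsimony), different branch lengths.
    for internal, terminal in [(0.01, 0.01), (0.01, 0.5), (0.5, 0.01)]:
        print(internal, terminal, site_likelihood(quartet("AACC", internal, terminal)))
```

The tree length of this site stays at one step throughout, while its likelihood varies with the branch lengths; that is the sense in which the two criteria ask different questions of the same data.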
One of the reasons why parsimony has fallen out of fashion is that "cladistics" is an emotional and controversial topic; this was illustrated with a caricature of Willi Hennig dressed up as a saint. I feel that this may conflate Hennig's phylogenetic systematics with parsimony analysis, in other words a principle of classification with an optimality criterion. Although the topic is indeed still hotly debated by a small minority, phylogenetic systematics is today the state of the art, even as people have moved to using Bayesian methods to figure out whether a group is monophyletic or not.
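In a Bayesian analysis the support for a group being monophyletic is commonly summarised as its posterior probability, estimated as the fraction of trees sampled by the MCMC in which the group appears as a clade. A minimal sketch, assuming each sampled tree has already been reduced to the set of clades it contains; the taxa and samples are hypothetical, and real tools also read tree files and discard burn-in.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]

def clade_posterior(samples: List[Set[Clade]], group: Set[str]) -> float:
    """Fraction of sampled trees in which `group` forms a clade, i.e. is monophyletic."""
    target = frozenset(group)
    hits = sum(1 for clades in samples if target in clades)
    return hits / len(samples)

def c(*taxa: str) -> Clade:
    """Shorthand for building a clade from taxon names."""
    return frozenset(taxa)

# Four hypothetical post-burn-in samples of a rooted five-taxon tree, each as its set of clades.
samples = [
    {c("A", "B"), c("A", "B", "C"), c("D", "E")},
    {c("A", "B"), c("A", "B", "C"), c("D", "E")},
    {c("A", "B"), c("C", "D", "E"), c("D", "E")},
    {c("A", "C"), c("A", "B", "C"), c("D", "E")},
]
print(clade_posterior(samples, {"A", "B"}))  # 0.75
print(clade_posterior(samples, {"D", "E"}))  # 1.0
```

The question being asked, whether a group is monophyletic, is Hennig's; only the way of scoring the evidence for it has changed.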
The main reasons for the popularity of Bayesian methods are (a) that they allow more complex models and (b) that they are much faster than likelihood analyses. The second claim surprised me greatly because it does not at all reflect my personal experience. When I later discussed it with somebody at work, I realised that it depends greatly on what software we choose for comparison. I was thinking of BEAST versus RAxML with fast bootstrapping, i.e. several days on a supercomputer versus less than an hour on my desktop. But if we compare MrBayes with a likelihood analysis in PAUP with thorough bootstrapping, well, suddenly I see where this comes from.
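Much of that run-time difference comes from the bootstrap. In a nonparametric bootstrap, alignment columns are resampled with replacement to build pseudo-replicate datasets of the original length, and a tree search is repeated on each one, so a thorough bootstrap with hundreds of replicates costs hundreds of full searches; roughly speaking, fast bootstrapping as in RAxML saves time by searching each replicate less exhaustively. Below is a minimal sketch of the resampling step only, assuming an alignment stored as a dict of equal-length strings with invented toy data; the tree search itself is left out.

```python
import random
from typing import Dict

def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap pseudo-replicate: sample columns with replacement,
    using the same column indices for every taxon so that sites stay aligned."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}

# Hypothetical toy alignment; each replicate would then go through a full tree search.
alignment = {"w": "AAGT", "x": "AATT", "y": "CGTT", "z": "CGTA"}
rng = random.Random(1)
for _ in range(3):
    print(bootstrap_replicate(alignment, rng))
```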
These days you can only get published if you use Bayesian methods. Again, that is not at all my experience. It seems to depend on the data, not least because huge genomic datasets often cannot be processed with Bayesian approaches anyway. We can see likelihood trees of transcriptome data published in Nature, or ASTRAL trees in other prestigious journals. Definitely not Bayesian.
In summary, this was a great seminar to attend, especially because I am planning some phylogenetics work over summer. It definitely got the old cogs turning again. Also, Prof. Bromham provided perhaps the clearest explanation I have ever heard of how Bayesian/MCMC analyses work, and that may come in handy when I have to discuss them with a student myself...
