On our way back to Canberra we stopped at Jenolan Caves, which I had so far only heard mentioned. They were really worth the admission price.
Above, the entrance area: a large tunnel through the mountain. The various cave systems that can be visited branch off from this central cavity.
Before our tour started we spent some time at the amazingly blue lake nearby. It is said to have healing properties, if you believe that.
We visited Chifley Cave. It is not even the largest of them but is already extremely impressive, and it features a lot of interesting crystal and limestone formations...
... as seen above. I found it a bit weird that the guide was constantly apologising for explaining the geology. What next, a zoo guide apologising for explaining animals?
These formations are apparently called shawls, at least if we understood correctly.
And because there would not be any plants otherwise, I will close with this landscape of yellow lichen on tree bark.
Friday, December 30, 2016
Thursday, December 29, 2016
Blue Mountains Holiday Trip, part 3
Today's stations were, in order of appearance, Echo Point and Leura Cascades in Katoomba, Wentworth Falls, and Govet's Leap Lookout in Blackheath.
The obligatory Three Sisters picture, although from this perspective it looks more like one sister.
Blue Mountains landscape as seen from the lookout behind Leura Cascades.
Leura Cascades are a paradise for various species of ferns. Here the leaf venation of Todea barbara (Osmundaceae) showing through against the light.
Cliff face near Wentworth Falls. If you look very carefully you can see tiny people; we got to that place a bit later.
The surroundings of Wentworth Falls were perhaps botanically the most interesting area today. Above, Rimacola elliptica (Orchidaceae), a species restricted to a fairly small range around Sydney.
And here we have Alania endlicheri, of the strange little family Boryaceae. Its sister genus occurs far away in Western Australia.
Wednesday, December 28, 2016
Blue Mountains Holiday Trip, part 2
Today our goals were the stone pagodas of Garden of Stones and Wollemi National Parks as well as Mount Tomah Botanic Gardens.
The landscape of the sandstone-ironstone pagodas. This picture was taken from atop one of them. I had been looking forward to showing this area to my family.
One of the plants in flower on the way there was Goodenia bellidifolia (Goodeniaceae).
The only downside of getting there is that you have to pass through Newnes State "Forest". Forest as in clear-felling, apparently.
We then returned to Lithgow and drove eastwards from there to Mount Tomah. Shortly before reaching the gardens I noticed Calomeria amaranthoides (Asteraceae) on the roadside. Yes, Asteraceae; it is certainly one of the weirder representatives of this family on a continent that has its fair share of morphologically aberrant daisies.
The above is the view over Mount Tomah Botanic Gardens, with the Blue Mountains in the distance.
My daughter was particularly enchanted by the bog garden, as it had a lot of carnivorous plants on display. Here a detail of a sundew (Drosera).
She also liked the remnant rainforest. Here a Microsorum fern is climbing up a trunk.
Tuesday, December 27, 2016
Blue Mountains Holiday Trip, part 1
Over the holidays we are spending a few nights in the Blue Mountains. Today was mostly driving, but after our arrival in Lithgow we found the time to visit at least Hassans Wall Lookout, apparently the highest lookout in the area.
Unfortunately, as with my recent work trip, the sky was grey. Photos did not come out too badly considering the circumstances.
A bunch of tourists were going beyond the fences, into areas that are considered out of bounds. Bit too dangerous for my taste, given the landscape.
There were several interesting plants flowering in the area, nearly all of them white. I do not know the species names of some of them, but the above is Platysace lanceolata (Apiaceae), a common shrub of heath-like habitats.
Thursday, December 22, 2016
Field trip to Moss Vale area
Pictures from a one-day field trip I did today:
Landscape as seen from the Fitzroy Falls lookout; only a pity about the grey sky.
A bit further on from the falls we encountered this charming little shrub, Boronia anemonifolia (Rutaceae). The strongly divided leaves are very aromatic, somewhat fruity in their note.
At our second major stop in an area called Barren Ground we saw many interesting plants, although not the ones I was really after - in that sense the previous area was more satisfying. But this one was nonetheless a highlight of the trip: Drosera binata, the forked-leaf sundew. Way back when I was a student in Germany I bought a book on carnivorous plants, and the species is featured heavily in it, presumably because its unique leaf morphology makes it particularly attractive to collectors.
Sunday, December 11, 2016
Cladistics textbook, part 2
Coming back to the textbook
Kitching IJ, Forey PL, Humphries CJ, Williams DM, 1998. Cladistics second edition - the theory and practice of parsimony analysis. The Systematics Association Publication No. 11. Oxford Science Publications.
..., in my previous post I mentioned that I also ran into a section that I find hard to agree with. The chapter on support values opens with the following:
Page 118: The study of phylogeny is an historical science, concerned with the discovery of historical singularities. Consequently, we do not consider phylogenetic inference per se to be fundamentally a statistical question, open to discoverable and objectively definable confidence limits. Hence, we are in diametric opposition to those who would include such a standard statistical framework as part of cladistic theory and practice.
I can only repeat in slightly different words what I wrote some time ago about the same question in the context of biogeographic studies. I find it hard to draw a line between historical science and non-historical science, not least because, to take just one example, any physical experiment, be it ever so reproducible, turns into a singular historical event a split second after it has been conducted.
To me there is really no big difference. We always infer what is most likely to have happened in individual instances in the past and then draw more general conclusions from those instances, no matter whether it is history or social science, archeology or engineering, paleobotany or (extant) plant taxonomy, evolutionary biology or population genetics.
I assume that a big part of the difference in perspective here is about what organismal characters people are thinking of. Reading through the cladistics textbook, the focus is pretty much always on morphology. Reading through works that introduce likelihood or Bayesian phylogenetics, in other words probabilistic and model-based evolutionary analysis, the focus is pretty much always on nucleotide sequence data, with protein sequence data coming a distant second.
It makes sense to me that somebody who thinks predominantly in terms of trait shifts like the evolution of bird feathers from scales or of angiosperm gynoecia from ovules sitting nakedly on a stalk would have reason to favour parsimony analysis. In fact I myself, despite frequently using likelihood and Bayesian phylogenetics for sequence data, would still have to be counted among those who are highly sceptical whether the Mk model works better with morphological traits than parsimony.
These kinds of characters have very low homoplasy, at least if scored correctly; and where they do show homoplasy, I would say that is due to a scoring error that can be rectified (e.g. if double fertilisation has evolved independently in angiosperms and gnetophytes then the two should be scored as separate character states). And it just so happens that parsimony analysis is a better tool for the data the less homoplasy there is. What is more, it seems a bit odd to try and apply the same model to all morphological characters, given how vastly different they are.
It also makes a lot of sense to me that somebody who thinks predominantly in terms of trait shifts like an A in the DNA sequence turning into a T would see reason to favour analyses using models of sequence evolution. As Prof. Bromham pointed out during a talk I heard a few weeks ago, if that A has changed into a T in two parallel instances and then all the A-carrying individuals died out, there is no way we can ever find evidence for that.
In other words, in the case of our four letter soup of DNA sequence characters homoplasy is not a scoring error to be discovered by looking closer but a hard fact of life that we cannot rid ourselves of (except to the degree that we can choose slower-evolving markers). And it just so happens that parsimony analysis is a worse tool for the data the more homoplasy there is, while the right model-based approach can deal with that. (Or at least somewhat better - obviously, once homoplasy is so rampant that all signal is lost no phylogenetic method will work, and likelihood analysis has also been shown to suffer from long branch attraction.) What is more, it seems logical to apply the same model to all DNA sequence characters, given that they are equivalent nucleotides along a chain.
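To make the point about invisible multiple changes a bit more concrete, here is a minimal R sketch using the Jukes-Cantor (JC69) correction, the simplest model of sequence evolution. It is an illustration swapped in here, not something from the textbook: it converts the observed proportion p of differing sites between two sequences into an estimate d of substitutions per site, assuming that all four nucleotides are equally frequent and change into each other at the same rate. The function name jc69_distance is made up for this example.

```r
# Jukes-Cantor (JC69) correction: turns the observed proportion p of
# differing sites into an estimated number d of substitutions per site,
# allowing for multiple hits at the same site that leave no visible trace.
jc69_distance <- function(p) {
  if (any(p < 0 | p >= 0.75)) stop("p must lie in [0, 0.75) under JC69")
  -3 / 4 * log(1 - 4 / 3 * p)
}

# Observed differences versus model-corrected estimates
p <- c(0.05, 0.20, 0.40, 0.60)
d <- jc69_distance(p)
print(round(data.frame(observed_p = p, jc69_d = d), 3))
```

The gap between p and d widens quickly: at p = 0.6 the corrected estimate is already about 1.2 substitutions per site, so roughly half of the changes are hidden. For just two sequences, the minimum number of changes a parsimony count would infer is simply the number of differing sites, i.e. p times the sequence length.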
So when I call myself a cladist, what I mean is not that I prefer parsimony analysis for all data, but that I acknowledge Willi Hennig's legacy, the idea that systematists should classify consistently by relatedness.
Thursday, December 8, 2016
How to set multiple calibration points in R's chronos function
As mentioned in this venue before, there are two fundamental ways of producing a time-calibrated phylogeny: either infer a Bayesian or likelihood tree directly under a clock model, or infer a phylogram first and force it into ultrametric shape afterwards. The second approach has the obvious advantage that you can use whatever tree you want, including e.g. parsimony trees. It also means that the topology stays the same.
Some time ago I discussed how to use the program r8s, but as mentioned then it is only easily available on Mac and pretty much impossible to get to work on Windows. A more up-to-date alternative is the chronos function in the R package APE. And as mentioned in an even more recent post, it uses Penalised Likelihood with three different settings (relaxed, correlated or discrete) and an adjustable smoothing parameter lambda to time-calibrate the phylogeny.
The first few steps are well documented in APE's manual and in other online sources. Set your working directory, for example setwd("C:/Users/you/Documents/R stuff"). Load APE with library(ape); it needs to be installed first, of course.
Next import your tree, either mytree <- read.tree("phylogram.tre") or mytree <- read.nexus("phylogram.tre") depending on whether it is saved in Newick or Nexus format.
Now if you merely want to specify the root age of the tree, for example as 15 million years, you have it easy. Just establish a calibration using APE's makeChronosCalib function as described in the manual:
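mycalibration <- makeChronosCalib(mytree, node = "root", age.min = 15, age.max = 15)  # age.min defaults to 1, so age.max alone would only bound the root between 1 and 15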
You hand your tree, smoothing parameter, model choice and the calibration over to the chronos function:
mytimetree <- chronos(mytree, lambda = 1, model = "relaxed", calibration = mycalibration, control = chronos.control() )
Done! Now you can plot the tree or save it as usual. But what if you have several calibration points? The documentation does not provide an example of how to do that. It mentions an interactive mode in which the phylogram is displayed, you can click on one branch after the other, and each time you are asked to provide its minimum and maximum ages manually.
A very good discussion of the interactive mode was provided by a blogger called Justin Bagley a bit more than a year ago, but the first and so far only commenter raised the obvious issue:
I am trying to place several dozens of calibrations in a very large tree so the "clicking on the tree" step in the interactive mode is kind of tricky as I just cannot see the nodes especially the more derived ones.
Also, it is just plain tedious for that many calibration points. What is more, when I tried it myself earlier today I found that even when I provided only a minimum age and skipped entering a maximum, the resulting calibration table still had the maximum age identical to the minimum age - that is not how it is supposed to work.
So I spent some time figuring out how exactly we can build a data matrix of calibration points for chronos ourselves. Here is what you do.
Create a vector of the nodes that you have calibrations for. I find it easiest to let APE find the node numbers itself using the getMRCA (get Most Recent Common Ancestor) function. For example, to find the node where the lineages leading to Osmunda and Doodia diverged, you would use getMRCA(mytree, tip = c("Osmunda", "Doodia")):
node <- c(
getMRCA(mytree, tip = c("Equisetum","Doodia") ),
getMRCA(mytree, tip = c("Osmunda","Doodia") ),
getMRCA(mytree, tip = c("Hymenophyllum","Doodia") )
)
So now we have defined three nodes. Next, minimum and maximum ages. The first is the root of my imaginary tree, so I want to set a maximum age as a wall for the whole tree depth but no minimum. The others are internal nodes calibrated with fossils, so they should be minimum ages but have no maximum. I tried to enter NA or NULL when there was no minimum or maximum but it didn't work, so the easiest thing to do is to specify 0 for no minimum and the root age for no maximum.
age.min <- c(
0,
280,
270
)
age.max <- c(
354,
354,
354
)
Finally, chronos expects a fourth column, soft.bounds, which is currently not used. Still, it needs to be present and have the right length:
soft.bounds <- c(
FALSE,
FALSE,
FALSE
)
Now simply unite these four vectors into a single data frame:
mycalibration <- data.frame(node, age.min, age.max, soft.bounds)
The names of the vectors turn into the column names that chronos expects, and off we go: three calibration points. I hope some readers find this useful, and that it works as well for them as it did for me.
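For the several dozen calibrations mentioned by the commenter, typing out four parallel vectors gets error-prone. A minimal sketch of one way around that: keep each calibration together as a small list (two tip labels plus bounds, with NA standing for a missing bound) and let a helper build the data frame. The helper calibration_table and the toy tree are hypothetical, not part of APE; the helper only wraps getMRCA and data.frame and applies the same 0 / root-age workaround for missing bounds described above.

```r
library(ape)

# Hypothetical helper (not part of APE): build the four-column data frame
# that chronos() accepts as 'calibration' from a list of calibrations.
# Each calibration is list(tips = two tip labels, min = ..., max = ...),
# with NA_real_ meaning "no bound".
calibration_table <- function(phy, calibs, root_age) {
  node    <- vapply(calibs, function(x) getMRCA(phy, tip = x$tips), numeric(1))
  age.min <- vapply(calibs, function(x) x$min, numeric(1))
  age.max <- vapply(calibs, function(x) x$max, numeric(1))
  age.min[is.na(age.min)] <- 0         # no minimum: use 0
  age.max[is.na(age.max)] <- root_age  # no maximum: use the root age
  data.frame(node = node, age.min = age.min, age.max = age.max,
             soft.bounds = rep(FALSE, length(calibs)))
}

# Toy phylogram with made-up branch lengths (illustration only)
mytree <- read.tree(text = "(((Doodia:1,Hymenophyllum:1):1,Osmunda:2):1,Equisetum:3);")

calibs <- list(
  list(tips = c("Equisetum", "Doodia"),     min = NA_real_, max = 354),
  list(tips = c("Osmunda", "Doodia"),       min = 280,      max = NA_real_),
  list(tips = c("Hymenophyllum", "Doodia"), min = 270,      max = NA_real_)
)

cal <- calibration_table(mytree, calibs, root_age = 354)
print(cal)
```

The resulting cal should be usable in place of mycalibration in the chronos call shown earlier; adding a calibration then means adding one line to calibs instead of editing four vectors in parallel. I have not checked how chronos itself behaves on a tree as tiny as this toy one.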
Saturday, December 3, 2016
Molecular clock models in three different programs
There are quite a few molecular clock models implemented in various phylogenetic programs. What I find somewhat annoying is that there is generally only very cryptic information on how they work and how they relate to each other.
What we usually get is the manual or documentation merely saying "our software offers models A, B, and C", without any details on what A means. If we are extremely lucky we find a reference to a paper. In that paper there will be a lot of complicated formulas but rarely will we find a sentence that simply says "model A assumes that rates on neighbouring branches are not correlated".
So I just slogged through documentation, references, and some more or less helpful websites to try and figure out the clock models in R's chronos function, which turns a pre-existing phylogram into a chronogram, and in MrBayes 3 and BEAST 2, which produce chronograms directly.
R: ape package: chronos function
Correlated
As far as I understand, this is probably the original Penalised Likelihood approach as also implemented in the software r8s. If so, it allows rates to vary across the tree but with neighbouring internodes of the tree having more similar rates than distant branches are allowed to have. How strongly the rate can vary across the tree depends on the smoothing parameter lambda. A default of 1 is often used, and it has to be changed by orders of magnitude to explore better settings (10, 100...).
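To illustrate the trade-off governed by lambda, here is a minimal sketch of a penalised-likelihood score in R. It assumes the squared-difference roughness penalty described for r8s (differences between the rate on each branch and the rate on its parent branch), leaves out any special treatment of branches attached to the root, and is not necessarily the exact penalty chronos uses. The log-likelihood value, the toy tree and the rates are all made up.

```r
# Simplified roughness penalty: sum of squared differences between the rate
# on each branch and the rate on its parent branch. 'edge' follows ape's
# convention (column 1 = parent node, column 2 = child node, one row per
# branch) and rates[i] is the rate on branch i. Branches attached to the
# root have no parent branch and are simply skipped here.
roughness_penalty <- function(edge, rates) {
  parent_branch <- match(edge[, 1], edge[, 2])  # NA for branches from the root
  has_parent <- !is.na(parent_branch)
  sum((rates[has_parent] - rates[parent_branch[has_parent]])^2)
}

# Penalised log-likelihood to be maximised: the larger lambda, the more
# heavily rate changes between neighbouring branches are punished.
pl_score <- function(loglik, lambda, edge, rates) {
  loglik - lambda * roughness_penalty(edge, rates)
}

# Toy tree ((A,B),C): tips 1-3, root 4, internal node 5
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
smooth_rates <- c(1.0, 1.1, 0.9, 1.0)
rough_rates  <- c(1.0, 3.0, 0.2, 1.0)

# Same hypothetical log-likelihood for both rate sets, to isolate the penalty
for (lambda in c(1, 10, 100)) {
  cat("lambda =", lambda,
      "| smooth:", pl_score(-100, lambda, edge, smooth_rates),
      "| rough:", pl_score(-100, lambda, edge, rough_rates), "\n")
}
```

With the log-likelihood held fixed, the rough set of rates falls further behind as lambda grows, which is why large lambda values push the result towards a clock-like tree and small ones let rates vary more freely.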
Relaxed
Each internode of the tree has its own rate of evolution, with the rates drawn from a gamma distribution.
Discrete
Attempts to estimate a number of categories of internodes, where all the branches of the same category have the same rate of evolution.
MrBayes 3
This is a bit more complicated because MrBayes has several strict clock models, and the relaxed models are always based on an underlying strict model to which they add at least one parameter to describe how rates vary. So how can there be different strict clock models, when a strict clock model means nothing but a constant rate? Well, really this is about the prior (expectation) for the distribution of divergence times across the tree.
Uniform (a.k.a. simple)
Basically the expectation is that a divergence is equally likely to have happened at any point in time between the divergences immediately before and after it.
Birth-death
This model has parameters for speciation rate, extinction rate, and sampling probability. It looks at the tree as something that comes into existence through a process of lineage splits and lineage extinctions, forward through time. Crucially, trees can have rather non-uniform branching patterns depending on the speciation and extinction rates. If both are high, for example, there will be many recent splits but long branches deep in time.
Coalescent
Parameters are theta and the ploidy level, to estimate effective population sizes. The perspective is the opposite of the previous one, with the tree being seen as the coalescence of contemporary lineages into common ancestors as we go back in time.
There are also a birth-death based fossilisation model and the species tree model as additional alternatives. But coming now to how the clock can be relaxed, MrBayes has three options for that (clockvarpr):
Independent Gamma Rate (igr)
I assume this is the same as the relaxed model of ape's chronos function, see above, as the MrBayes manual says it is uncorrelated, and they both have the gamma parameter.
Brownian motion / autocorrelated lognormal distribution (tk02)
The terms Brownian motion and autocorrelated suggest that it is a correlated clock model, i.e. neighbouring branches do not vary independently.
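The difference between the uncorrelated and the correlated families can be sketched with a small simulation along a single root-to-tip path of 1000 branches, which is a simplification of a real tree. Uncorrelated rates are drawn independently from a gamma distribution, loosely in the spirit of igr; correlated rates multiply the parent rate by a small lognormal factor, loosely in the spirit of the autocorrelated lognormal tk02 model. Neither is the exact parameterisation used by MrBayes, and the shape, rate and standard deviation values are arbitrary.

```r
set.seed(42)
n <- 1000  # number of consecutive branches along one root-to-tip path

# Uncorrelated: every branch draws its rate independently from a gamma
# distribution with mean 1 (shape = 2, rate = 2).
uncorrelated <- rgamma(n, shape = 2, rate = 2)

# Autocorrelated: each branch inherits its parent's rate and multiplies it
# by a small lognormal factor, so the log-rates perform a random walk.
autocorrelated <- exp(cumsum(rnorm(n, mean = 0, sd = 0.1)))

# Correlation between the log-rate of each branch and that of its parent
neighbour_cor <- function(r) cor(log(r[-1]), log(r[-length(r)]))
cat("uncorrelated:  ", round(neighbour_cor(uncorrelated), 2), "\n")
cat("autocorrelated:", round(neighbour_cor(autocorrelated), 2), "\n")
```

The uncorrelated rates should show essentially no correlation between neighbouring branches, while the autocorrelated ones should come out close to 1: knowing the rate of a branch tells you a lot about its daughter branch only under the correlated model.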
Compound Poisson Process (cpp)
The way I read the paper where this was first suggested, it seems to me as if rates are correlated. It allows shifts in rate to occur anywhere on the tree.
BEAST 2
My understanding is that BEAST always uses the coalescent model. It then offers the following options for the clock model:
Strict
The same rate across all branches. So this should be equivalent to using the coalescent divergence time prior with a strict model in MrBayes.
Relaxed exponential or relaxed log normal
Two uncorrelated models differing in the shape of the rate distribution. No idea which one could a priori be expected to be more realistic.
Random local
There are different clocks applying to different parts of the phylogeny (reference). This should work well with sudden rate shifts, but I don't know if it will make clock rooting any more reliable. The way I understand it, it also means that the model is uncorrelated.
In other words, it looks as if "strict" is the only BEAST setting that is not an uncorrelated model. This brings us to two final points.
What model to use?
First, I have spoken to a senior colleague and browsed a few sources online, and the general thought seems to be that correlated clock models are more biologically realistic than uncorrelated ones. That makes a lot of sense to me, although with the caveat that there seem to be obvious shifts in some phylogenies, generally associated with a change in a crucial trait such as generation time or metabolism.
Second, one of the sources I read opined that only the strict clock model is really appropriate with the coalescent model, because the latter kind of assumes the former. I have no real opinion on this; I assume that the BEAST developers would have a reason to offer several relaxed models in their coalescent-based package.
Summary
In summary, and hoping I didn't get anything wrong, this is how I currently understand the hierarchy of the relevant clock models:
- Strict clock (strict in BEAST; default in MrBayes if no clockvarpr added)
- Relaxed clocks
  - Correlated rates
    - Brownian motion (tk02 in MrBayes)
    - Compound Poisson Process (cpp in MrBayes)
    - Penalised Likelihood (correlated in chronos)
  - Uncorrelated rates
    - Independent Gamma Rates (relaxed in chronos; igr in MrBayes)
    - Discrete rate categories (discrete in chronos)
    - Relaxed exponential (just that name in BEAST)
    - Relaxed log normal (just that name in BEAST)
    - Random local (just that name in BEAST)
What we usually get is the manual or documentation merely saying "our software offers models A, B, and C", without any details on what A means. If we are extremely lucky we find a reference to a paper. In that paper there will be a lot of complicated formulas but rarely will we find a sentence that simply says "model A assumes that rates on neighbouring branches are not correlated".
So I just slogged through documentation, references, and some more or less helpful websites to try and figure out the clock models in R's chronos function, which turns a pre-existing phylogram into a chronogram, and in MrBayes 3 and BEAST 2, which produce chronograms directly.
R: ape package: chronos function
Correlated
As far as I understand this is probably the original Penalised Likelihood approach as also implemented in the software r8s. If so, it allows rates to vary across the tree but with neighbouring internodes of the tree having more similar rates than distant branches are allowed to have. How strongly the rate can vary across the tree depends on the smoothness factor lambda of the model. Here a default of 1 is often used, and it has to changed by orders of magnitude to explore better settings (10, 100...).
Relaxed
Each internode of the tree has its own rate of evolution, with the rate distribution drawn from a gamma parameter.
Discrete
Attempts to estimate a number of categories of internodes, where all the branches of the same category have the same rate of evolution.
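As a hypothetical illustration (made-up categories, rates and durations, not chronos output), this is how a few shared rate categories translate into expected branch lengths:

```python
# Hypothetical example: three rate categories shared among six branches.
category_rates = {"slow": 0.5, "medium": 1.0, "fast": 3.0}
branch_category = ["slow", "medium", "medium", "fast", "slow", "medium"]
branch_durations = [2.0, 1.0, 4.0, 1.5, 3.0, 2.5]


def expected_branch_lengths(categories, rates, durations):
    """Expected substitutions per branch: its category's rate times its duration."""
    return [rates[c] * t for c, t in zip(categories, durations)]


lengths = expected_branch_lengths(branch_category, category_rates, branch_durations)
print(lengths)  # [1.0, 1.0, 4.0, 4.5, 1.5, 2.5]
# However many branches there are, only as many distinct rates as categories exist.
print(len({category_rates[c] for c in branch_category}))  # 3
```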
MrBayes 3
This is a bit more complicated because MrBayes has several strict clock models, and the relaxed models are always based on an underlying strict model to which they add at least one parameter to describe how rates vary. So how can there be different strict clock models, when a strict clock model means nothing but a constant rate? Well, really this is about the prior (expectation) for the distribution of divergence times across the tree.
Uniform (a.k.a. simple)
Basically the expectation is that a divergence is equally likely to have happened at any point in time between the divergences immediately before and after it.
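The per-node intuition can be sketched as follows. This only illustrates the idea described above, with hypothetical names and ages; it is not MrBayes's actual implementation of the uniform prior, which has to specify all node ages of the tree jointly.

```python
import random


def sample_uniform_node_age(older_bound, younger_bound, rng):
    """Any age between the neighbouring divergences is equally likely."""
    if not older_bound > younger_bound:
        raise ValueError("the older bound must be older than the younger bound")
    return rng.uniform(younger_bound, older_bound)


rng = random.Random(0)
# Hypothetical node: its parent split 10 Ma ago, its oldest daughter split 4 Ma ago.
ages = [sample_uniform_node_age(10.0, 4.0, rng) for _ in range(10000)]
print(min(ages), max(ages), sum(ages) / len(ages))  # mean should be close to 7
```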
Birth-death
This model has parameters for speciation rate, extinction rate, and sampling probability. It looks at the tree as something that comes into existence through a process of lineage splits and lineage extinctions, forward through time. Crucially, trees can have rather non-uniform branching patterns depending on the speciation and extinction rates. If both are high, for example, there will be many recent splits but long branches deep in time.
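The claim about high speciation and extinction rates can be explored with a small simulation. This minimal Python sketch (hypothetical function name, rates chosen only for illustration) grows a birth-death tree forward in time and keeps only the splits that are still visible among the lineages surviving to the present, which is the kind of tree we actually get to sample.

```python
import random


def simulate_birth_death(birth, death, t_max, rng):
    """Grow a birth-death tree forward in time from one lineage (birth must be > 0).

    Returns the sorted times of the splits that are still visible in the tree
    of lineages surviving to t_max, plus the number of those survivors.
    """
    children = [[]]      # daughter lineages of each lineage (empty unless it split)
    split_time = [None]  # time at which each lineage split, if it did
    alive = [0]
    t = 0.0
    while alive:
        # Waiting time until the next event anywhere in the tree.
        t += rng.expovariate(len(alive) * (birth + death))
        if t >= t_max:
            break
        lineage = alive.pop(rng.randrange(len(alive)))
        if rng.random() < birth / (birth + death):
            # Speciation: the lineage ends and two daughters take its place.
            split_time[lineage] = t
            for _ in range(2):
                children.append([])
                split_time.append(None)
                children[lineage].append(len(children) - 1)
                alive.append(len(children) - 1)
        # Otherwise the lineage went extinct and is simply dropped.
    survivors = set(alive)
    # Daughters always have higher indices than their mothers, so walking the
    # indices backwards counts surviving descendants from the tips down.
    n_surviving = [0] * len(children)
    for i in reversed(range(len(children))):
        if i in survivors:
            n_surviving[i] = 1
        else:
            n_surviving[i] = sum(n_surviving[c] for c in children[i])
    # A split is visible only if both daughters left surviving descendants.
    visible = [
        split_time[i]
        for i in range(len(children))
        if split_time[i] is not None and all(n_surviving[c] > 0 for c in children[i])
    ]
    return sorted(visible), len(survivors)


rng = random.Random(1)
# Same net diversification rate (0.2) in both runs: low versus high turnover.
for birth, death in [(0.2, 0.0), (2.0, 1.8)]:
    recent = total = 0
    for _ in range(200):
        splits, _n = simulate_birth_death(birth, death, 15.0, rng)
        total += len(splits)
        recent += sum(1 for s in splits if s > 11.25)  # last quarter of the time span
    if total:
        print(f"birth={birth}, death={death}: {recent / total:.2f} of visible splits in the last quarter")
```

Because net diversification is the same in both runs, the high-turnover setting should show a much larger share of its visible splits close to the present, with correspondingly long branches deeper in the tree.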
Coalescent
Parameters are theta and the ploidy level, to estimate effective population sizes. The perspective is the opposite of the previous one, with the tree being seen as the coalescence of contemporary lineages into common ancestors as we go back in time.
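A minimal sketch of the coalescent perspective, assuming the standard Kingman coalescent with a constant population size (function names are hypothetical, not MrBayes options). Going back in time, the wait until the next two lineages merge gets longer as fewer lineages remain, and ploidy matters because it sets how many gene copies the population holds. Expressing these times in expected substitutions rather than generations would bring in the mutation rate, which is where theta comes in.

```python
import random


def coalescent_times(n_samples, ne, ploidy, rng):
    """Coalescence times (generations before present) for a sample of gene copies.

    The population holds ploidy * ne gene copies. While k lineages remain,
    each of the k(k-1)/2 pairs coalesces at rate 1 / (ploidy * ne) per
    generation, so the waiting time is exponential with rate pairs / copies.
    """
    copies = ploidy * ne
    t = 0.0
    times = []
    for k in range(n_samples, 1, -1):
        pairs = k * (k - 1) / 2
        t += rng.expovariate(pairs / copies)
        times.append(t)
    return times


def expected_tmrca(n_samples, ne, ploidy):
    """Expected age of the most recent common ancestor: 2N(1 - 1/n), N = ploidy * ne."""
    return 2 * ploidy * ne * (1 - 1 / n_samples)


rng = random.Random(3)
sims = [coalescent_times(10, 1000, 2, rng)[-1] for _ in range(5000)]
print(sum(sims) / len(sims), expected_tmrca(10, 1000, 2))
```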
There are also a birth-death based fossilisation model and the species tree model as additional alternatives. But coming now to how the clock can be relaxed, MrBayes has three options for that (clockvarpr):
Independent Gamma Rate (igr)
I assume this is the same as the relaxed model of ape's chronos function, see above, as the MrBayes manual says it is uncorrelated, and they both have the gamma parameter.
Brownian motion / autocorrelated lognormal distribution (tk02)
The terms Brownian motion and autocorrelated suggest that it is a correlated clock model, i.e. neighbouring branches do not vary independently.
Compound Poisson Process (cpp)
The way I read the paper where this was first suggested it seems to me as if rates are correlated. It allows shifts in rate to occur anywhere on the tree.
BEAST 2
My understanding is that BEAST always uses the coalescent model. It then offers the following options for the clock model:
Strict
The same rate across all branches. So this should be equivalent to using the coalescent divergence time prior with a strict model in MrBayes.
Relaxed exponential or relaxed log normal
Two uncorrelated models differing in the shape of the rate distribution. No idea which one could a priori be expected to be more realistic.
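One concrete difference between the two BEAST options can at least be shown in a short sketch, assuming both distributions are scaled to a mean rate of 1 (function names are hypothetical, not BEAST parameter names): the exponential distribution has no separate spread parameter, so its coefficient of variation is always 1, whereas the lognormal's spread is set by its standard deviation on the log scale.

```python
import math
import random
import statistics


def relaxed_exponential_rates(n_branches, rng):
    """Uncorrelated rates from an exponential distribution with mean 1."""
    return [rng.expovariate(1.0) for _ in range(n_branches)]


def relaxed_lognormal_rates(n_branches, sd_log, rng):
    """Uncorrelated rates from a lognormal with mean 1 and log-scale s.d. sd_log."""
    mu = -sd_log ** 2 / 2  # shifts the log-scale mean so that the rate mean is 1
    return [rng.lognormvariate(mu, sd_log) for _ in range(n_branches)]


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


rng = random.Random(2)
print(coefficient_of_variation(relaxed_exponential_rates(50000, rng)))  # about 1
for sd_log in (0.1, 0.5, 1.0):
    expected = math.sqrt(math.exp(sd_log ** 2) - 1)
    sample = coefficient_of_variation(relaxed_lognormal_rates(50000, sd_log, rng))
    print(sd_log, round(sample, 2), round(expected, 2))
```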
Random local
There are different clocks applying to different parts of the phylogeny (reference). This should work well with sudden rate shifts, but I don't know if it will make clock rooting any more reliable. The way I understand it, it also means that the model is uncorrelated.
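A minimal sketch of the local clock idea, with hypothetical names and a hand-placed rate change: each branch copies its parent's rate unless a new local clock starts on it. In the real model the number and placement of clock changes are estimated rather than fixed, and how a new rate relates to the previous one is simplified away here.

```python
def local_clock_rates(parents, root_rate, clock_starts):
    """Assign branch rates under a set of local clocks.

    parents[i] is the parent branch of branch i (None at the root), with
    parents listed before children. clock_starts maps a branch index to the
    rate of a new local clock starting on that branch; every other branch
    copies its parent's rate.
    """
    rates = []
    for i, parent in enumerate(parents):
        if i in clock_starts:
            rates.append(clock_starts[i])
        elif parent is None:
            rates.append(root_rate)
        else:
            rates.append(rates[parent])
    return rates


# Hypothetical tree of six branches with a faster local clock starting on branch 2.
parents = [None, None, 1, 1, 2, 2]
print(local_clock_rates(parents, 1.0, {2: 3.0}))  # [1.0, 1.0, 3.0, 1.0, 3.0, 3.0]
```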
In other words, it looks as if in BEAST "strict" is the only setting that is not an uncorrelated model. This brings us to two final points.
What model to use?
First, I have spoken to a senior colleague and browsed a few sources online, and the general thought seems to be that correlated clock models are more biologically realistic than uncorrelated ones. That makes a lot of sense to me, although with the caveat that there seem to be obvious shifts in some phylogenies, generally associated with a change in a crucial trait such as generation time or metabolism.
Wednesday, November 30, 2016
Hey, since when can RAxML do that?
Maybe I missed something, but I only noticed today that the super-fast likelihood phylogenetics software RAxML can use four different models of sequence evolution.
I seem to remember (?) that it could only do the GTR model. Perhaps they added the three others in a newer version and I missed the change while I was busy using BEAST and otherwise doing non-phylogenetic research? Or did I just never notice before?
Anyway, there is the choice between GTR, JC, K80, and HKY, so the most parameter-rich model, the most parameter-poor, and the two standard models with different rates for transitions and transversions. Which is good, because I have a tendency to get GTR or HKY suggested for the data I usually use. It seems, however, as if it is not possible to specify different models for the various parts of the partition.
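The nesting of the four models can be sketched as follows, assuming the usual textbook parameterisations (the function and table names are hypothetical and not RAxML options): JC and K80 assume equal base frequencies, K80 and HKY add a single transition/transversion ratio (kappa), and GTR gives every nucleotide pair its own relative rate.

```python
# Nucleotide pairs in the order used below: AC, AG, AT, CG, CT, GT.
# AG and CT are transitions; the other four pairs are transversions.
PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")
TRANSITIONS = {"AG", "CT"}

# Free parameters as (relative substitution rates, base frequencies); one rate
# and one frequency are fixed because rates are relative and frequencies sum to 1.
FREE_PARAMETERS = {"JC": (0, 0), "K80": (1, 0), "HKY": (1, 3), "GTR": (5, 3)}


def relative_rates(model, kappa=1.0, gtr_rates=None):
    """Relative rates for each nucleotide pair under JC, K80, HKY or GTR."""
    if model == "JC":
        return [1.0] * 6
    if model in ("K80", "HKY"):
        return [kappa if pair in TRANSITIONS else 1.0 for pair in PAIRS]
    if model == "GTR":
        return [float(r) for r in gtr_rates]
    raise ValueError(f"unknown model: {model}")


for model, (rates, freqs) in FREE_PARAMETERS.items():
    print(model, "free parameters:", rates + freqs)

# K80 is the special case of GTR where both transitions share one rate (kappa)
# and all four transversions share another.
print(relative_rates("K80", kappa=4.0) == relative_rates("GTR", gtr_rates=[1, 4, 1, 1, 4, 1]))  # True
```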
I have accordingly updated the posts on what number of schemes to test in jModelTest and on what models are implemented in the various phylogenetics programs I am familiar with.
Tuesday, November 29, 2016
Cladistics textbook
In my office I have two 'proper' phylogenetics textbooks, counting only those that cover principles and theory as opposed to being practical how-to manuals. One is by Felsenstein, who is strongly associated with likelihood phylogenetics, although his book covers all approaches. The second is:
Kitching IJ, Forey PL, Humphries CJ, Williams DM, 1998. Cladistics second edition - the theory and practice of parsimony analysis. The Systematics Association Publication No. 11. Oxford Science Publications.
As the title implies, it is entirely about parsimony phylogenetics.
Having recently looked into Kitching et al., I noticed two short sections that I found interesting enough to discuss here. I will start with the question of ancestors. Proponents of paraphyletic taxa often make claims along the lines of cladists "not accepting the existence of ancestral species", of "ignoring ancestors", or of "treating all species as sister taxa".
Here now we have a textbook written by cladists, in other words the official version, to the degree that an official version exists. It is, of course, not as easy as that, because the only thing that unites cladists in the sense of what paraphylists argue against is that supraspecific taxa should be monophyletic. Many other details differ from cladist to cladist, and in the sense of what paraphylists argue against, the concept of cladist includes those who use e.g. Bayesian phylogenetics.
I also do not want to give the impression that I, personally, take what Kitching et al. promote on this or that detailed question to necessarily be The Correct View. It is quite possible that I, a cladist, find myself in disagreement with some chapter of that textbook. I am not even arguing here, in this instance, that making taxa monophyletic is the way to go (although of course I do believe that).
No, the point of the post is merely this: if Kitching et al. argue not-XYZ, then this demonstrates decisively that any claim of all cladists arguing XYZ is nonsense.
So, about ancestors, and turning to page 14 of the textbook:
"In fact, to date, Archaeopteryx has no recognized autapomorphies. Indeed, if there were, Archaeopteryx would have to be placed as the sister-group to the rest of the birds."
It does not matter here whether more recent analyses have demonstrated Archaeopteryx to have autapomorphies and to actually have been a side branch relative to modern birds. We should simply think here of any species that looks exactly the way the ancestral species of a later clade is inferred to have looked.
It should be clear that the above section is correct. An ancestral species would not have any systematically useful characters relative to its descendants, because that descendant clade would have started out as that species. My view - and here other cladists may differ - is actually that the ancestral species and the clade are one and the same. The ancestral species has over time turned (diversified) into the clade.
"In terms of unique characters, Archaeopteryx simply does not exist. This is absurd, for its remains have been excavated and studied. To circumvent this logical dilemma, cladists place likely ancestors on the cladogram as the sister-group to their putative descendants and accept that they must be nominal paraphyletic taxa (Fig. 1.9c). Ancestors, just like paraphyletic taxa in general, can only be recognized by a particular combination of characters that they have and characters that they do not have. The unique attribute of possible ancestors is the time at which they lived."
Here is the reason why paraphylists complain about ancestors being treated as sister to their descendants: they are treated like that, crucially, so that we can do the analysis. It is a practical, not a philosophical reason.
Note also that at least the cladists who wrote the textbook do not have any problem with paraphyletic species. Whether we think that this use of the word paraphyletic makes sense or not (as do I), it is discussions like this one which make me groan in frustration whenever I read a paraphylist claim that cladists only accepted paraphyletic species as a cop-out once they could no longer deny that they existed. No, cladism was founded on the principle that monophyly applies above the species level, so it never had to backpedal like that.
"After a cladistic analysis has been completed the cladogram may be reinterpreted as a tree (see below)"
What they mean here is that they see a cladogram as such (merely) as a different visualisation of the data from the data matrix, while the "tree" is the cladogram's interpretation in terms of evolutionary relationships, of actual genealogical relatedness of the terminals.
"and at this stage some palaeontologists may choose to recognize these paraphyletic taxa as ancestors, particularly when they do not overlap in time with their putative descendants (see Smith 199a for a discussion)."
And this is the main point. Here we have a group of senior cladists who wrote, to put it in the simplest possible terms, "we need to treat every species as a terminal to get a cladogram, but then if you wish you can interpret a terminal without autapomorphies as an ancestor".
It is as if the people who claim that cladists do not accept the existence of ancestors haven't even bothered to figure out what any cladists really think.
Next time I will look at a short section of the textbook that I definitely disagree with.
Thursday, November 24, 2016
The political system of the USA
So, about that recent election. I am not an American, so I don't actually have a horse in that race except to the degree that everybody will be impacted by what one of the most influential nations on the planet decides to do.
I don't really want to discuss party politics on this blog either, so what I will focus on is simply what I consider to be certain systematic issues with how elections work in the USA. The point is, as far as I can tell the system is built so that it systematically favours conservatives, whether intentionally or not.
A major concept here is Gerrymandering.
In case it isn't clear what Gerrymandering is, imagine a political system in which the seats in parliament are given to people representing individual electoral districts as opposed to nation-wide party lists. So if you win the plurality of the votes in a district, or perhaps the majority after resolving preferences or a run-off election, you get its seat in parliament. Imagine further you have two electoral districts in your town with a hundred voters each and where voters favour the two major parties as follows*: the Yellows always get 45 votes, the Reds 55. Both seats go to the Reds.
Now assume that the Yellows have control of the state government and can redraw the district boundaries. They cleverly manage to redistrict the voters as follows: one district now has Yellows 55 votes versus Reds 45, and the other makes up the difference with Yellows 35 versus Reds 65. Et voilà, the Yellows have won one seat in parliament without convincing a single voter to switch allegiance. The trick is to concentrate your opponent's voters in a few districts that are super-overwhelmingly safe for them while giving yourself lots of narrower margins.
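The arithmetic of the Yellows' trick can be checked with a few lines of Python, using the made-up vote counts from the example above (the function name is hypothetical):

```python
def seats(districts):
    """Count plurality wins; each district is a (yellow_votes, red_votes) pair."""
    won = {"Yellow": 0, "Red": 0}
    for yellow, red in districts:
        if yellow == red:
            raise ValueError("ties are not handled in this toy example")
        won["Yellow" if yellow > red else "Red"] += 1
    return won


before = [(45, 55), (45, 55)]  # original boundaries
after = [(55, 45), (35, 65)]   # boundaries redrawn by the Yellows

# Same voters, same totals: 90 Yellow votes in both cases.
assert sum(y for y, _ in before) == sum(y for y, _ in after) == 90
print(seats(before))  # {'Yellow': 0, 'Red': 2}
print(seats(after))   # {'Yellow': 1, 'Red': 1}
```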
Now coming to the USA, which of course have district-based representation as opposed to proportional representation.
First, the Electoral College. This is the most obvious and widely discussed, as there have now been two elections within twenty years in which conservative candidates won despite losing the popular vote. If presidents had been elected directly, their opponents would have carried the day. That being said, however, the Electoral College is probably the least Gerrymandered of all bodies, not least because it cannot possibly have been done deliberately. State boundaries are just what they are; they do not get redrawn easily. Still, I assume that urbanisation has an effect here. Many Americans are concentrated in a very small number of states, in huge metropolitan areas, which lean strongly progressive. Most states are rural, rural voters lean conservative, and consequently the Electoral College leans conservative.
(By the way, I find it extremely bizarre how these discussions go on American websites. On many sites I have read in the last two weeks there will be commenters who complain about the Electoral College not reflecting the popular vote. And then there will always be somebody replying to the effect of "it is doing exactly what it was meant to do, that is stopping a minority [of states] from dominating the majority [of states]". This is weird, isn't it? There are two main political camps; either the first "dominates" the second, or the second "dominates" the first. You cannot say that when the first doesn't happen but the second does there is suddenly no "dominating" going on. So why should the majority of states count more than the majority of voters? At a minimum I would need more explanation here than is usually forthcoming...)
Second, the US Senate, the upper house of the US Congress, which has a lot of power. It consists of two senators from each state. Immediately the situation should become clear: the Senate is Gerrymandered by default. Most states are rural, rural voters lean conservative, consequently the Senate leans conservative.
Third, the states. The same principle applies. The majority of states is rural, rural voters lean conservative, and consequently conservative politicians control a majority of the states.
But that was just the districting; there are other factors.
Fourth, for some strange reason US citizens are not automatically registered to vote; they have to make a deliberate effort to become registered voters. People who have time on their hands will have an easier time doing that. People who have time on their hands are, in particular, pensioners and the independently wealthy, while the working poor will have it harder. Old and wealthy people lean conservative, consequently registered voters will lean conservative.
Fifth, out of ancient tradition the USA have elections on Tuesdays, a working day. This makes it much easier for pensioners and the independently wealthy to participate in elections, whereas the working poor will find it harder to take one of their very few days of leave to stand in a queue for voting. Old and wealthy people lean conservative, consequently voters will lean conservative.
Sixth, it is my understanding that US citizens have a lot more elections than the citizens of most other countries. They elect people for offices that are filled without formal election campaigns elsewhere, such as judges, sheriffs, school boards, etc. This means that participating in democracy is much more time-consuming for US citizens, and will easily lead to voting fatigue. The people who have lots of time to deal with all that are, in particular, pensioners and the independently wealthy. Old and wealthy people lean conservative, consequently voters will lean conservative.
Now I have read those who argue that this is simply the system that exists, so progressives will have to learn how to win elections in that system instead of e.g. whining about the unfair Electoral College. Fair enough. What is more, it works well for the conservatives, and again, I am not even an American. It just kind of seems to me, personally, that the point of a democracy is that the outcome of an election should kind of reflect the popular will. The Americans will have to know themselves what they want, and there does not appear to be any interest in change. Myself, I prefer proportional representation, party-independent committees drawing district boundaries, automatic voter registration, voting on a Sunday, and fewer elections with much shorter campaigns so that there is less voting fatigue. It seems to work well in many countries. Just my two cents.
Footnote
*) Of course, one of my major concerns with district-based parliaments is that they distort the popular will even without any Gerrymandering whatsoever. If a smaller party gets 20% in each district of the country they will still get 0% of the seats in parliament, a situation that completely disenfranchises one in five voters.
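The same kind of arithmetic, with hypothetical numbers (ten districts, each split 50/30/20 between three parties, here called Reds, Yellows and Greens), shows how far winner-takes-all districts can drift from proportional seat shares:

```python
def plurality_seats(districts):
    """Winner takes all: each district's seat goes to the party with the most votes."""
    seats = {party: 0 for party in districts[0]}
    for votes in districts:
        seats[max(votes, key=votes.get)] += 1
    return seats


def proportional_seats(districts):
    """Seats each party would get if seats matched its national share of the vote."""
    totals = {party: 0 for party in districts[0]}
    for votes in districts:
        for party, n in votes.items():
            totals[party] += n
    all_votes = sum(totals.values())
    return {party: len(districts) * n / all_votes for party, n in totals.items()}


districts = [{"Reds": 50, "Yellows": 30, "Greens": 20}] * 10
print(plurality_seats(districts))     # {'Reds': 10, 'Yellows': 0, 'Greens': 0}
print(proportional_seats(districts))  # {'Reds': 5.0, 'Yellows': 3.0, 'Greens': 2.0}
```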
Monday, November 21, 2016
Just have to share my astonishment here
This morning over breakfast I read an article in the Canberra Times. When I had finished, I first scrolled up again to make sure that I had not accidentally opened The Onion or, perhaps more likely, its Australian counterpart The Shovel. But no, it was indeed the Canberra Times. Then I thought hard about whether I had somehow missed that it was 1 April, but again, no such luck.
The article in question?
Housing affordability in Canberra: Renting is the ACT's 'biggest issue'. It argues that rents are so high in Canberra that people cannot save up enough to buy property, which is fair enough, ... using as its only example and case study a 23 year old student to whom, and I quote, "the great Australian dream" (of owning a house) "seems just that - a dream".
Maybe I am just weird, but when I was a 23 year old undergraduate back in Germany I would not have had the money to buy a house either. I lived off a mixture of a small competitive stipend, money earned from teaching assistantships, and my parents topping up the rest. Life was nonetheless good, as the student canteen was cheap and rents reasonable. But if I had started whinging about not being able to buy a house my friends and family would have given me a lot of side-eye, to put it mildly.
I would also argue that at 23 I was not mature enough to take on this responsibility, and I think I would have said so myself, even then. It was a time of learning, of studying, of first figuring out where I wanted to go with my life.
Which brings up another point. After finishing my studies and doctorate in that town I moved to a different state of the same country; two years later I moved to a different country on the same continent; and nearly one and a half years after that I moved to the other side of the planet. And really something like that was to be expected, given the way the job market in science works. So even if I had been able to afford a house I would not have wanted to buy one until I was settled. Yes, I guess there are some undergrads who study economics or law and then get into a company or public service in their home town, but that cannot be assumed to be a given.
Don't get me wrong, housing is expensive in Canberra. And clearly there must be some up-bidding of prices going on, because looking at quality and size the flats and houses are objectively not worth what they are going for, so the article seems to have got that right. I am forty now, and if we were to describe our fantastic, pie in the sky dream it would be to one day be able to afford a small two bedroom flat with a little courtyard or, if that is impossible, at least a balcony. A house is totally out of the question. This just for context - and note that I am not depressed about it. Billions of people on this planet live happy, productive and fulfilled lives while renting.
But apparently somebody at the newspaper seems to think that the average 23 year old (!) student (!) is expected to be able to buy and own a house. Further, that one's main goal in life, this "great Australian dream", cannot possibly wait until the old age of, I dunno, thirty, but has to be achieved before even having finished education. Somebody looked at this article and went, yes, that looks sensible, let's click "publish". I am really, really astonished.
And I am eagerly waiting for the next article in the series, "Marriage prospects in Canberra: how a nine year old girl despairs of ever finding Mr Right".
The article in question?
Housing affordability in Canberra: Renting is the ACT's 'biggest issue'. It argues that rents are so high in Canberra that people cannot save up enough to buy property, which is fair enough, ... using as its only example and case study a 23 year old student to whom, and I cite, "the great Australian dream" (of owning a house) "seems just that - a dream".
Maybe I am just weird, but when I was a 23 year old undergraduate back in Germany I would not have had the money to buy a house either. I lived off a mixture of a small competitive stipend, money earned from teaching assistantships, and my parents topping up the rest. Life was nonetheless good, as the student canteen was cheap and rents reasonable. But if I had started whinging about not being able to buy a house my friends and family would have given me a lot of side-eye, to put it mildly.
I would also argue that at 23 I was not mature enough to take on this responsibility, and I think I would have said so myself, even then. It was a time of learning, of studying, of first figuring out where I want to go with my life.
Which brings up another point. After finishing my studies and doctorate in that town I moved to a different state of the same country; two years later I moved to a different country on the same continent; and nearly one and a half years after that I moved to the other side of the planet. And really something like that was to be expected, given the way the job market in science works. So even if I had been able to afford a house I would not have wanted to buy one until I was settled. Yes, I guess there are some undergrads who study economics or law and then get into a company or public service in their home town, but that cannot be assumed to be a given.
Don't get me wrong, housing is expensive in Canberra. And clearly there must be some bidding-up of prices going on, because, looking at quality and size, the flats and houses are objectively not worth what they are going for, so the article seems to have got that right. I am forty now, and if we were to describe our fantastic, pie-in-the-sky dream, it would be to one day be able to afford a small two bedroom flat with a little courtyard or, if that is impossible, at least a balcony. A house is totally out of the question. This just for context - and note that I am not depressed about it. Billions of people on this planet live happy, productive and fulfilled lives while renting.
But apparently somebody at the newspaper expects the average 23 year old (!) student (!) to be able to buy and own a house. Further, that one's main goal in life, this "great Australian dream", cannot possibly wait until the old age of, I dunno, thirty, but has to be achieved before even having finished one's education. Somebody looked at this article and went, yes, that looks sensible, let's click "publish". I am really, really astonished.
And I am eagerly waiting for the next article in the series, "Marriage prospects in Canberra: how a nine year old girl despairs of ever finding Mr Right".
Thursday, November 17, 2016
Clock rooting with strong rate shifts - or not
Today at our journal club we discussed Schmitt 2016, "Hennig, Ax, and present-day mainstream cladistics, on polarising characters", published in Peckiana 11: 35-42.
The point of the paper is that early phylogeneticists discussed various ways of figuring out character polarity (i.e. which character state is ancestral and which is derived) first and then using that inference to build a phylogeny, whereas today nearly everybody does the phylogeny building first and then uses outgroup rooting to polarise the resulting network and infer character polarity.
And... that's it, really. There does not appear to be any clear call to action, although one would have expected something along the lines of "and this is bad because...". The paper does end with an exhortation to use more morphological characters instead of only molecular data, and there is some language that may be meant to identify the author as a proponent of paraphyletic taxa without making it explicit (anagenesis!), but neither of those two conclusions seems to follow from the main argument. There is no actual way forward regarding the question of how to polarise characters without outgroup rooting.
The approaches discussed in the paper are the following:
Palaeontological precedence. The character state appearing first in the fossil record is the ancestral one. The problem is, this only works if we assume that the fossil record is fairly complete.
Chorological progression. The character state found more frequently near the edges of a range is the derived one, whereas the ancestral state dominates at the centre of origin. The problem is that this is circular, because we first need to figure out where the centre of origin is. I am not too convinced of the principle either.
Ontogenetic precedence. Because organisms cannot completely retool their developmental pathways but only change through (1) neoteny or (2) adding steps to the end of the process, the earlier states in ontogeny are the ancestral ones. The author mentions the problem of a scarcity of ontogenetic data; I might add that this shows a bit of a zoological bias, as it will rarely work in plants and presumably never in microorganisms.
Correlation of transformation series. I must admit I don't quite understand the logic here, and the author isn't very impressed by it either.
Comparison with the stem lineage of the study group. The state found in the ancestral lineage is ancestral. This is very obviously circular, because we would need to know the phylogeny first, and being able to infer the phylogeny was the whole point of polarising the character.
Ingroup comparison. The state that is more frequent in the study group is ancestral. I see no reason to assume that this is always true, as there can be shifts in diversification rates.
Finally, outgroup comparison. The state that is found in the closest relative(s) of the study group is ancestral in the study group. It is perhaps not totally correct to call this circular, but it has something of turtles all the way down: to find out what the closest relative of your study group is you need to polarise the larger group around it, and then you have the same problem. Still this is the most broadly useful of all these approaches.
Polarising a phylogeny and polarising characters are two sides of the same coin. I have written a thorough post on the former before, which regularly seems to be found by quite a few people doing Google searches. I hope it is still useful. One of the ways I mentioned there for giving the stack of turtles something to stand on is clock rooting, and I found it surprising that the present paper did not mention it at all. It was this, however, that our journal club discussion dwelt on for quite some time.
Admittedly said discussion was a bit meandering, but here are a few thoughts:
The big problem with clock rooting is that it will be thrown off if there are strong rate shifts. Imagine that the true phylogram consists of two sister groups, one with very long branches (short-lived organisms) and the other with very short branches (their long-lived relatives). If we apply a molecular clock model to the phylogenetic analysis, e.g. in MrBayes, it will try to root the tree so that the branches all end at about the same level, the present. The obvious way to do that is to root the tree within the long-branch group. Et voilà, it has rooted incorrectly.
What to do about this?
The first suggestion was to use an outgroup. In my admittedly limited experience that doesn't work so well. It seems that if the rate shift is strong enough the analysis will simply attach the outgroup to the ingroup in the wrong place.
The next idea was to use a very relaxed clock model, in particular the random local clock model available in BEAST (unfortunately not in MrBayes). But it was pointed out that this model, while nice in theory, can make it hard for the MCMC run to reach stationarity. I cannot say.
Nick Matzke suggested that a better model could be developed. The hope is that this would allow the analysis to figure out what is going on, recognise the rate shift in the right place, and then root correctly. It remains to be seen how that would work, but at the moment something like that does not appear to be available.
Finally, another colleague said that if the clock models don't work then simply don't use them. Well, but what if we need a time-calibrated phylogeny, a chronogram, to do our downstream analyses, as in biogeographic modelling, studies of diversification rates, or divergence time estimates?
I guess the only approach I can think of at the moment is to infer a phylogram whose rooting we trust and then turn it into a chronogram while keeping its topology fixed, as with the software r8s. Maybe there are other ways around the rooting issue with clock models, but I am not aware of them.
Sunday, November 13, 2016
Black Mountain plants
Although it was windy and cool we went for a walk on Black Mountain Nature Reserve. It is interesting how completely different its flora is compared with that of the Mount Majura - Mount Ainslie range. For example, Black Mountain has many species of orchids, whereas the other two have only very few. Apparently the difference is to a large degree one of soil chemistry, but past land-use is also said to have differed.
Telstra Tower atop Black Mountain, as seen from near the Australian National Botanic Garden's nursery.
Poranthera microphylla, a tiny plant that is widespread and common in south-eastern Australia, but presumably often overlooked. It was traditionally considered to be a member of the spurge family but apparently now belongs to the Phyllanthaceae.
And here is a representative of the type genus of the Phyllanthaceae, Phyllanthus hirtellus. In this case the plant is larger, a dwarf shrub, but the flowers are still minuscule.
Finally, beautiful Grevillea alpina (Proteaceae), or at least so I hope. There is another rather similar species of the same genus in the area, but it is supposed to have glabrous tepals.
Tuesday, November 8, 2016
Botany picture #237: Allium karataviense
As mentioned with previous botany pictures I really like the onion genus. The above picture, which I took in 2008 either in the botanic garden of Halle or in Gatersleben, Germany, shows one of the many impressive Asian species, Allium karataviense. Like several others of its large-headed relatives it is an ornamental species, not so much a kitchen herb.
Saturday, November 5, 2016
New Laptop, and how to get science / phylogenetics crucial software onto Ubuntu
About a week ago I finally bit the bullet and bought a new laptop, a Dell Inspiron 11 3162, as my old netbook had grown old and slow, and its battery life short.
Yes, this is not exactly high-end, but the point is, I don't want high-end. A really good, high performance, cutting edge laptop would come with two downsides. First, it would be optimised for being high performance and not for being light and small; and I really want something that travels easily. Second, it would be expensive; and I want something cheap - we are talking less than AUD400 here - so that it will not be too painful if it gets damaged or stolen during field work or a conference trip.
The new machine came with Windows 10. I think it is a psychological defect on my part, but whenever I try to use Windows 8 or 10 even just for a few minutes I get really upset. Not trying to proselytise here, just my personal problem. A real problem, however, is that this model of laptop comes with only 32 GB of storage on a card and no hard drive. I assume the idea is that many people use the cloud these days (I don't), but still this is a tad on the ridiculous side. Windows 10 took up very nearly half of that space, so install a few things, get a few security updates, and you hit the wall.
Having considered these two issues, I carpet-bombed Windows and installed Ubuntu 16.04. This operating system takes up about 3.3 GB by itself, and now c. 8 GB with the various programs and packages I installed on top of it, still only half of what Windows took up by itself. So, yeah. Also, I can now click on something without the computer having to think for 3-4 seconds before it reacts. As a colleague sardonically commented on the performance issues of Windows, "Intel giveth, and Microsoft taketh away." I also bought a micro card to fit into the little slot on the left side of the laptop, and contrary to some comments I have seen on the web it works nicely as additional storage. It merely had to be formatted for Ubuntu.
Mostly I use my laptop for simple things, like Skype, checking eMails, writing on a manuscript, etc., but generally not to run time-consuming analyses. That being said, I like to have some analysis software on there too in case it becomes necessary during travel. It has to be admitted that one of the disadvantages of Ubuntu is still that it is not always trivial to find and install specialised software. As I just had to do exactly that, I thought I would use this post to collect my recent experiences. Perhaps somebody will find it useful before it is too far out of date.
First, the software centre was weirdly empty. Here I found a post on the ask ubuntu forum helpful.
If you have the same problem, open terminal and run:
sudo apt update
sudo apt upgrade -y
Now for my personal must-haves and how I got them:
Inkscape
(Vector graphics program, e.g. for editing figures for a manuscript.)
No problem installing from Software Centre.
GIMP
(Pixel graphics editor, for photos.)
No problem installing from Software Centre.
R
(Statistics software.)
There are probably different ways of doing it, but I followed the instructions from digitalocean.
rstudio
(GUI for R.)
Download the relevant .deb file from the program website, right-click and select to open it with the software centre.
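The same .deb file can also be installed from the terminal instead of the Software Centre. This is a minimal sketch, assuming the file has already been downloaded; the helper name install_deb is made up for illustration. dpkg -i installs the package itself but does not fetch missing dependencies, which is why apt-get install -f is run if dpkg stops.

```shell
# Hypothetical helper (not part of Ubuntu): install a downloaded .deb file.
install_deb() {
  local deb="$1"
  if [ ! -f "$deb" ]; then
    echo "install_deb: no such file: $deb" >&2
    return 1
  fi
  # dpkg installs the package but does not download its dependencies.
  # If it stops because something is missing, 'apt-get install -f'
  # fetches the missing packages and finishes configuring this one.
  sudo dpkg -i "$deb" || sudo apt-get install -f -y
}
```

Usage, with a placeholder file name: install_deb ~/Downloads/name-of-the-downloaded-file.deb. The same should work for the Skype .deb.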
Java
(Required to run several of the other programs here.)
I got JDK instead of merely JRE, just to be on the safe side. Open terminal and run:
sudo apt-get update
sudo apt-get install default-jdk
Source of information: digitalocean.
Acrobat PDF Reader
(Sadly it seems to be the only PDF reader on Linux that can edit complex PDFs such as those used for grant proposals by some funding agencies. Only a very old version is available, as Adobe has discontinued Linux support.)
Open terminal and run:
sudo apt-get install libatk1.0-0 libc6 libfontconfig1 libgcc1 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk2.0-0 libidn11 libpango1.0-0 libstdc++6 libx11-6 libxext6 libxml2 libxt6 zlib1g lsb-release debconf
wget http://archive.canonical.com/pool/partner/a/acroread/acroread-bin_9.5.5-1raring1_i386.deb
sudo dpkg -i acroread-bin_9.5.5-1raring1_i386.deb
Source of information: ask ubuntu forum.
In my case, however, I experienced some errors. Apparently Acrobat Reader requires some outdated packages to run, and Ubuntu does not want to install them because it already has the newer versions. The system itself then kindly suggested that I run a command to fix the problem. I hope I remember correctly, but I think it was simply:
sudo apt-get install -f
The -f (--fix-broken) parameter is supposed to fix broken dependencies. That command (or one very much like it) solved the problem for me, and I was able to start the reader.
AliView
(Sequence alignment editor.)
Following the instructions on the program website should have worked, in principle. However, I realised only then that Java was not yet installed, and AliView obviously wouldn't work without it. Download the aliview.install.run file, change its file preferences to make it executable, open terminal, go to relevant folder, run:
sudo ./aliview.install.run
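The "make it executable" step can also be done in the terminal with chmod. A minimal sketch, assuming the installer was saved unchanged as aliview.install.run in the folder passed in; the helper name install_aliview is hypothetical. Because AliView will not work without Java, it checks for a java command first.

```shell
# Hypothetical helper: make the AliView installer executable and run it.
install_aliview() {
  local dir="$1"                        # folder the installer was saved to
  local installer="$dir/aliview.install.run"
  if [ ! -f "$installer" ]; then
    echo "install_aliview: not found: $installer" >&2
    return 1
  fi
  # AliView needs Java, so stop early if no 'java' command is available.
  if ! command -v java >/dev/null 2>&1; then
    echo "install_aliview: install Java first" >&2
    return 1
  fi
  chmod +x "$installer"                 # same as ticking "allow executing"
  ( cd "$dir" && sudo ./aliview.install.run )
}
```

Usage, assuming the file is in the Downloads folder: install_aliview ~/Downloads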
MAFFT
(In my eyes the best sequence alignment tool, can also be called by AliView.)
After experiencing some problems trying to install from the rpm file that is available on the program homepage I found an entry on howtoinstall.co that simplified things. Open a terminal and run simply:
sudo apt-get update
sudo apt-get install mafft
That was easy.
PAUP test versions
(Phylogenetics software implementing various methods.)
This comes as an executable. Obtain the Linux distribution from the program website, unpack it, set the file preferences to allow the program to be executed, done.
TNT
(Parsimony phylogenetics software.)
Obtain the Linux distribution from the program website, unpack it, set the file preferences to allow the program to be executed, done. When running the program for the first time you will have to accept the license agreement, as opposed to doing so during an installation.
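For PAUP and TNT, the unpack-and-make-executable steps look roughly like this in a terminal. This is a sketch, assuming the download is a .gz (a single compressed file), a .tar.gz/.tgz or a .zip archive; the actual file names differ between programs and versions, so the helper name and the names in the usage line below are placeholders.

```shell
# Hypothetical helper for programs shipped as a ready-made Linux binary:
# unpack the download into a folder and mark the program file executable.
#   $1 archive   downloaded file
#   $2 dest      folder to unpack into
#   $3 binary    program file name (for .gz: name to give the unpacked file;
#                for .tar.gz/.zip: its path inside the unpacked folder)
unpack_and_enable() {
  local archive="$1" dest="$2" binary="$3"
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
    *.zip)          unzip -o -q "$archive" -d "$dest" ;;
    *.gz)           gunzip -c "$archive" > "$dest/$binary" ;;
    *) echo "unpack_and_enable: unknown archive type: $archive" >&2
       return 1 ;;
  esac || return 1
  # Equivalent to allowing execution in the file's preferences.
  chmod +x "$dest/$binary"
}
```

Usage, with placeholder names: unpack_and_enable ~/Downloads/downloaded-archive.zip ~/programs/tnt tnt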
MrBayes
(Bayesian phylogenetics software.)
Available on github and sourceforge. I downloaded and unpacked it, opened a terminal, navigated to the relevant folder, and followed the instructions for compiling that are given in the appropriately named text file. Worked beautifully, only I had to disable Beagle, as prompted during compilation.
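The compilation itself follows the usual autoconf routine. A rough sketch, assuming the unpacked source folder is passed in; the helper name is hypothetical. Older releases such as 3.2.6 keep the build files in a src subfolder and need autoconf (which must be installed) to generate the configure script first, whereas newer ones should ship a ready configure script. --with-beagle=no is the standard autoconf way of switching off an optional library such as BEAGLE. The text file in the download remains the authoritative instruction.

```shell
# Hypothetical helper: build MrBayes from unpacked sources without BEAGLE.
build_mrbayes() {
  local srcdir="$1"
  if [ ! -d "$srcdir" ]; then
    echo "build_mrbayes: no such folder: $srcdir" >&2
    return 1
  fi
  (
    cd "$srcdir" || exit 1
    # Older layouts keep the build files in a 'src' subfolder.
    if [ ! -f configure ] && [ -d src ]; then
      cd src || exit 1
    fi
    # Generate the configure script if the release does not ship one.
    if [ ! -f configure ]; then
      autoconf || exit 1
    fi
    ./configure --with-beagle=no && make
  )
}
```

Usage, with a placeholder folder name: build_mrbayes ~/programs/mrbayes-source-folder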
FigTree
(Phylogenetic tree viewer.)
Java program, so simply get it from the program website, unpack, and set the JAR file to be executable in its preferences. It should then be run by Java whenever it is opened. I find it useful to create a link on the desktop for easier access.
Tracer
(To examine the results of MrBayes runs.)
Java program, so simply get it from the program website, unpack, and set the JAR file to be executable in its preferences. It should then be run by Java whenever it is opened. I find it useful to create a link on the desktop for easier access.
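For FigTree and Tracer, making the JAR executable and putting a link on the desktop can also be done from the terminal. A minimal sketch with a hypothetical helper; the location of the JAR inside the unpacked folder varies between programs, so it is passed in rather than assumed. If double-clicking does not start the program, java -jar followed by the path to the JAR runs it directly.

```shell
# Hypothetical helper for Java programs such as FigTree or Tracer:
# mark the JAR as executable and link it from the desktop.
#   $1 jar      path to the JAR file
#   $2 name     name of the link on the desktop
#   $3 desktop  optional; defaults to ~/Desktop
link_jar_to_desktop() {
  local jar="$1" name="$2" desktop="${3:-$HOME/Desktop}"
  if [ ! -f "$jar" ]; then
    echo "link_jar_to_desktop: no such file: $jar" >&2
    return 1
  fi
  chmod +x "$jar" || return 1
  mkdir -p "$desktop" || return 1
  # -s symbolic link, -f replace an existing link, -n do not follow an
  # existing link that points to a folder. The target is made absolute
  # so the link still works from the desktop.
  ln -sfn "$(readlink -f "$jar")" "$desktop/$name"
}
```

Usage, with a placeholder path: link_jar_to_desktop /path/to/unpacked/FigTree/figtree.jar FigTree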
jModelTest
(Model testing for Likelihood and Bayesian phylogenetics. For larger datasets I would not use the laptop, of course, as it takes forever and benefits greatly from parallel processing.)
Java program, so simply get it from the program website, unpack, and set the JAR file and the PhyML program (!) to be executable in their preferences. It should then be run by Java whenever it is opened. I find it useful to create a link on the desktop for easier access.
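For jModelTest, the JAR and the bundled PhyML program can be made executable in one go. A sketch with a hypothetical helper; rather than hard-coding where PhyML sits inside the folder (an assumption that may vary between releases), it marks every .jar file and every regular file whose name starts with "phyml", in any capitalisation, as executable. This may also catch a documentation file or two, which is harmless.

```shell
# Hypothetical helper: make jModelTest and its bundled PhyML executable.
#   $1 dir  the unpacked jModelTest folder
enable_jmodeltest() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "enable_jmodeltest: no such folder: $dir" >&2
    return 1
  fi
  # The JAR file(s) that start the program.
  find "$dir" -type f -name '*.jar' -exec chmod +x {} + || return 1
  # The PhyML binary that jModelTest calls for the likelihood calculations.
  find "$dir" -type f -iname 'phyml*' -exec chmod +x {} +
}
```

Usage, with a placeholder path: enable_jmodeltest ~/programs/jmodeltest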
WINE
(Compatibility layer for running Windows programs, just in case.)
No problem installing from Software Centre.
SciTE
(Text editor with many useful functions, for data files etc.)
No problem installing from Software Centre.
Skype
(Video calls.)
Download the .deb file from the program website, right-click and select to open it with the software centre.
ClamTK
(Virus scanner.)
No problem installing from Software Centre.
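The programs above that installed without problems from the Software Centre should also be available through apt. A sketch, assuming these are the package names in the Ubuntu archive (running apt-cache policy followed by a name shows whether it exists on a given release before installing); the variable and function names are made up for illustration.

```shell
# Package names as I understand them for the Ubuntu archive; check with
# 'apt-cache policy NAME' if one is not found.
SOFTWARE_CENTRE_PACKAGES="inkscape gimp wine scite clamtk"

# Hypothetical helper: refresh the package lists, then install them all.
install_software_centre_packages() {
  sudo apt-get update || return 1
  # Deliberately unquoted so the list is split into separate package names.
  # shellcheck disable=SC2086
  sudo apt-get install -y $SOFTWARE_CENTRE_PACKAGES
}
```

Usage: install_software_centre_packages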
Friday, November 4, 2016
CBA seminar on molecular phylogenetics
Today I went to a Centre of Biodiversity Analysis seminar over at the Australian National University: Prof. Lindell Bromham on Reading the story in DNA - the core principles of molecular phylogenetic inference. This was very refreshing, as I have spent most of the year doing non-phylogenetic work such as cytology, programming, species delimitation, and building identification keys.
The seminar was packed, the audience was lively and from very diverse fields, and the speaker was clear and engaging. As can be expected, Prof. Bromham started with the very basics but had nearly two hours (!) to get to very complicated topics: sequence alignments, signal saturation, distance methods, parsimony analysis, likelihood phylogenetics, Bayesian phylogenetics, and finally various problems with the latter, including choice of priors or when results merely restate the priors.
The following is a slightly unsystematic run-down of what I found particularly interesting. Certainly other participants will have a different perspective.
Signal saturation or homoplasy at the DNA level erases the historical evidence. Not merely: makes the evidence harder to find. Erases. It is gone. That means that strictly speaking we cannot infer or even estimate phylogenies, even with a superb model, we can only ever build hypotheses.
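To make the saturation point concrete, here is a small Python sketch under the simplest substitution model, Jukes-Cantor (JC69). The talk did not single out this model; it is used here only because it has a closed form. The expected proportion of differing sites p levels off at 0.75 as the true distance d grows, so sequences that diverged long ago and very long ago end up looking practically the same.

```python
import math


def expected_p_distance(d):
    """Expected proportion of differing sites under the JC69 model
    for a true distance d (expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p):
    """JC69-corrected distance from an observed p-distance.
    At the saturation value p = 0.75 the correction is infinite."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed differences stop growing long before the true distance does.
for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"true d = {d:5.1f}  ->  expected p = {expected_p_distance(d):.4f}")
```

With a finite alignment, the tiny remaining differences in p at large d are swamped by sampling error, and the correction cannot recover them: `jc_distance` simply diverges as p approaches 0.75.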
Phylogenetics is a social activity. The point is that fads and fashions, irrational likes and dislikes, groupthink, the age of a method, and quite simply the availability and user-friendliness of software determine the choice of analysis quite as much as the appropriateness of the analysis. Even if one were able to show that parsimony, for example, works well for a particular dataset one would still not be able to get the paper into any prestigious journal except Cladistics. And yes, she stressed that there is no method that is automatically inappropriate, even distance or parsimony. It depends on the data.
Any phylogenetic approach taken in a study can be characterised by three elements: a search strategy, an optimality criterion, and a model of how evolution works. For parsimony, for example, the search strategy is usually heuristic (not her words, see below), the optimality criterion is the minimal number of character changes, and the implicit model is that character changes are rare and homoplasy is absent.
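The parsimony optimality criterion (fewest character changes on a given tree) is commonly scored with Fitch's algorithm, which the talk did not go into. The Python sketch below assumes unordered, equally weighted characters, no ambiguity codes, and rooted binary trees written as nested tuples. The names `fitch` and `tree_length` and the four-taxon alignment are made up for illustration. Note that it only scores a single tree; finding the best tree is the separate job of the search strategy.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree as nested 2-tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (candidate state set at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return frozenset(states[tree]), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # No shared state: one change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum of Fitch scores over all alignment columns (unweighted parsimony)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical toy alignment and two competing trees.
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(tree_length(t1, alignment), tree_length(t2, alignment))
```

On this toy alignment ((A,B),(C,D)) needs four changes and ((A,C),(B,D)) needs six, so parsimony prefers the former.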
The more sophisticated the method, the harder it gets to state its assumptions. Just saying out loud all the assumptions behind a BEAST run would take a lot of time. Of course that does not mean that the simpler methods do not make assumptions - they are merely implicit. (I guess if one were to spell them out, they would then often be "this factor can safely be ignored".)
Nominally Bayesian phylogeneticists often behave in very un-Bayesian ways. Examples are use of arbitrary Bayes factor cut-offs, not updating priors but treating every analysis as independent, and frowning upon informative topology priors.
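What "updating priors" would mean in the simplest possible case: a single proportion with a conjugate Beta prior, nothing phylogenetic. In this hypothetical Python sketch the posterior from a first study serves as the prior for a second one, which gives exactly the same result as analysing both datasets together. Starting the second study from the flat prior again simply throws the first study away. All counts are invented.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


prior = (1, 1)     # flat prior before any study
study1 = (7, 3)    # hypothetical counts: (successes, failures)
study2 = (2, 8)

# Bayesian bookkeeping: the first posterior becomes the second prior.
after1 = update_beta(*prior, *study1)
sequential = update_beta(*after1, *study2)

# The same data pooled into one analysis from the original prior.
pooled = update_beta(*prior, study1[0] + study2[0], study1[1] + study2[1])

# The un-Bayesian habit: analyse study 2 in isolation, from scratch.
isolated = update_beta(*prior, *study2)

print(sequential, pooled, isolated)
```

Here the sequential and pooled analyses both end at Beta(10, 12), while the isolated analysis gives Beta(3, 9) and so ignores everything learned from the first study.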
Unfortunately, in Bayesian phylogenetics priors determine the posterior more often than most people realise. This brought me back to discussions with a very outspoken Bayesian seven years ago; his argument was "a wrong prior doesn't matter if you have strong data", which if true would kind of make me wonder what the point is of doing Bayesian analysis in the first place.
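The argument "a wrong prior doesn't matter if you have strong data" can be checked in a one-parameter toy example (again a Beta prior on a proportion, not a tree). The Python sketch below compares a vague prior with a confident but wrong one, first with weak data and then with strong data. The priors and counts are invented. Real phylogenetic posteriors are far higher-dimensional, and there it is much harder to see how much of the result comes from the prior.

```python
def posterior_mean(prior_a, prior_b, k, n):
    """Posterior mean of a proportion: Beta(prior_a, prior_b) prior, k successes in n trials."""
    return (prior_a + k) / (prior_a + prior_b + n)


priors = {"vague Beta(1,1)": (1, 1), "confident but wrong Beta(90,10)": (90, 10)}
datasets = {"weak data (n=10)": (3, 10), "strong data (n=10000)": (3000, 10000)}

for data_label, (k, n) in datasets.items():
    for prior_label, (a, b) in priors.items():
        mean = posterior_mean(a, b, k, n)
        print(f"{data_label:22s} {prior_label:32s} posterior mean = {mean:.3f}")
```

With ten observations the two priors give posterior means of about 0.33 and 0.85; with ten thousand both land close to 0.30. The claim holds only in the second situation, and in the first the posterior largely restates the prior.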
However, Prof. Bromham also said a few things that I found a bit odd, or at least potentially in need of some clarification.
She implied that parsimony analysis generally uses exhaustive searches. Although there was also a half-sentence to the effect of "at least originally", I would stress that search strategy and optimality criterion are two very different things. Nothing keeps a likelihood analysis from using an exhaustive search (except that it would not finish before the heat death of the universe), and conversely no TNT user with a large dataset today would dream of doing anything but heuristic searches. Indeed the whole point of that program was to offer ways of cutting even more corners in the search.
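Why exhaustive searches are hopeless for anything but small datasets, whatever the optimality criterion: the number of distinct unrooted, fully resolved trees for n labelled taxa is (2n - 5)!! = 3 × 5 × ... × (2n - 5), a standard combinatorial result. A short Python sketch (the function name is made up) that computes it:

```python
def n_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted binary trees for n labelled taxa: (2n - 5)!!, for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (10, 20, 50, 100):
    print(f"{n:3d} taxa: {n_unrooted_trees(n):.3e} unrooted binary trees")
```

Ten taxa give 2,027,025 trees; twenty already give about 2 × 10^20. An exhaustive search has to score every one of them, regardless of whether the score is tree length or likelihood.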
Parsimony analysis is also a form of likelihood analysis. Well, I would certainly never claim, as some people do, that parsimony comes without assumptions. I would say that it has a model of evolution in the sense in which the word model is used across science, yes. I can also understand how and why people interpret parsimony as a model in the specific sense of likelihood phylogenetics and examine what that means for its behaviour and parameterisation compared to other models. But calling it a subset of likelihood analysis still leaves me a bit uncomfortable, because it does not use likelihood as its criterion but simply tree length. Maybe I am overlooking something (in fact, most likely I am), but to me the logic of the analysis seems rather different, for better or for worse.
One of the reasons why parsimony has fallen out of fashion is that "cladistics" is an emotional and controversial topic; this was illustrated with a caricature of Willi Hennig dressed up as a saint. I feel that this may conflate Hennig's phylogenetic systematics with parsimony analysis, in other words a principle of classification with an optimality criterion. Although the topic is indeed still hotly debated by a small minority, phylogenetic systematics is today state of the art, even as people have moved to using Bayesian methods to figure out whether a group is monophyletic or not.
The main reasons for the popularity of Bayesian methods are (a) that they allow more complex models and (b) that they are much faster than likelihood analyses. The second claim surprised me greatly because it does not at all reflect my personal experience. When I later discussed it with somebody at work, I realised that it depends greatly on what software we choose for comparison. I was thinking of BEAST versus RAxML with fast bootstrapping, i.e. several days on a supercomputer versus less than an hour on my desktop. But if we compare MrBayes with a likelihood analysis in PAUP with thorough bootstrapping, well, suddenly I see where this comes from.
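For readers who have not met it: the nonparametric bootstrap resamples alignment columns with replacement and reruns the whole tree search on each pseudo-replicate; clade support is the fraction of replicate trees containing that clade. A minimal Python sketch of only the resampling step, with a made-up alignment and a hypothetical `bootstrap_alignment` helper:

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One bootstrap pseudo-replicate: sample alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Every taxon gets the same resampled columns, so sites stay aligned.
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


alignment = {"A": "ACGTAC", "B": "ACGAAC", "C": "GCTAAT", "D": "GCTTGT"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
# Each replicate would now go through a full tree search; that is the expensive part.
```

Roughly speaking, "fast" and "thorough" bootstrapping differ in how much search effort goes into each replicate tree, not in this resampling step, which is cheap either way.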
These days you can only get published if you use Bayesian methods. Again, that is not at all my experience. It seems to depend on the data, not least because huge genomic datasets often cannot be processed with Bayesian approaches anyway. We can see likelihood trees of transcriptome data published in Nature, or ASTRAL trees in other prestigious journals. Definitely not Bayesian.
In summary, this was a great seminar to attend, especially because I am planning some phylogenetics work over summer. It definitely got the old cogs turning again. Also, Prof. Bromham provided perhaps the clearest explanation I have ever heard of how Bayesian/MCMC analyses work, and that may come in handy when I have to explain them to a student myself...
Saturday, October 29, 2016
Yet more Majura
Still no energy to write something technical, but we went walking at the nearby nature reserve again.
Massive carpets of daisies, mostly Leucochrysum albicans so far; Xerochrysum and Chrysocephalum are still mostly in bud. I regret not being able to do field work in the interior this year as the rains were fantastic, but this is a slight compensation.
Added six more species to the Wildflowers of Mt Majura post but forgot to change the line that says when it was last updated. Next time.
Sunday, October 23, 2016
Wildflowers of Mt Majura updated
We went for a walk on Mt Majura Nature Reserve today and saw quite a few species there for the first time.
Of greatest interest I found the above Ophioglossum lusitanicum (Ophioglossaceae), which I would never have expected. It is tiny and thus easily overlooked. Unusually (although admittedly not uniquely) for its group this species tends to have several leaves at the same time.
I took the opportunity to update the Wildflowers of Mt Majura post on this blog.
Germany trip 2016, part 7: Singapore Airport
Back in Australia! We again flew via Singapore, and this time I had the camera with me.
Above the Butterfly Garden in the airport. It is a large open space across both levels of the building.
There are, obviously, butterflies. Apart from the various plants growing in the garden, they are provided with cut flowers like these and with pineapple slices, both of which are placed on tables where travellers can watch with ease.
While many plants were clearly chosen for having flowers adapted to butterfly pollination, many others are simply ornamental, such as this Selaginella, which of course does not have any flowers at all.
And in a previous post I already mentioned the carnivorous Nepenthes. Again the flowers are not precisely butterfly-attractants, but they are interesting.
Friday, October 14, 2016
Germany trip 2016, part 6: Hamburg Botanic Garden
Today we visited the Botanic Garden of Hamburg, Germany. Not, however, the old, well-known park Planten un Blomen, but the gardens at the second site in the suburb of Klein Flottbek. They are right next to the biological teaching and research centre (Biozentrum) of the University of Hamburg.
The gardens are large and offer a huge diversity of sections, including steppe plants, crop plants, medicinal plants, regional sections representing everything from northern Germany to South America, and much more. A few examples:
The Bauerngarten, or farmer's garden. It features a nice selection of useful plants and ornamentals. There are also some old farming machines exhibited in a corner.
The garden designers show some humour in the Alpinum, the alpine section. Here is a sign as one would see it in the German Alps, reading in translation: Experience with the Alpine environment, a sure step, and a head for heights required. Signed, the German Alps Society.
A few metres on we find this Gipfelkreuz, as one would usually see on the summit of a large mountain (in overly Christian countries, that is). The background shows what dizzying heights the intrepid Alpine hiker will have braved at this point.
While on the topic of crosses, the weirdest part of the botanic gardens might be the Bibelgarten, which consists of plants mentioned in the holy book of one particular religion and signage listing the relevant Bible verses. Let's just say that Germany is not the most secular country on the planet and move on.
Much nicer is the Asian section. Not only is it very well landscaped, with beautiful plants...
...it also includes a Japanese rock garden. Despite a slightly confusing sign that seems to forbid it, visitors are invited to walk across the larger rocks and the platform, but obviously shouldn't step onto the pebble patterns.
Finally, the systematic section. Despite being very new, its explanatory signage suffers a bit from scala naturae thinking (e.g. ginkgoes are described as the "oldest" gymnosperms). It is, however, an unusually well landscaped systematic section; this kind of display is all too often built as a simple, linear row of flowerbeds.
The garden does not have an entry fee. Unfortunately the visitor shop is only open on weekends.
Before seeing the gardens I was also able to visit the Hamburg Herbarium (HBG) and to study some specimens two levels below ground. The herbarium is huge (I was told 1.8 million specimens), and the vaults are accordingly large and were in fact something of a maze to me. One factor may be that, as the picture shows, the specimens are not stored in compactus units. I am grateful that I was able to examine a species I could not lay my hands on in Australia, so all in all a great day today.