Yesterday, today and tomorrow I am participating in a workshop called 'Computational Macroevolution' run by Dan Rabosky, one of the leading experts in evolutionary modelling. Today we are learning about modelling changes in diversification rates, but yesterday he gave a more general introduction to handling phylogenetic trees in R, especially with the ape package.
This was really useful because so far I have used R only for general statistics or plotting, never for dealing with trees. Among other things, it is interesting to see how R actually stores trees. It works like this:
In R, a phylogenetic tree is a list with four elements.
The first is called Nnode and is simply the number of internal nodes that the tree has. In a fully resolved, rooted tree, this should be the number of terminals minus one.
Unsurprisingly, there is also a vector of terminal names, called tip.label.
The third element of the list, called edge, holds the internodes. It is organised as a two-column matrix of node numbers with one row per internode, where the first number is always the parent node and the second is the daughter node. So a tree like ((A,B),(C,D)) would contain the following rows (the order in which ape actually lists them may differ):
5 - 6
5 - 7
6 - 1
6 - 2
7 - 3
7 - 4
If there are N terminals, then nodes 1 to N are the terminals themselves, in the same order as in the vector of terminal names. Node number N+1 is always the root node. The others follow by order of distance from the root, although really that is an arbitrary convention.
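To make the numbering concrete, here is a minimal sketch in plain Python rather than R. It only mirrors the structure described above for the tree ((A,B),(C,D)). The dictionary keys and the helpers `node_name` and `children` are hypothetical names chosen for illustration, not part of ape.

```python
# Hypothetical plain-Python mirror of the four-taxon tree ((A,B),(C,D)).
# Key names are illustrative only; this is not the ape API.
tree = {
    "n_node": 3,                        # number of internal nodes
    "tip_label": ["A", "B", "C", "D"],  # terminals, numbered 1..N in this order
    "edge": [(5, 6), (5, 7), (6, 1), (6, 2), (7, 3), (7, 4)],  # (parent, daughter)
}


def node_name(tree, node):
    """Translate a node number into a tip label or a description."""
    n_tips = len(tree["tip_label"])
    if node <= n_tips:
        return tree["tip_label"][node - 1]  # node numbers are 1-based
    if node == n_tips + 1:
        return "root"                       # node N+1 is the root
    return f"internal node {node}"


def children(tree, parent):
    """Return the daughter nodes attached directly to `parent`."""
    return [d for p, d in tree["edge"] if p == parent]


# Print every internode in readable form.
for parent, daughter in tree["edge"]:
    print(node_name(tree, parent), "->", node_name(tree, daughter))
```

Looking up the daughters of a node is then just a scan over the pairs, which is essentially all the edge list encodes.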
It follows that a fully resolved and rooted tree will have two internodes for every internal node, one leading to each daughter. If there are polytomies, then there are more than twice as many internodes as internal nodes, and there are fewer than N-1 internal nodes.
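The counting argument can be checked with a small sketch, again in plain Python and not using ape. The function names are hypothetical, and the second tree, ((A,B),C,D) with a three-way split at the root, is an example made up here to show a polytomy.

```python
from collections import Counter


def daughters_per_node(edge):
    """Count how many daughters each parent (internal) node has."""
    return Counter(parent for parent, _ in edge)


def is_fully_resolved(edge):
    """True if every internal node has exactly two daughters."""
    return all(count == 2 for count in daughters_per_node(edge).values())


# ((A,B),(C,D)): four terminals, three internal nodes (5, 6, 7).
resolved = [(5, 6), (5, 7), (6, 1), (6, 2), (7, 3), (7, 4)]
# ((A,B),C,D): four terminals, but only two internal nodes (5, 6).
polytomy = [(5, 6), (5, 3), (5, 4), (6, 1), (6, 2)]

for name, edge in [("resolved", resolved), ("polytomy", polytomy)]:
    n_internal = len(daughters_per_node(edge))
    print(name, "| internal nodes:", n_internal,
          "| internodes:", len(edge),
          "| fully resolved:", is_fully_resolved(edge))
```

In the resolved tree the number of internodes is exactly twice the number of internal nodes. In the polytomy example there are five internodes for two internal nodes, and two is fewer than N-1 = 3.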
Finally, there is a fourth element of a phylogenetic tree, the vector of branch lengths (if any), called edge.length. Its order is the same as that of the internodes, so the first branch length belongs to the first internode, and so on.
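Because the branch lengths run parallel to the internodes, the two can be paired up position by position. The following plain-Python sketch (not ape code) uses that to compute the distance of any node from the root. The branch lengths are made-up values and `distance_from_root` is a hypothetical helper.

```python
# Internodes of ((A,B),(C,D)) as (parent, daughter) pairs.
edge = [(5, 6), (5, 7), (6, 1), (6, 2), (7, 3), (7, 4)]
# Made-up branch lengths, in the same order as `edge`.
edge_length = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0]

# Map each daughter to its parent and the length of the branch above it.
parent_of = {d: (p, length) for (p, d), length in zip(edge, edge_length)}


def distance_from_root(node):
    """Sum branch lengths while walking from `node` up to the root."""
    total = 0.0
    while node in parent_of:  # the root never appears as a daughter
        node, length = parent_of[node][0], parent_of[node][1]
        total += length
    return total


for tip in range(1, 5):
    print("tip", tip, "is", distance_from_root(tip), "from the root")
```

The same pairing is what makes it safe to reorder internodes only if the branch lengths are reordered in exactly the same way.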
I don't know exactly how an unrooted tree is handled but that will be easy to figure out. Anyway, useful stuff.