PhyloBotanist: Bioregionalisation part 4: networks

Sunday, February 4, 2018

Bioregionalisation part 4: networks

Having examined a clustering approach to bioregionalisation, today I will try to illustrate the increasingly popular alternative of network analysis.

Consider again our hypothetical study area of five cells with five taxa, where we want to know how to delimit bioregions (or phytoregions, given that the taxa are plant species) in an objective way:

The first step in the analysis is to interpret these data as a network. Specifically, as we have two different types of elements, what we are dealing with is called a bipartite network. Each type of element is connected directly only to elements of the other type, and to elements of its own type only via the other. In this case, the plant species are connected to all cells they occur in, and cells are connected to all plant species occurring in them:

Once we have scored this kind of network structure in a way that the software of our choice understands (either a list of connections or a matrix with 0s and 1s), we can use an algorithm that divides the network into modules. This algorithm tries to maximise connections within a module and to minimise the connections between modules, which in bioregion terms again means to maximise endemism.

As indicated in the posts on clustering, network analysis has the great advantage that it does not only produce groups, it also provides a reproducible and objective answer for the question about the optimal number of groups, whereas in clustering analysis the user still has to make a subjective decision.

That being said, it is always possible to take a large module by itself and explore its internal structure, if so desired, although of course the answer may be that there are no meaningful subdivisions any more.

Either way, any such algorithm will return modules, and what we are mostly interested in is what cells belong to what module. Nonetheless we would also be able to infer what species belong to what module, and depending on the type of network analysis we may be able to get other statistics that may be of interest for the network and for each individual module or even each element.

There are two main approaches to network analysis that have been explored in bioregionalisation. The first is called the Map Equation, developed by Rosvall et al. (2009) and promoted with a sleek, eponymous website. It was first applied to bioregionalisation by Vilhena & Antonelli (2015). One of its advantages is that it is the faster of the two, which may be particularly attractive if one's dataset is large and complex.

The second is Modularity Analysis (Newman, 2006). This is the approach that I prefer personally, after colleagues at my institution conducted a study comparing the two and clustering against each other (Bloomfield et al., 2017). It is slower than the Map Equation, but it seems to be better at recognising the transitional nature of cells situated between two 'pure' modules, which the Map Equation appears to tend to group into distinct modules in their own right.

Next time, how to do modularity analysis in practice.

References

Bloomfield NJ, Knerr N, Encinas-Viso F, 2017. A comparison of network and clustering methods to detect biogeographical regions. Ecography 41: 1-10.

Newman MEJ, 2006. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, USA 103: 8577-8582.

Rosvall M, Axelsson D, Bergstrom CT, 2009. The map equation. arXiv: 0906.1405 [physics.soc-ph]

Vilhena DA, Antonelli A, 2015. A network approach for identifying and delimiting biogeographical regions. Nature Communications 6: 6848.

Sunday, February 4, 2018

Bioregionalisation part 4: networks

No comments:

Post a Comment