Wednesday, August 27, 2014

Final update on using fastStructure and similar software

After my somewhat mixed experience trying to use fastStructure, I have recently found the time to throw my data at two other programs for inferring population structure.

To recap, I have thousands of SNPs for two groups of species, from 91 individuals in one case and from 224 individuals in the other. I want to know how best to group the individuals into separate 'populations', in the present case potential species. I originally used fastStructure because it was new and supposedly written specifically for large numbers of SNPs, but the results were ultimately odd. The clusters didn't make much sense, and the program found virtually no admixed individuals, that is, hybrids, although there really should have been some.

Earlier this week I tried the R package adegenet. On the plus side, it turned out to be very simple and user-friendly. Of course you need to know how to use R, but the package manual is well written, and adegenet has a straightforward "read" function for importing datasets. It imported my Structure file without any hiccups, and after that it was a simple matter of handing my data over to adegenet's "find.clusters" function.

However, I tried different settings and did not get reasonable populations with any of them. One problem with my dataset is missing data, and I found that setting allele frequencies to zero in those cases produced the most meaningful results. Even so, several populations had no samples in them, and the populations that did have samples didn't make a lot of sense.
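To make the missing-data choice concrete, here is a minimal Python/NumPy sketch (an illustration only, not adegenet's code) of two common ways to fill gaps in a matrix of allele frequencies with individuals as rows and alleles as columns: setting missing entries to zero, as described above, or replacing them with that allele's mean frequency across the genotyped individuals. The function name `fill_missing` and the example matrix are hypothetical.

```python
import numpy as np

def fill_missing(freqs, method="zero"):
    """Replace missing (NaN) entries in an individuals x alleles frequency matrix.

    method="zero": every missing entry becomes 0.0.
    method="mean": every missing entry becomes the mean of that allele's column,
                   computed over the individuals that were actually genotyped.
    """
    x = np.array(freqs, dtype=float)  # work on a copy; the input is left untouched
    missing = np.isnan(x)
    if method == "zero":
        return np.where(missing, 0.0, x)
    if method == "mean":
        observed = (~missing).sum(axis=0)               # genotyped individuals per column
        totals = np.where(missing, 0.0, x).sum(axis=0)  # sum of observed values per column
        # Columns with no observed values get a mean of 0.0 instead of NaN.
        col_means = np.divide(totals, observed,
                              out=np.zeros_like(totals), where=observed > 0)
        return np.where(missing, col_means, x)          # col_means broadcasts across rows
    raise ValueError("method must be 'zero' or 'mean'")

# Hypothetical example: 3 individuals x 3 alleles, one missing value per row.
nan = np.nan
data = [[1.0, 0.0, nan],
        [0.5, nan, 1.0],
        [nan, 1.0, 0.0]]
print(fill_missing(data, "zero"))
print(fill_missing(data, "mean"))
```

Zero-filling treats an untyped locus as if the allele were absent, while mean-filling substitutes the average frequency seen in the genotyped individuals. Which one gives more sensible clusters depends on the data; here, zero-filling was the option that worked best.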

Yesterday I finally tried my luck with good old Structure itself, somewhat hesitantly because I feared it would be very slow with such a big dataset. Yes, even for my smaller dataset the runs I wanted took all night, but that is still faster than I feared, and the results are worth it. The populations make sense, and in marked contrast to fastStructure, it finds evidence of admixture. My larger dataset will probably need several days to be analysed, but if that is necessary, so be it.

There is probably a reason why that program is the most popular in the area...
