Wednesday, March 26, 2014

Exploring Statistics for Metagenomic Datasets

Recent lab discussions have made me think a lot about the statistical tests we can use to detect and verify differences between metagenomic datasets. Since I don't have a strong background in statistics, my knowledge of this topic is still evolving. The scale and distribution of genomic datasets seem to be tricky to handle in a lot of statistical tests.

Some of the most useful resources I've found so far are as follows (feel free to comment and recommend more resources):

Parks, D. H., & Beiko, R. G. (2010). Identifying biologically relevant differences between metagenomic communities. Bioinformatics, 26(6), 715–721. doi:10.1093/bioinformatics/btq041 (good rundown of the different statistical techniques applied to genomic data, including their implementation in the STAMP pipeline)

Primmer, C. R., Papakostas, S., Leder, E. H., Davis, M. J., & Ragan, M. A. (2013). Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research. Molecular Ecology, 22(12), 3216–3241. doi:10.1111/mec.12309 (especially Box 3 - Gene Ontology enrichment tests)

Metagenome Ordination in IMG - provides a good comparison of PCA vs. PCoA vs. NMDS, particularly in regard to how each of these ordination methods is calculated (and how they differ from one another).


Dinsdale, E. A., Edwards, R. A., Bailey, B. A., Tuba, I., Akhter, S., McNair, K., et al. (2013). Multivariate analysis of functional metagenomes. Frontiers in Genetics, 4, 41. doi:10.3389/fgene.2013.00041 (added to list 5/10/14 - comprehensive and thought-provoking overview of metagenomic data analysis)
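
To make the PCA vs. PCoA/NMDS distinction from the ordination resource a bit more concrete for myself: PCA effectively works with Euclidean distances between samples, whereas PCoA and NMDS start from whatever dissimilarity measure you choose (Bray-Curtis is a common choice for abundance data). Below is a minimal sketch in plain Python that compares the two measures. The three samples and their counts are made up purely for illustration, and the helper names are my own, not taken from any of the papers or tools above.

```python
import math

def relative_abundance(counts):
    """Convert raw counts to proportions so samples of different depth are comparable."""
    total = sum(counts)
    return [c / total for c in counts]

def euclidean(a, b):
    """Straight-line distance; this is the geometry PCA implicitly preserves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared categories."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

# Hypothetical counts for four functional categories in three samples.
# A and B share the same two categories; C shares nothing with either.
samples = {
    "A": [90, 10, 0, 0],
    "B": [10, 90, 0, 0],
    "C": [0, 0, 50, 50],
}
profiles = {name: relative_abundance(counts) for name, counts in samples.items()}

def distance_matrix(profiles, metric):
    """Pairwise distances as a dict keyed by (sample, sample)."""
    names = sorted(profiles)
    return {(p, q): metric(profiles[p], profiles[q]) for p in names for q in names}

euc = distance_matrix(profiles, euclidean)
bc = distance_matrix(profiles, bray_curtis)

for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, "Euclidean: %.3f" % euc[pair], "Bray-Curtis: %.3f" % bc[pair])
```

With these toy numbers, the Euclidean distance between A and B comes out almost as large as between A and C, even though A and C have no categories in common, while Bray-Curtis gives A vs. C its maximum value of 1. An ordination built on one matrix can therefore look quite different from one built on the other, which is part of why the choice of method matters.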
