...only 16.7%, 1,262 from a total of 7,539 publications surveyed, provided accessible alignments/trees (Figures 1 and 2). Our attempts to obtain datasets directly from authors were only 16% successful (61/375; see Table S4), and we estimate that approximately 70% of existing alignments/trees are no longer accessible. Thus, we conclude that most of the underlying sequence alignments and phylogenetic trees produced by the systematic community during the past several decades are essentially lost, accessible only as static figures in a published journal article with no capacity for subsequent manipulation. Furthermore, when data are deposited, they are often incomplete (e.g., what characters were excluded, accepted taxon names; see Text S1 and Figure S1). Our survey of publications that implemented BEAST revealed that only 11 out of 100 (11%) examined studies provided access to the underlying xml input file, which is critical for reproducing BEAST results. Although funding agencies often require all data to be accessible from funded publications, our results reveal this is more the exception than the rule.
What made me cringe even more is that my discipline (microbial systematics, including microbial eukaryotes, bacteria and archaea) is the worst offender when it comes to data sharing. The green line indicating full data deposition is pretty much flatlining in some years for microbes! (Drew et al. 2013)
I've been thinking about this issue a lot recently, and taking steps to correct my past mistakes. I'm now digging through old PhD files to find my alignments and tree files to contribute a nematode phylogeny for the Open Tree of Life project. I'll also post these data on Figshare so they will no longer be another "Lost Branch" on the Tree of Life.
Reference:
Drew BT, Gazis R, Cabezas P, Swithers KS, Deng J, Rodriguez R, et al. (2013) Lost Branches on the Tree of Life. PLoS Biology, 11(9):e1001636.