Showing posts with label microbial eukaryoes. Show all posts

Thursday, March 27, 2014

Lateral Gene Transfer detected in Eukaryotic rRNA genes

This paper is an example of super cool science that also makes me worry. Eukaryotes are known to have lower levels of Lateral Gene Transfer (LGT) than prokaryotes, and before this paper I assumed that LGT would not impact eukaryotic rRNA genes. However, this is not so according to Yabuki et al. (2014):
Here, we report the first case of lateral transfer of eukaryotic rRNA genes. Two distinct sequences of the 18S rRNA gene were detected from a clonal culture of the stramenopile, Ciliophrys infusionum. One was clearly derived from Ciliophrys, but the other gene originated from a perkinsid alveolate. Genome-walking analyses revealed that this alveolate-type rRNA gene is immediately adjacent to two protein-coding genes (ubc12 and usp39), and the origin of both genes was shown to be a stramenopile (that is, Ciliophrys) in our phylogenetic analyses. These findings indicate that the alveolate-type rRNA gene is encoded on the Ciliophrys genome and that eukaryotic rRNA genes can be transferred laterally.
Why is this paper worrisome? Well, if LGT of rRNA genes is a widespread phenomenon in microbial eukaryotes, it will confound biodiversity estimates obtained from environmental sequencing studies. If you had an environmental rRNA Illumina dataset, your bioinformatic analysis would show taxonomic assignments for both an alveolate and a stramenopile (detecting two taxa from one genome: one true assignment, one false). The authors cite this concern in their conclusion:
These large-scale [environmental] surveys may detect transferred rRNA genes and such transferred rRNA genes may confuse our understanding of the true diversity and distribution of microbial eukaryotes, even if the frequency of lateral transfers of the rRNA gene is rare and the copy numbers of the transferred rRNA gene in environments are low. We agree that environmental rRNA gene surveys with PCR are still useful and effective to estimate the diversity/distribution of microbial eukaryotes. However, the fact that recovered rRNA gene sequences do not always reflect the actual existence of microbial eukaryotes corresponding to these sequences should be kept in mind based on our findings.
In other words, more research is needed to determine exactly how widespread this rRNA LGT phenomenon is in eukaryotes...it may be something else we need to take into account when designing software workflows for environmental sequence data.
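To make the overcounting problem concrete, here is a minimal toy sketch in Python. The read records and labels are made up for illustration (they are not data from Yabuki et al.), and the helper names `observed_taxa` and `true_organisms` are my own. The idea is that a classifier labels each rRNA read by its sequence, so a laterally transferred gene copy gets counted as a separate taxon even though only one organism is present.

```python
# Toy illustration only: these reads and taxon labels are hypothetical.
reads = [
    {"source_genome": "Ciliophrys infusionum", "assigned_taxon": "stramenopile"},
    {"source_genome": "Ciliophrys infusionum", "assigned_taxon": "stramenopile"},
    # A laterally transferred rRNA gene copy is classified by its sequence,
    # not by the genome that actually carries it.
    {"source_genome": "Ciliophrys infusionum", "assigned_taxon": "perkinsid alveolate"},
]


def observed_taxa(reads):
    """Taxa an rRNA survey would report (distinct classifier assignments)."""
    return {read["assigned_taxon"] for read in reads}


def true_organisms(reads):
    """Organisms actually present (distinct source genomes)."""
    return {read["source_genome"] for read in reads}


# Two taxa are reported, but only one organism contributed the reads.
print(len(observed_taxa(reads)), len(true_organisms(reads)))
```

In a real environmental dataset the source genome is of course unknown, which is exactly why this kind of error would be hard to spot.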

Reference:

Yabuki, A., Toyofuku, T., & Takishita, K. (2014). Lateral transfer of eukaryotic ribosomal RNA genes: an emerging concern for molecular ecology of microbial eukaryotes. The ISME Journal, 1–4. doi:10.1038/ismej.2013.252

Monday, January 6, 2014

NRC survey: Research Priorities for Marine Science

I received an e-mail from the INDEEP mailing list, asking me to participate in a Virtual Town Hall on marine science research priorities, currently being run by the NRC. Here's the rundown from their website:
The National Research Council, at the request of the National Science Foundation, is seeking guidance from the ocean sciences community on the prioritization of research and facilities for the coming decade. The Decadal Survey of Ocean Sciences (DSOS) committee has been assembled for this task. To fulfill its charge, the DSOS committee is asking for community input via this Virtual Town Hall. To submit your input, please fill out the following identifying information, since anonymous comments will not be collected or posted. The deadline to submit your comments is March 15, 2014.
I figured I'd post my survey answers here (it would be great to generate some discussion about how we can promote greater emphasis on genomic tools and high-throughput sequencing in marine ecosystems - in particular the deep sea):

Across all ocean science disciplines, please list 3 important scientific questions that you believe will drive ocean research over the next decade.

1) What is the role of microbial processes in ecosystem function?

2) How do microbes respond to (and impact) climate change?

3) How do we integrate knowledge from different fields (e.g. physical oceanography, biogeochemistry, taxonomy, marine biology) to gain a more comprehensive view of the marine environment?

Within your own discipline, please list 3 important scientific questions that you believe will drive ocean research over the next decade.

1) Characterizing phylogeographic patterns in microbial eukaryotes using genomic data. What is the proportion of cosmopolitan vs. regionally restricted species in different marine habitats?

2) Linking genomic data (DNA, RNA, genome sequences) to the existing body of morphological, ecological and taxonomic data. Particularly important for microbial species where each of these data types exists in discipline-specific silos. How can such linked data further our understanding of marine ecosystems?

3) How do we build accurate models (e.g. using robust algorithms and existing data as training sets) to predict species distributions and the potential impacts of climate change?

Please list 3 ideas for programs, technology, infrastructure, or facilities that you believe will play a major role in addressing the above questions over the next decade. Please consider both existing and new technology/facilities/infrastructure/programs that could be deployed in this timeframe. What mechanisms might be identified to best leverage these investments (interagency collaborations, international partnerships, etc.)?

1) In order to address ecosystem-scale questions, and use cutting-edge methods to do so, the marine science community (particularly ecologists and taxonomists) needs to forge links with researchers in genomics and computational biology. DNA sequencing is largely underutilized in marine environments (notably lacking in the deep sea), yet it offers a deep, cost-effective view of species, populations, and communities. However, computational expertise is needed to effectively apply genomic tools to marine systems, and that expertise must come from researchers who are knowledgeable about current software and algorithms (workflows optimized for "big data").

2) Funding initiatives or programs emphasizing microbial eukaryotes are needed to complement the (currently much greater) emphasis on bacteria/archaea, macrofauna and megafauna. Meiofauna and protists underpin many key ecosystem processes (e.g. nutrient cycling), but their role in marine habitats is perpetually understudied. We lack even a basic understanding of global biodiversity and species distributions for the majority of microbial metazoan phyla.

3) Marine sampling protocols MUST adopt forward-looking approaches. Ship time is expensive, and samples from habitats such as the deep sea are precious and difficult to obtain (particularly for researchers in the genomics or computational biology communities, who may not have the professional connections needed to obtain biological samples). Many sample preservation methods do not consider the potential long-term use of a sample; for example, using formalin to preserve sediment immediately destroys the possibility of using that sample for DNA sequencing. There are many alternative sample preservation methods that preserve both DNA and morphological features (e.g. DESS is effective for sampling microbial metazoa). Giving deeper thought to sample collection, and prioritizing DNA preservation from diverse marine environments, is CRITICAL for furthering our understanding of marine biodiversity, biogeography, and ecology.

To give your own input, fill out the survey at this link: http://nas-sites.org/dsos2015/

Thursday, September 19, 2013

Microbial Phylogenies have the *least accessible* data in systematics

This month in PLoS Biology, "Lost Branches on the Tree of Life" gives us a pretty stark overview of data sharing and accessibility in the systematics community. The paper assessed just how many published phylogeny papers also deposited their corresponding sequence alignments, tree files, and program parameters. The grim news:
...only 16.7%, 1,262 from a total of 7,539 publications surveyed, provided accessible alignments/trees (Figures 1 and 2). Our attempts to obtain datasets directly from authors were only 16% successful (61/375; see Table S4), and we estimate that approximately 70% of existing alignments/trees are no longer accessible. Thus, we conclude that most of the underlying sequence alignments and phylogenetic trees produced by the systematic community during the past several decades are essentially lost, accessible only as static figures in a published journal article with no capacity for subsequent manipulation. Furthermore, when data are deposited, they are often incomplete (e.g., what characters were excluded, accepted taxon names; see Text S1 and Figure S1). Our survey of publications that implemented BEAST revealed that only 11 out of 100 (11%) examined studies provided access to the underlying xml input file, which is critical for reproducing BEAST results. Although funding agencies often require all data to be accessible from funded publications, our results reveal this is more the exception than the rule.
What made me cringe even more is that researchers in my discipline (microbial systematics, including microbial eukaryotes, bacteria and archaea) are the worst offenders when it comes to data sharing. The green line indicating full data deposition is pretty much flatlining in some years for microbes! (Drew et al. 2013)


I'll be the first to admit that my own data is part of the problem - when I was doing my PhD, no one ever had a conversation with me about data reproducibility and sharing. I made my best effort to publish the supplemental files I thought would be useful, but at that time I wasn't in the loop about scientific reproducibility and best practices for data archiving. For my nematode phylogeny paper in BMC Evolutionary Biology, I did upload the original ARB databases I used to construct and edit the rRNA structural alignments; but in hindsight, this file requires knowledge of the ARB software itself (not an easy package to use), and I didn't even think to publish a FASTA alignment file or a Nexus tree file. This was partly because my phylogeny papers involved a multitude of topology tests and I didn't think it was correct to pick just "one tree" to represent my spectrum of results.

I've been thinking about this issue a lot recently, and taking strides to correct my past mistakes. I'm now digging through old PhD files to find my alignments and tree files to contribute a nematode phylogeny for the Open Tree of Life project. I'll also post these data on Figshare so my data will no longer be another "Lost Branch" on the Tree of Life.

Reference:

Drew BT, Gazis R, Cabezas P, Swithers KS, Deng J, Rodriguez R, et al. (2013) Lost Branches on the Tree of Life. PLoS Biology, 11(9):e1001636.