Showing posts with label rRNA. Show all posts

Thursday, March 27, 2014

Lateral Gene Transfer detected in Eukaryotic rRNA genes

This paper is an example of super cool science that also makes me worry. Eukaryotes are known to have lower levels of Lateral Gene Transfer (LGT) than prokaryotes, and before this paper I assumed that LGT would not impact eukaryotic rRNA genes. However, this is not so according to Yabuki et al. (2014):
Here, we report the first case of lateral transfer of eukaryotic rRNA genes. Two distinct sequences of the 18S rRNA gene were detected from a clonal culture of the stramenopile, Ciliophrys infusionum. One was clearly derived from Ciliophrys, but the other gene originated from a perkinsid alveolate. Genome-walking analyses revealed that this alveolate-type rRNA gene is immediately adjacent to two protein-coding genes (ubc12 and usp39), and the origin of both genes was shown to be a stramenopile (that is, Ciliophrys) in our phylogenetic analyses. These findings indicate that the alveolate-type rRNA gene is encoded on the Ciliophrys genome and that eukaryotic rRNA genes can be transferred laterally.
Why is this paper worrisome? Well, if LGT of rRNA genes is a widespread phenomenon in microbial eukaryotes, it will confound biodiversity estimates obtained from environmental sequencing studies. If you had an environmental rRNA Illumina dataset that included Ciliophrys, your bioinformatic analysis would show taxonomic assignments for both an alveolate and a stramenopile (detecting 2 taxa from one genome: one true assignment, one false). The authors cite this concern in their conclusion:
These large-scale [environmental] surveys may detect transferred rRNA genes and such transferred rRNA genes may confuse our understanding of the true diversity and distribution of microbial eukaryotes, even if the frequency of lateral transfers of the rRNA gene is rare and the copy numbers of the transferred rRNA gene in environments are low. We agree that environmental rRNA gene surveys with PCR are still useful and effective to estimate the diversity/distribution of microbial eukaryotes. However, the fact that recovered rRNA gene sequences do not always reflect the actual existence of microbial eukaryotes corresponding to these sequences should be kept in mind based on our findings.
In other words, more research is needed to determine exactly how widespread this rRNA LGT phenomenon is in eukaryotes...it may be something else we need to take into account when designing software workflows for environmental sequence data.

Reference:

Yabuki, A., Toyofuku, T., & Takishita, K. (2014). Lateral transfer of eukaryotic ribosomal RNA genes: an emerging concern for molecular ecology of microbial eukaryotes. The ISME Journal, 1–4. doi:10.1038/ismej.2013.252

Monday, October 21, 2013

Intra-Genomic Variation in the Ribosomal Repeats of Nematodes

Happy to announce our new paper, published last week in PLoS ONE:
Bik HM, Fournier D, Sung W, Bergeron RD, Thomas WK (2013) Intra-Genomic Variation in the Ribosomal Repeats of Nematodes. PLoS ONE 8(10): e78230. doi:10.1371/journal.pone.0078230
This manuscript was in the works for a while, and was based on research carried out by co-author Dave Fournier while he was an undergraduate at UNH. The rationale? To assess the level of variation in rRNA loci within a single nematode genome, as well as between genomes of different nematode species. rRNA is typically present as a repeated, multi-copy locus in eukaryote genomes, which makes it hard (impossible) to correlate gene abundance to organismal abundance in environmental sequencing studies. Unlike in bacteria, there is no known correction that we can apply to "normalize" DNA for species with multiple rRNA copies - every species has multiple copies (sometimes into the thousands!) and we know little about the typical ranges of rRNA copy number across different eukaryote groups.
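
To make the copy-number problem concrete, here is a minimal sketch with invented numbers. The species names, individual counts, and copy numbers below are all hypothetical (none come from the paper); the point is only that the share of rRNA gene copies in a sample can look very different from the share of organisms.

```python
# Hypothetical community: number of individuals and rRNA copies per genome.
# Every value here is made up for illustration.
community = {
    "species_A": {"individuals": 100, "rrna_copies": 50},
    "species_B": {"individuals": 100, "rrna_copies": 500},
    "species_C": {"individuals": 10, "rrna_copies": 5000},
}


def relative_abundance(counts):
    """Convert raw counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


# Share of each species by organism count.
organisms = relative_abundance(
    {name: v["individuals"] for name, v in community.items()}
)

# Share of each species by rRNA gene copies (individuals x copies per genome),
# which is closer to what an rRNA amplicon survey would sample.
gene_copies = relative_abundance(
    {name: v["individuals"] * v["rrna_copies"] for name, v in community.items()}
)

for name in community:
    print(
        f"{name}: {organisms[name]:.1%} of individuals, "
        f"{gene_copies[name]:.1%} of rRNA copies"
    )
```

In this toy community the rare species_C contributes as many rRNA copies as the ten-times-more-common species_B, so without knowing copy numbers there is no way to back-calculate organism counts from gene counts.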

In this manuscript we were asking questions about both rRNA copy number (how many rRNA repeats are present in a genome?) and intragenomic variation (how many of these copies are unique rRNA gene sequences within a genome, and across rRNA variants are there "hotspots" for base polymorphisms?). We wanted to determine if we could spot patterns that govern rRNA copy number and level of intragenomic variation amongst gene copies - taking into account things like genome size and phylogenetic distance.

The result? There doesn't seem to be any pattern determining copy number or intragenomic rRNA variants across species, which kind of makes biodiversity estimates from environmental rRNA studies feel like a shot in the dark. But we DID find some interesting evidence of selection acting on rRNA loci:
By applying the same approach to four C. elegans mutation accumulation lines propagated by repeated bottlenecking for an average of ~400 generations, we find on average a 2-fold increase in repeat copy number (rate of increase in rRNA estimated at 0.0285-0.3414 copies per generation), suggesting that rRNA repeat copy number is subject to selection. Within each Caenorhabditis species, the majority of intragenomic variation found across the rRNA repeat was observed within gene regions (18S, 28S, 5.8S), suggesting that such intragenomic variation is not a product of selection for rRNA coding function.
Divergence and polymorphism are illustrated in the figure below:

Figure 1. Variation observed in nematode ribosomal arrays. (A) Divergence in rRNA repeats observed between the genomes of C. elegans, C. briggsae, C. japonica, and C. remanei; here, base substitutions are denoted as transitions or transversions, while complex polymorphisms represent any type of insertion, deletion, or inversion event. (B) Polymorphic positions in rRNA repeats observed within the genomes of each Caenorhabditis species. Results suggest that the pattern of intragenomic polymorphisms is unique across repeats within a species, whereas patterns of interspecific divergence reflect a strong signature of natural selection for rRNA function. 

The data on genomic patterns in eukaryotic rRNA is still very preliminary, and this paper is just a starting point. Hopefully this type of work will inspire similar analyses in other groups - we desperately need more knowledge, particularly for non-model organisms.

Thursday, March 28, 2013

Primer tests for Fungal ITS regions...plus, statistics!

Reading a good paper is so inherently satisfying--and if you want to share my satisfaction, I recommend this recent piece of literature:
Bazzicalupo AL, Bálint M, Schmitt I. (2013) Comparison of ITS1 and ITS2 rDNA in 454 sequencing of hyperdiverse fungal communities. Fungal Ecology, 6(1):102–9. 
I only wish this paper wasn't paywalled, because it contains quite a bit of useful information that is extremely relevant for the environmental sequencing community.

Firstly, the authors carried out a comparison of ITS primer sets and assessed their ability (and overlap) in recovering different fungal Orders, Families, Genera, and Species. I'm a big fan--these types of primer comparisons are important for figuring out what we might be missing in any given PCR-based approach.
Our results suggest that ITS2 may be more variable and recovers more of the molecular diversity. We confirm an earlier in silico study showing that ITS1 and ITS2 yielded somewhat different taxonomic community compositions when blasted against public databases. However, we demonstrate that both ITS1 and ITS2 reveal similar patterns in community structure when analyzed in a community ecology context. [Bazzicalupo et al. 2013]
Secondly, I feel like I learned some statistics by reading this paper! Or at least, I understood why authors chose the methods they did. I really liked that this paper includes detailed explanation of the statistical tests used to assess the ITS regions and make OTU comparisons. For example:
We compared OTU abundance distributions between the ITS1 and ITS2 datasets at all similarity levels with the Kolmogorov–Smirnov (KS) test to see whether the ITS1 or ITS2 would project higher OTU richness in the samples. KS tests are often used to test the distribution of datasets against other distributions, so one may use it to test if a dataset is e.g. normally distributed (Conover 1999). However, the KS test may also be used to compare the shapes of two empirical distributions. Species abundance distributions contain information about both the richness and evenness, thus the comparison of distributions is more meaningful than comparing the means of distributions with e.g. t-tests (Phillips et al. 2012). [Bazzicalupo et al. 2013]
I don't have a strong statistics background (but I'm very aware that I need to become more competent in this area), and this paper helped me understand what types of statistical tests I could apply to environmental sequence data in future analyses. In this regard, the Bazzicalupo et al. methods section was a great change of narrative, compared to the stats-name-dropping-without-explanation I see so often in other papers.
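
For anyone else trying to build intuition, the idea of "comparing the shapes of two empirical distributions" can be sketched in a few lines of plain Python. This is only my illustration of the two-sample KS statistic, not the authors' code, and the OTU abundance values are hypothetical. The statistic D is the largest vertical gap between the two empirical cumulative distribution functions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical cumulative
    distribution functions (ECDFs) of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The ECDFs only change at observed values, so checking every observed
    # value from either sample is enough to find the largest gap.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_gap = max(max_gap, abs(ecdf_a - ecdf_b))
    return max_gap


# Hypothetical reads-per-OTU values for the two markers (invented numbers).
its1_otu_abundances = [120, 45, 30, 12, 8, 5, 3, 2, 1, 1]
its2_otu_abundances = [90, 60, 40, 25, 15, 9, 6, 4, 2, 2, 1, 1, 1]

d = ks_statistic(its1_otu_abundances, its2_otu_abundances)
print(f"KS statistic D = {d:.3f}")
```

This sketch returns only the statistic. For a real analysis you would also want a p-value, which a library routine such as SciPy's `scipy.stats.ks_2samp` provides.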




Thursday, January 24, 2013

SMBE Meeting on Eukaryotic -Omics: April 29-May 2 at #UCDavis

The website is built, speakers have been lined up, and we're ready to announce it to the world:



My former PI Kelley Thomas at the University of New Hampshire and I received funding from the Society for Molecular Biology and Evolution to host an SMBE Satellite Meeting focused on Eukaryotic -Omics at UC Davis this spring. The meeting dates have been set as April 29-May 2, 2013, and the meeting description is as follows:
The SMBE Satellite Meeting on Eukaryotic -Omics will bring together an interdisciplinary pool of researchers to discuss current efforts, challenges, and future directions for high-throughput sequencing approaches focused on microbial eukaryotes (environmental studies of non-model organisms). The meeting program will encompass investigations of eukaryote biodiversity, ecology, and evolution, using approaches such as rRNA marker genes, shotgun metagenomics, metatranscriptomics, and computational biology tools and software pipelines.
See the meeting website (http://www.smbe.org/eukaryotes/) for program announcements, registration details, and travel award information. We're currently in talks to tack on a QIIME workshop at the end of the meeting (tentative dates May 2-4), so keep an eye out for further details. The official conference hashtag will be #SMBEeuks on Twitter.

STEM diversity has been on my mind a lot lately, particularly given the Eisen lab's obsession with equality in gender representation. So I'm very excited to announce that our call for travel award applications includes a heavy focus on diversity--encouraging early-career applicants as well as those from underrepresented groups. Deadline for abstract submission and travel grant applications is February 22, 2013 - mark it on your calendars!


Thursday, December 13, 2012

Defining a DNA barcoding locus for protists

Last month's PLoS Biology had a community article devoted to the Protist Working Group recently initiated through the Consortium for the Barcode of Life (CBOL). Now, people have mixed (often vehement) opinions about CBOL - I won't go into the history or debate about DNA barcoding here, but it's worth checking out posts by Dave Lunt at EvoPhylo (Rewriting the invention of DNA barcoding) and Jonathan Eisen at the Tree of Life ("Barcoding" researchers keep ignoring microbes and history).

Since its inception, CBOL has been overwhelmingly focused on the mitochondrial Cytochrome c Oxidase 1 locus that can be "universally" amplified (unless you're trying to barcode nematodes or any other taxon where these universal primer sets don't work). However, in recent years I've been pleased to see the formation of different taxon-specific working groups focused on ribosomal RNA genes as "official" DNA barcodes (e.g. ITS for Fungi). Ribosomal loci have a much longer history--and thus more available reference data--in most eukaryote groups, and have pretty much been adopted as de facto barcodes for molecular studies (read: not yet CBOL-approved).

Even if all researchers working on a specific taxon are using the same gene to complement morphological/ecological information, I still strongly support the formation of these CBOL working groups. As more people adopt high-throughput sequencing approaches, we need coordination and interaction across different taxonomic communities. For a given taxon, discussions focused on the barcoding locus will simply get people talking, illuminating what different labs are actually doing and helping us to determine the most useful (although probably not perfect) community standards.

So CBOL working groups essentially gather all the experts within a particular taxon, and have them discuss the merits and drawbacks of different loci for molecular identification of species:
Identifying the standard barcode regions for protists and assembling a reference library are the main objectives of the Protist Working Group (ProWG), initiated by the Consortium for the Barcode of Life (CBOL, http://www.barcodeoflife.org/). The ProWG unites a panel of international experts in protist taxonomy and ecology, with the aim to assess and unify the efforts to identify the barcode regions across all protist lineages, create an integrated plan to finalize the selection, and launch projects that would populate the reference barcode library. (Pawlowski et al. 2012)
I was highly encouraged by the Protist Working Group's stance - instead of trying to force everyone to use a single locus (e.g. defining ONE barcode for all protists), they advocate a much more realistic approach:
Because of their long, independent, and complex evolutionary histories, protists are so genetically variable that it is virtually impossible to find a single universal DNA barcode suitable for all of them. The ProWG consortium therefore recommends a two-step barcoding approach, comprising a preliminary identification using a universal eukaryotic barcode, called the pre-barcode, followed by a species-level assignment using a group-specific barcode (Figure 3). In this nested strategy, the ~500 bp variable V4 region of 18S rDNA is proposed as the universal eukaryotic pre-barcode. Group-specific barcodes (Figure 2C) will then have to be defined separately for each major protistan group, based on comparative studies using the CBOL selection criteria, and much of this work is still to be done. (Pawlowski et al. 2012)
This proposed approach will easily translate to high-throughput studies - you might want to get a broad overview of eukaryote communities with a universal 18S primer set, and then dig deeper into species assemblages by also sequencing other loci (ITS, COX1) targeted at ecologically important groups.
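
As a rough sketch of how the nested strategy could look in an analysis script: the first pass assigns each sequence to a major group using the 18S V4 pre-barcode, and the second pass looks up which group-specific locus to sequence next. The lookup table, function name, OTU identifiers, and group labels below are all hypothetical placeholders. As the ProWG paper says, most group-specific barcodes still have to be defined, so groups without an entry simply come back as `None`.

```python
# Universal eukaryotic pre-barcode proposed by the ProWG.
PRE_BARCODE = "18S V4"

# Hypothetical lookup of group-specific barcodes. These entries are
# placeholders for illustration, not an agreed community standard.
GROUP_SPECIFIC_BARCODE = {
    "Fungi": "ITS",
    "Metazoa": "COX1",
}


def barcoding_plan(pre_barcode_assignments):
    """Build a two-step plan from pre-barcode results.

    pre_barcode_assignments maps a sequence/OTU id to the major group it was
    assigned to in the first (pre-barcode) step. The returned plan records
    which group-specific locus, if any, to use for species-level assignment.
    """
    plan = {}
    for seq_id, major_group in pre_barcode_assignments.items():
        plan[seq_id] = {
            "pre_barcode": PRE_BARCODE,
            "group": major_group,
            # None means no group-specific barcode has been defined yet.
            "group_barcode": GROUP_SPECIFIC_BARCODE.get(major_group),
        }
    return plan


# Hypothetical first-pass assignments.
plan = barcoding_plan({"otu_1": "Fungi", "otu_2": "Ciliophora"})
for seq_id, steps in plan.items():
    print(seq_id, steps)
```

Keeping the group-to-locus mapping in a single table means it can be updated as working groups settle on their barcodes, without touching the rest of the workflow.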

Now all we need is a CBOL Working Group for Nematodes - seriously, why don't we have one yet?!

Reference:

Pawlowski J, Audic S, Adl S, Bass D, Belbahri L, Berney C, et al. (2012) CBOL Protist Working Group: Barcoding Eukaryotic Richness beyond the Animal, Plant, and Fungal Kingdoms. PLoS Biology, 10(11): e1001419.