Showing posts with label PLoS. Show all posts

Monday, October 21, 2013

Intra-Genomic Variation in the Ribosomal Repeats of Nematodes

Happy to announce our new paper, published last week in PLoS ONE:
Bik HM, Fournier D, Sung W, Bergeron RD, Thomas WK (2013) Intra-Genomic Variation in the Ribosomal Repeats of Nematodes. PLoS ONE 8(10): e78230. doi:10.1371/journal.pone.0078230
This manuscript was in the works for a while, and was based on research carried out by co-author Dave Fournier while he was an undergraduate at UNH. The rationale? To assess the level of variation in rRNA loci within a single nematode genome, as well as between genomes of different nematode species. rRNA is typically present as a repeated, multi-copy locus in eukaryote genomes, which makes it hard (impossible) to correlate gene abundance to organismal abundance in environmental sequencing studies. Unlike in bacteria, there is no known correction that we can apply to "normalize" DNA for species with multiple rRNA copies - every species has multiple copies (sometimes into the thousands!) and we know little about the typical ranges of rRNA copy number across different eukaryote groups.
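To make the copy number problem concrete, here is a toy calculation in Python. It is only a sketch: the species names, individual counts, and copy numbers are made up for illustration (they are not from the paper), and it assumes that the number of rRNA reads from a species scales simply with individuals times rRNA copies per genome.

```python
# Toy illustration only: every name and number below is hypothetical.

def expected_read_fractions(individuals, copies_per_genome):
    """Return the fraction of rRNA reads expected from each species,
    assuming reads scale with (individuals) x (rRNA copies per genome)."""
    weighted = {
        species: individuals[species] * copies_per_genome[species]
        for species in individuals
    }
    total = sum(weighted.values())
    return {species: count / total for species, count in weighted.items()}


# Two hypothetical species that are equally abundant in the sample...
individuals = {"species_A": 50, "species_B": 50}
# ...but that carry very different (made-up) numbers of rRNA repeats.
copies = {"species_A": 100, "species_B": 900}

fractions = expected_read_fractions(individuals, copies)
for species, fraction in fractions.items():
    print(f"{species}: {fraction:.2f} of rRNA reads")
```

Under these assumptions, two equally abundant species would contribute very unequal shares of the rRNA reads, and without knowing the copy numbers there is no way to work backwards from reads to individuals.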

In this manuscript we were asking questions about both rRNA copy number (how many rRNA repeats are present in a genome?) and intragenomic variation (how many of these copies are unique rRNA gene sequences within a genome, and across rRNA variants are there "hotspots" for base polymorphisms?). We wanted to determine if we could spot patterns that govern rRNA copy number and level of intragenomic variation amongst gene copies - taking into account things like genome size and phylogenetic distance.

The result? There doesn't seem to be any pattern determining copy number or intragenomic rRNA variants across species, which kind of makes biodiversity estimates from environmental rRNA studies feel like a shot in the dark. But we DID find some interesting evidence of selection acting on rRNA loci:
By applying the same approach to four C. elegans mutation accumulation lines propagated by repeated bottlenecking for an average of ~400 generations, we find on average a 2-fold increase in repeat copy number (rate of increase in rRNA estimated at 0.0285-0.3414 copies per generation), suggesting that rRNA repeat copy number is subject to selection. Within each Caenorhabditis species, the majority of intragenomic variation found across the rRNA repeat was observed within gene regions (18S, 28S, 5.8S), suggesting that such intragenomic variation is not a product of selection for rRNA coding function.
Divergence and polymorphism are illustrated in the figure below:

Figure 1. Variation observed in nematode ribosomal arrays. (A) Divergence in rRNA repeats observed between the genomes of C. elegans, C. briggsae, C. japonica, and C. remanei; here, base substitutions are denoted as transitions or transversions, while complex polymorphisms represent any type of insertion, deletion, or inversion event. (B) Polymorphic positions in rRNA repeats observed within the genomes of each Caenorhabditis species. Results suggest that the pattern of intragenomic polymorphisms is unique across repeats within a species, whereas patterns of interspecific divergence reflect a strong signature of natural selection for rRNA function. 

The data on genomic patterns in eukaryotic rRNA is still very preliminary, and this paper is just a starting point. Hopefully this type of work will inspire similar analyses in other groups - we desperately need more knowledge, particularly for non-model organisms.

Sunday, January 13, 2013

Navigating (and drowning in) the flood of PLoS ONE journal articles

I love PLoS ONE--both the mission of the journal and much of the science that is published there--and for the most part I love the new website redesign. But one thing I'm definitely not feeling is the revamped e-mail alert system.

I will admit it up front: I still abide by some old skool methods for discovering relevant literature. Every week, I pore over the Table of Contents and early article alerts from my favorite journals, neatly delivered to my inbox. Twitter also helps me find a lot of literature, but I find it to be more of a stochastic and unpredictable method (particularly for weeks where my time for social media is limited due to a heavy workload or lots of travel). Plus, being on Pacific Time puts me off kilter with the rest of the world--relevant information is very easily buried in my Tweet stream, even on days when I am looking. So I stick to my e-mail alerts to make sure I don't miss any exciting new science.

Up until a month or so ago, the PLoS ONE e-mail alerts were a behemoth, but they were manageable. The HTML e-mail was nicely formatted with embedded links to a list of articles in fairly specific subject areas, such as "Marine and Aquatic Sciences" and "Evolution and Ecology". It would take a couple of minutes to scan through these sub-categories, but for the most part it was a pretty good way to filter out the research areas which were most certainly not relevant to you. Also, many articles were placed into multiple categories, so an environmental metagenome study using novel analysis methods would be listed under the subject headings for "Computational Biology" and "Evolution and Ecology".

So, while going through my holiday-induced backlog of journal alerts, I was horrified by the new format for the PLoS ONE Table of Contents:


The subject headings that were formerly useful for me have now been completely condensed into the very broad subject headings "Biology and Life Sciences" and "Environmental Sciences and Ecology". Worse, each of these subheadings seems to contain a ridiculous number of articles (I didn't count how many, but I was scrolling for a looooong time before I reached the end of the subsection). And it also seems like I need to be looking through both of the above-mentioned subject headings: there were a few relevant articles peppered amongst lots of non-useful literature in each subheading.

I don't have the time (nor do I want to make the time) to scroll through lots of irrelevant scientific literature essentially looking for a needle in a haystack. So I took the advice of the yellow banner and went to create a custom alert on the PLoS ONE website. Frustratingly, there is no way just to look for new articles within a defined subset of subject areas. You have to include a search term, which immediately narrows your search window. I tried just doing a simple search for "metagenomics", but I was getting a lot of biomedical/clinical articles amongst the interesting ones (and I didn't want to scroll through all 963 articles). Plus, I'm paranoid that my search term isn't catching all the articles that I would want to see. I tried filtering down the articles to a more manageable set, but my attempts did not go over very well:


My final gripe is the subject categories themselves. The checklist of subject terms initially presented under the Advanced Search function is different from the larger list of subject terms listed under "filter by subject area". Are the "filter by subject area" search terms defined based on the articles themselves? I have no idea. I also have no idea what half of the subject terms mean. There's a subject term called "Sequence Analysis" and also one called "Research and Analysis methods" - could/should an article overlap these two terms, or are they referring to two distinctly different things? In my mind these categories seem a bit too vague and redundant to be much use for users. Subject terms also have some glaring errors--there is no "Ecology" category at all!

In the end, I basically gave up. I'm going to begrudgingly go back to that monster of an e-mail Table of Contents. 

I'm a firm believer in intuitive web interfaces with powerful user functionality--I don't think any researchers should have to work this hard to complete what is essentially a very simple (and very common) task. It is also in the best interest of journals (and authors) to have their work easily accessible--the articles I'm downloading today may result in future citations, blog posts, social media sharing, etc.

So my pleas for PLoS ONE:

  • Bring back the old e-mail alert format! Or even better, a new revamped format with even more useful subject categories.
  • Consolidate and streamline the subject terms - make them consistent between search interfaces, and specific enough that the meaning of each term is obvious. Ideally, each article would have something like Mendeley tags that would function as searchable keywords. If I liked a particular article, there would be a way to view articles with similar keywords--kind of like "Customers who bought item X also looked at these items" on Amazon.com.

I know these types of fixes won't necessarily be easy - I don't know how PLoS ONE organizes its article databases, and the things I'm suggesting might require a significant amount of coding and/or manual curation to implement. But I do think this type of organization is imperative for the long-term business model of the journals. Keep the scientists happy!