Catalogues of mammalian long noncoding RNAs: modest conservation and incompleteness
BACKGROUND: Despite increasing interest in the noncoding fraction of transcriptomes, the number, species-conservation and functions, if any, of many non-protein-coding transcripts remain to be discovered. Two extensive long intergenic noncoding RNA (ncRNA) transcript catalogues are now available for mouse: over 3,000 macroRNAs identified by cDNA sequencing, and 1,600 long intergenic noncoding RNA (lincRNA) intervals that are predicted from chromatin-state maps. Previously we showed that macroRNAs tend to be more highly conserved than putatively neutral sequence, although only 5% of bases are predicted as constrained. By contrast, over a thousand lincRNAs were reported as being highly conserved. This apparent difference may account for the surprisingly small fraction (11%) of transcripts that are represented in both catalogues. Here we sought to resolve the reported discrepancy between the evolutionary rates for these two sets. RESULTS: Our analyses reveal lincRNA and macroRNA exon sequences to be subject to the same relatively low degree of sequence constraint. Nonetheless, our observations are consistent with the functionality of a fraction of ncRNA in these sets, with up to a quarter of ncRNA exons having evolved significantly slower than neighboring neutral sequence. The more tissue-specific macroRNAs are enriched in predicted RNA secondary structures and thus may often act in trans, whereas the more highly and broadly expressed lincRNAs appear more likely to act in the cis-regulation of adjacent transcription factor genes. CONCLUSIONS: Taken together, our results indicate that each of the two ncRNA catalogues unevenly and lightly samples the true, much larger, ncRNA repertoire of the mouse
Rapid bursts of gene duplication occurred independently in diverse mammals
Background:
The draft mouse (Mus musculus) genome sequence revealed an unexpected proliferation of gene duplicates encoding a family of secretoglobin proteins including the androgen-binding protein (ABP) α, β and γ subunits. Further investigation of 14 α-like (Abpa) and 13 β- or γ-like (Abpbg) undisrupted gene sequences revealed a rich diversity of developmental stage-, sex- and tissue-specific expression. Despite these studies, our understanding of the evolution of this gene family remains incomplete. Questions arise from imperfections in the initial mouse genome assembly and a dearth of information about the gene family structure in other rodents and mammals.
Results:
Here, we interrogate the latest 'finished' mouse (Mus musculus) genome sequence assembly to show that the Abp gene repertoire is, in fact, twice as large as reported previously, with 30 Abpa and 34 Abpbg genes and pseudogenes. All of these have arisen since the last common ancestor with rat (Rattus norvegicus). We then demonstrate, by sequencing homologs from species within the Mus genus, that this burst of gene duplication occurred very recently, within the past seven million years. Finally, we survey Abp orthologs in genomes from across the mammalian clade and show that bursts of Abp gene duplications are not specific to the murid rodents; they also occurred recently in the lagomorph (rabbit, Oryctolagus cuniculus) and ruminant (cattle, Bos taurus) lineages, although not in other mammalian taxa.
Conclusion:
We conclude that Abp genes have undergone repeated bursts of gene duplication and adaptive sequence diversification driven by these genes' participation in chemosensation and/or sexual identification.
Diagnostically relevant facial gestalt information from ordinary photos
Craniofacial characteristics are highly informative for clinical geneticists when diagnosing genetic diseases. As a first step towards the high-throughput diagnosis of ultra-rare developmental diseases we introduce an automatic approach that implements recent developments in computer vision. This algorithm extracts phenotypic information from ordinary non-clinical photographs and, using machine learning, models human facial dysmorphisms in a multidimensional 'Clinical Face Phenotype Space'. The space locates patients in the context of known syndromes and thereby facilitates the generation of diagnostic hypotheses. Consequently, the approach will aid clinicians by greatly narrowing (by 27.6-fold) the search space of potential diagnoses for patients with suspected developmental disorders. Furthermore, this Clinical Face Phenotype Space allows the clustering of patients by phenotype even when no known syndrome diagnosis exists, thereby aiding disease identification. We demonstrate that this approach provides a novel method for inferring causative genetic variants from clinical sequencing data through functional genetic pathway comparisons.
Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells
February 17, 2011
The conversion of lineage-committed cells to induced pluripotent stem cells (iPSCs) by reprogramming is accompanied by a global remodeling of the epigenome [1, 2, 3, 4, 5], resulting in altered patterns of gene expression [2, 6, 7, 8, 9]. Here we characterize the transcriptional reorganization of large intergenic non-coding RNAs (lincRNAs) [10, 11] that occurs upon derivation of human iPSCs and identify numerous lincRNAs whose expression is linked to pluripotency. Among these, we defined ten lincRNAs whose expression was elevated in iPSCs compared with embryonic stem cells, suggesting that their activation may promote the emergence of iPSCs. Supporting this, our results indicate that these lincRNAs are direct targets of key pluripotency transcription factors. Using loss-of-function and gain-of-function approaches, we found that one such lincRNA (lincRNA-RoR) modulates reprogramming, thus providing a first demonstration for critical functions of lincRNAs in the derivation of pluripotent stem cells.
Improved annotation with de novo transcriptome assembly in four social amoeba species
Background: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. Results: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. Conclusions: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects
Peptide Array X-Linking (PAX): A New Peptide-Protein Identification Approach
Many protein interaction domains bind short peptides based on canonical sequence consensus motifs. Here we report the development of a peptide array-based proteomics tool to identify proteins directly interacting with ligand peptides from cell lysates. Array-formatted bait peptides containing an amino acid-derived cross-linker are photo-induced to crosslink with interacting proteins from lysates of interest. Indirect associations are removed by high stringency washes under denaturing conditions. Covalently trapped proteins are subsequently identified by LC-MS/MS and screened by cluster analysis and domain scanning. We apply this methodology to peptides with different proline-containing consensus sequences and show successful identifications from brain lysates of known and novel proteins containing polyproline motif-binding domains such as EH, EVH1, SH3 and WW domains. These results suggest the capacity of arrayed peptide ligands to capture and subsequently identify proteins by mass spectrometry is relatively broad and robust. Additionally, the approach is rapid and applicable to cell or tissue fractions from any source, making the approach a flexible tool for initial protein-protein interaction discovery.
National Institutes of Health (U.S.) (Grant R21-CA-140030-01)
BB0172, a Borrelia burgdorferi Outer Membrane Protein That Binds Integrin α3β1
Lyme disease is a multisystemic disorder caused by Borrelia burgdorferi infection. Upon infection, some B. burgdorferi genes are upregulated, including members of the microbial surface components recognizing adhesive matrix molecule (MSCRAMM) protein family, which facilitate B. burgdorferi adherence to extracellular matrix components of the host. Comparative genome analysis has revealed a new family of B. burgdorferi proteins containing the von Willebrand factor A (vWFA) domain. In the present study, we characterized the expression and membrane association of the vWFA domain-containing protein BB0172 by using in vitro transcription/translation systems in the presence of microsomal membranes and with detergent phase separation assays. Our results showed evidence of BB0172 localization in the outer membrane, the orientation of the vWFA domain to the extracellular environment, and its function as a metal ion-dependent integrin-binding protein. This is the first report of a borrelial adhesin with a metal ion-dependent adhesion site (MIDAS) motif that is similar to those observed in eukaryotic integrins and has a similar function.
