367 research outputs found
The transposable element environment of human genes is associated with histone and expression changes in cancer
“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files
Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element.
In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs
Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty.
Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed.
Conclusions: The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven
High-density SNP genotyping array for hexaploid wheat and its secondary and tertiary gene pool
In wheat, a lack of genetic diversity between breeding lines has been recognized as a significant block to future yield increases. Species belonging to bread wheat's secondary and tertiary gene pools harbour a much greater level of genetic variability, and are an important source of genes to broaden its genetic base. Introgression of novel genes from progenitors and related species has been widely employed to improve the agronomic characteristics of hexaploid wheat, but this approach has been hampered by a lack of markers that can be used to track introduced chromosome segments. Here, we describe the identification of a large number of single nucleotide polymorphisms that can be used to genotype hexaploid wheat and to identify and track introgressions from a variety of sources. We have validated these markers using an ultra-high-density Axiom® genotyping array to characterize a range of diploid, tetraploid and hexaploid wheat accessions and wheat relatives. To facilitate the use of these, both the markers and the associated sequence and genotype information have been made available through an interactive web site
Relaxed Purifying Selection is Associated with an Accumulation of Transposable Elements in Flies.
Although the mechanisms driving genome size evolution are not yet fully understood, one potentially important factor is the dynamics of the accumulation of transposable elements (TEs). Since most TEs are neutral or slightly deleterious, a negative correlation between the genome size and the efficacy of selection is expected. However, previous empirical studies on closely related species with distinct life history traits (thought to undergo different selective regimes) have yielded inconsistent results. Here, we perform the first large-scale analysis of the effect of genetic drift on the genome size evolution, without any prior assumption on the amount of genetic drift. We reconstructed a phylogeny based on the whole-genome data (2,242 genes) for 77 Drosophilid species to examine correlations between the genome size, TE content, and the efficacy of selection (using dN/dS ratios of non-synonymous to synonymous divergence). Using an integrative approach that controls for shared evolutionary history, we reveal that the genome-wide dN/dS are strongly positively correlated with the genome size and TE content, particularly in GC-poor genes. This study suggests the critical importance of controlling for heterogeneity in the base composition when estimating dN/dS. Furthermore, we emphasize that the lack of evidence for the TE accumulation due to increased genetic drift in several previous studies may be due to a secondary effect of changes in life history traits (i.e. asexuality) on TE dynamics. In conclusion, this work provides evidence for TE proliferation in fly genomes when purifying selection is reduced, shedding new light on the role of TEs and genetic drift in the evolution of genome architecture
High-density molecular characterization and association mapping in Ethiopian durum wheat landraces reveals high diversity and potential for wheat breeding
Durum wheat (Triticum turgidum subsp. durum) is a key crop worldwide, yet its improvement and adaptation to emerging environmental threats are made difficult by the limited amount of allelic variation included in its elite pool. New allelic diversity may provide novel loci to international crop breeding through quantitative trait loci (QTL) mapping in unexplored material. Here we report the extensive molecular and phenotypic characterization of hundreds of Ethiopian durum wheat landraces and several Ethiopian improved lines. We test 81,587 markers scoring 30,155 single nucleotide polymorphisms and use them to survey the diversity, structure, and genome-specific variation in the panel. We show the uniqueness of Ethiopian germplasm using a siding collection of Mediterranean durum wheat accessions. We phenotype the Ethiopian panel for ten agronomic traits in two highly diversified Ethiopian environments for two consecutive years, and use this information to conduct a genome-wide association study. We identify several loci underpinning agronomic traits of interest, both confirming loci already reported and describing new promising genomic regions. These loci may be efficiently targeted with molecular markers already available to conduct marker-assisted selection in Ethiopian and international wheat. We show that Ethiopian durum wheat represents an important and mostly unexplored source of durum wheat diversity. The panel analyzed in this study allows the accumulation of QTL mapping experiments, providing the initial step for a quantitative, methodical exploitation of untapped diversity in producing a better wheat
Finding a partner in the ocean: molecular and evolutionary bases of the response to sexual cues in a planktonic diatom
Microalgae play a major role as primary producers in aquatic ecosystems. Cell signalling regulates their interactions with the environment and other organisms, yet this process in phytoplankton is poorly defined. Using the marine planktonic diatom Pseudo-nitzschia multistriata, we investigated the cell response to cues released during sexual reproduction, an event that demands strong regulatory mechanisms and impacts on population dynamics. We sequenced the genome of P. multistriata and performed phylogenomic and transcriptomic analyses, which allowed the definition of gene gains and losses, horizontal gene transfers, conservation and evolutionary rate of sex-related genes. We also identified a small number of conserved noncoding elements. Sexual reproduction impacted on cell cycle progression and induced an asymmetric response of the opposite mating types. G protein-coupled receptors and cyclic guanosine monophosphate (cGMP) are implicated in the response to sexual cues, which overall entails a modulation of cell cycle, meiosis-related and nutrient transporter genes, suggesting a fine control of nutrient uptake even under nutrient-replete conditions. The controllable life cycle and the genome sequence of P. multistriata allow the reconstruction of changes occurring in diatoms in a key phase of their life cycle, providing hints on the evolution and putative function of their genes and empowering studies on sexual reproduction
