1,298 research outputs found

    GreenPhylDB: A Gene Family Database for plant functional Genomics

    With the increasing number of genomes being sequenced, a major objective is to transfer accurate annotation from characterised proteins to uncharacterised sequences. Consequently, comparative genomics has become a common and efficient strategy in functional genomics. The release of various annotated plant genomes, such as O. sativa and A. thaliana, has made it possible to set up comprehensive lists of gene families defined by automated methods. However, as with gene sequences, manual curation of gene families is an important requirement that has to be undertaken. GreenPhylDB comprises protein sequences of 12 fully sequenced plant species that were grouped into homeomorphic families using similarity-based methods. Clusters are finally processed by phylogenetic analysis to infer orthologs and paralogs, which will be particularly helpful for studying genome evolution. Beforehand, each cluster has to be curated (i.e. properly named and classified) using different sources of information. A web interface for the curation of plant gene families was developed for that purpose. This interface, accessible on GreenPhylDB (http://greenphyl.cirad.fr), centralizes external references (e.g. InterPro, KEGG, Swiss-Prot, PIRSF, PubMed) related to all gene members of the clusters and shows statistics and automatic analyses. We believe that this synthetic view of the data available for a gene cluster, combined with basic guidelines, is an efficient way to provide a reliable method for gene family annotation.

    Function prediction in plant genomes from large scale phylogenomics analyses : P0932

    With the increasing number of plant genomes being sequenced, a major challenge is to accurately transfer annotations from well-characterized genomes to newly obtained sequences. GreenPhylDB is a database designed for comparative and functional genomics based on complete genome-derived gene sequences. The database currently includes gene families of protein sequences from 22 plant species, including socio-economically important crops like rice, sorghum, maize, cassava and banana. Genes from all these species are organized in clusters based on sequence similarity. The clusters are manually annotated (i.e. properly named and classified), and the sequences included in each cluster are characterized by phylogenetic analysis in order to elucidate evolutionary relationships (e.g. orthologs, super-orthologs, in/out-paralogs) among genes. GreenPhyl provides a reliable and stable catalog of gene families useful for the annotation of new genome sequences in plants. With its improved user interface, the new release of GreenPhyl keeps the previous gene clustering quality and introduces additional features such as specific search engines (quick search, deep search, InterPro domain combination and GO family browser). The GreenPhyl pipeline relies on RapGreen, a new version of the RAP reconciliation tool (Dufayard et al., 2005) that allows us to root gene trees and infer orthology relationships between the sequences of a family. GreenPhyl version 3 is available at http://www.greenphyl.org and is a collaborative resource of SouthGreen (southgreen.cirad.fr), a bioinformatics platform dedicated to the analysis of genetic and genomic resources of southern and Mediterranean plants. (Author's abstract)

    Towards a bioinformatics platform for the Musa research community : [Abstract W076]

    Current experiments in genomics produce a large amount of data that needs to be organized into databases and made broadly accessible. Like other species, the Musa genomics community would benefit from centralized and innovative ways to study its genome. Over the past years, genetic and genomic data (e.g. BAC, EST, markers) have been generated and stored in databases. Several analysis pipelines were implemented for gene, transposable element, and expression data analyses, and for comparative genomics such as ortholog prediction via a phylogenomic approach (GreenPhyl). Web tools have been developed or implemented to facilitate access to data, such as genetic markers (TropGeneDB), genetic maps (CMap), a physical map (GBrowse), and Expressed Sequence Tags (ESTtik) and gene/TE predictions, and to allow online manual genome annotation (GnpAnnot). The number of tools may continue to grow, in particular with the upcoming release of the Musa genome sequence and the increase in Next-Generation Sequencing (NGS) facilities. The GMGC website (http://www.musagenomics.org) is a place where data can be shared, and where databases and tools can be listed in a homogeneous way to serve the Musa genomics community. It is intended to provide researchers interested in Musa with a common set of resources in order to work more efficiently and effectively. (Full text)

    GreenPhyl : Phylogenomic resources for comparative and functional genomics

    With the increasing number of plant genomes being sequenced, a major objective is to transfer accurate annotation from characterized sequences to uncharacterized sequences. GreenPhyl (Conte et al., 2008, Nucleic Acids Research, 36: D991-8) is a tool for plant comparative genomics that predicts the function of genes based on their evolutionary relationship with genes of known function. The database (version 2) comprises protein sequences of 16 fully sequenced plant species, including socio-economically important crops like rice, sorghum and maize, that were grouped into gene families using similarity-based methods. GreenPhyl contains approximately 13,000 gene families that are being annotated, along with computational analyses and external cross-references (InterPro, KEGG, Swiss-Prot, PubMed) related to all gene members. Once manually annotated (i.e. properly named and classified), gene families are finally processed by phylogenetic analyses to distinguish orthologous and paralogous genes. Orthologous genes descend from the last common ancestor through speciation and most probably encode proteins with a similar function in different species. In addition, the website offers a range of user-friendly tools to query the data. These resources will be particularly helpful to molecular biologists for gene discovery and gene function inference. We believe that a better understanding of genome evolution will help elucidate the genetic basis of important agronomic traits and therefore facilitate ongoing plant breeding efforts. (Full text)

    Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

    Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.

    GreenPhylDB v5: A comparative pangenomic database for plant genomes

    Comparative genomics is the analysis of genomic relationships among different species and serves as a significant base for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, after the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, it becomes challenging to select a single reference for comparative genomics studies, yet there is still a lack of databases taking advantage of several genomes per species for orthology detection. GreenPhylDBv5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them with other species still relying on one genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogenetic-based analyses. In addition, since the previous publication, we rejuvenated the website and included a new set of original tools, including protein-domain combination, tree topology searches and a section for users to store their own results in order to support community curation efforts.

    Toward community standards in the quest for orthologs

    The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications. Contact: [email protected]

    A look at trails through the pangenome visualization jungle

    High-throughput sequencing technologies have enabled the production of multiple reference genome sequences for a single species. Comparisons of such sequences showed that there are structural variations between individuals of the same species, such as Copy Number Variations (CNV) and Presence/Absence Variations (PAV), that can have a significant impact on phenotypic variation in plants and could be suitable for breeding improved crop varieties. Thus, a single reference genome is insufficient to capture all variations. Pangenomics is an integrative approach that aims to assess such genomic variations, and more, within a group of closely related individuals. Its definition can focus on the whole repertoire of genes within a group or can include blocks of genomic sequences shared between species. We introduce here a new visualization tool, based on a linear representation: the PANgenome Analyzer with CHromosomal Exploration (PANACHE). It is a web-based application that enables its users to explore a pangenomic reference divided into multiple panchromosomes.

    Mapping for males: Sustainable sex control in Nile tilapia

    Sexual dimorphism of aquaculture traits is common in farmed fish. The Nile tilapia is the second most important farmed species, with a production of 6 million tons in 2020. Intensive farming relies on the production of all-male stocks because of males' higher growth rate and as a way to avoid uncontrolled reproduction. Currently, the large majority of all-male production is obtained through androgen treatments. We aim to use more sustainable procedures to produce all-males, such as the use of YY males. Until now, the use of YY males has not been reliable. This is because sex determination in Nile tilapia is complex and controlled by several factors. Although sex determinism follows an XX/XY system, the linkage group (LG) carrying the major sex determinant gene has been assigned to either LG1 or LG23, depending on the domesticated strain. Minor parental factors can also be implicated and, in addition, high temperatures can override the genetic determinism. It is not clear to what extent these differences in sex determination are due to natural diversity in the mechanisms of sex determination or to processes of domestication. It is therefore necessary to better understand the genetic basis of sex determinism in order to use this approach to generate all-males. For this, we decided to work on wild populations in Africa that have not undergone domestication-related manipulation. We undertook a study of sex determination in several wild populations from West Africa (Lake Volta, Lake Kou) and East Africa (Lake Koka and Lake Hora). We used complementary genomic approaches: ddRAD, whole genome sequencing and long Nanopore reads. We were able to determine that the amh region present on LG23 is the major sex-determining region in most of these populations. Nevertheless, our results also show that there is high polymorphism in this SD region. Furthermore, there are populations that lack the male-specific amh duplication on LG23. Hence, there are no universal Y markers for Nile tilapia. It is necessary to work at the population level to identify and validate sex markers, in order to allow the local production of YY males.
