
    Extending TCGA queries to automatically identify analogous genomic data from dbGaP [version 1; referees: 2 approved, 1 approved with reservations]

    Data sharing is critical to advancing genomic research: it reduces the need to collect new data by enabling the reuse and combination of existing data, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data in dbGaP by matching against TCGA metadata. The resulting tool provides an easy-to-use way to connect these two data sources.

    Accelerating Data Discovery with an Ontology-driven Tool for an Enterprise-scale Data Lake Environment

    A large overhead in the analytics process is the time required to find relevant data. We present an ontology-driven data discovery application, implemented over IBM’s Cognitive Enterprise Data Platform (CEDP). CEDP contains a large collection of heterogeneous data assets from enterprise-wide data sources. The application reduces the time data consumers need to search for and find data relevant to their analytics applications.