150 research outputs found

    The UCSC Genome Browser Database: 2008 update

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/

    Syncope and the drive towards minimization in Colloquial Bamana

    Data from two varieties of Bamana, a Mande language spoken in West Africa, illustrate that permissible syllable shapes vary between the two types. A comparison of Classic Bamana spoken in Segou, Mali and that spoken by a younger cohort of individuals in the Malian capital, Bamako, reveals that the latter variety is synchronically developing complex CCV and CVC syllable shapes, whereas the classical variety permits only maximal CV syllables. We posit that the development of these syllable shapes represents an overall drive towards word minimization in this variety of the language. This study formalizes minimization in Colloquial Bamana in an optimality theoretic framework and illustrates the support that these developing processes in Bamana provide for the Split Margin Approach to the syllable, developed in Baertsch (2002). Preferential deletion patterns, the role of phonotactics in driving these patterns, and other processes interacting with and/or preventing syncope from occurring are also explored. National Institutes of Health DC00433, RR7031K, DC00076, DC001694 (PI: Gierut

    The completion of the Mammalian Gene Collection (MGC)

    Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

    Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data

    Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p ...). Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed.

    Forces Shaping the Fastest Evolving Regions in the Human Genome

    Comparative genomics allows us to search the human genome for segments that were extensively changed in the last ~5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.

    The UCSC Genome Browser Database: update 2006

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at

    Co-Occurrence of Cytogenetic Abnormalities and High-Risk Disease in Newly Diagnosed and Relapsed/Refractory Multiple Myeloma

    © 2025 by American Society of Clinical Oncology. PURPOSE: Survival for patients with multiple myeloma (MM) has improved but outcomes remain heterogeneous. Consistent diagnostic identification of high-risk disease is desirable to address unmet patient need. The aim was to investigate the consistency of association of co-occurrence of high-risk cytogenetic abnormalities (HRCAs) with prognosis in patients with newly diagnosed MM (NDMM) and relapsed/refractory MM (RRMM), and across a range of treatment modalities. METHODS: A systematic review of randomized controlled trials of MM that reported testing for HRCA between January 1, 2000, and December 9, 2021, was performed. Groups were contacted and asked to locally perform a novel, federated analysis of their data for single hit (one HRCA) and double hit (≥two HRCAs), using a centrally provided algorithm. Analysis results were centrally collated and meta-analyzed to assess the hazard ratio (HR) for progression-free survival (PFS) and overall survival (OS) for one/≥two HRCAs across patient subgroups using random-effects models. RESULTS: Twenty-four trials including 13,926 patients were included. The median age of participants was 66.5 years (IQR, 59-72) and 56.5% were male (IQR, 52-60). The HR for PFS was 2.28 (95% CI, 2.05 to 2.54) for patients with ≥two HRCAs and 1.51 (95% CI, 1.38 to 1.65) for patients with one HRCA. The HR for OS was 2.94 (95% CI, 2.49 to 3.47) and 1.69 (95% CI, 1.52 to 1.88) for the two subgroups, respectively. In studies initiated since 2015, the effect abides (≥two HRCA PFS, HR, 2.39 [95% CI, 1.96 to 2.91]; OS, 3.10 [95% CI, 2.10 to 4.60]) both for NDMM and RRMM. Heterogeneity related to transplant eligibility and relapsed/refractory status was as expected. CONCLUSION: The association of ≥two HRCAs with the poorest outcome in NDMM and RRMM, and across treatment modalities, as demonstrated here for the first time to our knowledge, allows for more focused development of novel approaches to these patients with high unmet need.

    Tracking and coordinating an international curation effort for the CCDS Project

    The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines.
