201 research outputs found

The UCSC Genome Browser Database: 2008 update

Author: Baertsch R.
Barber G. P.
Clawson H.
Diekhans M.
Giardine B.
Harte R. A.
Haussler D.
Hinrichs A. S.
Hsu F.
Karolchik D.
Kent W. J.
Kober K. M.
Kuhn R. M.
Miller W.
Pedersen J. S.
Pohl A.
Raney B. J.
Rhead B.
Rosenbloom K. R.
Smith K. E.
Stanke M.
Thakkapallayil A.
Trumbower H.
Wang T.
Zweig A. S.
Publication date: 02/08/2017

The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.ed

RERO DOC Digital Library

The completion of the Mammalian Gene Collection (MGC)

Author: Astashyn A.
Baertsch R.
Bhat N.
Blakesley R. W.
Bonner T. I.
Bouffard G. G.
Brejova B.
Brent M.
Brown G.
Brownstein M.
Buetow K. H.
Chuah E.
Collins F. S.
Comstock C. L.
Deng A.
Deng M.
Derge J. G.
Dickson M. C.
Diekhans M.
Farrell C.
Feingold E. A.
Garcia A. M.
Gerhard D. S.
Ghamsari L.
Gibbs R. A.
Good P. J.
Green E. D.
Grimwood J.
Gruber C. E.
Gunaratne P. H.
Hart J.
Harte R.
Haussler D.
Hirst M.
Hudson J.
Jacob H.
Jang W.
Kent J.
Kloske D.
Landrum M.
Langton L.
Lazar J.
Lebeau A.
Lewis J.
Lin C.
Ma K.
Maglott D.
Mah D.
Maidak B. L.
Mandich A.
Marsh A.
McPherson J.
Mello E.
Misquitta L.
Moksa M.
Moore T.
Mullikin J.
Muratet M.
Murphy M.
Murphy T.
Murray R. R.
Muzny D.
Myers R. M.
Pang J.
Pardes E.
Pennacchio C.
Phan L.
Pruitt K. D.
Rajput B.
Rasooly R.
Riddick L.
Robinson C.
Rodriguez A. C.
Salehi-Ashtiani K.
Schaefer C. F.
Schmutz J.
Schreiber K.
Sethupathy P.
Shapiro N.
Shenmen C. M.
Shoaf D.
Sieja S.
Siepel A.
Simmons B.
Smith M. R.
Stevens M.
Taylor G.
Temple G.
Tse K.
van Baren M. J.
Wagner L.
Ward M.
Webb D.
Weber J.
Wei C.
Wu J.
Wu W.
Yankie L.
Young A. C.
Zeng T.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/12/2009

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Cold Spring Harbor Laboratory Institutional Repository

The UCSC Proteome Browser

Author: Diekhans Mark
Haussler David
Hsu Fan
Karolchik Donna
Kent W. James
Kuhn Robert M.
Pringle Tom H.
Publication venue: Oxford University Press
Publication date: 17/12/2004

The University of California Santa Cruz (UCSC) Proteome Browser provides a wealth of protein information presented in graphical images and with links to other protein-related Internet sites. The Proteome Browser is tightly integrated with the UCSC Genome Browser. For the first time, Genome Browser users have both the genome and proteome worlds at their fingertips simultaneously. The Proteome Browser displays tracks of protein and genomic sequences, exon structure, polarity, hydrophobicity, locations of cysteine and glycosylation potential, Superfamily domains and amino acids that deviate from normal abundance. Histograms show genome-wide distribution of protein properties, including isoelectric point, molecular weight, number of exons, InterPro domains and cysteine locations, together with specific property values of the selected protein. The Proteome Browser also provides links to gene annotations in the Genome Browser, the Known Genes details page and the Gene Sorter; domain information from Superfamily, InterPro and Pfam; three-dimensional structures at the Protein Data Bank and ModBase; and pathway data at KEGG, BioCarta/CGAP and BioCyc. As of August 2004, the Proteome Browser is available for human, mouse and rat proteomes. The browser may be accessed from any Known Genes details page of the Genome Browser at http://genome.ucsc.edu. A user's guide is also available on this website.

CiteSeerX

Crossref

PubMed Central

AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature

Author: Beggs Alan H.
Bejerano Gill
Bernstein Jonathan A.
Birgmeier Johannes
Cooper David N.
Deisseroth Cole A.
Diekhans Mark E.
Guturu Harendra
Haeussler Maximilian
Jagadeesh Karthik A.
Ratner Alexander J.
Ré Christopher
Steinberg Ethan H.
Stenson Peter D.
Wenger Aaron M.
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 20/05/2020

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.

Online Research @ Cardiff

PubMed Central

eScholarship - University of California

Tracking and coordinating an international curation effort for the CCDS Project

Author: A. Frankish
B. Aken
Bab
Baertsch
Brogna
Buhler
C. M. Farrell
C. Wallin
Church
Crowe
D. Barrell
Eberle
Green
Hwang
J. E. Loveland
J. Harrow
Jackson
K. D. Pruitt
Kim
Kozak
L. Wilming
Lee
Luukkonen
M. Diekhans
M.-M. Suner
Morris
Natsoulis
Nicholson
Parla
Prakash
R. A. Harte
S. Searle
Silva
Simeone
The ENCODE Project Consortium
Udby
Wethmar
Wu
Publication venue: Oxford University Press
Publication date: 12/02/2013

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high-quality protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines.
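The core comparison step named in this abstract, finding CDS annotations that are identical across independent annotation builds and giving each shared structure a tracked ID, can be illustrated with a minimal Python sketch. It assumes each build has already been reduced to a mapping from a transcript name to its CDS structure (chromosome, strand, and a tuple of CDS start/end coordinates per exon). The function name `match_identical_cds`, the `CCDS<n>.1` ID format used here, and the toy coordinates are hypothetical; the real CCDS pipeline also runs quality assurance checks and manual curation that are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical CDS representation: (chromosome, strand, ((start, end), ...))
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def match_identical_cds(
    build_a: Dict[str, CdsKey],
    build_b: Dict[str, CdsKey],
    first_id: int = 1,
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (id, transcripts_in_a, transcripts_in_b) for every CDS
    structure that appears, coordinate-for-coordinate, in both builds.

    Shared structures are sorted before IDs are assigned, so the output is
    deterministic for a given pair of inputs.
    """

    def index(build: Dict[str, CdsKey]) -> Dict[CdsKey, List[str]]:
        # Group transcript names by their exact CDS structure.
        groups: Dict[CdsKey, List[str]] = {}
        for name, key in build.items():
            groups.setdefault(key, []).append(name)
        return groups

    by_a, by_b = index(build_a), index(build_b)
    shared = sorted(set(by_a) & set(by_b))  # only exact matches qualify
    results = []
    for offset, key in enumerate(shared):
        ccds_id = f"CCDS{first_id + offset}.1"  # hypothetical ID format
        results.append((ccds_id, sorted(by_a[key]), sorted(by_b[key])))
    return results


if __name__ == "__main__":
    build_a = {
        "A1": ("chr1", "+", ((100, 200), (300, 400))),
        "A2": ("chr1", "-", ((500, 650),)),
    }
    build_b = {
        "B1": ("chr1", "+", ((100, 200), (300, 400))),
        "B2": ("chr1", "-", ((500, 651),)),  # differs by one base: not matched
    }
    print(match_identical_cds(build_a, build_b))
    # [('CCDS1.1', ['A1'], ['B1'])]
```

Requiring exact equality of the whole structure mirrors the "identical CDS annotations" criterion in the abstract: a single-base difference (as in `B2`) leaves a model unmatched, which in the real project is where curator review would come in.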

CiteSeerX

Crossref

PubMed Central

GENCODE: the reference human genome annotation for The ENCODE Project.

Author: Aken B.L.
Balasubramanian S.
Barnes I.
Barrell D.
Bignell A.
Boychenko V.
Brent M.
Chrast J.
Derrien T.
Despacio-Reyes G.
Diekhans M.
Ezkurdia I.
Frankish A.
Gerstein M.
Gonzalez J.M.
Guigó R.
Harrow J.
Harte R.
Haussler D.
Howald C.
Hubbard T.J.
Hunt T.
Kay M.
Kellis M.
Kokocinski F.
Lin M.
Mukherjee G.
Pei B.
Rajan J.
Reymond A.
Rodriguez J.M.
Saunders G.
Searle S.
Steward C.
Tanzer A.
Tapanari E.
Tress M.
Valencia A.
van Baren J.
Walters N.
Zadissa A.
Publication date: 01/11/2011

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternatively spliced transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC Genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

Crossref

DSpace@MIT

UNIL IRIS | Institutional Research Information System

PubMed Central

UPF Digital Repository

King's Research Portal

The UCSC Genome Browser Database: update 2006

Author: Baertsch R.
Barber G. P.
Bejerano G.
Clawson H.
Diekhans M.
Furey T. S.
Harte R. A.
Haussler D.
Hillman-Jackson J.
Hinrichs A. S.
Hsu F.
Karolchik D.
Kent W. J.
Kuhn R. M.
Pedersen J. S.
Pohl A.
Raney B. J.
Rosenbloom K. R.
Siepel A.
Smith K. E.
Sugnet C. W.
Sultan-Qurraie A.
Thomas D. J.
Trumbower H.
Weber R. J.
Weirauch M.
Zweig A. S.
Publication venue: Oxford University Press
Publication date: 28/12/2005

The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at

CiteSeerX

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Copenhagen University Research Information System

Comparative analysis of pseudogenes across three phyla

Author: Balasubramanian S
Clark W
Diekhans M
Frankish A
Gerstein MB
Harrow J
Harte R
Hubbard T
Leng J
Pei B
Rozowsky J
Rutenberg-Schoenberg M
Sisu C
Wang D
Zhang Y
Publication venue: National Academy of Sciences
Publication date: 25/08/2014

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism’s genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

Crossref

PubMed Central

King's Research Portal

Brunel University Research Archive

The UCSC Genome Browser Database: 2008 update

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
A. Thakkapallayil
Altshuler
B. Giardine
B. J. Raney
B. Rhead
Benson
Blanchette
Collins
Conrad
D. Haussler
D. Karolchik
F. Hsu
Feuk
G. P. Barber
H. Clawson
H. Trumbower
Hsu
Iafrate
J. S. Pedersen
K. E. Smith
K. M. Kober
K. R. Rosenbloom
Karolchik
Kent
Locke
M. Diekhans
M. Stanke
McCarroll
Mishra
R. A. Harte
R. Baertsch
R. M. Kuhn
Redon
Riggins
Rual
Sebat
Sharp
Sherry
Stelzl
T. Wang
The MGC Project Team
Vang
Velculescu
W. J. Kent
W. Miller
Publication venue: Oxford University Press
Publication date: 01/01/2007

Crossref

PubMed Central

Copenhagen University Research Information System

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

Author: Aken Bronwen L
Barnes If
Bennett Ruth
Berry Andrew E
Bruford Elspeth A
Bult Carol J
Cox Eric
Davidson Claire
Diekhans Mark
Farrell Catherine M
Frankish Adam
Girón Carlos G
Goldfarb Tamara
Gonzalez Jose M
Hunt Toby
Jackson John
Joardar Vinita
Kay Mike P
Kodali Vamsi K
Loveland Jane E
Martin Fergal J
McAndrews Monica
McGarvey Kelly M
Mudge Jonathan M
Murphy Michael
Murphy Terence
O'Leary Nuala A
Pruitt Kim D
Pujar Shashikant
Rajput Bhanu
Rangwala Sanjida H
Riddick Lillian D
Seal Ruth L
Suner Marie-Marthe
Wallin Craig
Webb David
Zhu Sophia
Publication venue: The Mouseion at the JAXlibrary
Publication date: 06/11/2017

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary