DI-fusion

wKinMut: An integrated tool for the analysis and interpretation of mutations in human protein kinases

Author: A Baudot
A Gonzalez-Perez
A Torkamani
A Valencia
Alfonso Valencia
Angela del Pozo
B Reva
C Ferrer-Costa
C Greenman
C Ortutay
D Miranda-Saavedra
G Lopez
G Manning
G Wainreb
I Friedberg
IA Adzhubei
J Hurst
J Izarzugaza
JM Izarzugaza
JMG Izarzugaza
Jose MG Izarzugaza
JS Kaminker
LD Wood
M Cline
M Krallinger
Miguel Vazquez
MR Stratton
P Beltrao
P Lahiry
P Minguez
P Yue
PC Ng
R Calabrese
R Hoffmann
R Karchin
RJ Clifford
S Bamford
T Sjöblom
V Quesada
V Ramensky
VG Krishnan
XS Puente
Y Bromberg
YL Yip
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

BACKGROUND: Protein kinases are involved in relevant physiological functions, and a large number of mutations in this superfamily have been reported in the literature to affect protein function and stability. Unfortunately, exploring the phenotypic consequences of each individual mutation remains a considerable challenge.
RESULTS: The wKinMut web server offers direct prediction of the potential pathogenicity of mutations using a number of methods, including our recently developed prediction method, which combines information from a range of diverse sources: physicochemical properties; functional annotations from FireDB and Swissprot; kinase-specific characteristics such as membership of specific kinase groups, annotation with disease-associated GO terms, or the occurrence of the mutation in PFAM domains; and the relevance of the residues in determining kinase subfamily specificity, from S3Det. When applied to protein kinases, this predictor yields results that compare favourably with other methods in the field. Together with the predictions, wKinMut offers a number of integrated services for the analysis of mutations. These include the classification of the kinase, information about associations of the kinase with other proteins extracted from iHop, the mapping of the mutations onto PDB structures, pathogenicity records from a number of databases, and the classification of mutations in large-scale cancer studies. Importantly, wKinMut is connected with the SNP2L system, which extracts mentions of mutations directly from the literature and therefore increases the chances of finding interesting functional information associated with the studied mutations.
CONCLUSIONS: wKinMut facilitates the exploration of the information available about individual mutations by integrating prediction approaches with the automatic extraction of information from the literature (text mining) and several state-of-the-art databases. wKinMut has been used during the last year to analyse the consequences of mutations in the context of a number of cancer genome projects, including the recent analysis of Chronic Lymphocytic Leukemia cases, and is publicly available at http://wkinmut.bioinfo.cnio.es

Online Research Database In Technology

Adaptive Evolution and the Birth of CTCF Binding Sites in the Drosophila Genome

Author: A Kong
A Mortazavi
A Murrell
A Siepel
A Valouev
AC Bell
AG Clark
AJ Carter
AK Holloway
AM Bushey
AR Borneman
B Langmead
BZ He
CE Grant
D Schmidt
DT Odom
EE Hare
EE Holohan
F Chiaromonte
F Karch
F Tajima
GA Wray
GN Filippova
H Dai
H Moon
Harmit S. Malik
J Demars
J Mihaly
J Parsch
JA Wallace
JC Marioni
JD Storey
JE Phillips
JH Bullard
JH McDonald
JR Powell
JR Raab
JR Stone
JS Kaminker
JT Robinson
Kevin P. White
KP White
L Guelen
L Handoko
L Valenzuela
M Gaszner
M Long
M Mohan
MA Larkin
Manyuan Long
MC King
MZ Ludwig
N Bierne
N Jiang
N Negre
Nicolas Nègre
P Andolfatto
P Heger
P Librado
PK Geyer
PR Haddrill
Q He
R McDaniell
RK Bradley
RM Kuhn
RR Hudson
S Barges
S Chen
S Cuddapah
S MacArthur
SA Rifkin
SB Carroll
Sidi Chen
ST Smith
T Murali
TH Kim
TI Gerasimova
VX Fu
W Wang
Xiaochun Ni
Y Zhang
YE Zhang
Yong E. Zhang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/12/2011

Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping the binding evolution. Here we examine the genome-wide binding of CTCF in four species of Drosophila separated by between ~2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference for CTCF is highly conserved, we find that CTCF binding itself is highly evolutionarily dynamic and has adaptively evolved. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly at the rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, we detected signatures of selection in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature for positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. Our data indicate that binding of Drosophila CTCF protein has evolved under natural selection, and that CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes.

Public Library of Science (PLOS)

DSpace@MIT

The Francis Crick Institute

An integrated computational pipeline and database to support whole-genome sequence annotation

Author: Berman BP
Carlson J
Frise E
Harris N
Kaminker JS
Lewis SE
Marshall B
Misra S
Mungall CJ
Prochnik SE
Rubin GM
Shu S
Smith CD
Smith E
Tupy JL
Wiel C
Publication venue: BioMed Central
Publication date: 23/12/2002

We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable, expandable, and flexible architecture.

Harvard University - DASH

Apollo: a sequence annotation editor

Author: Bayraktaroglu L
Birney E
Clamp ME
Crosby MA
Gibson M
Harris N
Iyer V
Kaminker JS
Lewis SE
Matthews BB
Misra S
Mungall CJ
Prochnik SE
Richter J
Rubin GM
Searle SMJ
Smith CD
Tupy JL
Wiel C
Publication venue: BioMed Central
Publication date: 23/12/2002

The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool that allows biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome, and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.

Predicting disease-associated substitution of a single amino acid by analyzing residue interactions

Author: A del Sol
A Liaw
AM Fernandez-Escamilla
B Li
C Ferrer-Costa
C Kosiol
CT Saunders
DJ Watts
G Amitai
G Bagler
H Carter
Hui Yin
J Reumers
Jiamin Xiao
JS Kaminker
KV Brinda
L Bao
L Breman
Lezheng Yu
LH Greene
Li Yang
M Mort
M Vendruscolo
MEJ Newman
Menglong Li
NV Dokholyan
P Yue
PA Alexander
PC Ng
PD Thomas
R Karchin
RA Gibbs
RJ Dobson
S Miyazawa
S Sunyaev
SF Altschul
ST Sherry
V Ramensky
W Kabsch
W Lee
Y Bromberg
Yizhou Li
YL Yip
Z Wang
Zhining Wen
ZQ Ye
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues.
Results: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that studying the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and an MCC of 0.59.
Conclusions: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
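The degree and centrality features mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say how the protein structure network is built, so the distance cutoff (7.0), the use of one 3D point per residue, and the function names `contact_network`, `degree`, and `closeness` are all assumptions made for illustration.

```python
from collections import deque
from math import dist


def contact_network(coords, cutoff=7.0):
    """Link residues i and j when their points lie within `cutoff` (assumed value)."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree(adj, node):
    """Number of residues in direct contact with `node`."""
    return len(adj[node])


def closeness(adj, node):
    """Reachable residues divided by the sum of shortest-path lengths to them."""
    seen = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    total = sum(seen.values())
    return (len(seen) - 1) / total if total else 0.0
```

In such a sketch, a residue in the middle of a chain of contacts scores higher on both measures than one at the end, which is the kind of signal the abstract associates with disease-related sites.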

Public Library of Science (PLOS)

Characterizing Mutational Heterogeneity in a Glioblastoma Patient with Double Recurrence

Author: A Jemal
AC Tsiatis
Andrea Cohen
Andrew E. Sloan
DW Parsons
E Franceschi
Gabrielle C. Nickel
GD Schuler
H Li
H Ohgaki
IA Adzhubei
J Clarke
J Tomsic
Jill Barnholtz-Sloan
Jin Q. Cheng
JS Kaminker
K Guda
Kishore Guda
LC Hou
M Li
M Meyerson
Mark Cohen
Mark D. Adams
Meetha P. Gould
N Navin
PC Ng
PJ Campbell
R Stupp
RG Verhaak
RK Thomas
S Yachida
SA Forbes
Sarah McMahon
Thomas LaFramboise
W Liu
Publication venue: Public Library of Science
Publication date: 01/01/2012

Human cancers are driven by the acquisition of somatic mutations. Separating the driving mutations from those that are random consequences of general genomic instability remains a challenge. New sequencing technology makes it possible to detect mutations that are present in only a minority of cells in a heterogeneous tumor population. We sought to leverage the power of ultra-deep sequencing to study various levels of tumor heterogeneity in the serial recurrences of a single glioblastoma multiforme patient. Our goal was to gain insight into the temporal succession of DNA base-level lesions by querying intra- and inter-tumoral cell populations in the same patient over time. We performed targeted “next-generation” sequencing on seven samples from the same patient: two foci within the primary tumor, two foci within an initial recurrence, two foci within a second recurrence, and normal blood. Our study reveals multiple levels of mutational heterogeneity. We found variable frequencies of specific EGFR, PIK3CA, PTEN, and TP53 base substitutions within individual tumor regions and across distinct regions within the same tumor. In addition, specific mutations emerge and disappear along the temporal spectrum from tumor at the time of diagnosis to second recurrence, demonstrating evolution during tumor progression. Our results shed light on the spatial and temporal complexity of brain tumors. As sequencing costs continue to decline and deep sequencing technology eventually moves into the clinic, this approach may provide guidance for treatment choices as we embark on the path to personalized cancer medicine.

The Francis Crick Institute

Cooperation between the GATA and RUNX factors Serpent and Lozenge during Drosophila hematopoiesis

Author: Franc NC
Golling G
Haenlin M
Jippo T
Kaminker JS
Nelson RE
Okuda T
Persons DA
Rehorn KP
Rizki TM
Tepass U
Tsai FY
Tsang AP
Vyas P
Xu C
Yamaguchi Y
Publication venue: 'Oxford University Press (OUP)'

Improving the prediction of disease-related variants using protein three-dimensional structure

Author: B Li
CC Chang
E Capriotti
EI Boyle
Emidio Capriotti
G Wainreb
H Berman
H Zhou
HapMap Consortium
International Human Genome Sequencing Consortium
J Pei
JS Kaminker
L Bao
M Cargill
MA Care
ML Waters
P Baldi
P Yue
PC Ng
PD Thomas
R Calabrese
R Guerois
R Karchin
RG Cotton
RJ Dobson
Russ B Altman
SF Altschul
SF Betz
ST Sherry
V Parthiban
V Ramensky
VG Krishnan
W Kabsch
Y Bromberg
YL Yip
Z Wang
ZQ Ye
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs occurring in coding regions result in single amino acid polymorphisms (SAPs) that may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, the quality of these predictions can be further improved by introducing new features derived from three-dimensional protein structures.
Results: In this paper, we present a structure-based machine learning approach for predicting disease-related SAPs. We have trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains. We use SVM input features derived from the protein's sequence, structure, and function. After dataset balancing, the structure-based method (SVM-3D) reaches an overall accuracy of 85%, a correlation coefficient of 0.70, and an area under the receiver operating characteristic curve (AUC) of 0.92. When compared with a similar sequence-based predictor, SVM-3D increases the overall accuracy and AUC by 3% and the correlation coefficient by 0.06. The robustness of this improvement has been tested on different datasets, and in all cases SVM-3D performs better than previously developed methods, even when compared with PolyPhen2, which explicitly takes protein structure information as input.
Conclusion: This work demonstrates that structural information can increase the accuracy of disease-related SAP identification. Our results also quantify the magnitude of improvement on a large dataset. This improvement is in agreement with previously observed results, where structure information enhanced the prediction of protein stability changes upon mutation. Although the limited structural information contained in the Protein Data Bank restricts the application and performance of our structure-based method, we expect that SVM-3D will reach higher accuracy when more structural data become available. © 2011 Capriotti; licensee BioMed Central Ltd
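For readers unfamiliar with the metrics quoted above, the following sketch shows how overall accuracy and a correlation coefficient can be computed from a binary confusion matrix. It assumes the "correlation coefficient" is the Matthews correlation coefficient (MCC), which the abstract does not state explicitly; the function names are illustrative and no values from the paper are reproduced.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays near zero for a predictor that guesses at random, even on an unbalanced dataset, which is one reason both numbers are usually reported together.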

CiteSeerX

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Statistical method on nonrandom clustering with application to somatic mutations in cancer

Author: A Bardelli
A Torkamani
A Wagner
Adam Pavlicek
AJ Strongosky
B Vogelstein
C Greenman
Cancer Genome Atlas Research Network
CH Huang
Chi-Hse Teng
D Graur
DL Evans
DP Cahill
DW Parsons
Elizabeth A Lunney
H Davies
H Song
HM Berman
IB Weinstein
IF Mata
IW Burr
J Glaz
J Sved
JI Naus
Jingjing Ye
JL Bos
JM Nigro
JS Kaminker
L Ding
M Hollstein
N Balakrishnan
NL Johnson
PA Jones
Paul A Rejto
PJ Morin
R Inzelberg
S Jones
SA Forbes
T Hagen
T Sjöblom
T Tolkacheva
TL Wang
WP Yu
Y Benjamini
Y Samuels
Y Wang
Y-X Fan
YL Yip
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage to cells. As a consequence of genomic instability and high levels of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside mutations that drive oncogenesis. While several approaches have been developed to separate driver mutations from passengers, few approaches can specifically identify activating driver mutations in oncogenes, which are more amenable to pharmacological intervention.
Results: We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics, assuming that the location of amino acid mutations on a protein follows a uniform distribution. Our statistical measure is the difference between pair-wise order statistics, which is equivalent to the size of an amino acid mutation cluster, and the probabilities are derived from exact and approximate distributions of the statistical measure. Using data in the Catalog of Somatic Mutations in Cancer (COSMIC) database, we have demonstrated that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and β-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors.
Conclusions: Our proposed method is useful for discovering activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.
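The idea of scoring a cluster by the difference between order statistics can be sketched as follows. This is a simplified continuous approximation, not the paper's exact derivation: positions are rescaled to the unit interval, where the difference between the i-th and k-th order statistics of n uniform points follows a Beta(k - i, n - (k - i) + 1) distribution, whose cumulative probability equals a binomial tail. The names `span_probability` and `cluster_scores` are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def span_probability(n, gap, d):
    """P(U_(i+gap) - U_(i) <= d) for order statistics of n iid Uniform(0, 1) points.

    Uses the identity: Beta(gap, n - gap + 1) CDF at d == P(Binomial(n, d) >= gap).
    """
    return sum(comb(n, j) * d**j * (1 - d) ** (n - j) for j in range(gap, n + 1))


def cluster_scores(positions, protein_length):
    """Score every pair of ordered mutation positions; smaller means tighter than expected."""
    xs = sorted(positions)
    n = len(xs)
    scores = []
    for i in range(n):
        for k in range(i + 1, n):
            d = (xs[k] - xs[i]) / protein_length  # cluster size as a fraction of the protein
            scores.append((xs[i], xs[k], span_probability(n, k - i, d)))
    return sorted(scores, key=lambda s: s[2])
```

Because the approximation is continuous, repeated mutations at the same residue give a span of zero and a score of exactly zero, so the raw values are best read as a ranking rather than as calibrated p-values.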