69 research outputs found
SurfaceomeDB: a cancer-orientated database for genes encoding cell surface proteins
Cell surface proteins (CSPs) are excellent targets for the development of diagnostic and therapeutic reagents, and it is estimated that 10-20% of all genes in the human genome encode CSPs. In an effort to integrate all data publicly available for genes encoding cell surface proteins, a database (SurfaceomeDB) was developed. SurfaceomeDB is a gene-centered portal containing different types of information, including annotation for gene expression, protein domains, somatic mutations in cancer, and protein-protein interactions for all human genes encoding CSPs. SurfaceomeDB was implemented as an integrative and relational database in a user-friendly web interface, where users can search for gene name, gene annotation, or keywords. There is also a streamlined graphical representation of all data provided and links to the most important data repositories and databases, such as NCBI, UCSC Genome Browser, and EBI.
A total transcriptome profiling method for plasma-derived extracellular vesicles: applications for liquid biopsies
Extracellular vesicles (EVs) are key mediators of intercellular communication. Part of their biological effects can be attributed to the transfer of cargos of diverse types of RNAs, which are promising diagnostic and prognostic biomarkers. EVs found in human biofluids are a valuable source for the development of minimally invasive assays. However, the total transcriptional landscape of EVs is still largely unknown. Here we develop a new method for total transcriptome profiling of plasma-derived EVs by next generation sequencing (NGS) from limited quantities of patient-derived clinical samples, which enables the unbiased characterization of the complete RNA cargo, including both small- and long-RNAs, in a single library preparation step. This approach was applied to RNA extracted from EVs isolated by ultracentrifugation from the plasma of five healthy volunteers. Among the most abundant RNAs identified we found small RNAs such as tRNAs, miRNAs and miscellaneous RNAs, which have largely unknown functions. We also identified protein-coding and long noncoding transcripts, as well as circular RNA species that were also experimentally validated. 
This method enables, for the first time, the full spectrum of transcriptome data to be obtained from minute patient-derived samples, and will therefore potentially allow the identification of cell-to-cell communication mechanisms and biomarkers.
Funding: Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP; grants 2011/09172-3 and 2014/26897-0); Gillson-Longenbaugh Foundation; National Institutes of Health (NIH/NCATS) through the NIH Common Fund, Office of Strategic Coordination (OSC).
Affiliations: AC Camargo Canc Ctr, Lab Med Genom, Sao Paulo, SP, Brazil; AC Camargo Canc Ctr, Lab Computat Biol, Sao Paulo, SP, Brazil; Univ Sao Paulo, Inst Biomed Sci, Dept Cell & Dev Biol, Sao Paulo, SP, Brazil; Univ Fed Sao Paulo, Electron Microscopy Ctr, Sao Paulo, SP, Brazil; Univ Texas MD Anderson Canc Ctr, Dept Expt Therapeut, Houston, TX 77030 USA; Univ Texas MD Anderson Canc Ctr, Ctr RNA Interference & Non Coding RNAs, Houston, TX 77030 USA; Univ New Mexico, Comprehens Canc Ctr, Albuquerque, NM 87131 USA; Univ New Mexico, Sch Med, Div Hematol Oncol, Dept Internal Med, Albuquerque, NM 87131 USA; Univ New Mexico, Sch Med, Div Mol Med, Dept Internal Med, Albuquerque, NM 87131 USA; Rockefeller Univ, Lab Mol Immunol, 1230 York Ave, New York, NY 10021 USA; FMUSP, Lab Neurociencias Alzira Denise Hertzog Silva LIM, Inst Psiquiatria, Sao Paulo, SP, Brazil.
Regulation of somatic hypermutation by higher-order chromatin structure
The generation of protective antibodies by somatic hypermutation (SHM) is essential for antibody maturation and adaptive immunity. SHM involves co-transcriptional mutagenesis of immunoglobulin variable (V) regions regulated by enhancers located hundreds of kilobases away. How 3D chromatin topology affects SHM is poorly understood. Here, we measure higher-order interactions on single alleles of the human immunoglobulin heavy-chain locus (IGH) using Tri-C. We find that SHM is underpinned by a multiway hub wherein the V region is proximal to all enhancers. Cohesin-mediated loop extrusion is dispensable for IGH transcription and hub architecture. Transcription and mutagenesis of IGH switch regions, which are necessary for antibody class-switch recombination, create new chromatin loops that can form without cohesin. However, these additional loops do not compromise hub integrity, V region transcription, or SHM. Thus, antibody maturation occurs within a multiway hub accommodating several gene-enhancer loops in which transcription and mutagenesis of different segments occur non-competitively.
Influence of BRCA1 germline mutations in the somatic mutational burden of triple-negative breast cancer
The majority of hereditary triple-negative breast cancers (TNBCs) are associated with BRCA1 germline mutations. Nevertheless, the role of BRCA1 deficiency in TNBC tumorigenesis is poorly understood. To address this, we performed whole-exome sequencing of triplet samples (leucocyte, tumor, and normal-adjacent breast tissue) for 10 cases of early-onset TNBC, including 5 hereditary (with BRCA1 germline pathogenic mutation) and 5 sporadic (with no BRCA1 or BRCA2 germline pathogenic mutations), to assess the somatic mutation repertoire. Protein-affecting somatic mutations were identified for both mammary tissues, and Ingenuity Pathway Analysis was used to investigate gene interactions. BRCA1 and RAD51C somatic promoter methylation in tumor samples was also investigated by bisulfite sequencing. Sporadic tumors had a higher proportion of driver mutations (≥25% allele frequency) than BRCA1 hereditary tumors, whereas no difference was detected in the normal breast samples. Distinct gene networks were obtained from the driver genes in each group. Analysis of The Cancer Genome Atlas data for TNBC classified as hereditary and sporadic reinforced our findings. The data presented here indicate that in the absence of BRCA1 germline mutations, a higher number of driver mutations are required for tumor development and that different defective processes are operating in the tumorigenesis of hereditary and sporadic TNBC in young women.
Family-based whole-exome sequencing identifies rare variants potentially related to cutaneous melanoma predisposition in Brazilian melanoma-prone families
Genetic predisposition accounts for nearly 10% of all melanoma cases and has been associated with a dozen moderate- to high-penetrance genes, including CDKN2A, CDK4, POT1 and BAP1. However, in most melanoma-prone families, the genetic etiology of cancer predisposition remains undetermined. The goal of this study was to identify rare genomic variants associated with cutaneous melanoma susceptibility in melanoma-prone families. Whole-exome sequencing was performed in 2 affected individuals of 5 melanoma-prone families negative for mutations in CDKN2A and CDK4, the major cutaneous melanoma risk genes. A total of 288 rare coding variants shared by the affected relatives of each family were identified, including 7 loss-of-function variants. By performing in silico analyses of gene function, biological pathways, and variant pathogenicity prediction, we underscored the putative role of several genes in melanoma risk, including previously described genes such as MYO7A and WRN, as well as new putative candidates, such as SERPINB4, HRNR, and NOP10. In conclusion, our data revealed rare germline variants in melanoma-prone families, contributing a novel set of potential candidate genes to be further investigated in future studies.
Exome sequencing of multiple-sclerosis patients and their unaffected first-degree relatives
Deep Learning Predicts Underlying Features on Pathology Images with Therapeutic Relevance for Breast and Gastric Cancer
DNA repair deficiency (DRD) is an important driver of carcinogenesis and an efficient target for anti-tumor therapies to improve patient survival. Thus, detection of DRD in tumors is paramount. Currently, determination of DRD in tumors is dependent on wet-lab assays. Here we describe an efficient machine learning algorithm which can predict DRD from histopathological images. The utility of this algorithm is demonstrated with data obtained from 1445 cancer patients. When trained on breast cancer specimens with homologous recombination deficiency (HRD), our method achieves an AUC (area under the curve) of 0.80. Results for an independent breast cancer cohort achieved an AUC of 0.70. The utility of our method was further shown by considering the detection of mismatch repair deficiency (MMRD) in gastric cancer, yielding an AUC of 0.81. Our results demonstrate the capacity of our learning-based system as a low-cost tool for DRD detection.
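For readers unfamiliar with the AUC figures quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted scores. It is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration. It uses the rank interpretation of AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = deficient, hypothetical).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positive and three negative cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.70 to 0.81 values reported above.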
signeR: an empirical Bayesian approach to mutational signature discovery
Abstract
Motivation
Mutational signatures can be used to understand cancer origins and provide a unique opportunity to group tumor types that share the same origins and result from similar processes. These signatures have been identified from high throughput sequencing data generated from cancer genomes by using non-negative matrix factorisation (NMF) techniques. Current methods based on optimization techniques are strongly sensitive to initial conditions due to high dimensionality and nonconvexity of the NMF paradigm. In this context, an important question consists in the determination of the actual number of signatures that best represent the data. The extraction of mutational signatures from high-throughput data still remains a daunting task.
Results
Here we present a new method for the statistical estimation of mutational signatures based on an empirical Bayesian treatment of the NMF model. While requiring minimal intervention from the user, our method addresses the determination of the number of signatures directly as a model selection problem. In addition, we introduce two new concepts of significant clinical relevance for evaluating the mutational profile. The advantages brought by our approach are shown by the analysis of real and synthetic data. The latter is used to compare our approach against two alternative methods widely used in the literature and with the same NMF parametrization as the one considered here. Our approach is robust to initial conditions and more accurate than competing alternatives. It also estimates the correct number of signatures even when other methods fail. Results on real data agree well with current knowledge.
Availability and Implementation
signeR is implemented in R and C++, and is available as an R package at http://bioconductor.org/packages/signeR.
Supplementary information
Supplementary data are available at Bioinformatics online.
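To illustrate the NMF model that the signeR abstract builds on, here is a minimal sketch of the optimization-based baseline it contrasts itself with: plain multiplicative-update NMF for the squared error, where a count matrix (mutation types by samples) is approximated by signatures times exposures. This is not signeR's empirical Bayesian method and not its API; all function names, the toy matrices, and the choice of rank 2 are assumptions made up for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF; returns (w, h, initial_error, final_error)."""
    rng = random.Random(seed)  # the result depends on this random start
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    start = frob_error(v, w, h)
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h, start, frob_error(v, w, h)


# Toy example (hypothetical): 4 mutation types, 2 signatures, 3 samples.
true_signatures = [[0.7, 0.0], [0.2, 0.1], [0.1, 0.3], [0.0, 0.6]]
true_exposures = [[100, 20, 50], [10, 80, 50]]
counts = matmul(true_signatures, true_exposures)

signatures, exposures, err_start, err_end = nmf(counts, rank=2)
print(f"squared error: {err_start:.1f} -> {err_end:.4f}")
```

Note that this sketch takes the number of signatures (`rank`) as an input and starts from a random initialization. Those are the two weaknesses of optimization-based NMF that the abstract highlights and that a model-selection treatment is meant to address.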
A mixture model for determining SARS-Cov-2 variant composition in pooled samples
Abstract
Motivation
Despite the fast development of highly effective vaccines to control the current COVID-19 pandemic, the unequal distribution and availability of these vaccines worldwide and the number of people infected lead to the continuous emergence of Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) variants of concern. Therefore, it is likely that real-time genomic surveillance will be continuously needed as a monitoring tool to follow the spread of the disease and the evolution of the virus. In this context, new genomic variants of SARS-CoV-2, including variants refractory to current vaccines, make genomic surveillance programs tools of utmost importance. Nevertheless, the lack of appropriate analytical tools to quickly and effectively assess the viral composition in meta-transcriptomic sequencing data, including environmental surveillance, represents a challenge that may impact the fast adoption of this approach to mitigate the spread and transmission of viruses.
Results
We propose a statistical model for the estimation of the relative frequencies of SARS-CoV-2 variants in pooled samples. This model is built by considering a previously defined selection of genomic polymorphisms that characterize SARS-CoV-2 variants. The methods described here support both raw sequencing reads for polymorphism-based marker calling and predefined markers in the variant call format. Results obtained using simulated data show that our method is quite effective in recovering the correct variant proportions. Further, results obtained by considering longitudinal data from wastewater samples of two locations in Switzerland agree well with those describing the epidemiological evolution of COVID-19 variants in clinical samples of these locations. Our results show that the described method can be a valuable tool for tracking the proportions of SARS-CoV-2 variants in complex mixtures such as wastewater and environmental samples.
Availability and implementation
http://github.com/rvalieris/LCS.
Supplementary information
Supplementary data are available at Bioinformatics online.
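The idea of estimating variant proportions in a pooled sample from marker polymorphisms can be sketched with a simple mixture model fitted by expectation-maximization (EM). This is an illustrative assumption, not the model implemented in LCS: the function name `em_proportions`, the marker probabilities, and the read counts are all hypothetical. Each variant is described by the probability of observing the alternate allele at each marker site, and the unknowns are the mixing proportions.

```python
def em_proportions(alt, ref, sig, iters=500):
    """Estimate mixture proportions of variants from pooled allele counts.

    alt[s], ref[s]: reads supporting the alternate / reference allele at site s.
    sig[k][s]: probability that a read from variant k shows the alternate
               allele at site s (strictly between 0 and 1 to allow for errors).
    """
    k = len(sig)
    pi = [1.0 / k] * k                      # start from equal proportions
    total = sum(alt) + sum(ref)
    for _ in range(iters):
        expected = [0.0] * k                # expected reads assigned to each variant
        for s in range(len(alt)):
            for count, is_alt in ((alt[s], True), (ref[s], False)):
                if count == 0:
                    continue
                # E-step: posterior probability that such a read came from variant j
                weights = [pi[j] * (sig[j][s] if is_alt else 1.0 - sig[j][s])
                           for j in range(k)]
                norm = sum(weights)
                for j in range(k):
                    expected[j] += count * weights[j] / norm
        # M-step: proportions are the normalized expected read assignments
        pi = [e / total for e in expected]
    return pi


# Toy pooled sample (hypothetical): variants A and B, three marker sites,
# 1000 reads per site, simulated for a true mix of 70% A and 30% B.
sig = [
    [0.99, 0.99, 0.01],   # variant A: markers at sites 0 and 1
    [0.01, 0.99, 0.99],   # variant B: markers at sites 1 and 2
]
alt = [696, 990, 304]
ref = [304, 10, 696]

proportions = em_proportions(alt, ref, sig)
print([round(p, 3) for p in proportions])
```

Site 1 is shared by both variants and therefore carries no information about the mix, while sites 0 and 2 are specific to one variant each. This is why a curated set of discriminating polymorphisms matters for this kind of estimate.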
