392 research outputs found
Large-scale evaluation of automated clinical note de-identification and its impact on information extraction
Objective: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents. Material and methods: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, and F value of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed. Performance was assessed against a manually generated ‘gold standard’. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured. Results: The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (annotators' performance was 92.15%(R)/93.95%(P) and 94.55%(R)/88.45%(P) overall, while the best system obtained 92.91%(R)/95.73%(P) on the same text). The impact of automated de-identification was minimal on the utility of the narrative notes for subsequent information extraction, as measured by the sensitivity and precision of medication name extraction. Discussion and conclusion: NLP-based de-identification shows excellent performance that rivals the performance of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.
Integrative genomic analyses of neurofibromatosis tumours identify SOX9 as a biomarker and survival gene
Understanding the biological pathways critical for common neurofibromatosis type 1 (NF1) peripheral nerve tumours is essential, as there is a lack of tumour biomarkers, prognostic factors and therapeutics. We used gene expression profiling to define transcriptional changes between primary normal Schwann cells (n = 10), NF1-derived primary benign neurofibroma Schwann cells (NFSCs) (n = 22), malignant peripheral nerve sheath tumour (MPNST) cell lines (n = 13), benign neurofibromas (NF) (n = 26) and MPNST (n = 6). Dermal and plexiform NFs were indistinguishable. A prominent theme in the analysis was aberrant differentiation. NFs repressed gene programs normally active in Schwann cell precursors and immature Schwann cells. MPNST signatures strongly differed; genes up-regulated in sarcomas were significantly enriched for genes activated in neural crest cells. We validated the differential expression of 82 genes including the neural crest transcription factor SOX9 and SOX9 predicted targets. SOX9 immunoreactivity was robust in NF and MPNST tissue sections, and targeting SOX9, which is strongly expressed in NF1-related tumours, caused MPNST cell death. SOX9 is a biomarker of NF and MPNST, and possibly a therapeutic target in NF1.
Correction: Disease candidate gene identification and prioritization using protein interaction networks
PolyDoms: a whole genome database for the identification of non-synonymous coding SNPs with the potential to impact disease
As knowledge of human genetic polymorphisms grows, so do the opportunity and challenge of identifying those polymorphisms that may impact the health or disease risk of an individual person. A critical need is to organize large-scale polymorphism analyses and to prioritize candidate non-synonymous coding SNPs (nsSNPs) that should be tested in experimental and epidemiological studies to establish their context-specific impacts on protein function. In addition, with emerging high-resolution clinical genetics testing, new polymorphisms must be analyzed in the context of all available protein feature knowledge including other known mutations and polymorphisms. To approach this, we developed PolyDoms as a database to integrate the results of multiple algorithmic procedures and functional criteria applied to the entire Entrez dbSNP dataset. In addition to predicting structural and functional impacts of all nsSNPs, filtering functions enable group-based identification of potentially harmful nsSNPs among multiple genes associated with specific diseases, anatomies, mammalian phenotypes, gene ontologies, pathways or protein domains. PolyDoms thus provides a means to derive a list of candidate SNPs to be evaluated in experimental or epidemiological studies for impact on protein functions and disease risk associations. PolyDoms will continue to be curated to improve its usefulness.
Improved human disease candidate gene prioritization using mouse phenotype
Abstract
Background
The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes.
Results
Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods and, based on the validation studies, we show that our approach, ToppGene (http://toppgene.cchmc.org), outperforms two of the existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR.
Conclusion
The incorporation of phenotype information for mouse orthologs of human genes greatly improves the human disease candidate gene analysis and prioritization.
Dissecting microregulation of a master regulatory network
Abstract
Background
The master regulator p53 tumor-suppressor protein, through coordination of several downstream target genes and upstream transcription factors, controls many pathways important for tumor suppression. While it has been reported that some of p53's functions are microRNA-mediated, it is not known how many other microRNAs might contribute to p53-mediated tumorigenesis.
Results
Here, we use a bioinformatics-based integrative approach to identify and prioritize putative p53-regulated miRNAs, and unravel the miRNA-based microregulation of the p53 master regulatory network. Specifically, we identify putative microRNA regulators of a) transcription factors that are upstream or downstream of p53 and b) p53 interactants. The putative p53-miRs and their targets are prioritized using current knowledge of cancer biology and literature-reported cancer-miRNAs.
Conclusion
Our predicted p53-miRNA-gene networks strongly suggest that coordinated transcriptional and p53-miR mediated networks could be integral to tumorigenesis and the underlying processes and pathways.
CisMols Analyzer: identification of compositionally similar cis-element clusters in ortholog conserved regions of coordinately expressed genes
Combinatorial interactions of sequence-specific trans-acting factors with localized genomic cis-element clusters are the principal mechanism for regulating tissue-specific and developmental gene expression. With the emergence of expanding numbers of genome-wide expression analyses, the identification of the cis-elements responsible for specific patterns of transcriptional regulation represents a critical area of investigation. Computational methods for the identification of functional cis-regulatory modules are difficult to devise, principally because of the short length and degenerate nature of individual cis-element binding sites and the inherent complexity that is generated by combinatorial interactions within cis-clusters. Filtering candidate cis-element clusters based on phylogenetic conservation is helpful for an individual ortholog gene pair, but combining data from cis-conservation and coordinate expression across multiple genes is a more difficult problem. To approach this, we have extended an ortholog gene-pair database with additional analytical architecture to allow for the analysis and identification of maximal numbers of compositionally similar and phylogenetically conserved cis-regulatory element clusters from a list of user-selected genes. The system has been successfully tested with a series of functionally related and microarray profile-based co-expressed ortholog pairs of promoters and genes using known regulatory regions as training sets and co-expressed genes in the olfactory and immunohematologic systems as test sets. CisMols Analyzer is accessible via a Web interface at
GenomeTrafac: a whole genome resource for the detection of transcription factor binding site clusters associated with conventional and microRNA encoding genes conserved between mouse and human gene orthologs
Transcriptional cis-regulatory control regions frequently are found within non-coding DNA segments conserved across multi-species gene orthologs. Adopting a systematic gene-centric pipeline approach, we report here the development of a web-accessible database resource, GenomeTraFac, that allows genome-wide detection and characterization of compositionally similar cis-clusters that occur in gene orthologs between any two genomes for both microRNA genes and conventional RNA-encoding genes. Each ortholog gene pair can be scanned to visualize overall conserved sequence regions, and within these, the relative density of conserved cis-element motif clusters forms graph peak structures. The results of these analyses can be mined en masse to identify the most frequently represented cis-motifs in a list of genes. The system also provides a method for rapid evaluation and visualization of gene model-consistency between orthologs, and facilitates consideration of the potential impact of sequence variation in conserved non-coding regions on complex cis-element structures. Using the mouse and human genomes via the NCBI Reference Sequence database and the Sanger Institute miRBase, the system demonstrated the ability to identify validated transcription factor targets within promoter and distal genomic regulatory regions of both conventional and microRNA genes.
Identification of new p53 target microRNAs by bioinformatics and functional analysis
Abstract
Background
The tumor suppressor p53 is a sequence-specific transcription factor that regulates an extensive network of coding genes, long non-coding RNAs and microRNAs that establish intricate gene regulatory circuits influencing many cellular responses beyond the prototypical control of cell cycle, apoptosis and DNA repair.
Methods
Using bioinformatic approaches, we identified an additional group of candidate microRNAs (miRs) under direct p53 transcriptional control. To validate p53 family-mediated responsiveness of the newly predicted target miRs we first evaluated the potential for wild type p53, p63β and p73β to transactivate from p53 response elements (REs) mapped in the miR promoters, using an established yeast-based assay.
Results
The REs found in miR-10b, -23b, -106a, -151a, -191, -198, -202, -221, -320, -1204, -1206 promoters were responsive to p53 and 8 of them were also responsive to p63β or p73β. The potential for germline p53 mutations to drive transactivation at selected miR-associated REs was also examined. Chromatin Immuno-Precipitation (ChIP) assays conducted in doxorubicin-treated MCF7 cells and HCT116 p53+/+ revealed moderate induction of p53 occupancy at the miR-202, -1204, -1206, -10b RE-containing sites, while weak occupancy was observed for the miR-23b-associated RE only in MCF7 cells. RT-qPCR analyses showed modest doxorubicin- and/or Nutlin-dependent induction of the levels of mature miR-10b, -23b, -151a in HCT116 p53+/+ and MCF7 cells. The long noncoding RNA PVT1 comprising miR-1204 and -1206 was weakly induced only in HCT116 p53+/+ cells, but the mature miRs were not detected. miR-202 expression was not influenced by p53-activating stimuli in our cell systems.
Conclusions
Our study reveals additional miRs, particularly miR-10b and miR-151a, that could be directly regulated by the p53-family of transcription factors and contribute to the tuning of p53-induced responses.
