513 research outputs found

    Storage of Natural Language Sentences in a Hopfield Network

    This paper looks at how the Hopfield neural network can be used to store and recall patterns constructed from natural language sentences. As a pattern recognition and storage tool, the Hopfield neural network has received much attention. This attention, however, has been mainly in the field of statistical physics, due to the model's simple abstraction of spin glass systems. The differences, shown as bias and correlation, between natural language sentence patterns and the randomly generated ones used in previous experiments are discussed. Results are given for numerical simulations which show the auto-associative competence of the network when trained with natural language patterns. Comment: latex, 10 pages with 2 tex figures and a .bib file, uses nemlap.sty, to appear in Proceedings of NeMLaP-
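As a rough illustration of the store-and-recall behaviour this abstract describes, here is a minimal sketch of a Hopfield network. It assumes bipolar (+1/-1) patterns, the standard Hebbian outer-product rule and asynchronous updates; the abstract does not state the paper's sentence encoding or training details, so the function names and toy patterns below are hypothetical.

```python
import numpy as np

def train_hopfield(patterns):
    """Build a weight matrix with the Hebbian outer-product rule.

    `patterns` is a 2-D array of bipolar (+1/-1) rows. This is the textbook
    rule, assumed here; it is not taken from the paper itself.
    """
    patterns = np.asarray(patterns, dtype=float)
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units
    np.fill_diagonal(weights, 0.0)  # no self-connections
    return weights

def recall(weights, probe, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = np.array(probe, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1.0 if weights[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two orthogonal toy patterns (hypothetical, not natural language data).
stored = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
], dtype=float)
weights = train_hopfield(stored)

# Corrupt the first unit of the first pattern and let the network repair it.
noisy = stored[0].copy()
noisy[0] = -noisy[0]
recovered = recall(weights, noisy)
```

Random, uncorrelated patterns like these are the easy case; the bias and correlation the abstract attributes to natural language patterns are exactly what makes recall harder than this toy example suggests.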

    Towards cross-lingual alerting for bursty epidemic events

    Background: Online news reports are increasingly becoming a source for event based early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challenges as opportunities due to the patterns of reporting complex spatiotemporal events. Results: In this article we study the problem of utilising correlated event reports across languages. We track the evolution of 16 disease outbreaks using 5 temporal aberration detection algorithms on text-mined events classified according to disease and outbreak country. Using ProMED reports as a silver standard, comparative analysis of news data for 13 languages over a 129 day trial period showed improved sensitivity, F1 and timeliness across most models using cross-lingual events. We report a detailed case study analysis for Cholera in Angola 2010 which highlights the challenges faced in correlating news events with the silver standard. Conclusions: The results show that automated health surveillance using multilingual text mining has the potential to turn low value news into high value alerts if informed choices are used to govern the selection of models and data sources. An implementation of the C2 alerting algorithm using multilingual news is available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup

    What's unusual in online disease outbreak news?

    Background: Accurate and timely detection of public health events of international concern is necessary to help support risk assessment and response and save lives. Novel event-based methods that use the World Wide Web as a signal source offer potential to extend health surveillance into areas where traditional indicator networks are lacking. In this paper we address the issue of systematically evaluating online health news to support automatic alerting using daily disease-country counts text mined from real world data using BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance against expert moderated ProMED-mail postings. Results: We report sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), mean alerts/100 days and F1, at 95% confidence interval (CI) for 287 ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period. Results indicate that W2 had the best F1 with a slight benefit for day of week effect over C2. In drill down analysis we indicate issues arising from the granular choice of country-level modeling, sudden drops in reporting due to day of week effects and reporting bias. Automatic alerting has been implemented in BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news alerts have the potential to enhance manual analytical methods by increasing throughput, timeliness and detection rates. Systematic evaluation of health news aberrations is necessary to push forward our understanding of the complex relationship between news report volumes and case numbers and to select the best performing features and algorithms
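For readers unfamiliar with the aberration detection algorithms named in this abstract, the following is a minimal sketch of EARS C2 as it is commonly described: the current day's count is compared with the mean and standard deviation of a 7-day baseline separated from the current day by a 2-day guard band, and an alert is raised when the standardized score exceeds 3. The floor on the standard deviation and the sample counts are assumptions for illustration, not details from the paper or from BioCaster.

```python
from statistics import mean, stdev

def c2_alerts(counts, baseline=7, guard=2, threshold=3.0, min_sigma=0.2):
    """Return the indices of days whose C2 score exceeds the threshold.

    `min_sigma` is an assumed floor that avoids division by zero when the
    baseline is constant; published implementations may handle this differently.
    """
    alerts = []
    for t in range(baseline + guard, len(counts)):
        # Baseline window ends `guard` days before day t.
        window = counts[t - guard - baseline : t - guard]
        mu = mean(window)
        sigma = max(stdev(window), min_sigma)
        score = (counts[t] - mu) / sigma
        if score > threshold:
            alerts.append(t)
    return alerts

# Hypothetical daily disease-country counts with a spike on day 9.
daily_counts = [1, 2, 1, 2, 1, 2, 1, 1, 2, 10, 1]
alert_days = c2_alerts(daily_counts)
```

The guard band keeps the first days of an outbreak out of the baseline, so a slowly rising signal does not mask itself.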

    Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses

    Systems that exploit publicly available user generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like-Illnesses (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks for the US 2009 influenza season from 30th August 2009 to 8th May 2010. Results showed that our system achieved the highest Pearson correlation coefficient of 98.46% (p-value<2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches to mine Twitter data can increase the value of this inexpensive resource. Comment: 10 pages, 5 figures, IEEE HISB 2012 conference, Sept 27-28, 2012, La Jolla, California, U

    Recognition of medication information from discharge summaries using ensembles of classifiers

    BACKGROUND: Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks. METHODS: We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting. RESULTS: Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge. CONCLUSIONS: Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. 
It suggests that simple strategies that can be easily implemented such as majority voting could have the potential to significantly improve clinical entity recognition
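The simple majority voting strategy mentioned in this abstract can be sketched as follows. This is an illustrative version only: the per-token label names, the three prediction lists and the tie-breaking rule (fall back to one designated classifier when no label has a strict majority) are assumptions, since the abstract does not say how the authors resolved disagreements.

```python
from collections import Counter

def majority_vote(predictions, fallback_index=0):
    """Combine per-token label sequences from several classifiers.

    `predictions` is a list of equal-length label lists, one per classifier.
    When no label wins a strict majority, the label from the classifier at
    `fallback_index` is used (an assumed tie-breaking rule).
    """
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        if count > len(labels) / 2:
            voted.append(label)
        else:
            voted.append(labels[fallback_index])
    return voted

# Hypothetical outputs for three tokens from the three kinds of systems.
rule_based = ["B-MED", "O", "B-DOSE"]
svm_based = ["B-MED", "B-MED", "O"]
crf_based = ["O", "B-MED", "I-MED"]

combined = majority_vote([rule_based, svm_based, crf_based])
```

The local SVM-based and CRF-based voting methods described in the abstract replace this fixed rule with a trained model; they are not shown here.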

    Special issue on bio-ontologies and phenotypes

    The bio-ontologies and phenotypes special issue includes eight papers selected from the 11 papers presented at the Bio-Ontologies SIG (Special Interest Group) and the Phenotype Day at ISMB (Intelligent Systems for Molecular Biology) conference in Boston in 2014. The selected papers span a wide range of topics including the automated re-use and update of ontologies, quality assessment of ontological resources, and the systematic description of phenotype variation, driven by manual, semi- and fully automatic means

    Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events

    Web-based expert systems dedicated to epidemic intelligence were developed to detect health threats. The Early Alerting and Reporting (EAR) project, launched under the Global Health Initiative, aimed at assessing the feasibility and opportunity of pooling seven of those expert systems. A qualitative survey was carried out with EAR participants to document epidemic intelligence strategies and to assess perceptions regarding the performance of participating systems. Timeliness and sensitivity were rated with high scores, illustrating the overall perceived value of all systems, while weaknesses were underlined especially in terms of representativeness, completeness and flexibility. These findings were corroborated by the quantitative analysis performed on signals potentially related to influenza A/H5N1 events which occurred in March 2010. For the six systems for which this information was available, the detection rate ranged from 31% to 38%, and increased to 72% when considering the virtual combined system. The positive predictive values (PPV) ranged from 3% to 24% and the F1-score ranged from 6% to 27%. These low scores point to false positive signals related to the varying abilities of the systems to efficiently sort out information and reduce background noise. For the seven systems, sensitivity ranged from 38% to 72%. An average difference of 23% was observed between the sensitivities calculated for human cases and epizootics, underlining the difficulty of developing an efficient algorithm for a single pathology. The sensitivity increased to 93% when the virtual combined system was considered, clearly illustrating the systems’ complementarities. The average delay between the detection of the A/H5N1 events by the systems and their official reporting by WHO or OIE was 10.2 days (CI95%, 6.7; 13.8). 
This work illustrates the diversity in implemented epidemic intelligence activities, differences in system designs and the potential added values and opportunities for synergy: between systems, between users and between systems and users. JRC.G.2-Global security and crisis managemen
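The sensitivity, PPV and F1-score reported in this abstract are related by standard formulas, sketched below. The counts in the example are made up purely to show the arithmetic and are not taken from the EAR evaluation.

```python
def detection_metrics(true_pos, false_pos, false_neg):
    """Return (sensitivity, ppv, f1) from signal counts.

    Sensitivity is the share of real events that were detected, PPV is the
    share of raised signals that matched a real event, and F1 is their
    harmonic mean. Ratios with a zero denominator are reported as 0.0.
    """
    detected_or_missed = true_pos + false_neg
    raised = true_pos + false_pos
    sensitivity = true_pos / detected_or_missed if detected_or_missed else 0.0
    ppv = true_pos / raised if raised else 0.0
    total = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f1

# Illustrative counts only: 3 events detected, 5 missed, 9 false signals.
sensitivity, ppv, f1 = detection_metrics(true_pos=3, false_pos=9, false_neg=5)
```

Because F1 is a harmonic mean, a system with reasonable sensitivity but very low PPV still scores poorly, which is consistent with the low F1 range the abstract attributes to false positive signals.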

    Towards classifying species in systems biology papers using text mining

    Background: In recent years high throughput methods have led to a massive expansion in the free text literature on molecular biology. Automated text mining has developed as an application technology for formalizing this wealth of published results into structured database entries. However, database curation as a task is still largely done by hand, and although there have been many studies on automated approaches, problems remain in how to classify documents into top-level categories based on the type of organism being investigated. Here we present a comparative analysis of state-of-the-art supervised models that are used to classify both abstracts and full text articles for three model organisms. Results: Ablation experiments were conducted on a large gold standard corpus of 10,000 abstracts and full papers containing data on three model organisms (fly, mouse and yeast). Among the eight learner models tested, the best model achieved an F-score of 97.1% for fly, 88.6% for mouse and 85.5% for yeast using a variety of features that included gene name, organism frequency, MeSH headings and term-species associations. We noted that term-species associations were particularly effective in improving classification performance. The benefit of using full text articles over abstracts was consistently observed across all three organisms. Conclusions: By comparing various learner algorithms and features we presented an optimized system that automatically detects the major focus organism in full text articles for fly, mouse and yeast. We believe the method will be extensible to other organism types.