
    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
    A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain).
    This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    The oral microbiome – an update for oral healthcare professionals

    For millions of years, our resident microbes have coevolved and coexisted with us in a mostly harmonious symbiotic relationship. We are not distinct entities from our microbiome, but together we form a 'superorganism' or holobiont, with the microbiome playing a significant role in our physiology and health. The mouth houses the second most diverse microbial community in the body, harbouring over 700 species of bacteria that colonise the hard surfaces of teeth and the soft tissues of the oral mucosa. Through recent advances in technology, we have started to unravel the complexities of the oral microbiome and gained new insights into its role during both health and disease. Perturbations of the oral microbiome through modern-day lifestyles can have detrimental consequences for our general and oral health. In dysbiosis, the finely-tuned equilibrium of the oral ecosystem is disrupted, allowing disease-promoting bacteria to manifest and cause conditions such as caries, gingivitis and periodontitis. For practitioners and patients alike, promoting a balanced microbiome is therefore important to effectively maintain or restore oral health. This article aims to give an update on our current knowledge of the oral microbiome in health and disease and to discuss implications for modern-day oral healthcare.

    Benchmarking natural-language parsers for biological applications using dependency graphs

    BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.

    Cross-lingual C*ST*RD: English access to Hindi information

    We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within 1 month, in the context of DARPA’s Surprise Language Exercise, which selected as source a heretofore unstudied language, Hindi. Given the brief time, we could not create deep Hindi capabilities for all the modules, but instead experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions.

    Adaptation of Data and Models for Probabilistic Parsing of Portuguese


    Cellular proteins which can specifically associate with simian virus 40 small t antigen

    When crude, radiolabeled extracts of various cells were applied to homogeneous simian virus 40 small t antigen-Sepharose adsorbents, three cell proteins (57, 32, and 20 kilodaltons [kDa]) bound specifically. Each also bound to an insoluble, truncated t derivative composed of the COOH-terminal 123 residues of the protein. The binding of these proteins was greatly inhibited after reduction and alkylation of the t ligand. Therefore, some element of native conformation, but not all of the primary structure of t, is necessary for this binding property, which may constitute a discrete, in vitro biochemical function of this protein. Results of cell fractionation experiments suggested that the 57- and 32-kDa proteins are nonnuclear cell constituents, whereas the 20-kDa protein was closely associated with a detergent-washed nuclear fraction. Specific immunoblotting and comparative partial proteolytic digestion analyses indicated that the 57-kDa protein is tubulin, a major component of the cytoskeleton. In this regard, t and tubulin were observed to coimmunoprecipitate from crude cell extracts after incubation with monospecific anti-t antibody. Therefore, it is possible that t and tubulin interact in vivo.