9 research outputs found
Iniciativas de evaluación para la indización semántica de literatura médica en español: PLANTL, LILACS, IBECS Y BIOASQ
XVI Jornadas Nacionales de Información y Documentación en Ciencias de la Salud. Oviedo, 4-5 April 2019. The health flagship project of the Plan for the Advancement of Language Technology (PlanTL) aims to promote the development of natural language processing (NLP), text mining and machine translation systems for Spanish and the co-official languages. There is a growing demand for better exploitation of datasets generated by clinicians, especially electronic health records, and for integrating and managing such data in personalized medicine platforms that also incorporate information extracted from the literature. In this context, the PlanTL helps organize evaluation campaigns for clinical NLP and text mining systems, a mechanism that is key both to assessing the quality of the results produced by such automated systems and to promoting the development of language technology tools and resources.
Given the importance of literature for medical decision-making and the growing volume of Spanish medical publications, the PlanTL, in collaboration with the BSC, the CNIO, the Biblioteca Nacional de Ciencias de la Salud (BNCS) and the BioASQ initiative, has launched a shared task on automatic indexing of Spanish medical literature with DeCS terms. The aim of this track is to generate semantic annotation resources that can assist manual indexing. The Spanish biomedical semantic indexing track of BioASQ (bioasq.org) will rely on abstracts of journal articles contained in the LILACS (Literatura Latinoamericana en Ciencias de la Salud) and IBECS (Índice Bibliográfico Español en Ciencias de la Salud) databases as a manually labeled gold standard benchmark set for developing automatic indexing algorithms, particularly those based on artificial intelligence language models.
Participating systems are evaluated on the BioASQ platform through a continuous evaluation process: participants are asked to automatically assign DeCS terms to new records added to the databases as they are made public and before manual indexing is complete. Indexing performance is calculated by comparing the automatic indexing against the manual annotations.
Thanks to the results of previous editions of BioASQ on PubMed indexing, the MeSH indexing process of that resource was considerably improved. This new effort on medical indexing in Spanish will serve to generate comparable resources for semantically indexing not only LILACS and IBECS but also other health databases and repositories in Spanish.
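The comparison of automatic against manual indexing mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not name the metric used, so the micro-averaged precision, recall and F-measure below are an assumption chosen for illustration; the function name `micro_prf`, the document identifiers and the term labels are all hypothetical placeholders, not real DeCS codes.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    predicted: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document label sets.

    Assumption: the metric choice is illustrative; the abstract only says
    that automatic and manual indexing are compared.
    """
    tp = fp = fn = 0
    for doc_id, gold_terms in gold.items():
        # Documents with no prediction count as an empty label set.
        pred_terms = predicted.get(doc_id, set())
        tp += len(pred_terms & gold_terms)   # terms assigned by both
        fp += len(pred_terms - gold_terms)   # automatic only
        fn += len(gold_terms - pred_terms)   # manual only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: manual (gold) and automatic (predicted) term sets.
gold = {"doc1": {"term_a", "term_b"}, "doc2": {"term_c"}}
predicted = {"doc1": {"term_a", "term_d"}, "doc2": {"term_c"}}

precision, recall, f1 = micro_prf(predicted, gold)
print(precision, recall, f1)
```

Micro-averaging pools the counts over all documents before computing the ratios, so documents with many index terms weigh more than documents with few.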
Results of the Seventh Edition of the BioASQ Challenge
The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research. © 2020, Springer Nature Switzerland AG
Drug-Drug Interaction Prediction on a Biomedical Literature Knowledge Graph
Knowledge Graphs provide insights from data extracted in various domains. In this paper, we present an approach for discovering probable drug-to-drug interactions through the generation of a Knowledge Graph from disease-specific literature. The Graph is generated using natural language processing and semantic indexing of biomedical publications and open resources. The semantic paths connecting different drugs in the Graph are extracted and aggregated into feature vectors representing drug pairs. A classifier is trained on known interactions, extracted from a manually curated drug database used as a gold standard, and discovers new possible interacting pairs. We evaluate this approach on two use cases, Alzheimer’s Disease and Lung Cancer. Our system is shown to outperform competing graph embedding approaches, while also identifying new drug-drug interactions that are validated retrospectively. © 2020, Springer Nature Switzerland AG
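The idea of aggregating the semantic paths between two drugs into a feature vector can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the graph is a toy list of triples, edges are traversed in both directions, paths are limited to three hops, and the feature for a drug pair is the count of each relation sequence. All entity and relation names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_adjacency(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index the graph by node; edges are treated as undirected (assumption)."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
        adj[tail].append((rel, head))
    return adj


def path_features(adj, source: str, target: str, max_hops: int = 3) -> Counter:
    """Count relation sequences over all simple paths from source to target."""
    counts: Counter = Counter()

    def walk(node: str, visited: frozenset, rels: Tuple[str, ...]) -> None:
        if node == target and rels:
            counts[rels] += 1
            return
        if len(rels) == max_hops:
            return
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:  # keep paths simple (no repeated nodes)
                walk(nxt, visited | {nxt}, rels + (rel,))

    walk(source, frozenset({source}), ())
    return counts


# Hypothetical toy graph.
triples = [
    ("drug_a", "TREATS", "disease_x"),
    ("drug_b", "TREATS", "disease_x"),
    ("drug_a", "TARGETS", "gene_g"),
    ("gene_g", "ASSOCIATED_WITH", "disease_x"),
]
adj = build_adjacency(triples)
counts = path_features(adj, "drug_a", "drug_b")

# Fix an ordering of path types to obtain a numeric vector for the pair.
vocabulary = sorted(counts)
vector = [counts[path_type] for path_type in vocabulary]
print(vocabulary, vector)
```

In a full pipeline, vectors like this one, built over a shared vocabulary of path types, would be the input to a classifier trained on known interacting pairs.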
Modeling the off-target effects of CRISPR-Cas9 experiments for the treatment of Duchenne Muscular Dystrophy
Duchenne Muscular Dystrophy (DMD) is a neuromuscular disorder caused by the absence of the dystrophin protein. If left untreated, it causes movement problems by the age of 10-12 years, and death occurs at 20-30 years of age due to heart failure. There is currently no cure for this disease, only symptomatic treatment. Genome editing approaches like the CRISPR-Cas9 technology can provide new opportunities to ameliorate the disease by eliminating DMD mutations and restoring dystrophin expression. While it is true that on-target activity can be influenced by the guide specificity, the proposed approach focuses on the devastating results that off-target cleavage can cause (e.g., unexpected mutations). This is why reducing off-target effects is the first priority in guide design. The rapid growth of the Artificial Intelligence field has helped researchers employ artificial feature extraction and Machine Learning approaches to evaluate potential off-target scores. This work presents our approach to evaluating off-targets of CRISPR-Cas9 gene editing specifically for the DMD disorder, using Machine Learning. We offer a comparison between four regression methods that predict the insertions-deletions (indels) produced for a given pair of guide RNA and corresponding off-target, and evaluate the results using the Spearman correlation metric. We propose the most suitable method, a Decision Tree Regressor, for this problem and compare its results with several state-of-the-art tools. The performance of our tool with Cross Validation is better than the independent performance of the other tools, except for Elevation, which performed about as well as ours. © 2022 ACM
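The Spearman correlation used to evaluate the regressors can be written out in a few lines. The sketch below is a plain-Python illustration of the metric itself (Pearson correlation computed on average ranks, with ties sharing a rank); the function names and the toy indel values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (var_x * var_y) ** 0.5


# Hypothetical measured vs. predicted indel frequencies.
measured = [0.10, 0.40, 0.35, 0.80]
predicted = [0.20, 0.50, 0.30, 0.90]
print(spearman(measured, predicted))
```

Because the metric depends only on ranks, a regressor scores well whenever it orders the guide and off-target pairs correctly, even if its predicted values are on a different scale from the measured ones.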
Overview of BioASQ 2020: The Eighth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks have been organized yearly since 2012, in which different teams develop systems that compete on the same demanding benchmark datasets, which represent the real information needs of experts in the biomedical domain. This year, the challenge has been extended with the introduction of a new task on medical semantic indexing in Spanish. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge. As in previous years, the results of the evaluation reveal that the top-performing systems managed to outperform the strong baselines, which suggests that state-of-the-art systems keep pushing the frontier of research through continuous improvements. © 2020, Springer Nature Switzerland AG
