12 research outputs found

    Arabic named entity recognition in social media based on BiLSTM-CRF using an attention mechanism

    Full text link
    Named Entity Recognition (NER) is a vitally important task in Natural Language Processing (NLP) that aims at finding named entities in natural language text and classifying them into predefined categories such as persons (PER), places (LOC), organizations (ORG), and so on. In the Arabic context, current deep learning NER approaches mainly take word embeddings or character-level embeddings as input. However, a single-granularity representation suffers from out-of-vocabulary (OOV) words, word embedding errors, and relatively simple semantic content. This paper presents a multi-headed self-attention mechanism implemented in the BiLSTM-CRF neural network structure to recognize Arabic named entities on social media using two embeddings. Unlike other state-of-the-art approaches, this approach combines character and word embeddings at the embedding layer, and the attention mechanism calculates the similarity over the entire sequence of characters and captures local context information. The proposed approach better recognized NEs in Dialectal Arabic, reaching an F1 value of 74.15% on Darwish’s dataset (a publicly available Arabic NER benchmark for social media). To our knowledge, our findings outperform the current state-of-the-art models for Arabic Named Entity Recognition on social media.

    FedQAS: Privacy-Aware Machine Reading Comprehension with Federated Learning

    No full text
    Machine reading comprehension (MRC) of text data is a challenging task in Natural Language Processing (NLP), with a lot of ongoing research fueled by the release of the Stanford Question Answering Dataset (SQuAD) and Conversational Question Answering (CoQA). It is considered to be an effort to teach computers how to “understand” a text, and then to be able to answer questions about it using deep learning. However, until now, large-scale training on private text data and knowledge sharing has been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation and execution of local model training. In addition, we present the architecture and implementation of the system, as well as provide a reference evaluation based on the SQuAD dataset, to showcase how it overcomes data privacy issues and enables knowledge sharing between alliance members in a federated learning setting.

    A Web-Based Platform for Mining and Ranking Association Rules

    No full text

    Association Rules Algorithms for Data Mining Process Based on Multi Agent System

    No full text
    In this paper, we present a collaborative multi-agent system for data mining. We use two data mining model functions: clustering of variables, to build homogeneous groups of attributes, and association rules within each of these groups; a multi-agent approach integrates both techniques. For association rule extraction, we use both the apriori algorithm and a genetic algorithm. The main goal of this paper is to evaluate the association rules obtained by running the apriori and genetic algorithms on quantitative datasets in a multi-agent environment.

    Investigation of Tissue Components Impacts on Dose Enhancement Factor Using Monte Carlo Code

    No full text
    Despite scientific progress in cancer treatment and improvements in radiotherapy, several side effects still occur during tumor treatment, particularly in the healthy tissues surrounding tumors. Newer treatment methods are being explored, one of which is the use of nanoparticles, wherein the tumor is injected with gold nanoparticles. The aim is to enhance tumor sensitivity to radiation and reduce radiation damage to healthy tissues. Tissue type may play an effective role in enhancing the dose received when nanoparticles are used. This study aims to find the effect of different tissue components on the dose enhancement factor (DEF) through MCNP6 and GATE simulations, and to compare the dose enhancement factors obtained from these two code packages. A 125I brachytherapy source was simulated in phantoms for five tissues or materials (adipose tissue, breast tissue, soft tissue, water, and brain tissue). The MCNP6 simulation code was validated by comparing its results with a previous study by Cho et al. Gold nanoparticles were injected as a mixture at a concentration of 7 mg/g into tissues inside a tumor. MCNP6 and GATE simulation results were compared. The MCNP simulations estimated that the highest radiation dose enhancement, 2.34, occurs in adipose tissue, while the lowest, 1.69, occurs in brain. The GATE results likewise placed the highest dose enhancement factor in adipose tissue, at 2.01, and the lowest in brain, at 1.48. The comparison between the two codes suggests that they are compatible, with the percentage difference in all tissues being less than 15%. This study confirms that both the MCNP6 and GATE codes can calculate DEF for different tissues under irradiation from a low-energy source.

    Selecting Relevant Association Rules From Imperfect Data

    No full text
    Association Rule Mining (ARM) in the context of imperfect data, e.g., imprecise data or missing data, has received little attention so far despite the prevalence of such data in a wide range of real-world applications. In this work, we present an ARM approach that can be used to handle imprecise data and derive imprecise rules. Based on the belief functions framework and Multiple Criteria Decision Analysis, the proposed approach relies on a selection procedure for identifying the most relevant rules while considering information characterizing their interestingness. The several measures of interestingness defined for comparing the rules, as well as the selection procedure, are presented. We also show how a priori knowledge about attribute values defined in domain taxonomies can be used to reduce the search space and the complexity of the mining process, in addition to helping identify relevant rules for the domain of interest. Our approach is illustrated using a concrete, simplified case study related to humanitarian project analysis.