Securing the Mobile Future: An Extensive Analysis of the Threat Landscape from Mobile Devices User Perspectives
Mobile devices have become ubiquitous in the modern workplace, and their growing adoption makes them an increasingly attractive target for cyberattacks, posing a significant threat to organizational security. The current study assesses the threat landscape of mobile devices from the perspective of users, highlighting the importance of user-centered perspectives in designing solutions to mobile security threats. A total of 47 participants with diverse demographic profiles were recruited as respondents, and the primary data were collected through an online questionnaire. The paper outlines the current state of the art in mobile device security, including vulnerability to security threats, privacy risks, trust in security measures, awareness of potential threats, and satisfaction with the security provided by mobile operating systems. The results demonstrated that users are largely unaware of data protection and mobile device security, a gap that can substantially affect an organization's performance. The study concludes with a discussion of the findings' implications for research and practice in mobile security, laying a foundational perspective for future advancements in security audits and aiming to enhance cybersecurity practices in an increasingly mobile-centric corporate landscape.
SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis
Data annotation is an important but time-consuming and costly procedure. To sort text into two classes, the first requirement is a good annotation guideline establishing what qualifies a sample for each class. In the literature, the difficulty of producing appropriate data annotation has been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for the Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two scripts widely used on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8,000 messages (4,000 in Arabic and 4,000 in Arabizi). The achieved F1-scores are up to 72% and 78% on the Arabic and Arabizi test sets, respectively. Ongoing work aims at integrating a transliteration process for Arabizi messages to further improve the obtained results.
Comment: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018).
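The abstract leaves the annotation mechanics implicit; the following is a minimal Python sketch of how a lexicon-based auto-annotator of this kind could work. The toy LEXICON entries and the annotate() helper are hypothetical illustrations, not the paper's actual Algerian lexicon or scoring rules.

```python
# Minimal sketch of lexicon-based automatic annotation. The toy LEXICON
# and annotate() helper below are hypothetical illustrations, not
# SentiALG's actual lexicon or scoring rules.
from typing import Optional

LEXICON = {
    "mlih": 1.0,     # "good" (Arabizi)
    "مليح": 1.0,     # "good" (Arabic script)
    "khayeb": -1.0,  # "bad" (Arabizi)
    "خايب": -1.0,    # "bad" (Arabic script)
}

def annotate(message: str) -> Optional[str]:
    """Label a message by the sign of its summed lexicon scores.

    Returns None when no lexicon word matches, so ambiguous messages
    can be left out of the automatically built corpus.
    """
    score = sum(LEXICON.get(tok, 0.0) for tok in message.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # unknown: skip rather than risk a wrong label

messages = ["rani fer7an mlih", "service khayeb bezzaf"]
print([(m, annotate(m)) for m in messages])
# [('rani fer7an mlih', 'positive'), ('service khayeb bezzaf', 'negative')]
```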
ALLaM: Large Language Models for Arabic and English
We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained considering the values of language alignment and knowledge transfer at scale. Our autoregressive decoder-only models demonstrate how second-language acquisition via vocabulary expansion and pretraining on a mixture of Arabic and English text can steer a model towards a new language (Arabic) without any catastrophic forgetting of the original language (English). Furthermore, we highlight the effectiveness of using parallel/translated data to aid the process of knowledge alignment between languages. Finally, we show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to models of a larger scale with lower-quality alignment. ALLaM achieves state-of-the-art performance on various Arabic benchmarks, including MMLU Arabic, ACVA, and Arabic Exams. Our aligned models improve in both Arabic and English over their base models.
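The vocabulary-expansion step can be sketched with the Hugging Face transformers API. This is a hedged illustration, not ALLaM's actual recipe: the "gpt2" base model and the three Arabic tokens are placeholders standing in for an English-centric base model and a tokenizer trained on a large Arabic corpus, and the continued-pretraining loop is omitted.

```python
# Hedged sketch of second-language acquisition via vocabulary expansion,
# using the Hugging Face transformers API. "gpt2" and the token list are
# placeholders, not ALLaM's actual base model or Arabic vocabulary.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "gpt2"  # stand-in for an English-centric decoder-only base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# 1) Expand the tokenizer with Arabic subword units (in practice these
#    would come from a tokenizer trained on a large Arabic corpus).
new_arabic_tokens = ["السلام", "عليكم", "اللغة"]
num_added = tokenizer.add_tokens(new_arabic_tokens)

# 2) Grow the embedding matrix so the new tokens get trainable vectors.
model.resize_token_embeddings(len(tokenizer))

# 3) Continue pretraining on a mixture of Arabic and English text (plus
#    parallel/translated pairs for knowledge alignment) so the model
#    acquires Arabic without forgetting English. Training loop omitted.
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```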
The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets
The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models that are trained on massive textual data and then fine-tuned for downstream NLP tasks. In this paper, we study the evolution of language representation models by analyzing their effect on an under-researched NLP task, emotion analysis, for a low-resource language, Arabic. Most studies in the field of affect analysis have focused on sentiment analysis, i.e., classifying text by valence (positive, negative, neutral), while few go further to analyze finer-grained emotional states (happiness, sadness, anger, etc.). Emotion analysis is a text classification problem that is tackled using machine learning techniques, with different language representation models serving as the features these models learn from. We perform an empirical study on the evolution of language models, from the traditional term frequency–inverse document frequency (TF–IDF), to the more sophisticated word embedding word2vec, and finally the recent state-of-the-art pretrained language model, bidirectional encoder representations from transformers (BERT). We observe and analyze how performance increases as we change the language model, and we also investigate different BERT models for Arabic. We find that the best performance is achieved with the ArabicBERT large model, a BERT model trained on a large corpus of Arabic text, with a significant increase in F1-score of +7–21%.
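To make the starting point of that progression concrete, here is a minimal scikit-learn sketch of a TF–IDF baseline for emotion classification. The four tweets and their labels are toy placeholders, not the paper's dataset, and the logistic-regression classifier is one reasonable choice rather than the authors' exact setup.

```python
# Minimal TF-IDF baseline for emotion classification with scikit-learn.
# The tweets and labels are toy placeholders, not the paper's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "أنا سعيد جدا اليوم",      # "I am very happy today"
    "أشعر بحزن شديد",          # "I feel deep sadness"
    "هذا الأمر يغضبني كثيرا",  # "This makes me very angry"
    "يوم جميل ورائع",          # "A beautiful, wonderful day"
]
labels = ["happiness", "sadness", "anger", "happiness"]

# TF-IDF features feeding a linear classifier: the weakest of the three
# representations compared above (TF-IDF -> word2vec -> BERT).
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)
print(clf.predict(["يوم سعيد"]))  # expected to lean toward "happiness"
```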
