10 research outputs found
Detection of fake news in the Syrian war.
Thesis (M.S.), American University of Beirut, Department of Computer Science, 2019. T:6926. Advisor: Dr. Fatima Abu Salem, Associate Professor, Computer Science; Committee members: Dr. Shady Elbassuoni, Assistant Professor, Computer Science; Dr. Mohamad Jaber, Assistant Professor, Computer Science; Dr. May Farah, Assistant Professor, Media Studies. Includes bibliographical references (leaves 85-90).
After almost eight years of conflict, the humanitarian situation in Syria continues to deteriorate year after year. With multiple opposing parties involved in the armed conflict, much of the news reported about the Syrian war appears biased or inclined to support one party over the others. With serious human rights violations taking place in the Syrian war, and news sources blaming different sides of the conflict for these violations, there is growing interest in detecting fake news about the Syrian war. In this work, we built a streaming and scraping model to extract news articles of interest from news sources' websites. We built a labeled dataset of news articles about the Syrian conflict. Finally, we built a feature extraction model along with a machine learning model that is able to detect fake news about the Syrian conflict and generalize to other types of fake news.
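The thesis's scraping model is not reproduced here, but a minimal sketch of the article-scraping step might look as follows, using requests and BeautifulSoup. The index URL, CSS selectors, and keyword filter are hypothetical placeholders, not the sources actually targeted in the thesis.

```python
# Minimal scraping sketch: collect war-related articles linked from a news index page.
# The selectors and keyword filter below are illustrative assumptions.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

KEYWORDS = ("syria", "aleppo", "idlib")  # hypothetical filter for articles of interest

def scrape_articles(index_url):
    """Return (title, body) pairs for articles of interest linked from index_url."""
    soup = BeautifulSoup(requests.get(index_url, timeout=30).text, "html.parser")
    articles = []
    for link in soup.select("a[href]"):
        title = link.get_text(strip=True)
        if not any(k in title.lower() for k in KEYWORDS):
            continue
        page_url = urljoin(index_url, link["href"])
        page = BeautifulSoup(requests.get(page_url, timeout=30).text, "html.parser")
        body = " ".join(p.get_text(" ", strip=True) for p in page.select("article p"))
        articles.append((title, body))
    return articles
```

The scraped texts could then be paired with labels and fed to a standard feature-extraction and classification pipeline.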
Using Facebook advertising data to describe the socio-economic situation of Syrian refugees in Lebanon
While the fighting in the Syrian civil war has mostly stopped, an estimated 5.6 million Syrians remain living in neighboring countries. Of these, an estimated 1.5 million are sheltering in Lebanon. Ongoing efforts by organizations such as UNHCR to support the refugee population are often ineffective in reaching those most in need. According to UNHCR's 2019 Vulnerability Assessment of Syrian Refugees Report (VASyR), only 44% of the Syrian refugee families eligible for multipurpose cash assistance were provided with help, as the others were not captured in the data. In this project, we investigate the use of non-traditional data, derived from Facebook advertising data, for population-level vulnerability assessment. In a nutshell, Facebook provides advertisers with an estimate of how many of its users match certain targeting criteria, e.g., how many Facebook users currently living in Beirut are “living abroad,” aged 18–34, speak Arabic, and primarily use an iOS device. We evaluate the use of such audience estimates to describe the spatial variation in the socio-economic situation of Syrian refugees across Lebanon. Using data from VASyR as ground truth, we find that iOS device usage explains 90% of the out-of-sample variance in poverty across the Lebanese governorates. However, evaluating predictions at a smaller spatial resolution also indicates limits related to sparsity, as Facebook, for privacy reasons, does not provide audience estimates for fewer than 1,000 users. Furthermore, comparing the population distribution by age and gender of Facebook users with that of the Syrian refugees from VASyR suggests an under-representation of Syrian women on the social media platform. This work adds to a growing body of literature demonstrating the value of anonymous and aggregate Facebook advertising data for analysing large-scale humanitarian crises and migration events.
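As an illustration of the governorate-level analysis described above, the sketch below regresses a poverty rate on the share of the targeted Facebook audience using iOS devices, with leave-one-out cross-validation. The numbers are placeholders, not VASyR or Facebook figures, and the single-feature linear model is only an assumption about the form of the analysis.

```python
# Sketch: leave-one-out regression of poverty on iOS-usage share across governorates.
# All numbers below are placeholders, not real VASyR or Facebook audience data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

share_ios = np.array([[0.12], [0.08], [0.22], [0.15], [0.05], [0.18], [0.10], [0.25]])
poverty_rate = np.array([0.71, 0.80, 0.52, 0.65, 0.88, 0.60, 0.75, 0.48])

pred = cross_val_predict(LinearRegression(), share_ios, poverty_rate, cv=LeaveOneOut())
print(f"out-of-sample R^2: {r2_score(poverty_rate, pred):.2f}")
```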
Meta-learning for fake news detection surrounding the Syrian war
In this article, we pursue the automatic detection of fake news reporting on the Syrian war using machine learning and meta-learning. The proposed approach is based on a suite of features that include a given article's linguistic style; its level of subjectivity, sensationalism, and sectarianism; the strength of its attribution; and its consistency with other news articles from the same “media camp”. To train our models, we use FA-KES, a fake news dataset about the Syrian war. A suite of basic machine learning models is explored, as well as the model-agnostic meta-learning algorithm (MAML), which is suitable for few-shot learning on datasets of a modest size. Feature-importance analysis confirms that the collected features specific to the Syrian war are indeed very important predictors of the output label. The meta-learning model achieves the best performance, improving upon the baseline approaches that are trained exclusively on text features in FA-KES.
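The sketch below shows a compact first-order simplification of a MAML training step for a small feature-based classifier (the article uses the full MAML algorithm; the network, learning rates, and task construction here are assumptions for illustration).

```python
# First-order MAML sketch: adapt a copy of the model on each task's support set,
# then apply the averaged query-set gradients to the original (meta) parameters.
import copy
import torch
import torch.nn as nn

def fomaml_step(model, tasks, inner_lr=0.01, meta_lr=0.001, inner_steps=1):
    """tasks: list of (X_support, y_support, X_query, y_query) float tensors."""
    loss_fn = nn.BCEWithLogitsLoss()
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for X_s, y_s, X_q, y_q in tasks:
        fast = copy.deepcopy(model)                       # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                      # inner-loop adaptation
            opt.zero_grad()
            loss_fn(fast(X_s).squeeze(-1), y_s).backward()
            opt.step()
        query_loss = loss_fn(fast(X_q).squeeze(-1), y_q)  # evaluate the adapted model
        grads = torch.autograd.grad(query_loss, fast.parameters())
        for g_acc, g in zip(meta_grads, grads):           # first-order approximation
            g_acc += g / len(tasks)
    with torch.no_grad():                                 # meta-update
        for p, g in zip(model.parameters(), meta_grads):
            p -= meta_lr * g
```

Here `model` could be, for instance, a small feed-forward network such as `nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1))` over the engineered features.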
Meta-learning for fake news detection surrounding the Syrian war: An interview with co-author Roaa Al Feel
FA-KES: A Fake News Dataset around the Syrian War
Most currently available fake news datasets revolve around US politics, entertainment news, or satire. They are typically scraped from fact-checking websites, where the articles are labeled by human experts. In this paper, we present FA-KES, a fake news dataset around the Syrian war. Given the specific nature of news reporting on incidents of war and the lack of available sources from which manually labeled news articles can be scraped, we believe a fake news dataset specifically constructed for this domain is crucial. To ensure a balanced dataset that covers the many facets of the Syrian war, our dataset consists of news articles from several media outlets representing mobilisation press, loyalist press, and diverse print media. To avoid the difficult and often subjective task of manually labeling news articles as true or fake, we employ a semi-supervised fact-checking approach to label the news articles in our dataset. With the help of crowdsourcing, human contributors are prompted to extract specific and easy-to-extract information that helps match a given article to information representing “ground truth” obtained from the Syrian Violations Documentation Center. The information extracted is then used to cluster the articles into two separate sets using unsupervised machine learning. The result is a carefully annotated dataset of 804 articles labeled as true or fake, which is well suited for training machine learning models to predict the credibility of news articles. Our dataset is publicly available at https://doi.org/10.5281/zenodo.2607278. Although our dataset is focused on the Syrian crisis, it can be used to train machine learning models to detect fake news in other related domains. Moreover, the framework we used to obtain the dataset is general enough to build other fake news datasets around military conflicts, provided there is some corresponding ground truth available.
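A minimal sketch of the final labelling step described above might look as follows: articles are represented by how well their crowdsourced facts agree with VDC records and are then split into two clusters, with the cluster showing stronger agreement taken as the credible one. The feature representation and clustering choice (k-means) are assumptions for illustration.

```python
# Sketch: cluster articles into "credible" vs "fake" from VDC-agreement features.
# The feature columns (e.g., date/location/casualty-count agreement) are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def label_articles(match_features):
    """match_features: (n_articles, n_features) array of agreement scores with VDC."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(match_features)
    # Treat the cluster whose members agree more with the VDC as the credible one.
    credible = np.argmax([match_features[km.labels_ == c].mean() for c in (0, 1)])
    return (km.labels_ == credible).astype(int)  # 1 = credible, 0 = fake
```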
Dataset for fake news and articles detection
We have produced a labeled dataset that presents fake news surrounding the conflict in Syria. The dataset consists of a set of articles/news labeled 0 (fake) or 1 (credible). The credibility of each article is computed with respect to ground-truth information obtained from the Syrian Violations Documentation Center (VDC). In particular, for each article, we crowdsource the information-extraction job (e.g., date, location, number of casualties) using the crowdsourcing platform Figure Eight (formerly CrowdFlower). Then, we match those articles against the VDC database to deduce whether an article is fake or not. The dataset can be used to train machine learning models to detect fake news.
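A minimal sketch of using the dataset to train a baseline detector is shown below; the file name and column names are assumptions and may differ in the released files.

```python
# Sketch: train and evaluate a simple text-only baseline on the labeled dataset.
# "FA-KES-Dataset.csv", "article_content", and "labels" are assumed names.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("FA-KES-Dataset.csv")            # downloaded from the Zenodo record
X = TfidfVectorizer(max_features=10_000).fit_transform(df["article_content"])
y = df["labels"]                                   # 0 = fake, 1 = credible
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"5-fold cross-validated accuracy: {scores.mean():.3f}")
```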
