Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Oftentimes, users do not volunteer this information due to privacy
concerns, or due to a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items, and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users.
Comment: This is the extended version of a paper that appeared in ACM RecSys 201
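Below is a minimal sketch of the kind of active attack described above. It rests on stated assumptions that do not come from the paper. Item latent vectors are taken as already known from a matrix factorization. Each value of a binary private attribute induces its own Gaussian prior over user profiles. Ratings are linear in the profile plus Gaussian noise. Each class keeps a conjugate Bayesian posterior that is updated after every rating. The next item to solicit is the one whose predicted rating best separates the two classes, scored by symmetric KL divergence. The class and function names and the selection score are hypothetical illustrative choices, not the selection rule from the paper.

```python
import math

# Assumed model (illustrative only): item j has a known latent vector V[j];
# a user whose private attribute equals c has profile u ~ N(mu_c, tau2 * I);
# a rating is r = u . v + noise with noise ~ N(0, sigma2).


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(S, v):
    return [dot(row, v) for row in S]


class ClassPosterior:
    """Gaussian posterior over the user profile, assuming attribute value c."""

    def __init__(self, mu, tau2):
        d = len(mu)
        self.m = list(mu)
        self.S = [[tau2 if i == k else 0.0 for k in range(d)] for i in range(d)]
        self.loglik = 0.0  # running log p(ratings so far | c)

    def predictive(self, v, sigma2):
        """Mean and variance of the user's rating for item v under this class."""
        return dot(v, self.m), dot(v, matvec(self.S, v)) + sigma2

    def update(self, v, r, sigma2):
        """Condition on rating r for item v (rank-one Bayesian linear-regression step)."""
        mean, var = self.predictive(v, sigma2)
        self.loglik -= 0.5 * (math.log(2 * math.pi * var) + (r - mean) ** 2 / var)
        Sv = matvec(self.S, v)
        gain = [x / var for x in Sv]
        self.m = [mi + gi * (r - mean) for mi, gi in zip(self.m, gain)]
        d = len(v)
        self.S = [[self.S[i][k] - gain[i] * Sv[k] for k in range(d)] for i in range(d)]


def prob_class1(posts, prior1=0.5):
    """Posterior probability that the private attribute equals 1."""
    a = math.log(prior1) + posts[1].loglik
    b = math.log(1.0 - prior1) + posts[0].loglik
    if a >= b:  # numerically stable logistic of (a - b)
        return 1.0 / (1.0 + math.exp(b - a))
    e = math.exp(a - b)
    return e / (1.0 + e)


def sym_kl(m0, v0, m1, v1):
    """Symmetric KL divergence between N(m0, v0) and N(m1, v1)."""
    d2 = (m0 - m1) ** 2
    return 0.5 * ((v0 + d2) / v1 + (v1 + d2) / v0 - 2.0)


def next_item(posts, V, rated, sigma2):
    """Unrated item whose predicted ratings differ most between the two classes."""
    best, best_score = None, -1.0
    for j, v in enumerate(V):
        if j in rated:
            continue
        m0, v0 = posts[0].predictive(v, sigma2)
        m1, v1 = posts[1].predictive(v, sigma2)
        score = sym_kl(m0, v0, m1, v1)
        if score > best_score:
            best, best_score = j, score
    return best
```

Because the score is recomputed from the current posteriors, the chosen items depend on the ratings already collected. This makes the loop active rather than static. If the two class priors differ only in one latent dimension, `next_item` favours items that load on that dimension.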
Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.
Comment: Extended version of the paper appearing in SIGMETRICS 201
The Shapley Value in Knapsack Budgeted Games
We propose the study of computing the Shapley value for a new class of
cooperative games that we call budgeted games, and investigate in particular
knapsack budgeted games, a version modeled after the classical knapsack
problem. In these games, the "value" of a set S of agents is determined only
by a critical subset T ⊆ S of the agents and not the entirety of S,
due to a budget constraint that limits how large T can be. We show that the
Shapley value can be computed in time faster than by the naïve exponential
time algorithm when there are sufficiently many agents, and also provide an
algorithm that approximates the Shapley value within an additive error. For a
related budgeted game associated with a greedy heuristic, we show that the
Shapley value can be computed in pseudo-polynomial time. Furthermore, we
generalize our proof techniques and propose what we term the algorithmic
representation framework that captures a broad class of cooperative games with
the property of efficient computation of the Shapley value. The main idea is
that the problem of determining the efficient computation can be reduced to
that of finding an alternative representation of the games and an associated
algorithm for computing the underlying value function with small time and space
complexities in the representation size.
Comment: A short version to appear in the 10th Conference on Web and Internet Economics (WINE 2014)
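The following sketch makes the naïve exponential-time baseline concrete for a knapsack budgeted game. It assumes that v(S) is the largest total value of a subset of S whose total weight fits within an integer budget. That is a reading of the knapsack model, not a definition taken from the paper. The sketch also includes the standard permutation-sampling estimator, whose additive error shrinks roughly as 1/sqrt(num_perms). That estimator is a generic technique and not necessarily the approximation algorithm the authors propose. All function names are hypothetical.

```python
import itertools
import math
import random


def knapsack_value(coalition, weights, values, budget):
    """v(S): best total value of a subset T of S whose total weight is <= budget.

    Standard 0/1 knapsack dynamic program; assumes non-negative integer weights.
    """
    best = [0] * (budget + 1)
    for i in coalition:
        w, p = weights[i], values[i]
        for cap in range(budget, w - 1, -1):  # descending: each agent used at most once
            best[cap] = max(best[cap], best[cap - w] + p)
    return best[budget]


def shapley_exact(n, v):
    """Naive exponential-time Shapley values, enumerating every coalition without i."""
    fact = [math.factorial(k) for k in range(n + 1)]
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            coef = fact[size] * fact[n - size - 1] / fact[n]
            for S in itertools.combinations(others, size):
                phi[i] += coef * (v(S + (i,)) - v(S))
    return phi


def shapley_sampled(n, v, num_perms, seed=0):
    """Monte Carlo estimate: average marginal contributions over random agent orderings."""
    rng = random.Random(seed)
    order = list(range(n))
    phi = [0.0] * n
    for _ in range(num_perms):
        rng.shuffle(order)
        prefix, prev = [], v(())
        for i in order:
            prefix.append(i)
            cur = v(tuple(prefix))
            phi[i] += cur - prev
            prev = cur
    return [x / num_perms for x in phi]
```

Consider a three-agent instance with weights (2, 3, 4), values (3, 4, 5) and budget 5. Here `shapley_exact` returns (13/6, 8/3, 13/6), which sum to v(N) = 7 as efficiency requires. The exact routine makes on the order of n·2^(n-1) calls to v, which is the cost the abstract's faster algorithm is measured against.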
Efficient Online Crowdsourcing with Complex Annotations
Crowdsourcing platforms use various truth discovery algorithms to aggregate
annotations from multiple labelers. In an online setting, however, the main
challenge is to decide whether to ask for more annotations for each item to
efficiently trade off cost (i.e., the number of annotations) for quality of the
aggregated annotations. In this paper, we propose a novel approach for general
complex annotations (such as bounding boxes and taxonomy paths) that works in
an online crowdsourcing setting. We prove that the expected average similarity
of a labeler is linear in their accuracy conditional on the reported
label. This enables us to infer reported label accuracy in a broad range of
scenarios. We conduct extensive evaluations on real-world crowdsourcing data
from Meta and show the effectiveness of our proposed online algorithms in
improving the cost-quality trade-off.
Comment: Full version of a paper accepted to AAAI'2
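Here is a rough illustration of the "average similarity" statistic for one complex annotation type, bounding boxes. It uses intersection-over-union as the similarity, which is an assumption, since the abstract covers other annotation types and does not fix a similarity function. Each labeler's score is the mean similarity of their box to every other box on the same item. According to the result stated above, the expected value of this score is linear in the labeler's accuracy. The threshold-based stopping rule and all names are hypothetical stand-ins, not the authors' online algorithm.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def average_similarity(annotations, sim=iou):
    """Mean similarity of each labeler's annotation to all other annotations of the item."""
    scores = {}
    for labeler, ann in annotations.items():
        others = [sim(ann, b) for other, b in annotations.items() if other != labeler]
        scores[labeler] = sum(others) / len(others) if others else 0.0
    return scores


def needs_more_labels(annotations, threshold=0.5, max_labels=5):
    """Hypothetical stopping rule: ask again until some annotation is well supported."""
    if len(annotations) >= max_labels:
        return False
    if len(annotations) < 2:
        return True
    return max(average_similarity(annotations).values()) < threshold
```

Suppose an item has two overlapping boxes and one far-off outlier. The outlier's score is 0, and the rule asks for another label. Without the outlier, the two remaining boxes support each other above the threshold and the rule stops. This is the cost-quality trade-off in its simplest form.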
Harm Mitigation in Recommender Systems under User Preference Dynamics
We consider a recommender system that takes into account the interplay
between recommendations, the evolution of user interests, and harmful content.
We model the impact of recommendations on user behavior, particularly the
tendency to consume harmful content. We seek recommendation policies that
establish a tradeoff between maximizing click-through rate (CTR) and mitigating
harm. We establish conditions under which the user profile dynamics have a
stationary point, and propose algorithms for finding an optimal recommendation
policy at stationarity. We experiment on a semi-synthetic movie recommendation
setting initialized with real data and observe that our policies outperform
baselines at simultaneously maximizing CTR and mitigating harm.
Comment: Recommender Systems; Harm Mitigation; Amplification; User Preference Modeling
Understanding Malvertising Through Ad-Injecting Browser Extensions
Malvertising is a malicious activity that leverages advertising to distribute various forms of malware. Because advertising is the key revenue generator for numerous Internet companies, large ad networks, such as Google, Yahoo and Microsoft, invest considerable effort in keeping malicious ads out of their networks. This drives adversaries to look for alternative methods to deploy malvertising. In this paper, we show that browser extensions that use ads as their monetization strategy often facilitate the deployment of malvertising. Moreover, while some extensions simply serve ads from ad networks that support malvertising, other extensions maliciously alter the content of visited webpages to force users into installing malware. To measure the extent of these behaviors, we developed Expector, a system that automatically inspects and identifies browser extensions that inject ads, and then classifies these ads as malicious or benign based on their landing pages. Using Expector, we automatically inspected over 18,000 Chrome browser extensions. We found 292 extensions that inject ads, and detected 56 extensions, with a combined user base of 602,417, that participate in malvertising using 16 different ad networks.
- …
