A Marauder's Map of Security and Privacy in Machine Learning
There is growing recognition that machine learning (ML) exposes new security
and privacy vulnerabilities in software systems, yet the technical community's
understanding of the nature and extent of these vulnerabilities remains limited
but expanding. In this talk, we explore the threat model space of ML algorithms
through the lens of Saltzer and Schroeder's principles for the design of secure
computer systems. This characterization of the threat space prompts an
investigation of current and future research directions. We structure our
discussion around three of these directions, which we believe are likely to
lead to significant progress. The first encompasses a spectrum of approaches to
verification and admission control, which is a prerequisite to enable fail-safe
defaults in machine learning systems. The second seeks to design mechanisms for
assembling reliable records of compromise that would help understand the degree
to which vulnerabilities are exploited by adversaries, as well as favor
psychological acceptability of machine learning applications. The third pursues
formal frameworks for security and privacy in machine learning, which we argue
should strive to align machine learning goals such as generalization with
security and privacy desiderata like robustness or privacy. Key insights
resulting from these three directions pursued both in the ML and security
communities are identified, and the effectiveness of approaches is related to
structural elements of ML algorithms and the data used to train them. We
conclude by systematizing best practices in our community. Comment: This report
summarizes the keynote presented by the author in October 2018 at AISec
(co-located with ACM CCS) on security and privacy in machine learning.
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Deep neural networks (DNNs) enable innovative applications of machine
learning like image recognition, machine translation, or malware detection.
However, deep learning is often criticized for its lack of robustness in
adversarial settings (e.g., vulnerability to adversarial inputs) and general
inability to rationalize its predictions. In this work, we exploit the
structure of deep learning to enable new learning-based inference and decision
strategies that achieve desirable properties such as robustness and
interpretability. We take a first step in this direction and introduce the Deep
k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest
neighbors algorithm with representations of the data learned by each layer of
the DNN: a test input is compared to its neighboring training points according
to the distance that separates them in the representations. We show the labels
of these neighboring points afford confidence estimates for inputs outside the
model's training manifold, including on malicious inputs like adversarial
examples--and thereby provide protection against inputs that are outside the
model's understanding. This is because the nearest neighbors can be used to
estimate the nonconformity of, i.e., the lack of support for, a prediction in
the training data. The neighbors also constitute human-interpretable
explanations of predictions. We evaluate the DkNN algorithm on several
datasets, and show the confidence estimates accurately identify inputs outside
the model, and that the explanations provided by nearest neighbors are
intuitive and useful in understanding model failures.
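For readers who want to map the description above onto code, the following is a minimal NumPy sketch of a DkNN-style inference step. It assumes the per-layer training representations (`train_feats`), the nonconformity scores of a held-out calibration set (`calib_scores`), and the query's per-layer features (`query_feats`) have already been computed; the helper names are illustrative, and this is not the authors' implementation.

```python
# Hypothetical helper names; a minimal NumPy sketch, not the authors' code.
import numpy as np

def knn_labels(feats, labels, query, k):
    """Labels of the k training points closest to `query` in one layer's space."""
    dists = np.linalg.norm(feats - query, axis=1)
    return labels[np.argsort(dists)[:k]]

def dknn_predict(train_feats, train_labels, calib_scores, query_feats,
                 n_classes, k=5):
    # Gather the labels of the k nearest training points in every layer.
    neighbors = np.concatenate([
        knn_labels(f, train_labels, q, k)
        for f, q in zip(train_feats, query_feats)
    ])
    # Nonconformity of class j: number of neighbors whose label disagrees with j.
    alpha = np.array([(neighbors != j).sum() for j in range(n_classes)])
    # Empirical p-value: fraction of calibration scores at least as large.
    p_values = np.array([(calib_scores >= a).mean() for a in alpha])
    prediction = int(p_values.argmax())
    credibility = float(p_values.max())  # support for the prediction in the training data
    return prediction, credibility
```

The returned credibility is the largest empirical p-value, which corresponds to the confidence estimate the abstract describes for inputs outside the training manifold.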
On the Effectiveness of Defensive Distillation
We report experimental results indicating that defensive distillation
successfully mitigates adversarial samples crafted using the fast gradient sign
method, in addition to those crafted using the Jacobian-based iterative attack
on which the defense mechanism was originally evaluated. Comment: Technical Report.
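As a point of reference for the attack named above, here is a minimal PyTorch sketch of the fast gradient sign method; it is a generic illustration rather than the exact experimental setup of the report.

```python
# A generic illustration of the fast gradient sign method (PyTorch).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """Perturb `x` by eps in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```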
Extending Defensive Distillation
Machine learning is vulnerable to adversarial examples: inputs carefully
modified to force misclassification. Designing defenses against such inputs
remains largely an open problem. In this work, we revisit defensive
distillation---which is one of the mechanisms proposed to mitigate adversarial
examples---to address its limitations. We view our results not only as an
effective way of addressing some of the recently discovered attacks but also as
reinforcing the importance of improved training techniques.
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Many machine learning models are vulnerable to adversarial examples: inputs
that are specially crafted to cause a machine learning model to produce an
incorrect output. Adversarial examples that affect one model often affect
another model, even if the two models have different architectures or were
trained on different training sets, so long as both models were trained to
perform the same task. An attacker may therefore train their own substitute
model, craft adversarial examples against the substitute, and transfer them to
a victim model, with very little information about the victim. Recent work has
further developed a technique that uses the victim model as an oracle to label
a synthetic training set for the substitute, so the attacker need not even
collect a training set to mount the attack. We extend these recent techniques
using reservoir sampling to greatly enhance the efficiency of the training
procedure for the substitute model. We introduce new transferability attacks
between previously unexplored (substitute, victim) pairs of machine learning
model classes, most notably SVMs and decision trees. We demonstrate our attacks
on two commercial machine learning classification systems from Amazon (96.19%
misclassification rate) and Google (88.94%) using only 800 queries of the
victim model, thereby showing that existing machine learning approaches are in
general vulnerable to systematic black-box attacks regardless of their
structure.
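The abstract credits reservoir sampling for the improved efficiency of substitute training; the following is a minimal Python sketch of that sampling primitive only, with the oracle-labeling and query-budget logic of the attack omitted.

```python
# Classic reservoir sampling (Algorithm R); attack-specific details omitted.
import random

def reservoir_sample(stream, capacity):
    """Keep a uniform random sample of at most `capacity` items from `stream`."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < capacity:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # inclusive on both ends
            if j < capacity:
                reservoir[j] = item
    return reservoir
```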
Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
We explore and expand the soft nearest neighbor loss to measure the
entanglement of class manifolds in representation space: i.e.,
how close pairs of points from the same class are relative to pairs of points
from different classes. We demonstrate several use cases of the loss. As an
analytical tool, it provides insights into the evolution of class similarity
structures during learning. Surprisingly, we find that maximizing
the entanglement of representations of different classes in the hidden layers
is beneficial for discrimination in the final layer, possibly because it
encourages representations to identify class-independent similarity structures.
Maximizing the soft nearest neighbor loss in the hidden layers leads not only
to improved generalization but also to better-calibrated estimates of
uncertainty on outlier data. Data that is not from the training distribution
can be recognized by observing that in the hidden layers, it has fewer than the
normal number of neighbors from the predicted class.
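Below is a minimal PyTorch sketch of the soft nearest neighbor loss on one batch of hidden representations, following the standard formulation with temperature T as a free parameter; it is not necessarily the authors' exact implementation.

```python
# A sketch of the soft nearest neighbor loss for one batch of representations.
import torch

def soft_nearest_neighbor_loss(feats, labels, T=1.0, eps=1e-8):
    """Large when points from different classes are entangled in `feats` space."""
    dists = torch.cdist(feats, feats).pow(2)          # pairwise squared distances
    sims = torch.exp(-dists / T)
    sims = sims * (1.0 - torch.eye(feats.size(0), device=feats.device))  # drop self-pairs
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    numerator = (sims * same_class).sum(dim=1)        # mass on same-class neighbors
    denominator = sims.sum(dim=1)                     # mass on all neighbors
    return -torch.log(numerator / (denominator + eps) + eps).mean()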
Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
Adversarial examples are malicious inputs crafted to cause a model to
misclassify them. Their most common instantiation, "perturbation-based"
adversarial examples introduce changes to the input that leave its true label
unchanged, yet result in a different model prediction. Conversely,
"invariance-based" adversarial examples insert changes to the input that leave
the model's prediction unaffected despite the underlying input's label having
changed.
In this paper, we demonstrate that robustness to perturbation-based
adversarial examples is not only insufficient for general robustness, but
worse, it can also increase vulnerability of the model to invariance-based
adversarial examples. In addition to analytical constructions, we empirically
study vision classifiers with state-of-the-art robustness to perturbation-based
adversaries constrained by an $\ell_\infty$ norm. We mount attacks that exploit
excessive model invariance in directions relevant to the task, which are able
to find adversarial examples within the $\ell_\infty$ ball. In fact, we find that
classifiers trained to be $\ell_\infty$-norm robust are more vulnerable to
invariance-based adversarial examples than their undefended counterparts.
Excessive invariance is not limited to models trained to be robust to
perturbation-based $\ell_\infty$-norm adversaries. In fact, we argue that the term
adversarial example is used to capture a series of model limitations, some of
which may not have been discovered yet. Accordingly, we call for a set of
precise definitions that taxonomize and address each of these shortcomings in
learning. Comment: Accepted at the ICLR 2019 SafeML Workshop.
Rearchitecting Classification Frameworks For Increased Robustness
While generalizing well over natural inputs, neural networks are vulnerable
to adversarial inputs. Existing defenses against adversarial inputs have
largely been detached from the real world. These defenses also come at a cost
to accuracy. Fortunately, an object has invariances that constitute its
salient features; breaking them necessarily changes the perception of the
object. We find that applying these invariances to the classification task
makes robustness and accuracy feasible together. Two questions follow: how to
extract and model these invariances? and how to design a classification
paradigm that leverages these invariances to improve the robustness-accuracy
trade-off? The remainder of the paper discusses solutions to the aforementioned
questions.
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce the effectiveness of adversarial sample creation
from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be
explained by the fact that distillation leads gradients used in adversarial
sample creation to be reduced by a factor of 10^30. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
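A minimal PyTorch sketch of one defensive distillation training step as described above: the first network's softened probabilities at temperature T serve as labels for a second network trained at the same temperature (architectures, data, and the value of T are placeholders).

```python
# One training step of the distilled (second) network at temperature T.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, x, T=20.0):
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / T, dim=1)   # softened teacher probabilities
    log_probs = F.log_softmax(student(x) / T, dim=1)     # student at the same temperature
    loss = -(soft_labels * log_probs).sum(dim=1).mean()  # cross-entropy against soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time the distilled network is deployed with temperature 1, which is what shrinks the gradients an adversary relies on when crafting samples.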
MixMatch: A Holistic Approach to Semi-Supervised Learning
Semi-supervised learning has proven to be a powerful paradigm for leveraging
unlabeled data to mitigate the reliance on large labeled datasets. In this
work, we unify the current dominant approaches for semi-supervised learning to
produce a new algorithm, MixMatch, that works by guessing low-entropy labels
for data-augmented unlabeled examples and mixing labeled and unlabeled data
using MixUp. We show that MixMatch obtains state-of-the-art results by a large
margin across many datasets and labeled data amounts. For example, on CIFAR-10
with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by
a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a
dramatically better accuracy-privacy trade-off for differential privacy.
Finally, we perform an ablation study to tease apart which components of
MixMatch are most important for its success.
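A minimal PyTorch sketch of MixMatch's label-guessing step: average the model's predictions over K augmentations of an unlabeled batch and sharpen the result with temperature T. The `augment` callable is a placeholder for the paper's stochastic data augmentation, and the subsequent MixUp of labeled and unlabeled batches is omitted.

```python
# MixMatch-style label guessing for an unlabeled batch.
import torch
import torch.nn.functional as F

def guess_labels(model, x_unlabeled, augment, K=2, T=0.5):
    """Average predictions over K augmentations, then sharpen with temperature T."""
    with torch.no_grad():
        probs = torch.stack([
            F.softmax(model(augment(x_unlabeled)), dim=1) for _ in range(K)
        ]).mean(dim=0)
    sharpened = probs.pow(1.0 / T)
    return sharpened / sharpened.sum(dim=1, keepdim=True)
```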
