A Marauder's Map of Security and Privacy in Machine Learning
There is growing recognition that machine learning (ML) exposes new security
and privacy vulnerabilities in software systems, yet the technical community's
understanding of the nature and extent of these vulnerabilities remains limited
but expanding. In this talk, we explore the threat model space of ML algorithms
through the lens of Saltzer and Schroeder's principles for the design of secure
computer systems. This characterization of the threat space prompts an
investigation of current and future research directions. We structure our
discussion around three of these directions, which we believe are likely to
lead to significant progress. The first encompasses a spectrum of approaches to
verification and admission control, which is a prerequisite to enable fail-safe
defaults in machine learning systems. The second seeks to design mechanisms for
assembling reliable records of compromise that would help understand the degree
to which vulnerabilities are exploited by adversaries, as well as favor
psychological acceptability of machine learning applications. The third pursues
formal frameworks for security and privacy in machine learning, which we argue
should strive to align machine learning goals such as generalization with
security and privacy desiderata like robustness or privacy. Key insights
resulting from these three directions pursued both in the ML and security
communities are identified, and the effectiveness of approaches is related to
structural elements of ML algorithms and the data used to train them. We
conclude by systematizing best practices in our community. Comment: This report
summarizes the keynote presented by the author in October 2018 at AISec
(co-located with ACM CCS) on security and privacy in machine learning.
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Deep neural networks (DNNs) enable innovative applications of machine
learning like image recognition, machine translation, or malware detection.
However, deep learning is often criticized for its lack of robustness in
adversarial settings (e.g., vulnerability to adversarial inputs) and general
inability to rationalize its predictions. In this work, we exploit the
structure of deep learning to enable new learning-based inference and decision
strategies that achieve desirable properties such as robustness and
interpretability. We take a first step in this direction and introduce the Deep
k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest
neighbors algorithm with representations of the data learned by each layer of
the DNN: a test input is compared to its neighboring training points according
to the distance that separates them in the representations. We show the labels
of these neighboring points afford confidence estimates for inputs outside the
model's training manifold, including on malicious inputs like adversarial
examples--and thereby provide protection against inputs that are outside the
model's understanding. This is because the nearest neighbors can be used to
estimate the nonconformity of, i.e., the lack of support for, a prediction in
the training data. The neighbors also constitute human-interpretable
explanations of predictions. We evaluate the DkNN algorithm on several
datasets, and show the confidence estimates accurately identify inputs outside
the model, and that the explanations provided by nearest neighbors are
intuitive and useful in understanding model failures.
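For readers who want to map the description above onto code, the following is a minimal NumPy sketch of a DkNN-style inference step. It assumes the per-layer training representations (`train_feats`), the nonconformity scores of a held-out calibration set (`calib_scores`), and the query's per-layer features (`query_feats`) have already been computed; the helper names are illustrative, and this is not the authors' implementation.

```python
# Hypothetical helper names; a minimal NumPy sketch, not the authors' code.
import numpy as np

def knn_labels(feats, labels, query, k):
    """Labels of the k training points closest to `query` in one layer's space."""
    dists = np.linalg.norm(feats - query, axis=1)
    return labels[np.argsort(dists)[:k]]

def dknn_predict(train_feats, train_labels, calib_scores, query_feats,
                 n_classes, k=5):
    # Gather the labels of the k nearest training points in every layer.
    neighbors = np.concatenate([
        knn_labels(f, train_labels, q, k)
        for f, q in zip(train_feats, query_feats)
    ])
    # Nonconformity of class j: number of neighbors whose label disagrees with j.
    alpha = np.array([(neighbors != j).sum() for j in range(n_classes)])
    # Empirical p-value: fraction of calibration scores at least as large.
    p_values = np.array([(calib_scores >= a).mean() for a in alpha])
    prediction = int(p_values.argmax())
    credibility = float(p_values.max())  # support for the prediction in the training data
    return prediction, credibility
```

The returned credibility is the largest empirical p-value, which corresponds to the confidence estimate the abstract describes for inputs outside the training manifold.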
On the Effectiveness of Defensive Distillation
We report experimental results indicating that defensive distillation
successfully mitigates adversarial samples crafted using the fast gradient sign
method, in addition to those crafted using the Jacobian-based iterative attack
on which the defense mechanism was originally evaluated. Comment: Technical Report.
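As a point of reference for the attack named above, here is a minimal PyTorch sketch of the fast gradient sign method; it is a generic illustration rather than the exact experimental setup of the report.

```python
# A generic illustration of the fast gradient sign method (PyTorch).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """Perturb `x` by eps in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```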
Extending Defensive Distillation
Machine learning is vulnerable to adversarial examples: inputs carefully
modified to force misclassification. Designing defenses against such inputs
remains largely an open problem. In this work, we revisit defensive
distillation---which is one of the mechanisms proposed to mitigate adversarial
examples---to address its limitations. We view our results not only as an
effective way of addressing some of the recently discovered attacks but also as
reinforcing the importance of improved training techniques.
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Many machine learning models are vulnerable to adversarial examples: inputs
that are specially crafted to cause a machine learning model to produce an
incorrect output. Adversarial examples that affect one model often affect
another model, even if the two models have different architectures or were
trained on different training sets, so long as both models were trained to
perform the same task. An attacker may therefore train their own substitute
model, craft adversarial examples against the substitute, and transfer them to
a victim model, with very little information about the victim. Recent work has
further developed a technique that uses the victim model as an oracle to label
a synthetic training set for the substitute, so the attacker need not even
collect a training set to mount the attack. We extend these recent techniques
using reservoir sampling to greatly enhance the efficiency of the training
procedure for the substitute model. We introduce new transferability attacks
between previously unexplored (substitute, victim) pairs of machine learning
model classes, most notably SVMs and decision trees. We demonstrate our attacks
on two commercial machine learning classification systems from Amazon (96.19%
misclassification rate) and Google (88.94%) using only 800 queries of the
victim model, thereby showing that existing machine learning approaches are in
general vulnerable to systematic black-box attacks regardless of their
structure.
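The abstract credits reservoir sampling for the improved efficiency of substitute training; the following is a minimal Python sketch of that sampling primitive only, with the oracle-labeling and query-budget logic of the attack omitted.

```python
# Classic reservoir sampling (Algorithm R); attack-specific details omitted.
import random

def reservoir_sample(stream, capacity):
    """Keep a uniform random sample of at most `capacity` items from `stream`."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < capacity:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # inclusive on both ends
            if j < capacity:
                reservoir[j] = item
    return reservoir
```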
Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
We explore and expand the soft nearest neighbor loss to measure the
entanglement of class manifolds in representation space: i.e.,
how close pairs of points from the same class are relative to pairs of points
from different classes. We demonstrate several use cases of the loss. As an
analytical tool, it provides insights into the evolution of class similarity
structures during learning. Surprisingly, we find that maximizing
the entanglement of representations of different classes in the hidden layers
is beneficial for discrimination in the final layer, possibly because it
encourages representations to identify class-independent similarity structures.
Maximizing the soft nearest neighbor loss in the hidden layers leads not only
to improved generalization but also to better-calibrated estimates of
uncertainty on outlier data. Data that is not from the training distribution
can be recognized by observing that in the hidden layers, it has fewer than the
normal number of neighbors from the predicted class.
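Below is a minimal PyTorch sketch of the soft nearest neighbor loss on one batch of hidden representations, following the standard formulation with temperature T as a free parameter; it is not necessarily the authors' exact implementation.

```python
# A sketch of the soft nearest neighbor loss for one batch of representations.
import torch

def soft_nearest_neighbor_loss(feats, labels, T=1.0, eps=1e-8):
    """Large when points from different classes are entangled in `feats` space."""
    dists = torch.cdist(feats, feats).pow(2)          # pairwise squared distances
    sims = torch.exp(-dists / T)
    sims = sims * (1.0 - torch.eye(feats.size(0), device=feats.device))  # drop self-pairs
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    numerator = (sims * same_class).sum(dim=1)        # mass on same-class neighbors
    denominator = sims.sum(dim=1)                     # mass on all neighbors
    return -torch.log(numerator / (denominator + eps) + eps).mean()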
Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
Adversarial examples are malicious inputs crafted to cause a model to
misclassify them. Their most common instantiation, "perturbation-based"
adversarial examples introduce changes to the input that leave its true label
unchanged, yet result in a different model prediction. Conversely,
"invariance-based" adversarial examples insert changes to the input that leave
the model's prediction unaffected despite the underlying input's label having
changed.
In this paper, we demonstrate that robustness to perturbation-based
adversarial examples is not only insufficient for general robustness, but
worse, it can also increase vulnerability of the model to invariance-based
adversarial examples. In addition to analytical constructions, we empirically
study vision classifiers with state-of-the-art robustness to perturbation-based
adversaries constrained by an $\ell_\infty$ norm. We mount attacks that exploit
excessive model invariance in directions relevant to the task, which are able
to find adversarial examples within the $\ell_\infty$ ball. In fact, we find that
classifiers trained to be $\ell_\infty$-norm robust are more vulnerable to
invariance-based adversarial examples than their undefended counterparts.
Excessive invariance is not limited to models trained to be robust to
perturbation-based $\ell_\infty$-norm adversaries. In fact, we argue that the term
adversarial example is used to capture a series of model limitations, some of
which may not have been discovered yet. Accordingly, we call for a set of
precise definitions that taxonomize and address each of these shortcomings in
learning. Comment: Accepted at the ICLR 2019 SafeML Workshop.
Rearchitecting Classification Frameworks For Increased Robustness
While generalizing well over natural inputs, neural networks are vulnerable
to adversarial inputs. Existing defenses against adversarial inputs have
largely been detached from the real world. These defenses also come at a cost
to accuracy. Fortunately, an object has invariances that constitute its
salient features; breaking them necessarily changes the perception of the
object. We find that applying these invariances to the classification task
makes robustness and accuracy feasible together. Two questions follow: how to
extract and model these invariances? and how to design a classification
paradigm that leverages these invariances to improve the robustness-accuracy
trade-off? The remainder of the paper discusses solutions to the aforementioned
questions.
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce the effectiveness of adversarial sample creation
from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be
explained by the fact that distillation leads gradients used in adversarial
sample creation to be reduced by a factor of 10^30. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
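A minimal PyTorch sketch of one defensive distillation training step as described above: the first network's softened probabilities at temperature T serve as labels for a second network trained at the same temperature (architectures, data, and the value of T are placeholders).

```python
# One training step of the distilled (second) network at temperature T.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, x, T=20.0):
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / T, dim=1)   # softened teacher probabilities
    log_probs = F.log_softmax(student(x) / T, dim=1)     # student at the same temperature
    loss = -(soft_labels * log_probs).sum(dim=1).mean()  # cross-entropy against soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time the distilled network is deployed with temperature 1, which is what shrinks the gradients an adversary relies on when crafting samples.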
MixMatch: A Holistic Approach to Semi-Supervised Learning
Semi-supervised learning has proven to be a powerful paradigm for leveraging
unlabeled data to mitigate the reliance on large labeled datasets. In this
work, we unify the current dominant approaches for semi-supervised learning to
produce a new algorithm, MixMatch, that works by guessing low-entropy labels
for data-augmented unlabeled examples and mixing labeled and unlabeled data
using MixUp. We show that MixMatch obtains state-of-the-art results by a large
margin across many datasets and labeled data amounts. For example, on CIFAR-10
with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by
a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a
dramatically better accuracy-privacy trade-off for differential privacy.
Finally, we perform an ablation study to tease apart which components of
MixMatch are most important for its success.
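A minimal PyTorch sketch of MixMatch's label-guessing step: average the model's predictions over K augmentations of an unlabeled batch and sharpen the result with temperature T. The `augment` callable is a placeholder for the paper's stochastic data augmentation, and the subsequent MixUp of labeled and unlabeled batches is omitted.

```python
# MixMatch-style label guessing for an unlabeled batch.
import torch
import torch.nn.functional as F

def guess_labels(model, x_unlabeled, augment, K=2, T=0.5):
    """Average predictions over K augmentations, then sharpen with temperature T."""
    with torch.no_grad():
        probs = torch.stack([
            F.softmax(model(augment(x_unlabeled)), dim=1) for _ in range(K)
        ]).mean(dim=0)
    sharpened = probs.pow(1.0 / T)
    return sharpened / sharpened.sum(dim=1, keepdim=True)
```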
