Towards Practical Privacy-Preserving Protocols
Protecting users' privacy in digital systems becomes more complex and challenging over time, as the amount of stored and exchanged data grows steadily and systems become increasingly complex and interconnected. Two techniques that address this issue are Secure Multi-Party Computation (MPC) and Private Information Retrieval (PIR), which aim to enable practical computation while keeping sensitive data private. In this thesis we present results showing how real-world applications can be executed in a privacy-preserving way. This is not only desired by users of such applications, but since 2018 also rests on a strong legal foundation with the General Data Protection Regulation (GDPR) in the European Union, which requires companies to protect the privacy of user data by design.
This thesis' contributions are split into three parts and can be summarized as follows:
MPC Tools
Generic MPC requires in-depth background knowledge of a complex research field. To address this, we provide tools that are efficient and usable at the same time and serve as a foundation for follow-up work, as they allow cryptographers, researchers, and developers to implement, test, and deploy MPC applications. We provide an implementation framework that abstracts from the underlying protocols, optimized building blocks generated with hardware synthesis tools, and support for the direct processing of Hardware Description Languages (HDLs). Finally, we present an automated compiler for efficient hybrid protocols from ANSI C.
MPC Applications
MPC was for a long time deemed too expensive to be used in practice. We show several use cases of real-world applications that can operate in a privacy-preserving, yet practical way when engineered properly and built on top of suitable MPC protocols. The use cases presented in this thesis come from the domain of route computation using BGP, both on the Internet at large and at Internet Exchange Points (IXPs). In both cases our protocols protect the sensitive business information that is used to determine routing decisions. Another use case focuses on genomics, which is particularly critical as the human genome is tied to its owner for their entire lifespan and cannot be altered. Our system enables federated genomic databases, where several institutions can privately outsource their genome data and where research institutes can query this data in a privacy-preserving manner.
PIR and Applications
Privately retrieving data from a database is a crucial requirement for user privacy and metadata protection, and is enabled, among other techniques, by Private Information Retrieval (PIR). We present improvements and a generalization of the well-known multi-server PIR scheme of Chor et al., along with an implementation and evaluation thereof. We also design and implement an efficient anonymous messaging system built on top of PIR. Furthermore, we provide a scalable solution for private contact discovery that utilizes ideas from efficient two-server PIR built from Distributed Point Functions (DPFs) in combination with Private Set Intersection (PSI).
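To make the underlying idea concrete, here is a minimal sketch of the classic two-server PIR scheme of Chor et al. that the paragraph above builds on (this illustrates only the basic scheme, not the thesis' generalization): the client sends each server a random-looking bit vector, the two vectors differ only at the queried index, and XORing the two answers recovers the record while neither server alone learns anything about the index.

```python
# Toy 2-server Chor-style PIR over a tiny byte database.
import secrets

def client_queries(db_size: int, index: int):
    # q0 is a uniformly random bit vector; q1 flips only the queried index.
    q0 = [secrets.randbelow(2) for _ in range(db_size)]
    q1 = list(q0)
    q1[index] ^= 1
    return q0, q1

def server_answer(db: list[int], query: list[int]) -> int:
    # Each server XORs together the records its query vector selects.
    ans = 0
    for record, bit in zip(db, query):
        if bit:
            ans ^= record
    return ans

db = [0x11, 0x22, 0x33, 0x44]        # public replicated database
q0, q1 = client_queries(len(db), 2)  # client wants record 2
a0 = server_answer(db, q0)           # answer from server 0
a1 = server_answer(db, q1)           # answer from server 1
recovered = a0 ^ a1                  # all terms except db[2] cancel
```

Because each query vector on its own is uniformly random, a single (non-colluding) server learns nothing about which record was requested.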
Lessons Learned: Defending Against Property Inference Attacks
This work investigates and evaluates multiple defense strategies against property inference attacks (PIAs), a privacy attack against machine learning models. Given a trained machine learning model, PIAs aim to extract statistical properties of its underlying training data, e.g., reveal the ratio of men and women in a medical training data set. While a lot of research on defense mechanisms has been published for other privacy attacks like membership inference, this is the first work focusing on defending against PIAs. With the primary goal of developing a generic mitigation strategy against white-box PIAs, we propose the novel approach property unlearning. Extensive experiments with property unlearning show that while it is very effective when defending target models against specific adversaries, property unlearning is not able to generalize, i.e., protect against a whole class of PIAs. To investigate the reasons behind this limitation, we present the results of experiments with the explainable AI tool LIME. They show how state-of-the-art property inference adversaries with the same objective focus on different parts of the target model. We further elaborate on this with a follow-up experiment, in which we use the visualization technique t-SNE to exhibit how severely statistical training data properties are manifested in machine learning models. Based on this, we develop the conjecture that post-training techniques like property unlearning might not suffice to provide the desired generic protection against PIAs. As an alternative, we investigate the effects of simpler training-data preprocessing methods, like adding Gaussian noise to the images of a training data set, on the success rate of PIAs. We conclude with a discussion of the different defense approaches, summarize the lessons learned, and provide directions for future work.
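The preprocessing defense mentioned above can be sketched in a few lines; this is an illustrative example of the general technique, not the paper's exact pipeline, and the `sigma` parameter is an assumption:

```python
# Perturb training images with zero-mean Gaussian noise before training,
# so that fine-grained statistical properties of the data set are harder
# for a property inference adversary to recover from the trained model.
import numpy as np

def add_gaussian_noise(images: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise and clip back to the valid pixel range."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(loc=0.0, scale=sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

batch = np.random.rand(8, 28, 28)               # e.g. 8 grayscale images in [0, 1]
noisy_batch = add_gaussian_noise(batch, sigma=0.1)
```

Larger `sigma` values blur the statistical signal more strongly, at the cost of the target model's accuracy, which is exactly the utility/privacy trade-off such defenses must balance.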
Association of asthma severity and educational attainment at age 6-7 years in a birth cohort : Population based record-linkage study
Acknowledgments: We are very grateful to David Fone for help in the concept of the study. SP received a Translational Health Research Platform Award from the National Institute for Social Care and Health Research (grant reference: TPR08-15 006) for the development of WECC. JD was supported by The Centre for the Development and Evaluation of Complex Interventions for Public Health Improvement (DECIPHer), a UKCRC Public Health Research: Centre of Excellence. Funding from the British Heart Foundation, Cancer Research UK, Economic and Social Research Council (RES-590-28-0005), Medical Research Council, the Welsh Government and the Wellcome Trust (WT087640MA), under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged. SP was an applicant in the Centre for the Improvement of Population Health through E-records Research (CIPHER), one of four UK e-health Informatics Research Centres funded by a joint investment from: Arthritis Research UK, the British Heart Foundation, Cancer Research UK, the Chief Scientist Office (Scottish Government Health Directorates), the Economic and Social Research Council, the Engineering and Physical Sciences Research Council, the Medical Research Council, the National Institute for Health Research, the National Institute for Social Care and Health Research (Welsh Government) and the Wellcome Trust (grant reference: MR/K006525/1). Peer reviewed.
MOTION - A Framework for Mixed-Protocol Multi-Party Computation
We present MOTION, an efficient and generic open-source framework for mixed-protocol secure multi-party computation (MPC). MOTION is built in a user-friendly, modular, and extensible way, intended to be used as a tool in MPC research and to increase the adoption of MPC protocols in practice. Our framework incorporates several important engineering decisions such as full communication serialization, which enables MPC over arbitrary messaging interfaces and removes the need to own network sockets. MOTION also incorporates several novel performance optimizations that improve the communication complexity and latency, e.g., 2x better online round complexity of precomputed correlated Oblivious Transfer (OT).
We instantiate our framework with protocols for N parties and security against up to N-1 passive corruptions: the MPC protocols of Goldreich-Micali-Wigderson (GMW) in their arithmetic and Boolean versions and OT-based BMR (Ben-Efraim et al., CCS'16), as well as novel and highly efficient conversions between them, including a non-interactive conversion from BMR to arithmetic GMW.
MOTION is highly efficient, which we demonstrate in our experiments. For secure evaluation of AES-128 with N=3 parties in a high-latency network with OT-based BMR, we achieve a 16x better throughput of 16 AES evaluations per second. With this, we show that BMR is much more competitive than previously assumed. For N=3 parties and full-threshold protocols in a LAN, MOTION is 10x-18x faster than the previous best passively secure implementation from the MP-SPDZ framework, and 190x-586x faster than the actively secure SCALE-MAMBA framework. Finally, we show that our framework is highly efficient for privacy-preserving neural network inference.
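The Boolean GMW protocol that MOTION implements can be illustrated with a minimal two-party sketch (this is a conceptual toy, not MOTION's actual API): wire values are XOR-shared between the parties, XOR gates are evaluated locally, and each AND gate consumes a precomputed multiplication ("Beaver") triple, which MOTION generates via oblivious transfer.

```python
# Toy 2-party Boolean GMW evaluation with Beaver triples.
import secrets

def share(bit: int):
    """Split a bit into two XOR shares, one per party."""
    s0 = secrets.randbelow(2)
    return s0, bit ^ s0

def xor_gate(x, y):
    # Free: each party XORs its own shares locally, no communication.
    return (x[0] ^ y[0], x[1] ^ y[1])

def make_triple():
    # Beaver triple (a, b, c) with c = a AND b, given as XOR shares.
    a_bit, b_bit = secrets.randbelow(2), secrets.randbelow(2)
    return share(a_bit), share(b_bit), share(a_bit & b_bit)

def and_gate(x, y, triple):
    a, b, c = triple
    # The parties open d = x ^ a and e = y ^ b; since a and b are random,
    # the opened values reveal nothing about x and y.
    d = x[0] ^ a[0] ^ x[1] ^ a[1]
    e = y[0] ^ b[0] ^ y[1] ^ b[1]
    z0 = c[0] ^ (d & b[0]) ^ (e & a[0]) ^ (d & e)  # only party 0 adds d*e
    z1 = c[1] ^ (d & b[1]) ^ (e & a[1])
    return z0, z1

x, y = share(1), share(1)
z = and_gate(x, y, make_triple())
assert z[0] ^ z[1] == 1          # reconstruct: 1 AND 1 = 1
```

The same structure generalizes to N parties with N-1 passive corruptions, which is the setting MOTION targets.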
Privacy-Compliant Sharing of Insured Persons' Data from the Research Data Center (Forschungsdatenzentrum)
The demand for broad availability of medical data for research purposes is growing steadily. These enormous amounts of data offer great potential, especially for big data methods. In legislation, this need is to be met by a research data center (Forschungsdatenzentrum), which is regulated in the Digital Healthcare Act (Digitale-Versorgung-Gesetz, DVG). However, this raises a number of questions regarding data protection: although the data are to be provided in pseudonymized or anonymized form, a risk of re-identification may still remain. This work analyzes the existing legal situation and briefly compares it with international regulations on the treatment of anonymous data. Based on this analysis, an extension of the research data center is sketched that, with the help of privacy-preserving technologies, can enable a privacy-compliant sharing of insured persons' data.
MP2ML: A Mixed-Protocol Machine Learning Framework for Private Inference
Privacy-preserving machine learning (PPML) has many applications, from medical image classification and anomaly detection to financial analysis. nGraph-HE enables data scientists to perform private inference of deep learning (DL) models trained using popular frameworks such as TensorFlow. nGraph-HE computes linear layers using the CKKS homomorphic encryption (HE) scheme. The non-polynomial activation functions, such as MaxPool and ReLU, are evaluated in the clear by the data owner who obtains the intermediate feature maps. This leaks the feature maps to the data owner from which it may be possible to deduce the DL model weights. As a result, such protocols may not be suitable for deployment, especially when the DL model is intellectual property.
In this work, we present MP2ML, a machine learning framework that integrates nGraph-HE and the secure two-party computation framework ABY to overcome the limitation of leaking the intermediate feature maps to the data owner. We introduce a novel scheme for the conversion between CKKS and secure multi-party computation to execute DL inference while maintaining the privacy of both the input data and the model weights. MP2ML is compatible with popular DL frameworks such as TensorFlow and can infer pre-trained neural networks with native ReLU activations. We benchmark MP2ML on the CryptoNets network with ReLU activations, on which it achieves a throughput of 33.3 images/s and an accuracy of 98.6%. This throughput matches the previous state-of-the-art work, even though our protocol is more accurate and scalable.
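A standard way to convert a homomorphically encrypted value into additive secret shares, which is the general kind of bridge between HE and MPC that such hybrid frameworks need, can be sketched as follows. Plain modular arithmetic stands in for CKKS here; this is an illustration of the masking idea, not MP2ML's actual conversion scheme:

```python
# HE-to-secret-share conversion via random masking (toy version):
# the model owner homomorphically adds a random mask r to the encrypted
# value x; the data owner decrypts x + r. The two parties then hold
# additive shares (x + r, -r) of x for use in the MPC phase, and the
# data owner sees only a uniformly random masked value, not x itself.
import secrets

Q = 2**16  # toy plaintext modulus

def he_to_shares(x: int):
    r = secrets.randbelow(Q)          # model owner's random mask
    masked = (x + r) % Q              # computed under encryption in reality
    share_data_owner = masked         # obtained by decrypting; hides x
    share_model_owner = (-r) % Q
    return share_data_owner, share_model_owner

s0, s1 = he_to_shares(1234)
assert (s0 + s1) % Q == 1234
```

After the MPC phase (e.g., a secure ReLU), the shares can be re-encrypted and summed under HE to continue with the next linear layer.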
Secure Similar Sequence Query on Outsourced Genomic Data
The growing availability of genomic data is unlocking research potential in genomic-data analysis. It is of great importance to outsource genomic-analysis tasks to clouds to leverage their powerful computational resources over large-scale genomic sequences. However, the remote placement of the data raises personal-privacy concerns, and it is challenging to evaluate data-analysis functions on outsourced genomic data securely and efficiently. In this work, we study the secure similar-sequence-query (SSQ) problem over outsourced genomic data, which has not been fully investigated. To address the challenges of security and efficiency, we propose two protocols in mixed form, which combine two-party secure secret sharing, garbled circuits, and partially homomorphic encryption, and use them jointly to fulfill the secure SSQ function. In addition, our protocols support multi-user queries over a joint genomic data set collected from multiple data owners, making our solution scalable. We formally prove the security of the protocols under the semi-honest adversary model and theoretically analyze their performance. We use extensive experiments over a real-world dataset on a commercial cloud platform to validate the efficacy of our proposed solution, and demonstrate the performance improvements compared with state-of-the-art works.
Ferret: Fast Extension for coRRElated oT with small communication
Correlated oblivious transfer (COT) is a crucial building block for secure multi-party computation (MPC) and can be generated efficiently via OT extension. Recent works based on the pseudorandom correlation generator (PCG) paradigm presented a new way to generate random COT correlations using only communication sublinear to the output length. However, due to their high computational complexity, these protocols are only faster than the classical IKNP-style OT extension under restricted network bandwidth.
In this paper, we propose new COT protocols in the PCG paradigm that achieve unprecedented performance. With 50 Mbps network bandwidth, our maliciously secure protocol can produce one COT correlation in 22 nanoseconds. More specifically, our results are summarized as follows:
- We propose a semi-honest COT protocol with sublinear communication and linear computation. This protocol assumes primal-LPN and is built upon a recent VOLE protocol with semi-honest security by Schoppmann et al. (CCS 2019). We are able to apply various optimizations to reduce its communication cost by roughly 15x, not counting a one-time setup cost that diminishes as we generate more COTs.
- We strengthen our COT protocol to malicious security with no loss of efficiency. Among other optimizations, our new protocol features a new checking technique that ensures correctness and consistency essentially for free. In particular, our maliciously secure protocol is only 1-3 nanoseconds slower for each COT.
- We implemented our protocols, and the code will be publicly available at EMP-toolkit. We observe at least 9x improvement in running time compared to the state-of-the-art protocol by Boyle et al. (CCS 2019) in both semi-honest and malicious settings under any network faster than 50 Mbps.
With this new record of efficiency for generating COT correlations, we anticipate that new protocol designs and optimizations will flourish on top of our protocol.
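The correlated OT object that Ferret produces at scale can be made concrete with a trusted-dealer toy (this illustrates only the correlation itself, not Ferret's PCG-based protocol): the sender holds a fixed global offset Delta and a random message m0, and the receiver holds a choice bit b together with m_b = m0 XOR (b * Delta).

```python
# Trusted-dealer sketch of a single COT correlation with a global offset.
import secrets

def deal_cot(delta: int, nbits: int = 128):
    m0 = secrets.randbits(nbits)
    b = secrets.randbelow(2)
    mb = m0 ^ (delta if b else 0)        # receiver's message m_b
    return (m0, m0 ^ delta), (b, mb)     # sender's pair, receiver's view

delta = secrets.randbits(128)            # one Delta shared by all COTs
sender, receiver = deal_cot(delta)
b, mb = receiver
assert mb == sender[b]
```

In real protocols no dealer exists; the whole point of Ferret is to let the two parties generate huge batches of exactly this correlation themselves with communication sublinear in the output length.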
QUOTIENT: Two-Party Secure Neural Network Training and Prediction
Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can also be trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms, or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state of the art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy.
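The discretization idea behind such training methods can be illustrated with a simple ternarization step; the threshold rule below is an assumption for illustration, not QUOTIENT's exact quantization scheme:

```python
# Map real-valued weights to ternary values {-1, 0, +1}. Inside a secure
# two-party protocol, multiplying by such weights reduces to additions,
# subtractions, and skips, which are far cheaper than full fixed-point
# multiplications.
import numpy as np

def ternarize(weights: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Keep the sign of large-magnitude weights, zero out the rest."""
    t = np.zeros_like(weights, dtype=np.int8)
    t[weights > threshold] = 1
    t[weights < -threshold] = -1
    return t

w = np.array([0.3, -0.01, -0.4, 0.02])
print(ternarize(w))                  # -> [ 1  0 -1  0]
```

The accuracy cost of such aggressive discretization is what makes co-designing the training algorithm with the protocol, rather than quantizing after the fact, attractive.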
