Updated: Nov 5, 2020
The advent and continual advancement of privacy-preserving technology has substantial potential for industry, government and the decentralised web. One of the most interesting applications is in the distributed computation of a shared machine-learning model over sensitive personal data.
Distributed, privacy-preserving machine-learning is, however, especially vulnerable to a specialised data poisoning attack which compromises the integrity or accuracy of the model by crafting malicious data-inputs. In this article, we explore the tensions between maintaining the privacy of inputs to a shared machine learning model and defending against poisoned data-sets.
This is Part I of II in a brief series about high-level challenges in privacy-preserving machine-learning. The next article will consider the worsening of the Black Box AI dilemma under these techniques.
“Federated learning is generically vulnerable to model poisoning (pg. 1)… [and with] secure aggregation, there is no way to detect that aggregation includes a malicious model, nor who submitted this model (pg. 6)” — Bagdasaryan et al.
Federated Machine Learning
Federated Machine Learning (FML) is a process by which a network of users collaboratively trains a shared learning model over their individual data without compromising the privacy or security of their contributions. Google has provided an accessible explanation and has been trialling the technology in production environments since 2017 (G-Board). We would also recommend reading the formal introduction to FML by Yang et al., which is the basis for much of this explanatory section.
A Two-Sentence Summary. Classical machine-learning enabled organisations to determine patterns in datasets from which predictions could be drawn, but this required that the data be pooled in a centralised location. FML distributes the data across a network of devices or sources and provides a way to determine those same patterns and predictions without pooling or revealing the input-data.
Horizontal and Vertical Federated Learning
The architecture of an FML network can be classified as either horizontal or vertical according to the distribution characteristics of the data.
Horizontal Federated Learning We use horizontal federated learning (HFL) when the expected data shares the same feature-space, but the related samples are substantially different, such as the data collected by the same application across many mobile devices. In 2017, Google researchers modelled the language of The Complete Works of William Shakespeare over a horizontally-partitioned FML architecture. Characteristically, sample-based learning follows a straightforward architecture which is reminiscent of the edge-fog-cloud data-processing structure. The architecture of an HFL network consists of four basic stages, where a set of randomly selected users will participate in a round of the protocol by:
1. Collecting, collating and cleaning their data;
2. Updating their local model with a differentially-private contribution;
3. Compiling and synthesising contributions amongst the network;
4. Sharing the updated model with the server and back to the network.
Once the model is returned to the network, the users continue to contribute to the shared model whilst enjoying the benefits of their local copy. We only really need to provide further explanations for the second and third stages.
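The round structure above can be made concrete with a minimal sketch of federated averaging over a toy one-parameter regression model. The function names, the one-step local update and the toy data are our own illustrative assumptions, not any production implementation:

```python
import random

def local_update(w, data, lr=0.01):
    """One gradient step on a one-parameter least-squares model,
    standing in for a user's local training pass."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, user_datasets, sample_size=3):
    """One round: sample users, collect local updates, average them (FedAvg)."""
    participants = random.sample(user_datasets, k=sample_size)
    updates = [local_update(global_w, d) for d in participants]
    return sum(updates) / len(updates)  # server-side aggregation

random.seed(0)
# Each user holds private samples of y = 2x plus noise (same feature-space: HFL).
users = [[(x, 2 * x + random.gauss(0, 0.1)) for x in range(1, 6)] for _ in range(10)]
w = 0.0
for _ in range(50):
    w = federated_round(w, users)
print(round(w, 2))  # converges near the true slope, 2.0
```

Note that no user ever reveals their raw samples to the server, only a model update; the privacy and aggregation machinery discussed next is what hardens this exchange.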
Stage Two: A Differentially-Private Contribution. Informally, a differentially-private contribution to the shared learning model guarantees that the contributed data cannot be extracted from the output model. The idea of a user-level differentially-private contribution is explained well in this Medium article, this Harvard entry, or the highly-recommended PrivacyBook by Dwork and Roth. Traditionally, a differentially-private algorithm will perturb the contributed data by way of a Laplace mechanism, the scale of which is commensurate with the ‘privacy-cost’ of an operation on the data-set. Critically, we can observe that the guarantees of differential-privacy are not absolute and will hold except for that which “can be inferred from the output of the computation” (Section 6.1 of Bonawitz et al). We won’t say much more on the subject, but suffice it to say we will thoroughly explore the idea in future work on this publication.
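As a small illustration of the Laplace mechanism mentioned above, the sketch below privatises a simple counting query; the particular sensitivity and epsilon values are our own choices, purely for demonstration:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value perturbed by Laplace noise of scale sensitivity/epsilon,
    giving epsilon-differential privacy for this single query."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5                      # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_value + noise

random.seed(42)
# A counting query has sensitivity 1: adding or removing one user's record
# changes the true answer by at most 1.
exact_count = 100
private_count = laplace_mechanism(exact_count, sensitivity=1, epsilon=0.5)
print(private_count)  # a noisy release; the noise scale here is 1 / 0.5 = 2
```

Lowering epsilon (a stricter privacy budget) widens the noise, which is exactly the accuracy-for-privacy trade the 'privacy-cost' language describes.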
Stage Three: The Compilation and Synthesis of Contributions The ‘compilation’ and ‘synthesis’ of the user-level contributions into a monolithic learning model are expected to provide particular security and privacy guarantees. Whilst the exact details are left to the particular implementation, we can generally expect the involvement of a Secure Aggregation (SA) or secure Multi-Party Computation (sMPC) layer. Google, for example, likely uses this variant of Secure Aggregation which relies on a Threshold Secret Sharing (TSS) scheme supported by a ‘double-masking’ of the user-level contributions. There isn’t much more to say here, except that FML is also exposed to the vulnerabilities of these components, which, in this case, is the potential for a malicious user to either selectively or massively corrupt the model by falsifying their shares. That said, we can begin to roughly understand and generalise the reasoning behind the security model.
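The masking idea behind Secure Aggregation can be shown with a deliberately simplified sketch: each pair of users agrees on a random mask that one adds and the other subtracts, so every mask cancels in the network-wide sum. A real protocol such as Bonawitz et al's derives these masks from key agreement and adds a second 'double mask' plus secret-sharing to survive dropouts; all names and values below are our own:

```python
import random

MOD = 2 ** 32  # all arithmetic is done modulo a fixed modulus

def pairwise_masks(n_users, seed=0):
    """Every pair (i, j) shares a random mask, standing in for a value
    derived from a key agreement between the two users."""
    rng = random.Random(seed)
    return {(i, j): rng.randrange(MOD)
            for i in range(n_users) for j in range(i + 1, n_users)}

def masked_upload(user, value, masks, n_users):
    """A user adds the mask for each pair where it holds the smaller index
    and subtracts it otherwise, so the masks cancel when everything is summed."""
    m = value
    for other in range(n_users):
        if user < other:
            m += masks[(user, other)]
        elif user > other:
            m -= masks[(other, user)]
    return m % MOD

values = [3, 14, 15, 9]  # each user's private contribution
masks = pairwise_masks(len(values))
uploads = [masked_upload(i, v, masks, len(values)) for i, v in enumerate(values)]
aggregate = sum(uploads) % MOD
print(aggregate)  # 41 — the server learns the sum, but no individual value
```

Notice that nothing in this scheme checks what the summed values are: a malicious user can upload a wildly falsified contribution and the aggregate will faithfully, and silently, absorb it.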
The Security and Privacy Model There isn’t any ‘general’ FML model to interrogate, but we can expect that, for now, the system is only realistically secure in the honest-but-curious model with a partially-trusted third-party. Often, we say that the model is sufficiently secure for commercialisation because we can guarantee the differential-privacy of user contributions, but this overlooks the potential for a malicious party to compromise the model via crafted data inputs. TSS, for example, is an extension of Shamir’s Secret Sharing, the output of which is intentionally sensitive to corruption of the constituent ‘shares’ as a consequence of the reliance on Lagrange Interpolation. We digress, and note that, whilst there are theoretical defences against such an obtuse corruption of the scheme, the security of FML is inexorably connected to the security of the underlying protocol.
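The sensitivity to corrupted shares is easy to demonstrate: Lagrange interpolation will happily reconstruct a wrong secret from a single falsified share, with no indication that anything is amiss. Below is a toy Shamir sharing with our own illustrative numbers:

```python
from fractions import Fraction

def lagrange_at_zero(shares):
    """Recover the secret f(0) from (x, y) shares by Lagrange interpolation."""
    secret = Fraction(0)
    for xi, yi in shares:
        term = Fraction(yi)
        for xj, _ in shares:
            if xj != xi:
                term *= Fraction(-xj, xi - xj)  # basis polynomial evaluated at 0
        secret += term
    return secret

# Shamir sharing of the secret 42 under f(x) = 42 + 5x.
honest = [(1, 47), (2, 52), (3, 57)]
corrupt = [(1, 47), (2, 53), (3, 57)]  # one share falsified by a single unit

print(lagrange_at_zero(honest))   # 42
print(lagrange_at_zero(corrupt))  # 39 — the reconstruction silently shifts
```

A one-unit lie in one share moved the reconstructed secret by three, and nothing in the interpolation flags the corruption; verifiable secret sharing schemes exist to detect this, but at additional cost.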
Vertical Federated Learning The network may also federate over a set of databases that share a sample-space through vertical federated learning (VFL). For example, there might be a collection of otherwise disparate databases which are commonly connected by the existence of a SampleID field, such as a passport or social security number. The “key challenge” for VFL researchers has been determining “how to enable local label information from one participant to be used for training” the global model without breaking the privacy guarantees of the protocol (Source). This has, as a consequence of the associated difficulties, resulted in a more complex architecture which relies on a partially-trusted third party.
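To make the vertical setting concrete, the toy sketch below aligns two feature-partitioned databases on a shared identifier. In practice this 'entity alignment' step is performed with a private set intersection so that neither party learns identifiers outside the overlap; the field names and values here are our own invention:

```python
# Two organisations hold different features for overlapping individuals,
# keyed by a shared identifier (toy stand-ins for, say, passport numbers).
bank = {"A1": {"income": 50_000}, "B2": {"income": 72_000}, "C3": {"income": 61_000}}
hospital = {"B2": {"visits": 4}, "C3": {"visits": 1}, "D4": {"visits": 7}}

# Entity alignment: only identifiers present on both sides can be trained over.
shared_ids = sorted(bank.keys() & hospital.keys())
aligned = {sid: {**bank[sid], **hospital[sid]} for sid in shared_ids}

print(shared_ids)     # ['B2', 'C3']
print(aligned["B2"])  # {'income': 72000, 'visits': 4}
```

In a real VFL protocol the joined rows are never materialised in one place as they are here; each party trains on its own columns and exchanges only encrypted intermediate values.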
The High-Level Architecture of a Horizontally-Partitioned FML Network: See Pg. 12:9 in Yang et al.

This is an area of FML which is still under substantial development, and whilst we won’t go further with our explanation, you can expect us to return to the idea in future publications. For our purposes, HFL is sufficient to explain the potential implications of data-poisoning, to which we now turn.
Data Poisoning
Generally, the effectiveness of a machine-learning model is directly and causally related to the quality and quantity of the data which informs the classification process. The adversarial example, or data poisoning attack, corrupts the integrity or accuracy of a model through maliciously crafted contributions to a data-set. Though data poisoning has been studied for a number of years in the context of centralised machine-learning, FML is uniquely exposed to the vulnerability by the absence of some critical controls. We focus on two important papers published over the past six months, which are worth reading and are our main references for this article: Sun et al (2020) and Bagdasaryan et al (2019).
A Before / After of a Poisoned Centralised ML Model: Biggio et al, 2012.
The ‘accuracy’ of a model is inversely related to the chance of a ‘classification error’ in an extrapolated prediction. For example, the model might erroneously classify a dog as a fox because the training data was poisoned by an adversary. The critical contribution of Sun et al is an algorithm (AT²FL) which computes the optimal attack strategy to corrupt the shared learning model, as well as the categorisation of direct and indirect attacks.
Direct and Indirect Attacks For Sun et al, a ‘direct’ attack is defined as the case where an attacker directly compromises a partition of nodes contributing to a shared learning model. On the other hand, an ‘indirect’ attack would involve corrupting the data such that subsequent contributors are poisoned and, in this way, the classification errors are propagated throughout the model. Unfortunately, the authors only provide a vague description of an indirect attack as an exploitation of the “communication protocol”, so our focus will be on the direct scenario.
Considering the Evaluation Parameters The experimental parameters in Sun et al considered a range of datasets which were sliced into a set of discrete partitions emulating the execution of an FML network. In this scenario, half of the nodes were corrupted for the purposes of the attack whilst the remainder would contribute honestly to the shared learning function. The variable of interest was the ratio of poisoned to clean data which the corrupted nodes would inject into the round contribution. These values ranged from 0% to 40%, and had a significant effect on the classification error, as reproduced in the diagram below:
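The effect of the poison-to-clean ratio can be reproduced in miniature. The sketch below is not the AT²FL algorithm, merely a naive label-flipping attack of our own construction against a nearest-centroid classifier, but it shows the classification error growing with the poisoning ratio:

```python
import random

def nearest_centroid_error(train, test):
    """Fit a nearest-centroid classifier on train and return its test error rate."""
    centroids = {label: sum(x for x, y in train if y == label) /
                        sum(1 for _, y in train if y == label)
                 for label in (0, 1)}
    wrong = sum(1 for x, y in test
                if min(centroids, key=lambda c: abs(x - centroids[c])) != y)
    return wrong / len(test)

def make(mu, label, n):
    """Draw n one-dimensional samples of a class centred at mu."""
    return [(random.gauss(mu, 1), label) for _ in range(n)]

random.seed(1)
test_set = make(0, 0, 200) + make(4, 1, 200)
results = {}
for ratio in (0.0, 0.2, 0.4):
    clean = make(0, 0, 100) + make(4, 1, 100)
    n_poison = int(ratio * len(clean))
    # The corrupted nodes flip the labels on their share of the contribution.
    poisoned = [(x, 1 - y) for x, y in clean[:n_poison]] + clean[n_poison:]
    results[ratio] = nearest_centroid_error(poisoned, test_set)
    print(ratio, round(results[ratio], 3))
```

Even this crude attack drags the class centroids toward each other, and the error rate climbs as the ratio of poisoned data increases.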
A figure extracted from Pg. 7 of Sun et al.
Whilst data poisoning in Sun et al was concerned with corrupting the totality of the shared learning model, Bagdasaryan et al consider an attack which compromises the model on a particular prediction or input. The corruption is a vulnerability placed in the model for future exploitation, colloquially referred to as a ‘backdoor’, which is the namesake of the paper.
A Visualisation of the ‘Backdoor’ Attack on Pg. 1 of Bagdasaryan et al.

Interestingly, they design the attack as a two-task learning problem, whereby the global model “learns the main task during normal training and the backdoor task only during the rounds when the attacker is selected” (pg. 5). One example of such a backdoor task may be the insertion of advertisement material in word-prediction software which is invoked on the usage of a particular keyword:
A Before / After Showing the Effects of a Backdoor into a Shared Learning Model on Pg. 7 of Bagdasaryan et al.
To this end, Bagdasaryan et al provide a technique of ‘model replacement’ which inserts the vulnerability into the shared learning model by falsifying the user-contribution. This is supported by a novel ‘constrain-and-scale’ mechanism which guards against ‘anomaly detection’ in the system. The idea is to leverage the constrain-and-scale stage of the attack to modify the malicious data to ‘blend in’ with the clean data amongst which it is aggregated.
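The core arithmetic of model replacement is simple to sketch: because federated averaging divides the summed updates by the number of participants, the attacker scales its submission by that same factor so the aggregate lands on the backdoored model. The scalar model below is our own toy reduction of the idea, without the constrain-and-scale machinery:

```python
def fedavg(global_w, updates, lr=1.0):
    """Server step: move the global weight toward the mean submitted update."""
    return global_w + lr * sum(u - global_w for u in updates) / len(updates)

def model_replacement(global_w, backdoor_w, n_participants, lr=1.0):
    """Scale the malicious submission so aggregation lands on backdoor_w,
    assuming the benign updates roughly cancel out around the global model."""
    gamma = n_participants / lr
    return gamma * (backdoor_w - global_w) + global_w

G = 1.0                               # current global model (a single scalar weight)
honest = [1.01, 0.99, 1.02, 0.98]     # benign updates hover around G
malicious = model_replacement(G, backdoor_w=5.0, n_participants=5)
new_global = fedavg(G, honest + [malicious])
print(new_global)  # lands on (or extremely near) the backdoored model, 5.0
```

The scaled submission is an obvious outlier, which is precisely why the paper pairs it with the constrain-and-scale mechanism to keep the malicious update within the anomaly-detection envelope.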
Considering the Effectiveness of the Backdoor Exploit The authors of the paper simulated the attack and, overall, the results showed significant promise in the capacity for even a single-shot attack to disrupt the shared learning model. On a network of 80,000 participants contributing to a shared word-prediction task, an attacker controlling only 8 participants was able to achieve 50% accuracy on the backdoor task (pg. 2). Extending our previous discussion, an attacker controlling 1% of the participants has an equivalent capacity to a traditional data-poisoning attacker with 20% of the nodes (pg. 8).
The highest-performing backdoor attempts were those that were distinct from the honest data-distribution and inserted in later rounds. This is because the persistence of the backdoor was determined by whether subsequent rounds would ‘forget’ the malicious learning. Overall, the attack would seem to be highly effective in subverting the integrity of the shared learning model.
The Silencing of the Poison Sniffer
You would expect that the inability to inspect individual contributions without leaking the privacy of those inputs would increase the susceptibility of the shared learning model to the exploits of an adversary. Interestingly, the perturbations associated with differentially-private learners imply a higher lower-bound on the costs of an attack, suggesting they enjoy a sort of ‘natural resistance’ to data-poisoning attacks. The problem is, instead, far more fundamental to both centralised machine-learning and the underlying, privacy-preserving protocols which are currently available.
The Performance of Defences Against Data Poisoning Using the Mean Squared Error (MSE). Source.
There are a range of defences responding to data-poisoning attacks on centralised machine-learning networks, and these can be classified as either ‘noise-resilient regression’ algorithms or ‘adversarially-resilient defences’ (Medium, ArXiv). Broadly speaking, noise-resilient regression algorithms tend towards a process of ‘sanitisation’ or ‘anomaly detection’ (such as via input perturbation), and are challenged by the existence of so-called ‘inliers’. These are poisoned data-points which fall within the standard distribution of ‘clean’ data inputs, and are difficult to ‘oust’ as ‘outliers’. Some ‘adversarially-resilient’ defences, such as TRIM, rely on the capacity to ‘embed’ resilience to data-poisoning attacks in the updates to the learning model. The frustration with embedding such a defence in the shared learning model is that it relies on a trusted party correctly applying the defence in the computation of their contribution. State-of-the-art FML networks, however, do not guarantee the correctness of the computation.
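The flavour of a TRIM-style defence can be sketched in a few lines: alternately fit the model and retain only the points it explains best, discarding likely poison. The one-parameter regression below is our own drastic simplification of the published algorithm:

```python
def fit_slope(points):
    """Least-squares slope of a line through the origin."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)

def trim_fit(points, n_keep, iters=10):
    """TRIM-style defence (sketch): alternately fit the model and keep only
    the n_keep points with the smallest residuals, discarding likely poison."""
    kept = points
    for _ in range(iters):
        w = fit_slope(kept)
        kept = sorted(points, key=lambda p: (p[1] - w * p[0]) ** 2)[:n_keep]
    return fit_slope(kept)

clean = [(x, 2 * x) for x in range(1, 11)]   # the true model is y = 2x
poison = [(x, -10 * x) for x in (3, 7)]      # in-range x values, wild y values
naive = fit_slope(clean + poison)
defended = trim_fit(clean + poison, n_keep=len(clean))
print(round(naive, 2), round(defended, 2))   # the trimmed fit recovers the slope
```

The catch, as the surrounding discussion notes, is that in FML this trimming would have to run over contributions the server is never allowed to inspect, which is exactly the tension between privacy and poisoning defences.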
Bagdasaryan et al observe in Section 6.1 that the problem of “how to [embed such a defence in the model] securely and efficiently is a difficult open problem”, and that these defences are ineffective until and “unless the secure aggregation protocol incorporates anomaly detection into aggregation”. To some extent, this has been ‘tucked away’ for future work, with Bonawitz et al qualifying on Page 10 that they “only show input privacy for honest users”, because “it is much harder to additionally guarantee correctness” so as to defend against malicious events such as data poisoning.
This means that, at least for now, commercial FML networks are unable to prevent an attacker from deviating and breaking the correctness of the computation to overcome defences in the shared learning model. On a positive note, we will eventually be able to ‘bake in’ these defences to the underlying privacy-preserving protocol, and it is likely only a matter of time until we can all enjoy secure privacy-preserving machine-learning.
You can find these articles and series on our publication page.