Combining differential privacy and homomorphic encryption for privacy-preserving collaborative machine learning (PhD thesis)
Published:
The purpose of this PhD is to design protocols for collaboratively training machine learning models while keeping the training data private. To do so, we focus on two privacy tools: differential privacy and homomorphic encryption. Differential privacy makes it possible to deliver a functional model that is immune to attacks by end-users on the privacy of the training data, while homomorphic encryption makes it possible to use a server as a totally blind intermediary between the data owners, one that provides computational resources without any access to cleartext information. Yet these two techniques are of totally different natures, and each entails its own constraints, which may interfere: differential privacy generally requires continuous and unbounded noise, whereas homomorphic encryption can only handle numbers encoded with a quite limited number of bits. The presented contributions make these two privacy tools work together by coping with their interferences and even leveraging them, so that the two techniques may benefit from each other. In our first work, SPEED, we build on the Private Aggregation of Teacher Ensembles (PATE) framework and extend the threat model to deal with an honest-but-curious server by covering the server computations with a homomorphic layer. We carefully define which operations are performed homomorphically, so as to do as little computation as possible in the costly encrypted domain while revealing little enough information in the clear for it to be easily protected by differential privacy. This trade-off forced us to perform an argmax operation in the encrypted domain, which, although feasible, remained expensive. That is why we propose SHIELD in another contribution: an argmax operator made inaccurate on purpose, both to satisfy differential privacy and to lighten the homomorphic computation. The last presented contribution combines differential privacy and homomorphic encryption to secure a federated learning protocol.
The main challenge of this combination is that the encryption induces a quantisation of the noise, which complicates the differential privacy analysis and justifies the design and use of a novel quantisation operator that commutes with the aggregation.
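For readers unfamiliar with PATE, the aggregation step that SPEED builds on can be illustrated with a minimal cleartext sketch. It is not the SPEED or SHIELD protocol and involves no encryption: it only shows the standard noisy argmax over teacher votes, assuming Laplace noise of scale 1/gamma as in the original PATE formulation. The names `laplace`, `noisy_argmax` and `gamma`, and the toy vote vector, are illustrative assumptions and are not taken from the thesis.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_argmax(teacher_votes, num_classes, gamma, rng=None):
    """Return the class whose Laplace-perturbed vote count is largest.

    teacher_votes: one predicted class index per teacher model.
    gamma: privacy parameter; the noise scale is 1 / gamma (assumption:
           PATE-style Laplace noise, not the thesis's exact mechanism).
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(teacher_votes)
    noisy_counts = [
        counts.get(c, 0) + laplace(1.0 / gamma, rng)
        for c in range(num_classes)
    ]
    # Only the winning label is released, never the counts themselves.
    return max(range(num_classes), key=noisy_counts.__getitem__)


# Toy example: ten teachers vote on a three-class problem.
votes = [2, 2, 2, 0, 2, 1, 2, 2, 0, 2]
label = noisy_argmax(votes, num_classes=3, gamma=0.5, rng=random.Random(0))
```

In this sketch the noise is a continuous, unbounded real number, which is exactly what a homomorphic scheme working on a limited number of bits cannot represent directly. That mismatch is the interference the abstract describes.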
Recommended citation: Grivet Sébert, A. (2023). Combining differential privacy and homomorphic encryption for privacy-preserving collaborative machine learning (Doctoral dissertation, Université Paris-Saclay).