CITADEL: PrivaCy and bIas mitigation in federaTed leArning for Digital hEaLth
Federated learning (FL) is a promising paradigm that is gaining traction in the context of privacy-preserving machine learning for edge computing systems. With FL, several data owners, called clients (e.g., organizations in cross-silo FL), can collaboratively train a model on their private data without having to send their raw data to external service providers. FL has been rapidly adopted in several thriving applications such as digital healthcare, a domain that generates the world's largest volume of data. In healthcare systems, the problems of privacy and bias are particularly important.
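To make this training scheme concrete, here is a minimal, illustrative FedAvg-style sketch in Python; it is a toy example with hypothetical clients and a least-squares model, not one of CITADEL's protocols. Each client trains locally, and only model weights are exchanged and averaged, never raw records.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient steps on a least-squares loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients):
    """One FedAvg round: clients train locally, the server averages weights."""
    updates, sizes = [], []
    for X, y in clients:                      # raw (X, y) never leaves the client
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    sizes = np.asarray(sizes, dtype=float)
    # Weighted average of client models, proportional to local data size.
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                            # three hypothetical data silos
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):                           # twenty federated rounds
    w = fedavg_round(w, clients)
print(w)                                      # approaches true_w without pooling data
```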
Although FL is a first step towards privacy since it keeps the data local to each client, it is not sufficient on its own: the model parameters shared during FL training remain vulnerable to privacy attacks (e.g., membership inference and data reconstruction), as shown in a recent line of work. Thus, there is a need to design new FL protocols that are robust to such privacy attacks. Furthermore, FL clients may have very heterogeneous and imbalanced data, which can bias the FL model, with disparities among socioeconomic and demographic groups. Recent studies show that the use of AI may further exacerbate disparities between groups, and that FL may act as a vector of bias propagation among FL clients. In this context, recent works published at ICDE, NDSS, and AAAI show that bias mitigation, privacy protection, and data curation and preparation (e.g., correcting missing or duplicate values) are competing objectives; handling them independently, as is usually done, can have negative side effects on each of them.
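One common building block for such attack-robust protocols, shown here purely as an illustrative sketch rather than CITADEL's actual design, is to clip each client's update and add calibrated Gaussian noise before sharing it, as in differentially private FL variants such as DP-FedAvg:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_mult=0.8, rng=None):
    """Clip a client's model update and add Gaussian noise before sharing.

    Illustrative clip-and-noise step (as in DP-FedAvg-style schemes):
    clip_norm bounds each client's influence, noise_mult scales the noise.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))  # bound sensitivity
    noise = rng.normal(scale=noise_mult * clip_norm, size=update.shape)
    return clipped + noise

raw_update = np.array([3.0, -4.0])   # norm 5.0, exceeds the clipping bound
print(privatize_update(raw_update))  # bounded, noisy update that leaks less
```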
Therefore, there is a need for a novel multi-objective method for FL data preparation and cleaning, bias mitigation, and protection against privacy threats. This is particularly challenging in FL, where no global knowledge about the statistical properties of the overall heterogeneous data is available, although such knowledge is required by classical state-of-the-art techniques. The CITADEL (PrivaCy and bIas mitigation in federaTed leArning for Digital hEaLth) project tackles this challenge and aims to handle precisely the issues raised at the intersection of FL data cleaning, privacy, and bias, through: (i) novel distributed FL protocols; (ii) a multi-objective approach that takes into account privacy, fairness, and quality aspects, these objectives being antagonistic; (iii) the application of these techniques in two use cases of FL-based digital health with real medical data.
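To give a sense of what such a multi-objective formulation can look like, the sketch below uses a simple weighted-sum scalarization of a utility loss, a fairness gap, and a privacy cost. This is a generic illustration of the trade-off, not the project's actual method; all names and weights are hypothetical.

```python
def scalarized_objective(utility_loss, fairness_gap, privacy_cost,
                         weights=(1.0, 0.5, 0.5)):
    """Weighted-sum scalarization of three antagonistic objectives.

    fairness_gap could be, e.g., an accuracy disparity between demographic
    groups; privacy_cost, e.g., the privacy budget spent. All hypothetical.
    """
    a, b, c = weights
    return a * utility_loss + b * fairness_gap + c * privacy_cost

# Sweeping the fairness weight exposes the trade-off (a Pareto-style front)
# between model quality, group fairness, and privacy protection.
for b in (0.0, 0.5, 1.0):
    print(b, scalarized_objective(0.30, 0.12, 0.25, weights=(1.0, b, 0.5)))
```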
