Understanding threats in federated learning



Alberto Pedruozo-Ulloa (atlanTTic Research Center, Universidade de Vigo)

    Outline

    • Introduction: Conventional vs Federated Learning.
    • In the line of fire: Privacy Threats and Attacks.
    • Gossiping adversaries: Honest but Curious.
    • Into the dark side: Getting Malicious.
    • Privacy metrics: Can we measure the Privacy Leakage?
    • Conclusions*: Episode PETs - a new Hope

    Introduction: Conventional vs Federated Learning.

    High-level Workflow in Machine Learning



    Gather enough data

    • Data collection: gather relevant data.
    • Data preprocessing: clean and prepare the data for analysis.

    Train a model

    • Build an ML model (choose an adequate model, train it, tune hyperparameters, evaluate it, etc.).

    Deploy the model

    • Use the trained model for your application.
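
    As a compact illustration of these three steps, here is a minimal sketch with scikit-learn; the bundled toy dataset, the model choice, and the model.pkl file name are only illustrative, not part of any specific deployment.

```python
import pickle
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1) Gather and preprocess data (here: a bundled toy dataset, with scaled features).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2) Train and evaluate a model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# 3) Deploy: serialize the trained model so the application can load and use it.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```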

    What if data comes from different sources?
    In the conventional setting, all local data must be sent to a trusted party.

    • Encryption can be applied to protect data in transit and at rest.
    • However… we must still trust the party doing the ML training!



    Federated Learning

    • Training without explicitly sharing data.
      FL allows ML models to be trained without explicitly sharing the training data.
    • Only local model updates are exchanged with the Aggregator (see the FedAvg-style sketch below).
    • Cross-silo FL
      The model is built from the training sets of a small number of servers (silos), e.g., hospitals or other institutions.
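
    A minimal sketch of one FedAvg-style round with NumPy, illustrating that only model updates leave each silo; the helper names (local_update, fedavg_round) and the toy logistic-regression setup are illustrative, not taken from any specific FL framework.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One data owner: a few gradient steps of logistic regression on its local data."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # local predictions
        grad = X.T @ (p - y) / len(y)             # logistic-loss gradient
        w -= lr * grad
    return w                                      # only this update leaves the silo

def fedavg_round(weights, silos):
    """Aggregator: weighted average of the local models (FedAvg)."""
    total = sum(len(y) for _, y in silos)
    return sum(len(y) / total * local_update(weights, X, y) for X, y in silos)

# Toy example with two data owners holding disjoint local datasets.
rng = np.random.default_rng(0)
silos = [(rng.normal(size=(50, 3)), rng.integers(0, 2, 50)) for _ in range(2)]
w = np.zeros(3)
for _ in range(10):                               # 10 federated rounds
    w = fedavg_round(w, silos)
print("global model:", w)
```

    The raw datasets never move; the aggregator only ever sees the local model vectors, which is exactly what the privacy threats discussed next exploit.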



    In the line of fire: Privacy Threats and Attacks.



    Some example attacks

    • Is there a specific person in the database? (membership inference)
    • Can we reconstruct attributes of people in the database? (attribute inference / reconstruction)
    • Can either the Aggregator or any Data Owner poison the updates? (poisoning)




    The power of the aggregator

    • FL was initially proposed to avoid moving the training data out of the local silos.
      • Reducing communication costs and “ensuring data privacy”.
    • Some example attacks:
      • Is there a specific person in the database of a particular hospital?
      • Can we reconstruct attributes of the people in the database?


    Membership inference

    • General cancer risk: 350 per 100,000 people (aged 45-49)
    • “Cancer risk” knowing that a specific person is contained in the training data: 1 per 2 people
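
    To make the attack concrete, here is a minimal sketch of a simple loss-threshold membership inference test against a released model, assuming the adversary can evaluate the model's loss on candidate records; the threshold calibration and helper names are illustrative.

```python
import numpy as np

def per_example_loss(w, x, y):
    """Logistic loss of the released model w on a single candidate record (x, y)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def is_member(w, x, y, threshold):
    """Loss-threshold test: records used for training tend to have lower loss."""
    return per_example_loss(w, x, y) < threshold

# Toy usage: calibrate the threshold on records known to be OUTSIDE the training set,
# then flag candidate records with suspiciously low loss as likely members.
rng = np.random.default_rng(1)
w = rng.normal(size=3)                                   # stands in for the released model
outsiders = [(rng.normal(size=3), rng.integers(0, 2)) for _ in range(100)]
threshold = np.median([per_example_loss(w, x, y) for x, y in outsiders])
candidate = (rng.normal(size=3), 1)
print("guessed member:", is_member(w, *candidate, threshold))
```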


    Gossiping adversaries: Honest but Curious.

    Honest but curious Aggregator and/or Data Owners

    • This adversary
      • Does not deviate from the prescribed steps.
      • May still try to learn information from the data exchanged.



    Private Aggregation with Privacy-Enhancing Technologies (PETs)

    • PET methods can help to counter the confidentiality threats from the Aggregator and DOs (e.g., Homomorphic Encryption, Differential Privacy, etc.).
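
    As one concrete PET-flavoured idea for private aggregation, here is a minimal sketch of pairwise additive masking (the core trick behind secure aggregation): every pair of data owners shares a random mask that one adds and the other subtracts, so the aggregator only recovers the sum of the updates, never an individual one. The modulus Q and the helper names are illustrative; deployed protocols (and HE- or DP-based alternatives) add key agreement, dropout handling, quantization, etc.

```python
import numpy as np

Q = 2**32                      # work modulo a large integer (illustrative choice)
rng = np.random.default_rng(2)

def mask_updates(updates):
    """Pairwise masking: owner i adds r_ij, owner j subtracts it; masks cancel in the sum."""
    n, d = len(updates), len(updates[0])
    masked = [u % Q for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.integers(0, Q, size=d)
            masked[i] = (masked[i] + r) % Q      # what owner i actually sends
            masked[j] = (masked[j] - r) % Q      # what owner j actually sends
    return masked

# Quantized (integer) versions of three local updates.
updates = [rng.integers(0, 1000, size=4) for _ in range(3)]
masked = mask_updates(updates)

# The aggregator sees only the masked vectors, but their modular sum equals the true sum.
print("true sum   :", sum(updates) % Q)
print("masked sum :", sum(masked) % Q)
```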


    Into the dark side: Getting Malicious.

    Malicious Aggregator and/or Data Owners

    • This adversary
      • May actively deviate from the prescribed steps, e.g., to learn information from the exchanged data or to tamper with the training (poisoned updates, incorrect aggregation, etc.).





    Some possible fixes (without PETs)

    • Malicious Aggregators -> Adding redundancy in the aggregation.
      If we add extra parties in charge of the aggregation, we can check whether all of them return the same aggregated result.
    • Malicious Data Owners -> Using more robust aggregation rules.
      Aggregation becomes more robust if, for example, we take the coordinate-wise median of the updates instead of the mean (see the sketch below).
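
    A minimal sketch contrasting mean and coordinate-wise median aggregation when one data owner submits a poisoned update; the toy numbers are illustrative, and practical robust rules (trimmed mean, Krum, etc.) refine the same idea.

```python
import numpy as np

def aggregate_mean(updates):
    """Plain FedAvg-style mean: a single malicious update can shift it arbitrarily far."""
    return np.mean(updates, axis=0)

def aggregate_median(updates):
    """Coordinate-wise median: robust as long as honest owners form the majority."""
    return np.median(updates, axis=0)

honest = [np.array([1.0, 2.0, 3.0]) + 0.1 * i for i in range(4)]
poisoned = [np.array([1000.0, -1000.0, 1000.0])]          # one malicious data owner
updates = honest + poisoned

print("mean   :", aggregate_mean(updates))    # dragged far away by the poisoned update
print("median :", aggregate_median(updates))  # stays close to the honest updates
```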

    Privacy metrics: Can we measure the Privacy Leakage?

    Privacy metric in TRUMPET


    • Should provide a “score” proportional to the privacy risks in the FL implementation. We may need more than one metric depending on the particular attack…
    • Some possible examples for this score:
      • Measure the effectiveness of state-of-the-art attacks.
      • Measure the remaining privacy budget.
      • How much does y tell us about x?

    • If we have a statistical model for x and y, the mutual information comes naturally as a measure of leakage.
    • We have defined a privacy metric for membership inference attacks based on the mutual information between the membership variable for the target record and the observations (see the formula below).
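
    As a sketch of that idea (the notation is illustrative and not necessarily the exact TRUMPET definition): let M ∈ {0, 1} indicate whether the target record belongs to the training set, and let Y be the observations available to the adversary (e.g., the exchanged updates). The leakage can then be scored as

```latex
% M in {0,1}: membership indicator of the target record; Y: adversary's observations
I(M; Y) = H(M) - H(M \mid Y), \qquad 0 \le I(M; Y) \le H(M) \le 1 \text{ bit}.
```

    A score of 0 means the observations reveal nothing about membership; a score of H(M) means they reveal it completely.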



    Conclusions

    • Federated Learning (FL) appeared as a promising solution to train models on data coming from different sources.
      • Only parameter updates are exchanged.
      • Data is never directly shared.
    • Despite its potential, FL introduces several relevant security and privacy challenges.
    • To mitigate these issues:
      • More robust architectures can be designed.
      • Privacy-Enhancing Technologies (PETs) can be incorporated (SMPC, HE, DP, ZKPs, etc.).
    • In TRUMPET we propose mechanisms to quantify the privacy leakage caused by the exchanged updates.