An Introduction to Adversarial Machine Learning

Adversarial machine learning is concerned with the design of machine learning (ML) algorithms that can resist various security challenges and attackers.

Miguel Hernandez

Jan. 18, 22 · Tutorial

Adversarial machine learning is concerned with the design of ML algorithms that can resist security challenges, the study of the capabilities of attackers, and the understanding of attack consequences.

Research on adversarial machine learning commonly groups the attacks that ML models can suffer into four types: extraction, inference, poisoning, and evasion.

Extraction Attacks

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access.

The attack works by querying the target model with chosen inputs to extract as much information as possible. The attacker then uses the resulting set of inputs and outputs to train a copy, called a substitute model.

Extracting a model is hard. The attacker needs substantial compute capacity to train a substitute with good accuracy and fidelity, because doing so is comparable to training a model from the ground up.
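The query-then-train loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a realistic attack: `oracle_predict` is a hypothetical stand-in for the remote model, the inputs are two-dimensional, and the substitute is a deliberately simple nearest-neighbour lookup over the stolen answers.

```python
# Toy model extraction: the attacker sees only the oracle's predicted labels.

def oracle_predict(x, y):
    """Remote 'victim' model (hypothetical). Its weights are hidden from the attacker."""
    return 1 if 0.6 * x + 0.4 * y > 0.5 else 0

# Step 1: query the oracle on a 21 x 21 grid of inputs and record the answers.
queries = [(i / 20, j / 20) for i in range(21) for j in range(21)]
stolen_labels = [oracle_predict(x, y) for x, y in queries]

def substitute_predict(x, y):
    """Substitute model: return the label of the nearest queried input."""
    nearest = min(
        range(len(queries)),
        key=lambda k: (queries[k][0] - x) ** 2 + (queries[k][1] - y) ** 2,
    )
    return stolen_labels[nearest]

# Step 2: measure fidelity, the fraction of fresh inputs on which the
# substitute agrees with the oracle.
fresh_inputs = [((i + 0.3) / 20, (j + 0.3) / 20) for i in range(20) for j in range(20)]
agreements = sum(
    substitute_predict(x, y) == oracle_predict(x, y) for x, y in fresh_inputs
)
fidelity = agreements / len(fresh_inputs)
print(f"queries: {len(queries)}, fidelity: {fidelity:.3f}")
```

Even in this tiny setting the attacker spends 441 queries to copy a model with three parameters, which hints at why extraction of large models is costly.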

Defenses

Limit the output information when the model classifies a given input.
Use differential privacy.
Use ensembles.
Place a detection proxy, such as PRADA, between the end user and the model.
Limit the number of requests.
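Two of the defenses listed above, limiting output information and limiting the number of requests, can be combined in a thin wrapper around the model. The sketch below is only an illustration: `GuardedModel`, `toy_scores`, and the per-client budget are hypothetical names and choices, not part of any library.

```python
class GuardedModel:
    """Wraps a scoring function; exposes labels only and enforces a query budget."""

    def __init__(self, score_fn, max_queries_per_client):
        self._score_fn = score_fn
        self._max_queries = max_queries_per_client
        self._used = {}  # client id -> number of queries answered so far

    def predict(self, client_id, features):
        used = self._used.get(client_id, 0)
        if used >= self._max_queries:
            raise PermissionError(f"query budget exhausted for {client_id!r}")
        self._used[client_id] = used + 1
        scores = self._score_fn(features)
        # Return only the winning class index, never the raw scores.
        return max(range(len(scores)), key=lambda k: scores[k])


def toy_scores(features):
    """Hypothetical stand-in for a real model's per-class probabilities."""
    positive = min(max(sum(features) / len(features), 0.0), 1.0)
    return [1.0 - positive, positive]


guarded = GuardedModel(toy_scores, max_queries_per_client=2)
print(guarded.predict("alice", [0.9, 0.8]))  # class index only
print(guarded.predict("alice", [0.1, 0.2]))
try:
    guarded.predict("alice", [0.5, 0.5])
except PermissionError as err:
    print("blocked:", err)
```

Returning a bare label gives the attacker less signal per query than a full probability vector, and the budget caps how many labeled pairs a single client can collect.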

Inference Attacks

Inference attacks aim to reverse the information flow of a machine learning model. They allow an adversary to have knowledge of the model that was not explicitly intended to be shared.

Inference attacks pose severe privacy and security threats to individuals and systems. They are successful because private data are statistically correlated with public data, and ML classifiers can capture such statistical correlations.

There are three types of inference attacks:

Membership Inference Attack (MIA).
Property Inference Attack (PIA).
Training data recovery.
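As a rough illustration of the first type, a membership inference attack can be as simple as thresholding the model's confidence: overfitted models tend to be more confident on records they were trained on. The sketch below assumes a toy model that memorizes its training points; `model_confidence`, the records, and the 0.9 threshold are all made up for the example.

```python
import math

# Records the toy model was trained on (private) and records it never saw.
training_records = [(0.10, 0.20), (0.40, 0.90), (0.75, 0.30), (0.60, 0.60)]
outside_records = [(0.15, 0.25), (0.90, 0.90), (0.50, 0.10), (0.30, 0.55)]

def model_confidence(record):
    """Overfitted toy model: confidence is 1.0 on memorized training points
    and decays with distance to the nearest one."""
    nearest = min(math.dist(record, seen) for seen in training_records)
    return math.exp(-10.0 * nearest)

def infer_membership(record, threshold=0.9):
    """Attacker's rule: very high confidence suggests a training member."""
    return model_confidence(record) >= threshold

for record in training_records + outside_records:
    print(record, round(model_confidence(record), 3), infer_membership(record))
```

Real attacks are noisier than this, because a well-regularized model shows a much smaller confidence gap between members and non-members.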

Defenses

Use privacy-preserving techniques and advanced cryptography (e.g., differential privacy, homomorphic encryption, secure multi-party computation).
Use regularization techniques such as dropout.
Model compression.
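Differential privacy, which this article lists among the defenses, limits how much any single record can influence a released result. A minimal sketch of its best-known building block, the Laplace mechanism applied to a counting query, is shown below. The dataset, the `epsilon` value, and the function names are illustrative assumptions; production systems should rely on a vetted library rather than hand-rolled noise.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query with the Laplace mechanism.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
ages = [23, 35, 41, 52, 67, 29, 38, 44]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"true count: 4, noisy count: {noisy:.2f}")
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of less accurate answers.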

Poisoning Attacks

This technique involves an attacker inserting corrupt data in the training dataset to compromise a target machine learning model during training.

Some data poisoning techniques aim to trigger a specific behavior in a computer vision system when it faces a specific pattern of pixels at inference time. Other data poisoning techniques aim to reduce the accuracy of a machine learning model on one or more output classes.

Poisoning is difficult to detect, and because it is performed on the training data, it can propagate to every model trained on the same data.

The adversary seeks to destroy the availability of the model by modifying the decision boundary and, as a result, producing incorrect predictions.

Finally, the attacker could create a backdoor in a model. The model behaves correctly (returning the desired predictions) in most cases, except for certain inputs specially created by the adversary that produce undesired results. The adversary can manipulate the results of the predictions and launch future attacks.
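The availability attack described above, where corrupt training data moves the decision boundary, can be reproduced with a toy nearest-centroid classifier on one-dimensional data. Everything in the sketch (the data, the three injected points, the classifier itself) is a made-up example chosen so the effect is easy to follow by hand.

```python
def fit_centroids(samples):
    """Nearest-centroid 'training': one mean value per class label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign the class whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

def accuracy(centroids, samples):
    hits = sum(predict(centroids, value) == label for value, label in samples)
    return hits / len(samples)

clean_train = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
test_set = [(0.5, 0), (1.5, 0), (8.5, 1), (9.5, 1)]

# The attacker injects three outlying points labeled as class 0, dragging
# the class-0 centroid from 1.0 to 15.5 and moving the decision boundary.
poison = [(30.0, 0), (30.0, 0), (30.0, 0)]

clean_model = fit_centroids(clean_train)
poisoned_model = fit_centroids(clean_train + poison)
print("clean accuracy:   ", accuracy(clean_model, test_set))
print("poisoned accuracy:", accuracy(poisoned_model, test_set))
```

After poisoning, every class-0 test point is closer to the class-1 centroid, so the model misclassifies all of them while still looking correct on class 1.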

Defenses

Protect the integrity of training data.
Protect the algorithms; use robust methods to train models.
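One simple way to approach the first point is to record a cryptographic fingerprint of the training data when it is trusted, and to check it again before every training run. The sketch below assumes the rows are JSON-serializable and that the recorded fingerprint is kept somewhere an attacker cannot modify; `dataset_fingerprint` and `verify_dataset` are hypothetical helper names.

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """SHA-256 over a canonical JSON encoding of the training rows."""
    encoded = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def verify_dataset(rows, expected_fingerprint):
    """Return False if the data no longer matches the recorded fingerprint."""
    return dataset_fingerprint(rows) == expected_fingerprint

trusted_rows = [{"pixels": [0, 255, 12], "label": 3}, {"pixels": [9, 9, 9], "label": 7}]
recorded = dataset_fingerprint(trusted_rows)  # store out of the attacker's reach

tampered_rows = [dict(row) for row in trusted_rows]
tampered_rows[1]["label"] = 1  # a single flipped label

print(verify_dataset(trusted_rows, recorded))
print(verify_dataset(tampered_rows, recorded))
```

A fingerprint only detects changes made after it was recorded. It does not help if the data was already poisoned when it was collected.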

Evasion Attacks

An adversary inserts a small perturbation (in the form of noise) into the input of a machine learning model to make it classify incorrectly. The perturbed input is called an adversarial example.

Evasion attacks are similar to poisoning attacks. The main difference is that evasion attacks exploit weaknesses of the model in the inference phase, not during training.

The attacker’s knowledge of the target system is important. The more they know about your model and how it's built, the easier it is for them to mount an attack on it.

An evasion attack happens when the network is fed an “adversarial example” — a carefully perturbed input that looks and feels exactly the same as its untampered copy to a human — but that completely throws off the classifier.
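To make the idea of a carefully perturbed input concrete, here is a minimal sketch of the fast gradient sign method (the same family as the `FastGradientMethod` attack in ART) applied to a hand-written logistic regression. The weights, the input, and `eps` are hypothetical values picked so the label flips; the attacker is assumed to know the model's weights (a white-box setting).

```python
import math

WEIGHTS = [2.0, -3.0]  # hypothetical trained logistic-regression weights
BIAS = 0.0

def predict_proba(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return 1 if predict_proba(x) >= 0.5 else 0

def fgsm(x, true_label, eps):
    """Fast gradient sign method for logistic regression.

    For cross-entropy loss, d(loss)/d(x_i) = (p - y) * w_i, so the attack
    moves every feature by eps in the direction of that gradient's sign.
    """
    p = predict_proba(x)
    grad = [(p - true_label) * w for w in WEIGHTS]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]

x = [0.3, 0.1]
x_adv = fgsm(x, true_label=1, eps=0.1)
print(predict(x), x, "->", predict(x_adv), x_adv)
```

No feature moves by more than `eps`, yet the predicted class changes, which is the defining property of an adversarial example.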

Defenses

Training with adversarial examples to make the model robust.
Transform the input to the model (input sanitization).
Gradient regularization.
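As one possible form of the input transformation mentioned above, the model's inputs can be quantized to a small number of levels before classification, so that perturbations smaller than a quantization step are often rounded away. This is a sketch of that idea only; the `squeeze` helper, the bit depth, and the sample values are assumptions for illustration.

```python
def squeeze(values, bit_depth=3):
    """Quantize features in [0, 1] to 2**bit_depth evenly spaced levels."""
    levels = 2 ** bit_depth - 1
    return [round(v * levels) / levels for v in values]

clean = [0.10, 0.43, 0.90]
perturbed = [0.12, 0.41, 0.92]  # clean input plus small adversarial noise

print(squeeze(clean))
print(squeeze(perturbed))
print(squeeze(clean) == squeeze(perturbed))
```

The transformation is not a complete defense: a perturbation that pushes a feature across a quantization boundary survives, and coarse quantization can also reduce accuracy on clean inputs.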

Tools for Adversarial Machine Learning

Adversarial Robustness Toolbox (ART)

Adversarial Robustness Toolbox (ART) is a Python library for machine learning security. ART provides tools that enable developers and researchers to defend and evaluate machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference.

ART supports all popular machine learning frameworks:

TensorFlow
Keras
PyTorch
scikit-learn

All data types:

Images
Tables
Audio
Video

And machine learning tasks:

Classification
Object detection
Speech recognition

Here is the pip installation:

pip install adversarial-robustness-toolbox

This is an attack example:

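from art.attacks.evasion import FastGradientMethod

# `classifier` is assumed to be an ART estimator wrapping a trained model, and `x_test` the test inputs.
attack_fgm = FastGradientMethod(estimator=classifier, eps=0.2)
# Craft adversarial versions of the test inputs, then check how the model classifies them.
x_test_fgm = attack_fgm.generate(x=x_test)
predictions_test = classifier.predict(x_test_fgm)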

And this is a defense example:

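import tensorflow as tf
from art.defences.trainer import AdversarialTrainer
from art.utils import load_mnist

# `model` is assumed to be the Keras model wrapped by `classifier`; `attack_fgm` comes from the attack example.
model.compile(loss=tf.keras.losses.categorical_crossentropy, optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), metrics=["accuracy"])
defence = AdversarialTrainer(classifier=classifier, attacks=attack_fgm, ratio=0.6)
(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()
# ratio=0.6 replaces 60% of each training batch with adversarial examples.
defence.fit(x=x_train, y=y_train, nb_epochs=3)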

Counterfit

Counterfit is a command-line tool and generic automation layer for assessing the security of machine learning systems.

Developed for security audits of ML models, it implements black-box evasion algorithms and is based on ART and TextAttack.

This is the command list:

--------------------------------------------------------
Microsoft
                          __            _____ __
  _________  __  ______  / /____  _____/ __(_) /_
 / ___/ __ \/ / / / __ \/ __/ _ \/ ___/ /_/ / __/
/ /__/ /_/ / /_/ / / / / /_/  __/ /  / __/ / /
\___/\____/\__,_/_/ /_/\__/\___/_/  /_/ /_/\__/

                                        #ATML

--------------------------------------------------------

list targets

list frameworks

load <framework> 

list attacks

interact <target>

predict -i <ind>

use <attack>

run

scan

Final Words

"If you use machine learning, there is the risk for exposure, even though the threat does not currently exist in your space." and "The gap between machine learning and security is definitely there." -Hyrum Anderson, Microsoft

Thanks

Special thanks to @jiep as a co-writer of this article.

Published at DZone with permission of Miguel Hernandez. See the original article here.
