Evaluating Adversarial Attack Detectors using Formal Verification Methods

Can machine learning models be attacked?

As machine learning models are deployed more and more widely, we want to identify potential security issues and address them before they become a problem. Researchers have found that machine learning models can be attacked using so-called adversarial attacks: tiny modifications of a model’s input that a human cannot perceive, yet cause the model to make a completely incorrect decision.
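To make this concrete, below is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), written in PyTorch. The `model`, `images`, and `labels` names are placeholders for any trained image classifier and a correctly labelled input batch; this is an illustration of the idea, not the specific attacks studied in the assignment.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.01):
    """Perturb `images` by a small step in the direction that increases the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Each pixel changes by at most `epsilon`, so the difference is barely
    # visible to a human, yet the model's prediction can flip to a wrong class.
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```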

Such attacks can cause security problems in the real world. Consider, for example, a “stop” sign on the side of the road. An attacker might add a few small stickers to the sign such that you see no difference, while your car suddenly classifies it as a “highway” sign.

How can we reliably estimate the quality of an adversarial attack detector?

One way to defend a model is to add a detector that flags adversarial inputs before they reach the model. That raises the central question of this assignment: how can we use formal verification methods to reliably evaluate the quality of such an adversarial attack detector?
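A simple empirical measure of a detector’s quality is how often it flags adversarial inputs (true positives) versus how often it falsely flags clean inputs (false positives). The sketch below assumes a hypothetical `detector` function that returns an “adversarialness” score per input as a PyTorch tensor; it is only meant to illustrate the kind of quantity we want to estimate.

```python
import torch

def evaluate_detector(detector, clean_inputs, adversarial_inputs, threshold=0.5):
    """Return (true positive rate, false positive rate) at a fixed score threshold."""
    clean_scores = detector(clean_inputs)        # scores for unmodified inputs
    adv_scores = detector(adversarial_inputs)    # scores for attacked inputs
    true_positive_rate = (adv_scores > threshold).float().mean().item()
    false_positive_rate = (clean_scores > threshold).float().mean().item()
    return true_positive_rate, false_positive_rate
```

An empirical estimate like this only covers the attacks you happen to generate. Formal verification methods, by contrast, aim to reason about all perturbations within a given bound, which is what makes them attractive for evaluating detectors.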

Do you want to be part of our research center?

Don’t hesitate to reach out, or apply directly to one of our assignments!