As machine learning models are deployed more and more widely, we want to find their potential security weaknesses and address them before they become a problem. Researchers have discovered that machine learning models can be fooled by so-called adversarial attacks: modifications of a model's input that are so small a human does not notice any difference, yet cause the model to make a completely incorrect decision.
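To make this concrete, here is a minimal sketch of one classic attack, the Fast Gradient Sign Method (FGSM), in PyTorch. The model, input, and epsilon value below are illustrative placeholders; this is a sketch under those assumptions, not a hardened implementation.

```python
# Minimal FGSM sketch: nudge the input in the direction that increases
# the model's loss. Assumes `model` is a differentiable classifier and
# inputs are images normalized to [0, 1]; epsilon is illustrative.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Return an adversarially perturbed copy of input x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # One step in the sign of the gradient: tiny per pixel,
    # often imperceptible to a human, yet enough to flip the prediction.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range
```

For small epsilon the perturbed image looks identical to the original, which is exactly what makes these attacks hard to spot by eye.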
Such attacks can cause security problems in the real world. Consider a "stop" sign on the side of the road: an attacker might add a few small stickers to the sign so that you see no difference, while your car suddenly thinks it is a "highway" sign.

This raises a central question: how can we reliably estimate the quality of an adversarial attack detector?
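As a hint of what "quality" could mean quantitatively, one common starting point is to score a detector by how well it separates clean from adversarial inputs, for example via ROC AUC. The `detector_score` function below is a hypothetical placeholder for whatever detection statistic is being evaluated; the sketch is illustrative, not a full evaluation protocol.

```python
# Score a detector by how well it separates clean from adversarial
# inputs. `detector_score` is a hypothetical function that returns a
# higher value the more suspicious an input looks.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_detector(detector_score, clean_inputs, adv_inputs):
    scores = np.concatenate([
        [detector_score(x) for x in clean_inputs],  # label 0: benign
        [detector_score(x) for x in adv_inputs],    # label 1: adversarial
    ])
    labels = np.concatenate([np.zeros(len(clean_inputs)),
                             np.ones(len(adv_inputs))])
    return roc_auc_score(labels, scores)  # 0.5 = chance, 1.0 = perfect
```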