The era of AI is already in motion, and it will touch our everyday lives more and more. This very blog post may have been written with the help of some AI-based module. The source code underlying this blog platform may have been implemented together with an AI companion.
All this triggers a lot of excitement, but also some concerns, especially among people with a security background (you know, the usual mood breakers).
How do we ensure that these AI components will not jeopardize the security and safety of the systems into which they are integrated? There are already many examples of real AI components that have been easily fooled with simple attacks (face recognition, autonomous driving, malware classification, etc.). How can we make these AI components more secure, robust, and resilient against such attacks?
Adversarial machine learning (AML, in short) is the process of extracting information about the behavior and characteristics of an ML system and/or learning how to manipulate the inputs into an ML system in order to obtain a preferred outcome. As explained very well in the document released by NIST in Jan 2024 "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations" [1], studying AML enables understanding how the AI system can be compromised and therefore how it can also be better protected against adversarial manipulations. AML, as discussed in [1], has a very broad scope, touching different security and privacy aspects of AI components. In particular, four main types of attacks are considered: (1) evasion, (2) data and model poisoning, (3) data and model privacy, and (4) abuse.
In collaboration with the University of Cagliari, Pluribus One, and Eurecom, we started a PhD project in October 2022 focusing on "Security Testing for AI components", mainly targeting evasion attacks. The rest of this blog post introduces this line of study and summarizes what we have done so far.
In our work, we study evasion attacks in multiple industrial domains to understand how, and to what extent, these attacks can be mitigated. Concretely, we aim to contribute to the community effort toward an open-source testing platform for AI components. We are currently channeling our work into SecML-Torch, an open-source Python library designed to evaluate the robustness of AI components. Our ultimate goal is to build a feedback loop based on adversarial retraining to increase the robustness of the AI component under test (a sketch of such a loop is shown below).
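To make the feedback-loop idea concrete, here is a minimal sketch in plain PyTorch, assuming a differentiable classifier and a standard data loader. The attack (a one-step FGSM-style perturbation), the function names, and the hyper-parameters are illustrative placeholders, not the SecML-Torch API.

```python
# A minimal sketch of an adversarial-retraining feedback loop in plain PyTorch.
# The model, data loader, and hyper-parameters are placeholders; this is not the
# SecML-Torch API, just an illustration of the feedback-loop idea.
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Craft simple one-step (FGSM-style) adversarial examples against the current model."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_retraining(model, loader, optimizer, epochs=5, eps=0.03):
    """Feedback loop: attack the model under test, then retrain it on the adversarial examples."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = fgsm_perturb(model, x, y, eps)   # 1. generate evasion attempts
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)  # 2. train on them to harden the model
            loss.backward()
            optimizer.step()
    return model
```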
In an evasion attack, the adversary's goal is to generate adversarial examples that alter the AI model's behavior, i.e., fool its classification result.
Depending on the situation, the adversary may have perfect (white-box), partial (gray-box), or very limited (black-box) knowledge of the AI system under test. In the first case, attackers can stage powerful gradient-based attacks, guiding the optimization of adversarial examples with full observability of the victim AI model itself. A classical example of partial knowledge is when the attacker knows the learning algorithm and the feature representation, but not the model weights and training data. In black-box scenarios, the attacker knows nothing about the AI system under test and can only access it via queries.
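As a toy illustration of the black-box setting, the sketch below shows a query-only random-search attack against a classifier exposed solely through a `predict` function that returns class probabilities. The function name, query budget, and step size are assumptions made for illustration; they do not correspond to any specific library API.

```python
# A minimal sketch of a query-only (black-box) evasion attempt: the attacker can only
# call `predict` and tries small random perturbations, keeping any change that lowers
# the confidence assigned to the true class.
import numpy as np

def query_based_evasion(predict, x, true_label, budget=500, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best = x.copy()
    best_score = predict(best)[true_label]           # confidence assigned to the true class
    for _ in range(budget):                          # each iteration costs one query
        candidate = np.clip(best + step * rng.standard_normal(x.shape), 0.0, 1.0)
        probs = predict(candidate)
        if probs[true_label] < best_score:           # keep perturbations that reduce confidence
            best, best_score = candidate, probs[true_label]
            if np.argmax(probs) != true_label:       # stop once the model is evaded
                break
    return best
```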
In our study, we have so far focused on gray-box and black-box scenarios. As industrial domains, we have been targeting Web Application Firewall (WAF) and anti-phishing classifiers. Both domains are consuming more and more AI, with their classifiers taking advantage of complex AI models trained on large amounts of historical data (e.g., the Cloudflare ML WAF, the Vade Secure anti-phishing solution). Creating adversarial examples for these industrial domains is more challenging than for simpler domains such as image recognition. Indeed, the input space the attacker can play with is more elaborate than simply adding noise to the pixels of an image: manipulated samples must remain valid and preserve their (malicious) functionality.
Our work on adversarial machine learning for WAFs is already well explained in this blog post [2] written by Davide Ariu, CEO at Pluribus One. For readers interested in the full technical details, I suggest our paper pre-print "Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning" [3]. In summary, we showed how an ML-based version of ModSecurity can be hardened against adversarial SQL injection attacks through robust machine learning.
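To give an intuition of the manipulations involved in this domain, here is a small, hedged example of semantics-preserving rewrites of a SQL injection payload (case randomization and inline comments), which are well-known evasion tricks against WAFs. These are simplified stand-ins, not the exact manipulation set studied in [3].

```python
# Illustrative, semantics-preserving rewrites of a SQL injection payload, the kind of
# manipulation used to probe ML-based WAFs. Simplified stand-ins for demonstration only.
import random

def random_case(payload: str) -> str:
    """Randomly flip letter case; SQL keywords are case-insensitive, so semantics are preserved."""
    return "".join(c.upper() if random.random() < 0.5 else c.lower() for c in payload)

def add_inline_comments(payload: str) -> str:
    """Replace spaces with inline comments, which the SQL parser treats as whitespace."""
    return payload.replace(" ", "/**/")

if __name__ == "__main__":
    payload = "' OR 1=1#"
    print(random_case(add_inline_comments(payload)))  # e.g. '/**/Or/**/1=1#
```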
For our study on anti-phishing, we will write a dedicated blog post as a continuation of this one. Full details are available in our paper "Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors" [4]. The manipulations created for this work will soon be available in SecML-Torch (a simplified illustration is sketched below).
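To give a flavor of what functionality-preserving HTML manipulations can look like, here is a simplified sketch of two edits of the kind used against ML-based phishing webpage detectors. Both are illustrative assumptions, not the actual manipulation set from the paper or the one being integrated into SecML-Torch.

```python
# Illustrative, functionality-preserving HTML manipulations of the kind used to probe
# ML-based phishing webpage detectors. Simplified stand-ins for demonstration only.
from bs4 import BeautifulSoup

def inject_hidden_benign_text(html: str, text: str = "terms privacy help contact") -> str:
    """Append a hidden <div> of benign-looking words; the rendered page is unchanged."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.new_tag("div", style="display:none")
    div.string = text
    (soup.body or soup).append(div)
    return str(soup)

def add_benign_meta_tag(html: str) -> str:
    """Insert a neutral <meta> description; page functionality is preserved."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.new_tag("meta", attrs={"name": "description", "content": "official website"})
    (soup.head or soup).append(meta)
    return str(soup)
```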
Let me finish by thanking the main driver of this line of research: Biagio Montaruli, our fellow PhD candidate.