BARWM is a novel backdoor attack on real-world deep learning models deployed in mobile applications; it utilizes DNN-based steganography to generate imperceptible, sample-specific backdoor triggers, achieving high attack effectiveness and stealthiness.
The authors collected 38,387 mobile apps and extracted 89 real-world models to evaluate the effectiveness of BARWM. Compared with the baseline methods, BARWM achieves significantly better attack performance while maintaining the models' normal performance.
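The extraction step amounts to unpacking each APK and locating embedded model files. As a minimal illustration (not the authors' actual tooling), the sketch below scans an APK, which is a ZIP archive, for TensorFlow Lite models; the `.tflite` extension filter and the `TFL3` FlatBuffer identifier at byte offset 4 are the standard markers for such files.

```python
import io
import zipfile

def find_tflite_models(apk_path):
    """Scan an APK (a ZIP archive) for embedded TensorFlow Lite models.

    A .tflite file is a FlatBuffer whose file identifier "TFL3" sits at
    byte offset 4; we check both the extension and that magic value.
    """
    models = []
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if not name.endswith(".tflite"):
                continue
            header = apk.read(name)[:8]
            if header[4:8] == b"TFL3":
                models.append(name)
    return models
```

In practice, on-device models are often shipped under `assets/`; a large-scale study like this one would run such a scan over every collected APK and deduplicate the recovered models.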
Backdoor attacks allow adversaries to embed malicious functionality into DNNs during training, enabling them to control the model’s behavior on specific inputs.
Existing backdoor attack techniques, including invisible backdoor attacks, aim to conceal the presence of the backdoor; their success highlights the vulnerability of on-device models, particularly those built on frameworks such as TensorFlow Lite, to backdoor attacks.
BARWM uses DNN-based steganography to generate imperceptible backdoor triggers for on-device models, aiming to improve both the effectiveness and the stealthiness of these attacks compared with previous methods such as DeepPayload.
The threat model outlines a novel approach for backdooring real-world on-device deep learning (DL) models: attackers extract and analyze DL models from Android apps, reconstruct them for training, and then poison them using a steganography-based technique.
Sample-specific, imperceptible triggers are generated by embedding target strings into benign images with a trained encoder-decoder network; the poisoned images are then used to retrain the model, creating a backdoor that activates when the specific trigger is present in the input.
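To make the embed/recover idea concrete, the sketch below uses classic least-significant-bit (LSB) steganography to hide a target string in pixel values. This is only an illustrative stand-in: BARWM itself uses a trained DNN encoder-decoder, whose learned, sample-specific perturbations are far harder to detect than a fixed LSB scheme.

```python
def embed_message(pixels, message):
    """Hide an ASCII message in the least-significant bits of pixel values.

    Illustrative LSB scheme only; BARWM uses a trained DNN encoder-decoder
    instead, producing sample-specific perturbations.
    """
    bits = [(byte >> i) & 1 for byte in message.encode() for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # overwrite only the lowest bit
    return stego

def extract_message(pixels, length):
    """Recover a `length`-byte message from the least-significant bits."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for bit in pixels[8 * i: 8 * i + 8]:
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return out.decode()
```

Because each pixel changes by at most one intensity level, the stego image is visually indistinguishable from the original, which is the property the DNN-based encoder generalizes and strengthens.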
The resulting backdoored model is then seamlessly integrated back into the original app, enabling the attacker to manipulate the model's output without, in most cases, altering the app's structure or functionality.
By leveraging DNN-based steganography, the approach improves the stealth and efficiency of the attack, making it significantly more difficult to detect and mitigate.
The research evaluates the effectiveness and stealthiness of this novel backdoor attack method, BARWM, against real-world on-device deep learning models.
Experiments on various CNN architectures demonstrate that BARWM outperforms existing methods like DeepPayload, BadNets, and Invisible Attack in terms of attack success rate while minimizing the impact on the model’s benign accuracy.
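The two quantities behind these comparisons are the attack success rate (the fraction of triggered inputs classified as the attacker's target) and benign accuracy (accuracy on clean inputs). A minimal sketch of both metrics, with hypothetical prediction lists as inputs:

```python
def attack_success_rate(preds_on_triggered, target_label):
    """Fraction of triggered inputs classified as the attacker's target label."""
    hits = sum(1 for p in preds_on_triggered if p == target_label)
    return hits / len(preds_on_triggered)

def benign_accuracy(preds_on_clean, true_labels):
    """Accuracy on clean inputs; a stealthy backdoor keeps this
    close to the original model's accuracy."""
    correct = sum(1 for p, t in zip(preds_on_clean, true_labels) if p == t)
    return correct / len(true_labels)
```

A successful attack in this framing is one where the attack success rate is high while benign accuracy stays near the unmodified model's, so normal users notice nothing.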
BARWM generates highly imperceptible, sample-specific triggers, achieving significantly higher stealthiness than the baselines. Further evaluation on real-world mobile models extracted from Android applications confirms BARWM's superior performance and robustness in injecting backdoors into these models.
Its triggers, generated via DNN-based steganography, resemble adversarial perturbations; this enhances stealth and efficacy over traditional backdoor attacks and poses a significant threat to real-world models, particularly those used for image classification and object detection.
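Trigger imperceptibility of this kind is commonly quantified with peak signal-to-noise ratio (PSNR) between the clean and triggered images; whether this is the paper's exact metric is an assumption here. A minimal sketch over flat pixel lists:

```python
import math

def psnr(clean, triggered, max_val=255.0):
    """Peak signal-to-noise ratio between a clean image and its
    triggered version; higher PSNR means a less perceptible trigger."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, triggered)) / len(clean)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A sample-specific steganographic trigger should score markedly higher PSNR than a fixed visible patch of the kind used by BadNets-style attacks.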
While the research primarily focuses on backdoor attacks, it emphasizes the importance of understanding model behavior, including inferring category labels, for effective attack strategies and robust defense mechanisms.