Advancing Certified Robustness of Explanation via Gradient Quantization
Published in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024
Explaining black-box models is fundamental to gaining trust and deploying these models in real-world applications. Since existing explanation methods have been shown to lack robustness against adversarial perturbations, there has been growing interest in generating robust explanations. However, existing works resort to empirical defense strategies, and these heuristic methods fail against powerful adversaries. In this paper, motivated by the success of randomized smoothing, we certify the robustness of explanations: we compute a tight radius within which the robustness of the explanation is certified. A key challenge is how to formulate the robustness of an explanation mathematically; to address it, we quantize the explanation into a discrete space so that certification mimics classification in randomized smoothing. To reduce the high computational cost of randomized smoothing, we introduce randomized gradient smoothing. We also explore the robustness of semantic explanations by certifying the robustness of capsules. In experiments, we demonstrate the effectiveness of our method on benchmark datasets from the perspectives of post-hoc explanation and semantic explanation, respectively. Our work is a promising step towards bridging the gap between theoretical robustness bounds and empirical explanations. Our code has been released here.
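The paper's exact quantization scheme and radius derivation are in the full text; below is a minimal, hedged sketch of the general recipe the abstract describes, assuming the explanation is the sign of the input gradient (quantized to {-1, +1}), the smoothed explanation is a per-feature majority vote under Gaussian noise, and the radius follows the standard Gaussian-smoothing bound of Cohen et al. (2019). Function names such as `quantized_explanation` and `certify_explanation` are illustrative and not from the released code.

```python
import torch
from scipy.stats import norm


def quantized_explanation(model, x, target):
    """Sign of the input gradient: the explanation quantized to {-1, +1}.

    Assumes `model(x)` returns logits of shape [1, num_classes].
    """
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, target].backward()
    return torch.sign(x.grad.detach())


def certify_explanation(model, x, target, sigma=0.25, n_samples=200, alpha=0.05):
    """Majority-vote (smoothed) explanation under Gaussian noise and a crude
    per-feature certified L2 radius, in the style of randomized smoothing."""
    votes = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        votes += (quantized_explanation(model, noisy, target) > 0).float()
    p_plus = votes / n_samples                   # empirical prob. of the +1 bin
    p_top = torch.maximum(p_plus, 1.0 - p_plus)  # prob. of the majority bin
    # Hoeffding-style lower confidence bound; the paper's analysis is tighter.
    slack = torch.sqrt(torch.log(torch.tensor(1.0 / alpha)) / (2.0 * n_samples))
    p_lower = (p_top - slack).clamp(min=1e-6, max=1.0 - 1e-6)
    smoothed = torch.where(p_plus >= 0.5,
                           torch.ones_like(p_plus), -torch.ones_like(p_plus))
    radius = sigma * torch.as_tensor(norm.ppf(p_lower.numpy()), dtype=x.dtype)
    # No certificate for features whose majority bin is not confidently > 1/2.
    radius = torch.where(p_lower > 0.5, radius, torch.zeros_like(radius))
    return smoothed, radius
```

In this sketch, a feature with radius r keeps its majority-vote sign under any L2 perturbation of norm less than r, mirroring the per-class guarantee of smoothed classifiers; the paper instead certifies the quantized explanation with a tighter bound and uses randomized gradient smoothing to cut the sampling cost.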
Recommended citation: Yang Xiao, Zijie Zhang, Yuchen Fang, Da Yan, Yang Zhou, Wei-Shinn Ku, and Bo Hui. "Advancing Certified Robustness of Explanation via Gradient Quantization." Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21–25, 2024, Boise, ID, USA.