2023
Article
Advances in Neural Information Processing Systems (NeurIPS)

M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities, and Models

By: Himabindu Lakkaraju, Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai and Haoyi Xiong

Abstract

While Explainable Artificial Intelligence (XAI) techniques have been widely studied to explain predictions made by deep neural networks, the way to evaluate the faithfulness of explanation results remains challenging, due to the heterogeneity of explanations for various models and the lack of ground-truth explanations. This paper introduces an XAI benchmark named M4, which allows evaluating various input feature attribution methods using the same set of faithfulness metrics across multiple data modalities (images and texts) and network structures (ResNets, MobileNets, Transformers). A taxonomy for the metrics has been proposed as well. We first categorize commonly used XAI evaluation metrics into three groups based on the ground truth they require. We then implement classic and state-of-the-art feature attribution methods using InterpretDL and conduct extensive experiments to compare methods and gain insights. Extensive experiments have been conducted to provide holistic evaluations as benchmark baselines. Several interesting observations are made for designing attribution algorithms. The implementation of state-of-the-art explanation methods and evaluation metrics of M4 is publicly available at https://github.com/PaddlePaddle/InterpretDL.

Keywords

AI and Machine Learning

Citation

Lakkaraju, Himabindu, Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, and Haoyi Xiong. "M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities, and Models." Advances in Neural Information Processing Systems (NeurIPS) (2023).

About The Author

Himabindu Lakkaraju

Technology and Operations Management

More Publications

M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities, and Models

Abstract

Keywords

Citation

About The Author

Himabindu Lakkaraju

More from the Authors

Fair Machine Unlearning: Data Removal while Mitigating Disparities

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

Post Hoc Explanations of Language Models Can Improve Language Models