Summary

Achieving human-level performance on some machine reading comprehension (MRC) datasets is no longer challenging with the help of powerful pre-trained language models (PLMs). However, the internal mechanism of these models remains unclear, which hinders further understanding of them. This paper conducts a series of analytical experiments to examine the relations between multi-head self-attention and the final MRC system performance, revealing the potential explainability of PLM-based MRC models. To ensure the robustness of the analyses, we perform our experiments in multiple languages and on top of various PLMs. We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process, showing stronger correlations with the final performance than the other parts. Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can help explain how these models solve the questions.

Highlights

• What are the important components that account for the explainability of MRC models?
• Robust explainability investigation through multilingual and multi-aspect analyses
• The focus of attention varies across layers and heads
• Passage-to-question and passage understanding play important roles in MRC

Computer science; Machine perception; Computational intelligence