Explainability

The degree to which a human can derive meaning from the entire model and its components through the use of post-hoc methods (e.g., verification, visualizations of important predictors).

Related terms: Interpretability

A distinction between interpretability and explainability is needed because some in the ML and statistics communities favor producing interpretable models (i.e., restricting model complexity beforehand to impose interpretability; Rudin et al. 2019, Rudin et al. 2021), while the broader trend in the ML community is to continue developing partially interpretable and black-box models and to apply post-hoc methods to explain them. A deep neural network (DNN) or dense random forest is uninterpretable but, through external explanation methods, can be approximately understood. Explanation methods must be approximate: otherwise they would be as incomprehensible as the black-box model itself. This is not a limitation of explanation methods, as suggested by other studies (e.g., Rudin et al. 2019, Rudin et al. 2021), since abstracting a complex model is required for human understanding. For example, it is common in the weather community to replace the full Navier-Stokes equations with conceptual models that are more understandable. The degree of explainability, however, is controlled by model complexity (Molnar et al. 2020): as the number of features grows or their interactions become more complex, explanations of the ML model's behavior become less compact and possibly less accurate. At present, it is unclear how much improvement in the understanding of high-dimensional, highly non-linear models current and future explanation methods will offer (Fig. 1).
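The idea of approximately understanding a black-box model through a post-hoc method can be sketched in a few lines. The example below is a minimal illustration, not a method from the cited studies: it assumes scikit-learn is available, fits a dense random forest (interpretable trees individually, but a black box as an ensemble) on synthetic data, and uses permutation importance as the post-hoc explanation of which predictors matter.

```python
# Minimal sketch of post-hoc explanation of a black-box model.
# Assumptions: scikit-learn is installed; the data are synthetic,
# so the feature indices carry no physical meaning.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# A dense random forest: each tree is readable, but the full
# ensemble of 200 trees is effectively uninterpretable.
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Permutation importance approximates the model's behavior by
# measuring how much the score drops when each feature is shuffled.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```

Note that the output is an abstraction of the model, ranking predictors by influence, rather than a full account of its internal logic; this is exactly the approximation the paragraph above describes.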

Fig. 1 Illustration of the relationship between understandability and model complexity (from Flora et al. 2022, AIES). Fully interpretable models have high understandability (with little to gain from explainability), while partially interpretable or simpler black-box models have the most to gain from explainability methods. With increased dimensionality and non-linearity, explainability methods can improve understanding, but there is considerable uncertainty about the ability of future explanation methods to improve the understandability of high-dimensional, highly non-linear models.

Return to Glossary