Day 2 Discussion
On day 2, we learned more about explainability versus interpretability and about deep learning XAI methods.
Please have one of your team members reply in the comments to each of these questions after discussing them with the team.
See the Jupyter notebook for this topic.
Discussion prompts
- Today we introduced the topics of AI explainability and interpretability. Building on what you learned in both sessions today, what do you think the goals of explainability and interpretability are (or should be) when working with your end user? In other words, why do (or should) these terms matter for end users?
- How can you use the techniques covered today (either in the lecture or in your trust-a-thon activity) to better meet the needs of your end users?
Q. 1: We think explainability and interpretability are critical when working with end users. Users need to trust the results to feel confident before acting on them. Clear communication and visualization also help increase end-user trust, and this is simpler when the model is interpretable and explainable: modelers can visually show the connection between a prediction and the inputs that contributed most to it, on top of simply being able to interpret the results. When modelers themselves understand the model's results, with no "black box" models, communication with end users improves, along with user understanding and trust.
Q. 2: We know that increases in wind speed come with noticeable changes in storm structure and intensity. For tropical cyclone forecasters (i.e., the first end users), the expectation is that AI methods can highlight the structural features, such as the eye, eyewall(s), and spiral rainbands, that contribute to extreme winds. Some of these features are tiny relative to the whole storm (e.g., the cyclone eye), so SmoothGrad may not be a good choice because it can smooth out key structures. XRAI represents spiral structure well, so it could be an informative and explainable method when the goal is to estimate wind speed for stronger cyclones. However, it may not be a good choice for weak storms: unorganized snow and cloud patches (e.g., circular or elliptical shapes) in satellite images may mislead users into thinking they are an organized eyewall. We can also carefully choose the reference image (when using a path-attribution method) to suit the users' needs. For example, if end users care about severe storms, a weak storm could serve as the reference.
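To make the methods mentioned above concrete, here is a minimal sketch (not from the course notebook) of the SmoothGrad idea and of a simple path-attribution method with a user-chosen reference image, assuming a trained TensorFlow/Keras model `model` that maps a satellite image to an estimated wind speed. The function names, noise level, and step counts are illustrative assumptions, not prescribed values.

```python
import numpy as np
import tensorflow as tf

def smoothgrad(model, image, stdev=0.1, n_samples=25):
    """Average |gradient| over noisy copies of the input (SmoothGrad).

    `image` is a single (H, W, C) array; `model` maps a batch of images
    to one wind-speed estimate per sample.
    """
    x = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
    grads = []
    for _ in range(n_samples):
        noisy = x + tf.random.normal(tf.shape(x), stddev=stdev)
        with tf.GradientTape() as tape:
            tape.watch(noisy)
            pred = model(noisy)
        grads.append(tape.gradient(pred, noisy)[0].numpy())
    return np.mean(np.abs(grads), axis=0)

def integrated_gradients(model, image, baseline, n_steps=50):
    """Path attribution from a chosen reference image (e.g., a weak storm)."""
    x = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
    ref = tf.convert_to_tensor(baseline[np.newaxis, ...], dtype=tf.float32)
    grads = []
    for alpha in tf.linspace(0.0, 1.0, n_steps):
        step = ref + alpha * (x - ref)
        with tf.GradientTape() as tape:
            tape.watch(step)
            pred = model(step)
        grads.append(tape.gradient(pred, step))
    avg_grad = tf.reduce_mean(tf.stack(grads), axis=0)
    return ((x - ref) * avg_grad)[0].numpy()
```

Choosing `baseline` to be a weak storm, as suggested above, makes the attributions answer "what makes this storm stronger than a weak one?" rather than "what makes this storm different from a blank image?".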
This is a very nice summary that goes into some detail regarding the scientific results. Well done!
A1) Interpretability and explainability are crucial for the overall adoption of AI systems in our daily lives, and even more so when the stakes of a correct prediction are high (as in meteorological, medical, and financial applications). As AI adoption increases, the need to correctly interpret the decisions an AI system makes, and to explain the decision-making process or the factors that led to the model's output, will become very important. This helps ensure the AI system can be trusted by both developers and end users, while also helping us build better and more explainable models.
A2) The ability to explain the CNN's decision-making process using saliency and other advanced methods is very helpful: it shows what led to the outcome and lets the developer check whether that reasoning aligns with the desired results. It can also be crucial for identifying and then correcting biases that may exist in the training dataset. For example, the model may be detecting a polar bear in an image by looking at the polar environment in the background (because all the training images might share that background), which is not what we expect it to do; once identified, appropriate action can be taken to correct such a bias.
Great observation about using XAI for bias correction! That's a compelling example of a reason to use XAI.
Ans 1: In this context, explainability is important because it tells us how well we can explain natural phenomena with our AI models, and, if the model is not accurate, what can be done to improve it and which factors we are missing that lead to wrong results. Interpretability is also important because it tells us whether our model follows physical laws: we should be able to connect the results we get from the AI model to the underlying physics. If we can do both of these things reasonably well, then our AI model is good for our end users.
Ans 2: We should focus on all of these factors so that we don't end up creating an AI model that gives wrong results. After building a model, we should verify whether it is actually ready for the end user.
Yes, many domain scientists especially equate interpretability with “does this follow physics”. In my experience, people are more likely to be comfortable using an AI model if they are convinced it has some basis in physics (or at least, isn’t constantly violating basic physical laws). There is actually a sub-field of machine learning called “physics-guided ML” which uses basic physical laws to constrain ML models.
Summary of Team 37’s discussion:
1. The reason we emphasize explainability and interpretability is to build trust in our AI model outputs. End users are interested not only in what AI models tell us but also in how they work, so we need to explain which input features contribute most to the outputs. Every AI model can give us a prediction, but only explainable/interpretable models will earn the most trust from end users.
2. End users or stakeholders may value different aspects of what an AI model can do, such as the region of interest for weather forecasters. We can evaluate and explain our AI models using both global and local XAI methods (see the sketch below). This information may help end users understand the strengths and limitations of the AI models and assess how to apply them.
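To illustrate the global/local distinction in the item above, here is a minimal sketch using scikit-learn's permutation importance as a global method (ranking features over the whole dataset); a local method, such as the saliency sketch earlier in this thread or per-sample Shapley values, would instead explain one individual prediction. The data below are synthetic placeholders, not course data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic placeholder data: tabular storm predictors -> wind speed.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Global explanation: average importance of each feature across the dataset.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("Global importance ranking:", np.argsort(result.importances_mean)[::-1])

# A local method would instead answer: "why did the model predict this value
# for *this particular* storm?" (e.g., a saliency map for one image).
```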
I like that you emphasized the difference between global and local explanations: an ongoing discussion whose outcome will vary based on your end goals.
1. Explainability and interpretability are two aspects of AI that are often overlooked. For our scenario with the tropical forecaster, the model's output must be as accurate as possible, even if that means going back to the drawing board, retraining it, or adding more data. Since the forecaster will be making decisions that impact whole communities and will need to be in touch with emergency managers, any bad output could affect the lives of many people in their community. In terms of interpretability, it should be clear what the model is doing to someone with a non-technical background. Simply treating it as a black box that receives an input and yields an output is not good enough. For a convolutional model, we could explain that the input image is passed through multiple layers, features are extracted, and a final layer combines the key features to produce an output. This would at least give them an idea of what's going on instead of abstracting everything away.
2. We thought of a slightly alternate scenario/model. Suppose the model our forecaster is using requires an immense amount of computational power and may not produce an output quickly enough to alert the community. In this case, the main bottleneck is the resolution/number of features fed into the model. Applying a feature-reduction technique such as Shapley values, or running XGBoost to perform feature selection, would help reduce the number of input features needed while retaining high enough accuracy (a rough sketch of this idea follows below). However, we also thought it might be worth using a lighter-weight architecture while keeping as many input features of the image as possible, to ensure the information emergency managers receive is as high resolution as possible.
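Here is a rough sketch of the feature-reduction idea described above, assuming tabular input features rather than raw images and using SHAP values from an XGBoost model for a global ranking; the synthetic data and the cutoff `k` are placeholders, not values from the trust-a-thon.

```python
import numpy as np
import shap
import xgboost

# Synthetic placeholder data: many predictors -> wind speed.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = 4.0 * X[:, 2] - 3.0 * X[:, 10] + rng.normal(scale=0.5, size=1000)

# Fit a gradient-boosted model and rank features by mean |SHAP value|.
model = xgboost.XGBRegressor(n_estimators=300, max_depth=4)
model.fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]

# Keep only the top-k features to train a lighter, faster model.
k = 10
X_reduced = X[:, ranking[:k]]
```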
You touched on an important thing: iteration! You almost never design your model once and present the results. You identify the problem, ask questions, make your best first attempt, look at the results, share them with other stakeholders, and then usually go back to the drawing board (often more than once). Keeping your end users and stakeholders involved in the entire design process, rather than just handing them some plots at the end, increases your ability to provide them with useful, actionable information.
Also, I like your alternative scenario that highlights the fact that we often have to balance high precision/accuracy and speed. Depending on the product/design, getting reasonably accurate information to your end user quickly may be more useful than a slightly more accurate prediction that takes longer.
Thoughts from our discussion of today's questions/notebook from Team 34:
- XRAI was the easiest to interpret, along with SmoothGrad.
- The performance plot was helpful. The end user might benefit from error bars/ranges on the estimate. We need more information on what the end user needs: is there a particular threshold they are looking at for decision making?
- The end user needs some domain knowledge to interpret the plots/decisions. The forecaster has this; does the transportation official?
- It is important for us to understand why the model is making the decisions it does, so we know what is going on and why.
- The model could benefit from being tested on images that are not so perfect: rotated, missing bits, shifted storms (a rough sketch of this idea follows after this list).
- While the maps are noisy, the consistency across maps/methods in the areas of importance (e.g., the eye of the storm) aids confidence.
- Co-production of knowledge: interfacing with the end user is important for development. We need an ongoing conversation.
- There is no single best method. Use many methods and triangulate; all methods/models have advantages and disadvantages. Today's notebook and lectures showed us the value of using many methods.
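As a rough sketch of the "not so perfect images" point above: perturb held-out satellite images with rotations, shifts, and dropped patches, and check how much the model's wind-speed estimates change. The helper names are hypothetical, the model is assumed to be a trained Keras-style model on single-channel (H, W) images, and the perturbation magnitudes are arbitrary.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def perturb(image, rng):
    """Apply a random rotation, shift, and dropped patch to one (H, W) image."""
    out = rotate(image, angle=rng.uniform(-30, 30), reshape=False, mode="nearest")
    out = shift(out, shift=rng.uniform(-10, 10, size=2), mode="nearest")
    r = rng.integers(0, image.shape[0] - 16)
    c = rng.integers(0, image.shape[1] - 16)
    out[r:r + 16, c:c + 16] = 0.0  # simulate missing data
    return out

def robustness_check(model, images, rng=None):
    """Mean change in predicted wind speed between clean and perturbed images."""
    if rng is None:
        rng = np.random.default_rng(0)
    clean = model.predict(images[..., np.newaxis])
    noisy = np.stack([perturb(img, rng) for img in images])
    perturbed = model.predict(noisy[..., np.newaxis])
    return float(np.abs(clean - perturbed).mean())
```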
Nice work Team 34! You are doing a nice job of thinking about how to make your results more useful and interpretable for your end users. Asking these questions about user needs before and during your ML model design and training process can help you provide models with more utility for your end users. In particular, you identified the possibility that decision makers might have some kind of threshold to help them decide–that would be a great thing to know before designing your AI model.
And yes, interfacing across different disciplines is very tricky–as you correctly identify, someone designing ML models for weather and a weather forecaster will probably be able to communicate with each other and share results (though even this can be trickier than you might think!). But when trying to talk to the transportation official, it is especially important to establish common goals and a common language.
Overall excellent work Team 34!!
Recap of Stakeholders (for quick reference):
User #1: Forecaster in a Tropical Region
Background: Wind speeds are an important dimension for forecasting tropical cyclones. While there is some opportunity for direct measurements (such as from aircraft), observational data are often unavailable. Forecasters need other ways to get this crucial information.
Key user needs: The forecaster needs to know roughly how strong wind speeds are so they can assess the local impacts and provide accurate information to local emergency managers. They also need this information to have high resolution so they can advise the emergency managers on where the priority areas are for their disaster responses. If the information is not accurate or precise enough, the forecasters may lead the emergency management officials to the wrong areas and/or waste valuable time in the wake of the tropical cyclone.
User #2: Department of Transportation Official
Background: Wind speeds are an important factor for transportation officials when making decisions about closing (and reopening) bridges. Strong winds can cause driving conditions to deteriorate, making driving on the bridges incredibly dangerous. Crashes are not only dangerous to drivers; they can also block important routes for emergency response vehicles. The decision to close bridges is also very important because bridges are needed for people to evacuate, especially from barrier islands, so closing a bridge means potentially taking away some people's ability to evacuate from a storm.
Key user needs: The transportation official needs extremely precise data on wind speeds so they can effectively walk the line between keeping pathways open for evacuation and making sure driving conditions are safe.
Great summary of your users and their needs–having a strong grasp on end user needs beforehand will help you design more useful AI models!
Ans 1: In simple words (my understanding), interpretability relates to the cause-and-effect relationships in the model and to its accuracy, and it becomes very important for high-risk scenarios. Explainability relates to the steps taken from input to output (e.g., parameters, nodes, etc.) that make an AI model transparent rather than a black box. Overall, explainability is the explanation of what is happening in the ML model from input to output.
Ans 2: Interpretability will help us understand the causes of some catastrophic weather events in terms of the input features.
Great start here! But how do these terms and ideas relate to the end users? Why are they (or are they not) important for them?
Nice! Interpretability can also help with trustworthiness: decision makers often combine model information with their own knowledge and experience (as well as other models). For example, a weather forecaster who has to make a decision will probably be less inclined to trust a model they do not feel they can interpret or physically understand.