The following peer review was solicited as part of the Distill review process.
The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
General Comments
Paper summary:
Several feature attribution methods rely on an additional input (besides the one being explained) called the "baseline". The paper discusses how the choice of baseline impacts the attributions for an input, and proposes the idea of averaging over several baselines when good individual choices do not exist. It does this in the context of the specific attribution method called "Integrated Gradients" and the specific task of object recognition on the ImageNet dataset.
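For concreteness, the two quantities under discussion can be written out (my notation, following the Integrated Gradients paper). For a model $F$, input $x$, and baseline $x'$, the Integrated Gradients attribution of feature $i$ is

$$\mathrm{IG}_i(x, x') = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha(x - x')\big)}{\partial x_i}\, d\alpha,$$

and, as I read the paper, Expected Gradients replaces the single baseline with an expectation over a baseline distribution $D$:

$$\mathrm{EG}_i(x) = \mathbb{E}_{x' \sim D,\ \alpha \sim U(0,1)}\!\left[(x_i - x'_i)\, \frac{\partial F\big(x' + \alpha(x - x')\big)}{\partial x_i}\right].$$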
Pros:
- The paper is very well-written and easy to follow. It offers a very nice exposition of the Integrated Gradients method. The interactive visualizations immensely help with understanding the various ideas.
- The paper tackles the important and thorny issue of picking baselines in feature attribution methods. The visualization that allows choosing different segments of the input image as a baseline is very clever. It makes the sensitivity of the attributions to the choice of baselines very apparent.
Cons:
- The paper views the baseline as a mere implementation detail of Integrated Gradients (and other feature attribution methods). This is a bit misleading. The Integrated Gradients paper considers the baseline to be a part of the attribution problem statement. The various axioms are also defined for the pair of input and baseline. In that sense, Integrated Gradients posits that one must commit to a baseline while formulating the attribution problem.
- It would help to have more discussion on properties of Expected Gradients (and more generally of the idea of "averaging over baselines"). It is also not clear whether one must simply average the attributions across different baselines. Instead, one may study the distribution over attributions to identify different patterns, say via clustering. (See the next section for more suggestions.)
Suggestions:
Below are some suggestions on improving / extending this paper:
- The idea of averaging over several baselines seems quite general, and so the paper could be greatly strengthened by including an additional example (preferably for a task on text or tabular inputs).
- It would help to discuss which axioms Expected Gradients satisfies. Is there a new completeness axiom to tell us that we have taken enough background samples? (A candidate identity is sketched after this list.)
- Computing Expected Gradients involves averaging attributions relative to a random sample of baseline points. The sampling brings uncertainty, and I wonder whether the authors considered quantifying it with confidence intervals (one way to do this is sketched after this list).
- An attractive property of the black baseline is that it is encoded as zero, and therefore it is clear how to interpret the sign of the attribution: a positive attribution means that the model prefers the pixel to be brighter. If the baseline is non-zero, the sign of the attribution is harder to interpret. A positive attribution would mean that the model prefers the pixel to move away from the baseline, which may mean making the pixel brighter or darker depending on which side of the baseline the pixel lies. The problem is exacerbated when several different baselines are considered. It would help if the authors comment on interpreting the sign of the attributions.
- While the formalism discussed in the paper assumes an input distribution D, in practice we only have a finite sample from that distribution. Often the sample may not be representative. In such cases, I worry that artifacts of the sample may creep into the Expected Gradients. It would help if the authors comment on this.
- When considering multiple baselines, it could be that the attribution to a pixel is positive for some baselines and negative for others, and the average attribution ends up being near zero. In such cases, I wonder if the expectation is the right summarization of the distribution of attributions across different baselines. Instead, one could consider clustering the attributions (from different baselines) to separate the different patterns at play (see the clustering sketch after this list).
- The idea of averaging gradients across a sample of points is also used by SmoothGrad (https://arxiv.org/abs/1706.03825). Is there a formal connection between Expected Gradients and SmoothGrad? (The two estimators are written side by side after this list.)
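On the axioms question above: Integrated Gradients satisfies completeness,

$$\sum_i \mathrm{IG}_i(x, x') = F(x) - F(x'),$$

so, assuming the expectation and the sum can be exchanged, the natural analogue for Expected Gradients would be

$$\sum_i \mathrm{EG}_i(x) = F(x) - \mathbb{E}_{x' \sim D}\big[F(x')\big].$$

How far a finite baseline sample is from satisfying this identity might itself serve as a check on whether enough background samples have been taken.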
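On the confidence-interval question above, here is a minimal sketch of one way to do it. This is not the authors' code; the model gradient, dimensions, and baseline sample below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Gradient of a toy model f(x) = sum(tanh(x)); any model gradient fits here.
    return 1.0 - np.tanh(x) ** 2

def ig_attribution(x, baseline, n_steps=64):
    # Midpoint Riemann approximation of Integrated Gradients along the straight path.
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    path = baseline + alphas[:, None] * (x - baseline)
    return (x - baseline) * grad_f(path).mean(axis=0)

x = rng.normal(size=8)                  # input being explained
baselines = rng.normal(size=(200, 8))   # sample standing in for the distribution D

per_baseline = np.stack([ig_attribution(x, b) for b in baselines])
eg = per_baseline.mean(axis=0)          # Expected Gradients estimate

# Normal-approximation 95% interval over the baseline sample, per feature.
se = per_baseline.std(axis=0, ddof=1) / np.sqrt(len(baselines))
print(np.stack([eg - 1.96 * se, eg, eg + 1.96 * se]).round(3))
```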
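Similarly, on the clustering suggestion above, a self-contained sketch on synthetic data (k-means is one arbitrary choice of clustering method) showing how the average can sit near zero while two clear attribution patterns exist:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic per-baseline attributions: +1 under half the baselines,
# -1 under the other half, so the plain average is near zero.
per_baseline = np.concatenate([
    rng.normal(loc=+1.0, scale=0.1, size=(100, 8)),
    rng.normal(loc=-1.0, scale=0.1, size=(100, 8)),
])
print("mean attribution:", per_baseline.mean(axis=0).round(2))  # ~0, masks both patterns

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(per_baseline)
for k in (0, 1):
    print(f"cluster {k} mean:", per_baseline[labels == k].mean(axis=0).round(2))
```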
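Finally, on the SmoothGrad question above, writing the two estimators side by side may help. SmoothGrad averages raw gradients over Gaussian neighbors of the input,

$$\mathrm{SG}_i(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial F(x + \epsilon_j)}{\partial x_i}, \qquad \epsilon_j \sim \mathcal{N}(0, \sigma^2 I),$$

while Expected Gradients averages gradients over interpolations toward sampled data points and carries the extra $(x_i - x'_i)$ path factor. Both are expectations of gradients under some perturbation distribution; the difference lies in the distribution (isotropic noise versus $D$-induced paths) and the weighting. Making that correspondence precise seems worthwhile.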
Minor:
- In the second to last figure, what is the value of alpha used for parts (2) and (4)?
Distill employs a reviewer worksheet to guide reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results
| Advancing the Dialogue | Score |
|---|---|
| How significant are these contributions? | 4/5 |

| Outstanding Communication | Score |
|---|---|
| Article Structure | 4/5 |
| Writing Style | 4/5 |
| Diagram & Interface Style | 4/5 |
| Impact of diagrams / interfaces / tools for thought? | 4/5 |
| Readability | 4/5 |

| Scientific Correctness & Integrity | Score |
|---|---|
| Are claims in the article well supported? | 3/5 |
| Does the article critically evaluate its limitations? How easily would a lay person understand them? | 2/5 |
| How easy would it be to replicate (or falsify) the results? | 4/5 |
| Does the article cite relevant work? | 3/5 |
| Does the article exhibit strong intellectual honesty and scientific hygiene? | 3/5 |