Deep dive into “failure cases”

For example, GNN and MLP can be both biased but towards different directions.

GNN and MLP can be both biased but towards different directions. What if MLP is right about a low-confidence instance while GNN is wrong?

This is a good question! Let us combine together your “failure case” questions in the original and follow-up comments for a joint analysis. We believe you are looking for a conceptual illustration. Thus, in the following, we will not focus on the theoretical results in Section 2.3.

What are the failure cases?

Let’s comprehensively construct possible failure cases as the reviewer suggested. Since this question is mainly for conceptual understanding, our goal is not to rigorously cover all possible corner cases. Rather, we use typical failure cases that seem controversial to aid our understanding of the experts’ interactions.

Looking at the overall training objective of Eq 1, the intuitive “success cases” should be:

The MLP loss is low, and its confidence is high: confident & correct MLP predictions
The MLP loss is high, and its confidence is low: unconfident & incorrect MLP predictions

Thus, the typical failure cases are just the opposite of the above success cases: 1. Confident & incorrect MLP predictions, 2. Unconfident & correct MLP predictions

Further, since we are dealing with two experts, we can have a more specific definition by considering the experts’ relative strengths. So the failure cases can be refined as:

Case 1: Confident & incorrect MLP predictions + correct GNN predictions
Case 2: Unconfident & correct MLP predictions + incorrect GNN predictions

Side note: nodes corresponding to Case 1 should have different self-features than the nodes corresponding to Case 2. Otherwise, their confidence would be the same.

Since the balance between the experts are ultimately controlled by the loss, we quantify each term in Eq 1.

Case 1: high ; high ; low ; low .
Case 2: low ; low ; high ; high .

How to address failure cases?

Observation: The commonality between the two failure cases is that for one expert, both its loss and the weight coefficient in front of the loss are high. To address the failure case, the Mowst training should be able to either reduce the loss, OR reduce the weight coefficient.

In summary, Mowst will take the following steps:

Update the MLP model to make the predictions on the case 1 nodes closer to random guess, and
Update the confidence function to make it have an “over-confident” shape, so that the MLP expert has higher on the case 2 nodes.

The two steps do not independently take effect. For step 1, it is a simple task for an MLP to “learn” a random guess. So executing step 1 will not have much effect on the predictions on the case 2 nodes. For step 2, an “over-confident” means that it is easier for the MLP to achieve high confidence. For sake of discussion, let’s manually construct a simple function as an example, where

if dispersion of is less than , and
if dispersion of is larger than .

If we just consider the above special function, we can make more “over-confident” with a smaller . Note that updating the confidence function in step 2 affects on both case 1 and case 2 nodes. So we next analyze the joint effect on the case 1 & 2 nodes after executing both steps 1 & 2.

In step 2, we can reduce until the dispersion of the MLP’s case 2 predictions is higher than (we can always do so since MLP’s case 2 predictions are correct by definition). Simultaneously, in step 1, we need to push MLP’s case 1 predictions more and more towards random guess, until the predictions have a dispersion lower than (we can always do so since random guess has 0 dispersion).

Net effect of reduced Mowst loss

Each term in the loss will change after executing steps 1 & 2. For case 1, will reduce (to 0 under our example confidence function). will increase. The net effect is reduced overall loss: we now have (by definition of case 1, is low). For case 2, will increase (to 1 under our example confidence function). will remain the same. The net effect is also reduced overall loss: where is low by definition of case 2.

Net effect of improved prediction behaviors

After jointly executing steps 1 & 2:

For case 1, MLP has unconfident incorrect predictions.
For case 2, MLP has confident correct predictions.

So through Mowst training, we have converted the two typical failure cases into two success cases.

Please do not hesitate to let us know of any further questions.