PaperHub
4.5 / 10 · Poster · 4 reviewers (min 4, max 6, std 0.9)
Individual ratings: 4, 4, 6, 4
Confidence: 3.0 · Correctness: 2.5 · Contribution: 2.5 · Presentation: 3.0
NeurIPS 2024

Neural Isometries: Taming Transformations for Equivariant ML

OpenReview · PDF
Submitted: 2024-05-09 · Updated: 2024-11-06
TL;DR

Neural Isometries find latent spaces where complicated transformations become tractable for downstream tasks.

Keywords
Equivariance · Geometric Deep Learning · Representation Learning

Reviews & Discussion

Review (Rating: 4)

The paper proposes an autoencoder framework that encodes the input symmetries into isometries in the latent space. The equivariance in the latent space is captured by a functional map $\tau$, which is regularized to be an isometry. Instead of hard constraints, the equivariance of the system is encouraged via the optimized $\tau$ and the loss function design. The proposed method is evaluated on homography-perturbed MNIST, conformal shape classification, and camera pose estimation tasks. The experiments show the proposed method performs on par with the handcrafted equivariant baselines and outperforms a baseline with a similar approach (Neural Fourier Transform).

I’m positive about the paper. However, there are some main concerns that I would like to know the answers to. I’m willing to raise the scores once they are addressed.

Strengths

  1. The paper is well-written and easy to follow.
  2. The paper proposes a novel framework that softly models the latent space equivariance as functional maps.
  3. Unlike handcrafted equivariant networks that are limited to certain groups, the proposed framework can model different input transformations.
  4. The proposed method is shown to perform on par with the baselines in homNIST and camera pose estimation tasks, and outperforms the baselines in the conformal shape classification task.

Weaknesses

  1. Although shown to be effective in the experiments, the constraint of $\tau$ being an isometry seems a bit heuristic. The paper could benefit from some theoretical investigation of why constraining to isometries helps the performance.
  2. Following up on the previous point, can all transformations be modeled as isometric functional maps in the latent space? That is, does restricting $\tau$ to isometries affect the types of transformations the network is able to model?
  3. Since the equivariance is only obtained softly via the loss function design, it's essential to see how much equivariance is maintained/lost in the latent space. I recommend the authors report some measures of equivariance and compare them with the handcrafted methods (for example, equivariance error/loss in the latent space).
  4. There are several heuristic designs in the proposed framework, and some of them affect the performance significantly. For example, it is not explained clearly why a smooth multiplicative mask is required to ensure $\tau_\Omega$ is semi-diagonal. The effect of such approximation is also not discussed/evaluated. The second is the multiplicity loss, which greatly affects the performance of the framework.
  5. I appreciate the comparisons with handcrafted equivariant networks. However, I believe the paper can be greatly strengthened by comparing it with the same autoencoder framework while not enforcing the latent equivariance and training with data augmentation. This can demonstrate the benefits of the proposed latent equivariant design.

Questions

  1. Since there are no hard-coded constraints (except for the regularization) in the system, the proposed method can work with different input transformations. Do you think this work can be applied to automatic symmetry discovery tasks?
  2. I understand that the inner product needs to be preserved to construct an isometry, but why does $\tau$ have to commute with $\Omega$?
  3. Ln 207, “we show in experiments that a major benefit of our isometric regularization is that our multiplicity loss (promoting a diagonal-as-possible and thus sparse and compact $\tau_\Omega$) can serve as an effective substitute for access to triples.” Is this true for all three experiments, or is it only validated on homNIST?

Limitations

As mentioned in the paper, the proposed network is unable to perform graph-based tasks. In addition, the equivariance in the latent space is not evaluated. As a result, it is unclear how the equivariance is preserved in the latent space. Lastly, by restricting the functional maps to be isometries, the transformations that the network can model might be limited.

Author Response

We thank the reviewer for highlighting that our paper is well-written and easy to follow, novel, and competitive with baselines, and for providing valuable feedback! We further appreciate your commitment to reconsidering your score - we have tried our best to address your feedback, but please let us know if we missed anything!

Clarification: Automatic Symmetry Discovery

We would like to clarify that this is exactly the goal of NIso. It does not assume upfront knowledge of the symmetry, but instead learns to be equivariant automatically.

Why commute with $\Omega$?

The goal of NIso is to discover symmetries (i.e. what is preserved between observations) of transformations. For instance, if the image transformation is a shift, then a particular property of the images is preserved: their frequencies. For any transformation that preserves some property, there exists a basis such that its functional map (FM) is block-diagonal in that basis. The basis is defined as the eigendecomposition of some operator $\Omega$. The FM being block-diagonal in the basis – with the size of each block corresponding to the eigenvalue multiplicity – means it will commute with $\Omega$ and equivalently preserve its frequencies (i.e. it is an isometry). Hence, we can view the problem of symmetry discovery as equivalent to jointly finding a set of maps relating observations as well as the operator they commute with.

We will be happy to include a detailed theoretical exposition of these properties in the revision.
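
As a concrete illustration of the claim above, the following minimal numpy sketch (our own, not the paper's code; the eigenvalues and block sizes are arbitrary illustrative choices) verifies that a map which is block-diagonal in the eigenbasis of $\Omega$, with block sizes matching the eigenvalue multiplicities, commutes with the operator and preserves the inner product:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical operator spectrum with eigenvalue multiplicities 1, 2, 3.
lam = np.array([1.0, 2.0, 2.0, 5.0, 5.0, 5.0])
Lam = np.diag(lam)

def random_orthogonal(n):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Block-diagonal map: one orthogonal block per distinct eigenvalue.
tau = np.zeros((6, 6))
tau[0:1, 0:1] = random_orthogonal(1)
tau[1:3, 1:3] = random_orthogonal(2)
tau[3:6, 3:6] = random_orthogonal(3)

# The map commutes with the operator (preserves its "frequencies") ...
assert np.allclose(tau @ Lam, Lam @ tau)
# ... and is an isometry.
assert np.allclose(tau.T @ tau, np.eye(6))
```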

Can all transformations be modeled?

NIso can exactly model unitary transformations. Any other transformation can in theory be approximated with a sufficiently expressive latent space. We also note that some transformations that are not unitary in image space can be made so by estimating the correct latent space: consider the case of camera pose estimation from video. From frame to frame, there are occlusions so there exists no unitary map between the two images. However, there exists a unitary map between the state of the underlying 3D scenes. Our motivation with NIso is to take a first step towards a model that can discover the correct symmetries of the underlying problem.

Functional maps can nevertheless be sensitive to occlusion and partiality. However, we note that NIso is already surprisingly robust to partiality in its inputs, as is evident in the Co3D experiments. Please see the related discussion in the general rebuttal.

Report measures of equivariance and compare them with the handcrafted methods.

We are happy to report that we were able to quantitatively evaluate the equivariance of our model following standard procedures [6-7, 43]. Please see the discussion in the general rebuttal and Tab. 1 in the attached PDF.

Explanation of why a multiplicative mask is required to ensure diagonality.

The multiplicative mask $P_\Lambda$ appears in the derivation of the closed-form solution to the constrained minimization problem in Equation (7) (section A.2 of the supplement). Remarkably, the derivation reveals that enforcing commutativity with the diagonal matrix of eigenvalues $\Lambda$ (and thus block diagonality) simply reduces to element-wise multiplication with the “sharp” eigenvalue mask in Equation (19) before the Procrustes projection.

Evaluate effect of smooth multiplicative mask.

The “sharp” eigenvalue mask in Equation (19) is difficult to work with. Evaluating equality between floating point values is hard without some heuristic measure of closeness. Furthermore, we found that this approach results in poor backwards gradient flow to the eigenvalues, to the point where the NIso fails completely.

Thus, we replace the “exact” mask with its “soft” analogue, approximating the Kronecker deltas via the exponential. While it induces some degree of error, it allows backprop through the mask. We note that we are not the first to introduce a “soft” eigenvalue mask: alternatives are explored extensively in [Ren and Panine et al. 2019], and concurrent work [Cheng and Deng et al. 2024] employs a similar soft mask as a regularizer.
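
For illustration, here is a toy numpy reconstruction of the two masks and the subsequent projection (our own sketch, not the authors' code; `sigma` is a hypothetical sharpness hyperparameter, and the eigenvalues are arbitrary):

```python
import numpy as np

def sharp_mask(lam):
    # "Sharp" Kronecker-delta mask: 1 iff two eigenvalues are exactly equal.
    # Brittle for learned floating-point eigenvalues and poor for gradient flow.
    return (lam[:, None] == lam[None, :]).astype(float)

def soft_mask(lam, sigma=10.0):
    # Smooth exponential stand-in for the Kronecker delta.
    return np.exp(-sigma * (lam[:, None] - lam[None, :]) ** 2)

def procrustes(C):
    # Projection onto the nearest orthogonal matrix via the SVD.
    U, _, Vt = np.linalg.svd(C)
    return U @ Vt

lam = np.array([0.0, 1.0, 1.0, 3.0])                  # toy eigenvalues
C = np.random.default_rng(1).standard_normal((4, 4))  # unconstrained map estimate
tau = procrustes(soft_mask(lam) * C)                  # mask element-wise, then project
```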

Explanation of Multiplicity Loss.

The point of the multiplicity loss is to make the operator “interesting” and force our framework to recover the symmetry of the transformation. Without the multiplicity loss, all eigenvalues could collapse to a single one. The operator is then a multiple of the identity, which would not constrain the FM in a meaningful way. The more distinct eigenvalues the operator has, the more it is forced to discover subspaces of the latent space that are preserved, i.e., symmetries. Intuitively, the multiplicity loss can also be seen as forcing the network to discover an "as-simple-as-possible" representation for the transformations by promoting diagonality.
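
To illustrate the intuition (this is our own toy penalty, a guess at the mechanism rather than the paper's exact multiplicity loss), a penalty on the off-diagonal mass of the soft eigenvalue mask is maximal when all eigenvalues collapse to a single value (operator = multiple of the identity) and vanishes when the eigenvalues are distinct:

```python
import numpy as np

def soft_mask(lam, sigma=10.0):
    return np.exp(-sigma * (lam[:, None] - lam[None, :]) ** 2)

def multiplicity_penalty(lam):
    # Mean over off-diagonal mask entries: 1.0 for a fully collapsed
    # spectrum, ~0.0 when all eigenvalues are distinct.
    P = soft_mask(lam)
    n = len(lam)
    return (P.sum() - np.trace(P)) / (n * n - n)

print(multiplicity_penalty(np.array([2.0, 2.0, 2.0])))  # collapsed spectrum: 1.0
print(multiplicity_penalty(np.array([0.0, 1.0, 4.0])))  # distinct spectrum: ~0.0
```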

Effect of multiplicity loss across experiments

Without the multiplicity loss, our method achieves poor results in all three experiments. On Conf. SHREC'11, without the multiplicity loss NIso fails to discover a meaningful notion of equivariance in the latent space and achieves a classification accuracy of only approximately 36%. In the CO3D experiments, we also indirectly evaluate its absence in the regime where we train and evaluate a version of NIso that does not learn an operator (and thus only enforces orthogonality). We find that this version performs worse than all other baselines. Together, these results suggest that forcing the network to discover an operator with a diverse eigenspectrum via the multiplicity loss is integral to gaining a meaningful notion of equivariance in the practical sense.

Compare to the baseline autoencoder framework with data augmentation.

Thank you for suggesting this experiment and we are happy to report we executed it with strong results -- NIso still outperforms these baselines by a significant margin. Please see the "Comparison with data-augmentation" section in the general rebuttal and Table 2 of the PDF.

Comment

I thank the authors for their efforts in the rebuttal. Many of my concerns have been addressed. It's good to see NIso can achieve a similar equivariance error to the handcrafted methods. It's also interesting to see patterns similar to spherical harmonics appear in the SO(3) case. It is also great to see NIso perform better than AE with augmentation. The added experiments strengthen the paper.

In addition, I kindly disagree with other reviewers on the need to solve occlusion/partiality. Occlusion and imperfect symmetry modeling can be a different research topic on its own in the field. However, I recommend that the authors refrain from claiming that NIso can generally handle occlusion/partiality robustly, as it requires additional proof and verification that is not presented in the paper.

Although most of my concerns have been addressed, there are still some that remain. The multiplicity loss still seems like a heuristic trick to me, and it has a significant impact on the system's performance. I would love to see a more theoretical probe into it in future work. Secondly, although it is shown experimentally that the proposed method can deal with transformations that are not unitary, it is still not guaranteed theoretically. The authors provide a plausible explanation in response to Reviewer FFcP (there exists a domain in which these transformations are either unitary or isometric, and can be mapped to via a sufficiently expressive autoencoder). Although I personally agree with such an explanation, this is still a speculation without proper proof. I again suggest the authors refrain from making such a claim.

Overall, the paper is strengthened after the rebuttal, and I believe the paper introduces a novel way to softly model equivariance in the latent space, which is a great addition to the community. Therefore, I have raised my score to reflect the above points.

Comment

Dear Reviewer Sfsw,

Thank you again for your comments and feedback! As the end of the author / reviewer discussion period is fast approaching, we would love to hear your thoughts and see if we can address any remaining questions in the remaining time. Please let us know whether we addressed your comments and questions appropriately!

Thank you!

Best, the authors

Comment

Thank you for your response and for updating your score, we are glad that you find NIso will contribute to the NeurIPS community!

Following your comments, we will update our paper to make sure not to overclaim. Specifically, we will clarify that (1) NIso can only exactly model unitary transformations; (2) that our experimental results only suggest the potential of NIso to model more complex, non-unitary transformations, and that a rigorous theoretical investigation combined with extensive empirical results is necessary to convincingly demonstrate broad generalization; and (3) that NIso does not currently explicitly handle partiality.

We agree that a deeper investigation into the multiplicity loss will strengthen our method, and we are actively investigating this in follow-up work. We would also like to note that an important novel benefit of our formulation is that it allows us to regularize for block-diagonality without backpropagating these gradients through the solve for the estimated transformation. Specifically, prior methods including the NFT attempt to impose a block-diagonality loss on the estimated transformation itself. We discussed with the authors, who confirmed that this is only possible in a post-processing step, as it otherwise destabilizes training. In contrast, by learning to parameterize an operator $\Omega$, we instead impose the regularization for block-diagonality on the mask $P_\Lambda$, which is stable during training. Our comparisons with the NFT (see Tab. 3, center left) suggest that this is a key feature enabling self-supervised discovery of sparse and condensed representations, and we are excited to investigate this further.

Thank you again for your comments, which will help to clarify the paper!

Review (Rating: 4)

This paper introduces Neural Isometries, which is an autoencoder framework learning to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. Several experiments, including camera pose estimation, are conducted.

Strengths

The idea of transforming complicated equivariances in observation space into isometries in latent space is intuitive and interesting. Experiments have demonstrated the potential of this method in practical tasks. The paper is well written and easy to understand.

Weaknesses

This is an interesting paper but in-depth discussion is lacking. My main concern is that there is no quantitative or theoretical description of the equivariance. For example,

  1. How equivariant is the model? i.e. How large is the equivariance loss?
  2. What type of equivariance in the observation space can be modeled? Will the model fail when the perturbations are too strong?

In addition, the experimental results are relatively weak. Only 3 small-scale experiments are conducted, and only a few baselines are considered.

Questions

  1. If the observation space is already isometric, e.g. rotation by 90 degrees, what can be said about the latent space?
  2. How robust is this model? For example, how will the latent vector change when the inputs are partially observed (masked)?

Limitations

See the weakness section.

Author Response

Thank you for validating our motivation and our communication - we appreciate it!

Quantitative evaluation of equivariance: How equivariant is the model?

We are happy to report that we were able to quantitatively evaluate the equivariance of our model following standard procedures [6-7, 43]. Please see the discussion in the general rebuttal and Tab. 1 in the attached PDF.

What type of equivariance in the observation space can be modeled?

NIso can exactly model unitary transformations. Any other transformation can in theory be approximated with a sufficiently expressive latent space. We also note that some transformations that are not unitary in image space can be made unitary by estimating the correct latent space: consider the case of camera pose estimation from video. From frame to frame, there are occlusions so there exists no unitary map between the two images. However, there exists a unitary map between the state of the underlying 3D scenes. Our motivation with NIso is to take a first step towards a model that can discover the correct symmetry of the underlying problem.

Will the model fail when the perturbations are too strong? How robust is the model?

As remarked in the general discussion, functional maps can be sensitive to occlusion and partiality. However, we note that NIso is already surprisingly robust to partiality in its inputs, as is evident in the Co3D experiments. Please see the discussion in the general rebuttal under the “NIso under the presence of occlusions / masking / partiality” section.

Lack of Baselines + Only Small-Scale / Synthetic Experimental Results

We are happy to report that we have provided comparisons with additional baselines, including with two SoTA representation learners (DINOv2 and BeIT) in the pose estimation experiments on the Co3D dataset. Please see the "Additional baselines" section in the general rebuttal.

However, we would like to respectfully push back on the notion that we are considering too few baselines.

We note that there is not much work on the topic of equivariant machine learning without prior knowledge of the symmetry group - the NFT is the most relevant baseline in this space. Further, the reviewer suggests that we are only comparing “small-scale” or “synthetic” experiments. However, camera pose estimation on Co3D is not a small-scale experiment - it is a large, real-world dataset that is actively used for benchmarking of applications from pose estimation to novel view synthesis.

More generally, we sought geometric deep-learning baselines for difficult, non-compact symmetry groups. To the best of our knowledge, the only existing SoTA equivariant models handling such symmetries in vision-related tasks are homConv [6] and LieDecomp [43] (both for homographies) and MobiusConv [7] (for Mobius transformations). Thus, benchmarking on MNIST and SHREC is absolutely vital to support our claims. These baselines are not scalable and will not run on any more complex or real-world datasets - this is not a limitation of our method, but a limitation of the baselines. Please also see our discussion in the general rebuttal under the section "Scope of experimental evaluations".

We would be happy to compare with additional baselines if you could clarify what these baselines should be.

If the observation space is already isometric, e.g. rotate by 90 degrees, what can be said about the latent space?

Note that our toy experiment - discovering the toric laplacian - is exactly an instance of this problem. We parameterize images to lie on a torus, such that shifts become rotations, and hence, the shift is exactly an isometric transformation! In this case, our formulation is likely to recover a close approximation of the correct operator and corresponding irreducible unitary representation of the transformation in the eigenbasis - in Exp. 1, we recover a very good approximation of the laplacian operator and the block-diagonal shift representation.

We further demonstrate this in the new experiment where we discover the spherical harmonic transform (see the “Discovering the spherical harmonic transform” section in the general rebuttal and Fig. 1 of the attached PDF). This experiment results in the discovery of a basis with almost exactly the properties of the spherical harmonics. In particular, the estimated $\tau_\Omega$ manifests exactly the same structure as the ground-truth Wigner-D matrices corresponding to the rotation (which are the IURs of $SO(3)$), with square blocks of size $(2\ell + 1) \times (2\ell + 1)$ for the $\ell$-th distinct eigenvalue.
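
Schematically (this is standard representation theory for $SO(3)$, stated here only for concreteness), the recovered maps have the form

$$
\tau_\Omega \;\approx\; \operatorname{diag}\big(D^0(g),\, D^1(g),\, D^2(g),\, \dots\big),
\qquad D^\ell(g) \in \mathbb{C}^{(2\ell+1)\times(2\ell+1)},
$$

where $D^\ell(g)$ is the Wigner-D matrix through which the rotation $g$ acts on the degree-$\ell$ spherical harmonics.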

Comment

Thanks for your clarification. I have read the experiments in the attached pdf, and I am clear now that the equivariance error is at least comparable with the existing methods. The reply on the experiments is also convincing.

However, the other two concerns are not addressed: the type of equivariance that can be modeled, and the robustness to partial visibility. The authors argued that "Any other transformation can in theory be approximated with a sufficiently expressive latent space". This is imprecise.

On the other hand, due to the lack of description of the robustness against partial visibility, and since the model does not have any constraint on the equivariance of the input, it is hard to understand why the model is robust or how robust I should expect it to be for practical tasks. The authors mentioned that this issue will be investigated in the next step, but since the current version of the paper does not include any discussion of this issue, I think it is not complete.

Comment

Analysis: Robustness to Partiality

In response to the reviewer’s concerns, we are happy to report that we performed a theoretically grounded experimental evaluation of robustness which we will be happy to include in the revision using the additional page of the camera-ready paper.

Specifically, we seek to evaluate the robustness of the isometry-estimation module in isolation in the presence of occlusions. We consider the encoder-free paradigm – consisting of the projection to the learned basis, the estimation of the isometric map as in Equation (8), followed by unprojection – which is the same setup as in the toric and spherical laplacian experiments. This setup allows us to study exactly the effect of masking on the latents (as requested by the reviewer), which in this case are the input images. To do so, we consider two observations $\psi$ and $T\psi$ which only partially correspond, and denote by $O_\psi$ and $O_{T\psi}$ the diagonal overlap masks such that the $i$-th diagonal element of $O_\psi$ is $1$ if $i$ is in the overlap with $T\psi$ and $0$ otherwise, with $O_{T\psi}$ defined in the same manner. We observe that the equivariance error under partiality can thus be defined by the magnitude of the difference between the components of $O_\psi \psi$ that get mapped to $O_{T\psi} T\psi$ under $\tau$. That is, we define the partiality equivariance error to be $\lVert O_{T\psi} \tau O_\psi \psi - O_{T\psi} T\psi \rVert^2 / \lVert O_{T\psi} T\psi \rVert^2$.
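
In code, the metric is a direct transcription of the definition above (our sketch; we assume $\psi$ and $T\psi$ are flattened to vectors and the overlap masks are given as diagonal 0/1 matrices):

```python
import numpy as np

def partial_equivariance_error(tau, psi, T_psi, O_psi, O_Tpsi):
    # || O_Tpsi tau O_psi psi - O_Tpsi T psi ||^2 / || O_Tpsi T psi ||^2
    pred = O_Tpsi @ (tau @ (O_psi @ psi))   # mapped visible part of psi
    target = O_Tpsi @ T_psi                 # visible part of the target
    return np.linalg.norm(pred - target) ** 2 / np.linalg.norm(target) ** 2
```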

Using the toric laplacian experiments as a base, we consider two models of partiality. In the first, we no longer consider the domain to be toric; instead, shifted images are clipped at the boundaries with the resulting empty pixels masked to zero – corresponding to the type of partiality often observed in video. In the second, we randomly mask out $2 \times 2$ patches in the shifted image. We train five instances of our model under both partiality regimes, masking out approximately 10%, 20%, 30%, 40%, and 50% of pixels in each instance and measuring the resulting partial equivariance error on the test set. The results are shown in the table below.

Percent Occluded | 10%   | 20%    | 30%    | 40%    | 50%
Shift Mask       | 5.66% | 9.42%  | 14.43% | 17.42% | 20.47%
Patch Mask       | 6.69% | 13.62% | 23.54% | 34.14% | 45.35%

Notably, we see that the partial equivariance error is consistently lower in the presence of shift-based occlusions, and increases less as the percentage of occluded pixels increases. We observe that the principal difference between these two regimes is that the unoccluded pixels exist in a contiguous block under the shift mask, whereas the region of unoccluded pixels is fragmented under the patch masking. Intuitively, the contiguous matches between large blocks (comprising the majority of the image) act as a strong regularizer that de-prioritizes matching the occluded areas. Conversely, when the correspondence is interrupted and fragmented it offers weaker regularization and the resulting map is more affected by the spurious matches induced by the occlusion.

Comment

Thank you for your response. We are glad to hear that you view our experimental results as convincing.

Which Types of Equivariance can be modeled?

We apologize for the lack of clarity in our previous response. In the following, we will make two points: (1) We show that NIso can exactly model equivariance for any unitary transformations, which make up a large chunk of previously proposed equivariant ML architectures. (2) We provide detailed reasoning for our empirical results that clearly suggest that NIso can learn to be equivariant even to certain non-unitary groups.

Equivariance to unitary transformations

NIso can learn to be equivariant to any transformation that preserves the norm of its input. Examples include: (1) shifts on the torus, as demonstrated in Fig. 3 of the main paper; (2) $SO(3)$ acting on spherical inputs, as demonstrated in Fig. 1 of the rebuttal PDF; and (3) the group of 90-degree rotations mentioned by the reviewer, which we empirically validated but were not able to include in the rebuttal PDF due to space constraints (we will include this in the camera-ready paper!).

We note that prior work in equivariant ML at top conferences is often equivariant to only a single one of these transformations, with significant expert crafting. For instance, SO(3) is a non-abelian group and its IURs are substantially complex – nevertheless we are able to recover a basis and representation of transformations with exactly the same properties.

The fact that NIso learns to be equivariant to all these transformations is a significant strength - we are not aware of any previous work in symmetry discovery that has demonstrated this capacity.

We hope that this is absolutely unambiguous: NIso can learn to be equivariant to any transformation that preserves the norm of its input.

Equivariance to non-unitary transformations

We agree that the statement “Any transformation can be approximated with a sufficiently strong encoder” was too loose. We make it precise in the following.

As discussed above, NIso can closely approximate unitary transformations on its input. For other more complex transformations, our experiments suggest that a sufficiently expressive autoencoder can, in fact, discover a latent space where the manifestation of the transformation is approximately unitary and even isometric. We believe this is a surprising and interesting result that is perhaps being overlooked. We note that it is not understood which transformations or groups can be embedded in this way, and point out that this is an exploration of fundamental math, beyond the scope of any ML paper.

However, a potential explanation of this phenomenon could be related to the fact that for many challenging transformations, there exists a domain in which these transformations are either unitary or isometric. For example, Mobius transformations are isomorphic to the isometries of the hyperbolic ball with their action on the sphere representing their restriction to the boundary. Similarly, camera motions in the image plane are in fact projections of the isometric action of SE(3) on the underlying 3D scene.

We hypothesize that by forcing the network to find an isometric representation for the observed transformations we are implicitly regularizing the learned latent space to discover information about this underlying geometry. We note that our ability to recover reasonable camera poses from the estimated transformations where other representation learners fail is indicative of this.

An additional illustrative example supporting this hypothesis can in fact be seen in Figure 7 of the supplement, where we show examples of failure cases in which the pose extraction from $\tau_\Omega$ fails catastrophically. Here we observe that these failure cases often coincide with sequences of frames where background and foreground objects are respectively far from and close to the camera – both cases where depth-estimation pipelines also often fail.

Conclusion

All in all, NIso is a novel first step towards a new paradigm for symmetry discovery and equivariant ML. To the best of our knowledge, no prior work in symmetry discovery has been able to show, for instance, self-supervised discovery of approximate equivariance to a variety of complex transformations, some of which do not even form a group in the observation space (such as the action of camera motions on the image plane).

Using the additional page of the paper, we will add the above discussion to the “Methods” section - we agree with the reviewer that this will make the paper stronger!

Comment

Dear FFcP,

Thank you again for your response and feedback! We believe we were able to address your stated remaining concerns, including a straightforward experimental evaluation of robustness. As the end of the author / reviewer discussion period is fast approaching, we would like to be sure you are satisfied by our discussion and see if we can address any remaining questions. Please let us know whether we addressed your comments and questions appropriately!

Thank you!

Best, the authors

Review (Rating: 6)

This paper proposes a generic equivariant ML framework by learning latent representations that are modelled to be related by an isometry. There are several important design choices made by the authors: (1.) An autoencoder framework that keeps the spatial structure (i.e. images get encoded into images) (2.) The isometric map is represented compactly with a basis that is the outcome of an eigendecomposition of a PSD operator (3.) The isometry manifests as an orthogonal matrix with a sparse block-diagonal structure in its reduced projected form.

Experiments are shown for (1.) learning the spectrum of a toric laplacian and comparing it qualitatively to the real one in Fig. 3, (2.) homography-perturbed MNIST, (3.) conformal shape classification, and (4.) camera pose estimation from real-world video. The results for (1) show a good proof of concept, and (2), (3) and (4) demonstrate a clear superiority over the closest conceptual baseline, NFT. (2) also makes the case for achieving comparable performance with simpler equivariant architectures using the NIso latent space.

Strengths

  • Overall the biggest strength of this paper is that it was an enjoyable read. I found the hypothesis (i.e. learning latent codes related by an isometry for arbitrary transformations in the observation space) to be interesting and reasonable. The overall writing and description of the method is very nice.
  • The Experiments especially (1.) was a good proof of concept.

Weaknesses

  • I am actually quite confused about how the operator $\Omega$ and mass matrix $\mathbb{M}$ are learned. The Laplace-Beltrami operator is a very structured object (see all requirements enumerated in Wardetzky, Max, et al. "Discrete Laplace operators: no free lunch." Symposium on Geometry Processing. Vol. 33. 2007). It is unclear what the nature of this operator is, and its learning feels ad-hoc and unmotivated.
  • Do the authors make sure that the optimization for the functional map is well-posed? Typically, in the original formulation, a "good" functional map is the outcome of using many descriptors in addition to regularization constraints like commutativity with the laplacian. If I understand correctly, the proposed formulation seems confident in learning a structured functional map (like Fig. 2) using just one pair of corresponding functions?
  • As a high-level opinion: overall the experimental section could be stronger. The "discovery" of the toric laplacian was nice, but a demonstration on roto-translations, perhaps with scale, and/or spherical images with SO(3) would be extremely convincing. I would also recommend an evaluation that looks beyond NFT and actually quantifies the complexity of different equivariant models.
  • As mentioned by the authors, the requirement of generalizing to domains with diverse connectivity is indeed a hard one, but important.

Questions

  • The basis functions visualized in Fig. 3 (third column): are they learned or from the actual toric laplacian? What are the various rows?
  • It would also be useful to report some measure of complexity in Table 1 indicating "meticulously-engineered, handcrafted networks"
  • More examples of evidence showing good basis functions
  • What is a perturbation in Section 5.1?

Limitations

I think the authors make a fair declaration of the limitations of their work in Section 6. I would make a stronger vote after the rebuttal. I think this paper has many ingredients to warrant acceptance: clever idea, good argumentation, and decent proof of concept. It lacks clarity in some places and the evaluation does not appear equally general as is the description of the method. For this reason, I maintain a borderline acceptance with a full intention to re-assess in the next round.

Author Response

Thank you for your detailed review, and for your kind words on our hypothesis being interesting and plausible, our writing enjoyable, and our validation with the laplacian helpful! We are happy to report that we executed your key task – recovering the harmonics with spherical images – with exciting results!

Learning the operator $\Omega$ and mass matrix $\mathbb{M}$.

Motivation

The goal of NIso is to discover symmetries (i.e. what is preserved between observations) of complex transformations in image space. For instance, in the toric laplacian experiment, the image transformation is a shift. Under shifts, a particular property of the images is preserved: their frequencies. For any transformation that preserves some property, there exists a basis such that its functional map (FM) is block-diagonal in that basis. The functional basis is defined as the eigendecomposition of some operator $\Omega$. The FM being block-diagonal in the basis – with the size of each block corresponding to the eigenvalue multiplicity – means it will commute with $\Omega$ and equivalently preserve its frequencies (i.e. it is an isometry). Hence, we can view the problem of symmetry discovery as equivalent to jointly finding a set of maps relating observations as well as the operator they commute with.

Parameterization and learning

[Wang and Solomon, 2019] establish that the key property characterizing functional operators is positive semi-definiteness (PSD), and we impose no other constraints so as not to limit the space of discoverable operators. (We will cite this work in the revision.) PSD-ness is defined with respect to some inner product, and thus we learn a diagonal mass matrix $\mathbb{M}$ (with positive entries) which defines a functional inner product. Then, we seek to simultaneously regress the parameters for an operator $\Omega$ that is PSD w.r.t. the inner product defined by $\mathbb{M}$. We observe that any $\mathbb{M}$-PSD operator can be expressed in the form of Equation (4) for some matrix of $\mathbb{M}$-orthogonal eigenfunctions $\Phi$ and non-negative eigenvalues $\Lambda$. Thus we learn a set of weights parameterizing $\Phi$ and $\Lambda$, which together with the learned $\mathbb{M}$ form $\Omega$. In practice, the weights of $\Phi$ are projected to the nearest $\mathbb{M}$-orthogonal matrix via the SVD.
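
A minimal numpy sketch of this parameterization (our own reconstruction; we assume Equation (4) is the generalized eigendecomposition $\Omega = \Phi \Lambda \Phi^T \mathbb{M}$, and the random initialization below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8

m = np.exp(rng.standard_normal(K))     # learned positive diagonal of the mass matrix M
W = rng.standard_normal((K, K))        # unconstrained learned weights for Phi
lam = np.abs(rng.standard_normal(K))   # learned non-negative eigenvalues

# Project W onto the nearest M-orthogonal matrix (Phi^T M Phi = I) via the SVD.
sqrt_m = np.sqrt(m)
U, _, Vt = np.linalg.svd(sqrt_m[:, None] * W)
Phi = (U @ Vt) / sqrt_m[:, None]

assert np.allclose(Phi.T @ np.diag(m) @ Phi, np.eye(K))

# The assembled operator is M-PSD by construction.
Omega = Phi @ np.diag(lam) @ Phi.T @ np.diag(m)
```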

Well-posedness of functional map

We carefully ensure that the solve for the functional map is well-posed.

First, we leverage a number of regularizers that jointly ensure well-posedness: (1) orthogonality; (2) operator commutativity; and (3) the “equivariance loss”, which is a version of the widely used “descriptor preservation” loss in the FM framework. These are several of the most common principled regularizers, widely disseminated and discussed in the FM literature [21, 26, 38].

Second, we note that our latent codes double as multi-channel pointwise descriptors, of the same kind as used in the Deep FM family of work [Litany et al. 2017, 38], which forms the backbone of many SoTA shape correspondence pipelines.

Discovering the Spherical Harmonic Transform

We are happy to report that we have performed this experiment with spectacular results! Please see the discussion in the general rebuttal under “Discovering the spherical harmonics” and Fig. 1 of the PDF. We hope you find these convincing!

Quantifying the complexity of different equivariant models.

There are two dimensions of “complexity” that we failed to clearly delineate in the text: (1) human complexity of hand-crafting the method and (2) compute / memory complexity.

While approaches for rotation equivariance are wide-spread, there are no standard equivariant networks for most difficult symmetries. Any attempt at building such a network is associated with extensive, difficult mathematical labor. MobiusConv [7], for instance, essentially derives what to the best of our knowledge is a previously unknown representation on a specialized group of filters (which is no mean feat), simply to facilitate a tractable discretization.

homConv [6] and LieDecomp [43] seek to sidestep this via a recipe for general equivariance using Monte Carlo integration. However, these methods struggle with compute complexity. For $N$ samples (typically at least 10-25), the computational complexity of a forward pass scales as $N^L$, where $L$ is the number of layers (e.g., $N = 25$ samples and $L = 3$ layers already amount to $25^3 \approx 1.6 \times 10^4$ evaluations)! In practice, this means that these methods cannot be scaled past incredibly low-dimensional images, which is why both homConv and LieDecomp are run only on MNIST.

In contrast, NIso relies on recovering a latent space such that the transformations can be represented as an isometry, which can thus be exploited by far more efficient and scalable isometry-equivariant networks such as [36] and [42].

Evaluation that focuses beyond NFT.

We agree and have since added two additional baselines to the submission - see the point “Additional Baselines” in the general reply above! However, we would like to note that besides the NFT, we are not aware of any other methods that aim to perform equivariant representation learning without prior knowledge of the group. Thus, we see the NFT as an important benchmark with which to compare.

Generalizing across connectivity

We completely agree and are currently exploring this as a direction of future work.

Are the basis functions visualized in Fig. 3 (third column) learned or from the actual toric laplacian? What are the various rows?

They are learned. At the time, we did not discover them ordered by a classical notion of frequency, so the order is random. Rows are eigenfunctions in C-style indexing.

More examples of evidence showing good basis functions

In addition to the learned spherical harmonics, we have also visualized examples of the eigenfunctions discovered in each of the three other experiments. Please find them in the PDF, in Fig. 2.

What is a perturbation in Section 5.1?

Perturbation refers to applying a homography to the image.

Comment

Dear JQqD,

Thank you again for your comments and feedback! As the end of the author / reviewer discussion period is fast approaching, we would love to hear your thoughts and see if we can address any remaining questions in the remaining time. Please let us know whether we addressed your comments and questions appropriately!

Thank you!

Best, the authors

Comment

Thank you for a refreshing rebuttal. After reading through the rebuttal and other reviews, I am quite happy to raise my score to a clear acceptance. This is an interesting paper with some fascinating observations. However, there is still some way to go before this can really be generalized and put into meaningful action. Sustaining drawbacks are some very hard questions like - how much data is needed for this approach? Equivariance to multiple non-trivial (perhaps non-unitary) transformations of the input etc. But I think as a good proof of concept, I believe this paper tells a story that would be interesting at Neurips. I would highly recommend including the spherical image experiment in the main draft (also what kind of architecture did you use for the encoder?)

Comment

Thank you for your response and we are extremely glad to hear you are happy to recommend a clear acceptance!

We completely agree that NIso represents a first step towards identifying underlying symmetries and that further investigation is necessary to support claims of broad generalization. The reviewer is correct to point out that answering difficult questions like those regarding data efficiency and effectiveness in the presence of multiple, different non-unitary actions are important prerequisites. We will amend the revision to reflect this and make sure that we communicate limitations on partiality and transformations modeled clearly.

We will incorporate the spherical harmonic transform experiment into the main body of the revision as the results are very compelling (thank you again for suggesting this experiment!). In both this experiment and the toric laplacian experiment, the encoder and decoder are just the identity map. We pass the images directly to the transformation estimation module, learning only the parameterization of the operator with the goal of evaluating the abilities of the module in isolation.

Review (Rating: 4)

The main motivation behind Neural Isometries is as follows: most real-world transformations in vision and geometry processing lack identifiable group structure and are therefore challenging for prior work in equivariant learning that assumes such knowledge a priori. The paper proposes an autoencoder framework that maps observations (related by some geometric transformation) to a latent space where embeddings are related by a linear transform. The framework leverages the existing functional map framework and corresponding regularization to learn this latent space. The paper validates and compares its performance with the baseline NFT (Neural Fourier Transform), which assumes knowledge of the group a priori.

Strengths

  • The problem of learning a structured latent space in a self-supervised equivariant learning setup is relevant.

  • The proposed framework is formalized clearly and in detail in Section 4.

Weaknesses

As mentioned many times in the paper, instead of theoretical justification (as in NFT), this paper seeks to validate the efficacy of the approach experimentally on real-world tasks.

  • Limited Experimental Setup: The datasets used are still very small-scale (MNIST) or synthetic (SHREC), and therefore far from real-world data. The submission mainly follows and compares with one baseline (NFT) that was implemented by the authors themselves. The submission could motivate the problem/solution by probing/playing with the learned structured space, e.g. in geometry processing tasks [1].

  • Limitation of Functional Maps: One of the biggest limitations of the functional map framework is its applicability to real-world data where partiality is ubiquitous (due to occlusion etc.). Neural Isometries therefore also inherits this limitation by design; experiments are shown on MNIST/SHREC where there is no occlusion or partiality. In the third benchmark, instead of skipping the odd frame, does the performance decay rapidly between distant frames?

  • Presentation: The submission contains several typos, convoluted sentences (Lines 85-88), and statements without a citation, e.g.:

-- Line 49: <Lorentz transformations with d'Alembert operator of Minkowski space> citation is missing here. Also, why do we need to know this fact in the introduction of this paper?

-- Line 42: <by preserving the spatial dimensions> not sure what this means even after reading the next 2 lines. Please clarify.

-- Line 89: <orthogonal relaxation of FM> citation or please detail what those are.

  • Related work: The submission cites a dozen variants of the deep functional maps paper [2], but not the paper itself, which inspired them. Please justify how [20,21,22,39] are related to this work or inspired this work, given the other deep functional map pipelines cited in the paper. The reviewer would instead relate/distinguish this work with [3].
  1. Dubrovina et al., "Composite Shape Modeling via Latent Space Factorization", 2021
  2. Litany et al., "Deep Functional Maps", 2017
  3. Rustamov et al., "Map-based exploration of intrinsic shape differences and variability", 2013

Questions

Please see above.

Limitations

The submission should state the scalability of this approach given its reliance on the Laplacian eigenbasis.

Author Response

Limited experimental setup

We would like to respectfully push back on the notion that we are considering too few baselines. We note that there is very little work on equivariant machine learning without prior knowledge of the symmetry group - the NFT is the most relevant baseline in this space. Further, the reviewer suggests that we are only comparing “small-scale” or “synthetic” experiments. However, camera pose estimation on Co3D is not: it is a large, real-world dataset that is actively used for benchmarking of applications from pose estimation to novel view synthesis.

More generally, we sought geometric deep-learning baselines for difficult, non-compact symmetries in vision tasks. To the best of our knowledge, the only existing models handling these symmetries are homConv [6] and LieDecomp [43] (both for homographies) and MobiusConv [7] (for Mobius Transformations). Thus, benchmarking on MNIST and SHREC is necessary to support our claims. These baselines are not scalable and will not run on any more complex or real-world datasets - this is not a limitation of our method, but a limitation of the baselines.

We would be happy to compare with additional baselines if you could clarify what they should be.

Limitation of functional maps (partiality)

We agree that functional maps can be sensitive to occlusion and partiality. However, we note that NIso is already surprisingly robust to partiality in its inputs, as is evident in the Co3D experiments. Please see the discussion in the general rebuttal.

Presentation

Line 49: <Lorentz transformations with d'Alembert operator of Minkowski space...

Thank you - we agree that this is confusing and removed it.

Line 42: <by preserving the spatial dimensions...

By this, we mean that the encoder encodes images into 2D feature maps with height and width higher than 1x1 pixel. This is opposed to collapsing the spatial dimension, as done in the NFT, which encodes the whole image into a single, global latent vector.

However, we agree that this is unclear as written, and will change the wording.

Line 89: <orthogonal relaxation of FM>

Reference [23] (cited later in the same sentence) seeks functional maps that represent near conformal, rather than isometric deformations. Conformal transformations preserve the Dirichlet inner product, and thus manifest as orthogonal transformations in the eigenbasis of the Dirichlet Laplacian.

That said, we agree that the sentence is overly long and that “orthogonal relaxation” is vague as written. This will be addressed in the revision.

Does not cite the “Deep Functional Maps” paper inspiring cited work.

We will add this in the revision and make clear its influence.

Justify how [20, 21, 22, 39] are related to or inspire present work

These represent recent work that either seeks to address existing drawbacks of FM, including scalability [20] and spatial consistency [21], investigate the properties of deep FMs frameworks [39], or explore composability with other powerful tools including LLMs [22]. All methods achieve state-of-the-art results, and we include them to show that increasing the efficacy and flexibility of FMs is an active topic.

However, we would be happy to instead discuss in more detail those works most closely related to our own, including Litany et al. 2017 and Rustamov et al. 2013.

Relate/distinguish with the work of Rustamov et al. 2013

We thank the reviewer for suggesting we examine this work, as we now understand it provides an interesting comparison we did not previously realize!

This work shows how performing PCA on shape difference operators constructed with FMs forms a kind of “latent space” where codes corresponding to given shapes share a notion of closeness whenever said shapes can be related by specific transformations, including conformal and authalic (area-preserving) ones. Like NIso, this forms a framework for symmetry discovery by observing the clusterings in the latent space. However, in NIso similar observations lie on the same orbit formed by the transformations $\tau$, rather than being “close” in the Euclidean sense.

Perhaps the biggest difference between NIso and this work is that NIso also recovers a dimensionally-reduced representation of the transformations between observations. In addition, we demonstrate in experiments that NIso can recover a meaningfully structured latent space for various types of transformations in the observation space, whereas this work only reveals meaningful structure as it relates to conformal and authalic transformations.

Scalability of the eigenbasis

We agree that the ultimate scalability of the eigenbasis could be a limitation, and we will note this in the revision. We believe the limiting factor is the parameterization of the basis directly via learned weights, which would result in large model sizes for high-resolution latent spaces. Currently, we find this can be mitigated by encoding to a lower-resolution latent space, where the dimensions of the weight matrices are manageable, though this likely comes at the cost of some expressivity due to aliasing. We are working on a follow-up to address this problem in tandem with partiality by defining an observation-dependent eigenbasis via the output of a second encoder, inspired by the concurrent work of [Cheng et al. 2024].

However, we note that in other Deep FM frameworks, the principal computational bottleneck is the solve for the functional map. For example, the method proposed in the influential [38] to compute a near-isometric functional map requires solving $K$ linear systems of size $K \times K$, with $K$ the dimension of the eigenbasis. For NIso, the solve requires only a single $K \times K$ SVD computation. This is due to the closed-form solution derived in Equation (8) which, while not mathematically significant, is to the best of our knowledge novel in the functional maps literature.
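
As a sketch of why this is cheap (our own reading of the rebuttal, not a verbatim reproduction of Equation (8)): given the spectral coefficients of two corresponding latents and the eigenvalue mask, the map is an orthogonal-Procrustes solution obtained from one $K \times K$ SVD:

```python
import numpy as np

def solve_tau(A, B, P):
    # A, B: K x c spectral coefficients of two corresponding latents;
    # P: K x K (soft) eigenvalue mask. Mask the cross-covariance element-wise,
    # then project onto the orthogonal group -- a single K x K SVD.
    U, _, Vt = np.linalg.svd(P * (B @ A.T))
    return U @ Vt
```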

Comment

Dear 2Npe,

Thank you again for your comments and feedback! As the end of the author / reviewer discussion period is fast approaching, we would love to hear your thoughts and see if we can address any remaining questions in the remaining time. Please let us know whether we addressed your comments and questions appropriately!

Thank you!

Best, the authors

Author Response

We thank the reviewers for their careful reading, and detailed and considerate feedback. We are glad that reviewers deem our paper “a clever idea”, “relevant”, “interesting and reasonable”, and “intuitive and interesting”, and the writing to be “very nice”, “easy to follow”, an “enjoyable read”, and a “clear formalization”. We are glad that reviewers recognize the relevance of this first step towards self-supervised symmetry discovery and equivariant ML!

Key outstanding concerns revolve around measuring the equivariance error, clarity, and experimental evaluation. We are glad to present new results and analysis to address these concerns!

Measuring Equivariance Error

We measure the standard equivariance error in the latent space [6-7, 43], with the results shown in Tab. 1 of the PDF. On both HomMNIST and SHREC, NIso is on par with expert-designed baselines. Similarly for Co3D, we find that NIso achieves low errors. While the error rises with increasing frame skip, it remains under 10%, and NIso remains the best-performing method, demonstrating that NIso remains equivariant even under increasing partiality.
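
For reference, the latent equivariance error we report has the standard relative form (our sketch; `E` is the encoder, `T` the observation-space transformation, and `tau` the estimated latent map):

```python
import numpy as np

def equivariance_error(E, tau, psi, T_psi):
    # Relative error between transforming in latent space and encoding the
    # transformed observation: || tau E(psi) - E(T psi) || / || E(T psi) ||.
    z, z_T = E(psi), E(T_psi)
    return np.linalg.norm(tau @ z - z_T) / np.linalg.norm(z_T)
```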

Discovering the spherical harmonic transform

Following JQqD’s suggestion, we perform an experiment identical to “discovering the toric laplacian”, by mapping ImageNet to the sphere and acting on it via $SO(3)$. We further realized that adding a simple dropout layer which randomly masks out coefficients corresponding to large eigenvalues before the basis unprojection yields eigenfunctions ordered by their energy. This experiment results in the discovery of a basis and maps $\tau_\Omega$ with almost exactly the properties of the spherical harmonics and the Wigner-D matrices; see the attached PDF, Fig. 1!

Additional baselines

Pose estimation

In the camera pose estimation experiment, we added two strong representation-learning baselines. We extract image features from both images using two state-of-the-art vision foundation models - DINOv2 [Oquab et al. 2023] and BeIT [Bao et al. 2021]. We then pass the tokens into the DUSt3R[47]-style decoder described in section 5.3 to predict the pose. The results are shown in Table 2 of the PDF. NIso outperforms both significantly.

Comparison with data augmentation

In line with Sfsw’s comments, we compare NIso to the AE baseline with dataset augmentation during both the pre-training and fine-tuning phases in the SHREC and MNIST experiments. The results are shown in the PDF, Tab. 2. On both MNIST and SHREC, training with augmentation improves the baseline performance by approximately 34% and 7%. NIso still outperforms this baseline significantly.

Scope of experimental evaluations

2Npe remarks a “limited experimental setup”. We would like to contextualize the scope of our evaluation with that of comparable work at top conferences.

HomConv [CVPR] and LieDecomp [ICLR] benchmark exclusively on MNIST. MobiusConv [SIGGRAPH] benchmarks only on SHREC to demonstrate advantages of Mobius-equivariance. Our experiments are also commensurate with those in the NFT [NeurIPS, ICLR]. Among all these methods, we alone present results on Co3D, a large-scale, real-world dataset, in contrast to 2Npe's review ("datasets are very small scale, synthetic, not real world").

Further, 2Npe prefaces their criticism with the statement "As mentioned many times in the paper, instead of theoretical justification, this paper seeks to validate the efficacy of the approach experimentally on real-world tasks." We respectfully point out that this is a misrepresentation of our claims. In fact, only once, on lines 101-102, do we make a similar statement, in which we state our intention to "validate the efficacy of our approach experimentally, including in geometry processing and real-world 3D vision tasks" (referencing the conformal SHREC and Co3D experiments). At no point do we say our goal is to validate NIso on real-world tasks, only that our experiments include a real-world task (Co3D pose estimation).

2Npe also remarks that we could "motivate the problem/solution by probing/playing with the learned structured space". As our chief goal is self-supervised symmetry discovery, we respectfully note that our toric and spherical Laplacian experiments serve exactly this end.

In terms of baselines, we benchmarked with the best-performing ones for homography and mobius equivariance, and SOTA representation learners for Co3D. To the best of our knowledge, the NFT is the only applicable baseline for equivariant representation learning without prior knowledge of the group. We contacted the authors of the NFT, who confirmed our implementation is faithful.

It is challenging to pick the right evaluation when working on the new problem of self-supervised symmetry discovery / equivariant ML. We are hence grateful for the reviewers' suggestions, which we did our very best to incorporate!

NIso under the presence of occlusions / masking / partiality

We agree that showing robustness to partiality would make the paper stronger. That said, several recent works have sought to make functional maps robust to partiality [Attaiki et al. 2021, Cheng et al. 2024, Bracha et al. 2024], and we are excited to incorporate these techniques in our next steps! However, we strongly believe that NIso without this improvement remains extremely valuable. First, the limitation of not modeling partiality is not unique to NIso, but shared by most equivariant networks, including all of NIso's baselines. Second, we note that NIso is already robust to partiality, as is evident in the Co3D experiments: at a frameskip of 9, only about 80% of the pixels are shared across frames, and NIso outperforms two foundation-model baselines with the equivariance error remaining below 10%. Third, a major application for NIso is the continuous-time regime, i.e., representation learning from video. In this case, the overlap between consecutive frames is generally large, and NIso remains acutely practical.

Final Decision

This paper received mixed reviews in the first round, was intensively discussed between reviewers and authors, and between reviewers. After the discussion phases, the scores were 3x borderline reject and 1x weak accept.

The overall consensus is that the paper is a great read and proposes an interesting angle for obtaining equivariant representations. On the negative side, there were several concerns with respect to the experimental setup and if they support the general claims of the work. Some of them were successfully rebutted by the authors:

  • the paper does indeed evaluate on a relevant real world task (CO3D camera pose estimation)
  • it also successfully shows in the ablations in Section 5.1 that the proposed losses are necessary to recover the correct transformations
  • the claim that the method performs on par with hand-designed methods is supported (if only on limited experiments)

However, there are also concerns left, which have not been adequately resolved:

  • It is true that the paper lacks theoretical justifications for the many "intuitive" heuristics that it employs
  • It is also true that the paper mostly invents its own baselines and datasets. Only the measurements of equivariance error (rebuttal document) seem to be in the same framework as previous works. Meanwhile, the NFT authors provide a benchmark set (https://github.com/masomatics/NFTPublic/tree/main), which is not used by the authors of this work.

Regarding the last point it has to be said that the presented approach is indeed quite novel and the used datasets and evaluation methods seem to be quite creative to me. In this borderline situation, I tend to recommend acceptance, favoring the explorative nature of this work.