Q1. Impact uniformity loss: How should we understand in eq. 20? is a single point, so I assume we are considering a Dirac distribution at . Theoretically, the distance from a Dirac distribution to a uniform distribution in the sphere is constant regardless of where the Dirac distribution is placed, due to invariance to rotations, right? Why does this term favour uniformity if it is a constant term? Or am I misunderstanding something?

Answer. are the network's representations of two augmented versions of the same image, projected onto the hypersphere. Thus, is not a single point. The self-supervised loss in Eq. (20) was proposed in [8] (see Eq. (86)), and we closely follow their setting for this experiment.

Q2. Radon Transform measure preservation: Why does the proposed Radon transformation in eq. 8 transform a probability distribution defined on into a probability distribution defined on This is mentioned in lines 332-333, but in line 268 it says that . So it does not immediately follow that the Radon transform preserves the measure.

Answer. In the case where is a probability distribution on , is a distribution of . This follows directly from the proof provided in Appendix A.1. We recall the proof as follows: Since is non-negative on , is also non-negative on . Moreover,

Thus, is a probability distribution on .
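
The normalization step of this argument can be sketched as follows. The notation here is assumed for illustration and is not fixed by the text above: $f$ is a probability density on $\mathbb{S}^d$, $\mathcal{T}$ is a spherical tree rooted at $x$ with edges $e_1, \dots, e_k$, each parametrized by $t \in [0, \pi]$, and $\alpha(y, \mathcal{T})$ is a splitting map whose components are non-negative and sum to one. Assuming the transform takes the form $\mathcal{R}^{\alpha}_{\mathcal{T}} f(t, e_i) = \int_{\mathbb{S}^d} f(y)\, \alpha(y, \mathcal{T})_i\, \delta(t - \arccos\langle x, y\rangle)\, dy$, the total mass on the tree is

$$
\sum_{i=1}^{k} \int_{0}^{\pi} \mathcal{R}^{\alpha}_{\mathcal{T}} f(t, e_i)\, dt
= \int_{\mathbb{S}^d} f(y) \Big( \sum_{i=1}^{k} \alpha(y, \mathcal{T})_i \Big) \Big( \int_{0}^{\pi} \delta(t - \arccos\langle x, y\rangle)\, dt \Big)\, dy
= \int_{\mathbb{S}^d} f(y)\, dy = 1 .
$$

The precise definitions are those of Eq. (8) and Appendix A.1; the form above is only an assumed stand-in for them.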

Q3. STSW Computation on continuous measures: In section 5 you explain how to compute STSW in practice, but it is assumed that the probability distributions are discrete. Is it possible to get a closed form analogous to that in eq. 19 for non-discrete distributions?

Answer. Equation (19) is derived directly from the closed-form expression presented in [6]. For a general probability distribution, a closed-form expression can be obtained by replacing the summation with an integral, as demonstrated in [7]. This is analogous to the well-known closed-form expression for the 1-dimensional Wasserstein distance: the formula for general distributions involves an integral, while the formula for discrete distributions involves a sum. In practice, implementations for discrete distributions rely on basic operations such as matrix multiplication and sorting.
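
To illustrate the 1-dimensional analogy only (this is not Eq. (19) and not the STSW implementation), recall that for measures $\mu, \nu$ on the real line with quantile functions $F_\mu^{-1}, F_\nu^{-1}$,

$$
W_p^p(\mu, \nu) = \int_0^1 \big| F_\mu^{-1}(u) - F_\nu^{-1}(u) \big|^p \, du ,
$$

and for two empirical measures with $n$ equally weighted points each, the integral reduces to $\frac{1}{n} \sum_{i=1}^{n} |x_{(i)} - y_{(i)}|^p$ over the sorted samples. A minimal sketch of the discrete case, assuming equal sample sizes and uniform weights (the function name `wasserstein_1d` is hypothetical):

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two empirical measures on the real line.

    Assumes both samples have the same size and uniform weights, so the
    quantile integral reduces to a sum over sorted samples.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # Average p-th power cost between matched order statistics.
    cost = sum(abs(a - b) ** p for a, b in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


if __name__ == "__main__":
    # Sorted pairs are (0, 1) and (2, 5), so the distance is (1 + 3) / 2.
    print(wasserstein_1d([0.0, 2.0], [5.0, 1.0]))  # 2.0
```

Sorting is the only non-trivial operation here, which is why the discrete case is cheap to compute; unequal weights would require merging the two quantile functions instead.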

Since applications typically involve discrete probability distributions, we focus on the discrete case in the paper.

Q4. Injectivity of the Radon transform: In Theorem 4.3 it is proved that if the splitting map is invariant, then the spherical Radon transform is injective. What would be the consequences of using a non-injective spherical Radon transform? What structure might be missed?

Answer. This is a significant contribution of our paper, as injectivity of a Radon transform variant is often a crucial requirement. It determines whether the derived distance (such as STSW) is a true metric or only a pseudo-metric. Without injectivity, two distinct probability distributions can have the same transform and therefore a distance of zero, so the distance cannot tell them apart. Consequently, using a pseudo-metric in applications could result in unstable performance.

The injectivity of Radon transform variants has been studied extensively, including in [1], [2], [3], [4], [5], and others.