Exactly Tight Information-theoretic Generalization Bounds via Binary Jensen-Shannon Divergence
We establish exactly tight information-theoretic generalization bounds for general randomized learning algorithms.
Abstract
Reviews and Discussion
This paper studies the information-theoretic generalization bounds within the conditional mutual information (CMI) framework by introducing a new information measure called the binary Jensen-Shannon (JS) divergence. Specifically, the paper begins with a cleverly designed lemma that builds a relationship between the binary JS divergence and mutual information. This key result allows the authors to derive novel, tighter CMI bounds in which the CMI term conditions only on a single random variable. The paper further extends these results by presenting evaluated CMI bounds. More importantly, under an invariance assumption, the authors demonstrate that the generalization error can be exactly characterized by their binary JS divergence measure. This argument applies not only to the zero-one loss but also to general bounded losses through a novel loss binarization technique. Furthermore, the paper generalizes its findings by extending mutual information and KL-based results to the broader class of $f$-divergence-based results. Finally, the authors also provide an empirical study of their theoretical results, showing that their novel binary JS divergence bounds can exactly characterize generalization, making them tighter than all previous CMI bounds.
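For intuition, the loss binarization technique mentioned above can be illustrated with the generic trick of replacing a bounded loss in $[0, 1]$ by a Bernoulli draw whose success probability equals the loss, which preserves expectations. The sketch below is written for this summary and is not taken from the paper; the construction is only assumed to match the paper's in spirit.

```python
# Generic loss-binarization sketch (illustrative only, not the paper's code):
# each bounded loss value in [0, 1] is replaced by a Bernoulli draw with that
# value as its success probability, so the expected loss is preserved.
import numpy as np

rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=100_000)   # bounded losses in [0, 1]
binarized = rng.binomial(1, losses)            # one Bernoulli(loss) draw each

print(losses.mean(), binarized.mean())         # the two means nearly coincide
```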
Questions for Authors
-
In [R1] (see "Essential References Not Discussed"), it has been shown that when the convex comparator is the Cramér function, one can obtain the tightest possible bound. How do the findings in your paper relate to their results? Could you provide further discussion on the connection between the Cramér function and your binary JS divergence?
-
I thoroughly enjoyed reading this paper, but I have a question regarding the motivation for obtaining the tightest possible CMI bound for generalization error. Specifically, why is it meaningful to derive a generalization bound that is exactly equal to the generalization error? In the CMI setting, where both a training sample and a ghost sample are available, estimating the generalization error directly is feasible. Given this, why should one use your binary JS bound as a predictor of generalization error? Additionally, what unique insights does your exactly tight bound provide, beyond simply computing the generalization error directly in the CMI framework?
Claims and Evidence
All claims are clearly supported.
Methods and Evaluation Criteria
The proposed methods and evaluation criteria make sense to me.
Theoretical Claims
I checked all the proofs and they seem correct to me.
Experimental Design and Analysis
The experiment settings follow some previous studies and are reasonable to me.
Supplementary Material
I reviewed the entire appendix.
Relation to Broader Scientific Literature
This paper is within the literature on learning theory and generalization theory, with a particular focus on the information-theoretic generalization analysis framework. Notably relevant works include Steinke & Zakynthinou (2020), Hellström & Durisi (2022b), and Wang & Mao (2023a).
Essential References Not Discussed
The following paper may need discussion in this work:
[R1] Hellström, Fredrik, and Benjamin Guedj. "Comparing comparators in generalization bounds." International Conference on Artificial Intelligence and Statistics. PMLR, 2024.
The current paper uses binary JS divergence as the comparator between empirical and population loss, whereas [R1] explores a general convex comparator for this purpose and further investigates the optimal convex comparator. Given these conceptual connections, discussing [R1] may provide additional understanding for the choice of binary JS divergence in this work.
Other Strengths and Weaknesses
Strengths:
1. This paper makes an important technical contribution to the field. While Hellström & Durisi (2022b) shows that the binary KL term is upper bounded by the CMI term, the authors cleverly identify that this binary KL term is embedded within their proposed binary JS measure. More importantly, they utilize the fact that the mutual information between an arbitrary random variable and a Bernoulli random variable is equivalent to the JS divergence between the two corresponding conditional distributions. They further demonstrate that this JS divergence reduces to the binary JS divergence, and is thus exactly equal to the mutual information, when the other random variable is also binary (a small numerical check of this identity is sketched just after this strengths list). This enables an exact characterization of the generalization error using their binary JS measure, and the technique itself may be of independent interest beyond generalization analysis.
-
The binarization and truncation techniques introduced in this paper are also novel and contribute valuable methodological advancements to the field.
-
Since the binary JS-based bounds provide an exact characterization of generalization error, they overcome key limitations of previous CMI-based bounds, as pointed out in recent works.
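To make the identity invoked in Strength 1 concrete, here is a small numerical check written for this review (not taken from the paper): for a uniform Bernoulli variable $B$ and a discrete variable $X$, the mutual information $I(X; B)$ coincides with the Jensen-Shannon divergence between the two conditionals $P_{X|B=0}$ and $P_{X|B=1}$.

```python
# Numerical check (illustrative only): I(X; B) = JS(P_{X|B=0} || P_{X|B=1})
# when B ~ Bernoulli(1/2) and X is a discrete random variable.
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def entropy(p):
    return -float(np.sum(p[p > 0] * np.log(p[p > 0])))

rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(5))   # conditional P_{X|B=0}
p1 = rng.dirichlet(np.ones(5))   # conditional P_{X|B=1}

px = 0.5 * p0 + 0.5 * p1                                   # marginal of X
mi = entropy(px) - 0.5 * entropy(p0) - 0.5 * entropy(p1)   # I(X; B)

print(mi, js(p0, p1))            # the two values agree up to float error
```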
Weaknesses:
-
The exact characterization results (e.g., Theorem 3.9, Corollary 3.10) require the algorithm to be invariant to sample permutations. While this is a notable restriction, it is a much more relaxed assumption compared to the interpolating algorithm assumption used in previous works for obtaining exact characterization results.
-
The assumption in Corollary 3.12 may not be easily satisfied in practice. The authors have already acknowledged this limitation in the paper.
Other Comments or Suggestions
-
In the right column, Lines 110–114, the authors state that "JS divergence serves as a proper metric of distance". This statement is incorrect because JS divergence does not satisfy the triangle inequality and therefore is not a proper metric. Please remove this incorrect statement.
-
Please explicitly include Assumption 3.8 in the statements of Theorem 3.9, Corollary 3.10, and Corollary 3.12, as it is crucial for the exact characterization of generalization error in your framework.
-
In the right column, Line 139, should be .
Dear Reviewer PSQE, thank you for your kind words and insightful comments! We address your questions below:
On the Optimal Convex Comparator
We appreciate you highlighting this work. As stated in Theorem 4 of that paper, the Cramér function is defined as the convex conjugate of the CGF of a distribution from a candidate set chosen so that every element of the loss space is covered by some distribution in that set. This means the Cramér function is not fixed but depends on the loss space.
In their analysis, when the loss is binary (as in our setting), this candidate set is the set of all Bernoulli distributions (Eq. (19)), making the Cramér function the binary KL divergence (Eq. (29)). In contrast, our work demonstrates that the binary JS divergence outperforms the binary KL in the supersample setting, suggesting that their optimality result does not directly extend to our framework.
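For reference, the two binary comparators contrasted here can be written in standard form (the paper's own notation may differ):

$$
d_{\mathrm{KL}}(p \,\|\, q) = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q},
\qquad
d_{\mathrm{JS}}(p \,\|\, q) = \tfrac{1}{2}\, d_{\mathrm{KL}}\!\Big(p \,\Big\|\, \tfrac{p+q}{2}\Big) + \tfrac{1}{2}\, d_{\mathrm{KL}}\!\Big(q \,\Big\|\, \tfrac{p+q}{2}\Big),
$$

i.e., the KL and JS divergences between $\mathrm{Bern}(p)$ and $\mathrm{Bern}(q)$.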
A key reason may lie in the choice of mutual information measure. Their formulation is based on a generalization of the standard hypothesis-based mutual information between the hypothesis and the training sample. In contrast, our analysis focuses on the supersample setting, where the key quantities are (conditional) mutual information terms involving the selector variables. This may explain from another perspective why our binary JS method (and also the previous fast-rate one) does not apply to the original generalization analysis setting but only to the supersample one. Extending their analysis to the supersample setting would be a promising future direction, and we will include these discussions in the revised manuscript.
On the Value of Tight CMI Bounds
This is indeed a crucial question for the CMI-based generalization literature. We believe our contribution goes beyond providing tighter estimates to the generalization error. In particular, we address an open question raised by [1]: For which learning problems and learning algorithms is the CMI framework expressive enough to accurately estimate the optimal worst-case generalization error? Our results show that the framework can achieve exact tightness in a broad range of scenarios, offering a conclusive resolution to this line of inquiry.
While our setting is quite general, the practical significance of these bounds becomes more apparent when applied to specific downstream tasks. Due to space limitations, please refer to our response to reviewer o7Wh for how our results connect to the analysis of out-of-distribution generalization and noisy iterative learning algorithms.
Other Minor Points
Thank you for pointing these out. We will make these necessary corrections in the revised version.
[1] Information complexity of stochastic convex optimization: Applications to generalization, memorization, and tracing. ICML, 2024.
I would like to thank the authors for their responses. Please incorporate the discussion on Hellström and Guedj (2024) into the revised manuscript.
Regarding the value of a tight CMI bound, I appreciate the authors' insights. I encourage you to continue reflecting on this in your future work; after all, the tightest possible generalization bound is the generalization error itself. This raises the question: how tight do we truly need a generalization measure to be?
Additionally, I would like to point out that an exactly tight IT bound was first given in [1] (rather than Wang and Mao (2023)). I recommend reading Remark 5.5 in the arXiv version (not the ISIT version) of [1], where the authors discuss a similar question.
[1] Haghifam, Mahdi, et al. “Understanding Generalization via Leave-One-Out Conditional Mutual Information.” arXiv preprint arXiv:2206.14800 (2022).
In light of your responses, I will increase my score.
This paper introduces a novel framework for deriving exactly tight information-theoretic generalization bounds in machine learning using the binary Jensen-Shannon (JS) divergence. By leveraging a binarization technique for loss variables and supersample frameworks, the authors propose hypothesis-based and prediction-based bounds that address key limitations of prior work, including slow convergence rates and overestimation in deep neural networks. Experiments validate the bounds on synthetic and real-world datasets, demonstrating superiority over baselines like Binary KL divergence and fast-rate bounds.
Questions for Authors
- This paper introduces two ingredients for producing tight generalization bounds. One is eliminating redundant random variables from the key mutual information terms (Line 179). The other is using the binary KL divergence (Line 141). Is it necessary to combine the removal of redundant information with the binary KL divergence to improve the upper bound?
- Your results rely on Assumption 3.7. How restrictive is this assumption in practice?
Claims and Evidence
The claims are all supported by clear and convincing evidence.
Methods and Evaluation Criteria
The proposed methods make sense for the problem.
Theoretical Claims
The paper appears to be technically sound, but I have not carefully checked the details.
Experimental Design and Analysis
The experiments evaluate three different classification tasks: generated Gaussian datasets, a 4-layer CNN on binarized MNIST, and a pretrained ResNet-50 model on CIFAR-10. The experiments are generally reasonable, but they lack evaluations on high-dimensional data, such as the ImageNet dataset.
Supplementary Material
The supplementary material is code; I have not read it.
Relation to Broader Scientific Literature
This paper focuses on the theoretical side and does not have a significant connection with the broader scientific literature.
Essential References Not Discussed
No
Other Strengths and Weaknesses
Strengths:
- This paper proposes a new approach to characterizing the relationship between expected empirical and population risks through a binarized variant of the Jensen-Shannon divergence, which achieves faster convergence compared to existing fast-rate and binary KL-based methods.
- Results can be applied to stochastic convex optimization and extend to $f$-divergence/Wasserstein metrics.
- Lemma 3.1 may hold significance beyond the context of generalization analysis, offering potential applications in broader aspects.
Weaknesses:
- Corollary 3.12 requires an additional condition, which seems to be stronger than Assumption 3.7.
- Lack of experiments on larger datasets.
Other Comments or Suggestions
The related work lacks a discussion and comparison with the literature on PAC-Bayesian generalization bounds, such as Dupuis, Benjamin, et al. "Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets." Journal of Machine Learning Research 25.409 (2024): 1-55.
Dear Reviewer 537o, Thank you for your thoughtful comments and questions! We address them below:
Assumption in Corollary 3.12
We agree that the assumption in Corollary 3.12 is stronger than Assumption 3.7. This limitation is acknowledged in Section 5, and we leave the task of relaxing this assumption to future work. Importantly, the core contributions of our paper do not depend on this assumption and are already applicable to a wide range of practical learning scenarios.
Experiments on Larger Datasets
While MNIST and CIFAR-10 are relatively simple by today’s standards, they remain common benchmarks in generalization studies (e.g., Harutyunyan et al., 2021; Hellström & Durisi, 2022b; Wang & Mao, 2024). Our bounds apply to arbitrary sample sizes and data distributions, so there is no indication they would deteriorate on larger datasets. Reviewer o7Wh also found the current experiments sufficient, and we believe they effectively demonstrate our contributions.
Relation to PAC-Bayesian Bounds
We agree that PAC-Bayesian bounds are closely connected to information theory, especially through KL divergence terms. However, our work focuses on bounding the expected generalization error, whereas PAC-Bayesian approaches typically emphasize high-probability bounds. Therefore, the two are not directly comparable. This distinction is also acknowledged by Reviewer o7Wh.
On Tighter Bounds
Yes, recent improvements in information-theoretic generalization bounds typically fall into two categories: (1) refining the dependence between mutual information and generalization error—where we propose the binary JS divergence, and (2) improving the information measure itself—where we propose SICIMI for hypothesis-based bounds and bl-MI for prediction-based bounds. These two approaches complement each other to achieve the tightest bounds.
On the Restrictiveness of Assumption 3.7
Assumption 3.7 effectively assumes a non-negative generalization error, which is typically satisfied by well-trained models. In practice, test performance usually lags behind training performance, and this is exactly the reason why we need generalization analysis. Similar assumptions are also adopted in prior works [1], and the condition has been shown to always hold for certain algorithms such as the Gibbs algorithm [2].
[1] Estimation of generalization error: random and fixed inputs. Statistica Sinica, 2006.
[2] An exact characterization of the generalization error for the Gibbs algorithm. NeurIPS, 2021.
This paper investigates the question of tightness in mutual information generalisation bounds. The authors propose exactly tight generalisation bounds based on the binary Jensen-Shannon divergence. They show that their results are also tighter than various existing bounds and successfully incorporate the impact of a statistical property of optimisation algorithms. They then extend the notion of Jensen-Shannon divergence beyond the KL case and propose associated generalisation bounds. The paper concludes with a numerical assessment of the tightness of their bounds.
Questions for Authors
- In Secs. 2.1 and 2.2 you define the empirical and population risks twice; I assume that it is the definitions of Sec. 2.2 that hold in this work.
- l.134-143 right column: Does this mean that in this work, you always consider one half of the supersample to be your training data and the other half the test data?
- More generally, I am not sure I understand the notion of conditioning used in Table 1 when defining your SICIMI framework. If this means that you always assume a fixed part of the supersample to be your training data, then is it still relevant to invoke a supersample instead of directly mentioning training and test sets? In this case it would be relevant to discuss the difference with transductive learning, and more particularly transductive PAC-Bayes learning (see e.g. Begin et al. 2014), which also considers directly a train and test set and proposes generalisation bounds involving KL divergences (thus mutual information).
- l.223 right column: would it be possible to briefly describe the proof of convexity of the proposed divergence?
References:
Begin et al. 2014 PAC-Bayesian Theory for Transductive Learning
Claims and Evidence
The paper is well-written and pedagogical. The tightness of their results is provably shown, and limitations of the proposed bounds are clearly highlighted.
Methods and Evaluation Criteria
The experimental part consists of simple computations of various generalisation bounds for different (loss, learning algorithm) pairs. The benchmark here is the binary KL bound, which is a natural comparison point.
Theoretical Claims
As I am not familiar with the literature, I cannot assess the veracity of the proofs. However, the proposed contributions look coherent with existing results, and Figure 2 makes it easy to understand why their results are tighter than existing bounds.
Experimental Design and Analysis
The experimental part looks sound and coherent with the theoretical claims, although I did not check the details.
Supplementary Material
As I know little about the MI/CMI literature, I did not carefully check the appendices.
Relation to Broader Scientific Literature
I am not familiar enough with this literature to know whether all relevant references have been discussed.
Essential References Not Discussed
I am not familiar enough with this literature to know whether any crucial reference is missing.
Other Strengths and Weaknesses
See Questions.
Other Comments or Suggestions
See Questions.
Dear reviewer S7W7, thanks for your valuable comments! We address your questions as follows:
Redefinition of the Empirical and Population Risks
You are correct that the empirical and population risks are defined in both Sections 2.1 and 2.2. These two definitions are actually equivalent but expressed using different notations: one for the standard empirical and population risks, and the other adapted to the supersample setting. We will clarify this equivalence in the revised version.
Training and Testing Separation in the Supersample Setting
The interpretation where one fixed half of the supersample is the training sample and the other half is the test sample only applies to the illustrative example preceding Section 3.1. In the main analysis, we adopt the supersample setting, where the binary selector variable determines which entry of each supersample pair serves as the training sample and which as the test sample. Hence, each entry is equally likely to serve as either a training or test sample.
The symmetry in the supersample setting implies that the relevant joint distributions are actually identical under these two procedures (a toy simulation illustrating this symmetry is sketched below):
- Original supersample formulation: let the binary selector variables assign the training and test samples within each supersample pair, and define the empirical and population risks accordingly.
- Illustrative example formulation: always fix the first entry of each pair for training and the second for testing, and define the empirical and population risks accordingly.
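The following toy simulation, written for this response as an illustration (the "learning algorithm", loss, and all variable names are made up), evaluates a trivial learner that outputs the training mean under both formulations; the resulting average generalization gaps coincide.

```python
# Toy check of the supersample symmetry: the average generalization gap is the
# same whether train/test roles are assigned by random selector variables or
# fixed in advance (because the supersample entries are i.i.d.).
import numpy as np

rng = np.random.default_rng(1)

def gap(train, test):
    w = train.mean()                              # trivial "learner"
    return ((test - w) ** 2).mean() - ((train - w) ** 2).mean()

n, trials = 50, 20_000
g_random, g_fixed = [], []
for _ in range(trials):
    z = rng.normal(size=(n, 2))                   # supersample of n i.i.d. pairs
    u = rng.integers(0, 2, size=n)                # uniform binary selectors
    g_random.append(gap(z[np.arange(n), u], z[np.arange(n), 1 - u]))
    g_fixed.append(gap(z[:, 0], z[:, 1]))         # fixed train/test assignment

print(np.mean(g_random), np.mean(g_fixed))        # the two averages nearly coincide
```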
We will revise the paper to unify the example and main analysis settings to avoid confusion.
It is true that our SICIMI term conditions on a single supersample instance, whose distribution reflects a mixture of training and test samples. However, our focus remains on inductive learning algorithms, not transductive ones. Unlike transductive methods, which may leverage unlabeled test data during training, inductive algorithms do not access any information about the test set in the learning phase. The test samples are only used in the analysis stage to evaluate generalization bounds. Therefore, the two setups are fundamentally different and not directly comparable.
Convexity of the Proposed Divergence
Here is a proof sketch for this result: the joint convexity of our measure follows directly from that of the Jensen-Shannon divergence, since for Bernoulli random variables the binary JS divergence coincides with the JS divergence between the corresponding Bernoulli distributions. Moreover, the joint convexity of $f$-divergences follows from the convexity of the perspective mapping, which is inherited from the definition of convex $f$-functions.
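A slightly more explicit version of this sketch, in standard notation (assumed here, since the paper's own symbols are not reproduced above):

$$
D_f(P\|Q)=\sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right),
\qquad
(a,b)\mapsto b\, f\!\left(\frac{a}{b}\right) \text{ is jointly convex for convex } f,
$$

so $D_f$ is jointly convex in $(P,Q)$. Specializing to $P=\mathrm{Bern}(p)$ and $Q=\mathrm{Bern}(q)$ is an affine restriction of $(p,q)$, which preserves convexity, and the JS divergence is itself an $f$-divergence, so the same argument applies.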
The paper discusses tight information theoretic bounds for the generalization error. The bound is general and can be applied to any machine learning model. This line of work is based on the seminal works of Xu and Raginsky (2017) and follow-up works that use the mutual information between training data and the output of the learning algorithm as a measure of generalization. The research program is motivated by the goal of obtaining a tight computable bound. Here are the highlights:
- The paper uses the supersample idea (Section 2.2) and a data-processing-like inequality for the Jensen-Shannon divergence (Lemma 3.1) to obtain its results.
- Generalization bounds are obtained in Theorem 3.2 and Theorem 3.6, each based on a different information measure.
- Proposition 3.4 shows that the bound based on the new measure is strictly tighter.
- In section 3.3, the bound in Theorem 3.6 is shown to be tight for various cases. Finally some extensions are presented in Section 4 based on f-divergence, and the experiments are presented in Section 6.
Questions for Authors
As I alluded to in my comment above, I wonder if Lemma 3.1 can be obtained using a data processing inequality for $f$-divergences, knowing that the JS divergence is one (maybe something can be found in "f-Divergence Inequalities" by Igal Sason and Sergio Verdú?).
The authors mention a convergence rate of $O(1/n)$ as “frequently observed in practical learning scenarios”. Could the authors provide additional references for this claim?
Regarding the bounds in Table 1, including what was presented in the paper: I am not sure one can read much from the appearance of a $1/\sqrt{n}$ or $1/n$ term, because the way the other terms scale with $n$ impacts the final dependence. There are many norm-based bounds for deep nets that scale poorly with $n$ despite an explicit $1/\sqrt{n}$ dependence. It is difficult to guess the trend from the plots. It might be worth studying the $n$-dependence via some curve fitting.
Claims and Evidence
The main claim is tighter information theoretic bounds backed by proofs, which seem to be sound. The tightness of the bound is shown in various experimental results as well as theoretically proven.
Methods and Evaluation Criteria
The paper is mainly a theoretical one. The main idea is the use of an inequality based on the Jensen-Shannon divergence, relating it to a mutual information term. The bound is shown to be provably tighter and, in some cases, exactly tight. The key is using the supersample framework and a data-processing-like inequality for the Jensen-Shannon divergence (Lemma 3.1). The key improvement with respect to previous bounds is that the bound is a sum over single samples, with the selector random variable in the MI term conditioned on a single sample. I will comment on the utility of these bounds later.
Theoretical Claims
The theoretical claims are properly presented, and the proofs are well-readable and correct. I checked the proofs of Lemma 3.1, Theorem 3.2, and Proposition 3.4 in-depth and looked rapidly at other proofs, which mostly utilize a generally similar proof strategy.
Experimental Design and Analysis
Experiments are conducted on MNIST and CIFAR-10, which are considered simple datasets by today’s standards. Nonetheless, for generalization error analysis, this is sufficient. Many generalization-error bounds are already vacuous or do not apply to ResNet-50 on CIFAR-10.
The main issue is the low number of training samples used in MNIST and Gaussian experiments.
Supplementary Material
The supplementary materials consist mainly of the proofs and a few more experiments. I checked the proofs as explained above.
Relation to Broader Scientific Literature
The paper is about the generalization error analysis of learning algorithms. The approach is quite specific and therefore not directly connected to other bounds like PAC-Bayesian or Rademacher complexity-based bounds, which is fine.
Essential References Not Discussed
Xu and Raginsky themselves cite the original work, where the information-theoretic generalization bound is presented:
Russo and J. Zou, How much does your data exploration overfit? Controlling bias via information usage
It is fair to say that this is the seminal work on the information-theoretic generalization bound, and should be cited.
Other Strengths and Weaknesses
Strengths:
The paper is well written, and the exact tightness is a merit.
Weakness:
I would like to clarify a dilemma I have with these information theoretic results. To put it simply, it is not clear what insights these bounds give us about learning. Naively, it seems that these results do not provide any additional insight beyond the fact that the learning algorithm should not memorize the training data, or in this case, it should not memorize the procedure of training data selection.
Besides, the prediction-based generalization bound already involves losses that directly contribute to the precise generalization error, and I wonder whether we are just verifying an algebraic equality in a self-fulfilling way.
The paper in particular lacks more extensive insights about the results. It presents theorems and plots the numerical results.
It is crucial that the authors clarify what these bounds tell us about learning, how they can be employed in practice, and why the machine learning community should care about it. Note that the seminal work of Russo and Zou had interesting insights.
Other Comments or Suggestions
I wonder if the dependence of the hypothesis on the training data can be made more explicit for better readability.
Dear reviewer o7Wh, thanks for your thorough reading and constructive questions! We address your questions as follows:
On the Nature and Significance of Information-Theoretic Results
This is an insightful and important question involving many works studying information-theoretic bounds. We will clarify the significance of our work from two key perspectives:
1. Understanding the Limits of the Information-Theoretic Approach
Recent efforts to tighten bounds have proceeded along two main directions:
- Improving dependencies between mutual information and generalization error: progressing from square-root bounds → binary KL → fast-rate → and now, our binary JS.
- Refining the mutual information term itself: evolving from MI → CMI → $f$-CMI → e-CMI → ld-MI → and finally, our bl-MI.
These efforts have brought increasingly tighter (though still suboptimal) bounds. This naturally raises the question [1]: For which learning problems is the CMI framework expressive enough to accurately estimate the optimal worst-case generalization error? Or, are there learning settings where the CMI framework must fail? Some prior works (e.g., Haghifam et al., 2023) explore this on SCO problems. Our work provides a definitive answer: the information-theoretic approach is capable of achieving exactly tight bounds across a broad range of learning scenarios. In doing so, we have adequately addressed this open question and marked a meaningful milestone in this direction.
2. Strengthening Theoretical Guarantees for Downstream Applications
It should be noted that our results are developed in a very general setting. They can become especially valuable when applied to more specific contexts. Two particularly promising directions are:
- Out-of-distribution (OOD) generalization: Prior works [2,3] use information-theoretic bounds to identify key components for OOD generalization and propose loss-level optimization objectives (e.g., Eq. (5) in [2], Sec. VII.D in [3]). Our results can be adopted to provide a more robust theoretical foundation for such methods.
- Understanding noisy, iterative learning algorithms: For algorithms like SGD and SGLD, information-theoretic bounds (e.g., MI [4], CMI [5]) have been used to analyze the trajectory of the hypothesis and link algorithm behavior with some interesting factors like gradient variance or landscape flatness. We believe our loss-based bounds will further advance this line of works to analyze the loss trajectory (e.g., [6]).
These applications highlight the practical value of our results, though a detailed exploration lies beyond this paper’s scope.
Alternative Proof for Lemma 3.1
Thank you for this perspective. While intriguing, we currently do not see a direct derivation of Lemma 3.1 from the $f$-divergence-based data processing inequality. This route may be able to characterize relationships among $f$-divergence quantities, but not Shannon's mutual information. We consider this a promising direction for future work.
On the Convergence Rate
(Strongly) convex optimization is a well-known case exhibiting $O(1/n)$ convergence (e.g., [7]). We agree that information-theoretic bounds with $1/\sqrt{n}$ or $1/n$ terms do not necessarily reflect their real rates. Nevertheless, our aim is not to claim a universal bound, but rather to point out that earlier bounds with explicit $1/\sqrt{n}$ scaling may be inherently suboptimal when faster rates are achievable.
As suggested, we fit the generalization error curves in Figure 3 with a power law $n^{\beta}$, and report the fitted exponents:
| Dataset | Gaussian | MNIST | CIFAR-10 |
|---|---|---|---|
| Fitted exponent $\beta$ | -1.084 | -0.585 | -0.326 |
This indicates a convergence rate near $n^{-1}$ for synthetic data and noticeably slower rates (roughly between $n^{-1/2}$ and $n^{-1/3}$) on the real datasets.
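For completeness, this kind of fit can be reproduced with a simple log-log regression. The sketch below is illustrative; the sample sizes and error values are placeholders, not the paper's measurements.

```python
# Power-law fit sketch: regress log(generalization error) on log(n) to
# estimate the exponent beta in gen_err ~ c * n**beta.
import numpy as np

ns = np.array([250, 500, 1000, 2000, 4000])            # placeholder sample sizes
gen_err = np.array([0.20, 0.11, 0.06, 0.031, 0.016])   # placeholder measurements

beta, log_c = np.polyfit(np.log(ns), np.log(gen_err), deg=1)
print(f"fitted rate ~ n^{beta:.3f}")                    # close to -1 here
```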
Other Points
We now cite Russo and Zou’s seminal work, and we explicitly write the hypothesis as a function of the training data to clarify this dependence. Regarding the number of training samples, we note that the curves for different bounds already converge closely at large sample sizes, and thus increasing the number of samples may not further enhance this comparison.
[1] Information complexity of stochastic convex optimization: Applications to generalization, memorization, and tracing. ICML, 2024.
[2] On $f$-Divergence Principled Domain Adaptation: An Improved Framework. NeurIPS, 2024.
[3] How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis. TIT, 2025.
[4] On the generalization of models trained with SGD: Information-theoretic bounds and implications. ICLR, 2022.
[5] Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms. NeurIPS, 2020.
[6] Analyzing generalization of neural networks through loss path kernels. NeurIPS, 2023.
[7] Train faster, generalize better: Stability of stochastic gradient descent. ICML, 2016.
I would like to thank the authors for their answers. Particularly, thanks for the comments on the convergence rate. I suggest to include this discussion in the final version and clarify these subtleties.
Regarding your answer to Significance of Information-Theoretic Results, the authors have tried to clarify further their contribution in the first point. I am not questioning this, and I acknowledge the progress made in this paper. However, my question was more general: what are we learning from these bounds? How can they impact machine learning research and practice? The examples provided in the second bullet point offer promising directions to address this question. However, I cannot think of similar examples with concrete outcomes in the previous literature on IT generalization bounds, so I tend to think that the lack of applications is a weakness of this framework.
Overall, I think the paper clearly passes the bar for acceptance, so I change my score to reflect that.