Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference
Abstract
Reviews and Discussion
In this paper, the authors introduce a new debiasing problem under the causal inference framework for handling selection bias in recommender systems in the presence of the neighborhood effect: the potential outcome for one user-item pair varies with the treatments assigned to other user-item pairs. The potential outcome, the treatment, and a new ideal loss are defined so as to account for both the selection bias AND the neighborhood interference effect. Two new estimators follow to estimate the newly designed ideal loss: neighborhood inverse propensity score (N-IPS) and neighborhood doubly robust (N-DR).
Strengths
The paper is well written and quite enjoyable to read.
The newly proposed ideal loss and estimators are thoroughly investigated theoretically:
- The difference between the two losses with and without the neighborhood effect is studied (Theorem 2).
- The new ideal loss is shown to be identifiable (Theorem 1).
- The first proposed estimator tackles the case when the treatment representation vector is continuous (with a probability density), via a smoothing symmetric kernel function (e.g., the Epanechnikov or Gaussian kernel). The N-DR estimator is derived similarly.
- The bias and variance of both estimators are computed, with tail and generalization error bounds provided (Theorem 5).
Then, it is shown how to estimate the propensity score for the joint effect of the treatment and the treatment representation vector for the neighboring effect.
The approach is accompanied first with experiments on semi-synthetic data (based on MovieLens 100K) to:
- assess whether the proposed estimators provide a more accurate estimation of the ideal loss than the state-of-the-art methods when neighborhood interference is present.
- measure the influence of the neighborhood effect strength on the estimation accuracy. On the semi-synthetic dataset, for all interference strengths, the N versions of DR and MRDR give better accuracy (lower relative error) than DR and MRDR, and are also less harmed by the interference strength, across 6 different methods of predicting the ratings.
Real-world experiments are conducted on Coat, Yahoo! R3 and KuaiRec, for which MSE, AUC and NDCG are evaluated; the N-* methods are usually among the 3 best results.
Weaknesses
I would develop some explanations for the experimental part, even if only in the appendix. Cf. questions.
Minor, typos:
-p.2: “In addition, we introduces”
-p.3: “... both … leads”
-p.9: “For the methods require propensity”, “the three choice”, “Guassian”, “on the a prior”
-p.10: “Early literature focus”
Also, when Figure 1 is first introduced in the Introduction section, we don't yet have the preliminaries content and not all elements are fully defined, which can make it difficult to understand at first sight.
Questions
Q1: In practice, how do you specify the probability density function of $\boldsymbol{g}$?
Q2: In practice, what is the best choice between Epanechnikov and Gaussian kernels? Experiments seem to have been done only with the Gaussian kernel.
Q3: Can you please detail again, for the semi-synthetic experiments, how you define the set $\mathcal{N}_{u,i}$ as the set of historical user and item interactions for the neighbors of (u,i) who do have an influence on user u?
Q4: How does account for the neighboring effect too?
Q5: $c$ is chosen to be the median of all $g_{u,i}$ according to p. 7, but $g_{u,i}$ is also defined depending on $c$. Can you please explain?
Q6: Does KuaiRec specify the MNAR and MAR watching ratio records? If not, how to measure the neighboring effect?
Q7: How p is defined is explained on p. 28. Can you please elaborate: "we adjust p to ensure the total observed sample is 5% of the entire matrix"?
=== AFTER REBUTTAL ===
I thank the authors for taking the time to answer my questions, which are now addressed. Many thanks for the added experiments that show the results are not tied to the choice of the kernel. Hence, I upgrade my score to Accept.
We sincerely appreciate the reviewer’s great efforts and insightful comments to improve our manuscript. Below, we address these concerns point by point and do our best to update the manuscript accordingly.
[Q1] In practice, how do you specify the probability density function of $\boldsymbol{g}$?
Response: We thank the reviewer for the comment. In our experiments, we choose, between a uniform distribution and a Dirichlet distribution for $\boldsymbol{g}$, the one with the better debiasing performance. In practice, other distributions can also be chosen.
[Q2] In practice, what is the best choice between Epanechnikov and Gaussian kernels? Experiments seem to have been done only with the Gaussian kernel.
Response: We thank the reviewer for the question. Yes, in our original manuscript the experiments were done only with the Gaussian kernel. To explore and compare the Epanechnikov and Gaussian kernels, we add experiments in which our methods adopt the Epanechnikov kernel on all three datasets (Page 8, Table 2) in our revised manuscript. The results are shown below; the best two results are bolded among the IPS-based, DR-JL-based and MRDR-JL-based methods, respectively.
| Method | MSE (Coat) | AUC (Coat) | NDCG@5 (Coat) | MSE (Yahoo! R3) | AUC (Yahoo! R3) | NDCG@5 (Yahoo! R3) | MSE (KuaiRec) | AUC (KuaiRec) | NDCG@5 (KuaiRec) |
|---|---|---|---|---|---|---|---|---|---|
| N-IPS [LR, Gaussian] | 0.212 | 0.742 | 0.678 | 0.226 | 0.693 | 0.664 | 0.092 | 0.796 | 0.585 |
| N-IPS [LR, Epanechnikov] | 0.224 | 0.746 | 0.645 | 0.242 | 0.703 | 0.673 | 0.094 | 0.794 | 0.582 |
| N-IPS [NB, Gaussian] | 0.206 | 0.744 | 0.648 | 0.196 | 0.693 | 0.658 | 0.049 | 0.785 | 0.579 |
| N-IPS [NB, Epanechnikov] | 0.210 | 0.753 | 0.646 | 0.197 | 0.685 | 0.653 | 0.047 | 0.755 | 0.562 |
| N-DR-JL [LR, Gaussian] | 0.231 | 0.731 | 0.651 | 0.247 | 0.698 | 0.664 | 0.113 | 0.779 | 0.537 |
| N-DR-JL [LR, Epanechnikov] | 0.235 | 0.741 | 0.655 | 0.251 | 0.693 | 0.663 | 0.108 | 0.784 | 0.552 |
| N-DR-JL [NB, Gaussian] | 0.204 | 0.748 | 0.650 | 0.198 | 0.691 | 0.653 | 0.049 | 0.778 | 0.574 |
| N-DR-JL [NB, Epanechnikov] | 0.209 | 0.744 | 0.648 | 0.191 | 0.681 | 0.637 | 0.046 | 0.786 | 0.570 |
| N-MRDR-JL [LR, Gaussian] | 0.217 | 0.728 | 0.662 | 0.252 | 0.697 | 0.666 | 0.107 | 0.785 | 0.539 |
| N-MRDR-JL [LR, Epanechnikov] | 0.233 | 0.734 | 0.656 | 0.253 | 0.695 | 0.666 | 0.097 | 0.791 | 0.560 |
| N-MRDR-JL [NB, Gaussian] | 0.208 | 0.742 | 0.651 | 0.206 | 0.694 | 0.663 | 0.045 | 0.793 | 0.583 |
| N-MRDR-JL [NB, Epanechnikov] | 0.207 | 0.756 | 0.635 | 0.194 | 0.690 | 0.644 | 0.044 | 0.802 | 0.587 |
Adopting either the Gaussian kernel or the Epanechnikov kernel in our methods stably outperforms the baseline methods in all metrics, and the two kernels yield similar performance.
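For concreteness, the two kernels compared above can be written down directly. The following is a minimal sketch (all variable and function names are ours, not the paper's) of how such symmetric smoothing kernels turn distances in the treatment representation into localization weights:

```python
import numpy as np

# Hedged sketch: the two smoothing kernels compared in this rebuttal.
# K((g' - g)/h) / h weights each sample by how close its treatment
# representation g' is to the target value g; h is the bandwidth.

def gaussian_kernel(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(t):
    # Compact support on [-1, 1]; zero outside.
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def kernel_weights(g_values, g_target, h, kernel):
    """Normalized smoothing weights for each sample."""
    w = kernel((g_values - g_target) / h) / h
    return w / w.sum()
```

Both kernels are symmetric and integrate to 1, which is what the smoothing-kernel assumption requires; the table above suggests the concrete choice between them matters little in practice.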
[Q3] Can you please detail again, for the semi-synthetic experiments, how you define the set $\mathcal{N}_{u,i}$ as the set of historical user and item interactions for the neighbors of (u,i) who do have an influence on user u?
Response: We thank the reviewer for pointing out this issue. In the semi-synthetic experiment, we suppose that all the which affects . The intuition behind this is that users who have seen better movies than the current movie are more likely to give the current movie a rating lower than the average rating. This is intuitive and is verified in the case study (Appendix I) in our revised manuscript. Meanwhile, we suppose that all the which affects , because users may subjectively believe that movies that have been watched more times are of better quality. So we suppose is {} in the semi-synthetic experiment.
[Q4] How does account for the neighboring effect too?
Response: We thank the reviewer for the question. In fact, only the MNAR effect is taken into account there. The neighborhood effect is considered by splitting into , introducing , and setting the ideal loss to .
[Q5] $c$ is chosen to be the median of all $g_{u,i}$ according to p. 7, but $g_{u,i}$ is also defined depending on $c$.
Response: We thank the reviewer for pointing out this issue and we apologize for the typo here. Actually, $c$ is chosen to be the median of all . We have fixed this typo in our revised manuscript.
[Q6] Does KuaiRec specify the MNAR and MAR watching ratio records? If not, how to measure the neighboring effect?
Response: We thank the reviewer for the useful question. We manually split the MNAR and MAR sets in the KuaiRec dataset. Because KuaiRec is a fully exposed dataset, we uniformly sample 5% of the watching ratio records for each user to generate the MAR set. Meanwhile, we adopt the same technique as in the semi-synthetic experiments to manually set a propensity for each user-item pair to sample the MNAR set.
[Q7] Can you please elaborate: “we adjust p to ensure the total observed sample is 5% of the entire matrix”?
Response: We thank the reviewer for the detailed question. As the reviewer mentioned, the definition of is , where is a pre-specified constant. The expected number of observations for a user-item pair is . Following the previous studies [1-2], we adjust $p$ to ensure for different mask numbers .
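As an illustration only (the names `scores` and `calibrate_p`, and the clipped-linear propensity form, are our assumptions, not the paper's exact definition), such a calibration of $p$ to hit a 5% expected observation rate can be done by simple bisection:

```python
import numpy as np

# Hedged sketch: find a global scale p so that the expected fraction of
# observed entries, mean(min(1, p * s_ui)), hits the 5% target.
# s_ui stands in for the unnormalized propensity scores, whose exact
# form is defined in the paper; here they are just random placeholders.

def calibrate_p(scores, target=0.05, lo=0.0, hi=1e6, iters=100):
    """Bisection on p: the observation rate is monotone increasing in p."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.minimum(1.0, mid * scores).mean() < target:
            lo = mid  # rate too low -> need a larger p
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
scores = rng.random(10_000)                      # hypothetical scores
p = calibrate_p(scores)
obs_rate = np.minimum(1.0, p * scores).mean()    # ~0.05 by construction
```

Because the expected observation rate is monotone in $p$, bisection converges regardless of the concrete score distribution or mask number.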
[W1] Some typos and unclear Figure 1.
Response: We thank the reviewer for pointing out some typos and an unclear figure in our original manuscript. In our revised manuscript, we fixed those typos and added a notation table to Figure 1 to improve readability.
We hope the above discussion will fully address your concerns about our work. We really appreciate your insightful and constructive comments to further help us improve the quality of our manuscript. Thank you!
Reference
[1] Tobias Schnabel et al. Recommendations as Treatments: Debiasing Learning and Evaluation. ICML 16.
[2] Xiaojie Wang et al. Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random. ICML 19.
This paper addresses the combined impact of selection bias and neighborhood effects in recommender systems.
It introduces a novel approach to represent neighborhood effects as interference, alongside a treatment representation.
The paper establishes a theoretical connection with existing methods, showing that their approach achieves unbiased learning in the presence of both selection bias and neighborhood effects.
Experimental validation is conducted on semi-synthetic and real-world datasets to demonstrate the effectiveness of the proposed methods.
Strengths
- The paper is comprehensive and provides a theoretical analysis.
- the paper provides a robust theoretical foundation for its proposed methods. It derives unbiased estimators for the ideal loss, establishes a connection to prior methods that do not account for neighborhood effects, and includes analyses of tail bounds and generalization error bounds for the proposed estimators.
- The experiment is thorough.
- the paper substantiates its claims with empirical results from experiments conducted on both semi-synthetic and real-world datasets. These experiments demonstrate that the proposed estimators outperform previous methods when neighborhood effects are present, underscoring the practical utility and effectiveness of the proposed approach.
Weaknesses
- Motivation is weak
- why do we need to eliminate the neighborhood effect?
- for example, existing recommenders can consider the neighborhood effect in the training phase and make recommendations with the neighborhood effect (e.g., similar users have similar embedding and thus get similar recommendations).
- Therefore, the neighborhood effect can be a rich information source for model training.
- Assumptions are not realistic.
- why is $r_{u,i}$ affected by $o_{u,i}$? In my opinion, $o_{u,i}$ is just a treatment to observe $r_{u,i}$, and does not affect the 'value' of $r_{u,i}$ (i.e., the value of $r_{u,i}$ is affected only by $x_{u,i}$ and observed only when $o_{u,i}=1$).
- If $r_{u,i}$ is affected by $o_{u,i}$, I think Assumption 3 does not hold.
- In the paper, g is a scalar (a continuous variable), not a representation vector.
- Minor concerns
- In the real-world experiment, the authors use 5% MAR test ratings for the propensity estimation. This process is unrealistic.
- In the semi-synthetic experiment, the definition of the neighborhood effect is the number of neighbor pairs with . What does it mean? Since , I cannot understand how $c$ is chosen to be the median of all .
Questions
Please refer to the weaknesses.
Ethics Concern Details
NA
2. Assumptions
[W2.1] why is $r_{u,i}$ affected by $o_{u,i}$? In my opinion, $o_{u,i}$ is just a treatment to observe $r_{u,i}$ and does not affect the 'value' of $r_{u,i}$ (i.e., the value of $r_{u,i}$ is affected only by $x_{u,i}$ and observed only when $o_{u,i}=1$).
Response: In fact, $o_{u,i}$ is not just a treatment to observe $r_{u,i}$ and does affect the 'value' of $r_{u,i}$. We further clarify the definitions in Section 2 and add a notation table in Figure 1 to make the presentation clearer.
- Specifically, $x_{u,i}$, $o_{u,i}$, and $r_{u,i}$ are the feature, treatment (e.g., exposure), and feedback (e.g., conversion) of the user-item pair $(u,i)$. These broad concepts can have different meanings in different recommendation scenarios.
- For instance, one may consider $o_{u,i}=1$ or $0$ to represent whether the item is exposed to the user or not, with $r_{u,i}$ the conversion indicator. Thus it is plausible that the exposure of the item affects the conversion.
[W2.2] If $r_{u,i}$ is affected by $o_{u,i}$, I think Assumption 3 does not hold.
Response: We would like to distinguish here between the observed feedback $r_{u,i}$ and the potential feedback $r_{u,i}(1)$.
- We agree with the reviewer that $r_{u,i}$ is affected by $o_{u,i}$, e.g., the exposure of the item affects the conversion.
- Nonetheless, what Assumption 3 states is that the potential feedback $r_{u,i}(1)$ is independent of the exposure indicator $o_{u,i}$.
- Since $r_{u,i}(1)$ is defined as the potential feedback that would be observed if the item had been exposed to the user (i.e., $o_{u,i}$ had been set to 1), it is plausible that $r_{u,i}(1)$ is independent of $o_{u,i}$.
[W2.3] In the paper, $g$ is a scalar (a continuous variable), not a representation vector.
Response: We thank the reviewer for pointing out this issue.
- On a theoretical level, our approach allows $\boldsymbol{g}$ to be a vector. Noting that there is a significant difference in the statistical theory when $\boldsymbol{g}$ is a vector versus a scalar, we discuss in detail the new proofs and the theorems in the main text for multi-dimensional $\boldsymbol{g}$ in Appendix G.
- On an experimental level, as the reviewer noted, we only considered the case when $g$ is a scalar (a continuous variable), but such an implementation has already achieved significant performance improvement. In the future, we will conduct more experiments to explore the use of multi-dimensional $\boldsymbol{g}$ to implement the proposed methods.
3. Minor concerns
- In the real-world experiment, the authors use 5% MAR test ratings for the propensity estimation. This process is unrealistic.
Response: Indeed, in our original manuscript, we implemented the proposed methods using both Logistic Regression (LR) and Naive Bayes (NB) to estimate the propensities. LR does not require any MAR test ratings for propensity estimation, whereas NB requires 5% MAR test ratings. We carefully revised the statement about propensity estimation in Section 6, and highlight the results of the experiments using LR and NB in Table 2.
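For readers unfamiliar with the LR option, a minimal sketch of feature-based propensity estimation (all names and the synthetic data are our own illustration; the paper's actual features and model may differ) could look like:

```python
import numpy as np

# Hedged sketch: estimate propensities P(o = 1 | x) with a tiny logistic
# regression trained by gradient descent. LR-style propensity models
# need only the observation indicators o and features x -- no MAR test
# ratings. The feature construction below is hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_propensity_lr(X, o, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the logistic log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - o)) / len(o)
        b -= lr * (p - o).mean()
    return w, b

# Synthetic check: observations generated from a known exposure model.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
true_w = np.array([1.0, -2.0, 0.5])
o = (rng.random(2000) < sigmoid(X @ true_w - 1.0)).astype(float)

w, b = fit_propensity_lr(X, o)
propensities = sigmoid(X @ w + b)  # estimated P(o=1 | x), used as IPS weights
```

The estimated propensities then serve as the inverse weights in the N-IPS/N-DR losses; in contrast, the NB-style estimator additionally conditions on feedback values and therefore needs a small MAR sample.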
[W3.2] In the semi-synthetic experiment, the definition of the neighborhood effect is the number of neighbor pairs with . What does it mean? Since {, }, I cannot understand how $c$ is chosen to be the median of all .
Response: We thank the reviewer for the careful reading and apologize for the typo here. Actually, $c$ is chosen to be the median of all . We have fixed this typo in our revised manuscript.
Finally, we would like to kindly remind the reviewer that he/she might have mistakenly flagged "Flag For Ethics Review" due to checking both "No ethics review needed" and "Yes, Responsible research practice (e.g., human subjects, data release)".
We hope the above discussion will fully address your concerns about our work. We really appreciate your insightful and constructive comments to further help us improve the quality of our manuscript. Thank you!
References
[1] Yu Zheng et al. Disentangling User Interest and Conformity for Recommendation with Causal Embedding, WWW 2021.
[2] Mouxiang Chen et al. Adapting Interactional Observation Embedding for Counterfactual Learning to Rank. SIGIR 2021.
We sincerely appreciate the reviewer’s great efforts and insightful comments to improve our manuscript. Below, we address these concerns point by point and do our best to update the manuscript accordingly.
1. Motivation
Why do we need to eliminate the neighborhood effect? For example, existing recommenders can consider the neighborhood effect in the training phase and make recommendations with the neighborhood effect (e.g., similar users have similar embedding and thus get similar recommendations). Therefore, the neighborhood effect can be a rich information source for model training.
Response: The reviewer has raised an important point. We hope the following clarification addresses your concerns about the motivation.
- We agree with the reviewer that modern recommender systems consider the neighborhood effect in the training phase and make recommendations with the neighborhood effect.
- Nonetheless, we know that users tend to behave similarly to the others in a group, even if doing so goes against their own judgment, making the feedback in the observed data not reflect the true preferences of the users [1].
- We also notice another related work [2] referenced by Reviewer qJ35, in which the authors discuss the presence of interference in debiasing user feedback for the learning-to-rank (LTR) task.
Moreover, we also add a real-world example using the KuaiRec dataset demonstrating the necessity of eliminating the neighborhood effect for debiased recommendation in Appendix I, for a clearer comprehension of the motivation in this paper.
- For the same item interacted with different users, we use the user social network information to compute and compare the feedback similarity of friends and non-friends. The results are shown in Figure 3(a). Compared with the non-friend user pairs, it is clear that there is a higher similarity in the ratings of friends for the same item.
- For a given user with different items, we use timestamps to first select the most recent items this user has interacted with, then compute the average video viewing time among all users on the fully exposed dataset. As shown in Table 3 and Figure 3(b), we found that the lower the average viewing time of the items the user recently interacted with, the better the user's feedback on the current item will be, and vice versa. Such a feedback mechanism is reasonable: when users have previously observed more low-quality videos, they will provide better feedback on the current items.
Selection bias in recommender systems arises from the filtering process and user interactions, with most studies focusing on addressing it for unbiased prediction models. However, these studies often overlook the neighborhood effect, which is the variation in potential outcomes due to treatments assigned to other user-item pairs. This paper formulates the neighborhood effect as an interference problem and proposes a novel ideal loss to deal with selection bias in the presence of this effect. Two new estimators are developed, which are shown to achieve unbiased learning when both selection bias and neighborhood effects are present, unlike existing methods. Extensive experiments confirm the effectiveness of these proposed methods.
Strengths
- The studied topic is practical and interesting.
- The experiments are described in enough detail for reproduction.
Weaknesses
- Too many assumptions made the manuscript hard to follow.
Questions
n/a
We sincerely appreciate the reviewer’s great efforts and insightful comments to improve our manuscript. Below, we try our best to address this concern and update the manuscript accordingly.
[W1] Too many assumptions made the manuscript hard to follow.
Response: We thank the reviewer for raising this concern. We would like to kindly remind the reviewer that all causal conclusions are based on a set of assumptions [1,2]. In this paper, we have adhered to common assumptions widely employed in causal inference.
It's important to highlight that the primary assumption underpinning our approach is Assumption 1 (neighborhood treatment representation). The remaining assumptions we utilize are standard in causal inference and kernel-smoothing estimation. To elaborate:
- Assumptions 2-3 are common assumptions (also called backdoor adjustment criterion in structural causal model, or unconfoundedness assumption in potential outcome framework) in causal inference to ensure the identifiability of the ideal loss (eq. (2)).
- Assumption 4 aligns with the standard assumption in kernel-smoothing estimation [3, 4, 5].
- Assumption 5 merely assumes the boundedness of propensities and imputed errors, which is a trivial assumption.
We hope the above discussion will fully address your concerns about our work. We really appreciate your insightful and constructive comments to further help us improve the quality of our manuscript. Thank you!
Reference
[1] Miguel A. Hernán et al. Causal Inference: What If. 2020.
[2] Guido W. Imbens et al. Causal Inference For Statistics Social and Biomedical Science. 2015.
[3] Jianqing Fan et al. Local Polynomial Modelling and Its Applications. 1996.
[4] Qi Li et al. Nonparametric econometrics. 2007.
[5] Wolfgang Härdle et al. Nonparametric and Semiparametric Models. 2004.
This research investigates the influence of other user-item interactions on their ratings in Recommender Systems (RS). While prior studies concentrated on reducing selection bias, neglecting the neighborhood effect can lead to distorted estimates and subpar predictive model performance. This study introduces a treatment representation to capture the neighborhood effect and suggests a new loss function and estimators to tackle both selection bias and neighborhood effects, resulting in unbiased learning compared to current approaches. The effectiveness of these methods is demonstrated through theoretical assurances and comprehensive experiments.
Strengths
- The introduction of the neighborhood effect in mitigating bias in Recommender Systems (RS) is innovative. The research addresses a significant issue, and its rationale is evident.
- The paper is well-structured and the method is substantiated by robust theoretical foundations.
Weaknesses
- The breach of SUTVA and the presence of interference in debiasing user feedback were previously discussed in [1], where they also examined the interactions between propensity and implicit feedback on other items. I would like to see a discussion on it in this manuscript.
- A case study or a real-world example demonstrating the neighborhood effect would be valuable for a clearer comprehension of the underlying motivation.
- To enhance the clarity, it is advisable for the authors to furnish pseudocodes delineating the procedural steps of the proposed estimator and the propensity estimation process.
[1] Mouxiang Chen, Chenghao Liu, Jianling Sun, and Steven C.H. Hoi. 2021. Adapting Interactional Observation Embedding for Counterfactual Learning to Rank. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 285–294. https://doi.org/10.1145/3404835.3462901
Questions
Please see the Weaknesses section.
We sincerely appreciate the reviewer’s great efforts and insightful comments to improve our manuscript. Below, we address these concerns point by point and do our best to update the manuscript accordingly.
[W1] The breach of SUTVA and the presence of interference in debiasing user feedback were previously discussed in [1], where they also examined the interactions between propensity and implicit feedback on other items. I would like to see a discussion on it in this manuscript.
Response: We thank the reviewer for bringing up the relevant literature [1], and we have added a detailed discussion in the Related Work Section regarding the relation and difference between our paper and [1], and have added a clear citation to [1] as a key motivation of our paper in the Introduction Section.
- Despite both our paper and [1] exploring the influence induced by "other user-item interactions", it appears to be the sole similarity between the two studies.
- Firstly, for the problem settings, [1] focuses on the task of learning to rank (LTR), addressing position bias using implicit feedback data. In contrast, our paper focuses on eliminating selection bias in the rating prediction task using explicit feedback data.
- Secondly, for the basic ideas, [1] considers "other user-item interactions" as "confounders" (refer to the third paragraph in the Introduction Section). In contrast, our paper regards "other user-item interactions" as a new "treatment" from the perspective of interference in causal inference.
- Thirdly, for the proposed methods, [1] uses embedding as a proxy confounder to capture the influence of "other user-item interactions". In contrast, our paper formally formulates the influence of "other user-item interactions" as an interference problem in causal inference, and introduces a treatment representation to capture the influence. On this basis, we propose a novel ideal loss that can be used to deal with selection bias in the presence of interference. We also provide comprehensive theoretical guarantees.
[W2] A case study or a real-world example demonstrating the neighborhood effect would be valuable for a clearer comprehension of the underlying motivation.
Response: Thank you for the kind advice. As suggested by the reviewer, we add a real-world example using the KuaiRec dataset demonstrating the presence of the neighborhood effect in Appendix I, for a clearer comprehension of the motivation in this paper.
- For the same item interacted with different users, we use the user social network information to compute and compare the feedback similarity of friends and non-friends. The results are shown in Figure 3(a). Compared with the non-friend user pairs, it is clear that there is a higher similarity in the ratings of friends for the same item.
- For a given user with different items, we use timestamps to first select the most recent items this user has interacted with, then compute the average video viewing time among all users on the fully exposed dataset. As shown in Table 3 and Figure 3(b), we found that the lower the average viewing time of the items the user recently interacted with, the better the user's feedback on the current item will be, and vice versa. Such a feedback mechanism is reasonable: when users have previously observed more low-quality videos, they will provide better feedback on the current items.
[W3] To enhance the clarity, it is advisable for the authors to furnish pseudocodes delineating the procedural steps of the proposed estimator and the propensity estimation process.
Response: We thank the reviewer for pointing out this issue. As suggested by the reviewer, we add pseudocodes delineating the propensity estimation process in Alg. 1, as well as the procedural steps of the proposed N-IPS, N-DR, and N-MRDR in Alg. 2-4, respectively. Please kindly refer to Appendix H for the added 4 algorithmic details using pseudocodes.
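While the actual pseudocode is in Appendix H, a heavily simplified sketch of the kernel-smoothed inverse-propensity idea behind N-IPS (our own names and simplifications, not the paper's Alg. 2) is:

```python
import numpy as np

# Hedged sketch (not the paper's actual Alg. 2): a kernel-smoothed
# IPS-style loss. o_ui are observation indicators, e_ui the prediction
# errors, prop_ui the estimated propensities, g_ui the (scalar)
# neighborhood treatment representations, g_target the value the loss
# is localized around, and h the kernel bandwidth.

def gaussian_kernel(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def n_ips_loss(o_ui, e_ui, prop_ui, g_ui, g_target, h):
    k = gaussian_kernel((g_ui - g_target) / h) / h  # smooth over g
    return np.mean(o_ui * k * e_ui / prop_ui)       # inverse-propensity weighting
```

With well-specified propensities and a small bandwidth, such a weighted average approximates the ideal loss localized at the target treatment representation; the N-DR variant would additionally add an imputed-error term.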
We hope the above discussion will fully address your concerns about our work. We really appreciate your insightful and constructive comments to further help us improve the quality of our manuscript. Thank you!
Reference
[1] Mouxiang Chen et al. Adapting Interactional Observation Embedding for Counterfactual Learning to Rank. SIGIR 2021.
Dear authors,
Thanks for the responses. The responses address most of my concerns. I decide to maintain my score.
We are glad to know that your concerns have been effectively addressed. We are very grateful for your constructive comments and questions, which helped improve the clarity and quality of our paper. Thanks again!
Dear reviewers and AC,
We sincerely thank all reviewers and AC for their great effort and constructive comments on our manuscript. We know that we are now approaching the end of the author-reviewer discussion and apologize for our late rebuttal. During the rebuttal period, we have been focusing on these beneficial suggestions from the reviewers and doing our best to add several experiments and revise our manuscript. We believe our current carefully revised manuscript can address all the reviewers’ concerns.
As the reviewers highlighted, we believe our paper tackles an important and relevant problem (Reviewer g9eU, Reviewer qJ35), introduces a novel and interesting idea from a new perspective (Reviewer g9eU, Reviewer qJ35), and is well-written (Reviewer 1Wkk). We also appreciate that the reviewers found the proposed methods intriguing and inspiring (Reviewer g9eU), with sound theoretical analysis (Reviewer qJ35, Reviewer TJ54, Reviewer 1Wkk). In addition, all the reviewers mentioned that our paper has solid and convincing experiments.
Moreover, we thank the reviewers for pointing out the concerns regarding the motivation and clarity (Reviewer TJ54), for the suggestions to investigate both types of kernel functions and explain more details of the semi-synthetic experiments (Reviewer 1Wkk), and for the suggestions to discuss the relation and difference between our methods and previous methods and to provide pseudocodes for easy reading (Reviewer qJ35). In response to these comments, we have carefully revised and enhanced our manuscript with the following important changes and added experiments:
- [Reviewer TJ54, Reviewer qJ35] For clarifying our motivation, we conduct a case study on the KuaiRec dataset demonstrating the necessity of eliminating the neighborhood effect for debiased recommendation (in Appendix I).
- [Reviewer TJ54] We further clarify the definitions in Section 2 and add a notation table in Figure 1 to make the presentation and Assumption 3 clearer.
- [Reviewer TJ54] We implement the proposed methods using both Logistic Regression (LR) and Naive Bayes (NB) for estimating the propensities, where LR does not require any MAR test ratings for the propensity estimation.
- [Reviewer 1Wkk] We add experiments using both the Gaussian kernel and the Epanechnikov kernel in the proposed methods in Table 2.
- [Reviewer qJ35] We add a discussion of the relation and difference between our paper and [Chen et al., SIGIR 21] in the Related Work Section, and add a clear citation to [Chen et al., SIGIR 21] as a key motivation of our paper in the Introduction Section.
- [Reviewer qJ35] We add pseudo-codes for propensity learning, N-IPS, N-DR-JL and N-MRDR-JL (in Appendix H).
These updates are temporarily highlighted in "" for facilitating checking.
We hope our response and revision could address all the reviewers' concerns, and are more than eager to have further discussions with the reviewers in response to these revisions.
Thanks, Submission8755 Authors.
The paper addresses neighborhood effect for unbiased predictions in recommendations, in addition to the commonly studied selection bias. Two new estimators are developed, which are shown to achieve unbiased learning when both selection bias and neighborhood effects are present.
Strength: the method is substantiated by robust theoretical foundations, and the authors provided extensive experiments to validate the effectiveness of the proposed method.
Weakness: various reviewers raised questions on the assumptions made, and the motivation for the neighborhood effect is not fully justified.
Why Not a Higher Score
The topic might be less applicable to this specific community. The motivation for eliminating the neighborhood effect is weak. The analysis on the KuaiRec dataset is confounded by the fact that friends tend to share similar interests and preferences.
Why Not a Lower Score
The topic
Accept (poster)