PaperHub
Rating: 7.5/10 (Poster; 4 reviewers; lowest 6, highest 8, standard deviation 0.9)
Individual ratings: 8, 6, 8, 8
Confidence: 3.3
ICLR 2024

Fast and unified path gradient estimators for normalizing flows

OpenReview · PDF
Submitted: 2023-09-20 · Updated: 2024-03-26
TL;DR

New low variance gradient estimator for normalizing flows

Abstract

Keywords
Normalizing Flows · Gradient Estimators · Lattice Field Theory · Variational Inference

Reviews and Discussion

Review
Rating: 8

The paper concerns path gradient estimators for normalizing flows, a reduced-variance estimator for the KL divergence gradient in normalizing flows. They come at the cost of additional forward and backward passes through the normalizing flow at hand. The present paper reduces this computational overhead for computing the path gradient of the reverse KL, while being analytically equal to previous work. It then provides a new path gradient estimator for the forward KL. Experiments demonstrate that the resulting path gradient estimators work both in the forward and reverse KL setting on physical sciences data sets (where the unnormalized $p(x)$ is known).

Strengths

Originality

  • The iterative procedure for computing the path gradient has no memory overhead over non-path gradients and is potentially faster (see Weakness 3).
  • Path gradients are applied to the forward KL with reduced variance by applying the same algorithm to the forward KL expressed in base space.
  • The approach has the potential to be generically applied to arbitrary coupling blocks, if clarified.

Quality

The theoretical results might be correct, but I cannot judge at this point (see below). I have some doubts on the baseline experiments (see below).

Clarity

The motivation and main chain of reasoning are clear, but several parts of the manuscript lack clarity and detailed explanations (see below).

Significance

Making use of path gradients in order to regularize for the known unnormalized density of training data has the potential to greatly reduce compute over classical methods, so this chain of work is relevant to the machine learning + natural sciences community. Allowing the forward KL to make use of the unnormalized density is attractive, as the forward KL may have better properties than reverse KL (mode covering instead of mode seeking).

Weaknesses

Generally, the presentation and interpretation of the results can be greatly improved. I also have concerns about some of the results.

In detail:

  1. The notation of Proposition 3.2 and its proof in the appendix are sloppy and I cannot determine the correctness: what is the inverse of the rectangular matrix $\frac{\partial f_\theta(x_l^t, x_l^c)}{\partial x_l^t}$? Is it a pseudo-inverse, or is it a part of the network Jacobian? I suggest greatly rewriting this proposition as a theorem that outlines the general idea of the recursion (that the path gradient can be constructed iteratively by vector-Jacobian products with the inverse of each block, if I am right). Then proceed to derive concrete realizations for coupling blocks and affine couplings in particular if they allow for unique results.
  2. What is the cost of computing Proposition 3.2? As I mentioned in the first point, by rewriting the recursion more generally, this could easily be showcased.
  3. What is the intuition behind Proposition 4.1? What is the regularization obtained from including the unnormalized density (probably something like the corrected relative weight of each sample according to the ground truth density)? What derivative vanishes in expectation? How large is the variance of the removed gradient term? Is your result an improvement in this metric? What is the regularizing effect? Vaitl et al. 2022b have useful visualizations and explanations in this regard.
  4. The baseline Algorithm 2 should not be used to measure baseline times. The second forward pass through the network is unnecessary, as one can simply store the result from the first forward pass, once with stop_gradient and once without. Please report Table 2 again with this change.
  5. I have strong doubts on the validity of the experiment on the multimodal Gaussian model. It is hard to believe that a standard RealNVP network cannot be trained effectively on this data, with an ESS_p of 0.0 (!). I see several warning signs that a badly performing network has been selected in order to have a badly performing baseline:
    • the network is huge, with a number of parameters bounded from below by six coupling blocks $\times$ five hidden subnetworks $\times$ (1000 $\times$ 1000 entries in each weight matrix), amounting to more than 30 million parameters;
    • the batch size of 4,000 given 10,000 samples makes the network see almost the entire data set in every update. This indicates that the training is set up in a way that training from samples only must fail. Given that training yields useful models in only five minutes, it is reasonable to expect hyperparameter tuning of the baseline model from the authors.
  6. In this light, how much parameter tuning was involved in the other experiments $\phi^4$ and $U(1)$? Please compare your numbers to the state of the art results on these benchmarks.

Given that the theoretical results need improved presentation and explanation, and given the doubts on the numerical experiments, the manuscript does not reach the quality bar of ICLR in its current form. Many of the proposed changes can be achieved with additional explanations and better notation. I am looking forward to the authors' rebuttal, and I am happy to be corrected on my understanding.

Minor comments:

  • Eq. (13) is missing a logarithm.
  • The caption for Figure 1 is on page 21 in the appendix, which took me some time to find.
  • The statement that building up the computation graph takes measurable time is false, as this simply means storing already computed activations in a dictionary (right before section 3.1).
  • Eq. (25) is missing that $p_{\theta, 0}$ can be computed from the unnormalized density.
  • If a reader is not familiar with the terms forward and reverse KL, it is hard to understand the introduction. Point the reader to Section 2 or drop it here, leaving space for more explanations on theoretical results.

Questions

see Weaknesses.

Comment

Strengths

Originality

  • The iterative procedure for computing the path gradient has no memory overhead over non-path gradients and is potentially faster (see Weakness 3).
  • Path gradients are applied to the forward KL with reduced variance by applying the same algorithm to the forward KL expressed in base space.
  • The approach has the potential to be generically applied to arbitrary coupling blocks, if clarified.

Quality

The theoretical results might be correct, but I cannot judge at this point (see below). I have some doubts on the baseline experiments (see below).

Clarity

The motivation and main chain of reasoning are clear, but several parts of the manuscript lack clarity and detailed explanations (see below).

Significance

Making use of path gradients in order to regularize for the known unnormalized density of training data has the potential to greatly reduce compute over classical methods, so this chain of work is relevant to the machine learning + natural sciences community. Allowing the forward KL to make use of the unnormalized density is attractive, as the forward KL may have better properties than reverse KL (mode covering instead of mode seeking).

We thank the reviewer for the positive feedback. We agree that our method is generically applicable to coupling-based flows and is of relevance to natural sciences applications.

Weaknesses

1. The notation of Proposition 3.2 and its proof in the appendix are sloppy and I cannot determine the correctness: what is the inverse of the rectangular matrix $\partial f_{\theta}(x_{l}^{trans}, x_{l}^{cond})/\partial x_{l}^{trans}$? Is it a pseudo-inverse, or is it a part of the network Jacobian? I suggest greatly rewriting this proposition as a theorem that outlines the general idea of the recursion (that the path gradient can be constructed iteratively by vector-Jacobian products with the inverse of each block, if I am right). Then proceed to derive concrete realizations for coupling blocks and affine couplings in particular if they allow for unique results.

The Jacobian matrix $\frac{\partial f_\theta(x_l^{trans}, x_l^{cond})}{\partial x_l^{trans}}$ is square and invertible, so there is no subtlety in defining its inverse. In more detail, we define a coupling block in Eq. 6 as

$$x_{l+1}^{trans} = f_{\theta} (x_{l}^{trans}, x_{l}^{cond}),$$

$$x_{l+1}^{cond} = x_{l}^{cond},$$

where $f_\theta(\,\cdot\,, x_l^{cond})$ is an invertible function for any choice of $x_l^{cond}$. By bijectivity, the Jacobian matrix from above is not only square but also invertible. We have added a remark after Proposition 3.3 to emphasize this, and we have rewritten the proof so that this becomes clearer.
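For concreteness, a minimal affine coupling block of this form might look as follows in PyTorch. This is our own illustrative sketch, not the paper's code (the subnetwork names `s_net` and `t_net` are ours); for the affine case the Jacobian $\partial f_\theta/\partial x^{trans}$ is simply $\mathrm{diag}(\exp(s(x^{cond})))$, which makes its invertibility explicit.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Minimal affine coupling block: y_t = x_t * exp(s(x_c)) + t(x_c), y_c = x_c."""

    def __init__(self, d_trans, d_cond, hidden=64):
        super().__init__()
        self.s_net = nn.Sequential(nn.Linear(d_cond, hidden), nn.Tanh(), nn.Linear(hidden, d_trans))
        self.t_net = nn.Sequential(nn.Linear(d_cond, hidden), nn.Tanh(), nn.Linear(hidden, d_trans))

    def forward(self, x_t, x_c):
        s, t = self.s_net(x_c), self.t_net(x_c)
        y_t = x_t * torch.exp(s) + t
        log_det = s.sum(dim=-1)          # log|det d y_t / d x_t|, the Jacobian is diag(exp(s))
        return y_t, x_c, log_det

    def inverse(self, y_t, y_c):
        s, t = self.s_net(y_c), self.t_net(y_c)
        x_t = (y_t - t) * torch.exp(-s)  # exact inverse, no implicit solve needed
        return x_t, y_c, -s.sum(dim=-1)
```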

We have revised the manuscript implementing your suggestions. Specifically, we have added Proposition 3.2, where we first state the recursion for a general flow, i.e., not necessarily a coupling flow:

$$\frac{\partial \log q_{\theta, l+1}(x_{l+1})}{\partial x_{l+1}} = \frac{\partial \log q_{\theta, l}(x_l)}{\partial x_{l}} \left( \frac{\partial T_{l, \theta_l}(x_l)}{\partial x_l} \right)^{-1} + \frac{\partial \log \left|\det \frac{\partial T_{l, \theta_l}(x_l)}{\partial x_l}\right|}{\partial x_l}\left( \frac{\partial T_{l, \theta_l}(x_l)}{\partial x_l} \right)^{-1}$$

As we also remark in the modified manuscript, the evaluation of this expression however involves inversion of the Jacobian and is therefore prohibitively expensive for generic flow architectures (see answer immediately below for more details).
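To make the cost concrete, here is a deliberately naive sketch of one step of this general recursion in PyTorch; the function name and argument layout are our own, not the paper's API. It materializes the full $d \times d$ Jacobian and solves a linear system, which is exactly the cubic bottleneck discussed in the answer below.

```python
import torch

def generic_path_grad_step(log_q_grad, x_l, layer):
    """One step of the general recursion above, written naively.

    log_q_grad : (d,) gradient of log q_{theta,l} w.r.t. x_l
    x_l        : (d,) current sample
    layer      : callable x -> T(x), an invertible map R^d -> R^d

    Builds the full d x d Jacobian and solves against it, i.e. O(d^3);
    the coupling-specific recursion avoids exactly this cost.
    """
    x_l = x_l.detach().requires_grad_(True)
    J = torch.autograd.functional.jacobian(layer, x_l, create_graph=True)  # (d, d)
    log_det = torch.linalg.slogdet(J).logabsdet
    dlogdet_dx = torch.autograd.grad(log_det, x_l)[0]                      # (d,)
    # row-vector times J^{-1}, computed as a linear solve against J^T
    return torch.linalg.solve(J.transpose(0, 1), log_q_grad + dlogdet_dx)
```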

Comment

2. What is the cost of computing Proposition 3.2? As I mentioned in the first point, by rewriting the recursion more generally, this could easily be showcased.

First, note that in the revised manuscript Proposition 3.2 has become Proposition 3.3. We will refer to proposition numbers in the revised manuscript.

For general flow architectures the recursion formula is given by the new Proposition 3.2. For generic architectures, where the inverse of the transformation $T_{l, \theta_l}$ is not easy to compute, this recursion is however too expensive. Specifically, it involves building and inverting a $d \times d$ Jacobian, which naively scales cubically in the number of dimensions $d$ (although optimized algorithms with slightly better scaling exist). Even for autoregressive flows, which have triangular Jacobians, there is still a quadratic scaling. This compares unfavorably with the linear scaling of the standard total gradient estimator. For this reason we derived a specialized recursion for coupling flows (see Proposition 3.3). This has the same linear complexity as the standard gradient estimators. Furthermore, in Corollary 3.4 we specialize to affine coupling blocks; the resulting recursion has the same linear complexity but is slightly faster, since it reuses terms from the standard forward pass and involves fewer terms than the recursion in Proposition 3.3.

In our experiments, we used the recursion from Proposition 3.3 for implicitly invertible flows, while Corollary 3.4 was used for explicitly invertible flows. The walltime cost of computation is summarized in Table 2 (for Proposition 3.3 see the third row, 'Implicitly Invertible', and for Corollary 3.4 see the first row, 'Explicitly Invertible'). Computing Proposition 3.3 has roughly twice the computational cost of the standard total gradient estimator, but compares favorably to the previous state of the art in the literature, which under the same constraints (numerical error and memory footprint) yields a time factor of at least 8. Even if we relax the numerical constraints in favor of the baseline, our fast estimators still lead to significantly faster computations, i.e. at least a 33% speed-up with an order of magnitude higher precision.
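As a rough illustration of why the affine case is cheap, the following sketch specializes the general recursion to the affine coupling block $y^{t} = x^{t} \odot \exp(s(x^{c})) + t(x^{c})$, $y^{c} = x^{c}$, whose Jacobian is block triangular, so a single vector-Jacobian product through the subnetworks suffices. This is our own derivation under these assumptions (unbatched 1-D tensors for simplicity) and is not necessarily identical to the paper's Corollary 3.4.

```python
import torch

def affine_coupling_path_grad_step(G_t, G_c, x_t, x_c, s_net, t_net):
    """Push the gradient of log q_l through one affine coupling block.

    G_t, G_c : gradients of log q_l w.r.t. x_t and x_c (row-vector convention)
    Returns the gradients w.r.t. y_t and y_c plus the transformed sample.
    Only one VJP through the subnetworks is needed -- no explicit Jacobian.
    """
    x_c = x_c.detach().requires_grad_(True)
    s, t = s_net(x_c), t_net(x_c)

    new_G_t = (G_t * torch.exp(-s)).detach()
    # cotangents obtained from the block-triangular inverse Jacobian and
    # the x_c-derivative of log|det| = sum(s)
    v_s = 1.0 - G_t * x_t
    v_t = -new_G_t
    vjp = torch.autograd.grad((s, t), x_c, grad_outputs=(v_s, v_t))[0]
    new_G_c = G_c + vjp

    y_t = x_t * torch.exp(s) + t
    return new_G_t, new_G_c, y_t.detach(), x_c.detach()
```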

Comment

3. What is the intuition behind Proposition 4.1? What is the regularization obtained from including the unnormalized density (probably something like the corrected relative weight of each sample according to the ground truth density)? What derivative vanishes in expectation? How large is the variance of the removed gradient term? Is your result an improvement in this metric? What is the regularizing effect? Vaitl et al. 2022b have useful visualizations and explanations in this regard.

We substantially extended Appendix B.3 to address your questions in detail. Briefly summarized:

  • Intuition: It is well known that the forward KL in target space becomes the reverse KL in base space (see Section 2.3.3 in [1]). This duality allows us to immediately apply all the results derived for the reverse KL case to the forward case, if we express all densities in the base-space variables. To the best of our knowledge, we are the first to observe and harness this simple, yet powerful trick.
  • Regularization: We discuss this in detail in the new subsection B.3.2. In essence, the path gradient estimator involves a term containing the derivative of the log of the target density, $\nabla_{x_0} \log p(T_\theta(x_0)) \frac{\partial T_\theta^{-1}(x)}{\partial \theta}$, see Eq. 57. This term is absent in the standard total gradient, see Eq. 56. Note that in the case of Boltzmann generators, this derivative of the log target density is known in closed form by assumption. Thus, the path gradient estimator is able to harness information about the gradient of the energy at the given sample, which can be thought of as a (gradient) regularizer.
  • Term vanishing in expectation: The expected score of the pullback target density, $E_{p_{0, \theta}}[\frac{\partial \log p_{0, \theta}(x_0)}{\partial \theta}]$, vanishes. This term has non-vanishing variance in the standard gradient estimator, even if the pullback $p_{0, \theta}(x_0)$ perfectly approximates the base density $q_0$ (see the revised Appendix B.3 and the small numerical check after this list).
  • Variance of the removed term: The removed variance corresponds to the Fisher Information Matrix (divided by the batch size) of the pullback density $p_{0, \theta}$ (see Appendix B3.1).
  • Improvement in metric: We observe exactly the same beneficial sticking-the-landing behavior of our forward estimators as observed in the reverse case, i.e. the variance of the estimator vanishes if the model perfectly approximates the target, and this metric is overall reduced, particularly at the end of training (see Figure 3 in the revised manuscript).
  • Useful visualization: We added the exact same plot as in Vaitl et al. 2022b (see Figure 3 and analogous explanations in the revised Appendix B3).
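As a quick sanity check of the "vanishes in expectation" statement, the following self-contained snippet estimates the expected parameter score of a simple learnable Gaussian from its own samples. The Gaussian is only a stand-in for the pullback density discussed above, not the paper's model, and all names here are ours.

```python
import math
import torch

# E_{q_theta}[ d/d theta log q_theta(x) ] = 0 when x is drawn from q_theta itself.
# The Monte Carlo estimate below is ~0 up to O(1/sqrt(N)) noise, although its
# variance (Fisher information / batch size) does not vanish -- the term the
# path gradient estimator drops.
torch.manual_seed(0)
mu = torch.tensor(0.3, requires_grad=True)
log_std = torch.tensor(-0.2, requires_grad=True)

with torch.no_grad():
    x = mu + torch.exp(log_std) * torch.randn(200_000)   # x ~ q_theta, held fixed

log_q = -0.5 * ((x - mu) / torch.exp(log_std)) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
score_mu, score_log_std = torch.autograd.grad(log_q.mean(), (mu, log_std))
print(float(score_mu), float(score_log_std))             # both approximately 0
```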

[1] Papamakarios, George, et al. "Normalizing flows for probabilistic modeling and inference." The Journal of Machine Learning Research 22.1 (2021): 2617-2680. https://arxiv.org/abs/1912.02762

4. The baseline Algorithm 2 should not be used to measure baseline times. The second forward pass through the network is unnecessary, as one can simply store the result from the first forward pass, once with stop_gradient and once without. Please report Table 2 again with this change.

Your concern is valid but addressed in our manuscript. In more detail, your proposed algorithm is applicable but has a substantially higher memory footprint because it cannot discard the computational graph for $x'$ of equation 12 from memory. This is often prohibitive, for example in the context of lattice field theory, as large batch sizes are essential for successful training. As a result, the path gradient literature, such as Vaitl et al., focuses on estimators with a comparable memory footprint. We added Figure 11 to the appendix, which experimentally measures the substantial increase in memory footprint of your proposal.
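Schematically, the trade-off between the reviewer's proposal and a graph-free recomputation looks as follows in PyTorch; the small `nn.Sequential` is only a stand-in for a flow, and `detach()` plays the role of `stop_gradient`.

```python
import torch
import torch.nn as nn

flow = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 8))  # stand-in for a flow
z = torch.randn(4096, 8)

# Reviewer's proposal: run the differentiable forward pass once and reuse it.
x = flow(z)                  # keeps the full autograd graph alive
x_sg = x.detach()            # "stop_gradient" copy, no extra compute
# Memory: every intermediate activation stays resident until backward(),
# which is what the added Figure 11 measures.

# Graph-free alternative with the memory profile of Algorithms 2/3:
with torch.no_grad():
    x_sg = flow(z)           # second forward pass, but no graph is stored
```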

However, along similar lines, we propose a different but novel baseline, Alg. 3, in the appendix of our paper, which not only has the same runtime as your proposal but also has comparable memory cost to our fast gradient estimators. We emphasize that this proposal is an original contribution of our manuscript which outperforms the previous state of the art, i.e., Alg. 2 as proposed by Vaitl et al. Detailed runtime comparisons to this novel baseline, Alg. 3, as well as to the previous state of the art, Alg. 2, can be found in Table 3 and clearly establish that we outperform both baselines across the board by a speed-up of at least 22%. We refrained from discussing this in the main text as introducing a novel baseline only to then show that it is beaten by another novel estimator would lead to a convoluted presentation.

Comment

5. I have strong doubts on the validity of the experiment on the multimodal Gaussian model. It is hard to believe that a standard RealNVP network cannot be trained effectively on this data, with an ESS_p of 0.0 (!). I see several warning signs that a badly performing network has been selected in order to have a badly performing baseline:

  • the network is huge, […]
  • the batch size of 4,000 given 10,000 samples makes the network see almost the entire data set in every update. […]

We politely disagree with the statement that our experiments supposedly show that RealNVP cannot learn an MGM. Rather, our experiments merely establish that path gradients facilitate more sample-efficient training and help avoid overfitting. This can be seen from the right-hand side of Fig. 1, namely that standard maximum likelihood training can easily fit the MGM if enough samples are provided. For smaller numbers of samples, maximum likelihood training leads to overfitting. A non-vanishing ESS_p could be achieved by early stopping. Early stopping is however challenging in the context of Boltzmann generators, as their training often relies on a limited number of biased samples which are only used for pre-training. In the revised manuscript, we nevertheless report results with early stopping to adopt the most charitable setting for the baseline. We stress that sample efficiency is a crucial requirement in learning unnormalized distributions for which training set generation involves costly MD or MCMC simulations.

To avoid any impression of hyperparameter tuning and to provide a more detailed analysis, we include an extensive analysis of results for various batch and flow sizes in the revised Appendix E. In more detail, Figures 5, 6 and 7 show the corresponding Forward ESS during the course of both path and non-path forward training for varying numbers of

  • linear layers per coupling block,
  • number of hidden neurons per linear layer,
  • batch sizes.

While standard forward KL gradients can surpass forward KL path gradients by a small margin for the smallest models trained with the two smallest batch sizes, a larger model trained with path gradients doubles the mean of the $ESS_p$ in a direct comparison and improves upon the best forward KL gradients by almost 10 percentage points in absolute terms.

Tables 3, 4 and 5 summarize the best possible Forward ESS over the course of training (corresponding to the results obtained by early stopping) for direct comparison.

We have rephrased the relevant experimental section to make this point clearer.

Comment

6. In this light, how much parameter tuning was involved in the other experiments $\phi^4$ and $U(1)$? Please compare your numbers to the state of the art results on these benchmarks.

For our $\phi^4$ experiments, we use the exact same hyperparameter choices as in the recent publication [1] as well as the same codebase. To the best of our knowledge, there are currently only two publications considering maximum likelihood training for $\phi^4$ theory: [1] and [2]. We chose [1] as it is more recent.

For the target density, we choose the same bare quartic coupling $\lambda = 0.022$ as in [1] and hopping parameter $\kappa = 0.25$. The authors of [1] considered various hopping parameters and we restrict to this particular choice as it is the most physically relevant one: $\phi^4$ theory has a second-order phase transition at this critical point, see Figure 1 in [2]. It is well known that second-order phase transitions lead to a divergent correlation length and thus to long-range correlations which are challenging to learn. Unfortunately, [1] and [2] do not report any effective sampling sizes to compare to. However, we confirmed in private communication with the authors of [1] that our effective sampling size of 85.1% for the baseline is compatible with their results. We also experimentally calculated the $\bar{w}$ estimator, which is $1.0001 \pm 8\cdot10^{-5}$ and thus compatible with Figure 21 in [1] for the considered target parameters.
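For readers unfamiliar with these couplings, a common hopping-parameter convention for the scalar lattice $\phi^4$ action is sketched below; the exact convention in [1] may differ by constant or rescaled terms, so treat this only as an illustration of what $\kappa$ and $\lambda$ parameterize.

```python
import numpy as np

def phi4_action(phi, kappa=0.25, lam=0.022):
    """Scalar phi^4 lattice action in a common hopping-parameter convention:
    S = sum_x [ -2*kappa * sum_mu phi(x) phi(x+mu) + (1 - 2*lam) phi(x)^2 + lam * phi(x)^4 ]
    with periodic boundary conditions; phi is one configuration on an L x T lattice.
    """
    hopping = sum(phi * np.roll(phi, -1, axis=mu) for mu in range(phi.ndim))
    return np.sum(-2.0 * kappa * hopping + (1.0 - 2.0 * lam) * phi**2 + lam * phi**4)

# Example: action of a random configuration on a 16 x 8 lattice
phi = np.random.randn(16, 8)
print(phi4_action(phi))
```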

For $U(1)$, we base our experiments on [4] using their code base (which was released in the separate publication [5]). To the best of our knowledge, this is the only publicly available code for a normalizing flow with $U(1)$ gauge symmetry. The chosen bare coupling parameter is $\beta=3$ as in this publication (in addition, the authors also fine-tuned from this target value to further bare coupling values). We followed the same hyperparameter choices as in the publication. Unfortunately, important hyperparameter choices such as details on the architecture and training setup were not reported in the publication, and we therefore had to rely on the PhD thesis [6] of one of the authors to extract more details. We chose exactly the same architecture and hyperparameters as listed in the thesis except for the batch size and learning rate. This is because the thesis is not clear on the precise batch size, stating that “batches of size ranging from 16,384 to 131,072” were used, see Section 4.6.3 on page 118. We used a batch size of 12,288 since it was the biggest one that we could reach without gradient accumulation. We also chose a learning rate of 1e-4 as opposed to the 1e-3 reported in the thesis since training proved unstable with the higher learning rate (most likely due to the difference in batch size). Furthermore, [4] does not seem to report any effective sampling sizes to compare to, but our results are consistent with the integrated autocorrelation time shown in Figure 4 of [4]. Note that in neural MCMC, the integrated autocorrelation time is closely related to the effective sampling size, see [7] (https://arxiv.org/abs/2111.10189) for a rigorous discussion.

[1] Detecting and Mitigating Mode-Collapse for Flow-based Sampling of Lattice Field Theories, Nicoli et al. 2023, https://arxiv.org/abs/2302.14082

[2] Flow-based sampling for multimodal distributions in lattice field theory, Hackett et al, 2021, https://arxiv.org/abs/2107.00734

[3] Estimation of Thermodynamic Observables in Lattice Field Theories with Deep Generative Models, Nicoli et al, https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.126.032001

[4] Equivariant flow-based sampling for lattice gauge theory, Kanwar et al, Physical Review Letters 2020, https://arxiv.org/abs/2207.08945

[5] Introduction to Normalizing Flows for Lattice Field Theory, Albergo et al, https://arxiv.org/abs/2101.08176

[6] Machine Learning and Variational Algorithms for Lattice Field Theory, Gurtej Kanwar https://arxiv.org/abs/2106.01975

[7] Analysis of autocorrelation times in Neural Markov Chain Monte Carlo simulation, Bialas et al, 2023, Physical Review E, https://arxiv.org/abs/2111.10189

Comment

Minor comments

Eq. (13) is missing a logarithm.

We fixed the typo.

The caption for Figure 1 is on page 21 in the appendix, which took me some time to find.

We rephrased the caption.

The statement that building up the computation graph takes measurable time is false, as this simply means storing already computed activations in a dictionary (right before section 3.1).

Thank you for pointing this out. We have deleted the statement.

Eq. (25) is missing that $p_{\theta, 0}$ can be computed from the unnormalized density.

We moved the corresponding remark into the proposition.

If a reader is not familiar with the terms forward and reverse KL, it is hard to understand the introduction. Point the reader to Section 2 or drop it here, leaving space for more explanations on theoretical results.

We have pointed the reader to Section 2 in the introduction as suggested.

In conclusion, we want to thank you again for your extensive review, which helped us to improve our manuscript. We would be happy if you considered reevaluating your score. Please let us know if you have any further concerns or questions.

Comment

I thank the authors for the significant improvements to the paper. In the following, I come back to the points that I still have questions about. Please consider all other points as adequately addressed by your answers and updated manuscript.

3. Forward path gradient

Thank you for giving more details on the gradient estimate and deriving the sticking-the-landing property, I think this already makes a compelling argument for the use of path gradients for the forward KL. I am confused by Appendix B.3.2 as its title promises beneficial regularization, but really, the equation only motivates a conjecture. Please only promise what you can hold.

5. Experimental validity

Thanks for trying more hyperparameters. This verifies that the original hyperparameters are particularly beneficial for the path gradient setting and particularly bad for standard gradients (Figure 1 seems to show the worst setup in the ablation!).

The following concerns remain for me:

  1. Standard gradients are only applied to a minimal batch size of 500, but decreasing batch size seems to improve results. Can you add additional runs with batch sizes 50, 100, and 200? The current data looks promising, but I think that additional data points are missing. In particular, this could prevent overfitting.
  2. Figure 1 is not updated and shows a spuriously bad baseline in the light of the new data. This may be a mistake since the authors state that they want to update the figure caption, but the caption still does not explain the figure.
  3. Why is early stopping difficult? Isn't the ESS a metric that can be evaluated any time during training?

6. $\phi^4$ experiments

I think that the choice of this experiment is not optimal given that there are other Boltzmann generator datasets with published ESSs to compare to.

I again thank the authors for their thorough answers. When the above points are addressed, I will reevaluate my score.

Comment

Thank you for your further questions; we would like to respond as follows:

3. Forward path gradient

Thank you for giving more details on the gradient estimate and deriving the sticking-the-landing property, I think this already makes a compelling argument for the use of path gradients for the forward KL. I am confused by Appendix B.3.2 as its title promises beneficial regularization, but really, the equation only motivates a conjecture. Please only promise what you can hold.

We fully agree and have changed the title to “Motivation for regularization”. We welcome other suggestions if you feel this is not a good title.

5. Experimental validity

Thanks for trying more hyperparameters. This verifies that the original hyperparameters are particularly beneficial for the path gradient setting and particularly bad for standard gradients (Figure 1 seems to show the worst setup in the ablation!).

We have updated Figure 1. Now the central plot shows the performance using the optimal hyperparameters for both gradient estimators. We have highlighted training until early stopping. We have also updated the rightmost plot, so that the best performance is shown, instead of the final one. We are open to other suggestions of how to summarize the findings of the appendix.

The following concerns remain for me: Standard gradients are only applied to a minimal batch size of 500, but decreasing batch size seems to improve results. Can you add additional runs with batch sizes 50, 100, and 200? The current data looks promising, but I think that additional data points are missing. In particular, this could prevent overfitting.

We have added the results for the requested batch sizes in Figures 5-7 and Tables 3-5. Please let us know if you have any further requests.

Figure 1 is not updated and shows a spuriously bad baseline in the light of the new data. This may be a mistake since the authors state that they want to update the figure caption, but the caption still does not explain the figure.

Our apologies. This was indeed a mistake which we fixed now.

Why is early stopping difficult? Isn’t the ESS a metric that can be evaluated any time during training?

Thanks for the question. We need to provide some context first: the forward ESS is the relevant metric when a Boltzmann generator is used. This is because we want to reweight the samples with respect to the target distribution to ensure asymptotic unbiasedness (or, more precisely, statistical consistency). Consider the importance-weighted estimator $\frac{1}{N} \sum_{i=1}^N w(x_i) f(x_i)$, where $w(x)=\frac{p(x)}{q(x)}$ is the importance weight of the target $p$ with respect to the sampler $q$, and $f$ is the observable of interest, such as energy or magnetization. The variance of this estimator is given by $\frac{\mathrm{Var}(f)}{N \cdot \mathrm{ESS}}$ for $N$ samples from $q$. Thus, the effective sampling size directly controls the uncertainty of the estimate.
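For concreteness, a minimal NumPy sketch of this reweighting and of the standard (Kish) effective sample size is given below; it assumes only access to (possibly unnormalized) log target densities, log model densities, and observable values, and is not taken from the paper's code.

```python
import numpy as np

def reweighted_estimate(log_p, log_q, f_vals):
    """Self-normalized importance-sampling estimate of E_p[f] from x_i ~ q,
    together with the normalized (Kish) effective sample size in (0, 1]."""
    log_w = log_p - log_q
    log_w -= log_w.max()                   # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                           # self-normalization absorbs the unknown constant
    estimate = np.sum(w * f_vals)
    ess = 1.0 / (len(w) * np.sum(w**2))    # equals 1 when all weights are equal
    return estimate, ess
```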

The effective sampling size is, however, a tricky quantity to evaluate with good precision. This is for several reasons: first, the dataset could miss modes. This is a real concern in LQFT, where the modes would correspond to sectors of different topological charge and MCMC-based sampling often struggles with jumping between topological sectors. This effect is known as topological freezing. A similar situation arises for proteins, where some conformations may not be covered. The estimated ESS on the dataset would then be vastly overestimated. A similar effect arises in the tail regions of the distribution. If it holds that $q \ll p$ but $p$ is still small, then the ESS would also be very low. As such, we need a dataset that is big enough to resolve the tail in sufficient resolution. Finally, forward training is often only used for pre-training purposes for further energy-based training. Energy-based training relies on self-sampling. In high dimensions, this will not work for a randomly initialized flow as the modes of the target are increasingly concentrated. One thus uses, for example, short MCMC simulations around known modes and combines them to obtain a biased dataset for pretraining. This then leads to a “warm initialization” that allows the self-sampling to probe relevant regions of sampling space. Note that the pretraining dataset is likely biased (for example, the weighting between the modes is incorrect). Therefore, the effective sampling size with the biased dataset will most likely be rather different from the effective sampling size with respect to the true target. Note that we want to be sure that we avoid overfitting in this situation, as we merely want to highlight regions of sampling space for further exploration by self-sampling. While we realize that this is a bit of a lengthy answer, the above reasons explain why the ESS is a metric that has to be understood with care and cannot easily be used for early stopping. Please let us know if you have any further questions regarding this point.

Comment

6. $\phi^4$ experiments

I think that the choice of this experiment is not optimal given that there are other Boltzmann generator datasets with published ESSs to compare to.

We have to concede this point. At the time of writing, we were under the impression that the code for their experiments would be available at this stage. The paper has unfortunately just been accepted at Physical Review D and the authors have decided to release their code only after publication. We will add the link to the camera-ready version. We also encourage the reviewer to reach out to the authors of [1] to confirm that our ESS values are aligned with their results. We also note that for the very same target density ($\kappa=0.25$, $\lambda=0.022$ and a $16\times 8$ lattice), effective sampling sizes were given in [2]: their result of $\sim$60% (green line in Figure 9) compares favorably with our result of 85.6%. Note however that they used a coupling-based Stochastic Normalizing Flow.

[1] Detecting and Mitigating Mode-Collapse for Flow-based Sampling of Lattice Field Theories, Nicoli et al. 2023, https://arxiv.org/abs/2302.14082

[2] Caselle, Michele, et al. "Stochastic normalizing flows as non-equilibrium transformations." Journal of High Energy Physics 2022.7 (2022): 1-31. https://link.springer.com/article/10.1007/JHEP07(2022)015

Comment

Thanks to the authors for the updates, the experiments are convincing now. The explanation on the details of the effective sample size has also been very illustrative to me.

I now think that the paper is a valuable contribution to the community and recommend acceptance.

Comment

Thank you for the very useful exchange which helped us to improve our manuscript.

Review
Rating: 6

The authors propose a technique for improving the efficiency of the calculation of path-gradients for both the forwards and reverse KL loss. Typically, the path gradient is lower variance but has a significantly higher computational cost, preventing scalability to large problems. Their method avoids having to evaluate the flow in both the forwards and reverse directions by recursively calculating the gradient during the forward pass using JVPs. The speedup is especially significant for flows that require implicit differentiation for inversion. The main contributions are (1) efficient calculation of the path gradient based losses and (2) path gradient version of the forwards KL loss.

Strengths

  • The method obtains significant improvement in speed in practice, especially for the case of flows that require implicit differentiation for inversion.
  • The method obtains improved generalization for the forward KL training relative to standard maximum likelihood training.
  • Incorporating the energy function of the target in the forward KL training is novel. And having a loss with the “sticking the landing” property for the forward KL is useful.

Weaknesses

  • The speedup for explicitly invertible flows (which are more common) is relatively minor.
  • The authors emphasise that an advantage of their method relative to those from Vaitl et al. for the estimation of the forward KL is that their method does not require reweighting. However, their method uses samples from the target, while the method from Vaitl et al. uses samples from the flow - hence the two methods are not directly comparable as they are for different situations. I think this is somewhat misleadingly presented in the text (it is presented as an improvement relative to the forward KL objective from Vaitl).

Questions

  • How come the flow trained via the standard maximum likelihood objective achieves such poor performance on the MGM problem (Table 1)? It seems possible that poor hyper-parameters have been used, as training by maximum likelihood should be able to obtain reasonable results.

  • In the case of the forward KL with flows that require implicit differentiation for inversion, is it not more efficient to set the forward direction of the flow to map from the target to the flow’s base (rather than base to target), such that implicit differentiation is required for sampling, but not for density evaluation?

Comment

Strengths

  • The method obtains significant improvement in speed in practice, especially for the case of flows that require implicit differentiation for inversion.
  • The method obtains improved generalization for the forward KL training relative to standard maximum likelihood training.
  • Incorporating the energy function of the target in the forward KL training is novel. And having a loss with the “sticking the landing” property for the forward KL is useful.

We thank the reviewer for recognizing these strengths. We fully agree that regularization of maximum likelihood training with the ground-truth energy function is an important contribution of our manuscript. We also appreciate that the reviewer rightly pointed out the significance of path gradients for implicitly invertible normalizing flows in the context of Boltzmann generators.

Weaknesses

The speedup for explicitly invertible flows (which are more common) is relatively minor.

Our estimators for explicitly invertible flows have about 60 percent of the runtime of the previous state of the art. Thus the speed-up is significant. We however agree that proposing path gradient estimators for implicitly invertible normalizing flows is probably the more important contribution of our manuscript, since many state-of-the-art architectures in the Boltzmann generator context are of this form. Particular examples are Smooth Flows [1] for quantum chemistry as well as the gauge-invariant coupling flows [2-4]. To the best of our knowledge, we are the first to propose fast path gradient estimators for this model class of high practical relevance. A further important contribution of our manuscript is that we provide the first path gradient estimators for sample-based training (see discussion below).

[1] Smooth Normalizing Flows, Jonas Koehler et al, NeurIPS 2021, https://arxiv.org/abs/2110.0035

[2] Equivariant Flow-Based Sampling for Lattice Gauge Theory, Kanwar et al, Physical Review Letters 2020, https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.125.121601

[3] Sampling using SU(N) gauge equivariant flows, Boyda et al, Physical Review D, https://journals.aps.org/prd/abstract/10.1103/PhysRevD.103.074504

[4] Flow-based sampling in the lattice Schwinger model at criticality, Albergo et al, 2022, Physical Review D https://journals.aps.org/prd/abstract/10.1103/PhysRevD.106.014514

The authors emphasise that an advantage of their method relative to those from Vaitl et al. for the estimation of the forward KL is that their method does not require reweighting. However, their method uses samples from the target, while the method from Vaitl et al. uses samples from the flow - hence the two methods are not directly comparable as they are for different situations. I think this is somewhat misleadingly presented in the text (it is presented as an improvement relative to the forward KL objective from Vaitl).

We agree that this is an important difference between our method and the one proposed by Vaitl et al. We have revised the manuscript to make this clearer. We stress however that their reweighting method comes with an important downside: it fails as the system size grows because the probability mass of the target density becomes increasingly concentrated, see for example Figure 4 (right) in Vaitl et al. As a result, the importance weights suffer from large variance. Furthermore, their reweighting-based approach does not allow incorporating samples from MD or MCMC into the training. This is unfortunate as Boltzmann generators are often trained using such samples. We discussed extensively with the authors of Vaitl et al. and they agree with this assessment. Even if the bulk of the training is energy-based, samples are essential as a pretraining step because the self-sampling nature of energy-based training will fail to find any modes when the flow is randomly initialized and the target distribution is high-dimensional. Only our approach allows for path gradients of sample-based maximum likelihood training. It is therefore a substantial improvement upon the method of Vaitl et al. and of great interest to the Boltzmann generator community.

Comment

Questions

How come the flow trained via the standard maximum likelihood objective achieves such poor performance on the MGM problem (Table 1)? It seems possible that poor hyper-parameters have been used, as training by maximum likelihood should be able to obtain reasonable results.

Note that our experiments merely establish that path gradients facilitate more sample-efficient training and help avoid overfitting. This can be seen from the right-hand side of Fig. 1, namely that standard maximum likelihood training can easily fit the MGM if enough samples are provided. For smaller numbers of samples, maximum likelihood training leads to overfitting. A non-vanishing ESS_p could be achieved by early stopping. Early stopping is however challenging in the context of Boltzmann generators, as their training often relies on a limited number of biased samples which are only used for pre-training. In the revised manuscript, we nevertheless report results with early stopping to adopt the most charitable setting for the baseline. We stress that sample efficiency is a crucial requirement in learning unnormalized distributions for which training set generation involves costly MD or MCMC simulations.

To avoid any impression of hyperparameter tuning and to provide a more detailed analysis, we include an extensive analysis of results for various batch and flow sizes in the revised Appendix E. In more detail, Figures 5, 6 and 7 show the corresponding Forward ESS during the course of both path and non-path forward training for varying numbers of

  • linear layers per coupling block,
  • number of hidden neurons per linear layer,
  • batch sizes.

While standard forward KL gradients can surpass forward KL path gradients by a small margin for the smallest models trained with the two smallest batch sizes, a larger model trained with path gradients doubles the mean of the $ESS_p$ in a direct comparison and improves upon the best forward KL gradients by almost 10 percentage points in absolute terms.

Tables 3, 4 and 5 summarize the best possible Forward ESS over the course of training (corresponding to the results obtained by early stopping) for direct comparison.

As the additional experiments demonstrate, the superior performance of our method does not depend on the particular hyperparameter choices.

In the case of the forward KL with flows that require implicit differentiation for inversion, is it not more efficient to set the forward direction of the flow to map from the target to the flow’s base (rather than base to target), such that implicit differentiation is required for sampling, but not for density evaluation?

You are correct in that one can choose the “directionality” of the flow such that density estimation is fast. In such a situation, implicit differentiation is not necessary. However, such a choice is strongly disfavored in the context of Boltzmann generators: one wants to use these flows to facilitate fast sampling from a given target distribution. As such, the sampling direction has to be sufficiently fast. Furthermore, one often combines sample-based pretraining with energy-based training which uses the base-to-target direction of the flow. As a result, sample-based forward training has to rely on implicit differentiation. Therefore, our implicit path gradients are of great significance for improving the training of Boltzmann generators.

Comment

Thank you for providing the further experiments.

I think reviewer ZWTi correctly pointed out that the paper's original hyper-parameters on the MGM problem were chosen to accentuate the authors' method above the baseline. I think this is quite a big red flag.

With a smaller batch size and network size, the proposed method is indeed no longer better than the alternative. As the MGM problem is relatively simple, a batch size of 500 and network width of 100 do not seem to be particularly small. It seems like the baseline forward KL method may do better with an even lower batch size.

I believe that in Figure 1, if a better batch size/early stopping were used for the forward KL baseline, the difference between the curves would be much less dramatic?

Due to concerns with the accuracy of the presentation and fairness of the experiments I am downgrading my score. Overall, I feel that this paper has significant contributions and would be worth acceptance if the experiments/presentation were more balanced.

Comment

Thank you for your swift reply.

We understand your concern, but kindly refer you to Figures 5 to 7 in the revised manuscript, which address this topic. Those figures summarize the performance of path gradients vs. standard forward training for various batch and flow sizes. These figures clearly establish that your concern does not hold:

  • The best overall performance is obtained by path gradients. For the best performing flow sizes, the standard estimator leads to overfitting while our path gradient estimator acts as a regularizer thereby preventing overfitting.
  • The suggested batch size of 500 and network width of 100 only leads to an ESS of 68.9 percent (well below the best-performing model at 91.8 percent, which used path gradients).
  • We have provided Figure 5 with Table 3, Figure 6 with Table 4 and Figure 7 with Table 5 for the performance during training and the best possible performance which show improved performance for larger batch sizes and wider linear layers.
  • We are happy to provide results for other values of the batch size and hidden sizes if the reviewer suspects that they could lead to overall better results. In particular, it is highly unlikely that the baseline forward KL would do better than the best ESS of 91.8 percent. We are however happy to check if you find this helpful.

We emphasize that overfitting is a serious concern in training Boltzmann generators. This is because samples from the target distribution are costly, and thus often a low number of samples needs to be used for pretraining before switching to energy-based training.

Finally, we understand the general concern of the reviewer and have added a sentence to the main paper, referring to the additional experiments in the appendix and making it clearer that performance of course differs for different hyperparameters. Importantly, however, path gradient estimators are often (in >95% of the configurations) better and never significantly worse than standard estimators.

Comment

Thank you for providing these further experiments - indeed these do ease my concerns over the validity of the results and I have increased my score. Additionally, I am interested as to whether the authors will be making their code publicly available upon publication?

Comment

Thank you for your swift reply and for raising your score. Yes, we will release code after some further polishing.

Review
Rating: 8

This paper considers the problem of learning a distribution $p$ given an oracle for log probabilities plus a constant (i.e., $\log p(x) + c$ at sample $x$). It proposes a method for estimating the gradients of forward and reverse KL divergence that dispenses with a term known to have zero expectation value, thus allowing lower-variance estimators of the gradient, with less computational complexity than prior work. In particular, this method is deployed beyond previous results for continuous flows to include coupling flows.

Strengths

The paper is technically precise and, to my knowledge, presents valuable original work with immediate applications. The experiments were generally informative. Its major contribution is reducing the computational complexity for calculating path gradients of both forward and reverse KL when $\log p(x) + c$ is queryable.

The theoretical results appear sound after some inspection.

I believe the overall contribution is valuable enough to share with the broader ICLR community, though I was surprised that the proposed "fast" gradient estimator was not already established. Perhaps like many key results, it seems obvious in hindsight. The suggestion that removal of the $\frac{\partial}{\partial \theta} \log q$ term from the gradient estimate makes learning empirically robust to overfitting is quite interesting and provocative, but unexplored in detail.

Weaknesses

I had some difficulty reading this work, despite some prior exposure to the subject matter. It took me several passes to make sense of what the key contribution was, and I wished for additional clarity. The key idea behind "path gradients" (dropping a term that has zero expectation value) from the empirical estimation of the gradient is easy enough to understand, but took some time to distill from the intro [1].

Regarding the experiments, at least one sentence introducing effective sample size would also have been appreciated.

[1] It took me far too long to realize that the expectation value in Equation (10) was for $x_0 \sim q_0$, not $x \sim q_{\theta}$. This might have been clearer if different symbols were used for inputs $x_0 \to x$ and outputs $x \to y$ of the transformation, since layer indexing was only used in the context of coupling flows.

Questions

No questions.

Comment

Strengths

The paper is technically precise and, to my knowledge, presents valuable original work with immediate applications. The experiments were generally informative. Its major contribution is reducing the computational complexity for calculating path gradients of both forward and reverse KL when $\log p(x) + c$ is queryable.

The theoretical results appear sound after some inspection.

I believe the overall contribution is valuable enough to share with the broader ICLR community, though I was surprised that the proposed "fast" gradient estimator was not already established. Perhaps like many key results, it seems obvious in hindsight. The suggestion that removal of the $\frac{\partial}{\partial \theta} \log q$ term from the gradient estimate makes learning empirically robust to overfitting is quite interesting and provocative, but unexplored in detail.

We thank the reviewer for the positive feedback. We agree that path gradients lead to more robust training for normalizing flows in the context of a tractable ground-truth energy function $\log p(x)+c$.

Weaknesses

I had some difficulty reading this work, despite some prior exposure to the subject matter. It took me several passes to make sense of what the key contribution was, and I wished for additional clarity. The key idea behind "path gradients" (dropping a term that has zero expectation value) from the empirical estimation of the gradient is easy enough to understand, but took some time to distill from the intro [1].

We have rephrased the introduction to make this point and our key contributions clearer.

Regarding the experiments, at least one sentence introducing effective sample size would also have been appreciated.

We have expanded the discussion of the Effective Sampling Size in appendix E.

It took me far too long to realize that the expectation value in Equation (10) was for $x_0 \sim q_0$, not $x \sim q_\theta$. This might have been clearer if different symbols were used for inputs $x_0 \to x$ and outputs $x \to y$ of the transformation, since layer indexing was only used in the context of coupling flows.

We will update the draft following your suggestions for the camera-ready version. We however prefer to use $z = x_0$ as it is more standard notation.

Review
Rating: 8

This work deals with improving the pathwise gradient estimator in the context of variational inference using normalizing flow based models (i.e., they want a fast method for computing the "sticking the landing" estimator by Roeder). In particular, they are looking at deriving pathwise gradients for the log probability term of the normalizing flow. Computing this efficiently is non-trivial due to the modification to the probability caused by a change of coordinates that requires computing the determinant of the Jacobian.

They derive a faster method for computing the pathwise gradient in this setting for coupling flows (the most widely used normalizing flow). The improvement in computational speed ranges from 1.3 to 8 times (their estimator takes 1.4–2.3 times the runtime of the standard estimator, which has a higher variance and so doesn't work as well). The improvement is especially large for implicitly invertible coupling flows, but more modest for explicitly invertible coupling flows.

Their formulation allows computing the pathwise gradient for both the forward and reverse KL, allowing to also perform maximum likelihood training.

Experiments were performed on a multimodal Gaussian distribution as well as physics settings: $U(1)$ gauge theory and the $\phi^4$ lattice model.

Strengths

+Fast pathwise gradients are certainly necessary for normalizing flows, and the current work provides this with a large improvement over the prior work in terms of computational speed.

+The method improves in both walltime and efficiency.

+The method allows both forward and reverse KL training.

Weaknesses

-The literature review is a bit misleading, as pathwise gradients have been around for a long time, e.g., see [L'Ecuyer, P. (1991). An overview of derivative estimation] where it is referred to as "infinitesimal perturbation analysis". Moreover, reparameterization gradients are a type of pathwise gradient, and there are other works discussing it, e.g., [Jankowiak & Obermeyer, 2018] or [Parmas & Sugiyama, 2021]. The current work is mainly referring to pathwise gradients in the context of normalizing flows and variational modeling, but the broader picture of pathwise gradients should be briefly mentioned, and probably the terminology should be clarified because the current paper refers to "pathwise" gradients as the narrow application of it to normalizing flows, whereas there are many other estimators that have been around for decades that are also referred to as pathwise estimators.

-The experiments are a bit toy, or at least their significance was not explained.

Jankowiak, M., & Obermeyer, F. (2018, July). Pathwise derivatives beyond the reparameterization trick. In International conference on machine learning (pp. 2235-2244). PMLR.

Parmas, P., & Sugiyama, M. (2021, March). A unified view of likelihood ratio and reparameterization gradients. In International Conference on Artificial Intelligence and Statistics (pp. 4078-4086). PMLR.

Questions

I have a naive question about computing the pathwise gradient of the reverse KL. In equation (2), it seems to me that we could rewrite the equation by using the Jacobian of the forward transform based on the inverse function theorem, so that the $+\log |\det\, dT^{-1}/dx|$ term becomes $- \log |\det\, dT/dx_0|$. Then we could compute the quantity and use backprop to get the pathwise gradient. Am I misunderstanding, or why would this not work? Is the computation of the Jacobian too costly?

"Path gradients have the appealing property that they are unbiased and have lower variance compared to standard estimators, thereby promising accelerated convergence (Roeder et al., 2017; Agrawal et al., 2020; Vaitl et al., 2022a;b)." -> Other estimators are also unbiased. The sentence makes it seem like they aren't. Also, the "have lower variance" is not always true. I suggest revising to make the sentence correct, e.g., making it "tend to have lower variance".

Comment

Strengths

  • Fast pathwise gradients are certainly necessary for normalizing flows, and the current work provides this with a large improvement over the prior work in terms of computational speed.
  • The method improves in both walltime and efficiency.
  • The method allows both forward and reverse KL training.

We thank the reviewer for pointing out these strengths. We agree that path gradients are an important tool in successfully training normalizing flows.

Weaknesses

The literature review is a bit misleading, as pathwise gradients have been around for a long time, e.g., see [L'Ecuyer, P. (1991). An overview of derivative estimation] where it is referred to as "infinitesimal perturbation analysis". Moreover, reparameterization gradients are a type of pathwise gradient, and there are other works discussing it, e.g., [Jankowiak & Obermeyer, 2018] or [Parmas & Sugiyama, 2021]. The current work is mainly referring to pathwise gradients in the context of normalizing flows and variational modeling, but the broader picture of pathwise gradients should be briefly mentioned, and probably the terminology should be clarified because the current paper refers to "pathwise" gradients as the narrow application of it to normalizing flows, whereas there are many other estimators that have been around for decades that are also referred to as pathwise estimators.

We thank the reviewer for pointing this out. We indeed focused narrowly on normalizing flow applications in the literature review. In light of the reviewer's comments, we have extended the discussion with the suggested references, providing a broader perspective on the field.

The experiments are a bit toy, or at least their significance was not explained.

We politely disagree with this statement. Lattice field theory provides the mathematical framework underlying many parts of modern theoretical physics, in particular high-energy physics, gravitational physics, condensed matter and statistical physics. Indeed, all known fundamental forces of nature can be described by quantum field theory (gravity only in an effective field theory sense). Applications of normalizing flows to lattice field theory are an emerging and highly promising field, see, for example, this recent review (https://www.nature.com/articles/s42254-023-00616-w). In fact, a significant part of the global supercomputing resources is spent on lattice field theory simulations.

From a machine-learning point of view, this problem setting is extremely challenging because of the very large symmetry of the target distribution. Specifically, the energy (or, more precisely, the action) of many lattice field theories is invariant under local symmetry groups, i.e., the field at each lattice site can be rotated by a separate symmetry operation. As such, the size of the symmetry group scales with the number of lattice sites. Without making the model architecture manifestly invariant under this symmetry, the generative model would not be able to learn. Furthermore, lattice field theories have to be extrapolated to the continuum limit. In other words, they have to be considered at a second-order phase transition, for which it is well known that the correlation length of the system diverges. Therefore, the generative model has to learn long-range interactions.

We also emphasize that our experiments considered state-of-the-art architectures for both ϕ4\phi^4 and U(1) gauge theory. Thus, our manuscript demonstrates improvements of models that are at the forefront of this exciting and highly dynamic field of research.

Comment

Questions

I have a naive question about computing the pathwise gradient of the reverse KL. In equation (2), it seems to me that we could rewrite the equation by using the Jacobian of the forward transform based on the inverse function theorem, so that the +logdetdTθ1/dx+ \log |\det dT_{\theta}^{-1}/dx| term becomes logdetdTθ/dx0- \log |\det dT_{\theta}/d x_{0}|. Then we could compute the quantity and use backprop to get the pathwise gradient. Am I misunderstanding, or why would this not work? Is the computation of the Jacobian too costly?

The stated identity is, of course, correct but unfortunately unhelpful, as we are interested in the derivative with respect to $x$, i.e., $\frac{\partial}{\partial x} \log \left| \det \frac{d T_{\theta}^{-1}(x)}{d x} \right|$. The term $\log |\det dT_{\theta}/d x_{0}|$ manifestly depends only on $x_0$ and only implicitly on $x$ through $x_0 = T^{-1}_{\theta}(x)$. Of course, the gradient can straightforwardly be calculated using the chain rule

$$\frac{\partial}{\partial x} \log \left| \det \frac{d T_{\theta}^{-1}(x)}{d x} \right| = - \frac{\partial}{\partial x_0} \log \left| \det \frac{d T_{\theta}(x_0)}{d x_0} \right| \, \frac{\partial T_{\theta}^{-1}(x)}{\partial x}.$$

However, this involves a Jacobian in the target-to-base direction, which we wanted to avoid in the first place.
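
As a concrete illustration of this point, below is a minimal numerical sketch in PyTorch for a toy one-dimensional flow $T(x_0) = \sinh(x_0)$ (the flow and all names are illustrative assumptions, not part of the paper). It confirms the chain-rule identity and makes explicit that evaluating the right-hand side requires backpropagating through the inverse map, i.e., a pass in the target-to-base direction.

```python
# Minimal sketch (illustrative only, not from the paper): numerically check the
# chain-rule identity above for a toy monotone flow T(x0) = sinh(x0), whose
# inverse is T_inv(x) = asinh(x).
import torch

def T(x0):        # base -> target
    return torch.sinh(x0)

def T_inv(x):     # target -> base
    return torch.asinh(x)

x = torch.tensor(1.3, requires_grad=True)

# Left-hand side: differentiate log|dT^{-1}(x)/dx| directly with respect to x.
dTinv_dx = torch.autograd.grad(T_inv(x), x, create_graph=True)[0]
lhs = torch.autograd.grad(torch.log(torch.abs(dTinv_dx)), x)[0]

# Right-hand side: chain rule through x0 = T^{-1}(x). This backward pass runs
# through the inverse (target-to-base) map, which is exactly what one would
# like to avoid.
x0 = T_inv(x)
dT_dx0 = torch.autograd.grad(T(x0), x0, create_graph=True)[0]
rhs = -torch.autograd.grad(torch.log(torch.abs(dT_dx0)), x)[0]

print(lhs.item(), rhs.item())  # both equal -x / (1 + x**2) ~ -0.483 for this flow
```

In higher dimensions, the scalar derivatives above become vector-Jacobian products, but the direction of the required backward pass is unchanged.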

"Path gradients have the appealing property that they are unbiased and have lower variance compared to standard estimators, thereby promising accelerated convergence (Roeder et al., 2017; Agrawal et al., 2020; Vaitl et al., 2022a;b)." -> Other estimators are also unbiased. The sentence makes it seem like they aren't. Also, the "have lower variance" is not always true. I suggest revising to make the sentence correct, e.g., making it "tend to have lower variance".

We fully agree and have rephrased the relevant sentences.

Comment

Thanks a lot for the comment and the clarification regarding physical significance. I also looked at the discussion with other reviewers, and I remain positive about the paper. I think it's a strong paper and should be accepted.

Just one more clarifying question regarding the gradient computation to test my understanding. It seems to me that you do not need the Jacobian in the target-to-base direction (as you claimed), but only the Jacobian-vector product. One could create the computation graph from the target-to-base, then set the output gradients to the gradient w.r.t. $x_0$, and compute the required quantity using backprop. Is there some reason why this is not promising/does it correspond to any of the baselines? Actually, looking at the paper again, this seems to be the method of Vaitl that you have detailed in Section 3. Please correct me if I misunderstood something.

Anyhow, I think this is a strong paper. Thank you for the clarifications.

Comment

Dear Reviewer yXNA,

The authors could not respond to your last comment before the end of the discussion period and sent the response message to me. I think it is appropriate to forward this message to you for your reference.

"Thank you for your reply and your positive assessment of our manuscript. We unfortunately did not manage to reply before the discussion period ended. We hope this reply is useful. You are correct about the vector Jacobian product. The method that you describe indeed closely corresponds to Vaitl et al. Note however that they calculate \frac{\partial log det J}{\partial x} \frac{\partial T_\theta(x_0}{\partial \theta}, i.e., the full path gradient, and not the derivative wrt to x. But the idea is essentially the same."

Thanks,

Area Chair 7fU3
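
For readers following this exchange, here is a minimal sketch of the generic path-gradient vector-Jacobian product that the forwarded reply refers to (in the spirit of Roeder et al., 2017 and Vaitl et al.), assuming PyTorch and a toy affine flow with scalar parameters `s` and `t`; all names are illustrative assumptions, and this is not the paper's fast estimator.

```python
# Minimal sketch (illustrative only, not the paper's fast estimator): the generic
# path-gradient VJP for the reverse KL with a toy affine flow T(x0) = exp(s)*x0 + t.
import math
import torch

s = torch.tensor(0.3, requires_grad=True)   # toy flow parameters
t = torch.tensor(0.1, requires_grad=True)

def T(x0):                 # base -> target
    return torch.exp(s) * x0 + t

def log_q(x):              # density of the push-forward of a standard normal under T
    x0 = (x - t) * torch.exp(-s)
    return -0.5 * x0 ** 2 - 0.5 * math.log(2 * math.pi) - s

def log_p(x):              # unnormalized target: standard normal energy as a stand-in
    return -0.5 * x ** 2

x0 = torch.randn(())       # base sample
x = T(x0)                  # forward (base-to-target) pass; graph w.r.t. (s, t) is kept

# v = d/dx [log q_theta(x) - log p(x)], evaluated at fixed parameters.
x_fixed = x.detach().requires_grad_(True)
v = torch.autograd.grad(log_q(x_fixed) - log_p(x_fixed), x_fixed)[0]

# Per-sample path gradient: contract v with dT_theta(x0)/dtheta in a single
# backward pass (a vector-Jacobian product).
path_grad_s, path_grad_t = torch.autograd.grad(x, (s, t), grad_outputs=v)
print(path_grad_s.item(), path_grad_t.item())
```

Backpropagating `log_q(T(x0))` without the detach would additionally pick up the score term, which is exactly the term that path gradients drop.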

Comment

Dear reviewers,

thank you very much for your thorough and insightful reviews. We sincerely appreciate the time you spent evaluating our work and your comments, which helped us to improve our manuscript considerably. We are glad you appreciate the ideas of the paper and its relevance to the normalizing flow community.

Summary

Recent work has established the superiority of path gradient estimators for training normalizing flows with the reverse KL. In this work, we propose a fast path gradient estimator that works for all normalizing flow architectures of practical relevance for sampling from an unnormalized target distribution. We show that this estimator can also be applied to previously unaddressed maximum likelihood training and empirically demonstrate its superior performance in several natural science applications.

Contribution

The reviewers agreed on the usefulness, improved speed, and broad applicability of fast path gradients for training normalizing flows in the context of Boltzmann generators, which is of high relevance for ML4Science. Additionally, all reviewers were of the opinion that training with path gradients in both directions, whether from the base to the target distribution via the reverse KL or from the target to the base distribution via the forward KL, is a key contribution.

Addressed Concerns

  • We extended the related works section incorporating relevant references pointed out by the reviewers. Specifically, we give a broader perspective on path gradients beyond their applications to normalizing flows.

  • One reviewer suggested to include the recursive path gradient formula for general normalizing flow architectures and then to specialize it to both implicitly and explicitly invertible coupling flows. This allows us to highlight more clearly the specific cost advantages of our proposal. We implemented this suggestion, adding a new proposition in the main text for the generic case. Furthermore, we added discussions detailing the runtime advantages of our approach for coupling flows. We also clarified why the Jacobian of the coupling transformation is square and invertible in both Proposition 3.2 and its proof.

  • We significantly extended Appendix B.3 to give further details on the regularizing properties and reduced variance of our forward KL path gradient. Specifically, we also added Figure 3 (analogous to the one for the reverse KL by Vaitl et al.) illustrating its sticking-the-landing property.

  • We have updated the introduction to emphasize the key conceptual idea underlying path gradients: a term that vanishes in expectation is dropped in the gradient estimation (see the decomposition spelled out just after this list).

  • Concerning the experiments, two reviewers wondered whether the low forward effective sample size ESS_p for the baseline was due to a particular choice of hyperparameters, such as batch and model size. We clarify this in the revised manuscript by i) reporting results with early stopping in the tables and ii) substantially extending Appendix E with results summarizing an extensive sweep over hyperparameters. Again, path gradients outperformed the baseline in 95% of the tested configurations.
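
As referenced above, the dropped term can be spelled out in generic notation (the standard sticking-the-landing decomposition of Roeder et al. (2017) for the reverse KL with reparametrization $x = T_\theta(x_0)$, $x_0 \sim q_0$; this is illustrative notation, not a quotation from the revised manuscript):

$$\nabla_\theta\, \mathbb{E}_{x_0 \sim q_0}\big[\log q_\theta(T_\theta(x_0)) - \log p(T_\theta(x_0))\big] = \underbrace{\mathbb{E}\Big[\big(\nabla_x \log q_\theta(x) - \nabla_x \log p(x)\big)^{\top} \tfrac{\partial T_\theta(x_0)}{\partial \theta}\Big]}_{\text{path gradient}} + \underbrace{\mathbb{E}\big[\nabla_\theta \log q_\theta(x)\big|_{x\ \text{fixed}}\big]}_{\text{expected score}},$$

where the second term vanishes because $\mathbb{E}_{x \sim q_\theta}[\nabla_\theta \log q_\theta(x)] = 0$, so dropping it keeps the estimator unbiased while removing the corresponding sampling noise.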

For convenience, we have highlighted all changes made to the manuscript in blue.

AC Meta-Review

The paper developed a more efficient path gradient estimation method for energy-based training of normalizing flows. The method is developed by zooming into the path gradient calculation process to reuse the evaluation pass and work around inverse flow evaluations, which is especially helpful for implicitly invertible flows (which trade easy inverse evaluation for better expressiveness). The authors also applied the method to optimizing the forward KL loss that leverages data, which yields an inverse-evaluation-free estimator with lower variance. The method is well motivated and solid, and experiments showed significant speedups for implicitly invertible flows and improved learning performance on a few natural science problems.

Why not a higher score

Although the investigated distribution learning problems in the natural sciences bear their own value and difficulties, the paper would attract interest from a broader audience if applicability to more general AI tasks were shown. The forward KL version also requires the energy function in addition to data samples, which restricts its applicability to general maximum likelihood training scenarios. The method may also incur more cost for general flow-based models other than coupling flows.

Why not a lower score

All reviewers appreciated the solidity and value of the proposed method for enhancing the applicability of flow-based models. Reviewers UvVP and ZWTi raised concerns regarding the experimental settings and results, which were subsequently resolved by the authors' additional, more detailed experiments during the rebuttal period.

Final Decision

Accept (poster)