The overall writing quality could be improved; there a few typos (e.g., line 053, an novel; line 219, it does not related), the paper does not use parenthesis for in-text citing, some figures are not commented (e.g., Figure 2 and Figure 3).
The theoretical results and mathematical formulation are far from being sound:
- Equation 2 is not well defined; what does it mean to solve an optimization for all ? does it mean that we are solving optimization problem?
- The optimization problem in (2) aims at maximizing the divergence between two conditional distributions, without having any further constraints. This formulation does not much the goal described in the paper, as it does not constrain the target distribution to have a good generative power on the non-private concepts.
- Similar problems as above hold also for problem (3).
- Theorem 1 is trivial.
- I doubt the correctness of Lemma 1. For example, when is the normal distribution, Lemma 1 implies that both and induce the same distribution. Further justification and explanation is needed.
- In lines 308--310, the paper claims an equivalence between (2) and (3) without proving it.
- nit: is usually reserved for discrete integration, while is used in the non-discrete case.
- The paper promises detailed proofs of the theoretical results, but (in my opinion) does not fulfill this promise.
I do not really understand the motivation behind the clustered aggregation scheme in Section 3.3. For example, if all clusters have the same size, this aggregation would be equivalent to a normal aggregation.