Q1. How Theorem 4.3 can be improved, for example, when the dataset is non-convexly separated or the number of classes is greater than 2?

Theorem 4.3 explains that if a two-layer ReLU network is initialized in proximity to the polytope cover, gradient descent may converge to the global minimum. This result readily extends to multiple polytopes or a polytope-basis cover. Specifically, consider a scenario with polytopes covering a single class and a three-layer network as defined in Proposition 3.7, equation (5) in the revised paper. Assuming the subnetworks, denoted as , are initialized near polytopes, satisfying the conditions in Theorem 4.3. Then, we can apply Theorem 4.3 to each polytope individually. This is possible because is the minimum of two-layer ReLU networks - it is important to note that the operation in the network executes all gradients in backpropagation, except only one subnetwork for each input. This establishes the extension of Theorem 4.3 to a polytope-basis cover for three-layer ReLU networks.

Q2. Is it possible to obtain some UAP results of the 3-layer ReLU networks to approximate a compactly supported functions , in terms of smoothness?

Thank you for a insightful question. We obtained that it is possible. Since we proposed a method to approximate the indicator function in by a 3-layer ReLU network, a common idea in Lebesgue theory easily generalizes this result to approximate a compactly supported functions . We propose the result in the appendix. See Theorem C.5 in the revised paper. As the reviewer anticipated, the width is related with the error bound and the Lipshictz constant of .

Remark. Notably, (Wang et al., 2022) recently proved that 2-layer ReLU networks cannot approximate a compactly supported function in , while 3-layer ReLU networks can. However, their proof establishes only the existence of such networks without offering any width bounds. Consequently, our result in Theorem C.5 represents a refinement of their findings. This is one example that our results on polytope can be leveraged to derive width bounds for the UAP.

Reference.

[Wang et al., 2022] Ming-Xi Wang and Yang Qu. Approximation capabilities of neural networks on unbounded domains. Neural Networks, 145:56–67, 2022.

Q3. How can your results be applied to real datasets?

In the revised paper, we supplied empirical application of our results, by constructing a polytope-basis cover of the given real dataset. See General Comment KR1 in the overall comments.