c:["$","div",null,{"className":"container py-8 max-w-6xl mx-auto","children":["$","$e",null,{"fallback":null,"children":["$","$L16",null,{"paper":{"id":"kOtFuzoA93","title":"Novel Kernel Models and Uniform Convergence Bounds for Neural Networks Beyond the Over-Parameterized Regime","abstract":"$17","keywords":["uniform convergence","reproducing kernel banach space","reproducing kernel hilbert space","relu","resnet"],"primary_area":"learning theory","venue":"ICLR 2025 Conference Withdrawn Submission","conference":"ICLR","year":2025,"status":"withdrawn","is_accepted":false,"avg_rating":4,"avg_rating_normalized":4,"rating_min":3,"rating_max":5,"rating_std":1,"review_count":4,"comment_count":5,"creation_date":"2024-09-26","modification_date":"2024-11-25","forum_link":"https://openreview.net/forum?id=kOtFuzoA93","pdf_link":"https://openreview.net/pdf?id=kOtFuzoA93","arxiv_id":null,"arxiv_url":null,"arxiv_match_method":null,"arxiv_matched_at":null,"tldr":"We construct two exact models for neural networks - one for the network as a whole, the other for the change during training - and use them to derive non-vacuous, well-behaved bounds on Rademacher complexity.","created_at":"2026-01-21T12:24:37.403065+00:00","updated_at":"2026-04-22T06:53:14.626848+00:00","authors":[{"id":"~Alistair_Shilton1","name":"Alistair Shilton","openreview_id":"~Alistair_Shilton1","position":0},{"id":"~Sunil_Gupta2","name":"Sunil Gupta","openreview_id":"~Sunil_Gupta2","position":1},{"id":"~Santu_Rana1","name":"Santu Rana","openreview_id":"~Santu_Rana1","position":2},{"id":"~Svetha_Venkatesh1","name":"Svetha Venkatesh","openreview_id":"~Svetha_Venkatesh1","position":3}]},"stats":{"ratings":[{"id":"pOl7Gr0wSZ","value":5,"confidence":3},{"id":"hyUgnXhFE7","value":3,"confidence":4},{"id":"OgVRu4Tskv","value":5,"confidence":2},{"id":"Pa2344SQLN","value":3,"confidence":2}],"avg_rating":4,"rating_min":3,"rating_max":5,"rating_std":1.1547005383792515,"detailed_scores":{"soundness":[3,3,2,3],"contribution":[3,2,2,2],"presentation":[2,2,1,2],"originality":[],"quality":[],"clarity":[],"significance":[]}},"commentTree":[{"id":"pOl7Gr0wSZ","paper_id":"kOtFuzoA93","replyto":"kOtFuzoA93","number":1,"type":"Official_Review","role":"reviewer","rating":5,"confidence":3,"soundness":3,"contribution":3,"presentation":2,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":5,"summary":"This paper introduces two novel kernel models, the global model and the local model, for understanding neural networks beyond the over-parameterized regime. The global model offers insights into Rademacher complexity for arbitrary neural networks, while the local model extends the NTK to provide a more detailed approximation during training steps.","questions":"1. How do the proposed modifications to the He and Glorot initializations affect the practical training of neural networks in terms of convergence speed and stability?\n\n2. What are the implications of these models for understanding the behavior of neural networks in non-standard architectures like recurrent or convolutional neural networks?\n\nIf the authors can clearly answer the above questions of this paper, especially for the local model part, I will consider raising the score.","soundness":3,"strengths":"The paper constructs rigorous theoretical frameworks that generalize well beyond the common over-parameterized settings, applying to any neural network configuration.","confidence":3,"weaknesses":"1. The authors have claimed that their analysis is beyond the overparameterized regime with the LOCAL DUAL MODE. However, the insights are primarily derived within the framework of neural tangent kernels and its local extension, which might not capture all dynamics of general neural networks. For example, a general neural network is not only the sum of the $f$ initialization and the change $\\Delta f$. \n \n2. The theoretical models and their predictions are not empirically validated with experimental data, which might raise questions about their practical applicability. For example, the author can compare their theoretical Rademacher complexity bounds to empirically measured values on standard image classification datasets like MNIST or CIFAR-10.","contribution":3,"presentation":2,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."]},"created_at":"2024-10-22T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

This paper introduces two novel kernel models, the global model and the local model, for understanding neural networks beyond the over-parameterized regime. The global model offers insights into Rademacher complexity for arbitrary neural networks, while the local model extends the NTK to provide a more detailed approximation during training steps.

","questions":"

\n
How do the proposed modifications to the He and Glorot initializations affect the practical training of neural networks in terms of convergence speed and stability?
\n
\n
What are the implications of these models for understanding the behavior of neural networks in non-standard architectures like recurrent or convolutional neural networks?
\n

If the authors can clearly answer the above questions of this paper, especially for the local model part, I will consider raising the score.

","strengths":"

The paper constructs rigorous theoretical frameworks that generalize well beyond the common over-parameterized settings, applying to any neural network configuration.

","weaknesses":"$18","code_of_conduct":"

Yes

"}},{"id":"hyUgnXhFE7","paper_id":"kOtFuzoA93","replyto":"kOtFuzoA93","number":2,"type":"Official_Review","role":"reviewer","rating":3,"confidence":4,"soundness":3,"contribution":2,"presentation":2,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":3,"summary":"This paper presents a global model and a local model of neural networks. The global model casts the neural network in the reproducing kernel Banach space. The local model casts the change in neural network weight with a local instrinsic neural kernel. The authors use both models to derive Rademacher complexity bound.","questions":"Are the nodes mentioned on line 126 - 127 like layer in the usual notion of neural networks?","soundness":3,"strengths":"The notion of local-instrinsic neural kernel (LiNK) is novel, and it is interesting for the authors to show that the neural tangent kernel can be seen as a first-order approximation of the LiNK.","confidence":4,"weaknesses":"$19","contribution":2,"presentation":2,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."],"details_of_ethics_concerns":"None."},"created_at":"2024-11-02T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

This paper presents a global model and a local model of neural networks. The global model casts the neural network in the reproducing kernel Banach space. The local model casts the change in neural network weight with a local instrinsic neural kernel. The authors use both models to derive Rademacher complexity bound.

","questions":"

Are the nodes mentioned on line 126 - 127 like layer in the usual notion of neural networks?

","strengths":"

The notion of local-instrinsic neural kernel (LiNK) is novel, and it is interesting for the authors to show that the neural tangent kernel can be seen as a first-order approximation of the LiNK.

","weaknesses":"$1a","code_of_conduct":"

Yes

","details_of_ethics_concerns":"

None.

"}},{"id":"OgVRu4Tskv","paper_id":"kOtFuzoA93","replyto":"kOtFuzoA93","number":3,"type":"Official_Review","role":"reviewer","rating":5,"confidence":2,"soundness":2,"contribution":2,"presentation":1,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":5,"summary":"This paper presents two models - called global and local models - of neural-networks applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations. \n\nThe first model is exact (un-approximated) and global (applicable for arbitrary weights), casting the neural network in reproducing kernel Banach space (RKBS). \n\nThe second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (ie. a training step) in reproducing kernel Hilbert space (RKHS) with a well-defined local-intrinsic neural kernel (LiNK).","questions":"Same as the weakness.","soundness":2,"strengths":"Given the results are true and the assumptions are reasonable, this paper would solve the most intriguing problem in the theoretical studies of neural networks: it actually shows that the neural network can achieve the parametric convergence rate -- root of n. \n\nFor example, Theorem 3 and Corollary 5. has showed that the R_{N}(\\calF)\\leq 1/\\sqrt{N}.","confidence":2,"weaknesses":"1. The notation is rather annoying. ( I tried to figure out exact meaning of W^[j] and W^{[\\widetilde{j},j]} etc. This made me can not fully understand the essential contribution in Theorem 1 and Theorem 6. The two theorem seem to be the stepstone of their whole claim)\n\n2. Given the theorem 1 and theorem 6 are correct, they further claimed that the Rademacther complexity could be bounded by some quantity such as \\underline{\\psi} in theorem 3, \\psi_{\\Delta} in theorem 8. This paper then introduce some condition such that the radematcher complexity is upper bounded by 1/\\sqrt{N}. (However, it is well known that provided that the weights live in a compact set which is independent of width and depth, we can show that the radamachter complexity is upper bounded by 1/\\sqrt{N}.) There is no discussion on if they made the essential same assumption on the weight space, i.e.. if the assumption in the paper is essentially assumed that \"the weights live in a compact set which is independent of width and depth\".","contribution":2,"presentation":1,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."],"details_of_ethics_concerns":"N/A"},"created_at":"2024-11-03T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

This paper presents two models - called global and local models - of neural-networks applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations.

The first model is exact (un-approximated) and global (applicable for arbitrary weights), casting the neural network in reproducing kernel Banach space (RKBS).

The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (ie. a training step) in reproducing kernel Hilbert space (RKHS) with a well-defined local-intrinsic neural kernel (LiNK).

","questions":"

Same as the weakness.

","strengths":"

Given the results are true and the assumptions are reasonable, this paper would solve the most intriguing problem in the theoretical studies of neural networks: it actually shows that the neural network can achieve the parametric convergence rate -- root of n.

For example, Theorem 3 and Corollary 5. has showed that the R_{N}(\\calF)\\leq 1/\\sqrt{N}.

","weaknesses":"$1b","code_of_conduct":"

Yes

","details_of_ethics_concerns":"

N/A

"}},{"id":"Pa2344SQLN","paper_id":"kOtFuzoA93","replyto":"kOtFuzoA93","number":4,"type":"Official_Review","role":"reviewer","rating":3,"confidence":2,"soundness":3,"contribution":2,"presentation":2,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":3,"summary":"The paper presents a comprehensive theoretical study introducing two novel models—referred to as global and local models—to analyze neural networks using reproducing kernel Banach and Hilbert spaces (RKBS and RKHS). By doing so, it extends kernel-based approaches to capture neural network behaviors beyond the common over-parameterized regime. Key contributions include exact formulations for bounding Rademacher complexity and establishing a link between the neural tangent kernel (NTK) and a proposed local-intrinsic neural kernel (LiNK), providing a framework that may generalize the NTK beyond its typical settings.","questions":"1. To enhance comprehension and showcase the utility of the proposed models, include concrete examples demonstrating how the bounds and feature maps behave in specific cases of network topologies and activations. For instance, evaluating the Rademacher complexity bounds for well-known neural network topologies or comparing LiNK and NTK in practical scenarios would be highly beneficial.\n2. Given the great generality of the paper, it is important for the authors to stress more how the bounds presented here compare with the ones already present in the literature for \"standard\" network structures.\n3. Adding numerical experiments would allow to better illustrate the results presented in the paper.","soundness":3,"strengths":"$1c","confidence":2,"weaknesses":"$1d","contribution":2,"presentation":2,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."]},"created_at":"2024-11-12T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

The paper presents a comprehensive theoretical study introducing two novel models—referred to as global and local models—to analyze neural networks using reproducing kernel Banach and Hilbert spaces (RKBS and RKHS). By doing so, it extends kernel-based approaches to capture neural network behaviors beyond the common over-parameterized regime. Key contributions include exact formulations for bounding Rademacher complexity and establishing a link between the neural tangent kernel (NTK) and a proposed local-intrinsic neural kernel (LiNK), providing a framework that may generalize the NTK beyond its typical settings.

","questions":"

To enhance comprehension and showcase the utility of the proposed models, include concrete examples demonstrating how the bounds and feature maps behave in specific cases of network topologies and activations. For instance, evaluating the Rademacher complexity bounds for well-known neural network topologies or comparing LiNK and NTK in practical scenarios would be highly beneficial.
Given the great generality of the paper, it is important for the authors to stress more how the bounds presented here compare with the ones already present in the literature for \"standard\" network structures.
Adding numerical experiments would allow to better illustrate the results presented in the paper.

","strengths":"$1e","weaknesses":"$1f","code_of_conduct":"

Yes

"}},{"id":"5qrTsyuIrF","paper_id":"kOtFuzoA93","replyto":"kOtFuzoA93","number":1,"type":"Withdrawal","role":"author","rating":null,"confidence":null,"soundness":null,"contribution":null,"presentation":null,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"comment":"Dear reviewers,\n\nWe would like to thank the reviewers helpful assessment of our paper. As the consensus appears to be that the paper needs work to improve clarity, particularly with regard to the mathematical complexity and generality of results, we have chosen to withdraw our submission for further work. We will take suggestions onboard and work to refocus the paper more on practical examples and real-world architectures, connecting back to the known behaviour of e.g. CNNs or transformer networks, before resubmitting at a later date.\n\nRegards, the authors.","withdrawal_confirmation":"I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors."},"created_at":"2024-11-25T00:00:00+00:00","modified_at":"2024-11-25T00:00:00+00:00","replies":[],"contentHtml":{"comment":"

Dear reviewers,

We would like to thank the reviewers helpful assessment of our paper. As the consensus appears to be that the paper needs work to improve clarity, particularly with regard to the mathematical complexity and generality of results, we have chosen to withdraw our submission for further work. We will take suggestions onboard and work to refocus the paper more on practical examples and real-world architectures, connecting back to the known behaviour of e.g. CNNs or transformer networks, before resubmitting at a later date.

Regards, the authors.

","withdrawal_confirmation":"

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.

"}}],"submissionHistory":[]}]}]}]