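PaperHub
Rating: 6.0/10
Poster, 4 reviewers
Lowest: 6, highest: 6, standard deviation: 0.0
Individual ratings: 6, 6, 6, 6
Confidence: 3.5
Correctness: 2.8
Contribution: 2.8
Presentation: 2.5
ICLR 2025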

PIN: Prolate Spheroidal Wave Function-based Implicit Neural Representations

Submitted: 2024-09-25, updated: 2025-03-02


Keywords
Prolate Spheroidal Wave Functions, Implicit Neural Representations, MLPs

Reviews and Discussion

Review
Rating: 6

This paper argues that the Gaussian and Gabor wavelet activation functions cannot achieve the optimal space-frequency trade-off, and thus may not effectively capture distant relevance. Motivated by this, the paper proposes using the Legendre polynomial numerical estimation of the Prolate Spheroidal Wave Function (PSWF) as the activation function, which has been proven in previous work to offer the optimal space-frequency trade-off. The authors demonstrate improvements through experiments on various tasks, including image regression, neural radiance fields, and so on.

Strengths

  1. The motivation presented in the paper is compelling and provides a solid foundation for the proposed approach.
  2. This paper offers an insightful critique of previous works, identifying limitations in existing activation functions and highlighting the need for improvement.
  3. The experiments encompass a diverse range of tasks, allowing for a more comprehensive and thorough comparison. This diversity enhances the validity of the findings and demonstrates the effectiveness of the proposed method across different domains.

Weaknesses

  1. The exact implementation of the numerical estimation of the Prolate Spheroidal Wave Function lacks clarity in the main text. For instance, the approximation order is not explicitly stated, and the impact of the approximation order on the computational complexity (in terms of space/time consumption) and the model's performance is not clearly explained.
  2. The experiments on tasks such as Occupancy Field Representation and Neural Radiance Field appear to be conducted on a subset of the entire dataset, which may undermine the persuasiveness of the results.
  3. The presentation could be improved by refining certain details, such as using vector graphics and the \autoref{} command for figure citations to ensure consistency and clarity.

Questions

  1. Could you clarify the approximation order used for the Prolate Spheroidal Wave Function (PSWF)? Additionally, does this approximation affect the theoretical properties of the PSWF? The paper mentions that PSWF possesses infinite support in space, but it remains unclear whether the finite-order approximation may potentially diminish this property.
  2. I observed that a bandwidth parameter $c$ governs the frequency. Could you provide guidance on how to select this parameter effectively? Does it influence model performance, and if so, what strategies should users follow to determine an optimal value, especially for tasks in continuous domains like Neural Radiance Fields (NeRF), where there might be no frequency bounds, unlike the image regression scenario?

Comment

First and foremost, the authors would like to thank the reviewer for their thoughtful and insightful questions, which have provided us with an excellent opportunity to clarify and strengthen our work.

W1:

We sincerely thank the reviewer for this insightful question. We acknowledge that the numerical implementation is not detailed in the main text and will incorporate it in the revised version. We would also like to gently point out that the Appendix gives a more comprehensive description of the approximation method we used. Specifically, we use a cubic spline approximation, which provides a third-order approximation between two successive points while maintaining continuity and differentiability. A simple regression would often fail to pass exactly through the data points and to preserve differentiability between them, and differentiability is needed during backpropagation. The cubic spline approximation, which is third-order and computationally efficient, is therefore an optimal choice for the intended task.
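
As a rough illustration of the interpolation property described above, the sketch below fits a natural cubic spline to a handful of samples and evaluates both the spline and its first derivative. It is not the authors' implementation: the samples come from `math.sin` as a stand-in for a discretized PSWF, the natural boundary condition is an assumption, and the names `natural_cubic_spline` and `evaluate` are hypothetical.

```python
import bisect
import math


def natural_cubic_spline(xs, ys):
    """Return per-interval coefficients (a, b, c, d) of a natural cubic spline."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[0] and c[n] stay zero (natural boundary condition).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return list(ys), b, c, d


def evaluate(xs, coeffs, x):
    """Evaluate the spline and its first derivative at x."""
    a, b, c, d = coeffs
    # Locate the interval containing x, clamping to the first/last interval.
    j = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = x - xs[j]
    value = a[j] + b[j] * t + c[j] * t**2 + d[j] * t**3
    slope = b[j] + 2 * c[j] * t + 3 * d[j] * t**2
    return value, slope


# Hypothetical samples: sin stands in for a discretized PSWF.
xs = [0.5 * i for i in range(13)]
ys = [math.sin(x) for x in xs]
coeffs = natural_cubic_spline(xs, ys)
value, slope = evaluate(xs, coeffs, 1.25)
```

The spline reproduces every sample exactly and its slope is continuous across the knots, which is the property the response relies on for backpropagation.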

W2: We thank the reviewer for this question. The following table summarizes the occupancy field results for different INRs.

Table: IoU for Occupancy Fields

| Model | Asian Dragon | Armadillo | Happy Buddha | Lucy |
|---|---|---|---|---|
| Siren | 0.95473 | 0.97685 | 0.98155 | 0.96503 |
| Wire | 0.93780 | 0.95674 | 0.95618 | 0.99797 |
| Gauss | 0.99620 | 0.98919 | 0.99594 | 0.99060 |
| ReLU+PE | 0.98362 | 0.99274 | 0.99824 | 0.98211 |
| PIN | 0.99837 | 0.99824 | 0.99895 | 0.99917 |
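
For reference, the IoU metric in the table above can be sketched as follows. This is only an illustration: the thread does not say how occupancy predictions are binarized, so the inputs here are assumed to be already-thresholded 0/1 grids flattened to lists, and the function name `iou` and the sample values are hypothetical.

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary occupancy grids."""
    if len(pred) != len(target):
        raise ValueError("grids must have the same number of cells")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty grids agree perfectly, so define their IoU as 1.
    return intersection / union if union else 1.0


# Hypothetical flattened grids: 2 cells overlap, 4 cells are occupied in either.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = iou(pred, target)
```

A value of 1.0 means the predicted and reference occupied regions coincide exactly, so the gaps between rows in the table correspond to small fractions of mismatched cells.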

The following table summarizes the NeRF results.

Table: NeRF for different objects

| Object | PIN | WIRE | SIREN | GAUSS |
|---|---|---|---|---|
| Chair | 33.590 | 30.328 | 31.610 | 33.233 |
| Hotdog | 36.340 | 33.452 | 31.261 | 36.224 |
| Mic | 33.150 | 29.060 | 31.725 | 32.761 |
| Ship | 28.822 | 25.902 | 27.252 | 28.767 |
| Materials | 29.654 | 26.184 | 28.028 | 29.407 |
| Ficus | 27.235 | 22.592 | 24.556 | 26.940 |

W3: We thank the reviewer for the suggestion. We will improve the presentation of the paper by using vector graphics and the suggested commands in the revised manuscript.

Q1: We used a cubic spline approximator, which is of order 3. We do not believe this affects any theoretical properties of the PSWFs: after obtaining the discretized solution to the governing equation for the PSWF, we fit a cubic spline between every pair of successive points. This cubic spline approximation is necessary because an activation function in a neural network must be a continuous function rather than a set of discrete points. To the best of our knowledge, this does not diminish any of the theoretical properties. However, if a simpler approximator, such as a basic quadratic or cubic polynomial or even a neural network, were used to approximate the discretized solution, it could compromise some of the properties of PSWFs. These methods do not guarantee passing through the discretized points, which could lead to inaccuracies in differentiation and, consequently, incorrect outcomes during backpropagation.

Q2:

We are grateful to the reviewer for this question. As explained in Section 6 of the paper, we use an explicit control mechanism in which the frequency is governed by the parameter $\omega$. This makes $\omega$ the frequency-controlling variable instead of $c$. Regarding how to select $\omega$ effectively, one possible approach is to perform a grid search to identify the values that maximize the results. However, as you mentioned, this often leads to suboptimal outcomes when a different dataset is presented, particularly in NeRF and other applications where strong generalization is crucial. (Incidentally, this is the standard approach adopted by many INR baselines.) Instead of relying on a grid search, we use the most straightforward configuration, $\omega = 1$. We initialize all experiments with this value and make $\omega$ a learnable parameter, so it is adjusted dynamically based on the loss function of the intended task. This approach benefits from our spline approximation, which allows the activation function parameters to traverse the loss landscape more effectively, as explained in the Appendix of the paper.
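
The idea of initializing the frequency at 1 and letting the loss adjust it can be illustrated with a toy example. The sketch below is not the PIN activation: `math.sin` stands in for the PSWF, the single-parameter model, the target frequency of 1.5, the learning rate, and the name `loss_and_grad` are all assumptions made for illustration.

```python
import math


def loss_and_grad(omega, xs, targets):
    """Mean squared error of sin(omega * x) against targets, and d(loss)/d(omega)."""
    n = len(xs)
    loss = 0.0
    grad = 0.0
    for x, t in zip(xs, targets):
        residual = math.sin(omega * x) - t
        loss += residual ** 2 / n
        # Chain rule: d/d(omega) sin(omega * x) = x * cos(omega * x).
        grad += 2.0 * residual * x * math.cos(omega * x) / n
    return loss, grad


xs = [0.1 * i for i in range(1, 11)]
targets = [math.sin(1.5 * x) for x in xs]  # hypothetical signal with frequency 1.5

omega = 1.0  # initial value, as stated in the response
initial_loss, _ = loss_and_grad(omega, xs, targets)
for _ in range(200):
    _, grad = loss_and_grad(omega, xs, targets)
    omega -= 0.5 * grad  # plain gradient descent; the learning rate is an assumption
final_loss, _ = loss_and_grad(omega, xs, targets)
```

In this toy setting the frequency drifts from its initial value toward the one that fits the data, without any grid search; whether the same holds for a full network is an empirical question the sketch does not address.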

Comment

We sincerely believe we have addressed the reviewer's questions, including the additional results and detailed explanations for the approximation method. As the deadline for the discussion period approaches (November 26th), we wanted to gently follow up to see if you've had the opportunity to review our response. If there are any further questions or clarifications needed, please let us know. We would be more than happy to provide detailed answers to ensure all concerns are fully addressed.

Comment

Thank you for your response, and I sincerely apologize for the delayed reply. The authors have addressed all my questions, and I would like to maintain my original voting score.

Comment

We sincerely thank the reviewer for the thoughtful and insightful questions, which provided us with an excellent opportunity to clarify and further strengthen our work.

Review
Rating: 6

This paper introduces the use of prolate spheroidal wave functions (PSWF) for implicit neural representation (INR). By employing PSWF as the activation function in INR, the method excels not only in representing images and 3D shapes but also significantly outperforms existing approaches in various vision tasks that rely on INR generalization, including image inpainting, novel view synthesis, edge detection, and image denoising.

Strengths

  1. Extensive experiments across various vision tasks demonstrate the effectiveness of PSWF.

  2. A comprehensive theoretical analysis highlights the advantages of using PSWF.

Weaknesses

  1. When comparing different INR methods, do you ensure that the same parameters are used? Could you provide the specific parameters for each INR method?

  2. I am curious about the decoding complexity of the PSWF-based INR. Could you provide the decoding speed or time for the different INR methods?

  3. Besides the vision tasks mentioned in the paper, can PSWF also improve performance in image super-resolution?

Questions

I noticed that the authors use initial INR methods as their baselines. However, there are several approaches aimed at enhancing the expressivity and generalizability of INR, including improvements in training strategies [1] and input signals [2]. Could PSWF be applied to these methods to further enhance the representation performance of INR?

[1] Improved Implicit Neural Representation with Fourier Reparameterized Training, CVPR 2024.

[2] Disorder-Invariant Implicit Neural Representation, CVPR 2023.

Details of Ethics Concerns

There are no ethics concerns.

Comment

First and foremost, the authors would like to thank the reviewer for their thoughtful and insightful questions, which have provided us with an excellent opportunity to clarify and strengthen our work.

W1:

For all experiments, we ensured that the optimal parameters of the other methods were used. For the proposed method, in contrast, every experiment was conducted with the same parameters. As can be seen, the baselines require task-specific fine-tuning to obtain their results, whereas PIN does not.
The following table summarizes the activation function parameters used for each application.

Table: Configurations for WIRE, SIREN, GAUSS, and PIN Across Experiments

| Method | Configuration |
|---|---|
| WIRE | Image Representation, Inpainting, Edge Detection ($\omega=20$, $\sigma=10$); Occupancy Field ($\omega=20$, $\sigma=40$); Image Denoising ($\omega=5$, $\sigma=5$); NeRF ($\omega=40$, $\sigma=40$) |
| SIREN | $\omega=30$ for all experiments |
| GAUSS | $\sigma=30$ for all except NeRF ($\sigma=7.85$) |
| PIN | $T=1$, $\omega=1$, $b=0$ for all |

W2:

We thank the reviewer for this question. The following table summarizes the training speed of the different INRs. As can be seen from this table, PIN's runtime is comparable with that of previous INRs.

Table: Convergence Time Across Methods

| Method | Convergence Time (min) |
|---|---|
| PIN | 6.63 |
| WIRE | 11.59 |
| SIREN | 6.29 |
| GAUSS | 7.47 |
| ReLU+PE | 4.05 |

W3:

We thank the reviewer for raising this question. We attempted the image super-resolution task on the "Boy" image from the Set14 dataset [ref] and on the "Cameraman" image. The following table summarizes the results for the "Boy" and "Cameraman" images in the second and third columns, respectively.

Table: PSNR for Image Super Resolution

| Method | "Boy" PSNR (dB) | "Cameraman" PSNR (dB) |
|---|---|---|
| PIN | 20.97 | 23.73 |
| WIRE | 19.26 | 22.33 |
| SIREN | 19.58 | 23.17 |
| GAUSS | 20.24 | 22.70 |
| ReLU+PE | 18.75 | 21.67 |

[ref] Awesome-Super-Resolution/dataset.md at master · ChaofWang/Awesome-Super-Resolution. github.com. https://github.com/ChaofWang/Awesome-Super-Resolution/blob/master/dataset.md. [Accessed 20-11-2024]

Q1:

We thank the reviewer for the question and the suggestions. We acknowledge that PIN is compared with initial methods. Regarding [1], we believe there is potential to enhance its method using PSWFs. However, the core idea of [1] is to employ a fixed Fourier basis and decompose the neural network weight matrix into a product of trainable and fixed Fourier basis matrices. Given this, a key question arises: how can the Fourier basis and PSWFs be effectively combined? A possible approach is to modify [1] by incorporating Fourier basis elements of PSWFs and using the same weight update rule as in [1]. However, without experimental validation, it is difficult to state definitively whether this adaptation would further enhance [1]. Regarding [2], which approaches the problem from a different angle, namely the input, we firmly believe PSWFs can be incorporated into [2], as its proposed mechanism operates on the input. Detailed experiments are needed to verify this claim. To further demonstrate the effectiveness of the proposed method, we compared it with the recently released FINER, INCODE, and FR-INR on the entire Kodak image dataset. The following table summarizes the results. The suggested references, new methods, and additional results will be included in the revised version of the paper.

Table: Comparison of Methods Based on PSNR (dB)

| Method | PSNR (dB) |
|---|---|
| PIN | 40.17 |
| INCODE | 34.07 |
| FR-INR | 37.91 |
| FINER | 35.87 |

Comment
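
For readers less familiar with the metric behind these tables, the sketch below computes PSNR from a mean squared error. It assumes pixel values normalized to [0, 1] (peak value 1.0), which the thread does not state; the function name `psnr` and the sample values are illustrative only.

```python
import math


def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixels: every value is off by 0.1, so the MSE is about 0.01.
reference = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.1, 0.6, 0.9, 0.35]
score = psnr(reference, reconstruction)  # close to 20 dB
```

Because the scale is logarithmic, a gap of 3 dB between two methods corresponds to roughly a factor of two in mean squared error.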

Many thanks to the author for addressing my concerns. I have no further questions. I keep my current score unchanged.

Comment

We sincerely thank the reviewer for the thoughtful and insightful questions, which provided us with an excellent opportunity to clarify and further strengthen our work.

Review
Rating: 6

The paper proposes a novel Implicit Neural Representation (INR) utilizing Prolate Spheroidal Wave Functions (PSWFs) to improve performance and generalization in computer vision tasks. By leveraging the optimal space-frequency domain concentration of PSWFs, the proposed method addresses the noise artifacts over smoother areas and poor generalization of existing INRs, demonstrating superior results in image inpainting, novel view synthesis, edge detection, and image denoising.

Strengths

  1. The paper clearly explains the limitations of current INRs and proposes a novel activation function (PSWFs) for INR to overcome these limitations.
  2. The localization and expressivity properties of PIN have been theoretically proven.
  3. The results validate the effectiveness of the proposed INR across various tasks. The authors not only show the good representation ability of PIN for various signals, such as images, occupancy fields, and NeRF, but also show that PIN performs very well on image inpainting.

Weaknesses

  1. The activation function seems to be rather computationally heavy, making it necessary to report its speed in applications.
  2. Having reviewed the previous submission of this paper, I notice that Fig. 4 has been replaced with a scene with better results. How were these new results obtained, and why were they not included in the previous submission? Were they obtained by tuning parameters for each scene specifically?
  3. Since the first NeurIPS submission of this paper, several new INRs have been proposed, including FINER (and its extension, FINER++) and H-SIREN. However, these new INRs are not cited or compared within the current manuscript.

Questions

See Weaknesses.

Comment

First and foremost, the authors sincerely thank the reviewer for their thoughtful and insightful questions. Furthermore, we greatly appreciate your previous comments, which have been instrumental in refining our work and providing us with an excellent opportunity to further clarify and strengthen it.

W1:

The effective training times for the proposed method and the baselines are as follows. As can be seen from the table, PIN's runtime is comparable with that of previous INRs.

Table: Convergence Time Across Methods

| Method | Convergence Time (min) |
|---|---|
| PIN | 6.63 |
| WIRE | 11.59 |
| SIREN | 6.29 |
| GAUSS | 7.47 |
| ReLU+PE | 4.05 |

W2:

We thank the reviewer for their thoughtful question and careful review. We would like to clarify that the improved results were not obtained by fine-tuning our proposed method on specific datasets or scenes. Unlike existing baselines, our method uses the same configuration across all experiments, regardless of the data modality.

In the previous submission, we reported results for a 3D dataset on which our results did not differ significantly from those of other INRs. During the rebuttal for the previous submission, we noted that PIN performs well on all 3D datasets except the one used in that submission. Therefore, for this submission, we replaced the previously reported results with those from different 3D datasets to better showcase the performance gap of our method.

W3:

We thank the reviewer for the question and the suggestions. We acknowledge that we compared PIN with initial methods. However, to demonstrate the effectiveness of the proposed method, we compared it with the recently released FINER, INCODE, and FR-INR on the entire Kodak image dataset. The following table summarizes the results. Further, we will cite the suggested and latest methods in the revised version of the manuscript.

Table: Comparison of Methods Based on PSNR (dB)

| Method | PSNR (dB) |
|---|---|
| PIN | 40.17 |
| INCODE | 34.07 |
| FR-INR | 37.91 |
| FINER | 35.87 |

Comment

Could you please provide a comparison of network sizes for the methods in Tab. 2 of your response?

Comment

We thank the reviewer for this question. For a fair comparison, we used an MLP with 300 hidden neurons and 5 layers for all methods.

Comment

Thank you. All of my concerns have been addressed. I maintain my original score.

Comment

We sincerely thank the reviewer for the thoughtful and insightful questions, which provided us with an excellent opportunity to clarify and further strengthen our work.

Review
Rating: 6

The paper proposes Prolate Spheroidal Wave Function-based Implicit Neural Representations (PIN), an effective representation inspired by Prolate Spheroidal Wave Functions (PSWFs). The proposed PIN outperforms other INR baselines in various reconstruction tasks.

Strengths

  • The proposed PIN outperforms other INR baselines in various reconstruction tasks, including image reconstruction, image inpainting, and occupancy field and NeRF.
  • Detailed ablation studies are conducted.

Weaknesses

  • The metrics provided in the paper are only evaluated on a few individual examples. I appreciate the evaluation on the Kodak Lossless True Color Image Dataset. However, it only contains 24 images. It is ideal to perform larger-scale evaluations, e.g. calculating the mean PSNR for a thousand images. For example, one may consider using the DIV2K dataset [1].

[1] Agustsson E, Timofte R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017: 126-135.

Questions

  • I am interested in the runtime of the proposed PIN. Will the new formulation hurt the speed of INRs? Is there a trade-off between quality and runtime?
  • How is the experiment conducted for the Image Inpainting task? Are the missing pixels marked as black and fed into the INR? Or are those coordinates masked out and not used during training? Are the pixel masks provided to the INR?
Comment

First and foremost, the authors would like to thank the reviewer for their thoughtful and insightful questions, which have provided us with an excellent opportunity to clarify and strengthen our work.

W1: We sincerely thank the reviewer for emphasizing the importance of larger-scale evaluations and for recognizing our use of the Kodak Lossless True Color Image Dataset. Evaluating INRs requires retraining the model for each image, which makes large-scale assessments computationally demanding. While many major baselines in INR research evaluate their methods on only two or three examples, we are, to the best of our knowledge, the first to conduct a comprehensive evaluation across the entire Kodak dataset. In addition, we extended our analysis to the DIV2K dataset as suggested, albeit with some limitations. Due to time and resource constraints, we evaluated PIN and other state-of-the-art (SOTA) methods on a randomly selected subset of 30 images from DIV2K. The average PSNR values for this subset are presented in Table 1 below; PIN outperforms the other methods on this subset as well. Given that these images were selected randomly, we believe this performance pattern is representative of PIN's general superiority and would likely extend to the entire DIV2K dataset. Combining these results, we report the overall average PSNR for 54 images (24 from Kodak and 30 from DIV2K) in Table 2. We deeply value the reviewer's suggestion regarding larger-scale evaluations and are actively considering this for future work. Specifically, leveraging meta-learning or other training efficiency mechanisms could make such evaluations more feasible. Thank you again for highlighting this aspect and for your constructive feedback. We will incorporate these results into the revised manuscript.

Table 1: PSNR Variation across DIV2K dataset

| Method | PSNR (dB) |
|---|---|
| PIN | 41.46 |
| WIRE | 30.12 |
| SIREN | 38.77 |
| GAUSS | 28.13 |
| ReLU+PE | 26.74 |

Table 2: PSNR Variation Across DIV2K and Kodak Datasets

| Method | PSNR (dB), average over DIV2K + Kodak |
|---|---|
| PIN | 40.88 |
| WIRE | 31.61 |
| SIREN | 38.05 |
| GAUSS | 27.19 |
| ReLU+PE | 27.48 |
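
The combined figure in Table 2 can be sanity-checked for PIN, since this thread reports its Kodak average (40.17 dB, in the response to another reviewer) and its DIV2K-subset average (41.46 dB). The sketch assumes the combined value is an image-count-weighted mean of per-image PSNRs; Kodak averages for the other methods in this table are not given in the thread, so only PIN is checked. The name `pooled_mean` is illustrative.

```python
def pooled_mean(groups):
    """Weighted mean of per-group averages; groups is a list of (count, mean) pairs."""
    total = sum(count for count, _ in groups)
    return sum(count * mean for count, mean in groups) / total


# (number of images, average PSNR in dB) for PIN, taken from the tables in this thread.
pin_groups = [(24, 40.17), (30, 41.46)]
pin_combined = pooled_mean(pin_groups)  # about 40.887
```

Under this assumption the result lies within 0.01 dB of the 40.88 reported for PIN in Table 2.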

Q1:

We thank the reviewer for the question. We measured the runtime to convergence for each method; the following table reports the results. As can be seen from the table, PIN's runtime is comparable with that of previous INRs.

Table: Convergence Time Across Methods

| Method | Convergence Time (min) |
|---|---|
| PIN | 6.63 |
| WIRE | 11.59 |
| SIREN | 6.29 |
| GAUSS | 7.47 |
| ReLU+PE | 4.05 |

Q2:

We thank the reviewer for this question. INRs operate on the coordinates of the signal provided to them. For the inpainting task, the coordinates corresponding to the regions to be inpainted are masked out during training, and at test time all coordinates of the image are provided. This effectively assesses the method's ability to generalize to unseen coordinates.
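
The train/test protocol described above can be sketched as a coordinate split. This is only an illustration: the thread does not specify the mask shape, so a uniformly random pixel mask is assumed, and the function name `split_coordinates`, the image size, and the missing fraction are hypothetical.

```python
import random


def split_coordinates(height, width, missing_fraction, seed=0):
    """Return (training coordinates, all coordinates, masked-out coordinates)."""
    rng = random.Random(seed)
    all_coords = [(row, col) for row in range(height) for col in range(width)]
    n_missing = int(round(missing_fraction * len(all_coords)))
    # Assumed mask: pixels dropped uniformly at random.
    missing = set(rng.sample(all_coords, n_missing))
    # The INR is fitted only on coordinates whose pixel values are observed.
    train_coords = [rc for rc in all_coords if rc not in missing]
    return train_coords, all_coords, missing


# Hypothetical 4x4 image with a quarter of the pixels masked out.
train_coords, test_coords, missing = split_coordinates(4, 4, 0.25)
```

Training uses `train_coords` with their pixel values, and evaluation queries the fitted network on `test_coords`, so the masked positions are never seen during fitting.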

Comment

We sincerely believe we have addressed the reviewer's questions, including a thorough evaluation on the DIV2K dataset and detailed explanations for the inpainting task. As the deadline for the discussion period approaches (November 26th), we wanted to gently follow up to see if you've had the opportunity to review our response. If there are any further questions or clarifications needed, please let us know. We would be more than happy to provide detailed answers to ensure all concerns are fully addressed.

Comment

Dear Reviewer 8yGC,

We hope this message finds you well. As the discussion period is set to conclude tomorrow, and noting that the other reviewers have already responded, we wanted to kindly follow up to check if you have had the chance to review our response. If there are any additional questions or areas requiring clarification, please let us know. We would be happy to provide detailed answers to ensure all your concerns are fully addressed.

Thank you for your time and consideration.

Comment

Dear authors,

Thanks for the detailed response and the additional evaluations. My concerns are well resolved and I am happy to raise the score.

Comment

We sincerely thank the reviewer for the thoughtful and insightful questions, which provided us with an excellent opportunity to clarify and further strengthen our work.

Comment

Dear Reviewers,

Thanks for your contributions in reviewing this paper.

As the author-reviewer discussion deadline is approaching, please could you take a look at the authors' rebuttal (if not yet) and see if it addressed your concerns or if you have any further questions. Please feel free to start a discussion.

Thanks,

AC

AC Meta-Review

In this paper, the authors presented a new activation function for implicit neural representations (INRs) -- Prolate Spheroidal Wave function (PSWF). Motivated by the challenges faced by existing INRs and their struggle to generalise to unseen coordinates, the authors introduced the PSWF-based INRs, termed PIN, leveraging the optimal space-frequency domain concentration of PSWFs. Experimental evaluations over a few different vision tasks (including image inpainting, novel view synthesis, edge detection, and image denoising) show the effectiveness of the proposed PIN. The strengths of this paper include:

  • The proposed method was well-motivated, with a clear and solid foundation for the approach.
  • The paper did a good job in analysing and explaining the limitations of existing INRs, which could provide insights for following research in this direction.
  • The proposed method was backed up with a comprehensive theoretical analysis.
  • An extensive experimental analysis covering several vision tasks, showing the effectiveness and validity of the proposed method.

The weaknesses of this paper include:

  • The rationale for employing INRs to address low-level problems (as raised by a reviewer during 2nd phase discussion, details see below). Forward training-based models typically offer better generalisation capabilities.
  • Potential issues with the infinitely differentiable property of cubic spline that was used for the PSWF in this paper (also confirmed in the 2nd phase).
  • Insufficient evaluations, computational complexity concern, issues with the results, and missing comparison to recent related works.

Most of the concerns/weaknesses were well addressed in the rebuttal phase, and the reviewers acknowledged that. Overall, this paper presents an interesting idea with insights into INRs, and the AC thinks it will be of interest to part of the ICLR audience. As a result, the AC is happy to recommend an Accept, but the authors are strongly encouraged to incorporate the evidence and clarifications provided during the discussions into the final version, and to merge the Appendix (currently in a separate file), which has some essential analysis and results, into the end of the main paper.

Additional Comments on Reviewer Discussion

This paper received review comments from four expert reviewers. During the rebuttal period, there was a heated discussion between the authors and reviewers. With the additional results and evidence provided by the authors, most of the concerns raised by the reviewers were well addressed, and two reviewers raised their ratings, ending up with four borderline Accepts. In the AC-reviewers discussion phase, reviewer BT7N further summarised the strengths and weaknesses of this paper, including the concerns about the rationale for applying INRs to low-level problems and the differentiable property of the cubic spline. The AC agreed with them but did not find them to be major issues. However, the authors are encouraged to carefully consider these points and add discussions in their final version.

Although this paper finally received a borderline rating, after carefully checking the paper and the discussions, the AC found that this paper could provide insights to the community and that part of the ICLR audience can benefit from reading it. It is therefore worth presenting at ICLR. These considerations led to the final decision on this paper.

Final Decision

Accept (Poster)