Geometric Algebra Planes: Convex Implicit Neural Volumes
Abstract
Reviews and Discussion
This paper provides a generic strategy for grid-based representation of volumetric features, for use in applications like NeRF where the task involves computing a spatial quantity in 3D. The basic idea, inspired by geometric algebra, is to include features that are constant along planes/lines in addition to a feature per grid cell. This approach in some sense generalizes past approaches, as articulated in the appendix.
Although the title/discussion claims a link to geometric algebra, it seems the connection here is somewhat superficial. As far as I can tell, the paper does not make use of any algebra from geometric algebra (e.g., assorted products and geometric operations) and could be described more simply without this language. It seems like the basic idea here is simply to drop indices from a grid-based volume representation --- e.g., rather than having all features be f_ijk we can have some features with fewer indices f_ij/f_jk/f_ik (“plane feature grids” in the parlance of the paper) and f_i/f_j/f_k (“line feature grids”). Invoking geometric algebra adds complexity to the text without benefit.
I appreciated the theoretical analysis in the paper, but the results do not seem reflective of the typical use cases. In particular the theorems require piecewise-constant interpolation between grid elements as well as a convex objective, neither of which are common practice and are quite restrictive assumptions (indeed many of the experiments in section 5 are unable to use convex objective functions).
I appreciate the ‘framework’ for understanding assorted approaches to parameterizing volumes of features, and appendix A.1 helped me translate the discussion in the paper to something more concrete. Given the zoo of prior models articulated in past work (see page 15), however, it seems the current paper is a variation on a fairly common theme in the literature.
The experiments here begin to validate that the proposed strategy is relatively effective, although they are far from comprehensive (just a few examples drawn from standard datasets in computer vision).
For the reasons above, I am unable to suggest publication of the work at this time. To improve the paper, I might suggest more deeply applying geometric algebra or removing it altogether in favor of simpler language, ensuring that the model and assorted objective functions can reasonably be implemented/reproduced from the text of the paper alone (or including code), and testing the proposed representations more thoroughly.
The paper claims to be the “first class of implicit neural volume representations that can be trained by convex optimization.” I might argue that the approach outlined here is far from a typical “neural” representation, to the point that one could equally argue the same point for a grid or kernel method (if I understand sec 3.1 properly, this method relies on a grid rather than a generic neural parameterization).
Page 2: “At the same time, any GA-Planes model can be trained by nonconvex optimization towards any objective.” --- This claim seems empty relative to any other alternative in the literature. I would suggest removing this sentence altogether.
The introduction to geometric algebra and its notation on page 3 is rather terse. I doubt this is a tool that a typical ICLR reader is likely to know, even if they’re working on 3D problems. I would suggest making a more intuitive (but still compact) introduction to the relevant notions in the paper.
The notation at the beginning of section 3.1 was hard for me to conceptualize. Perhaps an illustration would help. Has the term “feature grid” been defined anywhere? Concretely, how are the variables g_{i(jk)} stored/represented? What is a “basis element” in this context? Alternatively, it might be useful to simply give an equation that maps a 3D coordinate to the value of its feature as predicted by the proposed architecture.
In eq(4), what is the order of operations relating * to \odot?
“Full implementation details are available in our code, which will be released upon publication” --- in this case, I can’t actually see the implementation details during the review process.
Should I understand (3) as having some features per x/y/z coordinate independently, some features per xz/yz/xy pair, and some features per xyz grid point? So it’s just a grid with some features repeated along different axes?
Theorem 1 and 2 rely on nearest-neighbor interpolation, which seems to contradict how people would typically implement such models. What goes wrong in the general case? Do the experiments show examples with this interpolant?
I appreciate the theoretical analysis in Theorems 1-3. The statements of these theorems are quite long and can definitely be streamlined (for example, the statement of Theorem 2 takes half a page alone). But, I appreciated that section 4.2 gave some intuitive discussion of the impact/interpretation of these results in terms of upsampling.
In Figure 1, why not use the GA-Planes model instead of something “similar to the structure” of GA-Planes?
The NeRF experiments here seem to involve only a single ‘lego’ scene. This seems below the standard of experimentation in the field. I would suggest experimenting on several examples to make sure we can be confident in the reported results.
In general, I would have a hard time translating the high-level approaches described in this paper into equations I could implement. For example, the experiments described in section 5 lack unambiguous descriptions of the architectures and loss functions (the sections contain only text, without equations, architecture parameters, and so on), and I would struggle to articulate the difference between the convex, semiconvex, and nonconvex cases mentioned in Tables 1-2.
Strengths
See above.
Weaknesses
See above.
Questions
See above.
Details of Ethics Concerns
None
Thanks for the thoughtful review and helpful comments. We have revised our paper with extensions to our theoretical and experimental results; changes are summarized in the joint comment to all reviewers. We respond to reviewer-specific comments and questions below.
We have edited the discussion of geometric algebra in the revised paper, to clarify our use of the algebra in geometric algebra, and to provide a more intuitive introduction, since we agree many readers may be unfamiliar with geometric algebra (GA). Thanks for the suggestion. In particular, we restrict our (nonconvex) GA-Planes model to use only products of features that are valid in GA. Therefore, our modeling choice relies heavily on the geometric product in GA. In three-dimensional GA, the geometric product is used to construct higher-grade objects, such as bivectors and trivectors. As an example, consider the canonical basis vectors e_1, e_2, e_3. The geometric product e_1 e_2 is a bivector, i.e., an area form, representing the plane spanned by e_1 and e_2. The geometric product of e_1 e_2 and e_3 is the trivector e_1 e_2 e_3 representing the volume form. On the other hand, the geometric product of (e_1 e_2) and (e_2 e_3) is e_1 e_3, since e_2 e_2 = 1, due to the fact that every vector squared equals its Euclidean norm squared.
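For reference, the identities invoked above can be summarized compactly (a standard statement for an orthonormal basis, using the same e_1, e_2, e_3 notation):

```latex
\begin{aligned}
& e_i e_i = 1, \qquad e_i e_j = -e_j e_i \quad (i \neq j), \\
& e_1 e_2 \quad \text{(bivector: the plane spanned by } e_1 \text{ and } e_2\text{)}, \\
& (e_1 e_2)\, e_3 = e_1 e_2 e_3 \quad \text{(trivector: the volume form)}, \\
& (e_1 e_2)(e_2 e_3) = e_1 (e_2 e_2)\, e_3 = e_1 e_3 \quad \text{(a bivector, not a volume)}.
\end{aligned}
```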
In our model we allow all the features that obey GA; e.g., we allow line × plane features but not plane × plane, because in GA, line × plane = volume, but plane × plane is not a valid construction. For comparison, the K-Planes method uses features like plane × plane × plane, not obeying GA. The TensoRF model does not use all of the GA-based features, but only a subset of them. Specifically, TensoRF uses either line × line × line or line × plane (the main method), but not both at once, and it also lacks the volume grid feature.
Although the theorem statements in the initial submission specify use of piecewise constant (nearest neighbor) interpolation, this is actually not required for the theorems and has been removed in the revision. We have also streamlined the presentation of the theorems, and included a more thorough discussion of interpolation in the revised appendix A.2 (before the proofs). In the theorem statements, the outer product of the low-rank components U and V actually has the same resolution as the target matrix M, so no interpolation is needed for these parameters. We have also experimentally verified that, in the context of 2D image fitting (matching our theorems), our models perform very similarly whether they use nearest neighbor (right) or, more realistically, linear/bilinear interpolation (left) (the figure is in revised appendix A.3 as well as here: https://imgur.com/a/523NGVa). In our 3D experiments, we use linear/bilinear/trilinear interpolation of features.
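As a minimal illustration of this 2D setting (a sketch under our own naming and resolution choices, not the released code), a full-resolution low-rank term UV^T plus a nearest-neighbor-upsampled low-resolution grid can be fit by gradient descent on the Frobenius error:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 64, 4, 8                      # target resolution, rank, low-res grid size
M = rng.standard_normal((n, n))         # stand-in target matrix

U = 0.01 * rng.standard_normal((n, r))  # low-rank factors at full resolution,
V = 0.01 * rng.standard_normal((n, r))  # so no interpolation is needed for U, V
L = np.zeros((k, k))                    # low-resolution grid feature
idx = np.arange(n) * k // n             # nearest-neighbor upsampling indices

def predict():
    return U @ V.T + L[np.ix_(idx, idx)]

lr = 5e-3
for _ in range(3000):
    E = predict() - M                   # residual of 0.5 * ||UV^T + up(L) - M||_F^2
    gU, gV = E @ V, E.T @ U             # gradients w.r.t. the low-rank factors
    gL = np.zeros((k, k))
    np.add.at(gL, (idx[:, None], idx[None, :]), E)  # scatter residual into low-res cells
    U, V, L = U - lr * gU, V - lr * gV, L - lr * gL
```

Roughly, swapping the `idx`-based lookup for bilinear interpolation of L is the only change needed for the linear-interpolation variant.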
In our original submission, we included theorems using linear decoders in the main text, and analogous results with convex and nonconvex MLP decoders in the appendix (due to space limitations, since the linear versions are a bit simpler). However, we agree with the reviewer that the results with MLP decoders are closer to practice and thus more interesting, and we have accordingly moved them to the main paper in the revision. Table 1 in the revised paper summarizes key takeaways from these theoretical results, comparing the maximum rank of a 2D model depending on how it combines features (addition, concatenation, or multiplication) and how it decodes features (linear, convex MLP, or nonconvex MLP). These results are then validated by the 2D image fitting experiments in the revised paper (and the link https://imgur.com/a/523NGVa). We believe this rearrangement of the MLP decoder theory and accompanying experimental validation greatly improves our paper; thanks for the suggestion.
The review states that “many of the experiments in section 5 are unable to use convex objective functions.” Although it is true that our radiance field experiments are inherently nonconvex (due to the nonlinear forward model), all other experiments in section 5 use convex objectives (i.e., 2 out of 3 of our experimental settings are based on convex optimization; we include the nonconvex radiance field experiment to show the versatility of GA-Planes).
It is true that a zoo of similar models has been proposed; indeed, part of the contribution of our work is a framework that unifies many of these models. However, we highlight that none of the other proposed models come with any theoretical connections to matrix completion, geometric algebra, or convex optimization. This theoretical component is a key aspect of our contribution (in addition to matching or exceeding the empirical performance of these other models).
Our revised paper includes experiments in the radiance field setting on all 8 scenes from the NeRF dataset, rather than only the Lego scene (average results are in the revised main paper and here https://imgur.com/a/O4NQrGY, and per-scene results and renderings are in the revised appendix and at the links https://imgur.com/a/psnr-ssim-lpips-pareto-optimal-curves-of-compared-models-bsXUXIL, https://imgur.com/a/MQTua7H). Results are similar when we consider all scenes compared to our initial results on only the Lego scene, suggesting that performance is not overfit to the specific scene we originally tested. In total, our paper includes experiments on 3 distinct tasks plus a toy 2D setting to validate the theory, as well as a theoretical contribution; we emphasize that our contribution is neither purely empirical nor purely relevant to radiance fields. Nonetheless, we do believe that extending our radiance field experiments to additional scenes strengthens our revised paper, and we appreciate the valuable suggestion.
The review states “if I understand sec 3.1 properly, this method relies on a grid rather than a generic neural parameterization”. This understanding appears to miss a critical component of our model, which is the decoder, typically an MLP (fully-connected neural network). Indeed all of our experiments use an MLP decoder, either convex or nonconvex (standard). Nearly all common neural parameterizations are structured as some representation of position (grid, hash table, Fourier embedding, etc.) followed by an MLP decoder, and our GA-planes family follows the same structure. All the models in the "zoo" in the appendix are in this same framework. We have added a new figure (figure 1 in the revised manuscript, also available at https://imgur.com/a/FVSIhW4) to clarify our GA-Planes architectures for convex, semiconvex, and nonconvex models, all of which include an MLP decoder component.
With that sentence, we intended to specify that GA-Planes enables high-performing convex formulations, but is not constrained to use convex optimization (whereas most of the prior models are constrained to only use nonconvex optimization). We have nonetheless removed this sentence in the revision.
Our revised paper includes a new figure (figure 1 in the revision, also available at https://imgur.com/a/FVSIhW4) to illustrate the nonconvex, semiconvex, and convex GA-Planes models we use in our experiments, including our use of multiresolution grids (“basis elements” refer to geometric algebra basis elements, which are represented as grids/tensors in the model). We have also edited the equations and the order of descriptions in section 3 for greater clarity. We have also added a table in the appendix that details the specific model configurations we used in our experiments, since there are indeed interesting design choices in terms of allocating limited parameters among the different feature grids. Thanks for the suggestions.
Multiplication (\odot) happens before concatenation (*). We have added parentheses in the revision to make this clear.
The reviewer’s understanding is partially correct, but not entirely. There is no repetition of features along dimensions, since that would waste valuable memory. Rather, various subsets of the same 3D coordinates are used to interpolate features in different feature grids.
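To make this concrete, here is a minimal sketch of the indexing pattern (hypothetical names and resolutions, with integer lookup standing in for the linear interpolation we actually use):

```python
import numpy as np

d = 16                                     # feature dimension
line_x   = np.random.randn(128, d)         # 1D grid indexed by x only
plane_xy = np.random.randn(128, 128, d)    # 2D grid indexed by (x, y)
vol_xyz  = np.random.randn(32, 32, 32, d)  # low-resolution 3D grid indexed by (x, y, z)

def features(x, y, z):
    # One 3D coordinate in [0, 1)^3; each grid is indexed by its own subset of
    # the coordinates. Each grid has its own parameters: nothing is repeated.
    f_line  = line_x[int(x * 128)]
    f_plane = plane_xy[int(x * 128), int(y * 128)]
    f_vol   = vol_xyz[int(x * 32), int(y * 32), int(z * 32)]
    return np.concatenate([f_line, f_plane, f_vol])  # then decoded by the MLP
```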
Figure 1 is intended to complement the theorems, which are in the 2D setting rather than the 3D setting of GA-Planes. In these 2D experiments we can solve for the optimal GA-Planes-style 2D representation using the singular value decomposition, which is not possible in 3D. The purpose of this figure is to validate the concept of a low-rank plus low-resolution representation, by showing that it can outperform the classic low-rank plus sparse representation popular in matrix factorization. However, in the revised paper (and linked above in our response regarding piecewise constant interpolation) we have added 2D image fitting experiments that exactly match the optimization setting of GA-Planes and the 2D theorems.
The experiments in section 5 use exactly the three model versions described in section 3; we have added a “method” figure to make these models more intuitive (https://imgur.com/a/FVSIhW4). We have also added the feature combination methods used (addition or concatenation) to the tables to clarify the models further. The nonconvex MLP decoder version is standard, and the semiconvex and convex versions are described by the equations in sections 3.2 and 3.3, respectively. The radiance field experiments use the standard NeRF objective function, while the other experiments use direct supervision to minimize mean squared error in 3D. A table of model hyperparameters has been added to the appendix of the revised paper.
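As a rough illustration of what a convex MLP decoder looks like (a gated-ReLU-style sketch with gates frozen at initialization; the names, shapes, and exact gating scheme are our simplifying assumptions, not the paper's precise architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 48, 64                             # input feature dim, number of gated units
G = rng.standard_normal((d, m))           # gating hyperplanes, frozen at initialization
W = np.zeros((d, m))                      # trainable decoder weights

def convex_decoder(F):
    # F: (B, d) combined grid features. With G fixed, the output is linear in W,
    # so a squared-error objective remains convex in the decoder weights.
    masks = (F @ G > 0).astype(F.dtype)   # fixed gating patterns
    return ((F @ W) * masks).sum(axis=1)  # sum of gated linear units
```

A standard nonconvex MLP would instead apply ReLU to trainable pre-activations; freezing the gates is what removes the nonconvexity from the decoder.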
We hope our responses above and our revised paper address your comments; please let us know if you have any further questions. Thanks so much for your time and input!
We have also added our code as a zip file under the supplementary materials.
Thank you to the authors for the comprehensive response and for making revisions to the paper. I have raised my score a point to reflect the changes they made in the clarity of the work and in particular the more comprehensive experiments.
That said, I remain unconvinced about the need to invoke complex mathematical language here for a simple concept (dropping indices in a tensor input, which appears in past work). The connection to geometric algebra seems weak at best---even in the author response---and while I am confident I understand the paper's technical content, I would consider this to be an example of "unnecessary mathiness" relative to the contribution. There's no geometric algebra here. This is particularly the case because the experiments don't demonstrate that the proposed "placement" of features motivated by geometric algebra are somehow the 'magic sauce' that makes these models work dramatically better. The connections to low rank factorization are interesting properties of the proposed model, but I'm not sure how to use this theoretical insight to practical benefit.
Thanks so much for taking a look at our reply and paper revisions. It seems the reviewer’s two remaining concerns are regarding “unnecessary mathiness” with respect to geometric algebra, and uncertainty over the usefulness of the theoretical results. We are happy to address these two points.
Geometric algebra: We agree that our paper does not make heavy mathematical use of geometric algebra, but rather relies on it for inspiration behind our choice of which grid features to include in our model, and how to combine them. We believe that this usage of geometric algebra as inspiration (rather than intensive mathematical usage) is presented accurately in the revised paper, but we are open to further edits if there are specific phrases the reviewer thinks could be more clear.
However, we respectfully disagree with the claim that “the experiments don't demonstrate that the proposed "placement" of features motivated by geometric algebra are somehow the 'magic sauce' that makes these models work dramatically better.” Rather, in our radiance field experiments, the only difference between GA-Planes and many of the other methods (K-Planes without proposal sampling, TensoRF, and the three ablations of GA-Planes) is in the inclusion of all the GA-motivated feature grids (rather than only a subset of them), and the choice to combine them in groups to produce a trivector (i.e. respecting the algebra of geometric algebra, which e.g. K-Planes does not). Since GA-Planes performs better than all these otherwise similar models, the experiment does support the hypothesis that even this inspiration-level usage of geometric algebra is the “magic sauce” that makes GA-Planes work best.
Practical usefulness of the theoretical results: Thanks for asking about how to use the theoretical connections to low-rank matrix factorization for practical benefit. We believe this aspect of our contribution is substantially stronger in our revised paper. In particular, we provide two new figures/tables that shed light on how our theory can be used to inform practical aspects of model selection. In Table 1 in the revision (https://imgur.com/a/T2S7fVR), we summarize our 2D theoretical results in terms of the maximum attainable rank of a representation, as a function of the operation used to combine features and the type of decoder. These upper bounds on model expressivity inform practitioners that (1) If the decoder is linear (or more generally, simple), features should be combined by multiplication rather than addition or concatenation, and (2) If an MLP decoder is used, the rank of the representation will be limited by the resolution of the grid features rather than by their feature dimension. Indeed this second theoretical insight is practically useful in finding the set of parameter allocations that work best in our radiance field experiments, in which we prioritize maintaining some features with high resolution even at the cost of reducing the feature dimension (these model hyperparameters are in Table 6 in the revised appendix).
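A quick numerical check of the first point, comparing a linear decoder w applied to added versus multiplied line features (illustrative numpy, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 32, 8
U, V = rng.standard_normal((n, r)), rng.standard_normal((n, r))  # line features
w = rng.standard_normal(r)                                       # linear decoder

M_add = (U @ w)[:, None] + (V @ w)[None, :]  # decode(u_i + v_j): rank at most 2
M_mul = (U * w) @ V.T                        # decode(u_i * v_j): rank up to r
print(np.linalg.matrix_rank(M_add), np.linalg.matrix_rank(M_mul))  # -> 2 8
```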
These upper bounds in Table 1 are complemented by 2D image fitting experiments in Figure 5 (in the revised appendix, and at https://imgur.com/uSInNRK), which directly inform practice in 3 ways beyond the insights from Table 1: (1) combining features by multiplication rather than addition is beneficial even when using an MLP decoder, (2) standard nonconvex MLP decoder outperforms convex MLP decoder, even though both representations have the same maximum rank, and (3) continuous interpolation of features (left figure) outperforms nearest neighbor interpolation of features (right figure).
Please let us know if we have addressed your concerns or if you have any further questions. Thank you again for your thoughtful engagement with our work.
Dear reviewer, the time window for reviewers to ask questions is closing tonight. Please let us know if we have addressed your concerns or if you have any further questions. Thank you for your thoughtful engagement with our work.
This work introduces a family of volumetric representations combining ideas from geometric algebra and convex optimization. First, features are learned on a predefined tensor basis represented by line, plane and volume elements. Using (bi/tri)linear interpolation and different elementary operations (concatenation, addition, multiplication), these are combined into multivectors. Second, these are then processed by a decoder MLP. This construction generalizes many previous volumetric representations, both classical and neural. Under additional restrictions on the multivectors, the decoder, and the objective, the entire optimization can be made convex or semi-convex.
After the theoretical discussion, different realizations of these GA-Planes models are tested empirically on three different tasks - radiance fields, 3D segmentation, and video segmentation. The results indicate good performance, especially for smaller models.
Strengths
- The proposed representation family helps put many previous and potential future methods in a unifying framework. This is also done in a mathematically precise manner. I believe such efforts are important in the fast-paced ML research.
- Similarly, the discussed trade-off between model size, expressivity, and optimizability makes for a good and easy to follow story. The paper is well-written overall.
- The empirical results suggest good potential of the method.
Weaknesses
- While different types of experiments are considered, each contains only a single problem instance (lego scene and skateboarding video). This makes it hard to draw strong conclusions about the empirical performance.
- While the method generalizes neatly and offers a lot of flexibility, it is not entirely clear how to make these design choices. There is a good discussion on the elementary operations and also on lines vs. planes vs. volumes, but the multi-resolution aspect of each seems to be overlooked, simply stating "we use multi-resolution copies [..]".
- Limitations are not discussed.
- Even though there is great value in considering the representation of a single signal, the authors overlook the opportunity to also discuss the generative setting, which would reach a broader audience.
Questions
- Could you please explain how you chose the different resolutions as mentioned in W2?
- Similarly, in the last sentence of section 5.1 you make it sound like the model can allocate parameters to different dimensional features adaptively, i.e. automatically. However, my understanding is that you need to predefine all the feature grid sizes. Could you please clarify this and potentially adjust the wording?
- To compare the (non-/semi-) convex formulations, you report the final performance metrics, and in addition you discuss the convergence guarantees. Are there other relevant practical aspects, in particular the number of training iterations or time?
- Can you comment on using GA-Planes in the generative setting?
Suggestions
- Please include a discussion of the limitations.
- The presentation could potentially benefit from some schematic representations of the method and especially GA concepts.
- During the submission process, please use the appropriate template which includes the line numbers. This is helpful for the review process.
I thank the authors for their considerate reply and for addressing my concerns and suggestions. I believe the extended experiments and the improved presentation further strengthen the paper.
I would additionally suggest the authors reconsider the color scheme of the plots. This spectrum of blue-purple colors is problematic with certain color blindness and grayscale printing. Please consult publishing practices for formatting figures.
Thanks for the thoughtful review and helpful comments. We have revised our paper with extensions to our theoretical and experimental results; changes are summarized in the joint comment to all reviewers. We respond to reviewer-specific comments and questions below.
Our revised paper includes experiments in the radiance field setting on all 8 scenes from the NeRF dataset, rather than only the Lego scene (average results are in the revised main paper and at https://imgur.com/a/O4NQrGY , and per-scene results and renderings are in the revised appendix as well as at https://imgur.com/a/psnr-ssim-lpips-pareto-optimal-curves-of-compared-models-bsXUXIL, https://imgur.com/a/MQTua7H). Results are similar when we consider all scenes compared to our initial results on only the Lego scene, suggesting that performance is not overfit to the specific scene we originally tested. Thanks for the suggestion; including more complete experiments certainly strengthens our revised paper.
Our revised paper includes a new figure (figure 1 in the revision) to illustrate the nonconvex, semiconvex, and convex GA-Planes models we use in our experiments, including our use of multiresolution grids (also accessible here: https://imgur.com/a/FVSIhW4). We have also added a table in the appendix that details the specific model configurations we used in our experiments, since there are indeed interesting design choices in terms of allocating limited parameters among the different feature grids (appendix A.8). We found these specific models via a grid search on the Lego scene, from which we selected the pareto-optimal configurations with respect to PSNR and model size. We have clarified this procedure in the revised paper (i.e. amended the wording to clarify that the model family can be adapted to different settings, but this adaptation is not automatic).
We have added a discussion of limitations to the revised paper; thanks for the suggestion. In particular, currently we have shown GA-Planes only for 3D (or smaller) representations, not for higher dimensions (e.g. dynamic volumes), and currently we demonstrate GA-Planes for reconstruction rather than also generation. We expect both of these extensions to be possible, but beyond the scope of the present paper. We also expect that our convex GA-Planes model may benefit from optimization with specialized convex solvers, though we do not explore this in the present paper.
In our experiments we don’t experience major difficulties in nonconvex training, though we observe that its sensitivity to initialization increases as model size decreases. We added a new figure (https://imgur.com/a/6ZiD1rF) that compares IoU curves of convex, semiconvex, and nonconvex GA-Planes models of very small size, initialized with 3 different seeds. While the (semi)convex models display robustness across initializations, the IoU curve of the nonconvex model suffers from large variations. In particular, for one seed the nonconvex model completely fails to fit the video. Importantly, although in our experiments we use the same first-order optimization algorithm for all models, our convex GA-Planes formulation is compatible with any convex solver (e.g. cvxpy), allowing it to inherit decades of research in efficient convex optimization algorithms (which would e.g. avoid the need to tune algorithm hyperparameters). We appreciate the reviewer for highlighting this important point, which we have also made more prominent in the revision.
We hope our responses above and our revised paper address your comments; please let us know if you have any further questions. Thanks so much for your time and input!
Thank you for your feedback! We indeed have changed the color scheme, and tested it with grayscale to ensure visual clarity. The updated figure for the average results for the Blender dataset can be seen here: https://imgur.com/a/sPmRWkH. The per-scene plots are updated in the revised paper.
This paper proposes GA-Planes, a new family of models for representing volumes. GA-Planes include features stored in a tensor basis and a neural decoder. This setup is flexible, and can be adapted to convex, semiconvex, or nonconvex training.
For 2D setting with a linear decoder, the authors show that GA-Planes can be reduced to a low-rank plus low-resolution matrix factorization. For 3D tasks, GA-Planes are tested on radiance field reconstruction, 3D segmentation, and video segmentation, showing competitive results in terms of expressiveness, model size, and ease of optimization.
Strengths
The motivation of the work is clear to me. The idea of GA-Planes seems interesting and is indeed quite flexible. The numerical experiments show competitiveness of the proposed method, particularly in 3D applications.
Weaknesses
- My main concern about this paper is the theoretical part. I am not convinced that the authors have indeed proved the equivalence to matrix completion in 2D. See the questions for more parts that are unclear to me.
- Convexity/semiconvexity seems to be of interest to the authors, and is mentioned throughout the paper. It is exciting to see that GA-Planes can be adapted to convex optimization-based training. However, I am not sure the numerics have provided any evidence that this is beneficial for the GA-Planes family.
- The presentation can be improved. For example, I am not quite sure what the other dots (those not connected) in Figure 1a mean.
Questions
- In the proof provided, I can see that the GA-Planes model can indeed be written in the form of (15), but I don't see how the training problem is equivalent to problem (9). Is this correct for a more general 'true' representation y (e.g., higher rank)? Can we relate the optimal solutions of the two problems?
- In most experiments, the non-convex GA-Planes give the best results. Are there any advantages of using convex GA-Planes? For example, the authors mention optimizing globally regardless of initialization. Have you encountered any difficulties while adopting the non-convex setting, such as the choice of initialization?
Thanks for the thoughtful review and helpful comments. We have revised our paper with extensions to our theoretical and experimental results; changes are summarized in the joint comment to all reviewers. We respond to reviewer-specific comments and questions below.
Our theorems implicitly assume that the GA-Planes objective is also to minimize the Frobenius norm of the error. We appreciate the clarification question, and we have made this assumption explicit in the revised paper. This assumption is valid (albeit in 3D) for our experiments on 3D and video segmentation, for which we use direct supervision to minimize mean squared error (space carving supervision in 3D segmentation). However, our radiance field experiments use indirect pixel-level supervision so the setting there is not identical to the theorems.
Although in our experiments we use the same first-order optimization algorithm for all models, our convex GA-Planes formulation is compatible with any convex solver (e.g. cvxpy), allowing it to inherit decades of research in efficient convex optimization algorithms (which would e.g. avoid the need to tune algorithm hyperparameters). We appreciate the reviewer for highlighting this important point, which we have also made more prominent in the revision. Thank you for bringing up the question about sensitivity to initialization. In our main experiments, we didn’t encounter such an issue, as the model sizes were large enough for the nonconvex model to generalize and be trained reliably. However, we trained semiconvex, nonconvex, and convex GA-Planes models with a very small number of parameters for the video segmentation task, to show the robustness of our (semi)convex models. We added a figure in the revised appendix which compares test-frame IoU score curves of nonconvex, semiconvex, and convex formulations (all using concatenated features) that are initialized with three different seeds (we keep the randomness in the gating weights constant). This preliminary experiment shows that the variation in the IoU scores of the (semi)convex models is minimal, while the nonconvex model suffers (for one seed it completely fails to fit the video). Figure link: https://imgur.com/a/6ZiD1rF
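To illustrate the compatibility with off-the-shelf solvers (a hypothetical toy with frozen gates and only the decoder weights as variables, simpler than our full convex formulation), such a least-squares fit can be handed directly to cvxpy:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((200, 48))     # fixed (already interpolated) grid features
y = rng.standard_normal(200)           # supervision targets
masks = (F @ rng.standard_normal((48, 64)) > 0).astype(float)  # frozen gates

W = cp.Variable((48, 64))              # decoder weights: the only variables here
pred = cp.sum(cp.multiply(F @ W, masks), axis=1)
cp.Problem(cp.Minimize(cp.sum_squares(pred - y))).solve()
```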
The curves show pareto-optimal model configurations; the disconnected dots are other (suboptimal) model configurations that we included for completeness. We have removed these suboptimal models from the figure in the revision, since they are not really necessary and may not be clear to other readers. Thanks for the feedback!
We hope our responses above and our revised paper address your comments; please let us know if you have any further questions. Thanks so much for your time and input!
Thank you for your detailed response and the effort you have put into revising the paper. I appreciate the improvements in clarity and the extensions to the theoretical and experimental results.
Theoretical Results: Thank you for the clarification. However, I believe the theorem mainly shows that the representation can indeed be written in the form of (15). Introducing the concept of low-rank completion seems unnecessary and may overcomplicate the explanation. Although the attempt to draw a connection with low-rank completion is indeed interesting, I am not sure about the practical significance of the result.
Experimental Results on Robustness: The additional experiments comparing the robustness of (semi-)convex and non-convex formulations are a valuable addition. The results provide evidence supporting the claim that (semi-)convex formulations improve robustness to initialization. However, the impact of this result could be even stronger if more random seeds were tested to further confirm the observed trends.
Thanks so much for taking a look at our reply and paper revisions. It seems the reviewer’s remaining concerns/questions are regarding (1) practical significance of the theoretical connection to low-rank matrix completion, and (2) demonstrating the initialization robustness of the (semi)convex formulations over more random seeds. We are happy to address these points.
(1) Thanks for asking about how to use the theoretical connections to low-rank matrix factorization for practical benefit. We believe this aspect of our contribution is substantially stronger in our revised paper. In particular, we provide two new figures/tables that shed light on how our theory can be used to inform practical aspects of model selection. In Table 1 (https://imgur.com/a/T2S7fVR) in the revision, we summarize our 2D theoretical results in terms of the maximum attainable rank of a representation, as a function of the operation used to combine features and the type of decoder. These upper bounds on model expressivity inform practitioners that (1) If the decoder is linear (or more generally, simple), features should be combined by multiplication rather than addition or concatenation, and (2) If an MLP decoder is used, the rank of the representation will be limited by the resolution of the grid features rather than by their feature dimension. Indeed this second theoretical insight is practically useful in finding the set of parameter allocations that work best in our radiance field experiments, in which we prioritize maintaining some features with high resolution even at the cost of reducing the feature dimension (these model hyperparameters are in Table 6 in the revised appendix). These upper bounds in Table 1 are complemented by 2D image fitting experiments in Figure 5 (https://imgur.com/uSInNRK) (in the revised appendix), which directly inform practice in 3 ways beyond the insights from Table 1: (1) combining features by multiplication rather than addition is beneficial even when using an MLP decoder, (2) standard nonconvex MLP decoder outperforms convex MLP decoder, even though both representations have the same maximum rank, and (3) continuous interpolation of features (left figure) outperforms nearest neighbor interpolation of features (right figure).
(2) We have repeated the same experiment (fitting a small-scale GA-Planes model) over 10 random seeds (rather than 3), and share the results here (https://imgur.com/a/KwhFkxi) (we will also update it in the final paper). The curve shows the average IoU score for each model type throughout training, and the error bars show the standard deviation of IoU across the 10 random seeds. On average, the semiconvex model performs best, followed by the convex model, with the nonconvex model performing worst on average. We can also see from the error bars that the convex model is extremely stable, with standard deviations that are visually imperceptible; this is also the case for the semiconvex model in the latter half of training. In contrast, the nonconvex model is highly unstable, with very large standard deviations throughout training. Although with a lucky seed the nonconvex model can outperform the (semi)convex ones, the opposite is true on average.
Please let us know if this addresses your concerns or if you have any further questions. Thank you again for your thoughtful engagement with our work.
Dear reviewer, the time window for reviewers to ask questions is closing tonight. Please let us know if we have addressed your concerns or if you have any further questions. Thank you for your thoughtful engagement with our work.
This paper introduces a family of discretized implicit representations designed to enhance optimizability, memory efficiency, and expressiveness. The approach combines multiple discretizations of the 3D volume at different dimensions (1D, 2D, 3D) and resolutions to learn volume features. These features are then combined and input into a shared decoder, implemented as either an MLP or a convex MLP. Additionally, it is shown that with a linear decoder, the problem can be framed as a low-rank plus low-resolution matrix factorization.
Strengths
Analyzing the properties of discrete implicit representations in terms of optimizability, memory usage, and expressiveness is highly valuable.
Several variants of the proposed method are tested across a diverse set of tasks, including both 2D and 3D problems.
The equivalence between one-dimensional grid representation and low-rank plus low-resolution matrix factorization appears intriguing.
Weaknesses
Unclear method formulation
The method section is hard to read and follow. Some missing elements:
- Functions and vectors are not introduced with dimensions
- Not all elements of the models are formulated (such as the multi-resolution copies)
- Some concepts are introduced before their definition, such as the feature grids, following eq (4). What is the quantity c enumerating over, in eq (5)?
- The definition in equation (5) is challenging to interpret. What do the quantities in it represent, and why are both needed? What are their dimensions? The separation between equations (5) and (6) adds to the confusion and may frustrate the reader.
- Certain terms are not properly defined. For instance, what exactly does ‘semi-convex’ mean, and why exactly (with proof) is the proposed model considered semi-convex?
Unclear claim regarding generalizing existing models
It is unclear in what way the method generalizes previous work. The proposed factorizations appear to have been introduced in earlier studies, and this method seems to focus more on feature engineering in combining these factorizations. I may be mistaken, but the text would benefit from a clearer explanation of how it offers generalization.
Unclear relation between the introduction of the GAPlanes model and advocating for convex formulations.
The relationship between the introduction of the GAPlanes model and the emphasis on convex formulations is unclear. Specifically, the connection between the integration of convex MLP (in cases where the loss is convex) and its relevance to the GAPlanes model is not well established. This approach appears more like an application of convex MLP rather than a novel contribution. Additionally, the empirical benefits of convex formulations seem uncertain. How does this relate to the goals of this work?
Questions
I would appreciate any feedback on the weaknesses and questions outlined above.
Thanks for the thoughtful review and helpful comments. We have revised our paper with extensions to our theoretical and experimental results; changes are summarized in the joint comment to all reviewers. We respond to reviewer-specific comments and questions below.
- We have introduced a new figure to clarify our method, including the use of multiresolution grids. (https://imgur.com/a/FVSIhW4)
- Regarding dimensions: the dimensions of the feature grids are introduced in the bullet points at the beginning of section 3.1. We have added explicit dimensions for the extracted feature vectors (which are derived from these feature grids, so their length matches the feature dimension in the grids).
- Thanks for pointing out places where the description of our method can be simplified/clarified. We have edited the text following your suggestions so that the method will be clear to future readers. In particular, we removed the unnecessary definition of c in eq 5, and moved eq 5 to before eq 3 so that the notation is clear (note that some of the equation numbers have therefore changed in the revised paper). Here, c was indexing over the different sets of coordinates in each of the grid features, but these are enumerated explicitly in eq 6, so the shorthand is not really necessary and has been removed.
- We have clarified the definition of semiconvexity in the revised paper, as well as provided a reference which proves that all local optima are also globally optimal, meaning that first-order optimization methods will succeed. Basically, semiconvexity here refers to the Burer-Monteiro factorization of a convex objective. Such problems are biconvex (separately convex in each parameter group), and moreover, it can be shown that local optima are global [1].
[1] Arda Sahiner, Tolga Ergen, Batu Ozturkler, John M Pauly, Morteza Mardani, and Mert Pilanci. Scaling convex neural networks with burer-monteiro factorization. In ICLR, 2024.
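Schematically, the Burer-Monteiro idea replaces a convex program over a large matrix variable with a program over its low-rank factors (a generic statement, not the paper's exact objective):

```latex
\min_{Z \succeq 0} \; f(Z)
\qquad \longrightarrow \qquad
\min_{W \in \mathbb{R}^{n \times r}} \; f(W W^\top).
```

The factored problem is biconvex rather than convex, yet under suitable conditions (as in [1]) its local optima coincide with global optima of the original convex problem.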
This is discussed in more detail in appendix A.1 (in the original submission as well as the revision). GA-Planes is a family of models that includes many previously proposed models as special cases; this is what we mean by generalization. However, the specific models in this family that we use in our experiments have not been used in prior work, because no prior methods (to our knowledge) have combined line, plane, and volume features, and an MLP decoder, in the same model. Further, we adopt feature combinations (in the multiplicative model) that approximate a volume element (trivector) based on geometric algebra operations. For instance, we do not use plane products (unlike K-Planes), as this is not a valid operation in geometric algebra. Compared to TensoRF, GA-Planes represents the 3D volume with line products (like TensoRF-CP) in addition to line-plane products (like TensoRF-VM) and a low-resolution volume feature, generalizing TensoRF factorizations.
We designed the GA-Planes architectures specifically to be compatible with convex optimization. Any model that involves multiplication of features (e.g. K-Planes) cannot be convex, and not all models that can be trained by convex optimization will achieve good quality. For example, in our 3D and video segmentation experiments we find that the GA-Planes models perform comparably well regardless of whether we use the convex, semiconvex, or nonconvex versions, while the previously proposed Tri-Planes model (from EG3D) performs markedly worse under convex or semiconvex formulations. Although in our experiments we use the same first-order optimization algorithm for all models, our convex GA-Planes formulation is compatible with any convex solver (e.g. cvxpy), allowing it to inherit decades of research in efficient convex optimization algorithms. We appreciate the reviewer for highlighting this important point, which we have also made more prominent in the revision. The empirical advantage of having a (semi)convex formulation is improved robustness to initialization (especially in small models), and fast optimizability with specialized solvers. We provide preliminary robustness analysis comparing nonconvex, semiconvex and convex GA-Planes in the video segmentation setting in the revised appendix, and at https://imgur.com/a/6ZiD1rF for convenience.
We hope our responses above and our revised paper address your comments; please let us know if you have any further questions. Thanks so much for your time and input!
Thank you to the authors for providing a detailed rebuttal. While some of my concerns have been adequately addressed, the following issues remain:
GA-Planes generalization claim. I feel the explanation does not adequately justify the claim that GA-Planes is a generalization of previous methods. While GA-Planes introduces new combinations of features and operations, the inclusion of novel components—such as combining line, plane, and volume features, or using geometric algebra-inspired operations—does not inherently qualify it as a generalization of existing methods.
To be considered a true generalization, the framework would need to comprehensively subsume previous methods, enabling them to be derived as special cases without additional assumptions or constraints. Simply incorporating or extending certain aspects of prior approaches, as described, does not satisfy this requirement. Furthermore, the statement that GA-Planes avoids certain operations (e.g., plane products in K-Planes) is in conflict with the generalization claim. Instead, it suggests a constrained form of, rather than a generalization of, those methods.
"For example, in our 3D and video segmentation experiments we find that the GA-Planes models perform comparably well regardless of whether we use the convex, semiconvex, or nonconvex versions"
Could the authors clarify why this is considered a positive outcome?
"The empirical advantage of having a (semi)convex formulation is improved robustness to initialization (especially in small models)"
The empirical evidence provided to support this claim appears to be quite limited. First, I would expect the selected convex method to perform on par with the non-convex method in terms of average IoU scores, which seems plausible based on the above statement. Second, evaluating only three seeds in a single experiment represents an extremely small sample size, making it difficult to draw a robust and generalizable conclusion.
Thanks so much for taking a look at our reply and paper revisions. It seems the reviewer’s remaining concerns/questions are regarding (1) interpretation of GA-Planes as a generalization of previous methods, (2) why it’s a positive outcome that GA-Planes performs comparably well in convex, semiconvex, and nonconvex formulations, and (3) empirical evidence of benefits to convex and semiconvex formulations compared to nonconvex formulations, for small models. We are happy to address these.
(1) There may be some minor misunderstanding regarding the GA-Planes model family versus the specific GA-Planes models we use in our radiance field and video/volume segmentation experiments. The GA-Planes model family does “comprehensively subsume previous methods” because it allows for any subset of line, plane, and volume features (including the empty subset) to be combined by any set of operations (addition, concatenation, and elementwise multiplication) and then decoded by a decoder, typically an MLP. This is summarized in appendix A.1 (in both the original and revised papers), and pasted here (https://imgur.com/a/UUPlDDu) for convenience. Note that the GA-Planes family does include some members that do not respect geometric algebra, i.e. whose features are combined in ways that do not represent a trivector. Our radiance field experiments focus on a GA-Planes model (one member of this model family) that does respect geometric algebra, but our convex-objective experiments use a simpler GA-Planes model that avoids multiplication but also does not quite respect geometric algebra, for the sake of convexity.
(2) For our volume and video segmentation experiments, we show that the specific GA-Planes model we use works well in all 3 (convex, semiconvex, and nonconvex) formulations, whereas the baseline models (other members of the GA-Planes family that were proposed previously) only work well under nonconvex but not convex formulations. This is a positive outcome because it means that, unlike these other models, our specific GA-Planes model can enjoy the benefits of convexity in terms of stable optimization without sacrificing representation capacity/quality.
(3) We have repeated the same experiment (fitting a small-scale GA-Planes model) over 10 random seeds, and share the results here (https://imgur.com/a/KwhFkxi) (we will also update it in the final paper). The curve shows the average IoU score for each model type throughout training, and the error bars show the standard deviation of IoU across the 10 random seeds. On average, the semiconvex model performs best, followed by the convex model, with the nonconvex model performing worst on average. We can also see from the error bars that the convex model is extremely stable, with standard deviations that are visually imperceptible; this is also the case for the semiconvex model in the latter half of training. In contrast, the nonconvex model is highly unstable, with very large standard deviations throughout training. Although with a lucky seed the nonconvex model can outperform the (semi)convex ones, the opposite is true on average.
Dear reviewer, the time window for reviewers to ask questions is closing tonight. Please let us know if we have addressed your concerns or if you have any further questions. Thank you for your thoughtful engagement with our work.
We would like to thank all of the reviewers for their time and thoughtful engagement with our work. We have revised our manuscript based on your suggestions, and summarize the changes in the revision as follows:
- Radiance field experiments are now evaluated on all 8 scenes from the NeRF-Blender dataset. We have edited the results in the main paper to show average results across all scenes, and added per-scene results and renderings in the revised appendix. Per-scene results: https://imgur.com/a/psnr-ssim-lpips-pareto-optimal-curves-of-compared-models-bsXUXIL (the lego scene is not included here since it was presented before). Example renderings: https://imgur.com/a/MQTua7H. Average results over all scenes: https://imgur.com/a/O4NQrGY.
- We have added theorems 3 and 4 detailing the connection to 2D matrix completion when using convex and nonconvex MLP decoders. These results were previously in the appendix, but they have been expanded and brought to the main paper; proofs are in the revised appendix (A.2).
- We have streamlined the presentation of our theorem statements.
- We have added a concise summary of our theoretical results in terms of the maximum rank attainable by each type of 2D GA-Planes (in Table 1 of the revision), and corresponding lower bounds on fitting errors (Appendix A.4).
- We have added experimental validation of the proofs in the 2D image fitting setting which also compares linear and nearest neighbor interpolation. Figure link is here: https://imgur.com/a/523NGVa
- We have added a method overview figure, including the convex, semiconvex, and nonconvex versions of GA-Planes that we use in our experiments. Figure link: https://imgur.com/a/FVSIhW4
- We have added a table in the appendix detailing the specific model configurations (hyperparameters) that we used in the experiments (section A.8).
- We have added a figure comparing IoU scores of (semi/non)convex GA-Planes models on test frames at each epoch across 3 random seeds (gating weights were initialized with a different fixed seed) in the revised appendix. This provides preliminary support for the robustness of the semiconvex and convex models to different initializations. https://imgur.com/a/6ZiD1rF
The paper develops a neural volume approximation which can be fit by convex optimization. This representation, called GA-Planes, approximates the given volume as a superposition of tensor products of basis elements (i.e., a superposition of lines, planes and voxels). This yields a type of low-rank plus low-resolution decomposition. The paper explores a number of variants of this model. For example, the paper studies variants that concatenate features (can be convex) vs multiply features (inherently nonconvex). It also studies variants which couple concatenated features with a convex neural network (nonlinear network with the nonlinear part frozen at initialization), yielding a bilinear model. The paper evaluates the resulting models on a variety of tasks, including radiance field fitting, 3d segmentation and video segmentation. For radiance field fitting, at large model size, the proposed method matches the best comparison baseline; at smaller model sizes, the proposed method is more accurate. Interestingly, for segmentation problems, the convex and semiconvex (bilinear) variants of this representation match or exceed nonconvex baselines.
The main strength of the paper is that it shows how to fit neural volumes using convex and bilinear (or, in the language of the paper, semi-convex) formulations. In experiments, these tractable formulations perform well compared to nonconvex baselines. Moreover, for image approximation the proposed representation exhibits better performance than baselines at low complexities. At the same time, reviewers retained concerns about the paper’s central contribution: unifying various volume representations using language from geometric algebra. After considering the author response, the reviews remained mixed, placing the paper below the bar for acceptance.
Additional Comments on Reviewer Discussion
Initial reviews of the paper were mixed. A central contribution of the work is to provide a framework which combines various types of geometric building blocks for volume features (points, lines, planes). Reviewers expressed a mixed evaluation of this contribution: reviewer dwor found it a unifying and mathematically precise framework, while reviewer uQQD was less convinced by the role of geometric algebra in this mathematical framework. The author response clarified that geometric algebra serves as an inspiration for the proposed framework, but does not play a detailed technical role. Ultimately, after discussion and author responses, the reviewer evaluation of the paper’s central contribution remained mixed [d3v9,uQQD].
Other reviewer concerns included the theoretical connection between the proposed method and matrix completion in 2D [d3v9]. The paper shows that in 2D this formulation is equivalent to a certain constrained low-rank recovery problem. The discussion clarified that this connection is correct (the representation indeed can yield a rank 2 recovery problem with constraints on the factors).
Other issues included clarity and presentation issues [ynDE,d3v9], and questions about experiments [ynDE,dwor], many of which were adequately addressed in the course of discussion.
Reject