Supermodular Rank: Set Function Decomposition and Optimization
We introduce the supermodular rank and obtain improved guarantees for set function optimization.
Abstract
Reviews and Discussion
The paper studies the set function optimization problem, where there is a ground set of elements and the goal is to pick a subset maximizing some objective. The authors mainly consider two concrete models: matroid-constrained maximization and set function ratio minimization. In the first model, we are given a monotone, non-negative function $f$ with generalized curvature $\alpha$ and submodularity ratio $\gamma$, and a matroid system $\mathcal{M}$. The goal is to pick a subset $S \in \mathcal{M}$ such that $f(S)$ is maximized. The authors prove that there exists a framework such that, by applying it to any approximation algorithm with $p(n)$ queries ($p$ a polynomial function), a better approximation ratio can be obtained in time $O(n^r \cdot p(n))$, where $r$ is the elementary submodular rank. In the second model, we are given two set functions $f$ and $g$, and the goal is to pick a subset $S$ such that the ratio $f(S)/g(S)$ is minimized. The authors prove that when $f$ and $g$ are normalized positive monotone functions and $f$ has a bounded elementary submodular rank, they obtain in polynomial time an approximation ratio that was previously only available when $f$ was submodular. Finally, the authors conduct experiments to investigate the empirical performance of the algorithms.
Strengths
- The paper considers two classical and important models in set function optimization. The main contribution of the paper is introducing the concept of elementary submodular rank and building on it to extend the previous results to a more general function class.
- Both theoretical analyses and experimental evaluations for the proposed algorithms are provided in the paper.
Weaknesses
- A main weakness is that the paper is not well written. The structure is quite confusing: the formal definitions of the considered models are not provided until Section 4, yet several new definitions are introduced before Section 4, and the absence of intuitive explanations for them renders the paper less accessible to readers.
Questions
(1) Could you give some intuition about the elementary-submodular-rank-based trick used in the paper?
> A main weakness is that the paper is not well written. The structure is quite confusing: the formal definitions of the considered models are not provided until Section 4, yet several new definitions are introduced before Section 4, and the absence of intuitive explanations for them renders the paper less accessible to readers.
We thank the reviewers for their comments. We have updated the manuscript to add more explanatory examples to improve the paper’s readability. The changes are highlighted in blue.
> (1) Could you give some intuition about the elementary-submodular-rank-based trick used in the paper?
The intuition might be more apparent if we think of continuous functions. The idea behind the elementary rank can be thought of (not strictly true, but fine for building intuition) as determining the number of coordinate directions in which the function is non-convex. Once we have determined these directions, we decompose our function into convex functions along orthogonal directions. Then, we optimize these functions independently.
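As a toy illustration of this intuition (our own example, not code from the paper), the following sketch checks the four-point supermodularity inequality on every two-dimensional face of the cube, and shows that reversing a single coordinate direction can repair non-supermodularity:

```python
import itertools

def is_supermodular(f, n):
    """Check f: {0,1}^n -> R via the four-point inequality on every 2-face:
    f(..1,1..) + f(..0,0..) >= f(..1,0..) + f(..0,1..).
    On the Boolean lattice this is equivalent to supermodularity."""
    for i, j in itertools.combinations(range(n), 2):
        for z in itertools.product([0, 1], repeat=n - 2):
            def point(a, b):
                x = list(z)
                x.insert(i, a)  # re-insert the two varying coordinates (i < j)
                x.insert(j, b)
                return tuple(x)
            lhs = f(point(1, 1)) + f(point(0, 0))
            rhs = f(point(1, 0)) + f(point(0, 1))
            if lhs < rhs - 1e-12:
                return False
    return True

# f(x) = g(w . x) with mixed-sign weights and convex g: not supermodular ...
w = [1.0, -2.0, 1.5]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x)) ** 2
print(is_supermodular(f, 3))    # False

# ... but it becomes supermodular after reversing the coordinate with w_i < 0.
f_flip = lambda x: f((x[0], 1 - x[1], x[2]))
print(is_supermodular(f_flip, 3))    # True
```

Here the "non-convex direction" is the coordinate with a negative weight; flipping it is exactly the order reversal behind the elementary rank.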
The reviewer thanks the author for their response and the newly added examples in the revision. I believe this is an interesting work. However, the paper seems to still need a cleaner overall structure to make the main contributions more readable.
We sincerely thank the reviewer for their comment. Our paper is currently structured as follows.
- Section 2 provides the mathematical background on supermodular functions and cones.
- Section 3 defines the supermodular rank.
- Section 4 provides an application of the supermodular rank to set function optimization.
- Section 5 presents numerical experiments.
We kindly ask what structural changes the reviewer would like to see.
The reviewer thanks the authors for their further response. For me, it is more natural to introduce the target optimization problems (Section 4) as soon as possible, so that the reader can grasp the main goal quickly, and then state why defining the supermodular rank is helpful and necessary. This seems a better way to organize the paper, but it is just my personal opinion, so the authors should avoid overfitting to it.
This paper considers supermodular and submodular optimization problems. In these problems, we are given a supermodular/submodular (or related) function defined over a ground set. The goal is to select a subset of the ground elements such that (1) the selected subset satisfies some property, and (2) the value of the selected subset is optimized. If a function is not submodular/supermodular, one can describe it via several parameters, and the approximation ratio will then also depend on these parameters.
The main contribution of this work is a new approach to grading the space of set functions. The authors propose a new concept called supermodular/submodular rank, which is defined with respect to partial orders on the ground set. Based on this concept, they show that a function can be decomposed into a sum of several $\pi$-supermodular/submodular functions. One can then improve the approximation via such a splitting.
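For concreteness, the decomposition described here can be written as follows (notation ours, a sketch rather than the paper's verbatim definition):

$$\operatorname{rank}_{\mathrm{sub}}(f) \;=\; \min\Bigl\{\, k \;:\; f = \textstyle\sum_{i=1}^{k} f_i, \text{ each } f_i \text{ is } \pi_i\text{-submodular} \,\Bigr\},$$

where each $\pi_i$ is an order on the ground set that may reverse the roles of some variables, and a $\pi$-submodular function is one that becomes submodular after this reversal.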
Strengths
- The high-level idea of this paper is clear. The main technical idea is to decompose a function into a sum of functions that are $\pi$-supermodular/submodular, and then split the problem into several submodular pieces. This improves the approximation when the submodular/supermodular rank is bounded.
- This paper is technically involved. To my knowledge, there is no such definition and decomposition in the literature. Probably the most related one is that a submodular function can be decomposed into n! additive functions, but the definition used in this paper is quite different from this.
Weaknesses
- The presentation of this work is poor. It seems that the authors ran out of space and moved a lot of background knowledge to the appendix. Without this knowledge, it is hard to grasp the definition of $\pi$-supermodularity. After the move, it seems that the authors did not do careful proofreading; for example, $R(\alpha, \gamma)$ in Theorem 25 is not defined. This significantly impairs readability. I appreciate that the authors try to explain their ideas with some examples, and I also understand that the space issue is not the authors' fault, but the fact remains that the paper needs a better presentation.
- To my understanding of Table 2, the proposed algorithm improves the previous ratio only in the case where the elementary submodular rank is a constant. If this is true, it is not clear how important this improvement is, because the paper does not discuss whether there exist well-known functions with constant submodular rank. Besides this, the paper only includes an upper bound on the rank of a function, but no way to compute the rank of a given function. Maybe I missed something and such a computation is trivial, but this should be stated explicitly.
Questions
See my second comment in Weaknesses.
> The presentation of this work is poor. It seems that the authors ran out of space and moved a lot of background knowledge to the appendix. Without this knowledge, it is hard to grasp the definition of $\pi$-supermodularity. After the move, it seems that the authors did not do careful proofreading; for example, $R(\alpha, \gamma)$ in Theorem 25 is not defined. This significantly impairs readability. I appreciate that the authors try to explain their ideas with some examples, and I also understand that the space issue is not the authors' fault, but the fact remains that the paper needs a better presentation.
We thank the reviewers for their comments. We have updated the manuscript to add more explanatory examples to improve the paper’s readability. The changes are highlighted in blue.
> To my understanding of Table 2, the proposed algorithm improves the previous ratio only in the case where the elementary submodular rank is a constant. If this is true, it is not clear how important this improvement is, because the paper does not discuss whether there exist well-known functions with constant submodular rank. Besides this, the paper only includes an upper bound on the rank of a function, but no way to compute the rank of a given function. Maybe I missed something and such a computation is trivial, but this should be stated explicitly.
Computing the rank of a function (unless we know something about the function) is not easy. We do not have a simple method for doing so. However, we can provide examples of functions that have low submodular rank.
RBM
The first example comes from Restricted Boltzmann Machines (RBMs). These are graphical models for probability distributions on $\{0,1\}^n$. They have a hyperparameter $k$, the number of hidden nodes. Prior work (see references [Allman et al., 2015]) has shown that when $k = 1$, this model can represent a distribution $p$ if and only if $\log p$ is $\pi$-supermodular for some order $\pi$ and satisfies certain polynomial equality constraints. Then, when we move to larger values of $k$, it can be shown that the model can represent a distribution $p$ only if $\log p$ is rank-$k$ supermodular. Thus log-RBM distributions are functions with bounded supermodular rank. In this case, the optimization problem would correspond to finding modes of the distribution.
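To spell out the structure (standard RBM algebra, in our notation): with visible units $x \in \{0,1\}^n$, hidden units $h \in \{0,1\}^k$, and parameters $W \in \mathbb{R}^{k \times n}$, $b \in \mathbb{R}^n$, $c \in \mathbb{R}^k$,

$$p(x) \;=\; \frac{1}{Z}\sum_{h \in \{0,1\}^k} e^{\,h^\top W x + b^\top x + c^\top h} \;=\; \frac{e^{\,b^\top x}}{Z}\prod_{j=1}^{k}\bigl(1 + e^{\,w_j^\top x + c_j}\bigr),$$

so $\log p(x)$ is an affine function of $x$ plus $k$ terms $\operatorname{softplus}(w_j^\top x + c_j)$, i.e., convex functions of affine maps. This is exactly the structure handled by the proposition below.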
One-hidden-layer neural networks
Building on this, we have the following.
Proposition: Let $f$ be a real-valued function on $\{0,1\}^n$ that is the composition of an affine function and a convex function. That is, $f(x) = g(w^\top x + b)$, where $w$ is a vector of length $n$, $b$ is a scalar, and $g$ is convex. Then $f$ is $\operatorname{sign}(w)$-supermodular.
Proof: We show that the elementary imset inequalities are satisfied. Such an inequality involves four vectors on a two-dimensional face of the cube $\{0,1\}^n$. Two indices $i, j$ vary, and the vector restricted to the remaining coordinates is fixed at some value $z$. Let $f_{ab}$ be the entry of the face where $(x_i, x_j) = (a, b)$, and let $t_{ab} = a\,w_i + b\,w_j + s$, where $s$ collects the contribution of $z$ and the offset, so that $f_{ab} = g(t_{ab})$. We seek to compare $f_{11} + f_{00}$ with $f_{10} + f_{01}$. By the definition of $\operatorname{sign}(w)$-supermodularity, if the entries $w_i$ and $w_j$ have the same sign, we require $f_{11} + f_{00} \ge f_{10} + f_{01}$, while if $\operatorname{sign}(w_i) \neq \operatorname{sign}(w_j)$, we require $f_{11} + f_{00} \le f_{10} + f_{01}$. The inequalities hold by the fact that $g$ is convex. For example, if $w_i, w_j > 0$, then $t_{00} \le t_{10},\, t_{01} \le t_{11}$ with $t_{00} + t_{11} = t_{10} + t_{01}$, and we apply the definition of convexity to a comparison of the four points.
Thus, in particular, a one-hidden-layer ReLU neural network restricted to $\{0,1\}^n$, with $k$ hidden nodes with positive outer weights, is rank-$k$ supermodular.
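As a sanity check (our own illustration, with arbitrarily chosen weights), one can verify the proposition numerically: flip exactly the coordinates with negative weights and brute-force test lattice supermodularity of a single ReLU unit.

```python
from itertools import product

def relu_unit(w, b):
    return lambda x: max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

def is_supermodular_sets(f, n):
    """Brute-force check of f(x v y) + f(x ^ y) >= f(x) + f(y)
    over all pairs of points of the Boolean lattice {0,1}^n."""
    pts = list(product([0, 1], repeat=n))
    for x in pts:
        for y in pts:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if f(join) + f(meet) < f(x) + f(y) - 1e-12:
                return False
    return True

w, b = [0.7, -1.3, 0.4, -0.2], 0.1
f = relu_unit(w, b)

# Reverse exactly the coordinates with w_i < 0, as sign(w)-supermodularity predicts.
flip = [1 if wi < 0 else 0 for wi in w]
f_pi = lambda x: f(tuple(xi ^ s for xi, s in zip(x, flip)))
print(is_supermodular_sets(f_pi, 4))   # True
```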
I'd like to thank the authors for their response. I am satisfied with the answer to my second question. It is particularly interesting to see that there are rank-$k$-supermodular functions in neural networks. I believe this is an interesting work, and it's a good paper, but not in its current version, due to the presentation.
This work measures how far a function $f$ is from being submodular or supermodular. The main idea is the decomposition of $f$ into a sum of the smallest number of submodular functions, each with a different total order on the individual variables; this number is the submodular rank of $f$. The less interesting part of this work is the elementary submodular rank, where the order can be reversed in at most one variable for each function. The paper proposes a simple R-SPLIT algorithm using the proposed notion, which splits a function of rank $r$ into pieces and runs a simple algorithm, e.g., greedy, on each piece, returning the best solution.
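A rough sketch of such a split-and-greedy scheme under a cardinality constraint (a hypothetical Python illustration of the idea; the paper's actual R-SPLIT handles matroid constraints and may differ in details):

```python
from itertools import combinations, product

def greedy(f, ground, k, forced_in, forced_out):
    """Plain greedy for max f(S) s.t. |S| <= k, respecting pre-assigned elements.
    f is a user-supplied set function with f(set()) defined."""
    S = set(forced_in)
    candidates = [e for e in ground if e not in S and e not in forced_out]
    while len(S) < k and candidates:
        best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) <= 0:
            break
        S.add(best)
        candidates.remove(best)
    return S

def r_split_style(f, ground, k, r):
    """Try every way of pre-assigning <= r elements (in or out), run greedy on
    each restricted instance, and return the best set found.
    This makes O((2n)^r) greedy calls, matching the exhaustive-search reading."""
    best_S = set()
    for m in range(r + 1):
        for A in combinations(ground, m):
            for bits in product([0, 1], repeat=m):
                forced_in = {e for e, v in zip(A, bits) if v}
                forced_out = {e for e, v in zip(A, bits) if not v}
                if len(forced_in) > k:
                    continue
                S = greedy(f, ground, k, forced_in, forced_out)
                if f(S) > f(best_S):
                    best_S = S
    return best_S
```

With r = 0 this reduces to plain greedy; each increment of r multiplies the number of greedy calls by a factor on the order of n.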
Strengths
It is clear that this work is novel in terms of the definitions of supermodular/submodular rank and elementary rank. This work also provides both theoretical results and empirical evaluation.
Weaknesses
The theoretical contribution of this work seems to be very weak. In particular, the elementary rank basically says that the function becomes submodular for all assignments to some subset of the variables, which leads to an exhaustive search for this subset. Hence, as demonstrated by the complexity of the algorithmic part, the contribution of this work does not meet the bar of top-tier conferences such as ICLR. I have two additional comments:
- The current paper is very tough to read.
- The empirical evaluation is not convincing.
My suggestion to the authors is to consider submitting this paper to other venues such as ICALP, SODA, and ESA. The main reason behind this suggestion is that I think the paper can be presented much better without the practical part (which is not convincing in my opinion), which can be replaced by highlighting some non-trivial theoretical results such as Theorem 10.
Questions
- It seems that when the submodular rank is high, the algorithms are impractical.
- Empirical evaluation is not convincing.
> The theoretical contribution of this work seems to be very weak. In particular, the elementary rank $r$ basically says that the function becomes submodular for all assignments to some subset of the variables, which leads to an exhaustive search for this subset.
As the reviewer pointed out, the notion of supermodular rank is novel and interesting. Please see the general response for an explanation of our contributions.
The elementary supermodular rank is a particular case where we restrict the types of permutations that are allowed. A priori, there is no reason that this notion would simplify to such an elementary property, and the fact that it does is quite interesting.
Building on this, since the idea of supermodular rank is novel, this gradation of the space of set functions is new. It provides a valuable way to think about the complexity of set function optimization.
Regarding the complexity of our algorithms, recall that there is no free lunch in optimization. Our paper provides a lower bound showing that set function optimization for a rank-$r$ function necessarily requires $\Omega(n^r)$ function evaluations. Our algorithms maintain a low complexity for optimizing functions with low elementary supermodular rank. By necessity, the complexity of optimizing general objective functions, or functions with high elementary supermodular rank, is high. To illustrate this more concretely, we can provide examples that serve as a lower bound, as follows.
Let $f$ be your favorite submodular function on a ground set of size $n$, and define $g$ to be an elementary rank-$r$ submodular function whose pieces are all shifts of $f$. To get an approximation algorithm for general $g$'s, at the very minimum, we need to evaluate the function once at each of the shifts. Thus, we will always need at least $\Omega(n^r)$ time. As mentioned in Remark 27, if the decomposition is known, which is the case for the lower bound, the only exponential term in the time complexity of R-Split is $O(n^r)$. Thus, the algorithm achieves the complexity lower bound.
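One way to make the shift construction concrete (notation ours; the exact formula was lost in extraction) is

$$g(S) \;=\; \sum_{i=1}^{m} f(S \,\triangle\, A_i),$$

where the $A_i$ are fixed subsets whose membership is flipped. Each summand $S \mapsto f(S \,\triangle\, A_i)$ is submodular with respect to the order that reverses the coordinates in $A_i$, so $g$ has bounded submodular rank, while any algorithm with a nontrivial guarantee for all such $g$ must query at least one point per shift.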
We would also like to highlight that our algorithm is different from other methods. In particular, algorithms for submodular optimization, such as plain greedy, are typically analyzed under relaxations like weak submodularity, but no changes are made to the algorithm itself to deal with the lack of submodularity. We exploit the non-submodular structure by providing a gradation on the space of functions.
In light of the above, the simplicity of our algorithm is an additional strength.
> Hence, as demonstrated by the complexity of the algorithmic part, the contribution of this work does not meet the bar of top-tier conferences such as ICLR.
The primary contribution of the paper is not algorithmic but theoretical. The main contributions are
- The definition of rank
- The proof of the maximal rank
- The gradation of the space of set functions
- The results showing that theoretical guarantees obtained for submodular functions can be lifted to higher ranks, but with a necessary penalty, i.e., a penalty that cannot be avoided in some cases
> The current paper is very tough to read.
We thank the reviewers for their comments. We have updated the manuscript to add more explanatory examples to improve the paper’s readability. The changes are highlighted in blue.
> The empirical evaluation is not convincing.
We kindly disagree. The experimental results show a consistent and significant improvement in performance compared to Greedy. For example:
- Figure 1a shows that the theoretical bound improves by 400% in one instance.
- Figure 1b shows an orders-of-magnitude decrease in the error.
- Figure 2a shows a consistent improvement as well.
- Figure 2b shows that in most cases, going from greedy to 1-split greedy, the percentage of times we find the optimal set more than doubles.
- Figure 2c shows that it scales to large problems, and we consistently have over 5% improvement.
> It seems that when the submodular rank is high, the algorithms are impractical.
Yes, but we also provide an impossibility result showing that the solution to a related problem cannot be approximated with less work.
I would like to thank the authors for their response. My initial understanding of the paper would appear to be correct.
We thank the reviewer for their comment, but we kindly disagree. The experimental evidence does show significant improvements, and we have updated the paper (available on OpenReview) to improve readability.
The authors introduce the concept of supermodular rank for functions defined on partially ordered sets. Supermodular rank characterizes how a function can be decomposed into a sum of functions that exhibit a "supermodular" property. This concept allows for a more refined understanding of the structure of set functions. The authors propose optimization algorithms, namely R-SPLIT and R-SPLIT RATIO, for optimizing monotone set functions and the ratio of set functions. These algorithms provide a trade-off between computational cost and accuracy, offering theoretical guarantees for their performance.
Strengths
- Introduction of Supermodular Rank: The concept of supermodular rank is considered interesting and valuable for understanding the structure of set functions.
- Optimization Algorithms: The proposed optimization algorithms, R-SPLIT and R-SPLIT RATIO, are seen as valuable contributions. They provide a trade-off between computational cost and accuracy while offering theoretical guarantees for their performance.
Weaknesses
- Complex and Notation-Heavy: The paper is noted as being quite complex and filled with notation, making it challenging to follow. Simplifying the presentation or providing additional explanations could enhance the accessibility of the material.
- Lack of Clarity on Performance Improvement: The paper's comparison to existing solutions, particularly in Table 2, is mentioned as lacking clarity. It is not immediately clear how the proposed algorithm outperforms existing solutions. The authors should elaborate on the additional benefit brought by the proposed algorithm's computational overhead.
Questions
See Weakness 2 above.
We thank the reviewer for their feedback.
> Complex and Notation-Heavy: The paper is noted as being quite complex and filled with notation, making it challenging to follow. Simplifying the presentation or providing additional explanations could enhance the accessibility of the material.
We thank the reviewers for their comments. We have updated the manuscript to add more explanatory examples to improve the paper’s readability. The changes are highlighted in blue.
> Lack of Clarity on Performance Improvement: The paper's comparison to existing solutions, particularly in Table 2, is mentioned as lacking clarity. It is not immediately clear how the proposed algorithm outperforms existing solutions. The authors should elaborate on the additional benefit brought by the proposed algorithm's computational overhead.
The table should be read not as saying that we improve the performance of the method (although we do show this happens empirically), but in a different manner. Specifically, prior work showed that for a family of functions $\mathcal{F}$, a good approximation rate can be proven. With our framework, we can show that this approximation rate can be lifted to a larger family of functions $\mathcal{G}$ with $\mathcal{F} \subset \mathcal{G}$. Table 1 shows that the size of $\mathcal{G}$ can be significantly bigger than that of $\mathcal{F}$.
We thank the reviewers for their comments. We have updated the manuscript to add more explanatory examples to improve the paper’s readability.
Additionally, we would like to reiterate the contributions of our paper.
Theoretical Contributions
Set function optimization is an important problem that has many applications. Prior work has shown that if the function has a favorable structure, particularly if it is submodular, then it is easy to optimize. However, on the other extreme, if we have set functions that have no structure, then it is clearly impossible to do better than to evaluate the function at all possible inputs.
However, as $n$ grows, the fraction of set functions that are submodular decreases exponentially, as shown in our paper. Prior work has tried to define broader classes of functions with favorable structure by introducing quantities such as the submodularity ratio and curvature. These works then analyze the performance of Greedy on the broader class and show that the performance degrades, yielding weaker approximation guarantees.
However, we take a complementary approach. We take the set of functions and partition the whole space by defining a measure of complexity on it: the submodular rank. We then prove the maximal possible rank. Using this, we show that as the rank $r$ increases from 1 to its maximal value, the complexity of obtaining good approximations of the optimizer grows exponentially in $r$. We then complement this result by showing that our imposed structure yields a way to optimize these functions. Unlike prior work, the idea is to present methods that do not degrade the approximation quality.
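Schematically (notation ours), this gradation is a nested chain of function classes

$$\mathcal{F}_1 \,\subsetneq\, \mathcal{F}_2 \,\subsetneq\, \cdots \,\subsetneq\, \mathcal{F}_{r_{\max}} = \{\text{all set functions}\},$$

where $\mathcal{F}_r$ is the class of functions of (elementary) submodular rank at most $r$; $\mathcal{F}_1$ contains the submodular functions, and the complexity of optimization grows with the smallest level $r$ at which a function appears.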
Empirical Contributions
We show that for small values of the rank $r$, our method is feasible for large problems. Furthermore, we show that our method provides consistent improvement.
Functions with low supermodular rank
The first example comes from Restricted Boltzmann Machines (RBMs). These are graphical models for probability distributions on $\{0,1\}^n$. They have a hyperparameter $k$, the number of hidden nodes. Prior work (see references [Allman et al., 2015]) has shown that when $k = 1$, this model can represent a distribution $p$ if and only if $\log p$ is $\pi$-supermodular for some order $\pi$ and satisfies certain polynomial equality constraints. Then, when we move to larger values of $k$, it can be shown that the model can represent a distribution $p$ only if $\log p$ is rank-$k$ supermodular. Thus log-RBM distributions are functions with bounded supermodular rank. In this case, the optimization problem would correspond to finding modes of the distribution.
One-hidden-layer neural networks
Building on this, it is easy to see the following.
Let $f$ be a real-valued function on $\{0,1\}^n$ that is the composition of an affine function and a convex function. That is, $f(x) = g(w^\top x + b)$, where $w$ is a vector of length $n$, $b$ is a scalar, and $g$ is convex. Then $f$ is $\operatorname{sign}(w)$-supermodular.
Thus, in particular, a one-hidden-layer ReLU neural network restricted to $\{0,1\}^n$, with $k$ hidden nodes with positive outer weights, is rank-$k$ supermodular.
The paper introduces the concept of supermodular rank for a function on a lattice, which is the smallest number of functions in a decomposition of the function into supermodular functions. The paper designs algorithms that are able to leverage a low supermodular rank structure in order to maximize such functions subject to constraints.
The reviewers appreciated the novelty of the definition of supermodular rank. The reviewers remained concerned about the significance of the main contributions, the practical applicability of this work, and the presentation. The overall consensus was that the paper does not meet the threshold for acceptance.
Why not a higher score?
The theoretical results of the paper seem to be weak, and the proposed algorithms have limited practical applicability due to high running times.
Why not a lower score?
N/A
Reject