Discrete and Continuous Difference of Submodular Minimization
We extend results for minimizing the difference of submodular (DS) functions, from set functions to general functions over both discrete and continuous domains.
Abstract
Reviews and Discussion
This paper explores the minimization of the difference of submodular (DS) functions over both discrete and continuous domains, extending prior work that was restricted to set functions. The authors introduce a new variant of the DC Algorithm (DCA) to minimize DS functions, providing theoretical guarantees comparable to previous work in the set function case. The study demonstrates that DS functions naturally arise in applications such as quadratic programming and sparse learning. The proposed method outperforms existing baselines in integer compressive sensing and integer least squares tasks. Experimental results show significant improvements in recovery probability and error rates.
Questions for Authors
Please refer to the above.
Claims and Evidence
The claims in the paper appear well-supported both theoretically and experimentally, with no evident problematic claims.
Methods and Evaluation Criteria
I think the proposed methods and evaluation criteria make sense for the problem and application at hand. The problem of minimizing the difference of submodular functions (DS functions) in both discrete and continuous domains is a well-defined and important one in optimization, with practical applications in areas like quadratic programming, sparse learning, and compressive sensing.
Theoretical Claims
I have conducted a preliminary review of the theoretical proofs in the paper, and they appear to be correct. However, I should note that my examination was not exhaustive, and there remains a possibility that some details may have been overlooked.
Experimental Design and Analyses
The authors should include more results regarding the running time of their algorithm to better demonstrate its efficiency.
Supplementary Material
I have reviewed all of the Supplementary Material provided with the paper to enable a more comprehensive evaluation of the overall work.
Relation to Existing Literature
The problem addressed in this paper may be related to regularized submodular optimization, which involves optimizing a submodular function that is regularized by subtracting a linear or convex function [1][2].
[1] Kazemi, Ehsan, et al. "Regularized submodular maximization at scale." International Conference on Machine Learning. PMLR, 2021.
[2] Cui, Shuang, et al. "Constrained subset selection from data streams for profit maximization." Proceedings of the ACM Web Conference 2023. 2023.
Essential References Not Discussed
Please refer to the above.
Other Strengths and Weaknesses
N.A.
Other Comments or Suggestions
N.A.
Thank you for your positive review and helpful feedback. We address below your comments and questions.
1- Results on the proposed algorithm's running time
We report the average running time of the compared methods on the integer least squares experiment (Section 5.1) in Figure 4. The reported times for DCA and ADMM do not include the time for the RAR initialization, and those for OBQ do not include the time for the relaxed solution initialization (which takes a similar time to RAR). We observe that our algorithm is significantly faster than the optimal Gurobi solver, even for the small problem instance (). For the larger instance (), Gurobi is already too slow to be practical. While our algorithm is slower than the heuristic baselines, it remains efficient, with a maximum runtime of seconds for . We will include these plots and this discussion in the revision.
The primary focus of our experiments was to demonstrate that our algorithm outperforms state-of-the-art baselines on challenging problems, rather than to optimize its computational efficiency. For example, the most computationally intensive part of our algorithm is computing subgradients of the continuous extensions, which requires evaluating successive marginals $F^t(y^{i-1} + e_{p_i}) - F^t(y^{i-1})$. In our current implementation, we compute these marginals by making separate calls to $F^t$. This can be significantly sped up by reusing computation from evaluating $F^t(y^{i-1})$ when computing $F^t(y^{i-1} + e_{p_i})$.
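To illustrate the kind of speedup we mean, below is a minimal Python sketch (our own illustration, not the paper's code), using the integer least squares objective $F(y) = \|Ay - b\|^2$ from Section 5.1 as a stand-in: maintaining the running residual $Ay - b$ turns each marginal evaluation into a single column update instead of a full function evaluation.

```python
import numpy as np

def marginals_naive(A, b, y0, perm):
    # One separate call to F per point visited, O(mn) each.
    F = lambda y: np.sum((A @ y - b) ** 2)
    y, out = y0.astype(float).copy(), []
    for p in perm:
        prev = F(y)
        y[p] += 1.0
        out.append(F(y) - prev)
    return out

def marginals_incremental(A, b, y0, perm):
    # Reuse the residual r = A y - b: each marginal costs O(m).
    r, out = A @ y0 - b, []
    for p in perm:
        a = A[:, p]                    # y -> y + e_p shifts the residual by a
        out.append(2 * a @ r + a @ a)  # ||r + a||^2 - ||r||^2
        r = r + a
    return out
```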
2- Relation to regularized submodular optimization [1][2]
The problem studied in [1][2] is that of maximizing the difference between a submodular set function and a modular set function subject to constraints. Our problem setup does not consider constraints, but the unconstrained version of that problem is indeed a special case of DS set function minimization. We will cite works on unconstrained regularized submodular set function optimization in the revision.
Submodular functions are commonly studied as set functions, which can be viewed as functions defined on the vertices of the hypercube $\{0,1\}^n$. This paper, however, similar to some prior literature, considers an extension of submodularity, where functions are defined over Cartesian products of compact subsets of $\mathbb{R}$. The authors study the problem of minimizing the difference of these generalized submodular functions. For discrete domains, they propose an algorithm to solve DS minimization via a reduction to integer lattice domains, followed by their variant of DCA (the difference of convex functions algorithm), which converges to a local minimum. They claim that with discretization, the same method applies to continuous domains as well.
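For readers less familiar with the set function case that this work generalizes, here is a minimal sketch (our own illustration, assuming the standard textbook definition) of evaluating the classical Lovász extension of a set function $F:\{0,1\}^n \to \mathbb{R}$; the paper's continuous extension plays the analogous role on integer lattices.

```python
import numpy as np

def lovasz_extension(F, x):
    # Sort coordinates in non-increasing order and take the telescoping
    # sum of marginals along the resulting chain of sets.
    order = np.argsort(-np.asarray(x, dtype=float))
    S = np.zeros(len(x), dtype=bool)
    prev = F(S)            # F(empty set)
    val = prev
    for p in order:
        S = S.copy()
        S[p] = True
        cur = F(S)
        val += x[p] * (cur - prev)
        prev = cur
    return val

# Example: the (submodular) square-root-of-cardinality function.
F = lambda S: np.count_nonzero(S) ** 0.5
print(lovasz_extension(F, [0.2, 0.9, 0.5]))
```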
Questions for Authors
To my understanding, the paragraph Computational Complexity in Section 4 discusses the complexity of your DCA algorithm. Could you please specify the theoretical guarantees of your DS algorithm as a whole?
Claims and Evidence
Claims like "The results can be easily extended to unequal $k_i$'s" need some explanation.
Methods and Evaluation Criteria
Overall, the methods and evaluation criteria seem fine.
This paper builds upon "Difference of Submodular Minimization via DC Programming" by El Halabi et al., so it should include a clear comparison between its DCA variant and those of El Halabi et al. in this setting.
Theoretical Claims
The main body of the paper includes only proof sketches and high-level ideas. They seemed sound, but I was not able to verify their correctness.
Experimental Design and Analyses
The experiments compare their proposed algorithm on the integer least squares problem (a special case of quadratic programming) and the integer compressive sensing problem (a special case of sparse learning) with appropriate baselines for the respective problem.
Supplementary Material
I have looked through the supplementary material, but I have not examined it closely.
Relation to Existing Literature
This paper extends the result of the paper "Difference of Submodular Minimization via DC Programming" by El Halabi et al. (2023) to the setting proposed in the paper "Submodular functions: from discrete to continuous domains" by Bach.
Essential References Not Discussed
.
Other Strengths and Weaknesses
The paper is hard to follow.
This work appears to build on established ideas without introducing a significant departure from prior research.
Other Comments or Suggestions
.
Thank you for your valuable feedback. We address below your comments and questions.
1- Explain the claim "The results can be easily extended to unequal $k_i$'s"
The extension to unequal $k_i$'s follows directly, though the notation becomes more cumbersome. The key modifications are:
- The relaxed domain (i.e., the domain of the DC program in Eq. (11)) changes to .
- The map is adjusted to map from to , with a similar definition.
- The continuous extension is now defined on , and the summation in Eq. (6) runs from to .
2- Comparison with the DCA variants of El Halabi et al. (2023)
Please refer to our response to Reviewer 3rHC (3rd item), where we include a clear comparison between the DCA variants.
3- The paper is hard to follow.
We understand that the technical overhead may be challenging for readers, and we are happy to revise the paper to improve accessibility. Could you clarify which parts were hard to follow and suggest areas where clarity could be improved?
4- This work appears to build on established ideas without introducing a significant departure from prior research.
All reviewers seem to agree that the problem we address is interesting and well motivated, and that our results are valuable. We believe that achieving our results by extending existing results in a relatively simple way should not be seen as a weakness.
That said, we would like to clarify that while our results build on existing work, they did require novel ideas and proofs. In particular, the main challenges are the following:
- One factor that makes our results appear “straightforward” is the use of simpler and cleaner notation. For example, we represent the input of the continuous extension as a matrix , instead of a list of vectors as in Bach and Axelrod. This change significantly simplified the presentation of the results.
- While the proof that any function on a discrete domain is a DS function (Proposition 3.1) is a straightforward extension of the set function result in (Iyer & Bilmes, 2012), the analogous result for continuous domains (Proposition 3.3) required leveraging an alternative definition of submodularity and relating it to function smoothness.
- Extending DCA (whether our variant or those in (El Halabi et al., 2023)) from set functions to general discrete functions is non-trivial. It required restricting the non-increasing permutation of to be row-stable, to ensure that it's a non-increasing permutation of too. This is essential to guarantee local minimality (see proof of Theorem 4.5-b). This is not needed in the set function case (), where any non-increasing permutation of is row-stable.
- Our proposed DCA variant (Algorithm 1) introduces a novel approach for selecting the subgradient using the local search step (lines 3-4), which ensures direct convergence to an approximate local minimum. This improves performance in some settings compared to the variant (extended to general discrete domains) proposed in (El Halabi et al., 2023). For further details, please see our response to Reviewer 3rHC (3rd item).
- Identifying important applications (Section 3.2) that have natural DS decompositions but are not DC, and do not have easy discrete DC decompositions, was also non-trivial. It required exploiting properties of DC functions (e.g., in Proposition A.1) and discrete DC functions (see Section A.2).
- Showing that Lipschitz continuity is not necessary for bounding the discretization error for continuous domains, as in the case of the -norm (Proposition G.2), is novel and its proof is non-trivial. We believe that this result could be generalized to other similar functions.
5- Clarification on Theoretical Guarantees and Computational Complexity of DS Algorithm
Our full DS algorithm is presented in Algorithm 1. Its theoretical guarantees are given in Theorem 4.5, and its computational complexity is discussed in the corresponding paragraph in Section 4.2. The algorithm applies DCA to the DC Problem (11) using the DC decomposition given at the beginning of the paragraph "Algorithm", while also maintaining iterates that are solutions to the original DS Problem (1). The relaxed iterates are updated with DCA updates (see Eq. (9)), while the iterates for the original problem are obtained by rounding (line 7). As explained, the rounding step can be skipped if the solution of the subproblem is integral.
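For concreteness, a schematic sketch of this loop is below. This is our own simplification, not Algorithm 1 verbatim; `subgrad_h`, `solve_convex_subproblem`, and `round_to_lattice` are hypothetical placeholders for the paper's subgradient selection (lines 3-4), submodular minimization subproblem, and rounding step (line 7).

```python
def dca_with_rounding(g, h, subgrad_h, solve_convex_subproblem,
                      round_to_lattice, y0, max_iters=100, tol=1e-8):
    f = lambda y: g(y) - h(y)          # relaxed DC objective
    y = y0
    x = round_to_lattice(y)            # iterate for the original DS problem
    for _ in range(max_iters):
        s = subgrad_h(y)                       # linearize the concave part -h at y
        y_new = solve_convex_subproblem(g, s)  # argmin_y g(y) - <s, y>
        x = round_to_lattice(y_new)            # skippable if y_new is integral
        if f(y) - f(y_new) <= tol:             # monotone decrease; stop when flat
            y = y_new
            break
        y = y_new
    return x, y
```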
This paper considers the minimization of a difference of submodular functions (DS) in both the continuous and discrete domains. Unlike the submodular minimization problem, which can be solved in polynomial time, this problem cannot even be approximated efficiently. This paper accomplishes two main things: (i) it shows how broad the class of DS functions is by proving that many existing functions can be represented this way, although it is computationally hard to find a representation, and (ii) it provides an algorithm and proves that the algorithm returns an approximate local minimum (even finding a true local minimum is computationally hard). The algorithm builds upon the fact that the continuous extension of DS functions via their Lovász extension is a difference of convex functions, for which the DCA algorithm exists. This paper proposes a modified variant of the DCA algorithm. Finally, they include an experimental section where they compare their algorithm to existing alternatives on two different applications.
Questions for Authors
None
Claims and Evidence
Yes
Methods and Evaluation Criteria
Yes
Theoretical Claims
I did not thoroughly check correctness, but I did not notice any issues.
Experimental Design and Analyses
Yes, the experiments looked reasonable to me.
Supplementary Material
No
Relation to Existing Literature
This paper is related to several existing related works that have studied special cases of this problem, e.g. Narasimhan and Bilmes [2005]. This work is of interest to the submodular optimization community.
Essential References Not Discussed
I did not notice any missing essential references.
Other Strengths and Weaknesses
Strengths
- I think their problem statement is interesting, and even though it's minimization I think it is related to submodular maximization applications as well. In particular, I wonder if summarization objectives commonly found in submodular maximization papers where there is a diversity penalty can also be viewed in this sort of problem statement.
- The problem is computationally challenging, but they were able to develop an algorithm that finds an approximate local optimum efficiently. This is an interesting type of algorithm, and maybe could be applied more broadly.
- Their result is connected with convex optimization, and they build upon the existing DCA algorithm.
- They provided an experimental section.
Weaknesses
- The algorithm may not be extremely novel, but this is fine in my opinion.
- Some of the writing, in particular in the introduction, was a bit hard to interpret. E.g. "developed algorithms for this special case, that monotonically decrease the objective value" I did not understand when reading the introduction.
Other Comments or Suggestions
None
Thank you for your positive review. We will improve the writing of the introduction.
The minimization of a difference of submodular functions is studied, in a discrete and continuous setting. The discrete setting is more like a lattice that generalizes set optimization. A variant of the DC algorithm is developed that uses local search ideas. Experiments are performed to validate the algorithm.
Questions for Authors
- See strengths and weaknesses.
- Is the algorithm of El Halabi et al. (2023) compared, either theoretically or empirically? As this is also a variant of DCA, shouldn't it be a natural baseline?
Claims and Evidence
Yes.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
I did not check. All proofs are relegated to appendix or results from other works.
Experimental Design and Analyses
Yes. See strengths and weaknesses.
Supplementary Material
No.
Relation to Existing Literature
Generalizes DS set function optimization to discrete domains of R^n. For continuous domains, there is a work of El Halabi et al. (2023). I'm unsure how the results in this paper are related to the former, although they do cite and discuss.
Essential References Not Discussed
No.
Other Strengths and Weaknesses
Strengths:
- Exposition is fairly easy to follow, even for someone unfamiliar with the area. Background and preliminary information goes until page 4. However, see weaknesses.
- The problem setup is very general, which makes it difficult. Some natural applications are given, which motivate the work well.
- The experimental results are convincing, although perhaps a natural baseline was left out. See weaknesses.
Weaknesses:
- Although much background information relevant to the problem studied is presented, I found it difficult to get a good idea of what is novel to this paper, and what is challenging about the variant that is studied. Such a good exposition of context is given that it becomes difficult to separate the contribution from the context. I think the algorithm developed (a DCA variant) is interesting, but I don't know enough to really judge its distance from standard DCA. The authors did not explain clearly what is different.
- I didn't understand why the reduction discussed on line 092 for DR-submodular functions could not be extended to the case of DS functions on discrete domains, with existing techniques from DS set optimization then applied. A related issue is the reduction mentioned on line 317. The authors state that this reduction (line 317) is more expensive, but no justification is given in the main text. Also, I don't think the authors compared with this approach experimentally.
Other Comments or Suggestions
- Line 208: funcion -> function
- Line 167: us -> use
- The writing style is generally fine but it is a little informal. I kind of like that, because it means it wasn't written by an AI. But still, there were multiple examples of incomplete sentences (often starting with "Though"). Though is an informal shortening of although, and starts a subordinate clause that needs to be paired with a main clause. As in: "Although I like cats, I like dogs better."
Thank you for your valuable feedback. We address below your questions.
1- Distinguishing novel contributions from background
Our main contributions are outlined in the introduction (lines 61-74, 1st col). The DS minimization problem over general discrete and continuous domains (Problem 1) has not been studied before. Prior work only considered special cases, where is a set function (), or is submodular (). DS minimization is significantly harder than submodular minimization, which is solvable in polynomial time, whereas even finding a local minimum or a polynomial approximation factor for DS minimization is provably hard.
Standard DCA (Eq. 9) is not guaranteed to converge to an approximate local minimum, even for set functions (Section 4.2, lines 324-333). Our DCA variant (Algorithm 1) differs by maintaining iterates for the original problem , obtained by rounding the iterates of the relaxed problem , and by carefully choosing a subgradient which ensures convergence to an approximate local minimum. For further discussion on the technical novelty of our results, see our response to Reviewer eVr5 (4th item).
2- Extension of the DR submodular reduction to DS functions & Cost of the submodular set function reduction and empirical comparison
The reduction by Ene & Nguyen (2016) (line 092) applies only to DR-submodular functions, a subclass of submodular functions that are concave along nonnegative directions. For a DS function , the submodular components and are not necessarily DR-submodular. This reduction cannot be readily extended to general submodular functions: it relies on a carefully chosen map , where , to construct an equivalent submodular set function whose domain scales with . When the function is not DR-submodular, this map does not preserve submodularity.
We discuss the general reduction for submodular functions (line 317) in Appendix B and compare its empirical performance. Our results show that the reduction approach is indeed slower than the direct approach used by Bach and us. See also our response to Reviewer Ecvy (6th item) for a discussion of the theoretical complexity of the two approaches. We will add a brief explanation in the main text.
3- Relation to (El Halabi et al., 2023) & Theoretical and empirical comparison with their DCA variants
El Halabi et al. (2023) study a special case of our DS minimization problem where is a set function (); they do not consider continuous domains.
They proposed two DCA variants: one is infeasible, as it requires trying subgradients per iteration, while the other (Algorithm 2 therein) restarts from the best neighboring point at convergence if the solution is not an approximate local minimum. Both can be extended to general discrete domains similarly to our DCA variant; the main non-trivial change is the restriction of the permutation to be row-stable. Our DCA variant (Algorithm 1) instead selects a single subgradient using a local search step (lines 3-4), ensuring direct convergence to an approximate local minimum without restarts.
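To make the distinction concrete, here is a schematic sketch (hypothetical helper names, not either paper's exact pseudocode) of the restart strategy: DCA-Restart checks approximate local minimality only after DCA converges and restarts from the best neighbor if the check fails, whereas our DCA-LS performs the analogous local search inside each iteration to pick the subgradient.

```python
def dca_restart_outer(F, run_dca_to_convergence, neighbors, x0):
    # Outer loop of the restart variant: run DCA, then test local minimality.
    x = x0
    while True:
        x = run_dca_to_convergence(x)
        x_nb = min(neighbors(x), key=F)   # best point in the neighborhood
        if F(x_nb) >= F(x):               # approximate local minimum reached
            return x
        x = x_nb                          # otherwise restart from the best neighbor
```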
Our theoretical guarantees (Theorem 4.5) generalize those of (El Halabi et al., 2023) to general discrete domains, recovering the same guarantees in the set function case.
We empirically compared our DCA variant (DCA-LS) to an extension of the more efficient DCA variant from El Halabi et al. (2023) (DCA-Restart) on all experiments included in the paper. We report their performance on integer least squares (ILS) in Figure 5 and running times in Figure 6. Similarly, Figure 7 and Figure 8 show their performance and running times on integer compressive sensing (ICS). We plot running times for all values at and . DCA-LS matches or outperforms DCA-Restart on all experiments. The two variants perform similarly when initialized with a good solution (LASSO in ICS, RAR in ILS), otherwise DCA-LS performs better, sometimes by a large margin. In terms of runtime, DCA-Restart is faster on ILS and for some values in ICS, e.g., , but slower for others, e.g., . Thus, the choice between the two variants is problem dependent.
We will revise the paper to include these results and adjust the claim on lines 343-344 (1st col), which originally stated that our DCA variant is more efficient than those in (El Halabi et al., 2023); this is only true for one of them. The claim was based on an earlier, less extensive comparison on an ICS experiment, where DCA-LS was both faster and performed better than DCA-Restart.
The paper investigates the minimization of difference-of-submodular (DS) functions over both discrete (products of finite sets) and continuous (products of intervals) domains. The authors establish that every function on a discrete domain and every smooth function on a continuous domain admits a DS decomposition.
For DS minimization over discrete domains, they show that the problem can be reduced to the case of having an integer lattice as the domain. In the continuous setting, they approximate the problem by discretization and then reduce it, approximately, to the integer lattice case. The key result is that DS minimization on an integer lattice is equivalent to minimizing a continuous extension (generalizing the Lovász extension), enabling the use of a difference-of-convex (DC) algorithm. This algorithm monotonically decreases function values and converges to a local minimum.
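As a minimal illustration of the discretization step (our own sketch, assuming a uniform grid; the paper may choose grid points differently), a function on a box $\prod_i [a_i, b_i]$ can be approximated by its restriction to the lattice $\{0, \ldots, k\}^n$:

```python
import numpy as np

def make_discretized(F_cont, a, b, k):
    # Restrict F_cont on the box prod_i [a_i, b_i] to a uniform grid
    # indexed by lattice points z in {0, ..., k}^n.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    def F_lattice(z):
        x = a + (b - a) * np.asarray(z, dtype=float) / k
        return F_cont(x)
    return F_lattice
```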
The authors validate their approach with experiments on integer least squares and integer compressive sensing, demonstrating improved performance over state-of-the-art methods.
Questions for Authors
- Can you think of any theoretical justification for why translating the DS problem directly to a DC problem works better than reducing first to the set function case?
Claims and Evidence
The main claims of the paper are:
- Any function on a discrete domain and any smooth function on a continuous domain can be expressed as a DS function.
- One can (approximately, in the continuous case) reduce the problem to an integer lattice domain.
- Minimizing a DS decomposition over an integer lattice domain is equivalent to minimizing a continuous extension.
- The algorithm computes an approximate local minimum.
- Applying a DC algorithm to integer least squares and integer compressive sensing improves performance over existing methods.
The first three claims are theoretical and well-supported. The paper also discusses the computational complexity and the theoretical guarantees of the algorithm’s output, which appear to be correct. The experiments further support the empirical claims.
Below are some minor issues:
- It is not entirely clear whether the theoretical guarantees extend to a DS function over a continuous domain after discretization. While the single results put together suggest this should be the case, the paper would benefit from a formal theorem stating: The algorithm finds a local optimum with guarantee G in time T for the class of DS functions C.
- The paper asserts that finding the best DS decomposition is generally infeasible, but it does not define what constitutes the "best" decomposition, nor does it provide strong support for this claim. The authors could strengthen this point by referencing literature.
- After Proposition 3.1, the paper states: "Obtaining tight lower bounds α and β in the above proof requires exponential time in general" and "Note that loose bounds on α and β would degrade performance." Both claims lack supporting argumentation. A brief explanation or reference would help clarify these statements.
- On line 241 (right), the sentence "We show in Appendix A.2 that both applications do not have a natural discrete DC decomposition." seems overly strong, as the authors only prove that a specific decomposition is not DC. A more precise formulation would improve accuracy.
Methods and Evaluation Criteria
The chosen problems (integer least squares and integer compressive sensing) to evaluate the performance of the proposed algorithm and the selected metrics (recovery probability and estimation error) appear reasonable. The competing algorithms seem to represent the state of the art, but I am not deeply familiar with the field or these specific methods.
Theoretical Claims
The theoretical results appear sound and well-supported, and the proof sketches are reasonable. However, I did not verify all the proofs in detail.
Experimental Design and Analyses
I read the experimental setup in the main paper and it appeared sound to me. But I did not validate any technical details, and I am an expert in neither the problems nor the methods.
Supplementary Material
I read through the appendix, but did not examine the technical details closely. I reviewed Appendices A and B more carefully.
Relation to Existing Literature
The paper is well-grounded in the existing literature and acknowledges prior results and ideas upon which it builds.
Essential References Not Discussed
As discussed in the Claims and Evidence section, the paper asserts that finding the best DS decomposition is generally infeasible, but it does not connect this claim to existing literature. The authors could reference works such as Decomposition Polyhedra of Piecewise Linear Functions (https://arxiv.org/abs/2410.04907), which explores the complexity of DS and DC decompositions.
Other Strengths and Weaknesses
I appreciate the idea of relating general DS functions to a DC function, and it is interesting that this approach seems to work better in practice than reducing the problem to the set function case and minimizing the Lovász extensions via a DC algorithm, as illustrated in the appendix. This, together with the experiments showing better performance than existing methods for DS functions, makes the paper a valuable contribution.
From a theoretical perspective, the results don’t seem particularly profound, as they appear to be a straightforward extension of existing knowledge.
Other Comments or Suggestions
There are statements in the main paper where the proofs are deferred to the appendix, but no links are provided to direct the reader to the relevant proofs. As a result, readers must search through the appendix to find the proofs, even for key results such as Propositions 4.4 and 4.5. Including hyperlinks would make the paper more reader-friendly.
Further suggestions:
- It would be helpful to define what a subgradient is.
- For the definition of round, it could be clarified that the goal is to obtain the vector, not the index. This is only clear from the context, but the definition itself is somewhat ambiguous.
- Providing some intuition and perhaps including a small illustration of an epsilon-subdifferential would be beneficial.
- A sentence or two following Proposition 2.5, explaining the implications of the statement on the theoretical guarantees of the algorithm, would make it easier to understand its significance.
Thank you for your positive review and helpful feedback. We address below your comments and questions.
1 - Extension of theoretical guarantees to continuous domains
Theorem 4.5 extends to continuous domains as follows:
Let be defined as in Section 4.1, i.e., where for some . Let , where are Algorithm 1's iterates for . Then:
a)
b) At convergence: for all . In particular, .
c) Number of iterations is at most .
However, the second guarantee in (b) is meaningful only if , as Lipschitz continuity of already implies . We will clarify this in the paper.
2- Define "best" DS decomposition and clarify its infeasibility
By the best DS decomposition, we meant the one with the tightest or in the proofs of Propositions 3.1 and 3.3. On lines 243-245 (1st col), we cite a reference showing that computing a tight is exponentially hard. Computing the tightest also has exponential complexity, even for set functions. Indeed, we can test whether is submodular, which has exponential complexity (Seshadhri & Vondrák, 2014), by computing and checking if . We will clarify this in the revision.
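As a simple illustration of why such checks are exponential, here is a brute-force submodularity test for a set function (our own sketch, not the paper's procedure): it verifies $F(A \cup B) + F(A \cap B) \le F(A) + F(B)$ over all pairs of subsets, i.e., $4^n$ inequalities.

```python
from itertools import combinations

def is_submodular(F, n, tol=1e-12):
    # Enumerate all 2^n subsets of {0, ..., n-1}, then check every pair.
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(range(n), r)]
    for A in subsets:
        for B in subsets:
            if F(A | B) + F(A & B) > F(A) + F(B) + tol:
                return False
    return True
```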
Like DC functions, DS functions have infinitely many DS decompositions. Finding the "best" one is an even more difficult question, as it's unclear how to define "best". Thank you for the suggested reference, we will cite it.
Seshadhri, C., & Vondrák, J. (2014). “Is submodularity testable?” Algorithmica, 69(1), 1-25.
3- Impact of loose bounds on and
Looser bounds on and lead to a DS decomposition where the continuous extensions of and have larger Lipschitz constants, slowing down optimization. As discussed in Section 4.2 ("Computational complexity"), the runtime of Algorithm 1, particularly for solving the submodular minimization at each iteration , depends on the Lipschitz constant* of , the continuous extension of . Here, is a modular approximation of satisfying . We can lower bound by the Lipschitz constant of : . For example, for a non-decreasing function and , taking gives . Thus, a larger yields a larger . We will clarify this in our revision.
*Typo on line 373:
4- Revise the sentence on line 241 (right) to be more accurate.
We will revise the sentence to: "We show in Appendix A.2 that the natural DS decomposition in both applications is not a discrete DC decomposition as defined in (Maehara & Murota, 2015) and cannot be easily adapted into one for general discrete domains, even when ignoring the integer-valued restriction."
5- Theoretical results seem straightforward extensions of existing work
See our response to Reviewer eVr5 (4th item), where we clarify the novel aspects of our theoretical results.
6- Why is the direct approach better than set function reduction?
As discussed in (Bach, 2019, section 4.4), reducing a submodular function to a submodular set function leads to slower optimization due to the larger Lipschitz constant of the continuous extension of . Specifically, while we can bound , the bound for is times larger, i.e., , where is defined in Eq. (26) in Appendix B.
This affects submodular minimization algorithms in both (Bach, 2019) and (Axelrod et al., 2020), as their complexity scales with the Lipschitz constant. For example, Bach's algorithms have complexity when applied directly to . Using the reduction to , this increases by the same factor.
Thank you for your other suggestions. We will incorporate them in our revision.
This paper studies how to minimize a difference of submodular functions on discrete and continuous domains that go beyond the standard set function case. The authors provide a DC-type algorithm, which is natural for this problem, but nevertheless novel. The algorithm is analyzed theoretically and empirically. Most of the reviewers agree that this is a solid contribution to ICML, with the more knowledgeable reviewers being the more positive ones. I therefore support this sentiment and recommend acceptance. I urge the authors to take reviewer feedback seriously and revise the paper accordingly.