The Space Complexity of Approximating Logistic Loss
Abstract
Reviews and Discussion
This paper gives an $\Omega(d\,\varepsilon^{-2})$ lower bound for coresets of points/labels that preserve the logistic loss. Such a bound is similar to the bounds obtained in some recent upper-bound results, and the paper experimentally verifies that a natural complexity measure is likely needed for constructing such coresets. The paper also gives an algorithm for estimating this complexity measure.
Strengths
Logistic loss functions are widely used in a multitude of learning tasks, and coresets have been studied for a variety of numerical/optimization problems.
The bounds shown are likely the right ones, and significantly demystify the complexity measure.
Weaknesses
The results don't cover the 'for this solution' sparsification, which is weaker than the 'for all' sparsification but is still sufficient in many applications. However, that version will likely require significantly different approaches.
Questions
Are there wider classes of functions that the approach for showing coreset lower bounds here can generalize to?
Limitations
yes
The results don't cover the 'for this solution' sparsification, which is weaker than the 'for all' sparsification but is still sufficient in many applications. However, that version will likely require significantly different approaches.
This is a great point, and we agree with the reviewer: it is not obvious how to prove the proposed guarantee, and it will probably require new approaches. In the final version, we will mention this as an open problem for future work.
Are there wider classes of functions that the approach for showing coreset lower bounds here can generalize to?
Yes, our bounds also apply to approximations of the ReLU loss, as our proof operates by first showing that an $\varepsilon$-relative error approximation to logistic loss provides an $\varepsilon$-relative error approximation to ReLU loss. It seems likely that our lower bound would apply to the class of "hinge-like loss functions" defined by Mai et al. (Definition 7 in their paper).
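For intuition, the reduction rests on a standard scaling fact (our paraphrase, not the paper's exact lemma): the logistic loss is a uniformly close upper bound on ReLU, and rescaling makes the gap vanish:

$$\max(0, x) \;\le\; \log(1 + e^{x}) \;\le\; \max(0, x) + \log 2, \qquad \frac{1}{t}\log\bigl(1 + e^{t x}\bigr) \xrightarrow{\;t \to \infty\;} \max(0, x).$$

Hence a relative error estimate of the (suitably rescaled) logistic loss determines the ReLU loss to comparable relative accuracy.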
Thank you for these clarifications and further details. My opinions on the paper are unchanged.
The paper mainly gives lower bounds against any data compression for logistic regression, parameterized by a complexity measure $\mu_y(X)$. They prove $\Omega(d\,\varepsilon^{-2})$ as well as $\Omega(d\,\mu_y(X))$ lower bounds. Both are derived by reduction from the indexing problem and a direct sum argument to extend the bounds by the $d$ factor.
There are also new algorithms for computing $\mu_y(X)$ via a linear program, and additive error bounds for low-rank approximation for logistic regression. I found the latter more interesting than the former, but it is completely hidden in the appendix.
Strengths
- nearly tight lower bounds against data compression for logistic regression
- hold against arbitrary data reduction techniques, not only coresets
- well written paper
Weaknesses
- state-of-the-art discussion on upper bounds is slightly outdated: updating to the upper bounds of https://arxiv.org/abs/2406.00328 will only strengthen the contributions.
- the $\Omega(d\,\mu_y(X))$ bound is only derived from a bound for a specific setting of $d$
Questions
- any chance the two bounds can be combined to an $\Omega(d\,\mu_y(X)\,\varepsilon^{-2})$ bound?
- if not, then do you think $d\,(\mu_y(X) + \varepsilon^{-2})$ might be possible?
- any chance the bound can be generalized to hold for arbitrary (or some range of) $d$?
- to apply the direct sum argument over the indexing problems on the block diagonal matrix, is it not needed to apply some sort of union bound over the failure probability of each of the $d$ indexing instances? If yes, is this possible in your reduction while losing only a $\log d$ factor, or will the resulting bound only hold against $1/d$ failure probability?
Limitations
see above
state-of-the-art discussion on upper bounds is slightly outdated: updating to the upper bounds of https://arxiv.org/abs/2406.00328 will only strengthen the contributions.
Thank you for pointing out this new result. This arXiv submission appeared after the NeurIPS deadline, so we were not aware of this paper. We will update our paper to compare against this tighter upper bound; indeed, this strengthens the contributions of our work, and we appreciate your comment and suggestion.
the $\Omega(d\,\mu_y(X))$ bound is only derived from a bound for a specific setting of $d$
Our proof of the space complexity lower bound involving $\mu_y(X)$ immediately extends to all values of $d$, by the fact that we can always pad with zeroes.
any chance the two bounds can be combined to an $\Omega(d\,\mu_y(X)\,\varepsilon^{-2})$ bound?
We believe that our two lower bounds will be informative for achieving an $\Omega(d\,\mu_y(X)\,\varepsilon^{-2})$ bound. However, we did not find a way to readily bridge this gap. Both lower bounds work by reduction from the indexing problem, but in quite different ways. Intuitively, the difficulty is that the non-linearity of ReLU is used in the two proofs in different ways. Unfortunately, constructing a multiplicative trade-off between the two constructions is not straightforward.
if not, then do you think $d\,(\mu_y(X) + \varepsilon^{-2})$ might be possible?
Yes, this lower bound can be derived from our two provided lower bounds, by the fact that any approximation to the logistic loss on a stacked matrix $X = [X_1; X_2]$ must also provide an equally accurate approximation to the logistic loss of any query vector on either matrix $X_1$ or $X_2$. Hence, we obtain the $\Omega(d\,(\mu_y(X) + \varepsilon^{-2}))$ lower bound you have mentioned by letting $X_1$ be constructed from Theorem 1 and $X_2$ be constructed from Theorem 3.
Any chance the bound can be generalized to hold for arbitrary (or some range of) $d$?
By using our construction and padding with zeroes as described above, the lower bound can be immediately extended to all $d$.
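To make the padding step concrete (our gloss of the argument, not the paper's exact statement): appending zero columns changes neither the loss landscape nor the complexity parameter, so a lower bound proven in dimension $d_0$ holds verbatim in every dimension $d \ge d_0$:

$$X' = [\,X \;\; 0\,] \in \mathbb{R}^{n \times d}, \qquad L(\beta'; X', y) = L(\beta'_{1:d_0}; X, y), \qquad \mu_y(X') = \mu_y(X).$$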
to apply the direct sum argument over the indexing problems on the block diagonal matrix, is it not needed to apply some sort of union bound over the failure probability of each of the $d$ indexing instances? If yes, is this possible in your reduction while losing only a $\log d$ factor, or will the resulting bound only hold against $1/d$ failure probability?
Since we are going for a lower bound, the probabilistic nature of the guarantees actually works in our favor. A relative error approximation to the logistic loss on the block diagonal matrix $X$ simultaneously provides the relative error guarantee for each of its blocks $X_i$. Therefore, obtaining a relative error approximation for $X$ with probability $p$ requires that each $X_i$ is approximated with probability at least $p$ (where $p$ is the success probability of estimating the loss on $X$, to be precise).
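In other words (our restatement of the argument above): the event that the approximation is valid on all of $X$ is contained in the event that it is valid on any single block, so no union bound over the $d$ instances is ever invoked:

$$\{\text{valid on } X\} \subseteq \{\text{valid on } X_i\} \;\Longrightarrow\; \Pr[\text{valid on } X_i] \ge \Pr[\text{valid on } X] = p \quad \text{for all } i \in [d].$$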
Thank you for the rebuttal, I like the paper and will keep my score, which is already on the positive side.
Two more comments on your rebuttal:
- I think Woodruff, Yasuda, 2023 had a more 'natural' way of providing LBs for different values of $d$ rather than padding with zeros, which you might want to consider
- your comment regarding $d\,(\mu_y(X) + \varepsilon^{-2})$ is clearly straightforward from the two bounds. Actually, I was asking whether you believe in an $O(d\,(\mu_y(X) + \varepsilon^{-2}))$ upper bound (in case there are principled difficulties for achieving $\Omega(d\,\mu_y(X)\,\varepsilon^{-2})$)
I think Woodruff, Yasuda, 2023 had a more 'natural' way of providing LBs for different values of $d$ rather than padding with zeros, which you might want to consider
Regarding Woodruff/Yasuda 2023: is the reviewer referring to extending the fixed-$d$ case to general $d$ for the $\Omega(d\,\mu_y(X))$ bound using Woodruff/Yasuda 2023 rather than our padding technique? We agree that this is indeed worth exploring. However, it is not obvious to us (at least within this short time window) whether this is possible. We will keep thinking about this, and we very much appreciate the reviewer's suggestion. Just as a final note, this would not alter the final bounds, and our method remains valid and correct.
your comment regarding $d\,(\mu_y(X) + \varepsilon^{-2})$ is clearly straightforward from the two bounds. Actually, I was asking whether you believe in an $O(d\,(\mu_y(X) + \varepsilon^{-2}))$ upper bound (in case there are principled difficulties for achieving $\Omega(d\,\mu_y(X)\,\varepsilon^{-2})$)
From our perspective, it still seems feasible that an $\Omega(d\,\mu_y(X)\,\varepsilon^{-2})$ lower bound holds, and the difficulty stems from the technical complexity of combining the two reductions smoothly.
One intuitive insight related to our proofs is that the parameter $\mu_y(X)$ describes how much better an $\varepsilon$-relative error approximation to the ReLU loss (which can be approximated by logistic loss) is than an $\varepsilon$-relative error approximation to the $\ell_1$-norm on worst-case vectors, in absolute terms. This is because the ReLU loss can be approximately a $1/\mu_y(X)$ fraction of the $\ell_1$-norm. However, this relation cannot hold for all vectors simultaneously. This cannot be immediately combined with the $\ell_1$-norm lower bound, but may be useful to note in pursuing either an upper bound or a lower bound.
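To spell out the relation above (a sketch assuming a Munteanu et al.-style definition of $\mu_y(X)$, with $Z$ denoting the label-scaled data matrix and $v = Z\beta$):

$$\mu_y(X) = \sup_{\beta \neq 0} \frac{\|(Z\beta)^+\|_1}{\|(Z\beta)^-\|_1}, \qquad \|(Z\beta)^+\|_1 \;\ge\; \frac{\|Z\beta\|_1}{1 + \mu_y(X)},$$

since $\|v\|_1 = \|v^+\|_1 + \|v^-\|_1$ and $\|v^-\|_1 \le \mu_y(X)\,\|v^+\|_1$ (apply the definition to $-\beta$). On worst-case vectors the inequality is nearly tight, so the ReLU loss can indeed be roughly a $1/\mu_y(X)$ fraction of the $\ell_1$-norm.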
Thank you, I appreciate the additional comments.
The paper explores space complexity lower bounds for data structures that approximate logistic loss with $\varepsilon$-relative error in logistic regression problems. The authors provide an $\Omega(d\,\varepsilon^{-2})$ space complexity lower bound for datasets with constant $\mu_y(X)$-complexity, demonstrating that existing coreset constructions are optimal within this regime up to lower order factors. They also establish a general $\Omega(d\,\mu_y(X))$ space lower bound for constant $\varepsilon$, highlighting that the dependency on $\mu_y(X)$ is intrinsic and not an artifact of mergeable coresets. Additionally, the paper refutes a prior conjecture regarding the difficulty of computing $\mu_y(X)$ by presenting an efficient linear programming formulation. Empirical comparisons show that the proposed algorithm for computing $\mu_y(X)$ is more accurate than prior approximate methods. These contributions enhance the understanding of the fundamental limits and efficiencies in approximating logistic loss, providing insights that can guide future improvements in coreset constructions and related data structures.
Strengths
The paper provides an $\Omega(d\,\varepsilon^{-2})$ space complexity lower bound when $\mu_y(X)$ is constant, and a general $\Omega(d\,\mu_y(X))$ bound assuming $\varepsilon$ is constant. Also, the paper refutes a prior conjecture by providing a linear program to compute $\mu_y(X)$.
Weaknesses
N/A. The reviewer is not familiar with the space complexity area.
Questions
A minor issue: the font of the paper seems different from other papers.
Limitations
Yes.
A minor issue: the font of the paper seems different from other papers.
Thanks for pointing this out. We have fixed the issue.
Thanks, I will keep my scores.
This paper establishes lower bounds on the space complexity of data structures that approximate logistic loss with relative error, showing that existing methods are optimal up to lower order factors. It proves that any coreset providing an $\varepsilon$-relative error approximation to logistic loss requires storing $\Omega(\varepsilon^{-2})$ rows of the original data matrix. The paper also demonstrates that the dependency on the complexity measure $\mu_y(X)$ is fundamental to the space complexity of approximating logistic loss. Further, an efficient linear programming formulation is provided to compute $\mu_y(X)$, refuting a prior conjecture about its computational difficulty.
Strengths
- The paper introduces new space complexity lower bounds for data structures approximating logistic loss, which is a novel contribution in logistic regression. It also refutes a prior conjecture about the complexity measure $\mu_y(X)$ and provides an efficient linear programming formulation.
- The paper is built upon existing work in the field. It provides rigorous theoretical proofs and empirical comparisons to prior methods, demonstrating the validity and effectiveness of the proposed approaches.
- The paper is clearly written, with a structured presentation of the problem, contributions, and results. Definitions and notations are provided to aid understanding, and the inclusion of related work helps contextualize the contributions.
Weaknesses
- The paper should clearly articulate how its contributions differ from existing works, especially those by Munteanu et al. and Mai et al., highlighting specific improvements or unique approaches.
- The experiments could have been conducted on more diverse datasets, including real-world applications beyond the synthetic and KDDCup datasets.
- The paper introduces a new method to compute the complexity measure $\mu_y(X)$. Including more detailed explanations would be useful.
Questions
Limitations
Yes. The authors mention more comprehensive experiments on computing $\mu_y(X)$ for real datasets as an open direction.
The paper should clearly articulate how its contributions differ from existing works, especially those by Munteanu et al. and Mai et al., highlighting specific improvements or unique approaches.
Our work is complementary to the works of Munteanu et al. and Mai et al. Both of those papers primarily provide coreset constructions for logistic loss, with upper bounds on the number of sampled points needed for relative error guarantees. In contrast, the primary contributions of our work are space complexity lower bounds for approximating logistic loss, as well as an exact and tractable method (as opposed to NP-hard to evaluate, as was previously conjectured) to compute the complexity parameter $\mu_y(X)$ that is present in both the upper bounds of prior work and our lower bounds.
The reviewer is correct that Munteanu et al. do have a space complexity lower bound when $\mu_y(X)$ is unbounded, but that lower bound is superseded by the more recent results of Woodruff and Yasuda. The latter paper provides a lower bound for coreset constructions. We compare our work to their result and explain how our result is more general in the contributions section, as well as in lines 263-266. In the final version, we will also add an explanation after lines 263-266 of how the reduction used in the lower bound of Woodruff and Yasuda differs from ours.
The experiments could have been conducted on more diverse datasets, including real-world applications beyond the synthetic and KDDCup datasets.
We agree that further experiments on understanding the behavior of $\mu_y(X)$ in real datasets would be an interesting problem, and we note this direction in our future work section. However, our work is focused on theoretically understanding the space complexity of approximating logistic loss. A thorough experimental exploration of the behavior of $\mu_y(X)$ in real datasets would be an implementation-focused question and is not the main focus of this paper.
The paper introduces a new method to compute the complexity measure $\mu_y(X)$. Including more detailed explanations would be useful.
We will add a section in the appendix of the final version with more explicit details on computing $\mu_y(X)$ using the LP introduced in the proof of Theorem 4.
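As a rough illustration of how such an LP can look, here is a minimal sketch assuming a Munteanu et al.-style definition $\mu_y(X) = \sup_{\beta \ne 0} \|(Z\beta)^+\|_1 / \|(Z\beta)^-\|_1$, with $Z$ the label-scaled data matrix; this is our reconstruction for illustration, not necessarily the exact LP from the proof of Theorem 4. By scale invariance one may normalize $\|(Z\beta)^-\|_1 \le 1$ and maximize $\mathbf{1}^\top Z\beta$, which yields $\mu_y(X) = 1 + \text{optimum}$ (an unbounded LP corresponds to $\mu_y(X) = \infty$):

```python
import numpy as np
from scipy.optimize import linprog

def mu_via_lp(Z):
    """Estimate mu(Z) = sup_{beta != 0} ||(Z beta)^+||_1 / ||(Z beta)^-||_1.

    Variables x = [beta (free, dim d); s (>= 0, dim n)], where the constraints
    s >= -Z beta and sum(s) <= 1 enforce ||(Z beta)^-||_1 <= 1. We then
    maximize 1^T Z beta, so that mu = 1 + optimum.
    """
    n, d = Z.shape
    c = np.concatenate([-Z.sum(axis=0), np.zeros(n)])      # minimize -1^T Z beta
    A_ub = np.block([
        [-Z, -np.eye(n)],                                  # -Z beta - s <= 0
        [np.zeros((1, d)), np.ones((1, n))],               # sum(s) <= 1
    ])
    b_ub = np.concatenate([np.zeros(n), [1.0]])
    bounds = [(None, None)] * d + [(0, None)] * n          # beta free, s >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if res.status == 3:                                    # unbounded LP => mu is infinite
        return np.inf
    return 1.0 - res.fun                                   # mu = 1 + max 1^T Z beta
```

Note that $\mu_y(X) \ge 1$ always holds under this definition (replacing $\beta$ by $-\beta$ inverts the ratio), which is consistent with the LP optimum being nonnegative.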
I thank the authors for their response. I have updated my scores by 1.
Dear reviewers,
Thank you for taking the time to review the paper. Please note that the discussion period has begun and will last until August 13 (4 more days).
During this time your active participation and engagement with the authors is very important, and highly appreciated. Specifically, please read the responses, respond to them early on in the discussion, and discuss points of disagreement.
Thank you for your continued contributions to NeurIPS 2024.
Best, Your Area Chair.
This paper establishes lower bounds on the space complexity for data structures approximating logistic loss with relative error, showing that existing methods are optimal up to lower order factors.
The reviewers acknowledge that this paper introduces new space complexity lower bounds for data structures approximating logistic loss, which is a novel contribution in logistic regression. It also refutes a prior conjecture about the complexity measure $\mu_y(X)$ and provides an efficient linear programming formulation. Additionally, the reviewers commend the paper for its clarity and quality of writing.
Overall, this paper makes novel contributions. Therefore, I recommend accepting it, with the suggestion that the authors address the new literature pointed out by Reviewer 9S9F in the final version.