Learning Multi-Objective Program Through Online Learning
摘要
评审与讨论
The authors study the problem of inverse multi-objective optimization. Their goal is to learn the parameters of functions and a constraint set given observations that are on the efficient frontier where the observations may be noisy. The authors derive an algorithm that leverages a dualization and converges at a rate of . They run experiments in a synthetic QP setting and portfolio optimization problem.
优点
-
The portfolio optimization problem seems like the strongest concrete motivating problem for this task. I can kind of believe that a financial company might be interested in applying these methods to estimate the factors driving investing decisions.
-
The algorithm looks effective for the portfolio setting, at least over 1000 rounds.
缺点
- The theory seems extremely similar to Dong, Chen, Zeng (2018):
- Difference: multiobjective optimization problem (this paper) instead of single objective (2018)
- Difference: additional context parameter in objective and constraints (2018) vs. not (here). (I think the authors are arguing in the conclusion that the decision becomes the context, but I didn’t follow that remark).
- Similarity: assumption 2.1 (verbatim), 3.1 (this paper’s 3.1 is slightly stronger), 3.2 and text following (verbatim), Lemma 3.1 and text following the lemma (verbatim), Theorem 3.2 (factor of 2 weaker in this paper) and text following the theorem is identical
- Alg 1 (this paper) seems like the adaptation of Alg 1 (2018) to this setting (solving a different KKT system)
- Alg 2 is a tweak to Alg 1 that is new.
As this paper seems more driven by theory than by experiments, I don't think there is enough new here to justify acceptance. All of the theory seems to go through with really minor tweaks. At the very least, the paper has not argued why the theory is new.
-
The portfolio experiments do not show how well the method scales with number of samples.
-
The experiments do not show the effect of different levels of noise.
-
The experiments do not compare to any baseline methods.
问题
-
What happens without noise in the experiments? It seems that the convergence has slowed to a halt, basically, at the end of the experiments. Is that because of noise (that last gap with never be overcome) or is it coming from approximation?
-
Why is convexity a mild condition? I agree it occurs often.
伦理问题详情
There are paragraphs that are verbatim taken from Dong, Chen, Zeng (2018). They appear to be presented as if they are new in this paper—they do not acknowledge the previous paper, even though that paper is cited in related work.
The Authors propose an online learning framework for learning the parameters of multi-objective optimization problems using noisy data generated by decision makers. That is, they solve inverse optimization problems iteratively as they receive new data. Since the authors deal with problems involving discontinuous loss functions, they cannot use the usual gradient based approaches. They propose two algorithms, show their convergence rate and provide experiments using synthetic and real data.
优点
The paper is well written and easy to follow. The authors propose algorithms to solve multi-objective inverse optimization problems which can handle noisy data. They also can work with discontinuous loss functions as their approach does not use gradient based methods. They provide experiments using synthetic and
缺点
I believe the contributions of the paper is not strong enough to be accepted in this conference. The novelty of the results is not clear to me. There are already non-gradient based approaches for solving single objective inverse optimization problems using online learning in the literature. The authors needs to highlight their differences with the existing methods and maybe provide some experimental comparisons.
问题
Does the novelty of the work comes from solving multiple-objective inverse problems? How the algorithms proposed differ from the ones in the literature for single objective inverse problems?
The paper provides an online solution to the problem of inverse multiobjective optimization. In the setting handled, observations corrupted by a bounded amount of noise are revealed sequentially in rounds 1, 2, ... Two algorithms are given (one with theoretical guarantees, and the other one without theoretical guarantees but faster in practice), and an illustration of their performance is given in a couple experiments (synthetic QP and portfolio optimization). The theoretical guarantee for the first algorithm provides an O(sqrt((T)) bound on the regret subject to several assumptions on the structure of the problem --- in particular, compared to prior work multiple objectives (and multiple constraints) are possible to handle.
优点
- The setting is online and now has several objectives compared to 1 in prior work; so the problem studied by the authors is quite an interesting one, and appears new in the literature.
- The provided two main experiments appear well-illustrated and executed (Caveat: unfortunately they only illustrate two objectives --- note that the paper's algorithm works for more than two, with the curse of dimensionality kicking in pretty fast (at rate K^{1/(p-1)}), so it appears important to see how quickly the algorithm may become practically infeasible as a function of the growing dimension of the objective); and, furthermore, given my objections on the theoretical side of the paper listed below, it should have had many more interesting experiments on other tasks to sway me with its empirical contribution).
缺点
- The proposed algorithmic setup, and analysis, are very similar to Dong et al (2018), except for the fact that while Dong et al have 1 objective and their loss function is seen to be convex/smooth --- meanwhile, the authors here give several extra assumptions that once again make the loss convex and sufficiently Lipschitz, and a similar argument then gives the new generalized statement. Thus, I judge that from a theoretical standpoint the present work is quite incremental. Moreover, the authors only mention Dong et al briefly and do not go into any technical comparison at all, so if I hadn't read Dong et al I would have had a more positive opinion of the paper's novelty; thus such a literature review appears inappropriate.
- A significant issue that I personally have with this paper is that it appears, in several spots in the main part (i.e. excluding the appendices), to blatantly "oversell" its technical contributions, which I do not consider appropriate and therefore am even more so not willing to support this paper at this point: --- Right before section 3.1, it says "Thus, the popular gradient descent algorithms Bottou, Kulis & Bartlett fail and our problem is significantly more difficult than most of them." Meanwhile, later on, the setup of Kulis and Bartlett is successfully used by the authors after they impose several simple to state assumptions on the problem. --- Right after the statement of Theorem 3.2, it is stated that "Our extension involves several critical and complicated analyses for the structure of the optimal solution set as well as the loss function, which is essential to our theoretical understanding." In reality, the proof contained in the appendix basically plugs the assumptions made by the authors into Kulis & Bartlett's result, aided by a few lines of simple calculation with nothing more than Cauchy Schwarz --- I think it should be clear from this description that not only is the novelty of the proposed analysis quite slim, but also nothing "critical and complicated" is learned about the "structure" of the problem from there.
- There is quite a bit of chaos surrounding the distinction between the idealized loss l and the sampled loss l_K. For instance, the regret bounds are derived as if the algorithms used the ideal loss l, whereas in fact they use l_K as presented and used. No nontrivial guarantees are given with respect to e.g. randomly sampling the weights, other than the result in the appendix --- which notes the intuitive worst-case (i.e. curse of dimensionality type) gap of 1/K^{1/(p-1)} between the ideal and approximated losses, where p is the number of objectives and K is the number of samples, but that result only seems to consider (but doesn't explicitly spell out) a uniform grid of weights.
- The accelerated algorithm would be interesting to analyze, which would significantly improve the technical novelty of the paper; but this is not provided.
问题
- Why is Assumption 3.3 stated as it is? Its statement is not particularly well justified at the moment, and the only purpose it seems to have in the proofs is to imply that the idealized loss l is convex. However, the reader wouldn't know this unless the appendix is thoroughly checked. Once again, barring a misunderstanding on my part, this might appear to be an instance of deliberate obfuscation for the purposes of the review process.
- Assumption 3.3 should be endowed with more examples, simply because while the authors promise "Examples are given in Appendix", but the corresponding section of the appendix contains just one simple --- and not worked out --- example, which says "Although tedious, one can check that one can check [sic] that Assumption 3.3 a is indeed satisfied".
- More depth in the experimental section would be required to change my overall evaluation of the paper; e.g. along the lines I've indicated in the Strengths section.
- What is the purpose of including, and elaborating on, the KKT-based program for solving for the implicit update 3, if it is both more interpretable and parallelizable to solve K parallel problems as in 4?
- Please disambiguate where l versus l_K are used (as noted above), and ideally I would like to see a more useful analysis of l vs. l_K than currently provided.