PaperHub
Rating: 6.4 / 10 · Poster · 5 reviewers
Scores: 7, 5, 5, 7, 8 (min 5, max 8, std 1.2)
Confidence: 3.0 · Correctness: 3.0 · Contribution: 3.2 · Presentation: 3.0
NeurIPS 2024

Automatic Outlier Rectification via Optimal Transport

OpenReview · PDF
Submitted: 2024-05-16 · Updated: 2024-11-06
TL;DR

This paper introduces a novel framework for outlier detection using optimal transport with a concave cost function, integrating outlier rectification and estimation into a single optimization process.
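
For orientation, a schematic of the joint problem this describes, written in our own notation rather than the paper's (θ denotes the model parameters, P̂_n the empirical distribution, δ the rectification budget discussed in the reviews below, W_c the optimal transport distance under cost c, and ℓ the training loss):

```latex
\min_{\theta}\;\min_{Q\,:\,W_c(Q,\hat{P}_n)\le\delta}\;\mathbb{E}_{Q}\big[\ell(\theta;Z)\big],
\qquad c(z,z') = \lVert z - z'\rVert^{r},\quad r\in(0,1).
```

The inner minimization rectifies (transports) the worst-fitting points within the budget δ, and the outer minimization fits the model to the rectified distribution; the concave exponent r makes long moves comparatively cheap.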

Abstract

Keywords
Outlier Rectification; Optimal Transport; Statistically Robust

Reviews and Discussion

Review (Rating: 7)

A framework for regression tasks with outliers (in the sense of robust statistics) is introduced, based on an optimal transport cost function. A novel concave cost function directly facilitates the rectification of outliers. Mathematical proofs are given, and evaluation is performed on two simpler tasks and one real-world task with state-of-the-art results.

Strengths

  • The theory and field of robust statistics is well explained and brought into context nicely with adjacent topics, with especially Fig. 1 and its explanation providing a good lens to view these problems

  • The paper is organized exceptionally well, despite its size (incl. the huge appendix)

  • All proofs in Appendix C are generally easy to understand, even without a strong mathematical background. I was even able to thoroughly verify the proofs in C.1 myself

  • All experiments are explained in great depth and detail in the appendix material, making both a surface-level and a deeper understanding of all topics possible.

Weaknesses

  • Some of the tasks involve artificially constructing outliers (such as the cluster of outliers in mean estimation & raising the price p to 10 * p in the deep learning volatility experiment), which may not correspond to real-world data corruption

  • As the code/data seem to be proprietary, no publication is intended, which may hinder reproducibility

Typos and Formatting Issues:

  • the authors incorrectly reference Eq. (3) where they mean Eq. (4) in lines 145, 147, 149, 180, and 183 (non-exhaustive list)
  • missing citation in line 160
  • formula slightly clips into the text at line 156
  • typo in line 746: "esstimator"
  • typo in line 874: "exporation,"
  • wrong citation: Nietert et al., "Outlier-Robust Wasserstein DRO," appeared at NeurIPS 2023, not 2024

Questions

Q1: Why are there no other "robust" variants of regular estimators used as baselines for the experiments (e.g., for the mean estimation task)?

Q2: Mean & LAD (App. E): The tasks look relatively simple, with two very clear clusters of inliers and outliers. Why are there no experiments where many outliers are sparsely distributed across the data with high variance? These are the tasks I usually see in robust model fitting (e.g., for homography estimation in computer vision). Would the estimator struggle to transport many points over smaller, varying distances, to stay within the "long haul" metaphor?
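
To make the "long haul" metaphor concrete, a back-of-the-envelope comparison with our own illustrative numbers, assuming the concave transport cost ‖z − z′‖^r mentioned in the other reviews, with r = 1/2:

```latex
\underbrace{1\cdot 100^{1/2}}_{\text{one point moved a distance of }100} = 10
\qquad\text{vs.}\qquad
\underbrace{100\cdot 1^{1/2}}_{\text{100 points moved a distance of }1} = 100.
```

A concave cost therefore favors a few long hauls over many short ones, which is why many sparsely scattered outliers would be an interesting stress test.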

Q2.1: The same goes for the Deep Learning comparison. Without knowing the variance of the price, going from p to 10p seems like a huge step. Was an ablation on the factor of the price (here 10) done?

Limitations

  • No potential negative societal impact exists
  • Currently limited to regression tasks, although the authors point towards possible extensions to classification in Appendix B
  • Evaluation is currently limited to the authors' toy tasks and a few tasks from finance; the broad applicability of their method could have supported more extensive experiments
Review (Rating: 5)

The paper presents a new framework for outlier detection using optimal transport with a concave cost function. Unlike traditional methods that separate outlier detection and estimation, this approach integrates both into a single optimization process, improving efficiency. The concave cost function helps accurately identify outliers. The method outperforms traditional approaches in simulations and empirical analyses for tasks like mean estimation, least absolute regression, and fitting option implied volatility surfaces.

Strengths

  • The paper is well-written
  • The proposed method is novel and theoretically sound
  • Extensive experiments are conducted to support the claims

Weaknesses

  • Could the authors discuss the relationship between the proposed method and M-estimators and RANSAC?

Questions

Please see Weaknesses

Limitations

N/A

Review (Rating: 5)

The paper presents a novel method for outlier rectification in robust statistics. In particular, taking inspiration from distributionally robust optimization (DRO) methods, this paper proposes to extend the formulation to the field of robust statistics. To this end, a rectification set is constructed using optimal transport distances with concave cost functions. The authors have proven theoretically that, for mean estimation and least absolute regression, the reformulation is ultimately equivalent to an adaptive quantile (regression) estimator, with the quantile controlled by the budget parameter.

Experiments are conducted to demonstrate the effectiveness of the proposed method on various tasks, including mean estimation, least absolute regression, and option implied volatility surface estimation.

Strengths

Theoretically, the paper explores an interesting extension of DRO in the context of robust statistics. The introduction of optimal transport distances with concave cost functions is well grounded. The resulting algorithm admits an intuitive explanation.

Weaknesses

  1. Theoretically, the presentation is not particularly clear about how certain design choices, such as the optimal transport formulation and the concave cost function, affect the final algorithm. At the core of the connection is Proposition 1 in the paper, but its proof is not provided.

  2. Experiment-wise, I find the baseline methods somewhat weak. The baseline methods in Sec. 5.2.1 are very old, and in Sec. 5.2.2 the baseline method does not consider robustness in estimation. It would be good if the adaptive quantile estimator were included for comparison. It is also advisable not to put the main experimental results of Sec. 5.1 in the Appendix.

Other comments: Eq. (3) is wrongly referenced (I believe it should be Eq. (4)) in multiple sentences, such as line 145. A reference is missing in line 160.

Questions

Please respond to the two points in Weaknesses.

Limitations

I cannot find a discussion of limitations in the main paper or appendix.

Review (Rating: 7)

This paper is, essentially, about robust model fitting. It proposes to solve an optimal transport problem. As transport cost function, the authors choose concave functions ‖z − z′‖^r with r ∈ (0, 1). The idea behind this shape of function is that it allows outliers to be moved farther, owing to the decreasing marginal increase in transportation cost. The authors provide a proof that outliers can be identified easily by solving a linear program. Alternatively, they propose to use the quickselect algorithm. Experimental results of the proposed algorithm are presented on option volatility prediction, a problem in finance.
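
As a concrete illustration of the quickselect alternative, a minimal sketch (ours, not the authors' implementation; the paper's actual selection is tied to the transport budget and the exponent r, which need not reduce to a fixed k):

```python
import numpy as np

def rectify_by_quickselect(y, y_hat, k):
    """Move the k worst-fitting points onto the current model fit.

    np.partition uses a quickselect-style algorithm internally, so the
    k-th largest residual is found in O(n) rather than O(n log n).
    Illustrative sketch only; ties may flag slightly more than k points.
    """
    residuals = np.abs(y - y_hat)
    threshold = np.partition(residuals, -k)[-k]  # k-th largest residual
    outliers = residuals >= threshold
    y_rect = np.where(outliers, y_hat, y)  # rectified responses
    return y_rect, outliers
```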

Strengths

  • I like that the paper presents a mathematically concise framework for identifying outliers in data. This is different from many practical algorithms that propose ad-hoc heuristics.
  • I like that the paper is written in a relatively accessible way.
  • The problem addressed here is important in many real-world problems.

Weaknesses

  • As far as I can tell, the idea is novel, yet somewhat similar to what has been proposed in this paper: Chin et al., Accelerated Hypothesis Generation for Multistructure Data via Preference Analysis, TPAMI 2012. It would be nice if the authors could cite and discuss this paper.

  • The authors state in their abstract that the commonly used two-step approach (first outlier removal, followed by model fitting) does not provide information about the outliers to the model-fitting stage. While that is true, I wonder which information their algorithm provides to the model fitting, and how, as there is no flow of information (for instance, by backpropagation) between outlier removal or knot-point identification and the gradient step.

Questions

  • What is delta in Fig. 2? Is this the budget? If so, please mention this in the caption of the figure.

  • I would have liked to see more figures on the selected knot points and lambda, for instance, some examples for which the proposed algorithms (LP vs. QSelect) work well or fail.

Limitations

I do think so. I believe that this algorithm constitutes an important contribution to the field and can inspire new papers by other researchers.

Review (Rating: 8)

A new approach for robust estimation is proposed, inspired by optimal transport. The general approach is introduced, and new estimators are then derived for three particular cases: mean estimation, linear fitting, and surface regression. The estimators are compared numerically with standard estimators, on synthetic data for the first two cases and on financial datasets for surface regression. Many details, proofs, and explanations are provided in the appendix, which is 25 pages long.

Strengths

The proposed approach seems very original and interesting. A new robust estimator may have a major impact across the whole learning research field.

Weaknesses

The paper is well presented and explained, but the material is so large that an article is not enough; there is probably material for a book. This raises the question of whether a paper with such a large appendix should be published. The link with neural networks needs to be reinforced.

A few details:

page 4, line 160: reference missing. page 6, line 227: distirbution => distribution

Questions

Might this work help to interpret and better understand the performance obtained by back-propagation during learning?

Details:

page 5, line 200: where is delta in problem (3)? page 6, line 209: is it equation (4) or (3)?

Limitations

The proposed approach being very generic, its limitations are quite minor. There are probably no potential negative societal impacts.

Author Response

We sincerely thank all reviewers for their positive feedback and appreciate the effort to review our work. We respond below to individual points.

fFC9

  • Material, neural nets. While there is significant material in our work, we note that other papers at NeurIPS have large appendices as well. As our paper introduces a new estimation framework and theoretical results, we did not have space to explore links to neural networks in depth. We are investigating this in future work.

i67o

  • Flow of information. We respectfully disagree that “there is no flow of information… between outlier removal… and the gradient step.” Our min-min estimator is an alternating minimization procedure, where each step informs the next. The gradient step depends on outlier rectification, and information is carried over via parameter updates across iterations. This approach ensures more information flow between the substeps than a two-step method.
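
To illustrate the alternating structure being described, a minimal hypothetical sketch (`fit_step` and `rectify_step` are placeholders for the paper's actual subproblems, not its API):

```python
import numpy as np

def min_min_fit(X, y, fit_step, rectify_step, n_iters=50):
    """Alternating minimization: each substep consumes the other's output,
    so rectification information reaches the fitting step through the
    updated responses, and the fit informs the next rectification.
    """
    y_rect = np.asarray(y, dtype=float).copy()
    for _ in range(n_iters):
        y_hat = fit_step(X, y_rect)      # fit the model on rectified data
        y_rect = rectify_step(y, y_hat)  # rectify given the current fit
    return y_hat, y_rect
```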

  • Chin et al. We appreciate this addition. While both papers address contaminated data, there are fundamental differences in terms of goals (hypothesis generation vs predictive modeling), the conceptual problem, theory, and application domains. We view this work as complementary and will certainly discuss and cite it in our camera-ready version’s related work.

  • Figures. We have included many robustness figures in the appendix showing performance for many values of delta and r. Since the knot points and lambda depend on these parameters, we refer to Figures 6, 8, 11, and 12 for a comprehensive view. If the reviewer has a specific case in mind, we would be happy to add it.

CDCT

  • Designs, Proposition 1. We appreciate the feedback on our presentation of how, e.g., the transport cost function affects the algorithm; specific suggestions would help us address this point. Pages 4-5 demonstrate how concave vs. convex cost functions affect the algorithm, Appendix F provides further explanation, and sensitivity analyses throughout illustrate additional details. We hope these sections suffice but welcome more feedback. For Prop. 1, we have given sources for the proof in the text that follows it: the reviewer can find the proof in the cited work, Blanchet and Murthy 2019, Theorem 1 and Remark 1.

  • Baselines. We value the feedback and the opportunity to clarify. Our estimation approach can be applied to any regression algorithm, and we demonstrate this through diverse methods in Sections 5.2.1 and 5.2.2, including a SOTA deep learning estimator. We disagree that the deep learning estimator lacks robustness, as it incorporates different regularizations that impart robustness. Our method aims to robustify regression estimators, and we believe we show this. We will move the results for 5.1 to the main text, as suggested.

  • Limitations. We respectfully disagree that no limitations are addressed. App. D.2 and D.3 cover limitations of our procedure. In our camera-ready version, we’ll add a summary of these to the main text and reference these appendices.

iniy

  • Relationships. A full comparison is beyond scope, but we give an overview here and will include a version of this in our camera-ready version. M-estimators are a broad class of diverse extremum estimators containing robust and non-robust methods. Robust ones traditionally use specific loss functions to reduce outlier sensitivity. We believe our approach is more fundamental as it builds on top of a given general loss. The statistician has full control of the outlier modeling task (via specification of optimal transport theory using a concave transport cost) and the statistical task (via the chosen loss); also, our method gives the optimal rectifier (i.e. transporter). Our approach differs conceptually and demonstrates superior performance over the M-estimators in our experiments. We are open to additional M-estimators if suggested. RANSAC handles outliers by iteratively selecting random data subsets, fitting models, and evaluating models on inliers to reach a best model. In contrast, our method optimally selects the set of points for rectification using a single optimization procedure. RANSAC only identifies inliers, while our method does this and rectifies outliers. We believe our approach is more direct, potentially more computationally efficient. It's based on optimal transport theory rather than heuristic random sampling.
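
For readers less familiar with RANSAC, a minimal generic sketch of the loop contrasted above (our illustration; `fit` and `predict` are hypothetical callables, and all parameters are arbitrary):

```python
import numpy as np

def ransac(X, y, fit, predict, n_iters=200, sample_size=4, inlier_tol=1.0):
    """Fit on random minimal subsets and keep the model with the most
    inliers. Contrast with the rebuttal's point: the paper's method
    instead chooses the rectification set in one optimization.
    """
    rng = np.random.default_rng(0)
    best_model, best_count = None, -1
    for _ in range(n_iters):
        idx = rng.choice(len(y), size=sample_size, replace=False)
        model = fit(X[idx], y[idx])               # fit on a random subset
        n_inliers = int((np.abs(predict(model, X) - y) < inlier_tol).sum())
        if n_inliers > best_count:
            best_model, best_count = model, n_inliers
    return best_model
```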

qw3a

  • Concerns. We chose 10 to represent "fat finger" mistakes, such as decimal misplacements in finance, which have historically caused significant losses. We recognize the reviewer's concern regarding proprietary code. Our method is used by industry partners with IP restrictions common in finance, but we have disclosed the algorithm, hyperparameters, and other details needed for replication. The deep learning data and baseline code are on GitHub.

  • Questions. Q1: We use robust methods for mean estimation, e.g., the median and trimmed-mean estimators. Notably, the trimmed mean uses the true corruption rate, a strong advantage. Despite this, our method, which does not know any true parameters of the DGP, still outperforms this estimator. Q2: Our experiments for mean estimation and LAD regression are intentionally straightforward to make our novel framework easily understandable. Despite this, robust estimators like the trimmed mean and Huber perform poorly, indicating the difficulty of these problems and the improvement our estimator offers. Regarding deep learning, the choice of a multiplier of 10 is justified by the high variability in option prices, which cover a wide range due to inherent price differences across options chains.
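
As a toy illustration of the baseline behavior described in Q1 (our own numbers, not the paper's experiment; note that `trim_mean` is handed the true per-tail corruption rate, the advantage mentioned above):

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(0.0, 1.0, size=950),    # inliers around 0
    rng.normal(20.0, 1.0, size=50),    # 5% clustered outliers
])

print(np.mean(data))            # biased toward the outlier cluster (~1.0)
print(np.median(data))          # robust
print(trim_mean(data, 0.05))    # trims 5% from each tail
```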

All

  • Details/typos. References and misspellings have been fixed. References to Problem (3) have been corrected to Problem (4). The delta in Problem (4) is in the ball R(P'_n) in Eq. (5) (page 4, line 152). "Budget" has been added to Figs. 2 and 3.

We hope the reviewers are satisfied with our responses. If our responses have adequately addressed your concerns, we politely request an increased score to reflect this, if appropriate. Thank you!

Comment

To Q2:

  • While I understand that keeping the experiments simple illustrates some of the core ideas, I am still missing a bit more variance in the experiments (as I mentioned, the outliers were always very clustered). The authors explain this with their setting and real-world cases in finance, but it may hinder applicability to other fields.

  • Still, this paper marks a very good, solid advancement in this specific field of robust fitting. Given the other mentioned fixes to the paper structure, and with my assessment focusing more strictly on the finance application, I recommend accepting this paper and will adjust my score accordingly.

Comment

I have read the rebuttal and other reviewers' comments, and agree that the paper has demonstrated its merit. Regarding the response to my comments, what regularizations does the deep learning estimator have to improve robustness?

Comment

We greatly appreciate the positive feedback and the additional opportunity to clarify. The rebuttal length limit was set at 6000 characters, which restricted the amount of detail we could provide in the response. We would ideally have written the following in our rebuttal: a specific regularization, the "hard constraints" approach, imparts a measure of robustness for this application. This qualification unfortunately had to be dropped due to the length limit.

To clarify further, what is meant by this is that the "hard constraints" approach to regularization explicitly enforces a shape constraint on the network, ensuring that the derivatives associated with the Dupire formula are non-negative. We believe that, because this is a fundamental constraint motivated by the financial task at hand, it qualifies as robust in the sense that the network architecture ensures these conditions hold regardless of the input. Additional details can be found in Sections 2 and 3 of the cited paper.
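
For context, a standard statement of the constraint being described, assuming Dupire's local-volatility formula with zero rates and dividends (our addition for the reader, not quoted from the paper):

```latex
\sigma_{\mathrm{loc}}^2(K,T)
  \;=\; \frac{\partial C/\partial T}{\tfrac{1}{2}\,K^2\,\partial^2 C/\partial K^2},
\qquad \frac{\partial C}{\partial T}\ge 0,\quad \frac{\partial^2 C}{\partial K^2}\ge 0,
```

where C(K, T) is the call price surface; enforcing the two derivative constraints in the architecture keeps the implied local variance well defined (non-negative) for any input.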

Comment

We greatly appreciate the adjusted score and the additional positive feedback. We also appreciate the follow-up on your question Q2. We understand the concern regarding the clustering of outliers, and we agree that an additional example with non-clustered outliers will improve the paper. We will add this example to the camera-ready version.

Comment

Hello, it has been about two days since we posted our rebuttal, and we wanted to gently remind the reviewers to view it. We politely request that the reviewers either provide further feedback or, if they agree that our response has addressed their concerns, please increase their score. Thank you!

Final Decision

I find that the proposed joint approach to solving fitting as well as outlier estimation has the potential to influence the robust statistics field. The paper's experimental validation could be further strengthened by studying a broader class of outlier distributions. I am satisfied with the authors' rebuttal and consider this a valuable paper to be accepted.