PaperHub
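Rating: 5.8/10 (Poster, 4 reviewers)
Lowest: 5 · Highest: 6 · Standard deviation: 0.4
Individual ratings: 6, 5, 6, 6
Confidence: 2.8
Correctness: 3.0 · Contribution: 3.0 · Presentation: 2.5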
NeurIPS 2024

Generalized Tensor Decomposition for Understanding Multi-Output Regression under Combinatorial Shifts

Submitted: 2024-05-14 · Updated: 2025-01-10
TL;DR

This paper tackles combinatorial distribution shift in multi-output regression using generalized tensor decomposition, Ft-SVD theorem, and a two-stage algorithm that exploits low-rank structure for better prediction under shifts.

Keywords
multi-output regression, tensor singular value decomposition, tensor completion

Reviews and Discussion

Review
Rating: 6

This paper investigates the problem of multi-output regression under combinatorial covariate shift, namely when the input domains in the testing set differ significantly from the training set. The authors view the functional mapping from the inputs to the vector-valued output, which is evaluated discretely in the input domain, as a tensor with missing tubes, and recast the problem of making test-set predictions as a tensor completion problem. Utilizing the framework of low tubal-rank tensor decomposition, the paper generalizes this framework from the discrete case to the continuous case and proposes a functional t-SVD method that decomposes the functional mapping into a series of embedding functions with associated singular values. An empirical risk minimization (ERM) algorithm is then proposed and evaluated in a toy numerical experiment. The major contribution of the paper is a solid theoretical analysis of the approximability of the functional t-SVD and of the excess risk of the ERM algorithm. The writing of the paper is clear.

Strengths

  • The paper bridges the low-rank tensor decomposition in the discrete domain to a functional approximation problem in the continuous domain, which is a rather novel approach when dealing with missing data issues for functional data.

  • The paper provides a series of solid theoretical analyses regarding the approximability of the vector-valued functions as well as the excess risk bound of the empirical risk minimizer.

Weaknesses

  1. The empirical analyses of the paper lack comparisons with some other benchmark approaches, making it hard to evaluate the effectiveness of the proposed methodology. It would be interesting and informative to consider other tensor decomposition approaches (CP, Tucker, Tensor-Train, Tubal) as benchmarks.

Questions

  1. The tubal rank framework introduced in this paper, to the best of my knowledge, works for 3-order tensors. Does that indicate that the functional t-SVD framework considered mainly works for inputs with dimension 2? Is it possible to generalize the current approach to higher-order tensors (and thus higher-dimensional inputs)?

  2. The functional t-SVD framework in Theorem 1 resembles the Mercer decomposition of kernels. What is the main advantage of the current approach over some existing alternatives such as the multi-output Gaussian Process regression?

  3. Maybe I missed it in the manuscript, but is there a systematic approach for choosing the rank $r$ when implementing the functional t-SVD? Is there any sense of an optimal $r$ that might be derived by minimizing the excess risk bound w.r.t. $r$?

Limitations

  1. Additional numerical experiments that demonstrate the effectiveness of the proposed functional t-SVD over existing benchmark methods can be informative and helpful.
Author Response

Weakness & Limitations: Lack comparisons with some other benchmark approaches

Re: We appreciate the suggestion. However, we must emphasize that our work is the first, to our knowledge, to address MOR under CDS. Existing tensor decomposition methods are not designed for this specific problem.

Following your suggestion, we have provided preliminary empirical support for our theory in tensor completion under missing not at random (MNAR) settings in comparison with some other benchmark approaches. Experimental results are shown in the PDF file.

In our revision, we will:

  • Clarify why direct comparisons may not be appropriate given the novelty of our problem setting.
  • Provide discussions on how our method uniquely addresses MOR under CDS compared to existing approaches.
  • Where possible, adapt existing methods to our setting and provide limited comparisons.

Q1: Generalization to higher-order tensors

Re: Excellent question! First, our Ft-SVD framework, while designed for 3-order tensors, is not strictly limited to two-dimensional input cases. It is versatile enough to handle multi-dimensional inputs that can be divided into two distinct sets.

Second, extending Ft-SVD to higher-order cases is indeed non-trivial and presents significant challenges. In response to your insightful query, we've developed a preliminary attempt at a higher-order extension of Ft-SVD.

Proposition (Functional higher-order t-SVD, informal).

Let $F: X_1 \times X_2 \times \cdots \times X_N \to \mathbb{R}^K$ be a square-integrable vector-valued function, where $X_i \subset \mathbb{R}^{D_i}$. Then there exist sets of functions $\{U^n_i\}_{i=1}^\infty \subset L^2(X_n; \mathbb{R}^K)$, $n = 1,\ldots,N$, and a core function $S: X_1 \times X_2 \times \cdots \times X_N \to \mathbb{R}^K$, satisfying:

$$F(x_1,\ldots,x_N) = \sum_{i_1=1}^\infty \cdots \sum_{i_N=1}^\infty S(i_1,\ldots,i_N) \ast_M U^1_{i_1}(x_1) \ast_M \cdots \ast_M U^N_{i_N}(x_N)$$

where the convergence is in the $L^2$ sense, and $\ast_M$ denotes the t-product.

  • For each $n = 1,\ldots,N$, the set of functions $\{U^n_i\}$ satisfies the orthogonality condition: $\int_{X_n} U^n_i(x) \ast_M (U^n_j(x))^{\top} dx = \delta_{ij} M^{-1}(\mathbf{1}_t)$, where $\mathbf{1}_t \in \mathbb{R}^{1 \times 1 \times K}$ is the t-scalar with all entries equal to 1, and $M$ is the linear transform defining the t-product.

  • The core function $S$ has the following properties:

    (i) all-orthogonality: for all $1 \leq n \leq N$ and $1 \leq \alpha \neq \beta < \infty$, we have $\int_{X_1} \cdots \int_{X_{n-1}} \int_{X_{n+1}} \cdots \int_{X_N} S(x_1,\ldots,x_{n-1},\alpha,x_{n+1},\ldots,x_N)^{\top} \ast_M S(x_1,\ldots,x_{n-1},\beta,x_{n+1},\ldots,x_N) \, dx_1 \cdots dx_{n-1} \, dx_{n+1} \cdots dx_N = \mathbf{0}_t.$

    (ii) ordering: for all possible values of $n$, $\|S_{x_n=1}\| \geq \|S_{x_n=2}\| \geq \cdots$, where $\|S_{x_n=\alpha}\|$ is the $L^2$ norm of $S$ with the $n$-th mode fixed at $\alpha$.

The main idea of this preliminary extension is to generalize the two-way decomposition to a multi-way decomposition in the functional setting. This involves decomposing a function of multiple variables into a core function and multiple sets of basis functions, one for each variable, with their interactions governed by a generalized t-product. We invite further discussion on this topic, as developing higher-order extensions of Ft-SVD presents many challenges and opportunities for exploration.

Q2: Comparison with multi-output Gaussian Process regression

Re: The main advantage of our approach over alternatives like multi-output Gaussian Process regression is its specific design for handling CDS in MOR, which is not directly addressed by existing methods. Our framework provides:

  • A natural way to handle combinatorial shifts
  • Interpretable low-rank representations of the function
  • Potential computational advantages in high-dimensional spaces under CDS conditions

We will expand on these points in our revision.

Q3: Rank selection for Ft-SVD

Re: This paper proposes the novel concept and properties of Ft-SVD for the first time, focusing on its theoretical foundations. As such, we haven't yet developed a systematic approach for rank selection. This challenge is particularly complex because rank selection remains a difficult and actively researched problem even for existing (discrete) tensor decomposition frameworks, being considered one of the hot topics in the field. Moreover, the continuous nature of Ft-SVD adds an additional layer of complexity to this already challenging issue, making it an even more intricate problem in our context.

In our current work, determining an optimal rank presents significant challenges. The high complexity of MOR under CDS and our limited knowledge of the t-singular values of ground truth embeddings prevent us from providing a closed-form solution of the rank based on minimizing the excess risk bound. Nevertheless, your suggestion about deriving an optimal rank is excellent and opens up a promising avenue for future research.

For general Ft-SVD, we can draw inspiration from rank selection methods for classical matrix or tensor decompositions. We propose the following approaches as potential starting points:

  1. Analyzing t-singular value decay and selecting r based on a threshold.
  2. Employing cross-validation to choose r that minimizes prediction error on held-out data.
  3. Adapting information criteria such as AIC or BIC to the Ft-SVD setting.
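As a hedged illustration of the first idea in the list above (thresholding the t-singular value decay), the sketch below works in the discrete tensor setting rather than the functional one. The function name `select_rank_by_energy`, the 95% energy default, and the choice to aggregate singular values across transform-domain slices are all assumptions made for this example; the reply does not specify a procedure.

```python
import numpy as np

def select_rank_by_energy(T, M, energy=0.95):
    """Smallest r whose leading t-singular values keep `energy` of the squared norm.

    T is an (n1, n2, K) tensor and M an orthogonal (K, K) transform (assumed given).
    """
    # Apply M along the third (tube) mode to reach the transform domain.
    T_hat = np.einsum("kl,ijl->ijk", M, T)
    # Singular values of every transform-domain slice, shape (K, min(n1, n2)).
    s = np.stack([np.linalg.svd(T_hat[:, :, k], compute_uv=False)
                  for k in range(T.shape[2])])
    # Energy carried by the i-th singular value, summed over all slices.
    per_index = (s ** 2).sum(axis=0)
    cumulative = np.cumsum(per_index) / per_index.sum()
    # First index at which the cumulative energy reaches the threshold.
    r = int(np.searchsorted(cumulative, energy)) + 1
    return min(r, len(cumulative))

# Usage: every slice has singular values (10, 5, 0, 0), so two components suffice.
T = np.zeros((4, 4, 3))
T[0, 0, :] = 10.0
T[1, 1, :] = 5.0
print(select_rank_by_energy(T, np.eye(3), energy=0.95))
```

Cross-validation (the second idea) could reuse the same loop over candidate values of r, scoring each by held-out prediction error instead of retained energy.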

While these ideas show promise, they require rigorous development and validation within the Ft-SVD framework. We greatly appreciate your valuable input, as it aligns with our goal of developing a more comprehensive and theoretically grounded approach to Ft-SVD implementation in future work.

Comment

Thanks for the responses and I appreciate the additional experiments. Please add some discussions (brief) about the generalizations to higher-order tensors to the main text. I will keep my score at 6 given that the current scope of the method covers a special case with 2-dimensional input.

Comment

We truly appreciate your thoughtful review, particularly your recognition of our paper's solid theoretical analysis and novel approach to bridging low-rank tensor decomposition with functional approximation.

As suggested, we will concisely discuss higher-order tensor generalizations in the main text, demonstrating our method's theoretical extensibility and potential for multi-dimensional applications.

Our approach to MOR under CDS, which you noted as a rather novel approach, aims to contribute to an important area of research in the field. This extension further explores the theoretical foundations of our work.

We believe these enhancements will clarify our work's theoretical depth and its potential implications for future research. Your insights have been crucial in refining our presentation.

We appreciate your thorough review and look forward to contributing to the ongoing discussions in this important area.

Review
Rating: 5

This paper proposes a functional t-SVD for multi-output regression under combinatorial shifts. Excess-risk bounds are derived. Using simulation experiments, risk bounds under combinatorial shifts are compared with regular risk bounds.

Strengths

  1. Proposal of functional t-SVD, which is a decent contribution.
  2. Solid theoretical analysis.

Weaknesses

  1. Lack of any real-data experiments. It is hard to understand the usefulness of the proposed method in real-world application though the theoretical contribution may be high.
  2. Lack of comparison to other related function tensor decomposition methods either by theory or experiments.

Questions

The lack of any real-world application is a limitation of the paper. Since the paper proposes a functional tensor decomposition method, it would be useful to give at least one application to illustrate its learning capability. Can the authors provide some experiments on a real dataset?

Some references, such as the functional tensor-train [1], are missing. Can the authors provide a more detailed discussion of related methods?

How does the proposed method compare with other functional tensor decomposition methods? How do the theoretical results (ERM bounds) compare with existing results?

[1] Alex Gorodetsky, Sertac Karaman, Youssef Marzouk, A continuous analogue of the tensor-train decomposition, Computer Methods in Applied Mechanics and Engineering, Volume 347, 2019, Pages 59-84.

Limitations

Limitations are listed, but in my opinion they are insufficient given the absence of any experiments with real data.

Author Response

Thank you for your thorough and constructive feedback. We address your concerns as follows:

W1 & Q1: Lack of real-data experiments

Re: We acknowledge the importance of real-world data experiments. However, the primary contribution of this paper is theoretical, as we provide the first formal definition and framework for the Combinatorial Distribution Shift (CDS) problem in multi-output regression (MOR). Our research introduces the Ft-SVD theorem and the ERM-DS algorithm, offering new theoretical insights, particularly addressing the unique challenges of MOR under CDS.

Nevertheless, we have provided preliminary empirical support through synthetic and real-world data in Missing Not at Random (MNAR) settings. These results offer initial insights into the application of our Ft-SVD framework for handling MOR tasks. The experimental results are included in the PDF file.

For future work, we plan to:

  • Collaborate with domain experts to collect large-scale datasets reflecting CDS characteristics, especially in complex MOR scenarios.
  • Design comprehensive experiments to evaluate our method's performance in real CDS scenarios, focusing on MOR.
  • Explore the applicability of our theoretical framework across various practical domains, enhancing the robustness and performance of MOR models.

Q2: Missing references and detailed related methods discussion

Re: Thank you for pointing out the missing reference to the Function Tensor-Train (FTT) method [1]. This method presents a continuous analogue of the Tensor-Train decomposition, effectively representing multivariate functions using matrix-valued univariate functions. This approach efficiently captures local features or discontinuities without increasing computational cost.

We will also discuss other related methods, as recommended by Reviewer yp2H, including [Bigoni et al., 2016] (Tensor Train for functional data), [Luo et al., 2023] (Tucker decomposition with neural networks), and [Fang et al., 2024] (Bayesian CP/Tucker decomposition). These will be included in the related work section to provide a comprehensive overview of existing methodologies in functional tensor decomposition.

W3 & Q3: Lack of comparison with other functional tensor decomposition methods, either by theory or experiments

Re: We appreciate the suggestion to compare our work with existing functional tensor decomposition methods, including [1]. However, it is important to note that our study is the first to address the problem of MOR under CDS. Consequently, there are no directly comparable functional tensor decomposition methods available for experimental comparison in this specific context.

  1. Function Tensor-Train (FTT) Method [1]: The FTT method efficiently represents multivariate functions, especially those with local features or discontinuities, using matrix-valued univariate functions. This allows for precise approximations without the need for fine discretization.
  2. Comparison with Our Proposed Method: Our Ft-SVD method differs significantly from FTT and other existing methods. While FTT focuses on local feature approximation, Ft-SVD specifically addresses MOR under CDS. The Ft-SVD framework extends the t-SVD approach to continuous feature domains, providing novel solutions for the complexities inherent in MOR with CDS.
  3. Differences in Theoretical Results (ERM Bounds): We provide explicit error bounds within the ERM framework under CDS, distinguishing our work from existing literature, including [1]. These bounds demonstrate the robustness of the Ft-SVD method in MOR tasks, especially in scenarios involving distribution shifts between training and testing data. Unlike most existing methods that assume consistent distributions between training and testing, our approach specifically addresses the additional challenges posed by distribution inconsistencies in MOR tasks. We analyze the impact of distribution shifts on the generalization ability of MOR models, using a rigorous mathematical framework. We show how model performance is affected by these shifts and propose strategies for model adjustment to minimize errors. Such detailed analysis is rarely covered in existing literature.

Innovation and Contribution: Our study proposes a novel Ft-SVD framework that extends the classical t-SVD, particularly for addressing the challenges of CDS in MOR. This framework provides new theoretical insights and rigorous performance guarantees, establishing a foundational understanding of how to manage distribution shifts in complex MOR scenarios. The theoretical contributions include explicit error bounds and a detailed analysis of the impact of distribution shifts on model generalization, setting our work apart from existing methodologies.

We will include these discussions in the related work section of our manuscript to better highlight the distinctions and advancements offered by our method. We believe these additions will help clarify our research contributions.

Comment

Thank you for the update and I will keep my original score.

Comment

We appreciate your detailed review and for considering our rebuttal. Thank you for recognizing the contributions of our work. We respect your decision to maintain the original score.

We have carefully addressed the concerns raised in your initial review, including adding more experimental results and clarifying our theoretical contributions. Your insights have been invaluable in improving our paper.

If you have any further questions or comments, we would be glad to address them. Thank you again for your time and expertise.

Review
Rating: 6

The paper introduces a novel approach to multi-output regression (MOR) under combinatorial distribution shifts (CDS) using a generalized tensor decomposition framework. The proposed Functional t-Singular Value Decomposition (Ft-SVD) extends classical tensor SVD to infinite and continuous feature domains, providing a new perspective on handling MOR tasks under CDS. The authors present a Double-Stage Empirical Risk Minimization (ERM-DS) algorithm designed to improve prediction accuracy in the presence of CDS. Through rigorous theoretical analysis, the paper establishes performance guarantees for the proposed algorithm and demonstrates its efficacy with synthetic data experiments.

Strengths

  1. Clear and Reasonable Motivation: The paper presents a clear and reasonable motivation for using tensor completion to address multi-output regression under combinatorial distribution shifts. By framing the problem as a tensor completion task, the method leverages the strengths of tensor decomposition techniques to manage the challenge of unseen feature combinations in MOR, effectively improving generalization capabilities.

  2. Solid Theoretical Analysis: The detailed proofs and theoretical guarantees offered in the paper are impressive and contribute significantly to the understanding and validation of the proposed methods. This rigorous approach ensures that the findings are well-supported and credible.

Weaknesses

  1. The paper's definition, introduction, and highlight of Combinatorial Distribution Shift (CDS) are insufficient. It is not clear why CDS poses a significant challenge in MOR and why it is important.

  2. The relevant definitions, assumptions, and proofs are presented with excessive detail, which makes it difficult for readers to grasp the core issues and contributions. A more concise and focused explanation would help in conveying the importance and difficulty of CDS in MOR.

  3. The paper does not adequately discuss the existing body of work on functional tensor completion, such as [Bigoni et al., 2016], [Luo et al., 2023], and [Fang et al., 2024]. Including a more comprehensive discussion on related works and how the proposed method compares to them would provide better context and highlight the contributions more effectively.

Ref:

  • Bigoni, Daniele, Allan P. Engsig-Karup, and Youssef M. Marzouk. "Spectral tensor-train decomposition." SIAM Journal on Scientific Computing 38.4 (2016): A2405-A2439.

  • Luo, Yisi, et al. "Low-rank tensor function representation for multi-dimensional data recovery." IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).

  • Fang, Shikai, et al. "Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data." arXiv preprint arXiv:2311.04829 (2023).

Questions

See Weaknesses.

Limitations

See Weaknesses.

Author Response

Many thanks for the thoughtful and constructive feedback.

W1: Insufficient introduction of CDS in MOR

Re: We acknowledge the need to better define and highlight the importance of Combinatorial Distribution Shift (CDS) in the context of multi-output regression (MOR). CDS occurs when training data covers only a limited subset of possible attribute combinations, while real-world applications often present new, unseen combinations. In MOR, CDS directly challenges model generalization by altering the joint distribution of outputs for unfamiliar input combinations, which traditional methods struggle to handle effectively.

The importance of addressing CDS has not been widely recognized for several reasons. First, many practical applications have not frequently encountered the challenge of new input combinations, leading to a lack of perceived urgency. Second, traditional MOR models often perform well under consistent data distributions, masking their limitations when faced with distribution shifts. Finally, the complexity and high costs of collecting and annotating data for new combinations have limited related research and empirical analysis, resulting in less attention to CDS issues in both academia and industry.

However, as AI technology advances and its application areas expand, CDS issues will become increasingly frequent and critical (potential application scenarios are detailed in Appendix A.1). In this context, our study provides a foundational framework and introduces novel methodologies to meet these evolving demands.

In the revision, we will refine the definition and introduction of CDS, emphasizing its importance in multi-output regression. Specifically, we will clarify how CDS complicates the learning process by causing variations in the joint distribution of outputs, which traditional methods struggle to manage. This will underscore the necessity of developing new approaches to effectively address CDS in MOR.
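To make the setting concrete, the following sketch builds one possible train/test split with a combinatorial shift on a discretized two-argument input grid. The block layout (test pairs combine values that are each seen in training but never together) and the helper name `cds_split` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def cds_split(n1, n2, n1_seen, n2_seen):
    """Boolean train/test masks over an n1 x n2 grid of (x, y) input pairs."""
    x_core = np.arange(n1)[:, None] < n1_seen   # x values in the "core" group
    y_core = np.arange(n2)[None, :] < n2_seen   # y values in the "core" group
    # Training observes every pair in which at least one argument is in its core group.
    train = x_core | y_core
    # Test pairs use a non-core x together with a non-core y: never observed jointly.
    test = ~train
    return train, test

# Usage: a 6 x 5 grid where the bottom-right 2 x 2 block is held out for testing.
train, test = cds_split(n1=6, n2=5, n1_seen=4, n2_seen=3)
print(train.astype(int))
```

In the tensor view, each held-out (x, y) pair corresponds to a missing tube of length K, the number of outputs.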

W2: Excessive detail in definitions and proofs

Re: We will streamline the main text as follows:

  • Summarize core contributions and results concisely, avoiding excessive technical details. Key definitions and assumptions will be directly tied to the challenges posed by CDS.
  • Move extensive technical explanations to the appendix, keeping the main discussion focused on central issues and contributions.
  • Utilize practical examples and visual representations to illustrate the significance of our contributions without overwhelming the reader with complex details.

W3: Inadequate discussion of related work

Re: Thank you for your detailed review and feedback on our paper. We address the points regarding the lack of discussion on [Bigoni et al., 2016], [Luo et al., 2023], and [Fang et al., 2024] as follows.

  1. [Bigoni et al., 2016]: This work proposed a functional tensor decomposition method based on classical Tensor Train decomposition, focusing on approximation problems in multidimensional data, especially in high-dimensional continuous spaces. The motivation was to efficiently handle high-dimensional function evaluations by reducing computational complexity through tensor decompositions. Unlike this approach, our research tackles the challenge of multi-output regression under CDS by introducing the Functional t-SVD method. This method extends the classical t-SVD framework to continuous feature domains, offering a novel perspective for solving multi-output regression problems.
  2. [Luo et al., 2023]: This study utilizes Tucker decomposition combined with neural networks to model tensor mode functions, aiming to improve data recovery in multidimensional settings. The motivation behind this approach was to leverage the representational power of neural networks alongside tensor decompositions to handle complex data structures. While effective in deterministic settings, this work does not explicitly account for data uncertainty. In contrast, our work extends the t-SVD decomposition into the Ft-SVD framework and introduces the ERM-DS algorithm specifically designed to address the challenges of multi-output regression under CDS, thereby ensuring robust generalization capabilities.
  3. [Fang et al., 2024]: This paper developed methods within the CP/Tucker decomposition framework with a Bayesian approach to manage multimodal data. The motivation was to incorporate uncertainty quantification and probabilistic modeling, making it particularly valuable in scenarios requiring robust predictions. However, these methods primarily focus on Bayesian modeling without directly addressing the complexities introduced by CDS. Our work distinguishes itself by introducing the Ft-SVD framework, specifically targeting the challenges posed by CDS in multi-output regression. This approach offers a robust solution for handling continuous feature domains, providing a theoretical foundation for mitigating the impact of distribution shifts.

Innovation and Contribution: Our approach significantly differs from the works of [Bigoni et al., 2016], [Luo et al., 2023], and [Fang et al., 2024] in terms of problem focus, motivation, and methodology. We propose an innovative Ft-SVD theoretical framework to address multi-output regression problems under CDS. By analyzing spectral properties, we design the ERM-DS algorithm, which not only captures the spectral decay characteristics in different sub-domains but also provides theoretical performance guarantees. These contributions fill a gap in the existing literature concerning CDS issues and offer effective tools for representing and analyzing multi-output functions.

We will include these discussions in the related work section of our manuscript and clarify the distinctions and advancements our method provides. We believe these additions will help better understand the context and contributions of our research.

Comment

I appreciate the authors' efforts in rebuttal, and I decided to increase my score.

Comment

Thank you for your thorough review and for reconsidering our work. We greatly appreciate your decision to increase the score and for acknowledging the value of our contributions. Your insightful feedback will be instrumental in refining the paper for the final version, ultimately enhancing its quality and impact.

Review
Rating: 6

The authors propose a new model for decomposing vector-valued functions of two vector arguments. The proposed model considers an SVD decomposition of these functions in Hilbert space, effectively extending t-SVD to Hilbert spaces. Here t-SVD, short for tensor-SVD, consists of applying SVD to each matrix slice of a tensor, after multiplication by a unitary (although the authors restrict these to orthogonal) matrix $M$ in the third dimension. They propose using this approach for Multi-Output Regression, and claim that the proposed approach is especially suited for generalizing to pairs of arguments that are not present in the training data, in a similar fashion as t-SVD may be used for tensor completion.
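For readers less familiar with the discrete construction summarized above, here is a minimal NumPy sketch of a transform-based truncated t-SVD. It is an illustration only, not the authors' implementation: the helper names `dct_matrix` and `t_svd_truncate` are hypothetical, and the DCT is just one assumed choice of orthogonal $M$.

```python
import numpy as np

def dct_matrix(K):
    """Orthonormal DCT-II matrix of size K x K (one possible choice of M)."""
    idx = np.arange(K)
    C = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * K))
    C[0, :] /= np.sqrt(2.0)
    return C * np.sqrt(2.0 / K)

def t_svd_truncate(T, M, r):
    """Rank-r truncated t-SVD of T (n1 x n2 x K) under an orthogonal transform M."""
    # Move to the transform domain: apply M along the third (tube) mode.
    T_hat = np.einsum("kl,ijl->ijk", M, T)
    out_hat = np.empty_like(T_hat)
    for k in range(T.shape[2]):
        # Ordinary matrix SVD of each frontal slice in the transform domain.
        U, s, Vt = np.linalg.svd(T_hat[:, :, k], full_matrices=False)
        out_hat[:, :, k] = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Map back with M^{-1} = M^T, which is valid because M is orthogonal.
    return np.einsum("lk,ijl->ijk", M, out_hat)

# Usage: truncate a random 5 x 6 x 4 tensor to tubal rank 2.
rng = np.random.default_rng(0)
T = rng.standard_normal((5, 6, 4))
T2 = t_svd_truncate(T, dct_matrix(4), r=2)
print(T2.shape)
```

Setting `M = np.eye(K)` reduces this to an independent SVD of every frontal slice, which is the "regression in each dimension" baseline the review contrasts against.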

Strengths

The method proposed in the paper appears to be very promising. The overall approach seems to be highly generalizable and it may possibly be used together with several function approximation techniques.

Weaknesses

The biggest weakness is the lack of numerical experiments. The overall idea of the paper is interesting, but it is not ground-breaking. Without the multiplication by $M$, this is essentially a regression in each dimension. Here numerical experiments are needed as a way to investigate possible choices of $M$ in applications: should we choose a well-known orthogonal basis given by the DFT or DCT? Or should we take a data-driven approach?

Questions

Overall, the statement of Theorem 4 is not very clear to me. On the other hand, it seems to be a generalization of reference [37] of the paper to multiple output regression. Can you explain what the differences and similarities between both results are? Which parts of the result in [37] generalize well to the new approach, and what novel analysis needed to be developed?

Limitations

I believe that the limitations are adequately addressed.

Author Response

Thanks for the constructive feedback.

W1: Lack of numerical experiments

Re: While our main contribution is theoretical, we've included proof-of-concept experiments on Page 9 in the submission. Additional results, including synthetic and real-world data, are in the attached PDF.

W2: Interesting idea, but not ground-breaking

Re: We appreciate your assessment that our paper presents an interesting idea. We'd like to highlight that all reviewers have recognized the novelty and significance:

  • Rev. HXH7 characterized our approach as "rather novel" and praised our "series of solid theoretical analyses."
  • Rev. yp2H noted our "clear and reasonable motivation" and "solid theoretical analysis," emphasizing that our method "effectively improves generalization capabilities."
  • Rev. juWX described our functional t-SVD as "a decent contribution" with "solid theoretical analysis."

As NeurIPS values high quality, originality, clarity, and significance, we believe our work, while perhaps not "ground-breaking", aligns well with these criteria. Our paper makes fundamental contributions to an important and previously under-explored problem: MOR under CDS:

  1. Functional t-SVD: Extending t-SVD to infinite and continuous domains, which Rev. HXH7 noted as "a novel approach when dealing with missing data issues for functional data."
  2. First formal treatment of MOR under CDS: Providing a systematic definition and solution, addressing a crucial gap in current ML research.
  3. ERM-DS algorithm: Derived from our Ft-SVD framework, offering new generalization bounds that Rev. yp2H described as "impressive" and "contributing significantly to the understanding and validation of the proposed methods."

Our work advances understanding of MOR under CDS and learning under complex distribution shifts. As Rev. yp2H noted, our approach "effectively improves generalization capabilities" for unseen feature combinations. While building on existing concepts, our application to MOR under CDS represents a foundational step forward in addressing this important ML problem. We believe our work lays a foundation for future research, potentially influencing the development of more robust ML models under combinatorial distribution shifts.

W3: Choice of transformation matrix $M$

Re: Thank you for this insightful observation. We would like to clarify our position:

  1. Theoretical focus: Our paper primarily establishes a theoretical framework for Ft-SVD and its application to MOR under CDS. Following t-SVD literature conventions (e.g., [21]), we assume $M$ is given, allowing us to focus on core theoretical contributions.
  2. Framework flexibility: While $M$ is assumed given, our framework accommodates various choices, enabling adaptation to different problem structures.
  3. Empirical investigation: We've conducted preliminary numerical experiments comparing different $M$ options (DFT, DCT, data-specific). Results are in the attached PDF.
  4. Future directions: Optimal $M$ selection opens new research avenues. We propose a potential approach:

A preliminary proposal.

  • We seek the best MM in the feasible transformation set M\mathcal{M} (typically the orthogonal matrix group): minMMΦ(M)\min_{M \in \mathcal{M}} \Phi(M) where Φ(M)\Phi(M) measures the quality of the Ft-SVD under MM. For example, the reconstruction error Φ(M)=XYF(x,y)Fk(x,y;M)F2dxdy,\Phi(M) = \int_{\mathcal{X}} \int_{\mathcal{Y}} \\|F(x,y) - F_k(x,y;M)\\|^2_F dx dy, where Fk(x,y;M)F_k(x,y;M) is the rank-kk Ft-SVD approximation under MM.

  • To capture the interdependence between the transformation and the optimal functional representation, we employ a bilevel optimization approach:

    $\min_{M \in \mathcal{M}} \Phi(M) = \Phi(M, F(M))$ subject to $F(M) = \arg\min_{F \in \mathcal{F}} \Phi(M, F)$.

    Here, $F$ represents the functional form of the data, and $\mathcal{F}$ is the space of square-integrable vector-valued functions. To solve this problem, we can use iterative methods such as gradient descent on $\mathcal{M}$.
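As a rough illustration of the reconstruction-error criterion, the following Python sketch evaluates a discretized surrogate of $\Phi(M)$ on a finite tensor and compares two candidate transforms. It is only a sketch under stated assumptions, not the implementation used in the paper: the functional $F$ is replaced by a sampled tensor `T`, the double integral by a sum over grid points, and the helper names `dct_matrix`, `rank_k_approx`, and `phi` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    M[0, :] /= np.sqrt(2.0)
    return M

def rank_k_approx(T, M, k):
    """Rank-k truncated t-SVD of T (n1 x n2 x n3) under an orthogonal M (n3 x n3)."""
    # Forward transform along the third mode: T_hat[:, :, i] = sum_j M[i, j] * T[:, :, j]
    T_hat = np.einsum('ij,abj->abi', M, T)
    out = np.empty_like(T_hat)
    for i in range(T.shape[2]):
        # Truncate each frontal slice in the transformed domain to rank k
        U, s, Vt = np.linalg.svd(T_hat[:, :, i], full_matrices=False)
        out[:, :, i] = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Inverse transform: M is assumed orthogonal, so its inverse is M^T
    return np.einsum('ji,abj->abi', M, out)

def phi(T, M, k):
    """Discretized reconstruction error: squared Frobenius norm of T minus its rank-k approximation."""
    return float(np.sum((T - rank_k_approx(T, M, k)) ** 2))

# Synthetic tensor (an assumption for illustration) with tubal rank r under the DCT
rng = np.random.default_rng(0)
n1, n2, n3, r = 8, 8, 4, 1
M_dct = dct_matrix(n3)
T_hat = np.stack(
    [rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2)) for _ in range(n3)],
    axis=2,
)
T = np.einsum('ji,abj->abi', M_dct, T_hat)  # map back to the original domain

phi_dct = phi(T, M_dct, r)       # transform matched to the data
phi_id = phi(T, np.eye(n3), r)   # identity transform as a baseline
print(f"phi under DCT: {phi_dct:.3e}, phi under identity: {phi_id:.3e}")
```

In this toy setup the tensor is constructed to be low-rank under the DCT, so the matched transform should give a much smaller error than the identity. An outer loop that searches over $\mathcal{M}$ (for example, gradient steps constrained to orthogonal matrices) would call `phi` as its objective.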

This initial proposal may lack rigor and requires refinement and validation. It is beyond the scope of the current paper, which focuses on establishing the fundamental theory. By establishing this theoretical groundwork, our work enables future studies, including on the optimal selection of $M$, to investigate these aspects in depth.

Q: Clarity of Thm 4 and its relation to [37]

Re: We apologize for any lack of clarity. Let us address your questions:

Similarity: Both Thm 4 and [37] provide excess risk bounds for learning under CDS, with bounds structured to include approximation and statistical error terms.

Difference: Our results diverge from [37] in:

  1. Framework: Introducing Ft-SVD for tensor completion in infinite-dimensional spaces.
  2. Scope: Generalizing to MOR, addressing inter-output dependencies.
  3. Characterization: Incorporating spectral decay patterns across frequency components.

Generalizations from [37]: The basic structure of the excess risk bound and the concept of analyzing learning under CDS.

Novel analysis: We introduce key theoretical innovations within our Ft-SVD framework to address MOR under CDS challenges:

  1. Spectral analysis theory for Ft-SVD, extending t-SVD to infinite-dimensional spaces.
  2. Tensor-based error decomposition for MOR based on Ft-SVD, which uncovers the intricate tensorial structure of MOR under CDS, revealing complexities not captured in matrix approaches [37].
  3. Ft-SVD-specific measures of embedding quality, e.g., generalizing the $\alpha$-conditioning conditions to tensor-structured infinite-dimensional spaces, enabling a more detailed understanding of model behavior under severe distribution shifts in MOR.

Modification: We will revise Thm 4 to highlight connections and differences with [37] more clearly. We'll also expand the related work section to better contextualize our contributions.

Comment

Dear Reviewer,

We hope this message finds you well. We have carefully addressed all the concerns raised in your initial review, including additional experimental results and theoretical clarifications.

If you have any further questions, we are more than happy to answer them. We would greatly appreciate your thoughts on our responses. Should our clarifications adequately address your concerns, we would be grateful for any reconsideration you might deem appropriate.

Please note that after August 13, 11:59pm AoE, we will no longer be able to respond to any new questions due to the NeurIPS deadline. We appreciate your understanding.

Thank you for your valuable input throughout this process.

Best regards,
The Authors

Author Response

We appreciate the reviewers' constructive feedback and provide a summary of contributions, the feedback, and responses.

Contributions. Our work proposes the novel Functional t-Singular Value Decomposition (Ft-SVD) framework, addressing the challenges of multi-output regression (MOR) under combinatorial distribution shifts (CDS) for the first time. Key contributions include:

  1. Ft-SVD Framework: Extends traditional t-SVD to infinite and continuous feature domains, allowing for handling complex, high-dimensional data through a functional approach.
  2. Double-Stage Empirical Risk Minimization (ERM-DS) Algorithm: Developed within the Ft-SVD framework, this algorithm is tailored for MOR under CDS, leveraging spectral properties and domain-specific hypothesis classes to improve prediction accuracy.
  3. Theoretical Insights: Comprehensive theoretical analysis within the Ft-SVD framework, detailing approximability and excess risk bounds, enhances understanding of MOR models' generalization capabilities under CDS.

Positive Feedback: The reviewers described the methodology as "promising and broadly generalizable" (Rev. 1ZmA) and appreciated the "clear motivation for using tensor completion in multi-output regression under combinatorial distribution shifts" (Rev. yp2H). The theoretical analysis was praised for its "rigor" and "detailed proofs and guarantees" (Rev. yp2H, HXH7). The innovation in combining low-rank tensor decomposition with functional approximation, as well as the extension of t-SVD to infinite and continuous feature domains, was highlighted (Rev. HXH7, yp2H). The clarity of the problem formulation was noted, and the potential impact of the work was seen as offering "new possibilities for a wider range of tasks" and "a new perspective on handling MOR tasks under CDS" (Rev. 1ZmA, yp2H).

Common Concerns: We provide responses to the common concerns as follows:

  • Numerical Experiments: Need for more experiments (Rev. 1ZmA), especially with real-world data (Rev. juWX) and common benchmarks (Rev. HXH7).

    Re: We acknowledge the importance of real-world data experiments. However, the core focus and primary contribution of our paper is theoretical. Our work provides the first formal definition and comprehensive theoretical framework for the CDS problem in multi-output regression. This includes the novel Ft-SVD theorem and the ERM-DS algorithm, which offer new theoretical foundations and methodologies for understanding and solving the CDS problem.

    Nevertheless, we understand the need for real-world data validation and would like to address this concern. We have provided preliminary empirical support for our theory through:

    • Synthetic data experiments: We conducted more synthetic experiments, validating our method in a controlled environment.
    • Real tensor completion in missing not at random (MNAR) settings: We applied our method to real-world Velodyne LiDAR data.

    The additional experimental results are shown in the attached PDF.

    For future work, we propose:

    • Collaborating with domain experts to collect large-scale datasets that reflect CDS characteristics in specific fields.
    • Designing and implementing more comprehensive experiments to evaluate our method's performance in real CDS scenarios.
    • Investigating the applicability of our theoretical framework and methods across a wider range of practical domains.
  • Clarification and Presentation (Rev. yp2H, juWX): The need to clarify the significance of CDS in MOR and its unique challenges, as well as simplify the presentation of assumptions and technical proofs for better accessibility.

    Re: We will elaborate on the role and importance of CDS, highlighting how our approach addresses these challenges. Additionally, we will summarize key contributions more clearly, focusing the main text on core findings and moving more details to the supplementary materials for improved clarity.

Specific Responses: For specific concerns raised by individual reviewers, such as the exploration of different transformation matrices (Rev. 1ZmA), clarification on Theorem 4 and related work (Rev. yp2H), and the selection of rank and potential extensions to higher-order tensors (Rev. HXH7), please see the specific rebuttal where these issues are addressed in detail.

Final Decision

The paper presents the functional t-SVD framework for multi-output regression problems. All the reviewers, including this AC, have appreciated the paper. One concern was the lack of experiments, which the authors have addressed in their rebuttal; the added experiments are sufficient for this paper. In the camera-ready version, please include all the changes that have been asked for by the reviewers.