PaperHub
6.1/10
Poster · 4 reviewers
Ratings: lowest 3, highest 4, standard deviation 0.4
Reviewer scores: 3, 3, 4, 3
ICML 2025

Optimal Transfer Learning for Missing Not-at-Random Matrix Completion

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

Rate-optimal transfer learning for MNAR matrix completion in both the active learning and random design settings.

Abstract

We study transfer learning for matrix completion in a Missing Not-at-Random (MNAR) setting that is motivated by biological problems. The target matrix $Q$ has entire rows and columns missing, making estimation impossible without side information. To address this, we use a noisy and incomplete source matrix $P$, which relates to $Q$ via a feature shift in latent space. We consider both the *active* and *passive* sampling of rows and columns. We establish minimax lower bounds for entrywise estimation error in each setting. Our computationally efficient estimation framework achieves this lower bound for the active setting, which leverages the source data to query the most informative rows and columns of $Q$. This avoids the need for *incoherence* assumptions required for rate optimality in the passive sampling setting. We demonstrate the effectiveness of our approach through comparisons with existing algorithms on real-world biological datasets.
Keywords
Transfer Learning, Active Learning, Matrix Completion, Missing Not at Random

Reviews and Discussion

Review (Rating: 3)

This paper studies the problem of matrix completion with missing not-at-random mechanisms, where the observation pattern is row/columns-wise. Under such missing/observation family, the authors establish the minimax lower bound for entrywise estimation error. With side information, the authors propose a computationally efficient estimation framework which achieves the optimal rate under mild assumptions. The effectiveness of proposed method is demonstrated on both simulated data and real-world genomic and metabolic datasets.

Questions for Authors

No

Claims and Evidence

The claims are supported by rigorous theoretical investigations and numerical experiments.

Methods and Evaluation Criteria

The theoretical results are developed mostly in the max (entrywise $L_\infty$) norm. It would be interesting to see a discussion of other norms (in an average sense, such as the Frobenius norm or the entrywise $L_1$ norm). In the numerical experiments, such metrics would also be interesting and informative to report (RMSE, MAE, etc.).

Theoretical Claims

I didn't go through the detailed proofs, but the ideas and discussions presented in the main body are sound to me.

Experimental Design and Analyses

If I understand correctly, there is no real missingness in the real-world experiments. Are the missing entries synthetic? It would enhance the significance of this work if the experiments were conducted on datasets where the primary interest is the recovery of missing entries.

Supplementary Material

No

Relation to Prior Literature

The presented method is effective for biological findings.

Missing Important References

No

Other Strengths and Weaknesses

No.

Other Comments or Suggestions

line 1422, "do poorly in max errow"

Author Response

We thank the reviewer for their valuable feedback, and for highlighting that our claims are supported by rigorous theoretical investigations and numerical experiments. We will fix typos in the revision, and we address all other comments below.

Our methods perform best under new evaluation metrics (MAE and RMSE).

As the Reviewer suggests, we perform new experiments to compare our active and passive sampling methods on Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), as well as the previously presented metrics of Max Squared Error and Mean Squared Error (MSE). For ease of comparison, we report the results of Fig. 2 (gene expression transfer) and Fig. 3 (metabolic transfer) again in the tables below. The MAE/RMSE numbers are new.

For the gene expression transfer problem (Figure 2), our methods continue to out-perform baselines with $p_{\textup{Row}} = p_{\textup{Col}} = 0.5$. For gene expression data, the errors are:

| Method | MSE | Max Squared Error | MAE | RMSE |
|---|---|---|---|---|
| Passive (Ours) | 0.004385 | 0.300035 | 0.044493 | 0.055198 |
| Active (Ours) | 0.018225 | 0.372105 | 0.103285 | 0.114654 |
| LLL22 | 0.151792 | 0.626293 | 0.343497 | 0.389449 |
| BC22 | 0.570254 | 1.000000 | 0.678862 | 0.754897 |

Next, we perform the same experiment for the metabolic transfer problem (Figure 3).

| Method | MSE | Max Squared Error | MAE | RMSE |
|---|---|---|---|---|
| Passive (Ours) | 0.000217 | 1.292995 | 0.000934 | 0.014638 |
| Active (Ours) | 0.000024 | 0.294249 | 0.000669 | 0.004883 |
| LLL22 | 0.000360 | 0.651176 | 0.006931 | 0.018147 |
| BC22 | 0.003790 | 1.000000 | 0.021086 | 0.055543 |

We will include the suggested error metrics (MAE and RMSE) in the revision.

Our theoretical upper bounds on Max Squared Error also imply upper bounds on Frobenius error, RMSE, and MAE.

The max-squared error bounds (Theorem 2.6 and Theorem 2.9) immediately imply bounds on Frobenius error, because $\frac{1}{mn} \Vert \hat Q - Q \Vert_{F}^2 \leq \Vert \hat Q - Q \Vert_{\max}^2$.

Similarly, both the RMSE and the MAE are upper-bounded by the Max Absolute Error, which can be bounded via Theorems 2.6 and 2.9.
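For completeness, the reduction can be written out explicitly (a short derivation using only the definitions of RMSE and MAE for an $m \times n$ matrix; this is not a new result of the paper):

```latex
\begin{align*}
\mathrm{RMSE}(\hat Q)
  &= \Big(\tfrac{1}{mn}\textstyle\sum_{i,j}(\hat Q_{ij}-Q_{ij})^2\Big)^{1/2}
   = \Big(\tfrac{1}{mn}\,\|\hat Q - Q\|_F^2\Big)^{1/2}
   \le \|\hat Q - Q\|_{\max}, \\
\mathrm{MAE}(\hat Q)
  &= \tfrac{1}{mn}\textstyle\sum_{i,j}|\hat Q_{ij}-Q_{ij}|
   \le \max_{i,j}|\hat Q_{ij}-Q_{ij}|
   = \|\hat Q - Q\|_{\max}.
\end{align*}
```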

We will include this discussion in the revision.

Our experiments are conducted on datasets where the primary interest is the recovery of entries.

Our real-world experiments (Section 3.1) on gene expression (Fig. 2) and metabolic (Fig. 3) data are on datasets where the primary interest is recovery of missing entries (Parnell et al. 2013, King et al. 2016, Jalan et al. 2024).

We mask matrix entries to compare estimated vs. ground-truth values.

In each experiment, we mask entries of $Q$ for which the ground truth is known, so that we can compare the estimate $\hat Q_{ij}$ to the ground-truth value $Q_{ij}$. We mask according to the active and passive sampling frameworks (lines 302-320, left).

There are many entries of $P, Q$ for which the ground truth is unknown. Without knowing the ground truth, we cannot report estimation error. Therefore we do not perform estimation experiments on missing entries for which ground-truth values are not known.
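To make the evaluation protocol concrete, here is a minimal sketch of how such masked metrics can be computed (the function and variable names are hypothetical; the actual masks follow the active and passive samplers of lines 302-320, left):

```python
import numpy as np

def masked_metrics(Q_true, Q_hat, eval_mask):
    """Error metrics restricted to masked entries whose ground truth is known.

    Q_true    : ground-truth target matrix, NaN where the truth is unknown
    Q_hat     : estimated matrix
    eval_mask : boolean matrix, True for entries hidden from the estimator
    """
    known = eval_mask & ~np.isnan(Q_true)   # skip entries without ground truth
    diff = Q_hat[known] - Q_true[known]
    return {
        "MSE": float(np.mean(diff ** 2)),
        "Max Squared Error": float(np.max(diff ** 2)),
        "MAE": float(np.mean(np.abs(diff))),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
    }
```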

Review (Rating: 3)

This paper studies transfer learning for matrix completion in a Missing Not-at-Random (MNAR) setting, which is motivated by biological problems. The problem is challenging because entire rows and columns of the target matrix are missing, making direct estimation impossible. To address this, the paper introduces a source matrix as side information. The paper provides lower bounds for estimation error in both active and passive sampling settings. The proposed estimator is computationally efficient, minimax-optimal in the active setting, and rate-optimal in the passive setting.

Questions for Authors

  1. What happens if the structured transfer assumption (Definition 1.2) does not hold?
  2. Can the active sampling method be realistically implemented in biological experiments?
  3. How robust is the method to different missingness mechanisms?

Claims and Evidence

  1. The proposed estimator is minimax-optimal for the active setting, while in real-world applications, exact optimization of sample selection is impractical.
  2. The transfer learning framework effectively corrects for the missingness structure in MNAR settings. However, the distribution shift model is restrictive.

Methods and Evaluation Criteria

  1. The paper defines active and passive sampling settings, but the evaluation assumes clean singular value decomposition (SVD) features can always be extracted, which is unrealistic in noisy biological data.
  2. Active sampling relies on an idealized G-optimal design, which assumes one can perfectly select the most informative rows/columns, an unrealistic assumption in biological experiments.

Theoretical Claims

  1. Minimax lower bounds for MNAR matrix completion are presented, but the practical significance of these bounds is unclear.
  2. The paper proves optimality under restrictive assumptions, while real-world missingness mechanisms are more complex.

Experimental Design and Analyses

  1. The paper evaluates on only two datasets, which is insufficient for demonstrating broad applicability.
  2. The paper does not evaluate how well the method generalizes across different datasets or varying missingness patterns.

Supplementary Material

The attached material is the code.

Relation to Prior Literature

This paper studies transfer learning for matrix completion in a Missing Not-at-Random (MNAR) setting, which is motivated by biological problems.

Missing Important References

Graph-based and VAE-based matrix completion methods should be discussed.

Other Strengths and Weaknesses

Strengths:

  1. The problem of MNAR matrix completion with transfer learning is relevant in biological applications.
  2. The theoretical analysis of estimation error bounds contributes to the understanding of MNAR matrix recovery.

Weaknesses:

  1. The method is largely a combination of existing techniques.
  2. Real-world applicability is questionable, from my perspective, the active sampling strategy is difficult to implement practically.
  3. The assumption of structured linear feature shifts between source and target matrices is restrictive.

Other Comments or Suggestions

  1. Discuss practical limitations of active sampling and whether it can be applied in real experimental settings.
  2. Provide more details on implementation choices, such as hyperparameter tuning and computational complexity.
Author Response

We thank the reviewer for their valuable feedback, and for highlighting the relevance of our problem (MNAR matrix completion with transfer learning) to biological applications. We address their comments below.

Our minimax results guide algorithm design.

The practical significance of our minimax lower bounds for MNAR matrix completion is that no method can achieve a better estimation error than ours. So, the error guarantees in Theorems 2.6 and 2.9 are unimprovable in our setting.

Exact selection of rows and columns (in active sampling) is realistic for the biological settings we study.

The choice of exact rows and columns to query matches experimental design constraints in multiple settings (lines 9-16, right). Our model is designed to capture these settings. These include (i) metabolite balancing experiments (Christensen and Nielsen, 2000), (ii) gene expression microarrays (Hu et al. 2021), (iii) marker selection for single-cell RNA sequencing (Vargo and Gilbert, 2020) and (iv) patient selection for companion diagnostics (Huber et al. 2022). In Section 3.1, we study precisely the first two settings listed in the introduction: metabolic (Fig. 3) and gene expression (Fig. 2) data.

Some settings, such as electronic health records (Zhou et al. 2023), have different experimental design constraints (lines 46-49, right). Our model would not apply to these settings.

In the revision, we will include additional details on experimental protocols for metabolite balancing and gene expression microarray experiments, as suggested.

Our MNAR model captures the missingness structure of the biological datasets we study.

Our passive sampling model (lines 97-109, left) matches the row/column missingness structure present in gene expression data (Fig. 1, and lines 16-24, right) and metabolic network data (lines 330-342, left). We focus on these settings in our real-world experiments (Section 3.1).

For other kinds of data, other missingness structures may be present (lines 46-49, right). Our methods can be applied to any missingness structure, but we analyze the specific MNAR settings of the paper due to their biological importance.

If the Reviewer has particular missingness structures in mind, we are happy to discuss the applicability of our methods to those settings.

The distribution shift model is realistic for the datasets we study.

The matrix transfer model (Definition 1.2) is commonly used in the biological literature, such as in Genome-Wide Association Studies (see McGrath et al. 2024 and references therein). Our real-world experiments (Section 3.1) further validate that it is appropriate for gene expression (Fig. 2) and metabolic (Fig. 3) transfer problems.

No model applies perfectly to real-world data. It would be interesting to study other models of distribution shift, as we have stated (lines 437-439, right).

Our methods can be applied without any structured transfer assumptions.

Even if Definition 1.2 does not hold, our method can be applied. However, we only analyze its theoretical properties under Definition 1.2.

Our methods can use noisy SVD estimates.

We do not assume clean SVD features for either the source matrix $P$ or the target matrix $Q$. The source matrix, which follows Assumption 2.5, can be noisily observed in an MCAR or MNAR fashion (lines 167-172, right). The target matrix is observed with additive noise and has missing rows and columns for both active and passive sampling (lines 79-109, left).

Our active sampler does not assume idealized G-optimal design.

We compute an $\epsilon$-approximate G-optimal design (Definition 2.3) with convex optimization. Specifically, we use the Frank-Wolfe algorithm, which runs in polynomial time (Lattimore and Szepesvari 2020). Setting $\epsilon = 0.05$ is sufficient (Theorem 2.6). A perfect $G$-optimal design is not needed.
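As a rough illustration (not the paper's exact procedure), the textbook Frank-Wolfe / Fedorov-Wynn iteration for an approximately optimal design over a set of candidate feature vectors can be sketched as follows; the step size and the Kiefer-Wolfowitz stopping rule $\max_k x_k^\top A(\lambda)^{-1} x_k \le (1+\epsilon)\,d$ are standard choices and need not coincide with Definition 2.3:

```python
import numpy as np

def approx_g_optimal_design(X, eps=0.05, max_iter=10_000):
    """Frank-Wolfe (Fedorov-Wynn) sketch for an approximate G-optimal design.

    X : (K, d) array of candidate design points, assumed to span R^d.
    Returns a probability vector lam with
        max_k x_k^T A(lam)^{-1} x_k <= (1 + eps) * d,
    where A(lam) = sum_k lam_k x_k x_k^T.
    """
    K, d = X.shape
    lam = np.full(K, 1.0 / K)                  # start from the uniform design
    for _ in range(max_iter):
        A = X.T @ (lam[:, None] * X)           # information matrix A(lam)
        lev = np.einsum("ki,ij,kj->k", X, np.linalg.inv(A), X)
        k_star, g = int(np.argmax(lev)), float(np.max(lev))
        if g <= (1.0 + eps) * d:               # approximate optimality reached
            break
        gamma = (g - d) / (d * (g - 1.0))      # exact line-search step for log det
        lam *= (1.0 - gamma)
        lam[k_star] += gamma
    return lam
```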

We will add graph-based and VAE-based references in the revision.

We will discuss VAE-based methods and graph-based methods in the revision, including the not-MIWAE method of Ipsen et al. 2020 (see the discussion with Reviewer XH4Z), as well as [1-2] for VAE-based methods, and Jalan et al. (NeurIPS 2024) and [3-4] for graph-based methods.

[1] Ghalebikesabi, Sahra, et al. "Deep generative missingness pattern-set mixture models." International conference on artificial intelligence and statistics. PMLR, 2021.

[2] Cai, Hongmin, et al. "Realize generative yet complete latent representation for incomplete multi-view learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 46.5 (2023): 3637-3652.

[3] Wang, Yao, et al. "Matrix Completion with Graph Information: A Provable Nonconvex Optimization Approach." arXiv preprint arXiv:2502.08536 (2025).

[4] Zhan, Tong, et al. "Collective Matrix Completion via Graph Extraction." IEEE Signal Processing Letters (2024).

All algorithms are polynomial-time.

We use well-studied polynomial-time algorithms (SVD, least-squares, and Frank-Wolfe).

Review (Rating: 4)

This paper studies a problem in which there are two matrices $P$ and $Q$ of the same dimension, where a noisy version of $P$ is observed and a noisy and partial view of $Q$ is observed. $P$ is known to be low-rank, and the two matrices are related via distribution shift. The objective is to recover the matrix $Q$ using this additional knowledge. The authors study two settings, one in which there is a budget to sample noisy entries of $Q$ (active sampling) and one in which, according to some distribution, noisy entries of $Q$ are revealed. An estimation algorithm for $Q$ is given, and analysis is provided for the estimation error of both settings.

Questions for Authors

n/a

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes, and Figure 2 shows that this method outperforms other methods as well.

Theoretical Claims

The theoretical claims appear to be correct.

Experimental Design and Analyses

Yes, the experiments appear to be sound and valid.

Supplementary Material

Only briefly looked at the supplementary material, which appeared to be correct. It consists mostly of the proofs for theorems in the main paper.

Relation to Prior Literature

The problem analyzed in this paper is motivated by problems that arise in biology. The authors emphasize its relation to tasks such as various key biology and medical problems, for which this paper may be useful.

Missing Important References

n/a

Other Strengths and Weaknesses

Strengths:

  • The paper is comprehensive in its analysis of an interesting problem and studies two different settings, the passive and active sampling settings. As expected, the active sampling setting is easier. Minimax lower bounds and generic error bounds are given for both settings. Further, the solution is computationally efficient.
  • This paper generalizes previous results by allowing for any kind of distribution shift rather than simply a rotational shift.
  • Empirically, there are strong results. The results appear to be complete and well-described.

Weaknesses:

  • To improve readability, it would be better to first introduce the estimation framework, and then separately discuss the active and passive learning settings.

Other Comments or Suggestions

See above.

Author Response

We thank the reviewer for their valuable feedback, and for highlighting that our paper has strong empirical results, and is comprehensive in its analysis of an interesting problem. We address their comments below.

We will reorganize the writing to first introduce the estimation framework, and then separately discuss the active and passive sampling settings.

Our estimation framework (Section 2.2) involves first learning features from the source matrix via SVD, and then estimating the target matrix via least-squares. This estimation framework applies to both the active sampling (lines 85-96, left) and passive sampling (lines 97-109, left) settings. To improve readability, we will reorganize the problem setup in the revision to first introduce the estimation framework, and then discuss both the active and passive sampling settings. We thank the reviewer for this valuable suggestion.
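To make the two-step framework concrete, here is a minimal numpy sketch under simplifying assumptions (a fully observed denoised source, a known rank r, and a simple linear feature-shift model $Q \approx U M V^\top$); it is an illustration, not the paper's exact estimator:

```python
import numpy as np

def estimate_target(P_hat, Q_obs, obs_mask, r):
    """Step 1: rank-r SVD features from the source. Step 2: least-squares fit
    of an r x r linear shift on the observed (queried) target entries.
    """
    # Step 1: latent row/column features of the source matrix.
    U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
    U_r, V_r = U[:, :r], Vt[:r, :].T                     # (m, r) and (n, r)

    # Step 2: fit M so that Q_ij ~ U_r[i] @ M @ V_r[j] on observed entries.
    rows, cols = np.nonzero(obs_mask)
    design = np.einsum("ki,kj->kij", U_r[rows], V_r[cols]).reshape(len(rows), -1)
    M, *_ = np.linalg.lstsq(design, Q_obs[rows, cols], rcond=None)
    return U_r @ M.reshape(r, r) @ V_r.T                 # entrywise estimate of Q
```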

Review (Rating: 3)

The authors study matrix completion in the MNAR setting under transfer learning. They establish minimax bounds for entry-wise estimation of target values under both active and passive sampling settings. Additionally, they propose a computationally efficient minimax-optimal estimator—leveraging the tensorization of G-optimal designs for active sampling—and a rate-optimal estimator for passive sampling. Experiments on two simulated datasets and two real-world datasets demonstrate the effectiveness of their method compared to two baseline approaches.

Questions for Authors

Refer to Experimental Design and Analyses regarding the evaluation criterion.

Claims and Evidence

The theoretical results are well established and proved.

Methods and Evaluation Criteria

The paper reports Mean Squared Error (MSE) for tasks involving gene expression microarrays and metabolic modeling. Max Squared Error (MSE_max) might be commonly used in these domains due to its sensitivity to extreme deviations, which can be biologically significant. It might be helpful for the authors to adopt other evaluation metrics (e.g., Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) ) for a more comprehensive assessment of model performance.

Theoretical Claims

Yes, I mainly checked the theorems mentioned in the main paper.

Experimental Design and Analyses

In the experimental design, it would be valuable to include results for a non-transfer setting. Specifically, evaluating state-of-the-art imputation methods when trained solely on Q would provide a useful baseline. If the proposed method outperforms these non-transfer baselines, it would further highlight the significance of studying the transfer setting and demonstrate its practical advantages. I encourage the authors to consider this comparison to strengthen their empirical evaluation.

Supplementary Material

I mainly reviewed Appendix B (additional experiments and details).

Relation to Prior Literature

I think this paper helps to establish a theoretical and practical approach for matrix completion under the transfer setting.

Missing Important References

I think it would be best if the authors discussed missing-data imputation methods under MNAR from the literature in more detail.

Other Strengths and Weaknesses

Strengths: The paper is well written and presents a sound theoretical framework.

Weaknesses: The baseline methods in the experimental section are somewhat limited. While I acknowledge that few methods specifically address matrix completion in the transfer setting, it would be beneficial to incorporate baseline methods from the MNAR imputation literature (such as not-MIWAE [1]) to establish a stronger point of comparison. Evaluating these methods using only data from Q would help demonstrate the best possible performance without transfer, thereby further emphasizing the significance of the transfer setting.

[1] Ipsen, N.B., Mattei, P., & Frellsen, J. (2020). not-MIWAE: Deep Generative Modelling with Missing not at Random Data. ArXiv, abs/2006.12871.

Other Comments or Suggestions

None

Author Response

We thank the reviewer for their valuable feedback, and for highlighting that our paper is well written and presents a sound theoretical framework. We address their comments below.

The manuscript contains results for a non-transfer MNAR baseline (BC22).

The method of BC22 (IEEE Transactions on Information Theory, 2022) is an MNAR imputation method (lines 302-305, left). The manuscript includes this method for all experiments.

Our methods outperform not-MIWAE on MAE, RMSE, MSE, and Max Error.

As the Reviewer suggests, we perform new experiments to compare our methods against the not-MIWAE method of Ipsen et al. 2020. Below, we present the comparison of our active and passive sampling methods on Max Squared Error, Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). For ease of comparison, we report the results of Fig. 2 (gene expression transfer) and Fig. 3 (metabolic transfer) again in the tables below. The MAE/RMSE numbers, and the results of not-MIWAE, are new.

For the gene expression transfer problem (Figure 2), our methods out-perform not-MIWAE with $p_{\textup{Row}} = p_{\textup{Col}} = 0.5$. We train not-MIWAE until convergence, with the latent dimension equal to the true matrix rank of $Q$, and a batch size of 32. For gene expression data, the errors are:

| Method | MSE | Max Squared Error | MAE | RMSE |
|---|---|---|---|---|
| Passive (Ours) | 0.004385 | 0.300035 | 0.044493 | 0.055198 |
| Active (Ours) | 0.018225 | 0.372105 | 0.103285 | 0.114654 |
| LLL22 | 0.151792 | 0.626293 | 0.343497 | 0.389449 |
| BC22 | 0.570254 | 1.000000 | 0.678862 | 0.754897 |
| not-MIWAE | 0.207850 | 1.000000 | 0.415913 | 0.455765 |

Next, we perform the same experiment for the metabolic transfer problem (Figure 3).

| Method | MSE | Max Squared Error | MAE | RMSE |
|---|---|---|---|---|
| Passive (Ours) | 0.000217 | 1.292995 | 0.000934 | 0.014638 |
| Active (Ours) | 0.000024 | 0.294249 | 0.000669 | 0.004883 |
| LLL22 | 0.000360 | 0.651176 | 0.006931 | 0.018147 |
| BC22 | 0.003790 | 1.000000 | 0.021086 | 0.055543 |
| not-MIWAE | 0.006666 | 1.000000 | 0.030307 | 0.076841 |

As Reviewer XH4Z suggests, our methods may perform better because not-MIWAE is a non-transfer baseline. This further emphasizes the significance of the transfer setting, which our methods capture, as well as the method of LLL22.

We will include both the suggested baseline (not-MIWAE), and the suggested error metrics (MAE and RMSE) in the revision.

Our methods out-perform others under new evaluation metrics (MAE and RMSE).

See the above table. We will include these new metrics in the revision to give a more comprehensive assessment of model performance.

As the reviewer correctly notes, our main focus is on Max Squared Error, which is a commonly used evaluation metric due to its sensitivity to extreme deviations; this is biologically significant.

Final Decision

This paper studies matrix completion in a novel framework that incorporates missingness of entire rows or columns with some side information available in the form of an out-of-domain source matrix. The results include both upper and lower bounds in both passive and active sampling regimes, where the latter refers to a situation where the learner is able to select a certain quota of full rows and columns to observe. Like the "MNAR" literature, the results are "out of distribution" in the sense that the upper bounds manage to control the recovery error in terms of the maximum error on any entry, even though the sampling distribution is nonuniform. The reviewers unanimously agree the paper makes a strong contribution and should be accepted. I also find the results very interesting and powerful. It would be interesting to compare the results to the existing generalization analyses for IMC in the distribution-free i.i.d. sampling regime, where the performance measure is defined in-distribution (in expectation over the sampling distribution).