PaperHub
Rating: 7.2 / 10
Poster · 4 reviewers
Lowest 3, highest 5, standard deviation 0.8
Individual ratings: 3, 4, 3, 5
ICML 2025

Curvature Enhanced Data Augmentation for Regression

OpenReview · PDF
Submitted: 2025-01-24 · Updated: 2025-07-24

Keywords
Manifold learning · Data augmentation · Regression

Reviews and Discussion

Review
Rating: 3

The paper proposes a new data augmentation approach called Curvature-Enhanced Manifold Sampling (CEMS) specifically for the regression task. It moves one step further by utilizing a second-order representation instead of a first-order approximation of the data manifold. Experiments are conducted on both in-distribution and out-of-distribution scenarios. Overall, the paper is well-written and interesting. However, several important issues seem to remain unresolved.

Questions for Authors

Since the generation process is based on a single sample z and its neighborhood, I was wondering how the method is motivated specifically for the regression problem rather than as a general-purpose data augmentation approach. That is to say, what specific characteristics/constraints of regression make this method suitable only for regression, and not for other tasks such as classification, segmentation, and more?

Claims and Evidence

The contributions (theoretical and empirical) seem incremental and limited - changing from first-order to second-order manifold approximations. It is also not consistently better than the first-order approach (see Table 1). When and why is the second-order approximation better (or worse) than the first-order one?

Methods and Evaluation Criteria

It will be useful to show some augmented images for the image datasets so that we can have a straightforward and visual understanding of the generation. Furthermore, it would be useful to show some generated images for some widely used benchmarking datasets (e.g., MNIST) - although they are not originally for regression. For instance, on the MNIST data, there could be (at least) two ways to construct a regression dataset: <1> transforming the digit 0, 1, 2, ..., to continuous values, and <2> predicting the number of digits in one image, similar to https://github.com/shaohua0116/MultiDigitMNIST. Since Figure 1 is only a simple 1D example, in the above ways, it is possible to show the generated images from different DA approaches on real-world data, which should be useful for understanding different DA methods.

Theoretical Claims

For regression, the choice of loss function is important; for real-world datasets, losses beyond RMSE, such as the Huber loss and the quantile loss, can be particularly useful. Will the choice of loss function affect the results (theoretically or empirically)?

Experimental Design and Analysis

Furthermore, the model architectures seem too simple - simply a three-layer MLP. There are many aspects that can affect the result of a regression model. Hence, is DA always crucial, even when more advanced models/losses are used (e.g., TabTransformer and TabR)? Essentially, the generated data is synthetic, and it may harm training if the augmentation is not authentic enough.

Supplementary Material

Although 3 different batch selection methods are discussed in Supplementary Material B, the results reported in the main text are based on kNN, so k here should be an important hyper-parameter. Is there any discussion on k?

Relation to Prior Literature

The prior works are cited.

Essential References Not Discussed

The prior works have been cited.

Other Strengths and Weaknesses

For data augmentation, it is important to determine how many augmented samples to generate. From Algorithm 1, it seems a new point is generated for each sample. Since the generated samples should contain more noise w.r.t. the original data, I think it's important to discuss how many generations to include for training. Moreover, not all generations are created equal, so maybe it would be better to consider sampling probability as in C-Mixup.

Other Comments or Suggestions

In Table 2, why is it "5.11" for DTI which is different from Table 3?

Author Response

Thank you for the thoughtful feedback. We’re glad the paper was found clear and the core contribution appreciated. Your comments on method design and evaluation helped improve the revised manuscript. Below, we address each point.

Additional tables

  1. The contributions (theoretical and empirical) seem incremental and limited.

    We appreciate the reviewer raising this critical point, and we welcome the opportunity to clarify the significance, context, and complexity of our contributions compared to first-order methods, such as FOMA.

    • Theoretical Significance and Motivation of Second-Order Approximation:
      Although the shift from first-order (FOMA) to second-order (CEMS) approximations might appear incremental, the explicit incorporation of curvature information significantly enriches geometric modeling. First-order methods like FOMA rely solely on linear approximations (via SVD), assuming local linearity and potentially overlooking key structure in curved regions. In contrast, our method introduces differential-geometric elements (e.g., tangent spaces, normal coordinates, Hessian estimates) to better capture nonlinear local geometry.

    • Empirical Results and Conditions for Improvement:
      While second-order methods may not always outperform first-order ones, CEMS performs better than FOMA on all but one dataset in Table 1 and shows consistent gains across architectures in Table 2. Second-order methods are especially effective on data with pronounced curvature, while first-order baselines suffice in flatter regions.

    • It will be useful to show some augmented images for the image datasets.
      We thank the reviewer for this suggestion. While Figure 1 (1D sine wave) offers the clearest geometric illustration, we now include visualizations from RCF-MNIST (Figure a and Figure b), showing original, augmented, and difference images. These confirm that CEMS introduces smooth, semantically consistent perturbations.

  2. For regression, the choice of loss functions is important for real-world datasets.

    We agree. While we used RMSE to match standard benchmarks, CEMS is agnostic to the loss function. The augmentation process is independent of the training objective, and alternative losses such as the Huber or quantile loss could interact with the data differently. Evaluating these is a promising direction for future work (a minimal illustrative sketch of these losses is included at the end of this response).

  3. The model architectures seem too simple.

    • Model Architecture Choice:
      We followed C-Mixup, ADA and FOMA to ensure fair comparisons. This isolates the contribution of the DA method itself.
    • Applicability to Advanced Models:
      CEMS is model-agnostic and compatible with advanced models like TabTransformer or TabR. We now mention this as future work.
    • Is DA Always Crucial?
      Geometry-aware DA is particularly valuable in low-data or noisy regimes. Gains may vary with model complexity.
    • Authenticity of Augmented Data:
      CEMS constrains augmentation using second-order geometry, ensuring samples remain close to the true manifold.
  4. Although 3 different batch selection methods are discussed...

    • In CEMS, k is determined by the mini-batch size and not tuned separately. We added a sensitivity analysis on batch size in the appendix. Table 4 in the additional tables file shows that our method is not overly sensitive to this parameter.
    • In CEMS_p, neighborhoods are constructed from the full dataset and k is selected via cross-validation.
    • Optimal k varies by dataset. In sparse regions, too large a k may hurt performance, so we recommend validation-based tuning.
  5. For data augmentation, it is important to determine how many augmented samples to generate.

    • Number of Augmented Samples:
      We generate one sample per point for consistency with baselines, but our method supports generating more via resampling.
    • Noise and Sample Quality:
      Augmented samples are constrained by local curvature, but not all are equally useful.
    • Relation to C-Mixup:
      Incorporating sampling probabilities based on geometric uncertainty could improve robustness—an avenue for future work.
  6. In Table 2, why is it "5.11" for DTI which is different from Table 3?

    Thank you for catching this. We corrected the value to be 0.511 in the revised manuscript.

  7. Since the generation process is based on a single sample z and its neighborhood...

    CEMS was developed for regression, where DA methods from classification (e.g., mixup) do not translate directly due to continuous targets. We model a joint manifold over (X, Y), enabling smooth, label-aware augmentation. However, CEMS is not limited to regression; it can be applied to classification or segmentation by operating on X only.
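
As a side note on point 2 above (referenced there), the two alternative losses mentioned are easy to state concretely. The snippet below is a minimal, purely illustrative NumPy sketch of the Huber and quantile (pinball) losses; the function names and default parameters are our own and are not taken from the paper.

    import numpy as np

    def huber_loss(y_pred, y_true, delta=1.0):
        """Quadratic near zero, linear in the tails; less outlier-sensitive than squared error."""
        r = np.abs(y_pred - y_true)
        return np.mean(np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta)))

    def quantile_loss(y_pred, y_true, q=0.9):
        """Pinball loss: asymmetric penalty targeting the q-th conditional quantile."""
        e = y_true - y_pred
        return np.mean(np.maximum(q * e, (q - 1.0) * e))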

Review
Rating: 4

This paper targets the problem of data augmentation for regression tasks where data has some intrinsic manifold structure. Specifically, the goal is to capture this manifold structure and generate new data on this manifold. Local neighborhoods are formed through nearest neighbor algorithms. The tangent space is obtained by taking the SVD within the local neighborhood, and the local chart functions are quadratically approximated, with the gradient and Hessian empirically estimated by solving linear systems. New samples are obtained by drawing from normal distributions over the tangent space and transforming back to the ambient space. Simulation examples and numerical applications are conducted.
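
To make the pipeline summarized above concrete, the following is a minimal NumPy/scikit-learn sketch of such a sampler: build a kNN neighborhood, take an SVD tangent basis, fit a quadratic chart by least squares, draw a Gaussian perturbation in the tangent space, and lift it back to the ambient space. This is an illustration of the idea only, not the authors' CEMS implementation; the function names, the library choice, and defaults such as k and sigma are assumptions.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def quad_features(U):
        """Second-order monomials u_a * u_b (a <= b) for each row of U."""
        iu, ju = np.triu_indices(U.shape[1])
        return U[:, iu] * U[:, ju]

    def sample_on_manifold(Z, d, k=16, sigma=0.1, rng=None):
        """Draw one synthetic point per row of Z = [X | Y] via a local quadratic chart.

        Z : (N, D) concatenated inputs and targets; d : assumed intrinsic dimension;
        k : neighborhood size; sigma : scale of the Gaussian draw in the tangent space.
        """
        rng = np.random.default_rng() if rng is None else rng
        _, idx = NearestNeighbors(n_neighbors=k + 1).fit(Z).kneighbors(Z)
        out = np.empty_like(Z)
        for i in range(len(Z)):
            nbhd = Z[idx[i, 1:]]                       # k nearest neighbors, excluding the point itself
            mu = nbhd.mean(axis=0)
            _, _, Vt = np.linalg.svd(nbhd - mu, full_matrices=False)
            T, Nrm = Vt[:d].T, Vt[d:].T                # tangent and normal bases
            U = (nbhd - mu) @ T                        # tangent coordinates of the neighbors
            G = (nbhd - mu) @ Nrm                      # normal coordinates the chart must predict
            Psi = np.hstack([U, quad_features(U)])     # quadratic chart anchored at the neighborhood mean
            A, *_ = np.linalg.lstsq(Psi, G, rcond=None)
            eta = sigma * rng.standard_normal(d)       # Gaussian draw in the tangent space
            psi = np.concatenate([eta, quad_features(eta[None])[0]])
            out[i] = mu + T @ eta + Nrm @ (psi @ A)    # lift back to the ambient space
        return out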

Questions for Authors

N/A.

Claims and Evidence

The method is straightforward and makes sense throughout.

It would be great if methods for determining the intrinsic dimension of the manifold could be discussed.

Methods and Evaluation Criteria

Out-of-distribution evaluation is included, which is great.

Theoretical Claims

No theoretical claims are present.

Experimental Design and Analysis

The numerical experiments all make sense. However, they all seem to have relatively low intrinsic dimensions and ambient dimensions.

Also, for the real-world applications, are the intrinsic dimensions considered known, or are they determined by some preprocessing technique such as the elbows of cumulative singular values?

Supplementary Material

I have reviewed the supplementary material.

Relation to Prior Literature

There is much recent work on manifold learning, especially from the statistical side, that uses more sophisticated modeling of the local charts, e.g., Gaussian processes, spherelets, etc.

Essential References Not Discussed

Here are a few examples mentioned above:

Faigenbaum-Golovin, Shira, and David Levin. "Manifold Reconstruction and Denoising from Scattered Data in High Dimension via a Generalization of L1-Median." arXiv preprint arXiv:2012.12546 (2020).

Dunson, David B., and Nan Wu. "Inferring manifolds from noisy data using gaussian processes." arXiv preprint arXiv:2110.07478 (2021).

Li, Didong, Minerva Mukhopadhyay, and David B. Dunson. "Efficient manifold approximation with spherelets." Journal of the Royal Statistical Society Series B: Statistical Methodology 84.4 (2022): 1129-1149.

Other Strengths and Weaknesses

Overall the paper is well-written and clear. The method makes sense and is straightforward, but may lack novelty/originality compared to the state-of-the-art methods in this field.

Other Comments or Suggestions

N/A.

Author Response

We sincerely thank the reviewer for their thoughtful and constructive feedback. We are glad that the core methodology and experimental evaluations were found to be clear and well-executed. We appreciate the suggestions to improve the discussion of intrinsic dimension estimation and to better situate our work within the broader landscape of manifold learning literature, particularly recent advances from the statistical perspective. In the revised manuscript, we have addressed these points through additional discussion, new references, and clarification of our experimental setup. We respond in detail to each comment below.

  1. It would be great if methods for determining the intrinsic dimension of manifold can be discussed.
    We thank the reviewer for the suggestion. A discussion has been added to the appendix, reviewing classical and modern intrinsic dimension (ID) estimation methods, including both statistical and geometric approaches. Due to space constraints and the inability to upload additional text, we are unable to include it here.

  2. The numerical experiments all make sense. However, they all seem to have relatively low intrinsic dimensions and ambient dimensions.
    While some datasets in our experiments such as Airfoil and NO2 have low ambient and intrinsic dimensions (6/3 and 8/6, respectively), others involve significantly higher dimensions. For instance, Exchange-Rate has an ambient dimension of 1352 and intrinsic dimension of 234, while Electricity reaches 54,249 ambient and 20 intrinsic dimensions. This range highlights the diversity of our benchmarks and demonstrates the scalability and robustness of our method across both low- and high-dimensional regimes, supporting the manifold hypothesis.

  3. Also, for the real world applications, are the intrinsic dimensions considered known, or are they determined by some preprocessing techniques such as the elbows of cumulative singular values?
    In real-world applications, the intrinsic dimension is generally not known a priori and must be estimated. In our experiments, we employ the TwoNN estimator based on the method of Facco et al. (2017), which leverages minimal neighborhood statistics to robustly estimate local intrinsic dimensionality. This approach is parameter-free and can be applied consistently across datasets. While alternative techniques such as analyzing the elbow of cumulative singular values can also be used, we chose a method that aligns well with our second-order manifold framework and scales effectively across diverse data regimes. A minimal sketch of a TwoNN-style estimate is included at the end of this response.

  4. There are many latest work on manifold learning, especially from the statistical side, that utilizes more sophisticated modeling for the local charts using e.g. Gaussian processes, spherelets, etc.
    We thank the reviewer for their thoughtful observation. Our approach is general and can incorporate any module for estimating the local chart. In this work, we used a simple and widely accepted method based on PCA to demonstrate the core ideas. Importantly, other, more advanced tools can be used. We have added a discussion in the Related Work section to acknowledge this flexibility and to better situate our method within the broader literature.

  5. Overall the paper is well-written and clear. The method makes sense and is straightforward, but may lack novelty/originality compared to the state-of-the-art methods in this field.
    We appreciate the reviewer's overall positive assessment and the opportunity to clarify the novelty of our contribution. While second-order approximations of manifolds have indeed been explored previously, our primary contributions are as follows:

    • Differentiable Second-Order Augmentation:
      To our knowledge, ours is the first method explicitly designed to provide a differentiable, second-order manifold approximation tailored specifically toward data augmentation for regression tasks. Previous second-order methods typically focus on manifold embedding and dimensionality reduction rather than augmentation aimed at improved neural network generalization.

    • Practical and Efficient Implementation:
      Our method introduces a practical mini-batch-based strategy for second-order local manifold approximation, significantly improving computational efficiency and scalability. This practicality is crucial for widespread adoption in neural network training contexts, distinguishing our work from more computationally demanding prior methods.

    • Empirical Demonstration of Effectiveness:
      We empirically demonstrate the significant benefits of explicitly incorporating curvature information into data augmentation, showing substantial improvements over state-of-the-art methods in several regression settings.
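
As a small complement to point 3 above (on intrinsic dimension estimation), the following is a minimal sketch of a TwoNN-style estimate in the spirit of Facco et al. (2017). It uses the closed-form maximum-likelihood variant over the ratios of second- to first-neighbor distances rather than the original empirical-CDF fit; the function name and the scikit-learn dependency are our own choices.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def twonn_dimension(X):
        """Intrinsic dimension from ratios of 2nd to 1st nearest-neighbor distances."""
        dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
        r1, r2 = dist[:, 1], dist[:, 2]        # column 0 is the zero distance to the point itself
        mu = r2 / np.maximum(r1, 1e-12)        # guard against duplicate points
        return len(mu) / np.sum(np.log(mu))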

Review
Rating: 3

This paper proposes a data augmentation method tailored for regression problems, leveraging the manifold hypothesis in the joint input-output space. The method approximates the data manifold up to the second order and samples new data points that adhere to this approximation. The effectiveness of the approach is demonstrated across multiple in-distribution and out-of-distribution benchmarks, where it achieves comparable or superior performance to existing state-of-the-art methods.

Questions for Authors

Please refer to the questions in the Other Strengths and Weaknesses section.

Other questions:

  1. Why is the linear system solved per point (line 258, col. 2, page 5)?
  2. What types of datasets benefit most from this method, and in which scenarios might it be less effective?
  3. How does CEMS compare when applied in data space vs. latent space, and what are the trade-offs?

Claims and Evidence

The authors substantiate their claims with extensive empirical evaluations on several real-world datasets, comparing their method against state-of-the-art techniques.

Methods and Evaluation Criteria

The proposed methods and evaluation criteria appear appropriate for addressing data augmentation in regression.

However, key hyperparameters such as the number of neighbors k, the intrinsic manifold dimensionality d, the choice of σ, the batch size, and the choice of space (data or latent) would influence performance, but their robustness is not systematically analyzed.

Further discussion is needed to assess the sensitivity of the method to these hyperparameters.

Theoretical Claims

This paper does not introduce novel theoretical contributions beyond the second-order approximation framework for data augmentation.

Experimental Design and Analysis

The experimental design is mostly sound and well-structured.

However, some aspects require further clarification, particularly regarding the robustness of hyperparameter choices and the applicability of the method in different settings (e.g., data space vs. latent space).

Supplementary Material

No supplementary material is provided with this submission.

Relation to Prior Literature

Effective data augmentation techniques are essential for improving regression model performance across various applications. This work aligns with broader research in manifold learning and data augmentation.

Essential References Not Discussed

The Hessian eigenmaps paper [a] explores similar second-order manifold approximations and should be referenced in the related work section.

[a] Donoho, D. L., & Grimes, C. (2003). "Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data." Proceedings of the National Academy of Sciences, 100(10), 5591-5596.

Other Strengths and Weaknesses

Strengths:

  • The paper introduces a novel data augmentation method rooted in manifold learning.
  • Demonstrates strong empirical performance across multiple datasets, often surpassing existing methods.

Weaknesses:

  • Lack of clarity in hyperparameter selection:
    • The paper does not provide a comprehensive discussion on determining optimal values for key hyperparameters.
    • Sensitivity analysis is missing, leaving open questions about the method’s robustness to parameter choices.
    • Application to different data representations: the method appears applicable in both data space and latent space, but no explicit discussion clarifies the trade-offs or performance implications.
  • Lack of clarity in the role of differentiability:
    • The paper briefly mentions differentiability but does not explain its necessity or its impact on performance.
    • Why is differentiability necessary for the data augmentation module, and how does it impact performance? How does the performance change if gradient information is ignored and the generated data is only used for augmentation during standard training?
  • Lack of clarity in the batch vs. mini-batch implementation:
    • The description of the batch-wise and mini-batch implementations is unclear.
    • If using mini-batch training, is the neighborhood constructed solely from the mini-batch? What is z_0?
    • Since CEMS (rather than CEMS_p) appears to be the primary approach, a clearer explanation of CEMS in the main text would be beneficial.

Other Comments or Suggestions

Minor Comments:

  • The dimensions of vectors and matrices should be explicitly stated in the text, e.g., for B_u (line 199, col. 1, page 4), x_i, y_i (line 168, col. 2, page 4), Ψ and G (line 271, col. 1, page 5), to name a few.

  • Inconsistencies in notation should be addressed:

    • Line 218, col. 1, page 4: [u, g(u)] -> [u^⊤, g(u)^⊤]^⊤?

    • Line 175, col. 2, page 4: Is only Y normalized?

    • Line 355, col. 1, page 7: Yao et al. (2022) includes the Echo dataset, which is not used here—why?

    • Figure 2: The meaning of the red arrow is unclear.

    • Transpose notation is inconsistently represented as T and ⊤.

Typos:

  • Algorithm 1, Line 5: Solve ΨA = G for A?

  • Table 2: CEMS performance for DTI should be checked.

Author Response

We thank the reviewer for the constructive feedback. We're glad the method’s empirical strength and relevance were recognized. We addressed the comments on hyperparameters, differentiability, and implementation through added clarifications and revisions. Below are our detailed responses.

Additional tables

  1. Essential References Not Discussed.
    We added the reference and discussion to the Related Work section in the revised manuscript.

  2. The paper does not provide a comprehensive discussion for key hyperparameters.
    While we did not include a detailed discussion in the paper, we selected key hyperparameters via standard cross-validation, following the approach used in prior works (e.g., Yao et al., 2022; Schneider et al., 2023).

  3. Sensitivity analysis is missing.
    As shown in the additional tables uploaded, our method is robust to hyperparameter choices. Perturbing the intrinsic dimension estimated by TwoNN yields stable performance, with the baseline or nearby values often achieving the best results. Sensitivity analyses on the neighborhood size parameter B and the noise scale σ also show consistent performance across a broad range of values. These findings demonstrate that our method is not overly sensitive to the intrinsic dimension, neighborhood size, or noise level.

  4. Application to different data representations.
    Our method is compatible with both data and latent space representations. We added a discussion in the appendix covering the trade-offs and practical considerations of each setting, including when gradient flow through the augmentation module is beneficial.

  5. Differentiability and its impact.
    Differentiability is relevant for latent-space augmentation, enabling backpropagation through the augmentation module and potentially improving representations. For input-space augmentation, differentiability is unnecessary.

  6. Implications of ignoring gradients.
    Even if latent-space augmentation is non-differentiable, it can still diversify training data. However, the lack of gradient flow may reduce the benefits of adaptively optimizing the augmentation process.

  7. Lack of clarity in the batch vs. mini-batch implementation
    We clarify the distinction between the two variants: CEMS and CEMS_p.

  • CEMS_p (point-wise) samples mini-batches randomly. For each point in the batch, a neighborhood of size k is drawn from the full dataset to estimate a local tangent space and Hessian, producing one synthetic sample.
  • CEMS (batch-wise) samples a point z_0 and builds a batch N_z of B nearby points. A shared, mean-centered tangent space is computed, and each point uses it to estimate its own Hessian and generate one sample.

We have revised the manuscript to explicitly define B in CEMS_p, clarify neighborhood construction, and highlight that both variants generate one sample per point using different geometric setups.

  1. The dimensions of vectors and matrices should be explicitly stated in the text.
    We added the dimension in the revised manuscript.

  2. Inconsistencies in notation should be addressed

    • [u, g(u)] → [u^⊤, g(u)^⊤]^⊤?
      Clarified that [·, ·] denotes column-wise concatenation.
    • Is only Y normalized?
      Yes. Since X ∈ [0, 1], we normalize Y to balance the concatenation.
    • Why not use the Echo dataset?
      Due to time constraints and its size, we did not complete evaluation on Echo. We plan to include it in future work.
    • Figure 2: Red arrow meaning is unclear.
      We clarified that it represents un-projecting η from the tangent space to ℝ^D via f.
    • Transpose notation inconsistencies.
      We revised the manuscript to ensure consistent use of transpose notation.
    • Equation: Solve AΨ = G.
      The correct equation is ΨA = G. This has been corrected.
    • Table 2: DTI performance seems off.
      The correct value is "0.511", now fixed in the revision.
  3. Why is the linear system solved per point?
    To estimate the gradient and Hessian at each point u, we fit a second-order Taylor expansion using neighboring points u_j. Since this expansion is specific to u, we solve a separate linear system per point, allowing us to capture local geometric variations across the manifold (a small illustrative sketch of this per-point fit is included at the end of this response).

  4. What types of datasets benefit most from this method, and in which scenarios might it be less effective?
    Our method performs best on datasets where input-output pairs lie near a smooth, low-dimensional manifold with meaningful curvature, common in structured data like images and audio signals. It is especially effective in sparse or low-data regimes. However, it may be less effective when the manifold assumption breaks down, such as with high-dimensional noise or unstructured data.
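
As referenced in the response on solving the linear system per point, the following is a small, self-contained sketch of such a per-point fit: regress the neighbors' normal coordinates on first- and second-order monomials of their tangent displacements, then read the gradient and Hessian off the least-squares coefficients. Variable and function names are illustrative and not taken from the paper.

    import numpy as np

    def local_jacobian_hessian(u, neighbors_u, neighbors_g):
        """Fit g(u_j) ~ g(u) + J (u_j - u) + 1/2 (u_j - u)^T H (u_j - u) by least squares.

        neighbors_u : (k, d) tangent coordinates of the neighbors of u
        neighbors_g : (k, m) their normal coordinates (one column per normal direction)
        """
        d = neighbors_u.shape[1]
        delta = neighbors_u - u                          # displacements from the base point
        iu, ju = np.triu_indices(d)
        quad = delta[:, iu] * delta[:, ju]               # one column per monomial with a <= b
        Psi = np.hstack([np.ones((len(delta), 1)), delta, quad])
        A, *_ = np.linalg.lstsq(Psi, neighbors_g, rcond=None)
        J = A[1:1 + d]                                   # (d, m): gradient of each normal coordinate
        coef = A[1 + d:]                                 # coefficients of the quadratic terms
        H = np.zeros((d, d, neighbors_g.shape[1]))
        H[iu, ju] = coef
        H[ju, iu] = coef
        H[np.arange(d), np.arange(d)] *= 2.0             # diagonal terms carry a factor 1/2 in the expansion
        return J, H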

Reviewer Comment

I appreciate the authors' responses, which address most of my concerns. Although I have not had the opportunity to review the revised manuscript, I trust that the authors will incorporate the improvements outlined in the rebuttal to enhance the clarity of the paper. Accordingly, I am raising my score.

Author Comment

We appreciate the reviewer’s thoughtful consideration of our responses and the recognition of our efforts to address the concerns raised. We are committed to incorporating the improvements outlined in the rebuttal to enhance the clarity and quality of the paper. We are grateful for your trust and for the revised evaluation.

Review
Rating: 5

Presents a data-augmentation method for regression problems, taking advantage of the manifold structure of the data. Defined on the concatenation of data and labels, the local neighbourhood of all points defines the assumed manifold structure, and the first two moments (mean and Hessian) are used to sample points for data augmentation. The resulting method is evaluated on several datasets and performs favourably against SOTA.

Questions for Authors

  • Where did the idea of creating a manifold by concatenating data x and label y come from? If it is your contribution, clarify it. If it is well-known, cite previous literature.
  • Are local neighborhoods disjoint sets or overlapping sets? If the Hessian is calculated once per neighborhood (as in "re-using neighborhoods and basis computations"), what happens when those overlap?
  • Is the intrinsic dimension of the manifold an important hyperparameter ("CEMS is governed by the intrinsic dimension of the manifold") or an inferred value which is part of the algorithm ("While the intrinsic dimension d can be viewed as a hyper-parameter of CEMS, we estimate it in practice using a robust estimator")? How dependent is the performance on the validity of this estimator?

Claims and Evidence

The claims made in the submission seem to be supported by clear and convincing evidence. The application of manifold learning techniques to data regression seems both straightforward and novel.

Methods and Evaluation Criteria

Both in-distribution and out-of-distribution problems are evaluated, and strong baselines are used.

Theoretical Claims

No theoretical claims are made. The theoretical justifications and derivations for the regression and the resulting error bounds (appendix A1) look good to me.

Experimental Design and Analysis

The experimental design is sound and exceptionally detailed in terms of practical considerations, such as batch size and intrinsic dimension estimation.

Supplementary Material

Yes! Thanks for the short and readable material (8 pages).

Relation to Prior Literature

The manuscript is candid in presenting both the regression problem literature and the manifold learning literature, and the contribution to previous studies is very clear.

Essential References Not Discussed

None that I am aware of.

Other Strengths and Weaknesses

Strengths

  • beautiful synthesis of manifold learning tools and the regression problem
  • detailed discussion of practical considerations
  • great evaluation on various regression datasets, good results

Weaknesses

  • The method is well-justified when the Hessian is evaluated around each point, but this seems computationally expensive, so unjustified approximations are made (sharing the Hessian across a local neighbourhood).

Other Comments or Suggestions

None.

Author Response

We thank the reviewer for the thoughtful and encouraging feedback. We appreciate the recognition of our method’s integration of manifold learning and regression, as well as the clarity of our experiments and supplementary material. Below, we address the comments and describe the corresponding revisions.

Additional tables

  1. The method is well-justified when the Hessian is evaluated around each point...
    Thank you for this observation. In our method, the Hessian itself is not shared. Instead, we reuse the local neighborhood to compute a shared orthonormal basis, which each point then uses to estimate its Hessian independently.

    Specifically:

    • Shared Neighborhood:
      A k-nearest neighbor set is constructed for each mini-batch point and reused for nearby points, based on the manifold hypothesis assuming local smoothness.
    • Shared Local Basis:
      A basis for the tangent and normal spaces is computed via Singular Value Decomposition (SVD) from the shared neighborhood.
    • Point-wise Hessian Estimation:
      Each point solves a linear system in the shared basis to obtain its own Hessian. Thus, only the basis is shared; curvature information remains point-specific.

    We compared this method (CEMS) with a fully point-wise variant (CEMS_p) and found negligible performance differences (see Table 5), confirming that the shared-neighborhood approximation is both efficient and accurate; a brief sketch of the two neighborhood constructions is included at the end of this response.

  2. Where did the idea of creating a manifold by concatenating data x and label y come from?
    We thank the reviewer for highlighting this point. The idea of creating a manifold by concatenating input data x with its corresponding label y is not unique to our paper. Relevant prior work has been added and cited in the revised manuscript.

  3. Are local neighborhoods disjoint sets or overlapping sets?
    We thank the reviewer for raising this important point. The local neighborhoods we construct are overlapping sets rather than disjoint sets. Specifically, each point forms its own neighborhood based on its k-nearest neighbors. Therefore, it is natural that neighborhoods of close points may partially overlap.

    To clarify, the Hessian itself is not computed only once per neighborhood; rather, it is computed once per point. What we re-use across overlapping neighborhoods is only the local orthonormal basis, which is computed from the SVD of points in the shared neighborhood. Once this basis is established, each individual point solves its own linear system separately to estimate its Hessian within that shared basis.

    Hence, overlap between neighborhoods is not problematic in our setting. On the contrary, overlap can be seen as beneficial, as it can encourage consistent local geometry estimates across adjacent regions of the manifold, ensuring smooth transitions and coherent geometric structure.

  4. Is the intrinsic dimension of the manifold an important hyperparameter?
    We thank the reviewer for bringing up this important clarification. In principle, the intrinsic dimension d of the manifold could be treated as an important hyperparameter. However, in practice, we opt for an automatic estimation using a robust intrinsic dimension estimator TwoNN (Facco et al., 2017) to reduce the complexity of hyperparameter tuning. Thus, the intrinsic dimension is inferred from the data rather than manually set.

    To address the reviewer’s concern regarding the robustness to intrinsic dimension, we performed a sensitivity analysis by perturbing the TwoNN estimated dimension by ±1 and ±2. As shown in Table 2 in the anonymous pdf file, our method exhibits stable performance across all datasets, with the best or second best results frequently aligning with the baseline value. Additionally, in Table 1 we compare TwoNN with alternative estimators [a, b] and observe strong agreement, e.g., all three methods estimate the same dimension for Crimes. These results support both the robustness of our method to this hyperparameter and the empirical reliability of using TwoNN as the default.

    [a] Levina, E., & Bickel, P. (2004). Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 17.
    [b] Birdal, T., Lou, A., Guibas, L. J., & Simsekli, U. (2021). Intrinsic dimension, persistent homology and generalization in neural networks. Advances in Neural Information Processing Systems, 34.
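
As referenced in point 1 above, here is a brief sketch contrasting the two neighborhood constructions discussed in the responses: per-point kNN neighborhoods (CEMS_p-style) versus a single anchor-centered batch with one shared, mean-centered basis (CEMS-style). It covers only neighborhood and basis construction; the function names, the scikit-learn dependency, and the parameter defaults are our own assumptions.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def neighborhoods_pointwise(Z, k):
        """CEMS_p-style: each point gets its own k-NN neighborhood drawn from the full dataset."""
        _, idx = NearestNeighbors(n_neighbors=k + 1).fit(Z).kneighbors(Z)
        return [Z[row[1:]] for row in idx]               # one (k, D) neighborhood per point

    def shared_batch_basis(Z, B, rng=None):
        """CEMS-style: pick an anchor z_0, take its B nearest points, compute one shared basis."""
        rng = np.random.default_rng() if rng is None else rng
        z0 = Z[rng.integers(len(Z))]
        batch = Z[np.argsort(np.sum((Z - z0) ** 2, axis=1))[:B]]
        mu = batch.mean(axis=0)
        _, _, Vt = np.linalg.svd(batch - mu, full_matrices=False)
        return batch, mu, Vt                             # each batch point reuses (mu, Vt) for its own Hessian fit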

Reviewer Comment

After reading the author's rebuttal and the other reviewers' comments, I stand by my original (favorable) rating.

Author Comment

We would like to express our sincere gratitude to the reviewer for their careful consideration and for upholding a positive evaluation of our submission. We greatly value your constructive feedback and continued support.

Final Decision

This submission introduces a new data augmentation algorithm for regression problems. Making use of manifold learning on local neighbourhoods, a curvature-enhanced approach is used for the sampling procedure. Reviewers unanimously agreed on the relevance of the paper and particularly appreciated the clear writing and the excellent experimental setup. During the rebuttal, some minor issues of the submission were discussed and addressed. I trust the authors to implement their proposed changes, but otherwise, this paper more than deserves acceptance.