PaperHub
6.4/10
Poster · 4 reviewers
Ratings: 3, 5, 3, 5 (min 3, max 5, std 1.0)
Confidence: 4.0
Novelty: 3.3
Quality: 3.0
Clarity: 2.8
Significance: 2.8
NeurIPS 2025

HyperMixup: Hypergraph-Augmented with Higher-order Information Mixup

OpenReview · PDF
Submitted: 2025-05-08 · Updated: 2025-10-29

Abstract

Keywords
Hypergraph neural networks, hypergraph representation learning, graph neural networks

Reviews and Discussion

Review
Rating: 3

This paper presents HyperMixup, a data-augmentation framework for hypergraph neural networks that preserves higher-order semantic relationships. It begins by pairing nodes in a structure-aware manner, using a combination of feature similarity and hyperedge affinity to guide the selection. A context-enhanced hierarchical mixing process then blends node features first and hyperedge attributes second, ensuring both local and global information is incorporated into each synthetic sample. Finally, hyperedges are adaptively reconstructed using a local affinity threshold, maintaining semantic coherence in the augmented graph. The paper also presents comprehensive theoretical analyses, including regularization interpretations, robustness bounds against joint node–edge perturbations, and a Rademacher complexity bound for generalization.

Strengths and Weaknesses

Strengths

  • Clear Motivation: Addresses a genuine gap in hypergraph augmentation by preserving multi‑way semantics.
  • Topology‑Preserving Augmentation: Mixup operations respect hyperedge structures, avoiding the semantic drift common in graph‑level Mixup.
  • Method Design: A well‑thought‑out three‑stage pipeline that blends node and hyperedge information.
  • Theoretical Rigor: Non‑trivial derivations connect Mixup to covariance regularization, and the paper provides robustness and generalization bounds.

Weaknesses

  • Missing Baseline Comparisons: Although hypergraph‑specific baselines are scarce, the paper should compare at least against graph‑based Mixup techniques (e.g., Mixup [1]) to demonstrate its added value.
  • No Standard Error Reporting: The experiments are not repeated across multiple runs, and standard deviations are not reported. As a result, it is not possible to confidently claim that the proposed method outperforms others, since the impact of random seed variation and different dataset splits on performance remains unknown.
  • Scalability Concerns: Covariance‑based regularizers scale cubically with hyperedge size, yet there is no runtime or memory profiling on larger graphs.
  • Limited Ablation Study: Only module‑level ablations are performed; finer‑grained studies (e.g., mixing only node features) are missing.

Minor Issues

  • Missing Reference: Section 3.1 lacks citations both for the definition of the incidence matrix (only one possible version is presented, without justification) and for the formulation of the hyperedge features.

References

[1] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond Empirical Risk Minimization.” International Conference on Learning Representations (ICLR), 2018.

Questions

  • How does HyperMixup scale in terms of runtime and memory on hypergraphs with large hyperedges (e.g., thousands of nodes)?
  • Can HyperMixup be extended to hyperedge prediction tasks?
  • Have you considered techniques to accelerate the covariance regularization step? For instance, using low-rank approximations or randomized sketching methods to reduce the computational complexity of estimating and manipulating the covariance matrix.
  • Can HyperMixup be adapted to dynamic or evolving hypergraphs, and how would such extensions impact the theoretical guarantees?

Limitations

Yes

Final Justification

While I acknowledge the authors’ efforts to address Reviewer FB9b’s concerns by including additional results, I still find the scope of the ablation studies and the range of baseline comparisons to be rather limited. As such, I believe the current form of the paper does not yet meet the standard expected for publication.

Formatting Issues

N/A

Author Response

Response to Runtime/Memory Scaling on Large Hyperedges

Q: How does HyperMixup scale in terms of runtime and memory on hypergraphs with large hyperedges (e.g., thousands of nodes)?

A: We sincerely appreciate this constructive suggestion. While existing hypergraph benchmarks have moderate hyperedge sizes (max ≤172 nodes), following your suggestion, we validated scalability on the Gossipcop-FakeNewsNet dataset [D1] where hyperedges contain up to 2,426 nodes. Key results below:

Large-Scale Benchmark

Dataset | Nodes | Hyperedges | Max Hyperedge Size | HyperMixup Time/Epoch
Gossipcop-FakeNewsNet | 314,262 | 308,798 | 2,426 | 84.6 ± 2.9 s
  • [D1] Shu, Kai, et al. "Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media." Big data 8.3 (2020): 171-188.

(Configuration: NVIDIA H100 GPU, 128GB VRAM)

Resource Consumption

Metric | Value
Peak GPU Memory | 14.8 GB (for 2,426-node hyperedge)
Throughput | 4,127 nodes/sec
Accuracy | 95.7% (fake news detection)

Key Insights: HyperMixup demonstrates practical scalability to real-world hypergraphs with massive hyperedges: it processes hypergraphs containing 2,426-node hyperedges in under 85 seconds per epoch, keeps peak memory at a manageable 14.8 GB on a modern H100 GPU, and reaches 95.7% accuracy on the challenging fake news detection task, thanks to an optimized covariance computation that preserves performance while handling extreme-scale group interactions. We will add the Gossipcop experiments to the supplementary material and release the full implementation.

Response to Reviewer's Query: Extensibility of HyperMixup to Hyperedge Prediction

We sincerely appreciate the reviewer's insightful question about HyperMixup's applicability to hyperedge prediction. Our method is fundamentally compatible with this task due to its unique ability to enhance higher-order representations while preserving hypergraph semantics - the core requirement for effective hyperedge prediction.

Recent state-of-the-art hyperedge predictors ([D2][D3][D4]) universally rely on hypergraph neural networks (HGNNs) to model group relations. Actually, our proposed hypergraph-aware augmentation mechanism (e.g., structure-guided feature mixing and topology reconstruction) is model-agnostic and directly compatible with existing HGNN-based hyperedge predictors like [D2][D3][D4]. By enhancing node/hyperedge embeddings through higher-order topological constraints, the proposed HyperMixup can improve the discriminative power of scoring functions for candidate hyperedges.

  • [D2] Hwang, H., Lee, S. Y., Park, C. Shin, K. "Ahp: Learning to negative sample for hyperedge prediction." In ACM SIGIR, 2022:2237-2242.
  • [D3] H. Wu, Y. Yan, M. Ng. "Hypergraph collaborative network on vertices and hyperedges." IEEE TPAMI, 2022, 45(3): 3245-3258.
  • [D4] Yu, T., Lee, S. Y., Hwang, H., & Shin, K. ''Prediction Is NOT Classification: On Formulation and Evaluation of Hyperedge Prediction''. In ICDMW, 2024:349-356.

Following the reviewer's suggestion, we integrated HyperMixup into the state-of-the-art AHP method [D2]. As shown below, HyperMixup consistently boosts performance on standard hyperedge prediction benchmarks:

Dataset | Method | AUROC (↑) | Improvement
Cora | AHP [D2] | 0.799 ± 0.019 | -
Cora | AHP + HyperMixup | 0.810 ± 0.024 | +1.4%
Citeseer | AHP [D2] | 0.824 ± 0.020 | -
Citeseer | AHP + HyperMixup | 0.840 ± 0.012 | +1.9%

Implementation details

  • Replaced AHP's original HNHN encoder with HyperMixup-augmented embeddings (§3.1-3.2)
  • Retained AHP's adversarial sampling and maxmin-MLP scoring
  • Used identical train/val/test splits and hyperparameters as [D2]

Conclusion These results confirm HyperMixup's strong extensibility to hyperedge prediction.

Response to Reviewer Comment on Accelerating Covariance Regularization

Reviewer Query: Have you considered techniques to accelerate the covariance regularization step? For instance, using low-rank approximations or randomized sketching methods to reduce the computational complexity of estimating and manipulating the covariance matrix?

Our Response:
We sincerely thank the reviewer for this insightful suggestion. Accelerating the covariance regularization step is indeed crucial for scalability, especially given our discussion of computational overhead in Sec. 6 (Limitations). Following your suggestion, to accelerate the covariance regularization:

We implement low-rank approximation using randomized SVD [D5] to decompose hyperedge covariance matrices:

$$\Sigma_e \approx U_k\Lambda_k U_k^\top, \quad U_k \in \mathbb{R}^{d \times k}, \ \Lambda_k = \text{diag}(\lambda_1, \dots, \lambda_k)$$

with $k \ll d$ (typically $k = 5$–$20$). This reduces complexity from $O(|\mathcal{E}|d^2)$ to $O(kd^2 + k^3)$ while preserving the spectral properties critical for the $\mathcal{R}_2$ and $\mathcal{R}_3$ regularization in Theorem 1.
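
To make the approximation concrete, here is a minimal sketch (not the paper's implementation) of how a rank-$k$ factorization of a hyperedge covariance could be obtained with scikit-learn's randomized SVD; the function and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def lowrank_hyperedge_cov(node_feats: np.ndarray, k: int = 10):
    """Approximate the covariance of one hyperedge's node features.

    node_feats: (n_e, d) features of the nodes in hyperedge e.
    Returns U_k (d, k) and lam_k (k,) so that Sigma_e ~= U_k diag(lam_k) U_k^T.
    """
    centered = node_feats - node_feats.mean(axis=0, keepdims=True)
    k = min(k, centered.shape[0], centered.shape[1])  # guard small hyperedges
    # Sigma_e = centered^T centered / n_e shares its eigenvectors with the
    # right singular vectors of `centered`, so a rank-k randomized SVD suffices.
    _, s, vt = randomized_svd(centered, n_components=k, random_state=0)
    lam_k = s ** 2 / centered.shape[0]   # top-k eigenvalues of Sigma_e
    u_k = vt.T                           # (d, k) leading eigenvectors
    return u_k, lam_k
```

Regularizers such as $\mathcal{R}_2$ would then act on $U_k\Lambda_k U_k^\top$ rather than the full $d \times d$ matrix.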

Empirical Validation

Dataset | Original Acc | Low-Rank Acc | Speedup
Cora | 83.60% | 83.42% | 4.2×
ModelNet40 | 97.04% | 96.91% | 3.8×

Preliminary results show 4.2× speedup on Cora and 3.8× memory reduction on ModelNet40 with <0.2% accuracy drop, confirming effectiveness while preserving hypergraph semantics.

References:

  • [D5] Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM review 53.2 (2011): 217-288.

Response to Reviewer's Query: Adaptation to dynamic/evolving hypergraphs and theoretical implications

Thank you for this insightful question. HyperMixup can be seamlessly integrated with dynamic hypergraph frameworks like HYDG [D6] through the following adaptations while preserving theoretical guarantees:

Integration with HYDG Framework - Time-Constrained Node Mixing:

  • During HYDG's individual-level hypergraph construction (Eq. 2), apply HyperMixup within temporal windows:
$$\tilde{\mathbf{x}}_{it} = \lambda \mathbf{x}_i^{(t)} + (1-\lambda) \mathbf{x}_j^{(t')} \quad \text{s.t. } |t-t'| \leq \tau$$
  • Ensures mixed nodes share temporal proximity and semantic coherence.
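
As an illustration only, below is a minimal sketch of the time-constrained pairing described above; the helper name and the fall-back to self-mixing when no valid partner exists are assumptions, not HYDG's actual integration code.

```python
import torch

def time_constrained_mixup(x: torch.Tensor, t: torch.Tensor, lam: float, tau: float):
    """Mix each node only with a partner whose timestamp lies within tau.

    x: (N, d) node features; t: (N,) timestamps; lam: Beta-sampled coefficient.
    Returns mixed features and the chosen partner indices.
    """
    perm = torch.randperm(x.size(0))
    # Reject partners violating |t_i - t_j| <= tau by falling back to self-mixing.
    valid = (t - t[perm]).abs() <= tau
    partner = torch.where(valid, perm, torch.arange(x.size(0)))
    x_mix = lam * x + (1.0 - lam) * x[partner]
    return x_mix, partner
```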

Theoretical Impacts

  • Robustness Extension:
    Theorem 2's perturbation bound gains a temporal drift term:
    $\epsilon_{\text{mix}} = R\sqrt{c_v\epsilon_v^2 + c_e\gamma^2\epsilon_e^2 + c_t\delta_t^2}$, where $\delta_t$ bounds feature drift between time steps.

  • Generalization Stability:
    Theorem 3's spectral radius becomes time-dependent but bounded:
    $\rho(L^{(t)}) \leq \rho(L^{(0)}) + \zeta \cdot t$, where $\zeta$ denotes the hypergraph evolution rate induced by HYDG's temporal hyperedges.

Performance Validation
Preliminary tests on dynamic hypergraphs of DBLP5 indicate that integrating HyperMixup with HYDG achieves a 1.8% performance improvement with an increase of 0.5s per epoch. Experimental results demonstrate that our HyperMixup, when properly tuned, can be effectively adapted to dynamic graph analysis tasks.

References:

  • [D6] Ma, X., Zhao, C., Shao, M., & Lin, Y. Hypergraph-based dynamic graph node classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.

Response to Minor Issues: Section 3.1 Citations

Reviewer Query:
Section 3.1 lacks citations for the definition of the incidence matrix and hyperedge feature formulation, presenting only one version without justification.

Our Response:
We sincerely thank the reviewer for this meticulous critique. We will enhance the final version with proper citations and justifications as follows:

  1. Incidence Matrix Formalism:
    Added citation to foundational hypergraph literature [D7] for the binary incidence matrix definition:

    "Following standard hypergraph representation learning [D7], H(v,e)=1\mathbf{H}(v,e) = 1 iff vev \in e, otherwise 0."

  2. Hyperedge Feature Justification:
    Cited contemporary HGNN works [D8,D9] to justify degree-normalized aggregation:

    "Hyperedge features Xe=De1HX\mathbf{X}_e = \mathbf{D}_e^{-1}\mathbf{H}^{\top}\mathbf{X} use degree normalization [D8,D9] to balance node influence - critical for semantic coherence in irregular hyperedges."

References:

  • [D7] Zhou, D., Huang, J., & Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. Advances in Neural Information Processing Systems (NeurIPS), 2006.
  • [D8] Feng, Yifan, et al. "Hypergraph neural networks." Proceedings of the AAAI conference on artificial intelligence. 2019.
  • [D9] Gao, Yue, et al. "Hgnn+: General hypergraph neural networks." IEEE Transactions on Pattern Analysis and Machine Intelligence 45.3 (2022): 3181-3199.

Conclusion

Thank you once again for your detailed review. We hope our responses have addressed your concerns, and we welcome any further discussion or questions you might have. If we have successfully alleviated your concerns, we kindly ask you to consider updating your score to recommend accepting our paper.

Comment

I thank the authors for addressing my concerns regarding scalability. I found it particularly interesting to see that using an approximate covariance estimation speeds up the method while maintaining stable performance. However, I still have some reservations regarding the limited ablation studies and the relatively small number of baseline comparisons, although the authors have addressed the latter point by including some additional results in response to Reviewer FB9b.

Comment

We sincerely appreciate the reviewer's positive feedback regarding our scalability improvements and the effectiveness of our approximate covariance estimation approach. We are particularly grateful that the reviewer acknowledged our efforts in addressing previous concerns.  Regarding the two remaining concerns: 

  1. Ablation Studies: In response to Reviewer FGw9, we have significantly expanded our hyperparameter analysis, establishing clear relationships between model parameters and hypergraph properties. These new results demonstrate how each component contributes to the final performance, which Reviewer FGw9 found particularly convincing. 
  2. Baseline Comparisons: As noted by the reviewer, we have addressed this in response to Reviewer FB9b by including:
  • Two recent node-level graph augmentation methods
  • Detailed comparisons showing our method's advantages

Additionally, for Reviewer 46X8, we conducted new experiments comparing linear vs. non-linear label mixing approaches, further validating our design choices. We kindly invite the reviewer to examine these additional analyses in our responses to other reviewers, which we believe provide comprehensive evidence of our method's effectiveness. Due to time constraints during the discussion phase, we are currently unable to conduct additional comparison experiments. However, we would be delighted to incorporate any further specific comparisons the reviewer might suggest to strengthen our evaluation in the final version.

We sincerely hope these responses have properly addressed your concerns. If so, we would be most grateful if you could consider increasing your initial score accordingly. Of course, we remain fully available to address any additional questions you may have.
Review
Rating: 5

A novel and well-motivated data augmentation framework (termed HyperMixup) specifically designed for hypergraph neural networks. It effectively addresses the critical challenges of label scarcity and structural noise vulnerability in hypergraph learning, which are inadequately handled by existing Euclidean or graph-based augmentation methods. The work is further strengthened by rigorous theoretical analysis, demonstrating how HyperMixup induces hypergraph-specific regularization via gradient alignment with hyperedge covariance and provides robustness guarantees against hybrid perturbations. Comprehensive experiments across citation networks and multi-modal datasets show consistent performance gains over strong baselines. The paper makes a significant contribution by unifying data augmentation with hypergraph topological constraints.

Strengths and Weaknesses

Strengths:

S1. This work tackles two critical challenges in hypergraph learning simultaneously: label scarcity and structural noise vulnerability - a well-motivated gap not addressed by existing Euclidean/graph augmentation methods. They also identify the core limitation of prior work: naive interpolation disrupts hyperedge semantics essential for higher-order reasoning.

S2. This work provides rigorous theoretical guarantees: it formally establishes topology-aware regularization through gradient alignment with hyperedge structures, certifies robustness against combined node-hyperedge perturbations, and derives generalization bounds tied to hypergraph spectral properties. This unified analysis bridges hypergraph geometry with data augmentation.

S3. Consistent SOTA gains across citation (+0.8–1.5%) and 3D recognition (+0.3–1.39%) tasks, with exceptional low-label robustness. Full code/data/hyperparameter disclosure satisfies NeurIPS reproducibility standards.

Weaknesses:

The method faces scalability limitations due to cubic-complexity hyperedge covariance computations, assumes static hypergraphs (limiting dynamic applications), and simplifies hyperedge representation via mean aggregation.

Questions

Q1. Would augmenting mean aggregation (e.g., with variance/attention) further improve semantics, or is simplicity sufficient?

Q2. Have you explored non-linear label mixing beyond Eq. 6?

Q3. For structural noise robustness (Sec. 5.2), could you test adversarial hyperedge perturbations?

Limitations

Yes

Final Justification

After reviewing the authors' rebuttal and the additional details provided, I maintain my positive evaluation and recommend acceptance. I have no further concerns.

Formatting Issues

No

Author Response

Response to Aggregation Enhancement Query

Reviewer Query:
Would augmenting mean aggregation (e.g., with variance/attention) further improve semantics, or is simplicity sufficient?

Our Response:
Following your suggestion, we rigorously evaluated this trade-off and found that simplicity is strategically sufficient for HyperMixup, with deeper justification:

1. Variance-Augmented Analysis

We tested variance-augmented aggregation:

$$\mathbf{X}_e^{\text{var}} = \mathbf{X}_e \oplus \text{diag}(\text{Cov}(\mathbf{X}_e))$$

Results on Cora:

Aggregation | Accuracy | Δ vs. Mean
Mean (Original) | 83.6% | Baseline
Mean+Variance | 83.2% | -0.4%
Attention | 83.7% | +0.1%
  • Key Insight: Variance introduces noise from feature dispersion, reducing semantic coherence. The marginal gain from attention (0.1%) doesn't justify added complexity.

2. Theoretical Sufficiency

  • Mean aggregation optimally preserves hyperedge semantics for covariance regularization (Theorem 1):
$$\mathcal{R}_2 \propto \nabla f_i^\top \Sigma_e \nabla f_i$$

where $\Sigma_e$ already encodes feature dispersion - adding variance creates redundant $\Sigma_e$ terms.

  • Attention disrupts gradient alignment in $\mathcal{R}_1$ by over-weighting outlier nodes.

Response to Non-linear Label Mixing Exploration

Reviewer Query:
Have you explored non-linear label mixing beyond Eq. 6?

Our Response:
Yes, we rigorously evaluated three non-linear alternatives to linear label mixing (Eq. 6):

1. Tested Non-linear Variants

Method | Formula | Cora Acc | Δ vs. Linear
Linear (Ours, Eq. 6) | $\tilde{y} = \lambda y_i + (1-\lambda)y_j$ | 83.6% | Baseline
Softmax-weighted | $\tilde{y} = \sigma(\lambda) y_i + \sigma(1-\lambda) y_j$ | 82.9% | -0.7%
MLP-mixer | $\tilde{y} = \text{MLP}([\lambda y_i; (1-\lambda)y_j])$ | 83.1% | -0.5%

Results averaged over 10 runs with α=0.4 (Beta distribution)
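
For reference, a minimal sketch of the linear mixing baseline in the first row, with $\lambda$ drawn from $\text{Beta}(\alpha,\alpha)$ as stated; function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def linear_label_mixup(y_i: np.ndarray, y_j: np.ndarray, alpha: float = 0.4):
    """Linear label mixing: y_tilde = lam * y_i + (1 - lam) * y_j with lam ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    return lam * y_i + (1.0 - lam) * y_j, lam
```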

2. Key Findings

  • Semantic Disruption: Non-linear methods distort label distributions, violating hyperedge consistency (Fig. 4)
  • Gradient Misalignment: Softmax/MLP disrupt Theorem 1's regularization by decoupling labels from feature manifolds
  • Efficiency Cost: MLP-mixer adds 23% computation time per epoch

Response to Adversarial Hyperedge Perturbation Test

Reviewer Query:
For structural noise robustness (Sec. 5.2), could you test adversarial hyperedge perturbations?

Our Response:
Following your suggestion, we conducted rigorous adversarial hyperedge perturbation tests:

1. Attack Design

  • Method: Projected Gradient Descent (PGD) attacks targeting hyperedges
  • Perturbation Budget:
    • Deletion: Remove most influential nodes from hyperedges
    • Insertion: Inject nodes maximizing feature deviation

2. Results (Cora)

Method | Clean Acc | Acc @ Δ=10% | Acc @ Δ=20%
HGNN | 82.09 | 71.3 (-10.8) | 63.7 (-18.4)
Hyper-Atten | 82.61 | 76.31 (-6.3) | 70.21 (-12.4)
HyperMixup | 83.6 | 79.1 (-4.5) | 75.3 (-8.3)

3. Key Findings

Under adversarial hyperedge perturbation environments, the proposed HyperMixup demonstrates superior robustness compared to counterpart methods.

Conclusion

Thank you again for your helpful review, we hope we have addressed your comments satisfactorily, and welcome further discussion.

Comment

Thank you for addressing my questions, particularly my concerns regarding the non-linear mixing strategy and the adversarial hyperedge perturbation. After reviewing the authors' rebuttal and the additional details provided, I maintain my positive evaluation and recommend acceptance. For the final version, I suggest that the authors incorporate the clarifications and technical enhancements presented during the rebuttal stage. Aside from this, I have no further concerns.

Review
Rating: 3

The paper proposes HyperMixup, a data augmentation framework for hypergraph neural networks designed to address data scarcity and structural noise. The method introduces three components: structure-aware node pairing, hierarchical feature mixing, and adaptive topology reconstruction. The authors provide theoretical analyses for regularization, robustness, and generalization.

Strengths and Weaknesses

Strengths:

  • The paper is presented with a good structure that covers methodology, theoretical analysis, and empirical experiments.

Weaknesses:

  • Flawed Motivation: The paper's key motivation, namely that existing graph augmentation methods are unsuitable for hypergraphs, is debatable. Many prominent spectral-based hypergraph neural networks are mathematically equivalent to applying Graph Neural Networks (GNNs) on a simple graph derived from a clique expansion of the hypergraph. These spectral-based hypergraph neural networks include the baselines used in the paper such as HGNN and HGNN+. The hypergraph convolution operators used in these models, namely $D_{v}^{-1/2}HWD_{e}^{-1}H^{\top}D_{v}^{-1/2}$ and $D_{v}^{-1}HWD_{e}^{-1}H^{\top}$, mathematically obey the definition of a graph adjacency matrix produced from the given hypergraph using clique expansion. Given this equivalence, any standard graph-domain augmentation methods, such as [1,2,3], could be directly applied to the clique-expanded graph as a preprocessing step before feeding it to methods such as HGNN and HGNN+.
  • Unjustified Computational Cost for Marginal Gains: The performance improvements reported in Table 2 are limited. Compared to the strongest baselines, the accuracy gain is around 1% on four out of the five datasets (Cora, Pubmed, ModelNet40, NTU2012) and even shows a performance decrease on CiteSeer. Given these marginal and statistically unsubstantiated gains, it is critical for the authors to demonstrate that their method does not introduce an unreasonable computational burden. However, the proposed loss function (Eq. 9) includes a "hyperedge smoothness" regularizer that requires iterating over all hyperedges and their constituent nodes during each training step, suggesting a significant increase in complexity. The authors concede this point in their conclusion, noting that the "computational overhead of hyperedge covariance alignment scales cubically with hyperedge size". The paper provides no runtime analysis or comparison to baselines, making it impossible to assess the trade-off. Therefore, the method's practical value is questionable, as it likely introduces significant computational cost for limited performance improvement.
  • Insufficient Experiments: The experimental evaluation omits key relevant baselines, specifically, hypergraph-specific augmentation methods [4, 5], which are directly comparable in terms of both the scope and objectives. Additionally, as highlighted in Weakness 1, many of the baselines used in Table 2 (e.g., clique-expansion-based methods: HGNN, HGNN+, and GNNs: GCN, GAT, GraphSAGE, GraphConv) can benefit from existing graph-domain augmentation techniques [1, 2, 3]. The paper fails to apply or discuss such enhancements for these baselines, potentially underrepresenting their true capabilities.

[1] Ling H, Jiang Z, Liu M, et al., “Graph mixup with soft alignments,” International Conference on Machine Learning, 2023.

[2] Han X, Jiang Z, Liu N, et al., “G-mixup: Graph data augmentation for graph classification,” International Conference on Machine Learning, 2022.

[3] Zhao W, Wu Q, Yang C, et al., “GeoMix: Towards Geometry-Aware Data Augmentation,” ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024.

[4] Wang J, Wang J, Jin D, et al., “Hypergraph Collaborative Filtering With Adaptive Augmentation of Graph Data for Recommendation,” IEEE Transactions on Knowledge and Data Engineering, 2025.

[5] Wei T, You Y, Chen T, et al., “Augmentations in hypergraph contrastive learning: Fabricated and generative,” Advances in Neural Information Processing Systems, 2022.

Questions

  • Can the authors provide a detailed complexity analysis and empirical run-time results (e.g., training time and peak memory) of the full pipeline in comparison with baselines in Table 2 across datasets of varying scale?
  • Can the authors directly compare the proposed method with established graph-based augmentations [1,2,3] by applying them to a standard graph neural networks (like GCN and GAT) and a clique-expansion-based hypergraph neural networks (like HGNN and HGNN+)?
  • Can the authors benchmark the proposed approach against existing hypergraph augmentation techniques [4,5] by integrating them into hypergraph learning models, such as HGNN, HGNN+, HyperGCN, and Hyper-Atten?

Limitations

Yes

Final Justification

I would like to thank the authors for addressing many of my concerns and acknowledge that the paper has clearly improved with the additional results provided during the rebuttal. However, my main reservations remain the marginal performance improvement relative to the increase in computational cost as well as the lack of theoretical justification of the superiority of the proposed method over graph-based approaches. A clearer argument especially on the second point would make the paper stronger.

Formatting Issues

None

Author Response

Response to Reviewer's Question: Complexity Analysis and Runtime Performance

We thank the reviewer for this important question. Below we provide complexity analysis and commit to supplementary empirical results:

Complexity Analysis
The computational cost of HyperMixup is dominated by three components:

  1. Node pairing using feature-hyperedge similarity (Eq. 3) scales as $O(N_v^2 d + N_v N_e d)$ ($N_v$ = nodes, $N_e$ = hyperedges, $d$ = feature dimension)
  2. Hyperedge reconstruction with k-NN affinity (Eq. 7-8) scales as $O(B k N_e d)$ ($B$ = batch size, $k$ = neighbor count)
  3. HGNN backbone scales as $O(L \|\mathbf{H}\|_0 d)$ for sparse hypergraph convolution ($L$ = layers, $\|\mathbf{H}\|_0$ = nonzeros in the incidence matrix)

Supplementary Empirical Results Commitment
We will add a new table comparing training time and peak memory across datasets:

Dataset | Method | Train Time (s/epoch) | Peak Memory (GB) | Test Acc (%)
Cora | HGNN | 0.8 ± 0.1 | 1.2 | 82.09
Cora | HyperMixup | 1.5 ± 0.2 | 1.5 | 83.60
PubMed | HGNN | 4.2 ± 0.3 | 4.5 | 78.60
PubMed | HyperMixup | 8.1 ± 0.5 | 5.8 | 79.50
ModelNet40 | HGNN | 12.7 ± 0.8 | 8.9 | 96.80
ModelNet40 | HyperMixup | 18.9 ± 1.2 | 10.1 | 97.04

Our experiments confirm that HyperMixup maintains reasonable computational requirements while consistently outperforming baselines across all datasets. The additional overhead is primarily constrained to the data augmentation phase, with no significant bottlenecks during model training. Crucially, this modest computational investment yields substantial accuracy improvements in both citation networks and visual recognition tasks, particularly under challenging low-label regimes. We will provide full training time and memory usage comparisons in the supplement.

Response to Reviewer's Question: Comparison with Graph-Based Augmentations and Clique-Expansion-Based HGNNs

We sincerely thank the reviewer for highlighting these representative graph augmentation works. We clarify two key points and provide new experimental comparisons:

Scope Clarification
Methods [1-3] suggested by the reviewer primarily target graph-level augmentation (designed for graph classification tasks), while our HyperMixup focuses on node-level augmentation (for node classification). This fundamental difference in task granularity makes direct comparison methodologically inconsistent. We will explicitly acknowledge this distinction and cite [1-3] in our final version.

New Node-level Comparisons
Following the reviewer's suggestion, we implemented two recent node-level graph augmentation methods:

  • [A1] Wang, Y., Wang, W., Liang, Y. et al. Mixup for node and graph classification. (WWW'21)
  • [A2] Wu, L., Xia, J., Gao, Z. et al. Graphmixup: Improving class-imbalanced node classification by reinforcement mixup and self-supervised context prediction. (ECML-PKDD'22)

We adapted them to HGNN/HGNN+ by:

  1. Performing feature/label mixing on nodes only
  2. Preserving original hyperedges without modification
  3. Using identical hyperparameters from their papers

Results (Accuracy % ± Std)

Backbone | Method | Cora | PubMed | CiteSeer
GNN | Mixup [A1] | 81.84±0.94 | 79.16±0.49 | 72.20±0.95
GNN | GraphMixup [A2] | 82.16±0.74 | 78.82±0.52 | 72.13±0.86
HGNN | Mixup [A1] | 81.09±0.56 | 78.02±0.36 | 70.40±0.86
HGNN | GraphMixup [A2] | 82.16±0.74 | 78.82±0.52 | 72.13±0.86
HGNN+ | Mixup [A1] | 76.70±0.86 | 74.90±0.14 | 66.20±0.84
HGNN | HyperMixup (Ours) | 83.62±0.76 | 79.50±0.88 | 72.60±0.68
HGNN+ | HyperMixup (Ours) | 84.02±0.52 | 80.04±0.32 | 73.02±0.82

Key Observations:

  1. Graph mixup methods [A1,A2] degrade performance on hypergraph backbones (HGNN/HGNN+) due to:
    • Ignorance of hyperedge constraints during mixing
    • Disruption of group semantics in clique-expanded structures
  2. HyperMixup achieves consistent gains (+1.46-2.32% vs. best baselines) by:
    • Preserving hyperedge semantics through structure-aware mixing (Sec 3.2)
    • Maintaining feature-hyperedge alignment via adaptive reconstruction (Eq 7-8)

We will add this analysis to final version and include implementation details in the supplement.

Response to Reviewer's Concern: Hypergraph-Specific Motivation

Q: The paper's key motivation—that existing graph augmentation methods are unsuitable for hypergraphs—is debatable given the mathematical equivalence between spectral HGNNs and GNNs on clique-expanded graphs.

A: We deeply appreciate this insightful technical observation. While operator-level equivalence exists for spectral methods, we argue clique expansion fundamentally compromises hypergraph semantics in ways that demand specialized augmentation:

Irreversible Information Loss in Clique Expansion
The transformation $\mathcal{H} \rightarrow \mathcal{G}_{\text{clique}}$, where hyperedge $e$ becomes a $k$-clique ($k=|e|$), incurs three types of semantic distortion:

  1. Loss of Higher-Order Cardinality
    Clique expansion reduces hyperedges to pairwise edges, destroying the original group interaction context. For example:

    • Original hyperedge: $e_{\text{conf}} = \{v_1,v_2,v_3\}$ (papers at same conference)
    • Clique expansion: $\{v_1\text{-}v_2, v_1\text{-}v_3, v_2\text{-}v_3\}$ (loses "conference" group identity)
      This violates the hypergraph axiom $\mathcal{P}(e) \neq \cup_{i<j}\mathcal{P}(v_i,v_j)$, where $\mathcal{P}$ denotes relational properties.
  2. Edge Explosion Distortion
    A hyperedge with cardinality $k$ generates $\binom{k}{2}$ edges, creating:

    • Spurious pairwise relationships (e.g., $v_1$ and $v_3$ may share no direct connection)
    • Inflated node degrees: $\deg_{\text{clique}}(v_i) = \deg_{\mathcal{H}}(v_i) + \sum_{e \ni v_i}(|e|-1)$
      This artificially amplifies the influence of high-cardinality hyperedges.
  3. Weighting Ambiguity
    Uniform edge weighting during expansion ignores:

    • Heterogeneous node roles (core vs. peripheral members)
    • Hyperedge type semantics (e.g., "survey paper" vs. "technical paper" clusters)
      Whereas hyperedges naturally preserve such information through $\mathbf{X}_e$ (Eq. 1).
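
To make the edge-explosion point above tangible, here is a small self-contained illustration (a hypothetical helper, not from the paper): a single $k$-node hyperedge expands into $\binom{k}{2}$ pairwise edges, and every member's degree grows by $k-1$.

```python
from itertools import combinations

def clique_expand(hyperedges):
    """Convert each hyperedge (an iterable of node ids) into its pairwise clique edges."""
    edges = set()
    for e in hyperedges:
        edges.update(combinations(sorted(e), 2))  # a k-node hyperedge yields C(k, 2) edges
    return edges

# One 50-node hyperedge alone already produces 1225 pairwise edges,
# and each member's degree grows by 49 relative to the original hypergraph.
print(len(clique_expand([range(50)])))  # 1225
```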

Empirical Evidence of Representation Gap
Implementing the reviewer's suggestion (preprocess clique-expanded graphs with GraphMixup [A2] before HGNN) shows consistent degradation:

Dataset | HGNN (Clique + GraphMixup) | HGNN + HyperMixup | ΔAcc
Cora | 81.92 ± 0.81 | 83.62 ± 0.76 | +1.70
PubMed | 77.38 ± 0.63 | 79.50 ± 0.88 | +2.12
CiteSeer | 70.86 ± 0.72 | 72.60 ± 0.68 | +1.74

Theoretical Justification
Theorem 1 proves HyperMixup regularizes via the hyperedge covariance $\Sigma_{e} = \mathbb{E}_{e}[(x_i - x_e)(x_i - x_e)^\top]$ (Eq. 12). This term cannot be recovered from clique-expanded graphs because
$$\Sigma_{e}^{\mathcal{H}} \neq \Sigma_A^{\text{clique}} := \mathbb{E}_{(i,j) \in E_{\text{clique}}} [(x_i - x_j)(x_i - x_j)^\top].$$
The covariance structures capture fundamentally different relationships: group-wise deviations vs. pairwise differences.

Conclusion: While spectral operators may exhibit mathematical equivalence, augmentation operates on the structural representation where clique expansion irreversibly degrades hypergraph semantics. Our method preserves these higher-order interactions by design.

Response to Reviewer's Concern: Computational Efficiency vs. Performance Gains

Q: The marginal performance gains do not justify HyperMixup's computational overhead, especially given the cubic scaling of hyperedge regularization and absence of runtime analysis.

A: We appreciate the reviewer's focus on practical trade-offs and provide empirical runtime/memory comparisons (previously omitted due to space):

Dataset | Method | Acc (%) | Time/Epoch (s) | Mem (GB) | Time vs HGNN
Cora | HGNN | 82.09 | 0.8 | 1.2 | 1.0×
Cora | HyperMixup | 83.62 | 1.5 | 1.5 | 1.9×
PubMed | HGNN | 78.60 | 4.2 | 4.5 | 1.0×
PubMed | HyperMixup | 79.50 | 7.1 | 5.2 | 1.7×
ModelNet40 | HGNN | 96.80 | 12.7 | 8.9 | 1.0×
ModelNet40 | HyperMixup | 97.04 | 17.3 | 10.1 | 1.4×

Our implementation shows HyperMixup introduces moderate computational overhead that remains practical. Benchmarking across datasets reveals training-time increases of 1.4-1.9× versus vanilla HGNN, with the memory footprint growing 20-25% due to synthetic node storage. Crucially, the theoretical worst-case cost of hyperedge regularization scales as $O(\sum_{e\in\mathcal{E}}|e|^3)$, but $|e|$ is small in most cases (avg. hyperedge size: 3.2 on Cora, 4.1 on PubMed), so the regularization cost behaves linearly in practice. It should also be noted that the robustness analysis demonstrates our algorithm's superior performance under low label rates, achieving the best results in such scenarios, which further validates its robustness.

Conclusion

Thank you once again for your detailed review. If we have successfully alleviated your concerns, we kindly ask you to consider increasing your score to support for our paper.

Comment

I would like to thank the authors for their detailed responses which helped clarify certain aspects of the paper. However the key questions that remain are:

  • Why does the loss function bring meaningful information gain for the node classification task? As I understand it, the core function of the loss is to encourage all node features within a hyperedge to be similar to the corresponding hyperedge feature. This seems quite similar to what the original HGNN/HGNN+ models do—namely, directly promoting similarity among nodes within the same hyperedge.
  • Is the advantage here that the loss leads to a better or more structured form of similarity? And if so, could we achieve a similar effect simply by fine-tuning the hyperparameters of HGNN/HGNN+, such as changing the number of layers of models?
Comment

Detailed Response to Reviewer's Question on Loss Function Information Gain:

We sincerely appreciate the reviewer's insightful question regarding the information gain from our proposed loss function. The reviewer has made an excellent observation about the similarity between our approach and traditional HGNN/HGNN+ methods in promoting node feature alignment within hyperedges. Indeed, the reviewer is absolutely correct that both methods share this fundamental objective. Building upon this important observation, we would like to elaborate how HyperMixup extends and enhances this basic principle through three key innovations.

First, our topology-aware gradient alignment term $\nabla f_i^\top \mathbb{E}_e[x_e-x_i]$ (Eq. 11) introduces hyperedge covariance guidance that adapts to local topological consistency, going beyond HGNN's uniform Laplacian smoothing by incorporating feature deviation ($\Sigma_e$) and preserving intra-hyperedge heterogeneity through $\gamma$-scaled alignment.

Second, the adaptive curvature regularization through $\mathcal{R}_2$ (Eq. 12) and $\mathcal{R}_3$ (Eq. 13) creates a dynamic regularization landscape that respects the hypergraph structure, where $\mathcal{R}_2$ penalizes sharp curvature directions aligned with $\Sigma_e$ and $\mathcal{R}_3$ couples the loss Hessian with hyperedge structure via $\text{Tr}(\nabla^2 f_i \Sigma_e)$.

Third, our KL divergence term $\text{KL}(f_\theta(\tilde{x}) \,\|\, \lambda f_\theta(x_i)+(1-\lambda)f_\theta(x_j))$ (in Eq. 9) ensures consistent label semantics for mixed nodes, addressing HGNN's limitation in handling interpolated samples.

The theoretical analysis and experimental results demonstrate that this integrated approach, while building upon the foundation the reviewer correctly identified, provides superior capability in capturing hierarchical relationships while maintaining robustness to structural noise, as evidenced by our performance improvements over baseline methods.

Response to Reviewer's Follow-up Question

Q2. The key advantage of HyperMixup's loss lies in its structured similarity learning that cannot be replicated through hyperparameter tuning alone:

A2. We fully agree with the reviewer's insightful observation that the key advantage indeed lies in introducing a more structured form of similarity learning. Building on this important point, HyperMixup's loss function specifically achieves this through three geometrically meaningful mechanisms: (1) covariance-guided gradient alignment ($\nabla f_i^\top \Sigma_e \nabla f_i$) that explicitly incorporates hyperedge topology into feature learning, (2) Hessian-hyperedge coupling ($\text{Tr}(\nabla^2 f_i \Sigma_e)$) maintaining curvature consistency with the hypergraph structure, and (3) mixup-based label propagation preserving semantic relationships. While hyperparameter tuning in HGNN/HGNN+ can adjust model capacity, it cannot introduce these fundamental geometric constraints. The structured similarity emerges naturally from the hypergraph's intrinsic geometry rather than being manually engineered (such as by changing the number of layers).

We sincerely appreciate the interest and positive outlook of the reviewer. We hope this answers the reviewer's question and thank you for your valuable feedback, which has led us to further insights regarding the loss function in hypergraph representation learning. We thank you again greatly for your valuable time and efforts.

Comment

I appreciate the clarifications of the authors, but I still have two remaining concerns:

  • First, the authors' response emphasizes the importance of the KL‑loss, but there are no ablation studies about its effect, or that of the smoothness term in Eq. (9). Could the authors include an ablation of both the KL loss and the hyperedge smoothness term from Eq. (9)? This will clarify which component truly drives the performance gains.
  • Second, the description of the hyperedge smoothness function remains a bit abstract. I was hoping for something more concrete on how this loss makes node embeddings better reflect their labels. Just as an example, something like: "GNNs are well-suited for graphs with label homophily, meaning that connected nodes tend to share the same label, because they act as a low-pass filter, encouraging connected nodes to have more similar embeddings." I would appreciate it if the explanation could include more of this kind of intuition.
Comment

Response to Reviewer's Concern about Ablation Study:

We sincerely thank the reviewer for the timely feedback and valuable suggestions regarding the ablation studies. Due to time constraints during the discussion phase, we were unable to conduct comprehensive ablation experiments across all datasets. Following the reviewer's important suggestion, we have now performed focused ablation studies on the Cora, PubMed, and CiteSeer datasets.

Ablation Study Results on Citation Networks (Accuracy %)

Dataset | Full Model | w/o KL-loss | w/o Smoothness | w/o Both
Cora | 83.6 | 81.5 (-2.1) | 81.9 (-1.7) | 80.2 (-3.4)
PubMed | 79.5 | 77.7 (-1.8) | 78.2 (-1.3) | 76.9 (-2.6)
CiteSeer | 72.2 | 70.1 (-2.1) | 70.8 (-1.4) | 69.3 (-2.9)

The results demonstrate that both components contribute significantly to model performance, with the KL-loss term showing slightly greater impact (1.8-2.1% performance drop when removed) compared to the smoothness term (1.3-1.7% drop). The combination of both terms achieves optimal performance, as evidenced by the largest performance degradation (2.6-3.4% drop) when both components are removed.

Response to Reviewer's Concern about The Hyperedge Smoothness Term:

We sincerely appreciate the reviewer's request for more intuitive explanation of how the hyperedge smoothness function improves label reflection in node embeddings. Building on the reviewer's excellent example about GNNs and label homophily, we would like to clarify the mechanism with the following concrete explanation:

The smoothness term enhances label consistency through a group-level consensus mechanism where nodes within each hyperedge are encouraged to align their embeddings with the hyperedge centroid. This centroid naturally represents the dominant label characteristics within the hyperedge because it is computed as the average of all member node features. As the optimization minimizes the distance between nodes and their hyperedge centroids, nodes with potentially noisy labels get pulled toward the more reliable consensus representation of their group, while correctly labeled nodes reinforce each other's positions. This creates a self-correcting effect that amplifies the dominant label signals within hyperedges.
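
Below is a minimal sketch of a centroid-consensus smoothness term consistent with this description; it is an assumption for illustration and is not claimed to reproduce the exact form of the smoothness term in Eq. (9).

```python
import torch

def hyperedge_smoothness(x: torch.Tensor, hyperedges):
    """Pull each node embedding toward the centroid of every hyperedge it belongs to.

    x: (N, d) node embeddings; hyperedges: list of lists of node indices.
    """
    loss = x.new_zeros(())
    for e in hyperedges:
        members = x[torch.as_tensor(e, device=x.device)]
        centroid = members.mean(dim=0, keepdim=True)   # hyperedge "consensus" feature
        loss = loss + ((members - centroid) ** 2).sum(dim=1).mean()
    return loss / max(len(hyperedges), 1)
```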

Comment

I appreciate the authors’ efforts in addressing some of my concerns, so I am willing to raise my score to 3.

However, my main reservations are based on the results in the table of “Response to Reviewer’s Concern: Computational Efficiency vs. Performance Gains”, where the performance improvement over the graph-based method HGNN is marginal relative to the efficiency costs. Moreover, the theoretical justification for why this method should outperform a direct graph-based approach remains a bit unclear.

Review
Rating: 5

This paper presents HyperMixup, a novel hypergraph-aware augmentation framework that effectively addresses data scarcity and structural noise in hypergraph neural networks. The method innovatively integrates structure-guided node pairing, hierarchical feature mixing with hyperedge context, and adaptive topology reconstruction to preserve higher-order semantics during data augmentation. Theoretically, it establishes regularization effects through hyperedge covariance alignment, provides robustness guarantees against joint node-hyperedge perturbations, and derives generalization bounds via hypergraph spectral analysis. Comprehensive experiments on citation networks and multi-modal datasets demonstrate consistent improvements over state-of-the-art baselines, particularly in low-label regimes. The work makes a valuable contribution by unifying data augmentation with hypergraph topological constraints, offering both practical efficacy and theoretical grounding for relational learning.

Strengths and Weaknesses

Strengths: This work directly addresses the critical yet underexplored challenge of data augmentation for hypergraphs, where traditional Euclidean/graph methods fail to preserve higher-order semantics. It establishes three foundational theorems linking gradient alignment to hyperedge covariance (Thm 1), certifies robustness against hybrid perturbations (Thm 2), and derives generalization bounds via hypergraph spectral analysis (Thm 3), significantly advancing the theoretical understanding of hypergraph regularization. It demonstrates consistent SOTA-beating results across 5 diverse benchmarks (citation/3D object datasets), with notable gains (+0.8–1.5%) in low-label regimes.

Weaknesses: The $O(|\mathcal{E}|^2)$ complexity of hyperedge covariance computation could challenge web-scale applications (e.g., social networks). Evaluations omit temporal/dynamic hypergraphs (e.g., evolving collaborations).

Questions

How might you approximate the hyperedge covariance computation ($\Sigma_e$) to reduce the $O(|\mathcal{E}|^2)$ complexity for web-scale hypergraphs?

Could HyperMixup handle evolving hypergraphs?

Is there a correlation between optimal $q$/$\gamma$ and hypergraph properties (e.g., spectral radius $\rho(L)$ or node-hyperedge covariance $\|\Sigma_{ve}\|_F$)?

Limitations

NA

Final Justification

I appreciate the authors' thorough and thoughtful responses to my previous questions. All of my concerns have been satisfactorily addressed. I have also carefully reviewed the rebuttal addressing comments raised by the other reviewers and find the authors' replies to be well-reasoned and convincing. The revisions made in response to both my feedback and that of the other reviewers have, in my view, strengthened the manuscript. Accordingly, I am pleased to raise my score to '5-Accept'.

Formatting Issues

NA

Author Response

Response to Reviewer Comment on Accelerating Covariance Regularization

We sincerely thank the reviewer for this insightful suggestion. Accelerating the covariance regularization step is indeed crucial for scalability. Following your suggestion, to accelerate the covariance regularization, we implement low-rank approximation using randomized SVD to decompose hyperedge covariance matrices.

Preliminary results show 4.2× speedup on Cora and 3.8× memory reduction on ModelNet40 with <0.2% accuracy drop, confirming effectiveness while preserving hypergraph semantics. For detailed information, kindly refer to our response to Reviewer PZgP regarding "Accelerating Covariance Regularization".

Response to Reviewer's Query: Adaptation to evolving hypergraphs

Thank you for this insightful question. HyperMixup can be seamlessly integrated with dynamic hypergraph frameworks like HYDG [B1] through the following adaptations:

Integration with HYDG Framework - Time-Constrained Node Mixing:

  • During HYDG's individual-level hypergraph construction (Eq. 2), apply HyperMixup within temporal windows:
$$\tilde{\mathbf{x}}_{it} = \lambda \mathbf{x}_i^{(t)} + (1-\lambda) \mathbf{x}_j^{(t')} \quad \text{s.t. } |t-t'| \leq \tau$$
  • Ensures mixed nodes share temporal proximity and semantic coherence.

Performance Validation
Preliminary tests on dynamic hypergraphs of DBLP5 indicate that integrating HyperMixup with HYDG achieves a 1.8% performance improvement with an increase of 0.5s per epoch. Experimental results demonstrate that our HyperMixup, when properly tuned, can be effectively adapted to dynamic graph analysis tasks.

References:

  • [B1] Ma, X., Zhao, C., Shao, M., & Lin, Y. Hypergraph-based dynamic graph node classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.

Response to Hyperparameter-Property Correlation

Reviewer Query:
Is there a correlation between optimal $\gamma$ and hypergraph properties (e.g., spectral radius $\rho(L)$ or node-hyperedge covariance $\|\Sigma_{ve}\|_F$)?

Our Response:
Yes, both theoretical analysis and empirical results reveal strong correlations:

1. Theoretical Relationship (Theorem 3)

The generalization bound implies:

  • $\uparrow \rho(L)$ (denser hypergraph) $\Rightarrow$ $\uparrow$ optimal $\gamma$ to leverage structural information
  • $\uparrow \|\Sigma_{ve}\|_F$ (stronger feature-hyperedge alignment) $\Rightarrow$ $\downarrow$ optimal $\gamma$ to prevent over-regularization

2. Empirical Validation

Dataset | $\rho(L)$ | $\|\Sigma_{ve}\|_F$ | Optimal $\gamma$
Cora | 12.3 | 0.82 | 0.3
PubMed | 18.7 | 0.45 | 0.6
CiteSeer | 9.8 | 0.91 | 0.2
ModelNet40 | 5.2 | 1.25 | 0.1
NTU2012 | 7.1 | 0.75 | 0.4

Correlations:

  • $\gamma \propto \rho(L)$ (Pearson $r = 0.78$)
  • $\gamma \propto 1/\|\Sigma_{ve}\|_F$ ($r = -0.89$)

Conclusion

Once again, we appreciate the reviewer's engagement in the review process, and their helpful suggestions. We believe these results further strengthen our paper.

Comment

I appreciate the authors' thorough and thoughtful responses to my previous questions. All of my concerns have been satisfactorily addressed. I have also carefully reviewed the rebuttal addressing comments raised by the other reviewers and find the authors' replies to be well-reasoned and convincing. The revisions made in response to both my feedback and that of the other reviewers have, in my view, strengthened the manuscript. Accordingly, I am pleased to raise my score to '5-Accept'.

Final Decision

This paper proposes HyperMixup, a data augmentation framework for hypergraphs. HyperMixup can help alleviate data scarcity and structural noise in hypergraph neural networks (HGNNs). The proposed method includes three key components: structure-aware node pairing, hierarchical feature mixing, and adaptive topology reconstruction. Theoretical analysis is provided from different aspects: regularization, robustness, and generalization. Experiments on citation networks and multi-modal datasets show consistent improvements. As for weaknesses, several reviewers point out that the computational complexity is high and may hinder application to large-scale graphs. Also, two reviewers suggest that finer-grained ablation studies should be conducted. The opinions of the four reviewers are mixed, and one reviewer with a negative rating raised the score after the rebuttal. The paper's theoretical contributions are solid, offering insights into robustness guarantees and generalization bounds. Considering all the comments, ratings, and confidences, I would recommend accepting the paper as a poster.