Graph Alignment via Birkhoff Relaxation
First theoretical guarantees on Birkhoff relaxation for Graph Alignment
摘要
评审与讨论
The paper "Graph Alignment via Birkhoff Relaxation" analyzes the Birkhof relaxation of quadratic assignment problems (QAPs) from a theoretical perspective. It considers the setting of inputs following a Gaussian Wigner Model, and gives guarantees based on the dependence of the standard deviation of the perturbation (in the input) on the dimensionality of the problem. The analysis results in a state-of-the-art condition for the provable success of the Birkhoff relaxation. It also indicates a phase transition for standard deviations that decay slower. A brief numerical study indicates that the success of the relaxation might go beyond the provable guarantee and that post-processing rounding schemes seem to yield strong results even in the case where the result of the relaxation itself is provably well-separated from the true permutation.
优缺点分析
The paper addresses an interesting and important problem. To the best of my knowledge, it makes an important theoretical contribution, advancing the provable guarantees in the form of error estimates for the Birkhoff relaxation of the quadratic assignment problem under specific assumptions on the input matrices. Although the numerical results do not necesarily indicate that the error bounds (or conditions) are tight, I consider this work to be an important step towards a better understanding of the Birkhoff relaxation.
In terms of the presentation, the paper is not very easy to follow. This is partly due to the theoretical, mathematically challenging nature of the topic. Yet, moving some more techincal aspects into the appendix to generate space for a little more context to address a broader audience could be helpful (e.g. possibly introducing the Birkhoff polytope formaly). Some specific aspects that could improve the presentation:
- I would have found it helpful if alternative relaxation methods were briefly introduced. For instance, I could imagine that "simplex" (reference [1]) is a weaker relaxation obtained by dropping the sum-to-one constraint along the columns (or rows), but currently understanding the relaxation requires looking at reference [1].
- The variable (equation after l. 110) is not defined but used many times throughout the manuscript. Its meaning can be deduced from the context but this is not friendly for a quick read.
- In line 50, I think it should read (not ), and the definition of follows from in the next line.
- After reading in line 123 that the analysis in [1] is restricted to , and that the work presented here allows , I was surprised to read "we consider a simpler optimization problem corresponding to in (2)" in line 157. It subsequently becomes clear that this is just an intermediate step, but informing the reader about the big picture of the proof strategy would be beneficial.
- The bound is a typo (but it is used correctly, without the square, in the proof).
- I'd recommend to consider different colors for the middle plot of Fig. 1, or insert a title "Birkhoff Relaxation for n=400". As the left plot has the same colors and a legend that does not apply to the middle plot, this can be misleading when having a quick look only.
- Typo in line 260, "left plot" should be "right plot".
问题
Please consider improving the readability of the manuscript for a broader audience. In particular, please address the explicit suggestions for an improved clarity listed under "Strength and Weaknesses".
In the checklist, the authors state that "Our code is a simple Python script to solve convex optimization problems using cvxpy. Such an approach is ubiquitous, so we decided not to make the code open-source. However, we are happy to make it open-source if the reviewers deem it to be essential." Despite the simplicity of the experiments, I strongly recommend making the source code publically available. This allows other researchers to reproduce that paper's results exactly with minimal effort.
局限性
yes
最终评判理由
While I agree with my fellow reviewers that the practical impact could be stronger (including dedicated solvers for competitive runtimes), I think it is a nice theoretical contribution. Moreover, the authors promised to release their code. I will keep my initial rating.
格式问题
no concern
Comment: Please consider improving the readability of the manuscript for a broader audience. In particular, please address the explicit suggestions for an improved clarity listed under "Strength and Weaknesses".
Response: Thank you so much for carefully reading our manuscript, suggesting several expositional improvements, and catching some typos! We are grateful for your time and efforts. We will implement all the suggested changes in the manuscript.
Comment: In the checklist, the authors state that "Our code is a simple Python script to solve convex optimization problems using cvxpy. Such an approach is ubiquitous, so we decided not to make the code open-source. However, we are happy to make it open-source if the reviewers deem it to be essential." Despite the simplicity of the experiments, I strongly recommend making the source code publically available. This allows other researchers to reproduce that paper's results exactly with minimal effort.
Response: Thank you so much for this comment. We will make the code open source and include it with the paper.
This paper studies the problem of graph alignment in the Gaussian Wigner Model. The goal is to align the vertex sets of two correlated matrices A, B where B has an unknown permutation applied to it. The authors consider the Birkhoff relaxation of the problem, which is a well-known (but poorly studied) relaxation of the QAP formulation. When the correlation between A and B is , the Frobenius norm between the optimal solution and the true permutation matrix is . In turn, this guarantees that a simple rounding technique recovers the true permutation. On the other hand, when the correlation between A and B is then the Frobenius norm between the optimal solution and the true permutation matrix is . However, the latter result does not rule out the possibility of rounding the relaxed solution.
优缺点分析
Analyzing Birkhoff relaxation has been a difficulty in the graph alignment literature, so I view this paper as making a solid step forward, even though the correlation algorithmic guarantees are weaker than those of competing approaches (GRAMPA, simplex relaxation). Still, empirical results show that the Birkhoff relaxation strategy (+ Hungarian algorithm) performs better than those approaches.
The cited works are incomplete. Most crucially, reference [8] is cited but not included in Section 1.3. Also, a recent paper of Gaudio, Racz, and Sridhar (“Average-case and smoothed analysis of graph isomorphism”) provides improved guarantees for matching in the noiseless Erdos-Renyi setting.
Line 215: “the triangle’s inequality”
Line 260: “left plot” should be “right plot”
Corollary 2 uses the greedy approach for matching. However, the simulations use the Hungarian algorithm. Therefore, I don’t see the simulations as validating Theorem 1, but rather as a standalone investigation. This shortcoming should be acknowledged, and there should be an empirical evaluation that uses the greedy approach.
问题
In Equation (3), don’t we need an upper bound on in order to derive a lower bound on ?
I am not following the equality in Line 198. Could you please clarify?
局限性
yes
最终评判理由
Viewed as a theoretical contribution, I support the acceptance of this paper. In their rebuttals, the authors proposed adding several concrete elements to the paper, which I view positively.
格式问题
no
Comment: In Equation (3), don’t we need an upper bound on in order to derive a lower bound on ?
Response: You are correct that we need an upper bound on . We obtain such an upper bound as follows: for some large enough with high probability. The first two inequalities hold for any square matrix. The third inequality holds as is GOE and so its spectral norm is bounded by a constant (say 3). These steps are mentioned in Line 148 of the paper.
Comment: I am not following the equality in Line 198. Could you please clarify?
Response: Thank you for your comment. Define a diagonal matrix with for all such that and otherwise. Also, let to be the unitary matrix with columns . Then, we have . As the Frobenius norm is unitary invariant, we have , and so the result follows. We will mention that we are using unitary invariance of the Frobenius norm to improve the exposition, thank you!
Comment: Corollary 2 uses the greedy approach for matching. However, the simulations use the Hungarian algorithm. Therefore, I don’t see the simulations as validating Theorem 1, but rather as a standalone investigation. This shortcoming should be acknowledged, and there should be an empirical evaluation that uses the greedy approach.
Response: Thank you for raising this interesting point! You are correct to point out that Corollary 2 deals with a simple rounding procedure and not the Hungarian algorithm. Actually, Theorem 1 (Eq. (6) more precisely) also implies the success of the Hungarian rounding procedure. The following simple proof suffice: Let be the correct permutation (WLOG), be the solution of Birkhoff relaxation and . Then, we have by Eq. (6). Also, we have by Theorem 1. Combining the two inequalities above, we get , showing that the Hungarian solution overlaps with the correct permutation for at least fraction of the vertices. We will include this result as another corollary in the paper, thank you for raising this point!
Thank you for addressing my questions! I felt that the authors also answered the questions of the other reviewers comprehensively. Experts on graph alignment view understanding the Birkhoff relaxation as a valuable goal, so I feel this paper makes an important theoretical contribution.
The paper considers the graph-alignment (quadratic assignment) problem under the Gaussian Wigner Model and delivers the first guarantees for the Birkhoff relaxation. It shows a sharp phase transition: when noise is , the relaxed solution remains close to the true permutation, while for noise on it drifts significantly away. The author's analysis relies on a dual certificate that respects the non-negativity constraints of the Birkhoff polytope, combined with sensitivity analysis of convex problems. The experiments are simulated on graphs of up to 500 nodes to verify theoretical results.
优缺点分析
Strengths
-
The paper provides the first formal guarantee for the Birkhoff relaxation in the graph alignment problems and identifies a meaningful threshold on the noise level that separates success from failure.
-
The paper is well written and easy to follow. The main contributions are presented in a straightforward manner, and the core theoretical result is introduced early on, accompanied by helpful intuition.
Weaknesses
-
The relative improvement over prior work is difficult to assess. Although the paper offers a comprehensive literature review, it does not quantitatively compare its theoretical results with existing works. A concise comparison table—summarizing assumptions, noise thresholds, and relaxation types—would help clarify the novelty of this work.
-
The analysis is limited to the correlated Gaussian Wigner model. It is unclear whether the proposed proof techniques can generalize to new settings.
-
While the theory predicts a sharp transition based on the noise level, this phenomenon is not fully explored in the experiments. In particular, it would be valuable to examine how the error evolves more finely across the noise regime, especially when lies between and .
问题
-
The authors should further discuss the comparison with previous works and the generalizability of theoretical frameworks.
-
Would the authors consider sampling more densely within the intermediate range and reporting how the empirical error aligns with the predicted behavior?
-
The reported runtime (up to 3 hours) appears significantly higher than related works on graph matching [1, 2], where solving times are typically reported in seconds. Could the authors explain this difference?
-
The authors mention that the code is simple and thus not released. However, reproducibility is critical, especially for optimization problems where solver settings can influence results.
[1]. Bernard, Florian, Daniel Cremers, and Johan Thunberg. "Sparse quadratic optimisation over the stiefel manifold with application to permutation synchronisation." Advances in Neural Information Processing Systems 34 (2021): 25256-25266.
[2]. Dröge, Hannah, et al. "Kissing to find a match: efficient low-rank permutation representation." Advances in Neural Information Processing Systems 36 (2023): 48459-48471.
局限性
The authors partially address the limitations of their work in Sections 2 and 6. However, a more explicit discussion of the two main limitations mentioned above would strengthen the paper.
最终评判理由
This rebuttal address most of my concerns, and I decide to raise my rating.
格式问题
The subscript “F” in the Frobenius norm should be set in an upright font.
Comment: The relative improvement over prior work is difficult to assess. Although the paper offers a comprehensive literature review, it does not quantitatively compare its theoretical results with existing works. A concise comparison table—summarizing assumptions, noise thresholds, and relaxation types—would help clarify the novelty of this work.
Response: Adding a concise comparison table to compare our results with the literature is a good idea. We will add such a table to the paper.
Table headers: Paper, Algorithm Type, Algorithm Name, Noise Threshold
Row 1: [Ganassali-Lelarge-Massoulie 2022], Top Eigenvector Alignment, EIG1,
Row 2: [Fan-Mao-Wu-Xu 2022], Regularized Convex Relaxation, GRAMPA,
Row 3: [Araya-Tyagi 2023], Simplex Relaxation, Simplex,
Row 4: Our work, Birkhoff Relaxation, Birkhoff,
Caption: Comparison of the spectral and optimization-based algorithms for graph alignment on the Gaussian Wigner Model.
Discussion: Our established threshold of improves over the guarantees in [Ganassali-Lelarge-Massoulie 2022] [Araya-Tyagi 2023]. On the other hand, GRAMPA has better theoretical guarantees; however, the Birkhoff relaxation has better empirical performance, hence the need to develop a rigorous analysis for this strategy as well.
[Ganassali-Lelarge-Massoulie 2022] Ganassali, Luca, Marc Lelarge, and Laurent Massoulié. "Spectral alignment of correlated Gaussian matrices." Advances in Applied Probability 54.1 (2022): 279-310.
[Fan-Mao-Wu-Xu 2022] Fan, Z., Mao, C., Wu, Y. et al. Spectral Graph Matching and Regularized Quadratic Relaxations I Algorithm and Gaussian Analysis. Found Comput Math 23, 1511–1565 (2023)
[Araya-Tyagi 2023] Araya, Ernesto, and Hemant Tyagi. "Graph Matching via convex relaxation to the simplex." Foundations of Data Science 7.2 (2023): 464-501.
Comment: The analysis is limited to the correlated Gaussian Wigner model. It is unclear whether the proposed proof techniques can generalize to new settings.
It is an interesting research direction to extend these results to more general input distributions. We discuss below the technical challenges that entail these generalizations:
Extensions of our results: It would be desirable to consider the input matrices sampled from a certain generic distribution with some concentration properties, e.g., subgaussian and subexponential.
Well Separation Result: We believe this part of the theorem can be generalized to more general symmetric matrices with i.id. entries that concentrate sufficiently, e.g., subgaussian concentration would suffice. In particular, we only need the following concentration properties of our random matrices : , , , . We will add that as a remark in the paper.
Small Perturbation Result: The main difficulties are to get a handle on the eigenvalues and eigenvectors of . Below, we outline the instances in the proof where we explicitly use the properties of a GOE matrix:
Eigenvalue separation [Claim 8]: We use tail bounds on the eigenvalues of a random matrix from [5]. The results of [5] are applicable for all Wigner matrices (i.id. subgaussian entries), and so Claim 8 can be extended to Wigner matrices.
Eigenvector Concentration [Claim 7]: To prove Claim 7, we rely on the fact that the orthonormal eigenvectors of a GOE matrix is uniformly distributed on which guarantees that has the same distribution as , where . Thus, we can get upper and lower concentrations on . This step is the bottleneck, as we require such concentration results for general Wigner matrices. We will discuss this difficulty in the paper to highlight the limitations.
Comment: Would the authors consider sampling more densely within the intermediate range and reporting how the empirical error aligns with the predicted behavior?
Response: That is an excellent suggestion! We simulated the Birkhoff relaxation for several values of to estimate the value of for which . We obtain the following values of as the output: 0.105 (for ); 0.077 (for ); 0.063 (for ); 0.054 (for ); 0.050 (for ). A linear regression between these thresholds () and outputs a slope of , verifying the phase transition at . We will add these simulation results to the paper.
Comment: The reported runtime (up to 3 hours) appears significantly higher than related works on graph matching [1, 2], where solving times are typically reported in seconds. Could the authors explain this difference?
Response: Thank you so much for this comment. As the contributions of this paper are the theoretical analysis, and the Birkhoff relaxation is well-known in the literature, we did not attempt to improve the run-times and simply employed the SCS solver using the cvxpy library of Python. The previous work [Araya Tyagi 2023] focuses on the computational times of these convex relaxations and reports a run-time of 35 seconds for the Birkhoff relaxation with using the method of Alternating Direction Method of Multipliers (ADMM). The SCS solver, on the other hand, requires about 15 mins to converge for , suggesting that alternative optimization techniques can perform substantially better computationally. As our focus is on the theoretical analysis, we did not attempt to improve the run-time. As a sanity check, note that our empirical results (Fig. 1 left) closely match those of [Fig. 1a, 1].
[Araya Tyagi 2023] Araya, Ernesto, and Hemant Tyagi. "Graph Matching via convex relaxation to the simplex." Foundations of Data Science 7.2 (2023): 464-501.
Comment: The authors mention that the code is simple and thus not released. However, reproducibility is critical, especially for optimization problems where solver settings can influence results.
Response: Thank you so much for this comment. We will make the code open source and include it with the paper.
Thank you for the thorough response, especially the insightful discussion regarding the method's extensibility. I will raise my rating. To clarify, given the theoretical nature of the paper, I do not believe it is necessary for the authors to optimize the runtime. However, as the method is heavily theory-driven, I believe the authors—being most familiar with its design—are best positioned to improve its efficiency to a more practical level. Such enhancements may be difficult for others to achieve without their direct involvement.
This paper presents a theoretical analysis of Birkhoff relaxation for graph alignment under the Gaussian Wigner Model (claimed as a first). The authors establish theoretical guarantees showing that when , the optimal solution approximates the true permutation with , while solutions become well-separated when . The work employs dual certificate construction techniques to handle non-negativity constraints and provides rigorous mathematical analysis including eigenvalue separations and concentration inequalities. The paper provides a synthetic simulation showcasing its prowess over the traditional baseline.
优缺点分析
Strengths:
- Novel theoretical contribution: A rigorous theoretical analysis of Birkhoff relaxation for graph alignment, which serves as a significant contribution to support the method's empirical success.
- Technical rigor: Mathematical analysis is thorough with dual certification and proofs demonstrate rigorous treatment (eigenvalue separations, concentration inequalities, etc).
- Clear phase transition: Establishes a meaningful theoretical phase regime where alignment succeeds versus fails.
Weaknesses: While I see that the paper seems like a nice theoretical contribution, I still think that the paper reads as something developed in a vacuum. More discussion of why these results are an important event, which is needed for the next step in a practically relevant application, would be a great addition. There is limited discussion on the impact of this particular result proved in the paper.
- Limited practical relevance: The paper fails to connect theoretical insights to downhill ramification (practical algorithm design, impact on real-world applications, etc.).
- Narrow experimental validation: Evaluation restricted to synthetic Gaussian Wigner experiments. The paper is missing evaluations on standard graph matching benchmarks or real-world datasets. Graph alignment has extensive applications in computational biology, social networks, and computer vision; however, there are no specific ablation studies on the impact of specific graph strucutre on the bounds, even empirically would be great to see. Helps talk about future directions.
问题
-
Real-world relevance: Is it a valid question to ask how the noise levels in your theoretical model relate to those encountered in practical applications mentioned in the introduction? What range of σ values is typical in practice? Are applications grouped based on where such a relaxation can be applied and where it cannot be?
-
Scope extension: Can these techniques extend beyond the Gaussian Wigner model to more general graph alignment settings? What are the main technical challenges when you are thinking of, let us say, sub-Gaussian, Bernoulli, etc.? Is there practical relevance to these other extension models?
局限性
It would be great if the authors added a dedicated limitations section discussing the points touched upon in the Weaknesses and Questions section.
最终评判理由
I am updating my score as I found the reponse from the authors partially satisfactory
格式问题
No major formatting issues were observed. The paper follows standard mathematical formatting conventions with proper theorem environments, notation, and references.
Comment: Real-world relevance: Is it a valid question to ask how the noise levels in your theoretical model relate to those encountered in practical applications mentioned in the introduction? What range of values is typical in practice? Are applications grouped based on where such a relaxation can be applied and where it cannot be?
Response: Thank you for your comment. We more broadly discuss the implications of our result for practice.
Real-world relevance: Doubly stochastic relaxations for graph matching problems has been an attractive approach in practice, with strong empirical performance for shape matching [Aflalo-Bronstein-Kimmel 2015] and ordering images in a grid [Nadav-Maron-Lipman 2017]. It is also observed in [Lyzinski et.al. 2015] that Birkhoff relaxation, when combined with indefinite relaxation, yields excellent empirical performance on real datasets. While these results highlight the usefulness of the Birkhoff relaxation to practice due to its attractive computational and empirical performance, there is limited understanding of its theoretical performance. Our paper is one of the first results in filling this gap in the literature.
Analysis of Graph Matching: Graph Matching is an instance of the quadratic assignment problem, which is NP-hard in the worst case. Thus, one cannot guarantee the success of convex relaxations (e.g., Birkhoff) in the worst case. A ubiquitous approach in the literature is then to consider a stylized model, where the inputs are sampled from certain distributions. We take this approach where the problem instance is sampled as a correlated Gaussian Wigner Model with correlation . Imposing such distributional assumptions on the input graphs allows for mathematical analysis.
Role of : Note that is the special case of the graph isomorphism problem. As the value of increases, the correlation between the two graphs reduces, which makes it harder to align them. In other words, noise increases, making it harder to extract the correct signal. Thus, serves as a tuning parameter to increase the hardness of the problem. The takeaway is then to establish that the Birkhoff relaxation succeeds for large values of , establishing the robustness of the relaxation. We make progress in this direction by improving upon the previously known result [Araya-Tyagi 2023] that shows the simplex relaxation succeeds for . These results contribute to explaining the strong empirical performance of the Birkhoff relaxation.
[Aflalo-Bronstein-Kimmel 2015] Aflalo, Yonathan, Alexander Bronstein, and Ron Kimmel. "On convex relaxation of graph isomorphism." Proceedings of the National Academy of Sciences 112.10 (2015): 2942-2947
[Nadav-Maron-Lipman 2017] Nadav, D. Y. M., Haggai Maron, and Yaron Lipman. "DS++: A flexible, scalable and provably tight relaxation for matching problems." ACM Transactions on Graphics 36.6 (2017): a184.
[Lyzinski et.al. 2015] Lyzinski, Vince, et al. "Graph matching: Relax at your own risk." IEEE transactions on pattern analysis and machine intelligence 38.1 (2015): 60-73.
[Araya-Tyagi 2023] Araya, Ernesto, and Hemant Tyagi. "Graph Matching via convex relaxation to the simplex." Foundations of Data Science 7.2 (2023): 464-501.
Comment: Scope extension: Can these techniques extend beyond the Gaussian Wigner model to more general graph alignment settings? What are the main technical challenges when you are thinking of, let us say, sub-Gaussian, Bernoulli, etc.? Is there practical relevance to these other extension models?
Thank you for this insightful question! It is an interesting research direction to extend these results to more general input distributions. We discuss below the technical challenges that entail these generalizations:
Extensions of our results: It would be desirable to consider the input matrices sampled from a certain generic distribution with some concentration properties, e.g., subgaussian and subexponential.
Well Separation Result: We believe this part of the theorem can be generalized to more general symmetric matrices with i.id. entries that concentrate sufficiently, e.g., subgaussian concentration would suffice. In particular, we only need the following concentration properties of our random matrices : , , , . We will add that as a remark in the paper.
Small Perturbation Result: The main difficulties are to get a handle on the eigenvalues and eigenvectors of . Below, we outline the instances in the proof where we explicitly use the properties of a GOE matrix:
Eigenvalue separation [Claim 8]: We use tail bounds on the eigenvalues of a random matrix from [5]. The results of [5] are applicable for all Wigner matrices (i.id. subgaussian entries), and so Claim 8 can be extended to Wigner matrices.
Eigenvector Concentration [Claim 7]: To prove Claim 7, we rely on the fact that the orthonormal eigenvectors of a GOE matrix is uniformly distributed on which guarantees that has the same distribution as , where . Thus, we can get upper and lower concentrations on . This step is the bottleneck, as we require such concentration results for general Wigner matrices. We will discuss this difficulty in the paper to highlight the limitations.
All reviewers found that the paper contains solid theoretical contribution that advances our understanding of the Birkhoff relaxation of the graph alignment problem. As the authors revise their paper, they are urged to give a more thorough discussion of the practical significance of their work, as well as a detailed comparison of the results obtained in this work with those in the literature.