PaperHub
Overall rating: 5.5 / 10
Poster · 4 reviewers (min 3, max 8, std 1.8)
Ratings: 8, 3, 5, 6
Confidence: 4.3
Correctness: 2.8 | Contribution: 2.0 | Presentation: 2.8
ICLR 2025

Towards Explaining the Power of Constant-depth Graph Neural Networks for Structured Linear Programming

OpenReview | PDF
Submitted: 2024-09-27 · Updated: 2025-03-02
TL;DR

We prove that constant-depth, constant-width GNNs can solve sparse binary LPs, through the lens of distributed computing and average-case analysis.

Abstract

Graph neural networks (GNNs) have recently emerged as powerful tools for solving complex optimization problems, often being employed to approximate solution mappings. Empirical evidence shows that even shallow GNNs (with fewer than ten layers) can achieve strong performance in predicting optimal solutions to linear programming (LP) problems. This finding is somewhat counter-intuitive, as LPs are global optimization problems, while shallow GNNs predict based on local information. Although previous theoretical results suggest that GNNs have the expressive power to solve LPs, they require deep architectures whose depth grows at least polynomially with the problem size, and thus leave the underlying principle of this empirical phenomenon still unclear. In this paper, we examine this phenomenon through the lens of distributed computing and average-case analysis. We establish that the expressive power of GNNs for LPs is closely related to well-studied distributed algorithms for LPs. Specifically, we show that any $d$-round distributed LP algorithm can be simulated by a $d$-depth GNN, and vice versa. In particular, by designing a new distributed LP algorithm and then unrolling it, we prove that constant-depth, constant-width GNNs suffice to solve sparse binary LPs effectively. Here, in contrast with previous analyses focusing on worst-case scenarios, in which we show that GNN depth must increase with problem size by leveraging an impossibility result about distributed LP algorithms, our analysis shifts the focus to the average-case performance, and shows that constant GNN depth then becomes sufficient no matter how large the problem size is. Our theory is validated by numerical results.
Keywords
Learning to optimize · shallow graph neural network · linear programming · distributed algorithm · average-case analysis

Reviews and Discussion

Official Review
Rating: 8

The paper studies the expressive power of constant-depth GNNs for approximately solving sparse, binary linear programs (LPs), specifically packing/covering LPs. The paper embeds a recent distributed LP algorithm into a GNN such that $d$ layers amount to running $d$ rounds of the algorithm. The authors then show that while the lower bound of Kuhn 2016 rules out these constant-depth GNNs performing well in the worst case on all LPs, the average case can be achieved with a probabilistic bound (which is the meat of the contribution).

Strengths

  • originality: to my knowledge and Google Scholar skills, the first such proof => original
  • quality + clarity: very crisp presentation; while I do not feel 100% solid in my verification, I feel I could at least follow the proof, and it appears correct to me
  • important result if correct, as it gives an upper bound (it remains to be evaluated how tight) on the depth required for a given LP-heuristic GNN

Weaknesses

  • I think some basic numerics would have rounded off the results and helped readers and practitioners judge the tightness of the bound.
  • Additionally, I think the question of learnability with standard algorithms would have been interesting to study in the specific setting considered.

Both of these could be addressed with some small-scale numerics, given the constructive arguments used.

Additionally, I think making the chain of reduction in generality from "generic LP => sparse LP => sparse binary LP" clearer, along with the generality of packing/covering vs other types of LP, would help non-experts judge the relevance of the work.

Questions

My main question is: have you implemented and run your LP? I think numerical issues like precision could play an additional role. Additionally, are you aware of an explicit counterexample? This, plus a randomized average-case numerical evaluation, could really round off this paper in my eyes.

Comment

We greatly value the reviewer's suggestions and recommendations. We provide a detailed clarification to address the reviewer’s concerns.

Q1: About the numerical experiments.

R1: In the new version, we've designed a GNN by unrolling our distributed LP algorithm (Algorithm 2) and conducted numerical experiments to validate our main finding (the expressive power of our GNN in solving sparse binary LPs). Besides, we've also conducted experiments comparing our GNN with GCN, and the results show that our GNN achieves much better performance while using significantly fewer parameters. Please see General Response: 2. Experiments for details.

Q2: About the chain of reductions.

R2: Thank you for this valuable suggestion. In the revised version, we have explicitly clarified the chain of reduction in generality:

"generic LP => packing/covering LP (a.k.a. non-negative LP) => sparse binary LP => row-sparse column-sparse binary LP."

Q3: Have you implemented and run your LP? I think numerical issues like precision could play an additional role.

R3: In the new version, we have designed and implemented the GNN by unrolling Algorithm 2.

When designing the GNN, we've intentionally addressed potential numerical issues. For instance, if $x$ is supposed to be positive, we use $\exp(-\mathrm{ReLU}(x))$ rather than $\exp(-x)$ to prevent excessively large values. In our experiments with the designed GNN, we did not encounter any significant numerical issues.
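
A minimal sketch of this clamping trick, assuming a PyTorch implementation; the helper name `stable_neg_exp` is illustrative rather than taken from the paper's code:

```python
import torch

def stable_neg_exp(x: torch.Tensor) -> torch.Tensor:
    # exp(-ReLU(x)) always lies in (0, 1]: clamping x at zero before negating
    # avoids the overflow that exp(-x) would produce for large negative x.
    return torch.exp(-torch.relu(x))

x = torch.tensor([-100.0, 0.0, 3.0])
print(torch.exp(-x))        # tensor([inf, 1.0000, 0.0498]) -- can overflow
print(stable_neg_exp(x))    # tensor([1.0000, 1.0000, 0.0498]) -- always bounded
```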

Q4: Are you aware of an explicit counter example? this + a randomized average case numerical evaluation could really round of this paper in my eyes.

R4: The fractional minimum vertex cover (Fractional MVC) is such an explicit counterexample (cf. the first paragraph on page 6 in [JACM16]). Specifically, given a graph $G=(V,E)$, the goal of MVC is to find a minimum vertex subset $S\subset V$ such that for each edge in $E$ at least one of its endpoints is in $S$. Fractional MVC is the linear programming relaxation of MVC:

$\min_{\vec x} \quad \sum_{v\in V} x_v$

$\text{s.t.} \quad x_u + x_v \geq 1, \;\forall (u,v)\in E$

$x_u \geq 0, \;\forall u\in V$

Note that Fractional MVC is a sparse binary LP. Kuhn proved (see the first paragraph on page 6 in [JACM16]) that for every $k>0$, there exists a graph $G$ such that every $k$-round distributed algorithm for the fractional MVC problem has approximation ratio at least $\Omega(n^{c/k^2}/k)$ for a positive constant $c$. Choosing $k$ appropriately, this implies that to achieve a constant approximation ratio, every distributed Fractional MVC algorithm requires at least $\Omega(\log n/\log\log n)$ rounds.
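
As an illustration of why this instance fits the sparse binary setting, here is a small sketch (a hypothetical helper, not taken from [JACM16] or the paper) that builds the Fractional MVC constraint matrix from an edge list; every entry is 0/1 and every row has exactly two nonzeros:

```python
import numpy as np

def fractional_mvc_matrix(edges, n_vertices):
    """Constraint matrix A for: min sum_v x_v  s.t.  A x >= 1,  x >= 0."""
    A = np.zeros((len(edges), n_vertices))
    for i, (u, v) in enumerate(edges):
        A[i, u] = 1.0   # row i encodes x_u + x_v >= 1 for edge (u, v)
        A[i, v] = 1.0
    return A

# 4-cycle: each row has exactly two ones and all entries are binary.
print(fractional_mvc_matrix([(0, 1), (1, 2), (2, 3), (3, 0)], 4))
```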

[JACM16] Kuhn et al. Local computation: Lower and upper bounds. JACM, 2016.

Comment

Thank you very much for these improvements; I am quite happy to see numerics. One comment: you should probably follow the guidelines implied by e.g. https://docs.google.com/document/d/1IzXv7LmlmDVNZeEElMfOG3IJVIhoZYSZWA25ZSteg18/edit?tab=t.0 for reporting the results (i.e., make sure that you have a statistically significant edge over the baseline architecture, using multiple seeds for training and reporting, or some other way of approximating this, and then doing a suitable statistical test like Welch's test or whatever hypothesis test you want to claim). This is solidly above the bar of usual deep ML papers, but it is also good science.
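
A minimal sketch of the kind of check being suggested, comparing per-seed test metrics of two models with Welch's t-test via SciPy; the numbers below are hypothetical placeholders, not reported results:

```python
import numpy as np
from scipy import stats

# Hypothetical per-seed test metrics (e.g., relative primal gap) for two models.
ours = np.array([0.20, 0.21, 0.19, 0.20, 0.22])
gcn = np.array([1.00, 0.96, 1.02, 0.98, 1.01])

# Welch's t-test: does not assume equal variances across the two groups.
res = stats.ttest_ind(ours, gcn, equal_var=False)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```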

Comment

Thank you very much for your constructive comments. We will follow the guidelines and conduct more experiments to ensure the robustness of our experimental results.

Official Review
Rating: 3

This paper analyzes the depth and width of graph neural networks for linear programs, based on existing results for distributed algorithms.

Strengths

  1. This paper is in general clearly written and is easy to follow.

  2. This paper connects GNNs with distributed algorithms for solving LPs.

Weaknesses

  1. I do not see much novelty in the theory. In particular, I feel that this paper is just translating known results for distributed algorithms to GNN language.

  2. If you fix $\epsilon$ and $\eta$, then Theorem 2 and Theorem 3 are direct corollaries of Chen et al. (2023). The results have to be more quantitative if you want to have something new.

  3. There are no numerical experiments.

(Chen et al., 2023) Ziang Chen, Jialin Liu, Xinshang Wang, Jianfeng Lu, and Wotao Yin, On representing linear programs by graph neural networks, ICLR 2023.

Questions

None

Comment

We appreciate the reviewer’s constructive feedback and provide a detailed clarification to address the reviewer’s concerns:

Q1: I feel that this paper is just translating known results for distributed algorithms to GNN language:

R1: We would like to direct the reviewer to General Response: 1. Clarification of Novelty, where we summarize our contributions. Specifically,

  • About our technical contribution: our main theorem, Theorem 3, is proved by designing and unrolling a new distributed algorithm, rather than directly unrolling existing algorithms.
  • About our conceptual contribution: besides the connection between distributed LP algorithms and GNNs, another crucial observation is that the performance of GNNs is measured on average-case instances. The average-case analysis is necessary in the proof of Theorem 3, and it offers a potential explanation for the impressive empirical success of PDHG-net on PageRank [ICML24].

[ICML24] Li et al. PDHG-unrolled learning-to-optimize method for large-scale linear programming. ICML 2024.

Q2: If you fix $\epsilon$ and $\eta$, then Theorem 2 and Theorem 3 are direct corollaries of Chen et al. (2023).

R2: We would like to respectfully and explicitly clarify that Theorems 2 and 3 are not corollaries of Chen et al. (2023).

The key distinction lies in the fact that the depth (and width) in Theorems 2 and 3 is independent of the instance size $m$ and $n$. Specifically, if $\epsilon$ and $\eta$ are fixed, the GNN depth is bounded by a universal constant, regardless of how large the LP instance is. In contrast, the GNN depth in Chen et al. (2023) implicitly depends on $m$ and $n$, and goes to infinity as $m$ and $n$ grow.

Q3: There are no numerical experiments.

R3: We've conducted numerical experiments to validate our findings (Theorem 3) and to support the applicability of our theory. Specifically, we've designed a GNN by unrolling our distributed LP algorithm (Algorithm 2), and then conducted numerical experiments to validate the expressive power of our GNN in solving sparse binary LPs. Besides, we've also conducted experiments comparing our GNN with GCN, and the results show that our GNN achieves much better performance while using significantly fewer parameters. Please see General Response: Experiments for details.

Comment

Hi there,

As the discussion phase is coming to a close, we would like to ask whether your concerns have been addressed or you have any further comments.

Thank you,

The Authors

Official Review
Rating: 5

This paper proposes a connection between GNNs and existing distributed LP algorithms, suggesting that the iterations of distributed LP algorithms correspond to message-passing steps in GNNs. Additionally, the authors use existing results on iteration bounds in distributed LP algorithms to infer bounds on the number of GNN layers.

优点

Existing theoretical results on the representational power of GNNs for optimization problems often rely on strong assumptions about problem size, leaving the complexity of GNN size for these problems largely unexplored. This paper advances this area by drawing connections to existing results from LP algorithms for specific types of LPs.

Weaknesses

  1. The title may overstate the contributions by referring to GNNs for "linear programming," as the authors actually draw on results specific to packing and covering LPs. While these capture certain classes of combinatorial optimization problems, they do not encompass general LPs.

  2. The connection between GNN message passing and factorization-free LP algorithms is straightforward. GNN message passing involves matrix-vector multiplications, allowing any convergence results from first-order LP algorithms that consist of matrix-vector multiplications to be applied here. The results presented in this paper represent only a subset of these. In this respect, the theoretical contribution of this paper appears limited.

  3. The authors emphasize throughout the paper that shallow GNNs are practically effective. However, the specific value of "constant number of layers" suggested remains unclear. My understanding is that for the theories to hold, this constant would likely be very large, rendering it impractical. If so, the theory borrowed from existing distributed LP algorithms may not fully explain the practical utility of shallow GNNs.

  4. The absence of experimental results, even with toy examples, to validate this proposed number of GNN layers is unusual for a machine learning paper. This raises further questions about the practical applicability of the proposed result.

Questions

See weakness.

Comment

We appreciate the reviewer’s constructive feedback and provide a detailed clarification to address the reviewer’s concerns:

1. the title may overstate the contributions:

  • While our main theorem (Theorem 3) focuses on specific classes of LPs, our key observations—(i) the connection between distributed LP algorithms and GNNs for LPs, and (ii) the necessity of average-case analysis—apply to general LPs as well.
  • The inclusion of "towards" in the title is intended to reflect the exploratory nature of our work, highlighting its contribution to advancing understanding rather than claiming a complete solution. Constructing a constant-depth GNN (e.g., with ≤10 layers) for all LPs seems to be an overly ambitious goal.
  • That said, we fully understand the reviewer’s concern. To better align with the scope of the paper, we change the title to "Towards Explaining the Power of Constant-depth Graph Neural Networks for Structured Linear Programming".

2. The novelty:

(2.1 The connection between GNN message passing and factorization-free LP algorithms is straightforward):

While we are not entirely certain of the precise definition of "factorization-free LP algorithms," we are not sure about the assertion that any $k$ iterations of such algorithms can always be simulated by a $k$-depth GNN. GNNs operate under specific communication constraints, where only variables and constraints directly connected in the bipartite graph (i.e., $u_i$ and $v_j$ with $A_{ij} \neq 0$) can exchange information in a single layer. This communication pattern may not align with the operations of all factorization-free LP algorithms.
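
To make this constraint concrete, here is a schematic constraint-to-variable message-passing step on the LP bipartite graph (a sketch with placeholder MLPs, not the architecture from the paper); variable node $v_j$ only receives from constraint nodes $u_i$ with $A_{ij}\neq 0$:

```python
import torch

def constraint_to_variable_layer(A, u, mlp_msg, mlp_upd):
    """One constraint-to-variable message-passing step on the bipartite graph.

    A: (m, n) constraint matrix; an edge (u_i, v_j) exists iff A[i, j] != 0.
    u: (m, d) features of the constraint nodes u_1, ..., u_m.
    """
    adj = (A != 0).float()   # bipartite adjacency: the only allowed channels
    msgs = mlp_msg(u)        # each u_i prepares its message locally
    agg = adj.t() @ msgs     # v_j aggregates only from u_i with A[i, j] != 0
    return mlp_upd(agg)      # updated variable-node features

# Minimal usage with identity maps standing in for learned MLPs:
A = torch.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
u = torch.randn(2, 4)
print(constraint_to_variable_layer(A, u, lambda h: h, lambda h: h).shape)  # torch.Size([3, 4])
```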

(2.2 The results presented in this paper represent only a subset of these):

  • We politely point out that distributed LP algorithms are not a subset of first-order algorithms. While first-order methods (FOMs) typically involve gradient-based updates or matrix-vector operations, distributed LP algorithms are designed for communication-constrained environments, introducing computational complexity that extends beyond first-order methods.

  • For example, distributed LP algorithms can include inherently combinatorial computations, such as localized subproblem optimizations, which are outside the standard framework of first-order algorithms (see [SODA06] for an example).

  • Furthermore, as far as we know, existing first-order algorithms, such as PDHG or PDLP, can also be viewed as distributed LP algorithms. We acknowledge that we missed citing these important works and have included these citations in the new version.

(2.3 The theoretical contribution of this paper appears limited):

  • Our main theorem, Theorem 3, is proved by designing and unrolling a new distributed algorithm, rather than directly unrolling existing algorithms.
  • To the best of our knowledge, no existing FOMs can achieve constant-depth complexity when unrolled into a GNN, even for row-sparse column-sparse binary LPs. Such algorithms inherently depend on $m$ and $n$, either explicitly or implicitly. For instance, their convergence rates often depend on structural properties of the LP (e.g., $\|A\|_2$), which scale with the problem size $(m,n)$. This distinguishes our work, where we explicitly construct constant-depth GNNs for certain LP instances.

[SODA06] Kuhn et al. The price of being near-sighted. SODA 2006.

3. Practical effectiveness:

  • For Theorem 2, the constant hidden in the big-O notation is $\leq 10$ after a careful analysis.

  • For Theorem 3, the constant hidden in the big-O notation is $\leq 10\gamma\beta \cdot \log(\gamma\alpha\beta)/\epsilon^2$.

    For instance, if the GNN is required to output a 2-approximate solution for a 99% fraction of LP instances where $m=n$ and $\mathrm{nnz}(A)=10(m+n)$, then the upper bound is 54079, independent of the problem size $(m,n)$.

  • While the theoretical constants may be large, in the new version we have empirically demonstrated that a 5-layer GNN designed by unrolling Algorithm 2 is capable of solving sparse binary LPs. Specifically, we've designed a GNN by unrolling our distributed LP algorithm (Algorithm 2) and conducted numerical experiments to validate Theorem 3 (the expressive power of our GNN in solving sparse binary LPs). Besides, we've also conducted experiments comparing our GNN with GCN, and the results show that our GNN achieves much better performance while using significantly fewer parameters. Please see General Response: Experiments for details.

4. Absence of experiments: We have conducted numerical experiments to validate our findings and to support the applicability of our theory. Please refer to General Response: Experiments for the experimental results.

Comment

I sincerely thank the authors for their efforts in revising the paper, particularly for adding numerical results, which have addressed some of my initial concerns. Accordingly, I have raised my score. However, I still recommend rejection, as several of my concerns remain unaddressed.

First, I still find the main message of the paper unconvincing. The paper essentially consists of two main components: (i) it acknowledges that shallow GNNs are effective in practice, presenting a newly added set of experiments to support this; and (ii) it provides an upper bound on the depth of GNNs (with an example calculated value of 54079), which is actually quite large and cannot be considered "shallow." The paper attempts to integrate these two components into a cohesive narrative. However, I find that the connection between the main theoretical results (specifically Theorem 3) and the practical usage of shallow GNNs is not established.

Regarding the issue of overclaiming, I appreciate the authors' decision to change the title, which addresses most of my concerns in this regard. Nevertheless, in the updated paper, you assert that the "observations" apply to general LPs. This is somewhat confusing, as observations typically represent high-level ideas rather than rigorously proven results.

This leads me to the following questions. I would greatly appreciate it, and may consider further raising my score, if you can address them:

In my previous comment, I mentioned that the connection between GNN message passing and factorization-free LP algorithms is straightforward. I apologize for not providing clear definitions at that time. I would like to clarify by distinguishing between different classes of LP algorithms:

(a) Algorithms that use direct solvers (e.g., LU factorization, Cholesky factorization).

(b) Algorithms that can be translated into message passing on a graph, such as bipartite or tripartite graphs. Many first-order methods (FOMs) belong to this class, except those that use factorization for preconditioning. This class also includes certain combinatorial algorithms, such as shortest paths when expressed in a message-passing framework.

(c) Other algorithms or frameworks for solving LPs, including column generation, Benders decomposition, etc.

In my previous comment, I was specifically referring to algorithms in the second class, noting that it is straightforward to establish a correspondence between these algorithms and GNNs.

Based on this clarification, I have two specific questions for the authors:

  1. In your Observation 1, does the term "any d-round distributed LP algorithm" refer to algorithms in any of the above classes, or is it more general and encompasses multiple classes?

  2. You claim that "distributed LP algorithms can include inherently combinatorial computations, such as localized subproblem optimizations, which are outside the standard framework of first-order algorithms." Could you provide an example of such an algorithm (which includes localized subproblem optimizations) and assert that it can still be simulated by d-round GNNs?

Comment

Thank you for your reply!

Q1: In your Observation 1, does the term "any d-round distributed LP algorithm" refer to algorithms in any of the above classes, or is it more general and encompasses multiple classes?

R1: It refers to Class (b). In fact, the formal definition of a distributed LP algorithm is presented in the paragraph "The Distributed Computational Model" in Section 2.1 on page 3.

Q2: Could you provide an example of such an algorithm (which includes localized subproblem optimizations) and assert that it can still be simulated by d-round GNNs?

R2: Algorithm 3 in [DC2005] is such a distributed algorithm. Since first-order methods (FOMs) are those that involve gradient-based updates or matrix-vector operations, and this algorithm is combinatorial (it neither computes gradients nor performs matrix-vector operations), it does not belong to FOMs.

We claim that this algorithm can be simulated by a $d$-depth GNN, because all distributed LP algorithms proceed in the following fashion.

  • The computation proceeds in rounds. In one round, each processor first executes local computations and then sends messages to its neighbors in the bipartite graph.

Due to the universal approximation property of MLPs, the local computation and the preparation of messages can be simulated by MLPs. So any $d$-round distributed LP algorithm can be simulated by a $d$-depth GNN.

Q3: I find that the connection between the main theoretical results (specifically Theorem 3) and the practical usage of shallow GNNs is not established.

R3:

  • We agree that a 54079-layer GNN is not practical, so there is still a gap between Theorem 3 and the empirical phenomenon. It is a natural further direction to make 54079 smaller.
  • The point of Theorem 3 is that our upper bound on the GNN depth ($O(1)$) is asymptotically much better than previous theoretical results (at least $\mathrm{poly}(m+n)$), and thus significantly narrows the gap between theoretical explanations and the empirical phenomenon. Specifically, though 54079 is a large number, a 54079-layer GNN is sufficient no matter how large the problem size is.

Q4: Observations typically represent high-level ideas rather than rigorously proven results

R4: We would like to point out that our first observation (the connection between GNN-for-LP and distributed LP algorithms) is a rigorously proven result, since both GNN-for-LP and distributed LP algorithms can be formally defined.

[DC2005] Fabian Kuhn and Roger Wattenhofer. Constant-time distributed dominating set approximation. Distributed Computing, 2005

Comment

Thank you for your detailed response. I have further raised my score, and I truly hope that future research can improve the upper bound to a much smaller number.

Official Review
Rating: 6

This paper investigates the expressive power of GNNs in representing LPs. The gap between theoretical and practical insights in this area is notable: theoretically, there are no tight upper bounds on the required GNN size (in terms of depth or width) for representing LPs, with upper bounds typically depending on the problem size. However, in practice, GNNs with minimal depth are often sufficient. This study aims to bridge this gap. Focusing on a specific class of LPs (sparse binary LPs), it connects GNNs with a distributed algorithm. For this type of LP, the distributed algorithm converges in a constant number of iterations, suggesting that a GNN with constant depth can reach a certain accuracy level.

Strengths

  1. The theoretical exposition is clear and rigorous. In the constant-depth analysis, the authors employ the big-O notation O(xxx) with actual constants rather than constants implicitly tied to the LP size. This precision in presentation is commendable, as many similar papers tend to obscure critical constants within the big-O notation, which can be misleading. This paper avoids that pitfall.

  2. The analysis focuses specifically on sparse, binary-coefficient LPs. This approach is sensible, as constructing a constant-size GNN for all LPs would be overly ambitious. Additionally, their analysis highlights the conditions and assumptions required for a constant-size solution—namely, sparsity and boundedness of the maximum values in A. This adds valuable insights for the field.

Weaknesses

My primary concern with this paper is the applicability of its theoretical findings to practical GNNs. Specifically, if one were to train a GNN to solve packing/covering LPs, would the trained model align with the theoretical framework presented here? Do the theoretical observations in this paper truly correspond with practical outcomes?

Questions

Refer to "Weaknesses".

Comment

Q: My primary concern with this paper is the applicability of its theoretical findings to practical GNNs. Specifically, if one were to train a GNN to solve packing/covering LPs, would the trained model align with the theoretical framework presented here? Do the theoretical observations in this paper truly correspond with practical outcomes?

R: Thank you for raising this insightful question. We've designed a GNN by unrolling our distributed algorithm (Algorithm 2) and conducted numerical experiments to validate Theorem 3 (the expressive power of our GNN in solving sparse binary LPs). Besides, we've conducted experiments comparing our GNN with GCN, and the results show that, compared to GCN, our GNN achieves much better performance while using significantly fewer parameters. Please refer to General Response: Experiments for details.

We hope that these experiments could address your concerns.

Comment

Hi there,

As the discussion phase is coming to a close, we would like to ask whether your concerns have been addressed or you have any further comments.

Thank you,

The Authors

Comment

We sincerely thank the reviewers for their valuable comments and constructive feedback. We provide detailed responses individually to each reviewer.

1. Clarification of Novelty

More than one reviewer has expressed concerns about the novelty, suggesting that this paper merely translates known results from LP algorithms. Here, we would like to clarify our approach and findings:

  • This paper considers a chain of reduction in generality: from "general LPs" to "packing/covering LPs" (a.k.a. non-negative LP), then to "sparse binary LPs", and finally to "row-sparse column-sparse binary LPs".

  • Both of our key observations, (i) the connection between distributed LP algorithms and GNN-for-LPs and (ii) the fact that the performance of GNNs is measured on average-case instances, apply to general LPs. For instance, the average-case analysis offers a potential explanation for the impressive empirical success of PDHG-net on PageRank [ICML24].

  • For row-sparse column-sparse binary LPs, by unrolling a known distributed algorithm, we show that a constant-depth GNN can solve such LPs even in the worst case.

  • For sparse binary LPs, by leveraging known impossibility results for distributed LP algorithms, we prove that no constant-depth GNN can solve such LPs in the worst case. However, by designing and unrolling a new distributed LP algorithm (Algorithm 2), we prove that a constant-depth GNN can solve such LPs in the average case. This highlights the necessity of average-case analysis and is formalized in Theorem 3, which is the main technical part of our work.

  • Besides, we've designed a GNN by unrolling Algorithm 2 and conducted numerical experiments to validate Theorem 3. We've also conducted experiments comparing our GNN with GCN, and the results show that our GNN achieves much better performance while using significantly fewer parameters. Please see 2. Experiments for details.

Remark: In this paper, the round complexity of distributed algorithms and the depth complexity of GNNs are analyzed in terms of the problem size $(m,n)$ and the approximation ratio $\epsilon$. A GNN depth is said to be constant if it does not grow with the problem size $(m,n)$.

[ICML24] Li et al. PDHG-unrolled learning-to-optimize method for large-scale linear programming. ICML 2024.

Comment

2. Experiments

We appreciate the reviewers’ suggestions regarding numerical experiments to validate our findings. We have conducted experiments and summarized the outcomes below.

  1. Compared to GCN, our GNN achieves much better performance while using significantly fewer parameters.
| LP size | Training Loss (Ours) | Training RP (Ours) | Training RD (Ours) | Test RP (Ours) | Test RD (Ours) |
|---------|----------------------|--------------------|--------------------|----------------|----------------|
| 100     | 0.0086               | 0.20               | 0.43               | 0.20           | 0.45           |
| 500     | 0.0084               | 0.20               | 0.48               | 0.20           | 0.47           |
| 1000    | 0.0085               | 0.20               | 0.47               | 0.20           | 0.46           |
| 1500    | 0.0085               | 0.20               | 0.45               | 0.20           | 0.46           |

| LP size | Training Loss (GCN) | Training RP (GCN) | Training RD (GCN) | Test RP (GCN) | Test RD (GCN) |
|---------|---------------------|-------------------|-------------------|---------------|---------------|
| 100     | 0.0087              | 1.0               | 18.64             | 1.0           | 18.62         |
| 500     | 0.0086              | 0.96              | 18.37             | 0.94          | 18.35         |
| 1000    | 0.0079              | 1.0               | 18.33             | 1.0           | 18.35         |
| 1500    | 0.0083              | 1.0               | 18.33             | 1.0           | 18.38         |

To assess the representational power of our method, we report the training loss after the model has converged. Additionally, we evaluate the relative gap. Using feasibility restoration (see Appendix), the final values $x^{final}$ and $y^{final}$ returned by our GNN are used to compute $x'$ and $y'$. The relative gaps are then calculated as follows: the relative gap of the primal is $RP=\frac{\mathrm{obj}(x')-\mathrm{opt}}{\mathrm{opt}}$ and the relative gap of the dual is $RD=\frac{\mathrm{obj}(y')-\mathrm{opt}}{\mathrm{opt}}$. Lower values of RP and RD indicate better performance.
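
For clarity, the gap computation amounts to the following one-liner (an illustrative sketch; `obj` stands for the objective value of the restored solution and `opt` for the optimal value from an LP solver):

```python
def relative_gap(obj: float, opt: float) -> float:
    # RP (primal) or RD (dual): relative deviation from the optimal objective.
    return (obj - opt) / opt

print(relative_gap(obj=1.2, opt=1.0))  # 0.2, i.e. a 20% relative gap
```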

The number of parameters in our method (2K) does not exceed 0.8% of the number of GCN parameters (250K), yet it achieves better results overall.

  2. Besides, we also conducted numerical experiments to validate Theorem 3 (the expressive power of our GNN in solving sparse binary LPs). We report the relationship between the converged training loss and the number of layers $L$:

| Layers        | L=1    | L=2    | L=3    | L=4    | L=5    |
|---------------|--------|--------|--------|--------|--------|
| Training loss | 0.0363 | 0.0088 | 0.0096 | 0.0085 | 0.0086 |

Even for very small $L$, the training loss at convergence is already close to 0.
Comment

For reproducibility, our code to conduct the experiments can be found at https://anonymous.4open.science/r/DistributedLP-GNN-F2EC

AC Meta-Review

The manuscript studies the connection of parallel algorithms for linear programming with graph neural network representations of LP problems, which gives bounds on the depth and width of the neural networks based on results from distributed algorithms. Overall, this provides a new angle to analyze GNN representations of linear programs. While the reviewers pointed out some shortcomings of the paper, most of those were addressed during the discussion. Therefore, the metareviewer recommends acceptance of the paper.

Additional Comments from Reviewer Discussion

One of the major concerns of the reviewers is the lack of numerical experiments, which the authors provided during the discussion phase. The discussion also helps clarify certain novelty aspects of the work.

Final Decision

Accept (Poster)