5.3

/10

Poster4 位审稿人

最低5最高6标准差0.4

2.8

置信度

正确性3.0

贡献度2.5

表达3.0

NeurIPS 2024

Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

Xinhao Zheng,Yang Li,Cunxin Fan,Huaijin Wu,Xinhao Song,Junchi Yan

OpenReview PDF

提交: 2024-05-14更新: 2024-11-06

摘要

关键词

Boolean Satisfiability ProblemCryptography

评审与讨论

审稿意见

评分: 5置信度: 42024-07-05

The paper explores an approach for predicting the satisfiability of Algebraic Normal Form (ANF) Boolean SAT instances by graph neural networks (GNN). The approach is similar to the framework of NeuroSAT but a new graph structure is proposed for handling the quadratic terms in ANN. Experiments indicate that the proposed approach reaches better accuracy in predicting the satisfiability on certain ANF instances compared with NeuroSAT on CNF and the prediction time is much faster than the running time of SAT solvers.

优点

The paper picks up several interesting and important aspects. First, while CNF-based SAT solvers are dominating and CNF is the most prevalent format in SAT solving, non-CNF constraints, such as ANF, can still be useful in many applications. Encoding them into CNF can harm the performance of SAT solvers and it is desirable to have native solvers for those types of constraints. Second, using machine learning for an end-to-end framework for combinatorial problems has been an active research area.

This paper is clearly written, with technical details at an appropriate level.

缺点

The results of this paper, in my opinion, are not well claimed. The proposed approach based on GNN is only a predictor with errors. In contrast, SAT solvers must find proof for the answers they give, i.e., finding a satisfying assignment or proving unsatisfiability. Comparing the running time of making a prediction by GNN and that of a SAT solver is not fair, which should be clearly stated in the paper.
The proposed approach only considers the ANF format up to order 2 while higher ordered cases are not discussed. This limitation makes it doubtful whether the proposed graph structure can be useful in general.

After the rebuttal session, I decided to raise my score from 4 to 5.

问题

Can the proposed approach be extended to allowing AND terms with more than 2 variables? If so, how should one handle the exponentially many nodes in a graph?
Would it be possible for the proposed approach to output a proof for its predication? For example, can you extract a satisfying assignment or an UNSAT core from the embeddings of the GNN?

局限性

Please see the discussion above.

作者回复

2024-08-07

Response to Reviewer ZcPU (Rating: 4 / Confidence: 4)

Thank you for your thoughtful response and for highlighting the key contributions of our paper. We are pleased that you recognized the appropriate technical depth of our work. Your elaboration on the significance of non-CNF constraints, particularly ANF, and the potential of machine learning in combinatorial problem-solving further underscores the importance of our research. Your astute observations have pinpointed crucial aspects of our work, providing us with an excellent opportunity to expound on the core contributions of our paper. We are delighted to address your points in detail below.

We hope that this response clarifies the importance of our work in dealing with non-CNF constraints (especially ANFs) and in solving combinatorial problems using machine learning techniques. We believe that our approach has the potential to open up new avenues for research and practical applications in areas where traditional CNF-based methods may not be ideal. Our method also has great potential for solving higher-order ANF problems. We remain committed to advancing this area of research. We would appreciate it if you would reconsider your rating.

Q2 "Would it be possible for the proposed approach to output a proof for its predication? For example, can you extract a satisfying assignment or an UNSAT core from the embeddings of the GNN?"

A1 Thank you for your comment. We can provide proof for our prediction using the following methods:

Key-Solving Process: In the plaintext-ciphertext cryptographic problem, CryptoANFNet can determine the assignment for a specific bit of the key by following the key-solving process outlined in Section 4.4 of our paper. Using the key-solving algorithm described, CryptoANFNet can sequentially deduce the value of each key bit. As shown in Table 3 it demonstrates its capability in cryptographic key-solving tasks.
Extracting Satisfying Assignments: CryptoANFNet can also extract satisfying assignments from the embeddings of the literals. To further validate this, we conducted experiments to predict assignments using the embeddings generated by CryptoANFNet on SR datasets. The results, as shown below, illustrate CryptoANFNet's effectiveness in producing assignments from the learned embeddings.

Dataset	SR(25)	SR(5)-high	SR(10)-high
NeuroSAT	57.0%	53.5%	51.0%
CryptoANFNet	71.5%	71.0%	63.5%

W1 "The results of this paper, in my opinion, are not well claimed. The proposed approach based on GNN is only a predictor with errors. (see weaknesses)."

A3 Thank you. We acknowledge the concern regarding the potential for prediction errors in ML-based solvers. No model can guarantee 100% accuracy across all datasets. But, as demonstrated in Section 5.4 of the paper, CryptoANFNet exhibits a significantly faster solving speed compared to traditional solvers. With this rapid solving speed, our model can be effectively utilized as a solver in these practical applications below:

As detailed in Section 4.4 of our paper, using the Key-solving algorithm, CryptoANFNet can sequentially determine the value of each specific bit of the key in a plaintext-ciphertext problem.
Besides, CryptoANFNet can also extract satisfying assignments from the embeddings of the literals, as shown in A1.

In practice, this solver can achieve a preliminary solution for a key or a satisfying assignment with relatively high accuracy. Since the efficiency of verifying the correctness of these solutions is very high, CryptoANFNet offer a significant advantage over traditional solvers in terms of application efficiency, aligning with the primary research objective of our paper.

To demonstrate the feasibility of our approach, we compared the running time of making a prediction by GNN and that of a SAT solver. As shown in Table 4 of our paper, ML-based solvers like CryptoANFNet exhibit nearly 50x faster solving speed. Additionally, Table 3 in the paper and the table in A1 indicate that CryptoANFNet can obtain a preliminary solution for a key or a satisfying assignment with relatively high accuracy.

W2&Q1 "The proposed approach only considers the ANF format up to order 2 while higher ordered cases are not discussed."

A2 Thank you. We will address your concerns from the following three aspects:

In cryptography, MQ (Multivariate Quadratic) problems represented by second-order ANF (Algebraic Normal Form) are of great significance and have wide applications. As explained in Section 3 of our paper, operations commonly used in cryptographic problems—such as modular addition, XOR, AND, left circular shift, and right circular shift—can be effectively represented using second-order ANF formulas. This conversion of cryptographic problems into MQ problems highlights the broad applicability of second-order ANF. Besides, previous research [cite 1] has already underscored the importance and application of MQ problems in cryptography. Second-order ANF is versatile enough to handle many tasks, demonstrating its extensive range of applications.
Second-order ANF formulas are complete, meaning that they can represent any finite-sized circuit or cryptographic problem. Research [cite 2] has shown that CNF formulas can be converted into ANF formulas. Since CNF is complete, ANF formulas are inherently complete as well. Furthermore, higher-order ANF formulas can be converted into second-order ANF formulas. For a native method, consider a higher-order ANF formula like $x_1x_2x_3+x_3x_4=0$ . By introducing the dummy variable $u_1$ and the equation $u_1+x_1x_2=0$ , we can transform the original formula into $u_1x_3+x_3x_4=0$ , effectively converting the higher-order ANF formula into a second-order ANF formula. In practice, more efficient conversion methods[cite 3] are proposed to reduce the number of dummy variables.

评论- Fix the assignment table in Rebuttal A1

2024-08-07

We apologize for posting the wrong table in Rebuttal A1, which is now given as follows If you still have further concerns, or if you are not satisfied by the current responses, please let us know, so that we can update the response ASAP.

A1 .... The results, as shown below, illustrate CryptoANFNet's effectiveness in producing assignments from the learned embeddings.

Dataset	SR(5)	SR(25)
NeuroSAT	91.0%	5.0%
CryptoANFNet	71.0%	35.0%

2024-08-07

Our proposed method can also address higher-order ANF formulas. As stated in line 216 of Section 4.3, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. High-order nodes are used only as intermediate results in the message-passing process. Thus, for high-order ANF formulas, i.e., AND terms with more than 2 variables, we have the similar message-passing process shown below:
- For the message-passing process from vanilla literal node to clause node, we have:
  
  $L^{(t)}_{l2c} = L\_{l2l}(M^T\_{l2l}L^{(t)})$
  
  $[C^{(t)}\_{m,\text{pos}}, C^{(t)}\_{m,\text{neg}}] = M_{l2c}^T L_{\text{msg}}(L^{(t)}_{l2c})$
- For the message-passing process from clause node to vanilla literal node, we have:
  
  $L^{(t)}\_{c2l} = M\_{l2c} C\_{\text{msg}}([C^{(t)}\_{\text{pos}}, C^{(t)}\_{\text{neg}}])$
  
  $L^{(t)}\_m = M\_{l2l} L\_{l2m}(L^{(t)}\_{c2l})$
where $M_{l2l}$ is a sparse adjacency matrix defined by $M_{l2l}(i, j) = 1$ (the i-th vanilla literal appears in the j-th high-order literal) and $M_{l2c}$ is the adjacency matrix defined by $M_{l2c}(i, j) =1$ (the $i$ -th high-order literal appears in the $j$ -th clause). Using the sparse matrix $M_{l2l}$ , we only need to retain the edges between vanilla literal nodes and higher-order nodes, without storing the entire adjacency matrix, which grows exponentially with higher orders. This approach effectively handles datasets comprising SAT instances with literals in various orders. Note that the number of parameters in CryptoANFNet is independent of the number of nodes in the ANF-based graph generated from a SAT instance. Considering the intermediate results stored by CryptoANFNet, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. During each iteration, CryptoANFNet simultaneously computes intermediate result vectors for up to the number of distinct literals in the SAT instance, which do not grow exponentially with higher orders. Thus, our approach can manage SAT instances with a greater variety of literals.

Here, we show the results on high-order ANF formulas, i.e., AND terms with more than 2 variables:

Dataset SR(25) SR(5)-high SR(10)-high
NeuroSAT 57.0% 53.5% 51.0%
CryptoANFNet 71.5% 71.0% 63.5%

where SAT instances in SR(5)-high and SR(10)-high contain clauses with 1-5 order ANF literals, generated in the same way as the SR(n) dataset. It can be seen that the extended CryptoANFNet still outperforms NeuroSAT on second-order ANF formulas dataset SR(25) and better performance on higher-order ANF formulas.

[cite 1] Ding, Jintai, Jason E. Gower, and Dieter S. Schmidt. Multivariate public key cryptosystems. Vol. 25. Springer Science & Business Media, 2006.

[cite 2] Horáček J, Kreuzer M. On conversions from CNF to ANF. Journal of Symbolic Computation. 2020 Sep 1;100:164-86.

[cite 3] Wolf C. Multivariate quadratic polynomials in public key cryptography. Cryptology ePrint Archive. 2005.

2024-08-13

Dear Reviewer ZcPU,

We sincerely appreciate the time you have invested in the review process, as well as the valuable comments and constructive suggestions you have provided. In our rebuttal, we have summarized the main feedback and addressed the key issues, particularly concerning the novelty of our paper and the generalization of our method. However, we have not yet received any further feedback. As the discussion period draws close, we are eager to know whether our responses have satisfactorily addressed your concerns.

It appears there is still some confusion regarding whether our method can be generalized to higher-order ANF formulas, whether a proof can be provided for prediction, and the comparison with speed experiments. In light of this, we have carefully responded to the issues you mentioned and supplemented our experiments. We kindly ask if you are satisfied with our responses.

We hope that our rebuttal and discussion will clarify any potential misunderstandings and contribute to a more comprehensive evaluation of our submission.

Thank you very much for your time and attention. We sincerely appreciate your support and are more than willing to address any further concerns.

Best regards,

The Authors

2024-08-14

Thanks for the rebuttal and it addressed some of my concerns. I am convinced by the high-order ANF part. I also get it that if efficiency is critical in some scenarios, then having a faster SAT solver is important. However, for a fair comparison, I think the author should look at incomplete solvers, which are mostly based on local search, e.g., GSAT, WalkSAT, RoundingSAT, FourierSAT, etc. Those solvers can only give solutions but can not prove UNSAT and they are generally very fast. Again, I believe comparing a predictor with a complete solver is unfair.

Should the author integrate the rebuttal and discuss the incomplete solvers in the final version, I could consider raising my score.

2024-08-14

Thank you for your feedback and for acknowledging the points we addressed in our rebuttal, particularly regarding the high-order ANF and the importance of efficiency in certain scenarios.

We agree on the importance of including incomplete solvers in the evaluation. We will incorporate this consideration into the final version of our paper, comparing the predictor with incomplete solvers on our dataset. The new comparison results are shown in the table below. For GSAT, we performed up to 100 iterations.

Table: Comparing the efficiency of different solvers (Average runtime: (SAT,UNSAT) ms/instance)

Datasets	SR(5)	SR(25)	Simon 3-8-16	Simon 3-16-32	Simon 6-8-16	Simon 6-16-32	Speck 3-8-16	Speck 6-8-16
NeuroSAT	(3,3)	(20,20)	(7,7)	(10,10)	(7,7)	(14,14)	(13,13)	(18,18)
CryptoANFNet (our paper)	(2,2)	(5,5)	(8,8)	(9,9)	(10,10)	(8,8)	(11,11)	(14,14)
WDSat	(36,34)	(2470,5662)	(38,38)	(39,39)	(40,37)	(86,150)	(65,72)	(2593,5060)
CryptoMiniSat	(4,4)	(13491,35912)	(4,4)	(7,9)	(8,9)	(410,1354)	(101,92)	(1671,3900)
Kissat	(2,2)	(4922,14856)	(2,2)	(2,2)	(5,8)	(219,464)	(40,35)	(1484,2919)
GSAT[cite 1]	(8,470)	(10951,10799)	(17,1392)	(2570,2425)	(1870,6970)	(14800,15139)	(281,2659)	(9473,12233)
WalkSAT[cite 2]	(3,640)	(762,744)	(4,6)	(10,12)	(289,26)	(831,899)	(39,480)	(482,538)
RoundingSAT [cite 3]	(3,3)	(36758,50122)	(3,5)	(7,10)	(28,20)	(664,1801)	(23,24)	(29,35)
FourierSAT [cite 4]	(1275,8670)	(9620,9687)	(983,426)	(1779,459)	(8163,416)	(8830,8862)	(8733,8689)	(8799,8912)

[cite 1] https://github.com/Sina-Baharlou/GSAT-WalkSAT

[cite 2] https://gitlab.com/HenryKautz/Walksat

[cite 3] https://gitlab.com/miao_research/roundingsat

[cite 4] https://github.com/vardigroup/FourierSAT

We found that both incomplete solvers, like WalkSAT, and complete solvers, like Kissat, were significantly outperformed in speed by learning-based models such as CryptoANFNet and NeuroSAT when solving SAT instances derived from cryptographic problems. These results indicate that learning-based solvers like CryptoANFNet offer a significant advantage over traditional solvers in terms of application efficiency.

We hope that this addition will address your concerns and contribute to a more comprehensive and balanced evaluation of our work. We will include the results of more solvers in the final version.

Thank you again for your valuable suggestions, and we appreciate your willingness to reconsider your rating.

审稿意见

评分: 5置信度: 22024-07-09

This paper introduces an approach to handling cryptographic problems by transforming them into Boolean Satisfiability (SAT) problems using a graph structure based on Arithmetic Normal Form (ANF) to efficiently manage XOR operations, which are prevalent in cryptography. It proposes CryptoANFNet, a graph learning framework that predicts plaintext-ciphertext satisfiability using a message-passing scheme. CryptoANFNet demonstrates superior scalability, achieving a 50x speedup over heuristic solvers and outperforming the state-of-the-art learning-based SAT solver NeuroSAT in terms of accuracy. Additionally, the paper presents a key-solving algorithm that simplifies ANF-based SAT instances, resulting in improved key decryption accuracy for datasets generated from Simon and Speck algorithms.

优点

This paper formalize the ANF formula as the Multivariate Quadratic ( MQ ) problem that reduce the complexity of the problem
The evaluation results demonstrate a significant acceleration on SMT solving. 50x speedup is surprisingly good.

缺点

The proposed approach of using graph neural network and message passing is less novel considering its close design compared to NeuroSAT.
For performance evaluation and baseline selection, this paper doesn't compare itself to state-of-the-art approaches.
This paper doesn't include popular encryption algorithms, such as AES, in its dataset.

问题

What is the key novelty in the model part when compared to the existing work, NeuroSAT? It seems like the key difference lies in the different message passing design.
Does this paper select the-state-of-the-art works as its baselines? This paper considers some works that are in the form of competition, e.g., [42], while there are some related works, e.g., "Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach".
For dataset with encryption algorithms, this paper generates data with SIMON and SPECK, which are lightweight block cipers and not really popular in real-world applications. Can this paper extend its scope to more popular and complex ciphers, such as AES?

局限性

Limitations are addressed.

作者回复

2024-08-07

Thanks for your valuable comment and for recognizing the significance of our work in formalizing the ANF formula as a Multivariate Quadratic (MQ) problem. We appreciate your acknowledgment of how this approach reduces the complexity of the problem. We are pleased to address our contribution in detail.

We hope this response clarifies the importance of our work in reducing the complexity of cryptographic problems and significantly improving SAT solving efficiency. We believe this has the potential to open up new avenues for research and practical applications. We are committed to continuing this line of research and look forward to making further progress in this area.

W1&Q1 "What is the key novelty in the model part when compared to the existing work, NeuroSAT? It seems like the key difference lies in the different message passing design."

A1 Thank you. CryptoANFNet is inspired by some excellent previous works, like NeuroSAT, but we do not consider our key novelty to be incremental. We will show our key novelty in the following aspects.

Introduction of ANF Formulas: We introduced the input format of SAT instances, ANF (Algebraic Normal Form), for efficiently solving SAT instances in cryptographic problems. ANF formulas can represent SAT instances in cryptographic problems with fewer literals and clauses, whereas CNF formulas, as used in NeuroSAT, often require a larger number of literals and clauses to represent common cryptographic operations like XOR. In this way, ANF formulas are more efficient for representing cryptographic problems.

ANF-Based Graph Representation: We proposed a new ANF-based graph representation to capture the operations in ANF, while NeuroSAT is based on a CNF-based graph representation (LCG). Since the high-order literals are used only as intermediate results in the message-passing process, the parameterized nodes in the ANF-based graph are less than in CNF-based graph.

As shown in Table 1 of our paper, our ANF graph is more efficient than the general graph form used in NeuroSAT. Here, we provide a summary table comparing ANF with two types of CNF-based graph representations, Literal-Clause Graph (LCG) and Variable-Clause Graph (VCG). This demonstrates that our proposed ANF-based graph representation remains efficient.

	Datasets	SR(5)	SR(25)	Simon 3-8-16	Simon 3-16-32	Simon 6-8-16	Simon 6-16-32	Speck 3-8-16	Speck 6-8-16
	#Literals	6	424	25	49	49	97	57	183
CNF(LCG)	#Clauses	75	5492	195	403	735	1519	360	2950
	#Nodes	87	6340	245	501	833	1713	474	3316
	#Literals	6	424	25	49	49	97	57	183
CNF(VCG)	#Clauses	75	5492	195	403	735	1519	360	2950
	#Nodes	81	5916	220	452	784	1616	417	3133
	#Literals	5	25	24	48	48	96	56	128
ANF	#Clauses	11	26	24	48	48	96	64	136
	#Nodes	27	77	72	144	144	288	184	400

Model Structure for ANF-Based Graph Representation:

We proposed a model structure for handling ANF-based graph representation, which have better performance than NeuroSAT in MQ problem and can effectively extend to higher-order ANF formulas. Unlike NeuroSAT, which retains embeddings for all literal nodes that may appear in clauses, the approach for ANF formulas presents a different challenge. Due to the high-order literals in ANF formulas, the number of literal types increases exponentially with the order and retaining embeddings for all possible literals would incur unacceptable costs. Therefore, unlike NeuroSAT, as stated in line 216 of Section 4.3, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. High-order nodes are used only as intermediate results in the message-passing process. For high-order ANF formulas, i.e., AND terms with more than 2 variables, we have the similar message-passing process shown below:
- For the message-passing process from vanilla literal node to clause node, we have:
  
  $L^{(t)}_{l2c} = L\_{l2l}(M^T\_{l2l}L^{(t)})$
  
  $[C^{(t)}\_{m,\text{pos}}, C^{(t)}\_{m,\text{neg}}] = M_{l2c}^T L_{\text{msg}}(L^{(t)}_{l2c})$
- For the message-passing process from clause node to vanilla literal node, we have:
  
  $L^{(t)}\_{c2l} = M\_{l2c} C\_{\text{msg}}([C^{(t)}\_{\text{pos}}, C^{(t)}\_{\text{neg}}])$
  
  $L^{(t)}\_m = M\_{l2l} L\_{l2m}(L^{(t)}\_{c2l})$
where $M_{l2l}$ is a sparse adjacency matrix defined by $M_{l2l}(i, j) = 1$ (the i-th vanilla literal appears in the j-th high-order literal) and $M_{l2c}$ is the adjacency matrix defined by $M_{l2c}(i, j) =1$ (the $i$ -th high-order literal appears in the $j$ -th clause). Using the sparse matrix $M_{l2l}$ , we only need to retain the edges between vanilla literal nodes and higher-order nodes, without storing the entire adjacency matrix, which grows exponentially with higher orders. This approach effectively handles datasets comprising SAT instances with literals in various orders.

2024-08-07

W3&Q3 "For dataset with encryption algorithms, this paper generates data with SIMON and SPECK, which are lightweight block cipers and not really popular in real-world applications. Can this paper extend its scope to more popular and complex ciphers, such as AES?"

A3 Thank you for your comment. Theoretically, the methods proposed in this paper can be extended to more complex block ciphers that can be represented by ANF formulas. However, in practice, the efficiency of traditional solvers limits the generation of datasets for these complex block ciphers. Some block cipher structures (such as the use of S-boxes) lack efficient algorithms for conversion to ANF or CNF formulas. This results in a large number of clauses and literals, making it challenging to generate efficient datasets for the network to train on. Below, we use the AES encryption algorithm to illustrate this issue in detail.

Theoretical Conversion to Boolean Formulas: For each round of AES encryption, there are four main operations: SubBytes, ShiftRows, MixColumns, and AddRoundKey, which can theoretically be represented as Boolean formulas and thus converted into ANF and CNF forms of SAT instances.
- SubBytes: This is a nonlinear operation where each byte is replaced using an S-box. Despite its nonlinearity, we can convert this step into a Boolean formula as follows: Suppose the input byte is $X = x_7x_6x_5x_4x_3x_2x_1x_0$ , and the output byte is $Y = y_7y_6y_5y_4y_3y_2y_1y_0$ . For each possible input value (0 to 255), list the corresponding output value and express it as a Boolean formula. For example, if $X = 00000000$ (i.e., 0x00) and the S-box output is $Y = 01100011$ (i.e., 0x63), we can express this as: $\neg x_7 \land \neg x_6 \land \neg x_5 \land \neg x_4 \land \neg x_3 \land \neg x_2 \land \neg x_1 \land \neg x_0 \rightarrow$ $(y_7 \land y_6 \land \neg y_5 \land \neg y_4 \land y_3 \land \neg y_2 \land y_1 \land y_0)$ By such conversion, this step can be transformed into Boolean formulas.
- ShiftRows: This is a row-shifting operation that can be converted to Boolean formulas as described in Section 3 of the paper.
- MixColumns: This is a linear transformation involving matrix multiplication over a finite field, which can be described using Boolean formulas.
- AddRoundKey: This is an XOR operation, which can be converted to Boolean formulas as described in Section 3 of the paper.
Practical Challenges: We used the AES encryption algorithm's corresponding SAT instance generator as described in [cite 1] to generate datasets in ANF and CNF forms. For a 10-round 128-bit AES encryption, we generated SAT instances with nearly 8000 literals and 160,000 clauses. The scale of these instances makes it difficult for traditional solvers to verify their SAT or UNSAT status, thereby limiting the creation of a valid dataset. To validate this, we tested the solving speed of SAT instances generated by a 1-round 128-bit AES encryption algorithm on the traditional solver Cryptominisat [cite 2]. It took over 2 hours to solve a single instance. This speed is unacceptable for generating training datasets. Furthermore, for both NeuroSAT and our proposed CryptoANFNet, the graph representations of such large-scale SAT instances exceed the capacity of current machines. Therefore, in practice, we cannot test our methods on datasets generated from AES encryption.

[cite 1] https://github.com/meelgroup/aes-cnf-gen

[cite 2] https://github.com/msoos/cryptominisat

In conclusion, while our methods have theoretical applicability to complex block ciphers, practical limitations in traditional solvers' efficiency and dataset generation hinder their current implementation on such large-scale encryption algorithms.

2024-08-07

Note that the number of parameters in CryptoANFNet is independent of the number of nodes in the ANF-based graph generated from a SAT instance. Considering the intermediate results stored by CryptoANFNet, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. During each iteration, CryptoANFNet simultaneously computes intermediate result vectors for up to the number of distinct literals in the SAT instance, which do not grow exponentially with higher orders. Thus, our approach can manage SAT instances with a greater variety of literals. Here, we show the results on high-order ANF formulas, i.e., AND terms with more than 2 variables:

Dataset	SR(25)	SR(5)-high	SR(10)-high
NeuroSAT	57.0%	53.5%	51.0%
CryptoANFNet	71.5%	71.0%	63.5%

where SAT instances in SR(5)-high and SR(10)-high contain clauses with 1-5 order ANF literals, generated in the same way as the SR(n) dataset. It can be seen that the extended CryptoANFNet still outperforms NeuroSAT on second-order ANF formulas dataset SR(25) and better performance on higher-order ANF formulas.

Key solving and satisfying assignment prediction: As detailed in Section 4.4 of our paper, we propose a key solving algorithm to train CryptoANFNet to sequentially determine the value of each specific bit of the key in a plaintext-ciphertext problem. The results is shown in Table 3 of our paper. Additionally, CryptoANFNet can also directly extract satisfying assignments of the key from the embeddings of the literals. Here, we present the results of CryptoANFNet on the task of satisfying assignments, which illustrate CryptoANFNet’s effectiveness in producing assignments from the learned embeddings.

Dataset SR(5) SR(25)
NeuroSAT 91.0% 5.0%
CryptoANFNet 71.0% 35.0%

W2&Q2 "Does this paper select the-state-of-the-art works as its baselines? ... e.g., [42], while there are some related works, e.g., "Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach"."

A2 Thank you for your comment. We referred to the benchmark G4satbench [cite 1] to select the state-of-the-art works as the baseline. In G4satbench, NeuroSAT demonstrates superior performance across most datasets compared to other models. Therefore, we chose NeuroSAT as the state-of-the-art reference in our paper. Furthermore, we compared CryptoANFNet with various baselines, such as DG-DAGRNN (Circuit-SAT [cite 2]) and other baselines in G4satbench, and the following table shows that CryptoANFNet exhibits better performance on different datasets.

Datasets	SR(5)	SR(25)	Simon-3-8-16	Simon-3-16-32	Simon-6-8-16	Simon-6-16-32	Speck-3-8-16	Speck-6-8-16
GCN(LCG)	91.0%	53.5%	73.2%	74.0%	53.5%	52.0%	53.5%	51.5%
GCN(VCG)	90.5%	52.%	75.2%	73.5%	52.0%	53.0%	53.0%	52.0%
GIN(LCG)	91.5%	51.5%	76.2%	74.0%	52.5%	51.5%	53.5%	52.0%
GIN(VCG)	88.0%	52.5%	75.0%	74.0%	54.5%	52.5%	54.5%	52.5%
GGNN(LCG)	91.0%	54.0%	76.5%	74.0%	54.0%	53.0%	53.0%	52.5%
GGNN(VCG)	89.0%	56.0%	76.2%	74.4%	53.5%	52.3%	52.5%	51.5%
NeuroSAT	91.0%	57.0%	74.0%	72.7%	53.0%	51.0%	55.0%	52.5%
DG−DAGRNN(Circuit-SAT[cite 2])	84.0%	50.5%	73.0%	52.0%	51.0%	50.5%	50.5%	51.5%
CryptoANFNet	96.0%	72.0%	76.5%	75.6%	69.0%	66.5%	72.0%	68.5%

[cite 1] Li Z, Guo J, Si X. G4satbench: Benchmarking and advancing sat solving with graph neural networks[J]. arXiv preprint arXiv:2309.16941, 2023.

[cite 2] Amizadeh S, Matusevych S, Weimer M. Learning to solve circuit-sat: An unsupervised differentiable approach[C]//International Conference on Learning Representations. 2019.

2024-08-13

Dear Reviewer 6xjG,

We sincerely appreciate the time you have invested in the review process, as well as the valuable comments and constructive suggestions you have provided. In our rebuttal, we have summarized the main feedback and addressed the key issues, particularly concerning the novelty of our paper and the generalization of our method. However, we have not yet received any further feedback. As the discussion period is nearing its end, we are eager to know whether our responses have satisfactorily addressed your concerns.

There still seems to be some confusion regarding the key novelty of CryptoANFNet in the model part, how we selected the baseline, and whether our method can be generalized to other encryption algorithms. In light of this, we have carefully addressed the issues you raised and supplemented our experiments. We kindly ask if you are satisfied with our responses.

We hope that our rebuttal and discussion will clarify any potential misunderstandings and contribute to a more comprehensive evaluation of our submission.

Thank you very much for your time and attention. We sincerely appreciate your support and are more than willing to address any further concerns.

Best regards,

The Authors

审稿意见

评分: 6置信度: 42024-07-10

The paper strikes an interesting endeavor for introducing machine learning to address the Plaintext-Ciphertext Cryptographic Problems, which is to my best knowledge, new in literature. It lies in the between of AI and security and specifically crypto which goes beyond the line of research either in learning for combinatorics. I think this paper pushes the frontier of AI for discrete math. Overall I think the paper is interesting and worth publishing to the community to briduge the two communities: machine learning and crypto.

优点

the paper is well written which clearly introduces the background, the problem setting and the preliminaries. It is quite informative and could inspire the readers.
specifically, the work introduces machine learning for plaintext-ciphertext satisfiability prediction. It on one hand outpeforms traditional heuristic methods by speed, and on the other hand outperforms peer learning-based NeuroSAT by accuracy.
The authors further show extending their approach to the key decryption problem with a notable performance boost, compared with the one without our devised simplification approach.
the experiments are themselves novel and well designed. The results are comprehensive and informative, which I think would inspire the future work in this emerging area.

缺点

the technical part is a bit simple from the top ML conference perspective, while of course this is often the case for machine learning applications to emerging areas

问题

how the presented techniques could impact the area of machine learning for crypto, in a more broad sense?
Is there any impact or concerns to the world when the presented approach attains more success? What is the fundamental challenge for this line of research?

局限性

N/A

2024-08-07

Q2 "Is there any impact or concerns to the world when the presented approach attains more success? What is the fundamental challenge for this line of research?"

A3 Thank you for your comment. We appreciate the opportunity to discuss the impact and concerns related to our work on the broader field of cryptographic analysis and algorithm design, as follows:

Accelerating Cryptographic Analysis: By providing faster and more efficient solutions to SAT instances derived from cryptographic problems, our work can significantly expedite cryptographic analysis. This acceleration could potentially render previously computationally infeasible attacks feasible, thereby advancing the field of cryptography.
Advancing Cryptographic Algorithm Design and Improvement: Machine learning-based SAT solving in cryptographic analysis can be beneficial in multiple areas beyond direct ANF-to-SAT problem solving. For instance, it could aid in breaking post-quantum cryptographic problems, driving the evolution and enhancement of cryptographic algorithm design.

However, this research faces two primary challenges:

Exponential Growth in Computational Complexity with Increased Rounds: As the number of encryption rounds increases, the computational effort required for decryption typically grows exponentially. Moreover, increased rounds enhance the diffusion property of cryptographic algorithms, where small changes in the input result in significant variations in the output. In our cryptographic instance generation, more rounds mean each bit of the key will be related to more intermediate output bits, increasing the constraints for each key bit and thereby complicating the key-solving process.
Difficulty in Representing Certain Encryption Algorithms as SAT Instances: Some encryption algorithms, particularly those using S-boxes (e.g., AES, DES) or variable key lengths (e.g., Blowfish), are challenging to efficiently represent as SAT instances. For instance, enumerating the byte mappings within an S-box requires a substantial number of clauses and literals, greatly increasing the solving difficulty.

Overall, this line of research not only aims to push the boundaries of cryptographic analysis through improved SAT-solving techniques but also highlights the potential and challenges of integrating machine learning methods in cryptographic research.

2024-08-07

Q1 "how the presented techniques could impact the area of machine learning for crypto, in a more broad sense?"

A2 Thank you for your comment. We are excited to discuss the broader impact and future directions of our work in the area of machine learning for crypto.

Advancing ML-assisted Cryptographic Algorithm Design: Understanding how CryptoANFNet solves cryptographic instances can inspire the development of new, more robust encryption algorithms. By analyzing the methods and efficiencies of CryptoANFNet, researchers can design encryption algorithms that are more resistant to such advanced SAT-solving techniques, leading to more targeted and secure cryptographic solutions.
Development of ML-assisted SAT Solvers for Cryptographic Analysis: Our work proposes the use of machine learning-assisted SAT solvers, which paves the way for the development of specialized hardware accelerators designed for ML-assisted cryptographic analysis. These accelerators could significantly enhance the speed and efficiency of cryptographic problem-solving, making advanced cryptographic analysis more accessible and practical.
Promoting ML-based Automated Cryptographic Analysis Tools: The integration of machine learning methods into cryptographic analysis can lead to the development of more automated and intelligent cryptographic tools. These tools could reduce the need for manual analysis, accelerating cryptographic research and allowing researchers to focus on more complex and innovative problems. By leveraging ML techniques, we can create smarter tools that enhance the overall efficiency and effectiveness of cryptographic analysis.

2024-08-07

Here, we show the results on high-order ANF formulas, i.e., AND terms with more than 2 variables:

Dataset	SR(25)	SR(5)-high	SR(10)-high
NeuroSAT	57.0%	53.5%	51.0%
CryptoANFNet	71.5%	71.0%	63.5%

Key solving and satisfying assignment prediction: As detailed in Section 4.4 of our paper, we propose a key solving algorithm to train CryptoANFNet to sequentially determine the value of each specific bit of the key in a plaintext-ciphertext problem. The results is shown in Table 3 of our paper. Additionally, CryptoANFNet can also directly extract satisfying assignments of the key from the embeddings of the literals. Here, we present the results of CryptoANFNet on the task of satisfying assignments, which illustrate CryptoANFNet’s effectiveness in producing assignments from the learned embeddings.

Dataset SR(5) SR(25)
NeuroSAT 91.0% 5.0%
CryptoANFNet 71.0% 35.0%

作者回复

2024-08-07

Thanks for your valuable comment and for recognizing its innovative aspects and experimental design. Your questions about the conclusions and insights offered by this paper are important and meaningful. We greatly appreciate the opportunity to elaborate on our contributions and address the points you've raised.

We sincerely appreciate your insightful comments on our work. We fully agree that this research has the potential to foster collaboration and knowledge exchange between the machine learning and cryptography communities. Thank you again for your valuable comments and for recognizing the broad impact of our contributions. We are committed to continuing this line of research and look forward to seeing how it impacts future developments in AI-assisted cryptography and beyond.

W1 the technical part is a bit simple from the top ML conference perspective, while of course this is often the case for machine learning applications to emerging areas

A1 Thank you for your comment. We will address the technically non-trivial aspects of our work from the following aspects:

Introduction of ANF Formulas: We introduced the input format of SAT instances, ANF (Algebraic Normal Form), for efficiently solving SAT instances in cryptographic problems. ANF formulas can represent SAT instances in cryptographic problems with fewer literals and clauses, whereas CNF formulas, as used in NeuroSAT, often require a larger number of literals and clauses to represent common cryptographic operations like XOR. In this way, ANF formulas are more efficient for representing cryptographic problems.

	Datasets	SR(5)	SR(25)	Simon 3-8-16	Simon 3-16-32	Simon 6-8-16	Simon 6-16-32	Speck 3-8-16	Speck 6-8-16
	#Literals	6	424	25	49	49	97	57	183
CNF(LCG)	#Clauses	75	5492	195	403	735	1519	360	2950
	#Nodes	87	6340	245	501	833	1713	474	3316
	#Literals	6	424	25	49	49	97	57	183
CNF(VCG)	#Clauses	75	5492	195	403	735	1519	360	2950
	#Nodes	81	5916	220	452	784	1616	417	3133
	#Literals	5	25	24	48	48	96	56	128
ANF	#Clauses	11	26	24	48	48	96	64	136
	#Nodes	27	77	72	144	144	288	184	400

Model Structure for ANF-Based Graph Representation:

We proposed a model structure for handling ANF-based graph representation, which have better performance than NeuroSAT in MQ problem and can effectively extend to higher-order ANF formulas. Due to the high-order literals in ANF formulas, the number of literal types increases exponentially with the order. Therefore, as stated in line 216 of Section 4.3, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. High-order nodes are used only as intermediate results in the message-passing process. For high-order ANF formulas, i.e., AND terms with more than 2 variables, we have the similar message-passing process shown below:
- For the message-passing process from vanilla literal node to clause node, we have:
  
  $L^{(t)}_{l2c} = L\_{l2l}(M^T\_{l2l}L^{(t)})$
  
  $[C^{(t)}\_{m,\text{pos}}, C^{(t)}\_{m,\text{neg}}] = M_{l2c}^T L_{\text{msg}}(L^{(t)}_{l2c})$
- For the message-passing process from clause node to vanilla literal node, we have:
  
  $L^{(t)}\_{c2l} = M\_{l2c} C\_{\text{msg}}([C^{(t)}\_{\text{pos}}, C^{(t)}\_{\text{neg}}])$
  
  $L^{(t)}\_m = M\_{l2l} L\_{l2m}(L^{(t)}\_{c2l})$
where $M_{l2l}$ is a sparse adjacency matrix defined by $M_{l2l}(i, j) = 1$ (the i-th vanilla literal appears in the j-th high-order literal) and $M_{l2c}$ is the adjacency matrix defined by $M_{l2c}(i, j) =1$ (the $i$ -th high-order literal appears in the $j$ -th clause). Using the sparse matrix $M_{l2l}$ , we only need to retain the edges between vanilla literal nodes and higher-order nodes, without storing the entire adjacency matrix, which grows exponentially with higher orders. This approach effectively handles datasets comprising SAT instances with literals in various orders.

2024-08-13

Dear Reviewer uZGu,

We sincerely appreciate the time you have invested in the review process, as well as the valuable comments and constructive suggestions you have provided. In our rebuttal, we have summarized the main feedback and addressed the key issues, particularly regarding the technical novelty of our paper and its impact on the field. However, we have not yet received any further feedback. As the discussion period is nearing its end, we are eager to know whether our responses have satisfactorily addressed your concerns.

We appreciate that you acknowledged the strengths of our approach, analysis, and experiments. Additionally, we have elaborated on the technical novelties of our work and supplemented it with relevant experiments. Furthermore, we have discussed the impact of our method on the field and its future development. We kindly ask if you are satisfied with our responses.

We hope that our rebuttal and discussion will clarify any potential misunderstandings and contribute to a more comprehensive evaluation of our submission.

Thank you very much for your time and attention. We sincerely appreciate your support and are more than willing to address any further concerns.

Best regards,

The Authors

审稿意见

评分: 5置信度: 12024-07-13

In this manuscript, the authors propose an ANF-based graph structure that efficiently handles XOR and AND operations in cryptographic problems. Additionally, they introduce CryptoANFNet, a message-passing neural network model designed to predict the satisfiability of cryptographic problems. The authors also present a key-solving algorithm that enhances key decryption accuracy.

优点

This work introduces efficient representations and learning strategies for encryption problems, which are highly innovative.
In this work, CryptoANFNet excels in predicting satisfiability and key decryption, being up to 50 times faster and more accurate than existing methods.

缺点

In the “Introduction” of the article, the author mentioned that the application of ANF to existing SAT solvers is not as straightforward as CNF, but the article does not explicitly propose a solution that corresponds to it.
Table 1 listed the parameters of SAT problems in CNF and ANF, but the article did not provide the algorithmic level explanations of these results.
The author proposed CryptoANFNet and compared it with NeuroSAT. The results of the tests on different datasets were also listed in Table 2. However, the article lacks further comparison of the two algorithms.
This article lacks a detailed description of the specific steps of the key-solving algorithm.

问题

局限性

Discussed the limitations in Appendix B

2024-08-07

Performance: As shown in Table 2 of our paper, CryptoANFNet outperforms NeuroSAT across different datasets. Additionally, we referred to the benchmark G4satbench [cite 1] to compare CryptoANFNet with various baselines, and the following table show that CryptoANFNet exhibits better performance on different datasets.

Datasets	SR(5)	SR(25)	Simon-3-8-16	Simon-3-16-32	Simon-6-8-16	Simon-6-16-32	Speck-3-8-16	Speck-6-8-16
GCN(LCG)	91.0%	53.5%	73.2%	74.0%	53.5%	52.0%	53.5%	51.5%
GCN(VCG)	90.5%	52.%	75.2%	73.5%	52.0%	53.0%	53.0%	52.0%
GIN(LCG)	91.5%	51.5%	76.2%	74.0%	52.5%	51.5%	53.5%	52.0%
GIN(VCG)	88.0%	52.5%	75.0%	74.0%	54.5%	52.5%	54.5%	52.5%
GGNN(LCG)	91.0%	54.0%	76.5%	74.0%	54.0%	53.0%	53.0%	52.5%
GGNN(VCG)	89.0%	56.0%	76.2%	74.4%	53.5%	52.3%	52.5%	51.5%
NeuroSAT	91.0%	57.0%	74.0%	72.7%	53.0%	51.0%	55.0%	52.5%
DG−DAGRNN(Circuit-SAT[cite 2])	84.0%	50.5%	73.0%	52.0%	51.0%	50.5%	50.5%	51.5%
CryptoANFNet	96.0%	72.0%	76.5%	75.6%	69.0%	66.5%	72.0%	68.5%

[cite 1] Li Z, Guo J, Si X. G4satbench: Benchmarking and advancing sat solving with graph neural networks[J]. arXiv preprint arXiv:2309.16941, 2023

[cite 2] Amizadeh S, Matusevych S, Weimer M. Learning to solve circuit-sat: An unsupervised differentiable approach[C]//International Conference on Learning Representations. 2019.

Model Structures: As described in Section 4.3 of our paper, ANF clauses involve both AND and OR operations. When constructing the ANF graph for CryptoANFNet, there are first-order nodes, positive and negative clause nodes, and high-order nodes like $x_1x_2$ . Messages needs to be passed between clause nodes and high-order nodes, as well as between high-order nodes and first-order nodes (vanilla literals). In contrast, when constructing the CNF graph for NeuroSAT, the LCG representation involves clause nodes and positive and negative literal nodes, with information passing only between clause and literal nodes. To do the message-passing process efficiently, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. As shown in the table in A2, it requires fewer embedding parameters, making the model more efficient.
Adaptability to SAT instances in high-order ANF Formulas: Since high-order nodes are used only as intermediate results in the message-passing process, CryptoANFNet can be adapted to high-order ANF formulas. For high-order ANF formulas, i.e., AND terms with more than 2 variables, we have the similar message-passing process shown below:
- For the message-passing process from vanilla literal node to clause node, we have:
  
  $L^{(t)}_{l2c} = L\_{l2l}(M^T\_{l2l}L^{(t)})$
  
  $[C^{(t)}\_{m,\text{pos}}, C^{(t)}\_{m,\text{neg}}] = M_{l2c}^T L_{\text{msg}}(L^{(t)}_{l2c})$
- For the message-passing process from clause node to vanilla literal node, we have:
  
  $L^{(t)}\_{c2l} = M\_{l2c} C\_{\text{msg}}([C^{(t)}\_{\text{pos}}, C^{(t)}\_{\text{neg}}])$
  
  $L^{(t)}\_m = M\_{l2l} L\_{l2m}(L^{(t)}\_{c2l})$
where $M_{l2l}$ is a sparse adjacency matrix defined by $M_{l2l}(i, j) = 1$ (the i-th vanilla literal appears in the j-th high-order literal) and $M_{l2c}$ is the adjacency matrix defined by $M_{l2c}(i, j) =1$ (the $i$ -th high-order literal appears in the $j$ -th clause). Using the sparse matrix $M_{l2l}$ , we only need to retain the edges between vanilla literal nodes and higher-order nodes, without storing the entire adjacency matrix, which grows exponentially with higher orders. This approach effectively handles datasets comprising SAT instances with literals in various orders. Note that the number of parameters in CryptoANFNet is independent of the number of nodes in the ANF-based graph generated from a SAT instance. Considering the intermediate results stored by CryptoANFNet, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes. During each iteration, CryptoANFNet simultaneously computes intermediate result vectors for up to the number of distinct literals in the SAT instance, which do not grow exponentially with higher orders. Thus, our approach can manage SAT instances with a greater variety of literals.

作者回复

2024-08-07

Thank you for your thoughtful review and for recognizing the innovative aspects of our work. We appreciate your positive assessment of our paper's contribution.Your feedback highlights the key strengths of our research, particularly the introduction of efficient representations and learning strategies for encryption problems. We are grateful for the opportunity to elaborate on our contributions and address any questions you have.

We sincerely appreciate your time and effort in reviewing our manuscript. We are committed to furthering this line of inquiry and contributing to the development of more efficient and effective cryptographic tools. We would sincerely appreciate it if you could reconsider your rating and we are more than happy to address any further concerns you may have.

Q1 "In the “Introduction” of the article, the author mentioned that the application of ANF to existing SAT solvers is not as straightforward as CNF, but the article does not explicitly propose a solution that corresponds to it."

A1 Thank you for your comment. In line 54, we mention that “However, representing ANF as a graph for efficient learning is challenging, and applying it to existing SAT solvers is not as straightforward as CNF” for the following reasons:

Input Format Compatibility: Existing traditional or ML-based SAT solvers predominantly accept CNF (Conjunctive Normal Form) as their input format. When dealing with cryptographic problems formulated in ANF (Algebraic Normal Form), it is necessary to convert ANF to CNF before inputting it into these solvers. This conversion process adds an extra step, making the application of ANF less straightforward compared to CNF.
Lack of ANF-based Graph Structures: Unlike CNF, which has established graph representations such as Literal-Clause Graph (LCG) and Variable-Clause Graph (VCG) used in learning-based SAT solvers, ANF lacks similar graph structures. These graph representations are crucial for learning-based solvers as they facilitate efficient learning and problem-solving. The absence of well-defined ANF-based graph structures poses a significant challenge for applying learning-based approaches to ANF-formulated problems. This gap highlights the need for developing novel ANF-based graph structures to enable efficient learning and application in SAT solvers.

To address this challenge and enhance SAT practice within ANF, we propose a novel approach using an ANF-based graph structure to represent SAT instances relevant to cryptographic applications. This approach led to the development of CryptoANFNet, an ML-based SAT solver designed to directly predict the satisfiability of SAT instances in the ANF input format. By leveraging this method, we can effectively handle ANF-based SAT problems without the need for conversion to CNF, thereby streamlining the process and improving efficiency.

Q2 "Table 1 listed the parameters of SAT problems in CNF and ANF, but the article did not provide the algorithmic level explanations of these results."

A2 Thank you for you question. We will provide a further explanation of Table 1 in two steps:

Literal and Clause Counts in ANF and CNF:
- As stated in Section 3, Algebraic Normal Form(ANF) consist of Boolean equations, representing the conjunction of logical formulas in SAT instances. Each variable is referred to as a (vanilla) literal. In a Boolean equation, the right side is always 0 and the Boolean function on the left side, formed by the XOR connection of monomials, is called a clause. Each monomial is either a constant term 1, a (vanilla) literal, or a product of variables(literals).
- As stated in line 45, Conjunctive normal form(CNF) is a conjunction (and-ing) of clauses, with each clause consisting of adisjunction (or-ing) of the true and negated forms of literals.
- For the same cryptographic problem's SAT instance, let $n_a$ and $m_a$ denote the number of literals and clauses in ANF format, respectively, and $n_c$ and $m_c$ denote the number of literals and clauses in CNF format. Due to the different algebraic structure, CNF requires more literals and clauses to represent XOR operations in cryptographic problems. The rows corresponding to #literal and #clause in Table 1 reflect the number of literals and clauses in ANF and CNF formats, respectively.

2024-08-07

Here, we show the results on high-order ANF formulas, i.e., AND terms with more than 2 variables:

Dataset	SR(25)	SR(5)-high	SR(10)-high
NeuroSAT	57.0%	53.5%	51.0%
CryptoANFNet	71.5%	71.0%	63.5%

Q4 "This article lacks a detailed description of the specific steps of the key-solving algorithm."

A4 Thank you. Let us consider a native instance under the Simon encryption algorithm with a 4-bit key $K=\overline{k_3k_2k_1k_0}$ and plaintext-ciphertext pair $X$ and $Y$ . We will follow the three-step process described in line 266 of Section 4.4 to determine the value of $k_0$ in the key $K$ .

Initial ANF Formulation: Based on the encryption process and the given plaintext-ciphertext pair, we derive the initial ANF formula $x$ .
Guessing the specific bit $k_0$ : We hypothesize $k_3 = 1$ and $k_3 = 0$ separately. By substituting 1/0 into the initial ANF formula $x$ respectively, we reduce the number of variables by one, resulting in two derived SAT instances $x_{k_0=1}$ and $x_{k_0=0}$ .
Solving with CryptoANFNet: Finally, we input $x_{k_0=1}$ and $x_{k_0=0}$ into CryptoANFNet, which outputs two satisfiability scores $s_{k_0=1}$ and $s_{k_0=0}$ . If $s_{k_0=1} > s_{k_0=0}$ , we determine that $k_0 = 1$ ; otherwise, $k_0 = 0$ .

By following this key-solving algorithm, CryptoANFNet efficiently assists in determining the value of each individual key bit in cryptographic problems.

2024-08-07

Parameterized Node Count for Graph Representations:
- In the ANF-based graph, there are first-order literal nodes, positive/negative clause nodes, and unparameterized second-order nodes like $x_1x_2$ . Literal nodes do not distinguish between positive and negative. Each clause with the same sets of literals in an ANF formula is represented by two nodes (positive and negative), indicating the constant term taking 0 or 1.
- In the CNF-based graph, like Literal-Clause Graph (LCG) representation used in NeuroSAT, there are clause nodes and positive/negative literal nodes, with each literal represented by two nodes indicating its true and negated forms $x$ and $\bar{x}$ .
- As mentioned in line 216 of Section 4.3, CryptoANFNet retains embeddings only for first-order nodes (vanilla literals) and clause nodes. High-order nodes are used only as intermediate results in the message-passing process. In comparison, NeuroSAT retains embeddings for all nodes. Consequently, the number of parameterized nodes requiring embeddings in CNF-based LCG representation is #Node = 2×#literal+#clause, whereas in the ANF graph representation, it is #Node = #literal+2×#clause.

This details clarifies the differences between ANF-based and CNF-based (LCG) representations and their implications on the number of literals/clauses in format, and nodes requiring embeddings, as showed in Table 1.

We will further elaborate by providing a summary table below, which compares ANF with two types of CNF-based graph representations, Literal-Clause Graph (LCG) and Variable-Clause Graph (VCG). In VCG, literal nodes are not distinguished as positive or negative. The number of parameterized nodes in CNF-based VCG representation is #Node = #literal+#clause.

	Datasets	SR(5)	SR(25)	Simon 3-8-16	Simon 3-16-32	Simon 6-8-16	Simon 6-16-32	Speck 3-8-16	Speck 6-8-16
	#Literals	6	424	25	49	49	97	57	183
CNF(LCG)	#Clauses	75	5492	195	403	735	1519	360	2950
	#Nodes	87	6340	245	501	833	1713	474	3316
	#Literals	6	424	25	49	49	97	57	183
CNF(VCG)	#Clauses	75	5492	195	403	735	1519	360	2950
	#Nodes	81	5916	220	452	784	1616	417	3133
	#Literals	5	25	24	48	48	96	56	128
ANF	#Clauses	11	26	24	48	48	96	64	136
	#Nodes	27	77	72	144	144	288	184	400

Q3 "The author proposed CryptoANFNet and compared it with NeuroSAT. The results of the tests on different datasets were also listed in Table 2. However, the article lacks further comparison of the two algorithms."

A3 Thank you. We will give a detailed comparison between NeuroSAT and CryptoANFNet from the following aspects.

Input Formats for SAT Instances: CryptoANFNet introduces ANF as the input format for SAT instances, while NeuroSAT uses CNF. ANF formulas can represent SAT instances in cryptographic problems with fewer literals and clauses, whereas CNF formulas, as used in NeuroSAT, often require a larger number of literals and clauses to represent common cryptographic operations like XOR. Thus, ANF formulas are more efficient for representing cryptographic problems.
Graph Representations for Inputs: CryptoANFNet utilizes an ANF-based graph representation, as proposed in our paper, while NeuroSAT relies on a CNF-based graph representation (LCG). Since CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes, as shown in Table 1 of our paper, the ANF-based graph representation in CryptoANFNet is more efficient than the general graph form for SAT in NeuroSAT. To elaborate further, we provide a summary table in A2 that compares ANF with two types of CNF-based graph representations, Literal-Clause Graph (LCG) and Variable-Clause Graph (VCG). This comparison demonstrates that our proposed ANF-based graph representation remains efficient.

2024-08-13

Dear Reviewer imRj,

We sincerely appreciate the time you have invested in the review process, as well as the valuable comments and constructive suggestions you have provided. In our rebuttal, we have summarized the main feedback and addressed the key issues, particularly concerning the novelty of our paper and the generalization of our method. However, we have not yet received any further feedback. As the discussion period is nearing its end, we are eager to know whether our responses have satisfactorily addressed your concerns.

There still seems to be some confusion regarding the introduction of ANF, the comparison of parameters between SAT problems in CNF and ANF, further details on the Key-solving algorithm, and the comparison between CryptoANFNet and NeuroSAT. In light of this, we have carefully addressed the issues you raised and supplemented our experiments. We kindly ask if you are satisfied with our responses.

We hope that our rebuttal and discussion will clarify any potential misunderstandings and contribute to a more comprehensive evaluation of our submission.

Thank you very much for your time and attention. We sincerely appreciate your support and are more than willing to address any further concerns.

Best regards,

The Authors

作者回复

2024-08-07

We sincerely appreciate the reviewers’ time, valuable feedback, and constructive suggestions. Overall, the reviewers have deemed our work as "highly innovative" (imRj), "well-written" (uZGu, ZcPU), and having the appropriate level of technical detail (ZcPU). They have acknowledged our methodology and experiment design as "novel and well-designed" (imRj, uZGu), with "excels" and "outperforms" being common sentiments. Furthermore, the results of our work have been described as "surprisingly good", "comprehensive", and "informative" (6xjG, uZGu). The main concerns are whether our proposed method is limited to 2nd order ANF formulations, whether it can be generalised to other popular ciphers, whether it can deliver a proof for its predication and the difference between the proposed CryptoANFNet and previous work such as NeuroSAT. To preempt this potential misunderstandings that might impact the evaluation of our work, we first restate our contributions compared to the previous works and address the concerns.

Contributions
- Efficient Input Formats and Graph Representations for SAT Instances: Based on the Arithmetic Normal Form (ANF), we propose a graph structure to succinctly represent the excessive XOR operations in cryptographic problems. We then design two ways to encode the AND operations in the ANF-based graph to represent SAT instances derived from cryptographic problems. Our ANF-graph is more efficient than the general graph form for SAT Literal-Clause Graph (LCG) and Variable-Clause Graph (VCG).
- Better Performance: We propose (supervised) learning to solve (for the first time to our knowledge) the challenging cryptographic problem: plaintext-ciphertext satisfiability prediction, which could otherwise be intractable by traditional SAT methods. Our proposed GNN-based classifier CryptoANFNet with ANF-based SAT instances as input, achieves a 50x speedup over heuristic solvers, and outperforms the SOTA learning-based SAT solver NeuroSAT by 96% vs. 91% and 72% vs. 55% accuracy on small and large scale datasets, generated from Simon and Speck algorithms, respectively. Additionally, we referred to the benchmark G4satbench to compare CryptoANFNet with various baselines, and the following table show that CryptoANFNet exhibits better performance on different datasets.
- Efficient Model Structures and Adaptability to SAT instances in high-Order ANF Formulas: We design the CryptoANFNet to retain fewer embedding parameters, making the model more efficient.. As described in Section 4.3 of our paper, ANF clauses involve both AND and OR operations. When constructing the ANF graph for CryptoANFNet, there are first-order nodes, positive and negative clause nodes, and high-order nodes like $x_1x_2$ . To do the message-passing process efficiently, CryptoANFNet only retains embeddings for first-order nodes (vanilla literals) and clause nodes, which making the model more efficient. Since high-order nodes are used only as intermediate results in the message-passing process, CryptoANFNet can be adapted to high-order ANF formulas. Using the sparse adjacency matrix, we only need to retain the edges between vanilla literal nodes and higher-order nodes, without storing the entire adjacency matrix, which grows exponentially with higher orders. This approach effectively handles datasets comprising SAT instances with literals in various orders.
- Key solving algorithm We extend to the key decryption problem. We propose a key-solving algorithm that derives ANF-based SAT instances as further simplified by our devised techniques, from the plaintext and ciphertext, and use the output of two derived SAT instances to infer the key values. It boosts the accuracy by 76.5%->82% and 72%->75%, on datasets generated from Simon and Speck, respectively.
Discussion
- Generalization to high-order ANF formulas: Because high-order nodes are used only as intermediate results in the message-passing process, CryptoANFNet can be adapted to high-order ANF formulas by using the sparse adjacency matrix. CryptoANFNet still outperforms NeuroSAT on second-order ANF formulas dataset and better performance on higher-order ANF formulas.
- Generalization to other ciphers: Theoretically, the methods proposed in this paper can be extended to more complex block ciphers that can be represented by ANF formulas. However, in practice, the efficiency of traditional solvers limits the generation of datasets for some complex block ciphers. Some block cipher structures (such as the use of S-boxes) lack efficient algorithms for conversion to ANF or CNF formulas. This results in a large number of clauses and literals, making it challenging to generate efficient datasets for the network to train on.
- Deliver a proof for the predication: We can provide proof for our prediction using the key solving algorithm. Using the key-solving algorithm described, CryptoANFNet can sequentially deduce the value of each key bit. Besides, CryptoANFNet can also directly extract satisfying assignments of the key from the embeddings of the literals.
- Unfair comparsion of the running time between GNN and that of a SAT solver: We acknowledge the reviewer’s concerns regarding the potential for prediction errors in ML-based solvers. No model can guarantee 100% accuracy across all datasets. But, GNN-based solver can be used to get a preliminary solution for a key or a satisfying assignment with relatively high accuracy. Since the efficiency of verifying the correctness of these solutions is very high, these GNN-based solvers, like CryptoANFNet offer a significant advantage over traditional solvers in terms of application efficiency, aligning with the primary research objective of our paper. There, we perform this comparison to validate the advantages of CryptoANFNet in this regard.

评论- Start discussions with authors, please

2024-08-11

Dear Reviewers,

Thank you for your valuable contributions to the review process. As we enter the discussion phase (August 7-13), I kindly request your active participation in addressing the authors' rebuttals and engaging in constructive dialogue.

Please:

Carefully read the authors' global rebuttal and individual responses to each review.
Respond to specific questions or points raised by the authors, especially those requiring further clarification from you.
Engage in open discussions about the paper's strengths, weaknesses, and potential improvements.
Be prompt in your responses to facilitate a meaningful exchange within the given timeframe.
Maintain objectivity and professionalism in all communications.

If you have any concerns or need guidance during this process, please don't hesitate to reach out to me.

Your continued engagement is crucial for ensuring a fair and thorough evaluation process. Thank you for your dedication to maintaining the high standards of NeurIPS.

Best regards,

Area Chair

最终决定Accept (poster)

2024-09-25

Summary:

The paper proposes CryptoANFNet, a graph neural network approach for predicting satisfiability of Boolean SAT instances in Algebraic Normal Form (ANF), particularly for cryptographic problems. It introduces a new graph structure to efficiently handle XOR operations common in cryptography. CryptoANFNet demonstrates superior performance compared to existing methods like NeuroSAT in terms of accuracy and speed on certain datasets.

Strengths:

Novel approach to handling non-CNF constraints (ANF) which are relevant for cryptographic problems
Efficient graph representation for ANF formulas
Good performance in terms of accuracy and speed compared to baselines
Ability to handle higher-order ANF formulas beyond just quadratic terms

Weaknesses:

Initially limited to 2nd order ANF, though authors later showed it can handle higher orders
Comparison of prediction time to complete SAT solver runtime was seen as unfair by some reviewers
Limited evaluation on more complex ciphers like AES due to practical constraints

Key Discussion Points:

Novelty compared to NeuroSAT: Authors clarified the key differences in input format, graph representation, and model structure.
Handling higher-order ANF: Authors demonstrated how CryptoANFNet can be extended to higher-order formulas efficiently.
Proof generation: Authors showed CryptoANFNet can provide satisfying assignments and solve for encryption keys sequentially.
Comparison to incomplete solvers: In response to reviewer feedback, authors added comparisons to incomplete SAT solvers like GSAT and WalkSAT, showing CryptoANFNet still offers speed advantages.
Applicability to complex ciphers: Authors explained challenges in applying the method to AES due to limitations in generating efficient datasets, not theoretical limitations of the approach.

Overall, the reviewers acknowledged the paper's contributions and innovations, particularly after clarifications in the rebuttal. The main remaining concerns were about fair comparisons to other solvers and the current practical limitations for very complex ciphers. Most reviewers increased or maintained positive scores after the rebuttal, suggesting the paper makes a valuable contribution to the field.