Although the title says ‘bridging Ethereum and Twitter’, the proposed dataset focuses on NFT transactions and the related Twitter accounts. The authors should either show that a GNN trained on this dataset can find other kinds of matching links (e.g., phish-hack EOA nodes and their corresponding Twitter accounts), or make the title more precise.
In Section 3.3, the authors claim, ‘we obtain embeddings for all Twitter accounts using the DeepWalk algorithm in the Twitter graph’. Figure 2 also shows structural features produced by DeepWalk. However, as stated in the Appendix, only the eight handcrafted features from C.1 are used. The authors should clarify which features are actually used.
For task 1, the authors use only matched addresses. Since a GNN propagates node features through its message-passing mechanism, the semantic information can spread to neighboring nodes. Does this semantic information benefit or harm the prediction for non-matched addresses?
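To make the question about task 1 concrete, here is a minimal sketch of how a semantic feature attached to one matched address reaches non-matched addresses after a few rounds of message passing. The three-node path graph, the single scalar feature, and the `mean_aggregate` helper are illustrative assumptions, not the authors' model; a plain mean aggregator with self-loops stands in for whatever GNN layer the paper uses.

```python
import numpy as np

def mean_aggregate(adj: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """One round of mean aggregation over each node and its neighbors."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)   # neighborhood sizes
    return (a_hat @ feats) / deg

# Hypothetical path graph 0 - 1 - 2; only node 0 is a matched address.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

# Node 0 carries a semantic feature; the non-matched nodes start at zero.
feats = np.array([[1.0], [0.0], [0.0]])

h1 = mean_aggregate(adj, feats)  # after one layer
h2 = mean_aggregate(adj, h1)     # after two layers

print(h1.ravel())
print(h2.ravel())
```

After one layer the direct neighbor of the matched address already carries part of the semantic signal, and after two layers the two-hop node does as well, so predictions for non-matched addresses are not independent of the Twitter-derived features.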
Task 3 is to predict whether a given Ethereum address is connected to a given Twitter account. In Appendix K, the authors describe "For Twitter accounts that do not have a matched Ethereum address, we ... appending an eight-dimensional zero-vector to the existing structural features." In Section 4.4, however, the negative samples are picked from non-existing connections between Ethereum addresses and all their matched Twitter accounts. If the negative samples are drawn only from matched Twitter accounts, why are there Twitter accounts with no Ethereum addresses? If the negative samples also include unmatched Twitter accounts, does the input already contain bias, given that the zero-vector padding marks those accounts as unmatched?
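To illustrate the bias concern for task 3, here is a minimal sketch assuming the feature layout described in Appendix K: structural features followed by eight further dimensions that are all zero for unmatched Twitter accounts. The helper names, the four-dimensional structural features, and the random values are hypothetical and chosen only for illustration.

```python
import numpy as np

SEMANTIC_DIM = 8  # eight-dimensional block, as described in Appendix K

def build_features(structural, semantic, matched_mask):
    """Concatenate structural features with an 8-d block that is
    zero-padded for accounts without a matched Ethereum address."""
    padded = np.zeros((structural.shape[0], SEMANTIC_DIM))
    padded[matched_mask] = semantic
    return np.concatenate([structural, padded], axis=1)

def guess_unmatched(features):
    """Trivial rule: an all-zero trailing block means 'unmatched'."""
    return np.all(features[:, -SEMANTIC_DIM:] == 0.0, axis=1)

rng = np.random.default_rng(0)
matched_mask = np.array([True, False, True, False, False])
structural = rng.normal(size=(5, 4))  # hypothetical structural features
semantic = rng.normal(size=(int(matched_mask.sum()), SEMANTIC_DIM))

features = build_features(structural, semantic, matched_mask)
leak = guess_unmatched(features)

print(np.array_equal(leak, ~matched_mask))
```

If unmatched accounts appear only in negative pairs, a model can reach high accuracy on those pairs by detecting the zero block alone, without learning anything about the cross-platform link.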