NeurIPS 2024 · Poster

Overall rating: 5.8/10 from 4 reviewers (scores 5, 7, 6, 5; min 5, max 7, std 0.8)
Average confidence: 3.8 · Correctness: 3.0 · Contribution: 2.8 · Presentation: 2.5

CausalStock: Deep End-to-end Causal Discovery for News-driven Multi-stock Movement Prediction

Submitted: 2024-05-08 · Updated: 2024-11-06
TL;DR

We propose a novel news-driven multi-stock movement prediction framework called CausalStock.

Abstract

Keywords

Causal discovery, Stock movement prediction, Text mining

Reviews and Discussion

Review (Rating: 5)

The paper presents a novel framework, termed CausalStock, for news-driven multi-stock movement prediction. The authors address two key issues in existing methods: the unidirectional nature of stock relations and the substantial noise in news data. CausalStock introduces a lag-dependent temporal causal discovery module and an LLM-based Denoised News Encoder. The experimental results show that CausalStock outperforms strong baselines on six datasets and provides good explainability.

Strengths

The combination of causal discovery and news denoising for stock movement prediction is novel. The authors emphasize the explainability of their model, which is a significant advantage in financial applications. In addition, the paper also includes detailed ablation studies that highlight the contributions of different components of the model.

Weaknesses

  1. Regarding the Lag-dependent Temporal Causal Discovery module: since many traditional causal discovery methods already exist, such as PC [1] and GES [2], the paper lacks ablation studies comparing this module with those methods in terms of performance and time efficiency.
  2. Regarding the learned causal graph G, I have concerns about how it differs from a learned correlation matrix. Although it is derived from a causal discovery and Bayesian perspective, what distinguishes it from a correlation matrix in terms of implementation? Incorporating a comparison with a correlation matrix in Figure 3b would be convincing.
  3. The paper lacks a complexity analysis and a running-time comparison.
  4. Minor comments: The MIE and the Lag-dependent TCD are not explicitly shown in Figure 2. The detailed structure of each module (DNE, Price Encoder, and FCM) is also lacking.

Reference:

[1] Estimating high-dimensional directed acyclic graphs with the PC-algorithm (2005)

[2] Optimal Structure Identification With Greedy Search (2002)

Questions

  1. How do you ensure that the learned G is truly a causal graph? It seems that the final loss only considers prediction accuracy.
  2. How would the model handle sudden market shifts or unprecedented events that significantly impact stock prices?

After rebuttal

Most of my concerns have been addressed. I raise my score to 5.

Limitations

The authors discussed the limitations of the proposed method.

Author Response

Thank you very much for your valuable suggestions! We will try our best to tackle the concerns one by one.

(1) Comparison with traditional causal discovery methods and the correlation matrix

  • We greatly appreciate your insightful comments! Following your suggestion, we compare our discovered causal matrix with the matrices produced by traditional causal discovery methods and with the correlation matrix. The performance results are shown below and indicate that our causal discovery module has a significant advantage over the others.
| Method | ACL18 ACC | ACL18 MCC | CMIN_US ACC | CMIN_US MCC | CMIN_CN ACC | CMIN_CN MCC |
| --- | --- | --- | --- | --- | --- | --- |
| Correlation Matrix | 52.24 | 0.0220 | 52.85 | 0.0230 | 52.52 | 0.0210 |
| PC Algorithm | 51.51 | 0.0138 | 52.04 | 0.0184 | 51.90 | 0.0166 |
| GES Algorithm | 51.41 | 0.0135 | 51.92 | 0.0152 | 51.83 | 0.0145 |
| CausalStock | 63.42 | 0.2172 | 54.64 | 0.0481 | 56.19 | 0.1417 |

  • Compared to traditional causal methods, our integrated causal discovery and prediction modules form an end-to-end deep learning framework, which can constrain the discovered causal graph by simulating the data generation process. In comparison to correlation-based methods, causal relationships are more suitable for representing stock relationships.

(2) Complexity analysis and running-time comparison

  • Thanks for pointing this out. We made a rough estimate of the baselines' complexity on the multi-stock movement prediction task from two perspectives: training-cost FLOPs and running time, corresponding to time complexity, and the number of parameters, corresponding to space complexity.
  • We use an NVIDIA GeForce RTX 3090, and all baseline parameter settings follow the original papers.

The complexity and prediction accuracy results of our method and baselines are shown in the following table.

| Model | FLOPs | Parameters | Running Time | Acc on KDD17 | Acc on NI225 | Acc on FTSE100 |
| --- | --- | --- | --- | --- | --- | --- |
| LSTM | 2.3 × 10^8 | 5797 | 5m58s | 51.18 | 50.79 | 50.96 |
| ALSTM | 3 × 10^8 | 6917 | 6m32s | 51.66 | 50.60 | 51.06 |
| Adv-ALSTM | 3 × 10^8 | 6917 | 7m02s | 51.69 | 50.60 | 50.66 |
| StockNet | 5.0 × 10^10 | 4.4 × 10^6 | 112m | 51.93 | 50.15 | 50.36 |
| CausalStock | 1.4 × 10^9 | 5 × 10^5 | 7m58s | 56.09 | 53.01 | 52.88 |
  • From the table above, it can be seen that we can achieve good performance with comparable complexity.
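For reference, numbers of this kind can be collected with a short PyTorch routine. Below is a minimal sketch under the assumption that `model` and `dummy_input` stand in for the actual prediction network and a representative input batch; the `thop` package is one common option for FLOPs counting, not necessarily what was used here.

```python
# Minimal sketch of collecting the complexity numbers above.
# `model` and `dummy_input` are placeholders for the real network
# and a batch shaped like its actual input.
import time

import torch
from thop import profile  # pip install thop

def complexity_report(model: torch.nn.Module, dummy_input: torch.Tensor):
    # Space complexity: number of trainable parameters.
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Time complexity: thop reports multiply-accumulates (MACs);
    # multiplying by 2 approximates FLOPs.
    macs, _ = profile(model, inputs=(dummy_input,), verbose=False)
    flops = 2 * macs

    # Wall-clock time per forward pass, averaged over repeats.
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(100):
            model(dummy_input)
        per_forward = (time.perf_counter() - start) / 100

    return {"params": n_params, "flops": flops, "sec_per_forward": per_forward}
```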

(3) About Figure 2

  • In the submitted version, we showed the main structure of our model along with details of its most significant components.
  • Thank you for your reminder. We have made a more comprehensive model structure in the attached pdf file.

(4) About the learning of the causal graph

  • The ground truth causal graph of the stock market is unknown, so we need certain theoretical guarantees to approximate the true causal relation. Under the current setup, we can theoretically prove that this learning process can converge to the true causal graph.

  • Validity of variational objective:

    • Our model holds the following assumptions: Causal Markov Property, Minimality and Structural Identifiability, Correct Specification, Causal Sufficiency, and Regularity of log-likelihood (see Appendix B for a detailed description).
    • If we further assume that there is no model misspecification, then the Maximum Likelihood Estimation solution $\theta'$ and the variational posterior distribution of $G$ satisfy $q'_\phi(G)=\sigma(G=G')$ when the ELBO term is optimized with infinite data, where $G'$ is a unique graph.
    • In particular, $G'=G^*$ and $p_{\theta'}(X;G')=p(X;G^*)$, where $G^*$ is the ground-truth graph and $p(X;G^*)$ is the true data-generating distribution. Please see reference [1] for a detailed proof.

[1] Wenbo Gong, Joel Jennings, Cheng Zhang, and Nick Pawlowski. Rhino: Deep causal temporal relationship learning with history-dependent noise. arXiv preprint arXiv:2210.14706, 2022.
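To make the objective concrete, here is a minimal PyTorch sketch of a variational objective of this kind, with an independent Bernoulli posterior over lagged edges and a sparse Bernoulli prior. The shapes, the prior value, and the `log_lik` placeholder are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

# D stocks, L maximum time lag; log_lik(x, g) is a placeholder for the
# functional causal model's log-likelihood log p_theta(X | G).
D, L = 10, 5
edge_logits = torch.nn.Parameter(torch.zeros(L, D, D))  # q_phi(G), one logit per edge
prior_p = torch.full((L, D, D), 0.1)                    # sparse edge prior p(G)

def elbo(x, log_lik):
    q = torch.sigmoid(edge_logits)  # posterior edge probabilities
    # Differentiable edge samples via the Gumbel-Softmax (concrete) relaxation;
    # softmax over [logit, 0] equals sigmoid(logit) for the "edge" category.
    two_way = torch.stack([edge_logits, torch.zeros_like(edge_logits)], dim=-1)
    g = F.gumbel_softmax(two_way, tau=0.5, hard=True)[..., 0]
    # KL between the independent Bernoulli posterior and the sparse prior.
    kl = (q * (q / prior_p).log()
          + (1 - q) * ((1 - q) / (1 - prior_p)).log()).sum()
    return log_lik(x, g) - kl  # maximize the ELBO
```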


(5) About sudden market shifts or unprecedented events

  • Thanks for this constructive comment! Our model could handle sudden market shifts or unprecedented events from the following perspectives:

    • Incorporating News to Capture Market Dynamics: One of the primary purposes of incorporating news into our model is to capture real-time market dynamics. News data can reflect market changes and sudden events promptly. By processing and analyzing news information quickly, our model can rapidly adjust its predictions based on new information. This allows the model to react swiftly and accurately to sudden market shifts.

    • Causal Relationships to Respond to Sudden Events: Our model's ability to learn causal relationships helps it respond effectively to sudden events. For example, if negative news is detected, the causal relationships in the model can identify which stocks might be affected and react accordingly. This mechanism ensures that the model not only considers the directly impacted stocks but also identifies related stocks that might be influenced, providing a comprehensive view of market dynamics.

    • Adjusting Priors to Reflect Changes in Causal Relationships: If a sudden event affects the causal relationships between stocks, such as a company ending a partnership, our model can adjust by incorporating the new pre-defined knowledge into the priors. This allows the causal graph to be dynamically updated to reflect the latest market structure and relationship changes, maintaining the accuracy and relevance of the predictions.

  • Thus, our model could effectively handle sudden market shifts and unprecedented events.


We would appreciate it very much if you could raise the score if the concerns are solved. Thank you!

Comment

Thanks for your response. My concerns are addressed and I have decided to raise the score to 5.

Comment

I would like to extend my sincere gratitude for your insightful and constructive comments on our manuscript. Your positive evaluation and the time you have invested in thoroughly reviewing our work are greatly appreciated. Thank you once again for your kind and supportive review. We are grateful for the opportunity to improve our work based on your suggestions!

Review (Rating: 7)

This work predicts stock movements by inferring the causal relations between stocks and news. News is encoded into a structured representation through LLMs to filter out noise. Causal relations are modeled as a directed graph and inferred through a variational approach.

Strengths

  1. The news denoising module is inspiring and might be extended to other applications. The graph inference module shows promise, with interesting design features such as sparseness and the knowledge prior.
  2. The paper presents comprehensive experiments, evaluating the proposed approach on various datasets and comparing it with multiple baselines. The results are promising.
  3. Interpretability is especially valued in finance applications, and the explainability analysis addresses a pain point of many DL-based stock prediction systems.

Weaknesses

  1. The graph inference module has size $O(n^2)$, where $n$ is the number of stocks. The graph quickly becomes unlearnable as $n$ grows because historical data is limited. The benchmarks used here are small, so it is unclear whether the performance persists on a larger stock set.

  2. I think the general framework is not sufficiently novel, nor is the graph inference module. I believe the news denoising module is inspiring and worth further investigation; unfortunately, the paper does not provide further analysis of the Denoised News Encoder.

  3. There are several typos in the paper such as line 42, 44, etc.

Questions

  1. Since graph inference is challenging and can be erroneous, I am curious about the performance without the graph module, that is, simply using the news representations and historical prices as input features for prediction.

  2. Table 2 shows that "CausalStock w/o lag-dependent TCD" outperforms the proposed method. How do you reconcile the results?

Limitations

None

Author Response

Thank you very much for the insightful and constructive comments! We sincerely appreciate the suggestions to improve our submission. We will address all raised concerns one by one.

(1) About the performance without the graph module

  • Following your suggestion, we ablate the graph module and compare it with our method to explore the value of our causal discovery module. In addition, we compare our causal discovery module with traditional causal discovery methods, i.e., PC and GES, and with the Spearman correlation matrix. The performance results are shown below and indicate that our causal discovery module has a significant advantage over the others.
| Method | ACL18 ACC | ACL18 MCC | CMIN_US ACC | CMIN_US MCC | CMIN_CN ACC | CMIN_CN MCC |
| --- | --- | --- | --- | --- | --- | --- |
| Correlation Matrix | 52.24 | 0.0220 | 52.85 | 0.0230 | 52.52 | 0.0210 |
| PC Algorithm | 51.51 | 0.0138 | 52.04 | 0.0184 | 51.90 | 0.0166 |
| GES Algorithm | 51.41 | 0.0135 | 51.92 | 0.0152 | 51.83 | 0.0145 |
| No Causal Graph | 51.08 | 0.0102 | 51.48 | 0.0106 | 51.37 | 0.0102 |
| CausalStock | 63.42 | 0.2172 | 54.64 | 0.0481 | 56.19 | 0.1417 |

(2) About the ablation results

  • Thanks for pointing this out! We noticed that the ablation results of two datasets were mistakenly swapped in our presentation. We sincerely apologize for this oversight, and we have now corrected it in the following table.
| Ablation Type | Ablation Variants | ACL18 ACC | ACL18 MCC | CMIN-CN ACC | CMIN-CN MCC | CMIN-US ACC | CMIN-US MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Main Framework | CausalStock w/o lag-dependent TCD | 59.19 | 0.1757 | 52.93 | 0.0312 | 54.97 | 0.1298 |
| Main Framework | CausalStock w/o news | 58.10 | 0.1421 | 53.16 | 0.0375 | 54.16 | 0.1264 |
| Traditional News Encoder | CausalStock with Glove+Bi-GRU | 60.78 | 0.1952 | 53.87 | 0.0467 | 55.13 | 0.1326 |
| Traditional News Encoder | CausalStock with Bert | 61.74 | 0.2067 | 53.92 | 0.0472 | 55.43 | 0.1352 |
| Traditional News Encoder | CausalStock with Roberta | 61.81 | 0.2071 | 54.06 | 0.0477 | 55.58 | 0.1364 |
| Denoised News Encoder | CausalStock with FinGPT | 61.92 | 0.2105 | 54.30 | 0.0475 | 55.67 | 0.1386 |
| Denoised News Encoder | CausalStock with Llama | 62.82 | 0.2164 | 54.52 | 0.0483 | 55.97 | 0.1406 |
| Denoised News Encoder | CausalStock (with GPT-3.5) | 63.42 | 0.2172 | 54.64 | 0.0481 | 56.19 | 0.1417 |

Thank you for your thorough review again. We will be more diligent in our checks to prevent similar errors in the future. We would appreciate it very much if you could raise the score if the concerns are solved. Thank you!

Comment

Thanks for the new results and correction. I raise my score accordingly. Please include the ablation study in the new version.

Comment

I would like to extend my sincere gratitude for your insightful and constructive comments on our manuscript. Your positive evaluation and the time you have invested in thoroughly reviewing our work are greatly appreciated. Thank you once again for your kind and supportive review. We are grateful for the opportunity to improve our work based on your suggestions!

Review (Rating: 6)

This paper proposes a news-driven multi-stock movement prediction model called CausalStock. The paper introduces a Denoised News Encoder, which utilizes LLMs to evaluate news text and obtain denoised representations. It also presents a Lag-dependent temporal causal discovery module to discover causal relations between stocks. Based on the input market information and the learned causal graph distribution, CausalStock employs an FCM to make stock movement predictions. The contributions of the paper include the design of the Denoised News Encoder, the Lag-dependent temporal causal discovery module, and the application of an FCM to stock movement prediction.

Strengths

  1. The paper is well-written with a complete and coherent structure.
  2. The idea of using the MIE and the Lag-dependent TCD to improve stock prediction performance is novel and effective.

Weaknesses

  1. There are some typos in the paper: a. In line 199, "or" should be "and". b. In Figure 2, the abbreviation of the stock Apple is "AAPL", while in Figure 3 it is "APPL". c. In line 594, the hidden size should be 32 rather than 332. d. In line 603, "it's" should be "its".
  2. The Related Work section would be better placed in the main manuscript.
  3. In Appendix B, the explanations of the assumptions are not sufficient.

Questions

  1. Are the sparseness constraints used in the loss function? They do not appear in Equation 14.
  2. How is the correlation in Section 4.4 and Appendix D calculated? How is the causal strength of a stock in Figure 3(a) obtained?
  3. In Appendix B, the authors assume their model satisfies the Causal Markov Property. Does this mean the current state is correlated only with the last state? However, the proposed method is lag-dependent, as in Equation 3. Does the proposed model really satisfy the Causal Markov Property?
  4. It would be better for the authors to give a case study of the method and provide a visualization of the learned causal graphs, which would show that their method can indeed extract information from news to help predict stock movements.

Limitations

Yes

Author Response

Thank you very much for your valuable suggestions and your encouraging comments! We will try our best to tackle the concerns.

(1) Are sparseness constraints used in the loss function?

  • Thank you for the question. Yes, we use sparseness constraints in the loss function. These constraints are included in the Evidence Lower Bound (ELBO) term, as shown in Equation 13. The final loss function (Equation 14) includes the ELBO term (Equation 13) and the BCE loss term.

(2) About the correlation calculation in Section 4.4

  • The causal strength graph $\hat{G}$ is designed to evaluate causal strength. It has the same size as the causal graph $G$, with each position being a learnable parameter. After averaging the causal strength graph across all time lags, we obtain a matrix that indicates the magnitude of each company's causal influence on the other companies; Figure 3(a) is an example of such a matrix. The correlation in Section 4.4 and Appendix D is then calculated as Spearman's rank correlation coefficient between this matrix and the market values of the corresponding companies, to explore explainability.
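A minimal sketch of this explainability check follows. It assumes a learned strength tensor of shape (lags × stocks × stocks) whose entry (l, i, j) is the strength of stock i's influence on stock j; the function name and aggregation are illustrative, not the paper's code.

```python
import numpy as np
from scipy.stats import spearmanr

def explainability_correlation(strength: np.ndarray, market_values: np.ndarray):
    # strength: (L, D, D) learned causal-strength tensor (lags x stocks x stocks).
    # market_values: length-D array of company market values.
    influence = strength.mean(axis=0)  # average over time lags -> (D, D)
    # Total outgoing influence of each company on the others (drop self-influence).
    out_strength = influence.sum(axis=1) - np.diag(influence)
    # Spearman's rank correlation between causal strength and market value.
    rho, pval = spearmanr(out_strength, market_values)
    return rho, pval
```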

(3) About the Causal Markov Property

  • The definition of the Causal Markov Property [1] is as follows: given a directed acyclic graph (DAG) $G$ and a joint distribution $p$, the distribution is said to satisfy the Causal Markov Property w.r.t. the DAG $G$ if each variable is independent of its non-descendants given its parents. This is different from the traditional temporal Markov property.

  • Under our setup, all the historical nodes we consider serve as parent nodes of the target nodes. Specifically, in our lag-dependent temporal causal mechanism, even though the causal parents are lag-dependent, once all of them are given, the target nodes are independent of other unmodeled factors. This is a common assumption in the field of causal discovery, and our model satisfies it.

[1] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017.


(4) About the visualization of the learned causal graphs and news cases

  • We have provided a visualization of the learned causal graph and some news cases in Figure 3.
  • For a more detailed illustration, if negative news is detected, then the causal relationships in the model can identify which stocks might be affected and react accordingly by passing the information extracted from the news, which is empowered by the integration of news and causal discovery.

We would appreciate it very much if you could raise the score if the concerns are solved. Thank you!

Review (Rating: 5)

This paper proposes a stock price prediction system based on the history of stock price features and news features, with a model that captures the causal relationships between different stocks throughout a window of time. The model is based on a temporal causal graph that determines whether the features of stock $i$ (including news and price features) at time $t$ can affect stock $j$ at a future time point (e.g., $t+\ell$, $\ell \in [L]$, where $L$ is the maximum time lag). Causal discovery is performed by modeling the posterior of the lag-dependent causal graph given the data, $p(G \mid X_{<T})$; this is done by optimizing a variational inference objective. The features of a stock $X_t^i$ capture a combination of its price $P_t^i$ and news $C_t^i$. The system is evaluated on several benchmarks and compared against previous stock prediction systems.

Strengths

  • The proposed method seems to be a reasonable solution to the problem and incorporates several components that address the multi-faceted nature of such a real-world problem (e.g., temporal causal discovery, encoding different types of features)
  • The reported results show modest but consistent improvement over previous stock prediction systems.
  • There is a nice ablation study on the effect of different components of the system (e.g., w/ vs w/o news, different news encoders, w/ vs w/o lag-dependence in the causal graph).

Weaknesses

  • The procedure for extracting news features is somewhat limited. I.e., it is based on simply prompting an LLM to evaluate a news text across 5 dimensions, obtaining a 5-dimensional feature vector. It may be possible to obtain more information from the news text by more directly accessing an embedding representation, perhaps via a fine-tuned model. However, the ablation results in this paper do demonstrate the usefulness of the news component.
  • Although reasonable, the temporal causal graph model can likely be extended to better capture more complex aspects of the evolution of stock prices. For example, it appears to model different stock edges as independent of each other.

Questions

  • Can you elaborate on the choice of the variational approximating class of distributions and discuss any possible limitations? It seems that this is a product distribution, making all edges independent (though a particular edge is not independent of that edge at a previous point in time). Is that correct? For example, this wouldn't be able to capture phenomena like "sectors" whose stocks are closely tied to each other. It's fine to make such an assumption for tractability, but it warrants discussion.
  • Can the model capture a causal graph that varies with time? I.e., perhaps in 2010 stock X is not really causally connected to stock Y, but after 2020 they become causally connected. E.g., this might capture a company entering a new sector, or starting a partnership with another company (e.g., OpenAI and Microsoft).
  • How are news articles selected over time? For example, what if there are multiple articles for a stock within a time period or there are none?
  • Why do you need variables for both the likelihood of an edge $u_{\ell,ji}$ and the likelihood of no edge $v_{\ell,ji}$? Since this is a binary variable, can't you have a single variable and let $\sigma_{\ell,ji} = \mathrm{sigmoid}(u'_{\ell,ji})$?

A minor thing: the use of the word "remarkably" in a few places in the paper is grammatically incorrect (e.g., "Remarkably, we put the Related work in Appendix E"). You are using it to mean something like "we note that", but "remarkably" means something more like "surprisingly".

Limitations

There is a short section describing limitations & future work in the appendix, but I would encourage the authors to expand on it and incorporate it into the main paper.

Author Response

Thank you very much for your valuable suggestions! We will try our best to tackle the concerns one by one.

(1) About the reason for choosing the Bernoulli distribution

  • Thank you for the question. The essence of a causal relationship is determining whether a change in one variable directly causes a change in another variable. This binary decision can naturally be modeled with the Bernoulli distribution: for each causal link, we need to determine whether it exists, a typical binary problem where the link either exists (represented by 1) or does not (represented by 0). The Bernoulli distribution describes exactly this type of binary random variable, making it well suited to our needs.

(2) About the independence assumption of causal edges

  • While each causal link is not completely independent in reality, in our proposed lag-dependent temporal causal discovery the causal links are conditionally dependent on temporal lags, as shown in Equation 5 of our paper. For different stocks, however, the causal links are indeed assumed to be independent, which reduces computational complexity and makes the model more efficient to train and infer via parallel computation, especially for large-scale stock sets. We also account for dependence among stocks, such as the sector-specific relationships you mention: we include a plug-and-play option in the prior, as shown in Equation 4, allowing the model to incorporate trusted prior knowledge such as industry knowledge or company collaboration information. This flexibility enables the model to capture more complex dependencies when such information is available; a sketch follows below.
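As an illustration only (the blending rule below is our own sketch, not the paper's Equation 4), such a plug-and-play prior could be built by raising the prior edge probability for stock pairs with trusted known relations:

```python
import torch

def edge_prior(knowledge: torch.Tensor, base_p: float = 0.1,
               boost_p: float = 0.7) -> torch.Tensor:
    # knowledge: (D, D) binary matrix marking trusted relations between
    # stocks (e.g., same sector, known partnerships). Values are illustrative.
    return torch.where(
        knowledge.bool(),
        torch.full_like(knowledge, boost_p, dtype=torch.float),
        torch.full_like(knowledge, base_p, dtype=torch.float),
    )
```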

(3) About the time-varying causal graph

  • Thanks for your constructive comment. The ground-truth causal graph of the stock market is unknown, so we need certain theoretical guarantees to approximate the true causal relations. Under the current setup, we can theoretically prove that this time-invariant causal learning mechanism converges to the true causal graph (see Appendix B).

  • The point you raised is highly valuable in practical applications. In the future, we can consider adopting meta-learning or incremental learning training methods to update the causal graph iteratively. This way, the model can continue to learn and update based on new data in practical applications, thereby reflecting the dynamic causal relation changes in the market.


(4) About the varying number of news articles

  • The number and distribution of news articles are related to the dataset. Below are the statistics of our news datasets:
| Dataset | #news Mean | #news Std | #words Mean | #words Std |
| --- | --- | --- | --- | --- |
| ACL18 | 4.14 | 9.92 | 43.59 | 50.28 |
| CMIN_CN | 5.21 | 4.99 | 84.22 | 62.22 |
| CMIN_US | 6.23 | 6.16 | 42.97 | 39.81 |
  • When news data is absent, CausalStock leverages only the price data to construct the causal graph (see Appendix B for details) and subsequently integrates the discovered causal relations and price data for prediction. Even without news data, our model can effectively perform causal inference and make predictions using price data alone.

  • When there are multiple news articles for a particular stock within a given look-back window, we employ LLMs to filter and score the articles, retaining at most the 30 most relevant articles per day to ensure that the model receives focused and informative input. These selected articles are then encoded and fed into the network, maintaining the model's effectiveness and accuracy when handling multiple news sources (sketched below).
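A hypothetical sketch of this filtering step is given below; the prompt wording, the `llm_score` callable, and the 0-10 scoring scale are illustrative assumptions rather than the exact setup.

```python
def filter_news(articles: list[str], llm_score, max_keep: int = 30) -> list[str]:
    # llm_score is a placeholder callable that sends a prompt to an LLM
    # and returns a numeric relevance score.
    scored = []
    for text in articles:
        prompt = ("Rate how relevant this news is to the stock's future price "
                  "on a 0-10 scale. Reply with a number only.\n\nNews: " + text)
        scored.append((llm_score(prompt), text))
    # Keep at most the `max_keep` most relevant articles.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:max_keep]]
```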


(5) About simultaneously modeling the existence and non-existence likelihoods

  • We model the existence and non-existence likelihoods simultaneously, which ensures greater flexibility and avoids constrained optimization. Besides, we also implemented the idea the reviewer mentioned: modeling only the causal-link existence logits and applying the Sigmoid function to obtain the link probability. We compare the two methods below (a code sketch follows the table):
| Method | ACL18 ACC | ACL18 MCC | CMIN_US ACC | CMIN_US MCC | CMIN_CN ACC | CMIN_CN MCC |
| --- | --- | --- | --- | --- | --- | --- |
| only-existence | 58.21 | 0.1652 | 52.32 | 0.0241 | 53.96 | 0.0670 |
| both existence & non-existence | 63.42 | 0.2172 | 54.64 | 0.0481 | 56.19 | 0.1417 |
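For concreteness, a minimal sketch of the two parameterizations being compared (shapes illustrative, not the actual code):

```python
import torch
import torch.nn.functional as F

L, D = 5, 10  # time lags, number of stocks

# (a) Existence-only: one logit per edge, probability via a sigmoid.
u = torch.nn.Parameter(torch.zeros(L, D, D))
p_edge_only = torch.sigmoid(u)

# (b) Existence & non-existence: separate logits for "edge" (u2) and
# "no edge" (v2), normalized against each other with a softmax.
u2 = torch.nn.Parameter(torch.zeros(L, D, D))
v2 = torch.nn.Parameter(torch.zeros(L, D, D))
p_edge_both = F.softmax(torch.stack([u2, v2], dim=-1), dim=-1)[..., 0]
```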

(6) About limitations and future work

Thanks for your constructive suggestions! We have considered them carefully and developed new thoughts on the limitations.

  • This paper explores a method that discovers causal relations based on theoretical considerations. In the future, we could adopt meta-learning or incremental-learning training methods to update the causal graph iteratively, i.e., explore a time-varying causal graph.

  • While the Bernoulli distribution is suitable for determining whether a causal link exists, if we want to further explore the multi-level nature of causal relationships, more complex distributions might be needed. In the future, we could improve the model in this way.


We would appreciate it very much if you could raise the score if the concerns are solved. Thank you!

Comment

Thank you for your response. I appreciate the clarifications about the questions and limitations raised in the review. Although your response clarifies some questions, there are no major additions or changes that resolve the two main weaknesses raised in the review. I have decided to maintain my score.

Comment

Thank you for your reminder. We have further considered the two points mentioned in the weakness part and have added some experiments and analysis as follows.


(1) About the Traditional News Encoder (text embedding) and LLM-based Denoised News Encoder (5-dimensional representation)

In the Ablation Study Section of our manuscript, we have compared the performance of Traditional News Encoders (Glove+Bi-GRU, Bert and Roberta text embeddings) and LLM-based Denoised News Encoders (FinGPT, Llama and GPT 3.5). The results have shown that LLM-based Denoised News Encoders are more effective than Traditional News Encoders.

Thanks for pointing this out! In response to your concern, we refine the experiments further to explore the performance of Traditional News Encoders with fine-tuned models. Specifically, we employ three classes of fine-tuned and pre-trained text embedding models: i) fine-tuning Bert-base-multilingual-cased and Roberta-base during CausalStock training; ii) leveraging two fine-tuned financial text embedding models, FinBert [1] and FinGPT-v3.3; iii) leveraging pre-trained Llama-7b-chat-hf to generate text embeddings.

| Encoder Type | Variant | ACL18 ACC | ACL18 MCC | CMIN-CN ACC | CMIN-CN MCC | CMIN-US ACC | CMIN-US MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Traditional News Encoder | with Glove+Bi-GRU | 60.78 | 0.1952 | 53.87 | 0.0467 | 55.13 | 0.1326 |
| Traditional News Encoder | with Bert (Pre-trained) | 61.74 | 0.2067 | 53.92 | 0.0472 | 55.43 | 0.1352 |
| Traditional News Encoder | with Bert (Fine-tuned) | 61.26 | 0.2033 | 53.43 | 0.0419 | 55.93 | 0.1406 |
| Traditional News Encoder | with Roberta (Pre-trained) | 61.81 | 0.2071 | 54.06 | 0.0477 | 55.58 | 0.1364 |
| Traditional News Encoder | with Roberta (Fine-tuned) | 61.75 | 0.2065 | 54.02 | 0.0474 | 55.63 | 0.1368 |
| Traditional News Encoder | with FinBert (Pre-trained) | 61.72 | 0.2062 | 54.01 | 0.0471 | 55.61 | 0.1362 |
| Traditional News Encoder | with FinGPT (Pre-trained) | 61.69 | 0.2060 | 54.00 | 0.0470 | 55.60 | 0.1360 |
| Traditional News Encoder | with Llama (Pre-trained) | 62.20 | 0.2130 | 54.40 | 0.0480 | 55.85 | 0.1390 |
| Denoised News Encoder | with FinGPT | 61.92 | 0.2105 | 54.30 | 0.0475 | 55.67 | 0.1386 |
| Denoised News Encoder | with Llama | 62.82 | 0.2164 | 54.52 | 0.0483 | 55.97 | 0.1406 |
| | CausalStock (with GPT-3.5) | 63.42 | 0.2172 | 54.64 | 0.0481 | 56.19 | 0.1417 |

It can be seen that the denoised news representation generally outperforms traditional text embeddings. By analyzing some cases, we found that for the news-driven stock movement prediction task, effectively utilizing key information is much more important than retaining the comprehensive content of the news (which contains too much noise); this is why we propose the Denoised News Encoder.

[1] Araci, D. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv preprint arXiv:1908.10063, 2019.


(2) About the stock causal independence assumption.

We appreciate your suggestion to model the dependencies between different stock edges, i.e., to model variable-dependent causal relations. The initial intention of this assumption was to simplify the model's complexity. After carefully considering your suggestion, we find that it is feasible to extend our model along this line.

Modeling Edge Dependencies

Based on the lag-dependent causal mechanism, we propose a variable-dependent causal mechanism that explicitly captures the dependencies among different stock edges. Specifically, the probability of each edge $G_{l,ji}$ is conditioned on the states of all other edges at the same time step $l$, and the conditional function is the same as the one in the lag-dependent mechanism (see Equation 6 for details). Formally, we extend the model to $q_\phi(G_{l,ji} \mid G_{l,\backslash(ji)})$, where $\backslash(ji)$ denotes all edges except the $ji$-th. A sketch is given below.
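A minimal sketch of this extension follows, using an illustrative MLP as the conditioning function (a simplification on our part; in the model, the conditional function matches the one in the lag-dependent mechanism):

```python
import torch

D = 10  # number of stocks

# Maps the current states of all D*D edges at lag l to a new logit per edge.
cond_net = torch.nn.Sequential(
    torch.nn.Linear(D * D, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, D * D),
)

def edge_logits_at_lag(edge_states: torch.Tensor) -> torch.Tensor:
    # edge_states: (D, D) sampled 0/1 states of all edges at lag l.
    # Each output logit depends on the states of the other edges; a faithful
    # implementation would mask out each edge's own state from its input,
    # which is omitted here for brevity.
    return cond_net(edge_states.reshape(-1)).reshape(D, D)
```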

We implemented the aforementioned variable-dependent approach. The results are as follows:

| Ablation Variants | ACL18 ACC | ACL18 MCC | CMIN-CN ACC | CMIN-CN MCC | CMIN-US ACC | CMIN-US MCC |
| --- | --- | --- | --- | --- | --- | --- |
| w/o variable-dependent | 63.42 | 0.2172 | 54.64 | 0.0481 | 56.19 | 0.1417 |
| with variable-dependent | 63.50 | 0.2175 | 54.60 | 0.0479 | 56.25 | 0.1419 |

The results show that incorporating a variable-dependent causal mechanism has the potential to enhance model performance. However, the improvements are not uniform: they vary across datasets and may depend on other factors (such as hyperparameter tuning, dataset size, and stock-set size), so further validation is needed.

Complexity Analysis

While the above results show promising performance for the variable-dependent causal mechanism, it significantly increases computational complexity.

  • The original lag-dependent model has a time complexity of $O(\text{lag} \times D^2)$, where $D$ is the number of stocks.
  • The extended variable-dependent model increases the complexity to $O(\text{lag} \times D^4)$ in order to incorporate the dependency of every link pair.

This complexity scales with the fourth power of the number of stocks, making it challenging to apply the model to markets with large numbers of stocks.

To sum up, we will conduct further research to balance the trade-off between modeling flexibility and computational demands.

Comment

Thank you for the response. I've read your response carefully and updated my review. I encourage the authors to include the discussion on independence assumptions and complexity analysis for future revision.

Comment

We sincerely thank you for your encouraging comments! All the additional experimental results will be included in the revision.

If our response has addressed your concerns, we would greatly appreciate it if you could kindly consider raising the score to support our work. Thank you for your valuable feedback and supportive encouragement!

Author Response

A more detailed model structure.

Final Decision

This study introduces CausalStock, a model designed to predict movements in multiple stocks leveraging news data. A key feature of the model is the Denoised News Encoder, which leverages Large Language Models (LLMs) to process news text and extract important information. Besides, the model incorporates a Lag-dependent temporal causal discovery module that models causal links between different stocks. Experiments show the improvement of the proposed approach over baselines.

During the discussion, reviewers actively interacted with authors and concerns raised by reviewers were properly addressed. I’d encourage authors to incorporate these updates into the next version.

Overall, this submission is technically sound with strong empirical results. I recommend acceptance.