Conjugate Bayesian Two-step Change Point Detection for Hawkes Process
This work employs data augmentation to propose a conjugate Bayesian two-step change point detection method for the Hawkes process.
Abstract
Reviews and Discussion
The paper proposes a new Bayesian inference method for change point detection in Hawkes Processes using conditionally conjugate priors and Gibbs Sampling. In their experiments, their new method turns out to perform better than competing methods both in terms of accuracy and speed.
Strengths
The authors do a good job in motivating the relevance of the model and the need for improved (Bayesian) inference methods. The benchmarks appear to be sound and show clear benefits of the new method compared to alternatives. Obtaining conditionally conjugate priors for such a model is not an easy task and I think the authors do a good job in communicating their theoretical results both in the main text and in the appendix (where they provide the proofs).
Weaknesses
- The synthetic data experiments seem to be relatively limited, showing only the results of a single process with 2 change points. I think that a bigger simulation study with more variation in different aspects (beyond what the stress test study did) would have strengthened the paper.
- The stress test in the ablation study seems to vary the difficulty of the problem, which makes sense. However, the authors only show the results of their own method there, although I see no reason to not also show the results of the other methods for these cases.
Questions
- In the synthetic data experiments, I wonder whether the data simulation favors your method over the other methods. Asked differently, do your method and the other methods differ only in the inference algorithm applied, or also in the specific model they assume? In the latter case, simulating data from the same model that is assumed by your method (but not by the others) could potentially bias the results. I would like to understand this point better.
- In terms of innovations of the conditional conjugate approach, what are the concrete innovations over the approach proposed in [30] as cited in your paper?
- You only investigate the use of 1 to 3 basis functions. As someone experienced with splines, this seems relatively little. Can you elaborate why you only need few basis functions in most real world scenarios?
- Is your method implemented somewhere in a user friendly manner such that people can readily apply it to their own data?
Limitations
As a non-expert on these models, I wonder whether your new method also has limitations for univariate Hawkes processes. You only mention limitations for potential multivariate extensions.
Q: The stress test in the ablation study seems to vary the difficulty of the problem, which makes sense. However, the authors only show the results of their own method there, although I see no reason to not also show the results of the other methods for these cases.
A: Thank you for your valuable suggestion. Due to the page limit, we only presented the stress test results for our own method in the paper. In the rebuttal PDF, we have included results for the other baseline methods as well. Our method still shows significant advantages; please see Tables 1, 2, and 3 in the rebuttal PDF. Thank you once again for your suggestion. We will add these new results to the camera-ready version.
Q: In the synthetic datasets experiments, I wonder whether ...... could potentially bias the results. I would like to understand this point better.
A: Thank you for asking. The data simulation method could indeed favor our model over the others, because our model assumes a nonlinear Hawkes process while the SMCPD and SVCPD models assume a linear Hawkes process. To address this potential bias, we modified the SVCPD model to also assume a nonlinear Hawkes process, resulting in the SVCPD+Inhi model. Our model still outperformed all of these models, demonstrating the effectiveness and robustness of our approach. Thank you for highlighting this important point.
Q: In terms of innovations of the conditional conjugate approach, what are the concrete innovations over the approach proposed in [30] as cited in your paper?
A: [30] used conditional conjugate methods for parameter estimation without change points, but we made an extension to this method, using conditional conjugate methods for change point detection. This is the main difference between our work and [30].
Q: You only investigate the use of 1 to 3 basis functions. As someone experienced with splines, this seems relatively little. Can you elaborate why you only need few basis functions in most real world scenarios?
A: We verified this through experiments. In fact, we tried 1-5 basis functions. However, due to page limits, we only showed the results for 1-3. We found that having too many basis functions might lead to overfitting, so using fewer basis functions might be more effective.
Q: Is your method implemented somewhere in a user friendly manner such that people can readily apply it to their own data?
A: The code can be run with a single Python file, and anyone can directly call this interface. Once everything is finalized, our code will be made publicly available on GitHub, making it easy for everyone to use on their own data. Thank you for your interest.
Q: As a non-expert on these models, I wonder whether your new method also has limitations for univariate Hawkes processes. You only mention limitations for potential multivariate extensions.
A: For the univariate Hawkes process, these limitations do not exist.
Thank you. I will keep my (positive) score.
Thank you for your positive feedback on our work. We truly appreciate your recognition and support!
The paper aims to detect change points (in terms of model parameters) in point processes and proposes a conjugate Bayesian two-step change point detection method for Hawkes processes. This is achieved by applying data augmentation and a novel Gibbs sampler for closed-form updates for model parameters. For both synthetic and real-world data, the proposed method demonstrates superior performance in accuracy and efficiency compared to existing baselines.
Strengths
- The paper is very well-written in its structure, notation, explanation, and discussion. This work is put in the context of literature and is self-contained. All symbols are defined before using them.
- The main paper includes all important pieces with great clarity and the necessary details while remaining concise.
- Experimental results demonstrate significant improvements from baselines (especially on real-world datasets). The settings are clearly described for readers to interpret the results.
Weaknesses
The results for synthetic data are not as strong as those for real-world data, please see my question below.
Questions
Could you help explain the statistical significance of results for synthetic data, e.g., how many runs to average the results for the std.? It seems to me that 1 std. is relatively large compared to the average, and there are large overlaps of 1 std. between different models.
Limitations
Limitations are discussed in the paper.
Q: Could you help explain the statistical significance of results for synthetic data, e.g., how many runs to average the results for the std.? It seems to me that 1 std. is relatively large compared to the average, and there are large overlaps of 1 std. between different models.
A: Thanks for your question; this is a very good point. We averaged the results over 5 runs to obtain the final results. The randomness in our experiments comes from the sampling of model parameters from the corresponding parameter posterior and the sampling of the next point given the sampled parameters. The interval of our model takes all this randomness into account, which is why the experimental variance is a bit large.
Thank you for the response. I'll keep my positive score.
Thank you for your positive feedback on our work. We truly appreciate your recognition and support!
This paper considers the Bayesian two-step change point detection model for Hawkes process. Through data augmentation techniques such as the use of Polya-Gamma random variables and the marked Poisson process, conditional conjugacy is achieved and an efficient Gibbs sampler can be designed for posterior computation in the first step. Extensive numerical experiments are provided on both synthetic and real data.
Strengths
- The paper is well-written and organized.
- Sufficient background is provided, and clear motivation is discussed.
- The targeted problem, i.e. posterior computation for Bayesian change point detection problem, is an interesting, important, and challenging problem.
- The proposed method is concise, and very applicable in real problems.
- Extensive numerical experiments and analyses are conducted.
Weaknesses
The method is only Bayesian in the first step, i.e., the step of obtaining the posterior predictive distribution of the next event occurrence t_{m + 1}. The determination of a change point is based purely on whether the next event occurrence lies within the posterior predictive credible interval, which no longer takes uncertainty into account at this step. It would be better and more intuitive if a jointly Bayesian model could be used; of course, this is also more computationally challenging.
Questions
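The second-step decision rule described in this comment can be illustrated with a minimal sketch. It assumes that posterior predictive draws of the next event time are already available; the names `credible_interval` and `is_change_point`, the 95% level, and the exponential stand-in draws are hypothetical and are not taken from the paper.

```python
import random


def credible_interval(samples, level=0.95):
    """Equal-tailed interval computed from sorted predictive samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return lower, upper


def is_change_point(observed_time, predictive_samples, level=0.95):
    """Flag a change point when the observed event falls outside the interval."""
    lower, upper = credible_interval(predictive_samples, level)
    return not (lower <= observed_time <= upper)


# Stand-in predictive draws (assumption): last event time plus an exponential gap.
rng = random.Random(0)
last_event = 10.0
draws = [last_event + rng.expovariate(2.0) for _ in range(2000)]

typical_flag = is_change_point(10.3, draws)   # event close to the predicted range
extreme_flag = is_change_point(25.0, draws)   # event far beyond every draw
print(typical_flag, extreme_flag)
```

The rule is a hard threshold on a single interval, which is the point of the comment: the yes/no decision itself carries no posterior uncertainty.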
- What are the benefits of using beta density as the basis functions for \phi, compared to using say functions based on differently scaled RBF or Matern kernels?
- Just like the sigmoid function can be polya-gamma augmented, the probit function also has its own data augmentation techniques. Can some similar Gibbs sampler be obtained if we use the probit function as \sigma?
- How well does the data-augmented Gibbs sampler mix?
Limitations
See weaknesses and questions.
Q: The method is only Bayesian in the first step, ...... more computationally challenging.
A: Thank you for your valuable suggestion. As you mentioned, we currently use Bayesian inference only in the first step. In future work, we will consider a fully Bayesian approach.
Q: What are the benefits of using beta density as the basis functions for \phi, compared to using say functions based on differently scaled RBF or Matern kernels?
A: This is a good question. We experimented with beta densities as the basis functions for \phi because we followed the convention in the previous work [30]. As stated in [30], "Although basis functions can be in any form, to make the weights indicative of functional connection strength, basis functions are chosen to be probability densities with support ". Furthermore, they restricted the support to to accelerate the computation of . Therefore, beta density is a natural choice.
Q: Just like the sigmoid function can be polya-gamma augmented, the probit function also has its own data augmentation techniques. Can some similar Gibbs sampler be obtained if we use the probit function as \sigma?
A: This is an interesting question. The probit function also has its own data augmentation techniques. Theoretically, a similar Gibbs sampler can also be obtained if we use the probit function to replace \sigma. However, we have not tried this method.
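To make the role of beta-density basis functions concrete, here is a minimal sketch of an influence function built as a weighted sum of scaled beta densities. The bounded support (0, support_length), the shape parameters, and the weights (including a negative one standing in for inhibition) are all illustrative assumptions, not values from the paper or from [30].

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; zero outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def influence(t, weights, shapes, support_length):
    """phi(t) = sum_b w_b * Beta(t / T; a_b, b_b) / T, zero outside (0, T)."""
    x = t / support_length
    return sum(w * beta_pdf(x, a, b) / support_length
               for w, (a, b) in zip(weights, shapes))


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule quadrature, accurate enough for these smooth densities."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Hypothetical configuration with three basis functions.
shapes = [(2.0, 5.0), (5.0, 5.0), (5.0, 2.0)]
weights = [0.8, -0.3, 0.5]
support_length = 6.0

total = integrate(lambda t: influence(t, weights, shapes, support_length),
                  0.0, support_length)
print(total, sum(weights))
```

Because each basis function integrates to one, the integral of the influence function equals the sum of the weights, which is what makes the weights readable as connection strengths.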
Q: How well does the data-augmented Gibbs sampler mix?
A: Thank you for your question. In our experiments, we set the burn-in to 90 and treated the samples from the 91st onward as samples from the stationary distribution. We found that the data-augmented Gibbs sampler mixed very well.
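One common way to check mixing is to discard the burn-in and look at how quickly the sample autocorrelation decays. The sketch below does this for a toy AR(1) chain that stands in for a sampler trace; only the burn-in of 90 comes from the reply above, while the chain, its coefficient, and the function names are assumptions.

```python
import random


def discard_burn_in(chain, burn_in=90):
    """Keep the draws from position burn_in + 1 onward."""
    return chain[burn_in:]


def autocorrelation(xs, lag):
    """Sample autocorrelation of a scalar trace at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Toy trace (assumption): AR(1) chain started far from its stationary mean of 0.
rng = random.Random(1)
chain = [5.0]
for _ in range(1089):
    chain.append(0.3 * chain[-1] + rng.gauss(0.0, 1.0))

kept = discard_burn_in(chain, burn_in=90)
for lag in (1, 5, 10):
    print(lag, round(autocorrelation(kept, lag), 3))
```

Autocorrelations that fall toward zero within a few lags suggest good mixing; slowly decaying values would indicate that more iterations or thinning are needed.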
Thank you to the authors for the detailed reply. My concerns and questions have been partly addressed. I am keeping my rating as is.
Thank you for your positive feedback on our work. We truly appreciate your recognition and support!
This paper proposes a conjugate Bayesian two-step change point detection method for the Hawkes process using data augmentation. It addresses the computational inefficiency of existing methods by providing analytical expressions. The new method proves to be more accurate and efficient, as demonstrated by extensive experiments on both synthetic and real data.
Strengths
This paper proposes a novel method that ensures the posterior distribution of the bounded historical period Hawkes process is conjugate. Innovatively, it employs Pólya-Gamma variables and marked Poisson processes. This innovative approach is highly useful, extending beyond change point detection to potentially a wide range of applications for the Hawkes model.
Empirical results show the proposed method outperforms counterparts.
Weaknesses
The implementation details have not been adequately discussed and presented. While some figures and details are provided in the appendix, I suggest moving the important ones to the main content and discussing them in detail. Additionally, I recommend including more background information on Pólya-Gamma variables and marked Poisson processes, as these concepts may not be familiar to readers outside this domain.
Questions
Gibbs sampling is still used; is that computationally expensive? Could you please discuss the computational complexity of your algorithm?
Limitations
Yes, discussed in a separate section.
Q: The implementation details have not been adequately discussed ...... as these concepts may not be familiar to readers outside this domain.
A: Thank you for your suggestion. Due to the page limit, some content had to be placed in the appendix. We appreciate your feedback and will consider rearranging the content more sensibly in the camera-ready version. Additionally, we will include more background information on Pólya-Gamma variables and marked Poisson processes to help readers who may not be familiar with these concepts.
Q: Gibbs sampling is still used; is that computationally expensive?
A: Thanks to the data augmentation technique, every conditional update in the data-augmented Gibbs sampler has a closed-form expression, so it is not expensive to use.
Q: Could you please discuss the computational complexity of your algorithm?
A: Indeed, we have already done this. The discussion on computational complexity is provided in detail in Section 3.3.5, ‘Algorithm, Hyperparameters, and Complexity,’ on page 6, line 215 of the main content.
Thanks for the response. Although the augmented Gibbs sampler has analytical expressions, does it still need to iteratively sample each variable conditional on the others? Does it still need to repeat these sampling steps many times until convergence? Does it have to be computed sequentially?
Thanks for your reply. Your understanding is right. Gibbs sampling needs to iteratively sample each variable conditional on the others, and it needs to repeat these sampling steps multiple times until convergence.
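The sequential structure discussed in this exchange can be seen in a textbook Gibbs sampler for a standard bivariate normal with correlation rho. This is a generic illustration, not the paper's sampler: the target distribution, the function names, and the default burn-in are assumptions chosen because both conditionals are known in closed form.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=90, seed=0):
    """Alternate closed-form conditional draws x | y and y | x."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for iteration in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # y | x ~ N(rho * x, 1 - rho^2)
        if iteration >= burn_in:
            samples.append((x, y))
    return samples


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    var_x = sum((p[0] - mean_x) ** 2 for p in pairs)
    var_y = sum((p[1] - mean_y) ** 2 for p in pairs)
    return cov / math.sqrt(var_x * var_y)


samples = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
print(round(correlation(samples), 3))
```

Each update is cheap because it is a single closed-form draw, but the loop is inherently sequential: every draw conditions on the value produced by the previous one.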
We thank all reviewers for their efforts in providing insightful comments and constructive feedback. We are pleased that the reviewers have recognized the significance of our paper in solving an interesting change point detection problem in Hawkes process [R1, R2, R3, R4], conducting comprehensive numerical experiments [R1, R2, R3, R4], and maintaining clear and concise writing [R1, R2, R3]. In the following, we address reviewers’ comments point by point.
This paper uses data augmentation to design a conjugate Bayesian two-step change point detection algorithm for the Hawkes process. This is more accurate than existing methods that rely on non-conjugate inference. The reviewers agree that this is well-written and clearly motivated, the problem considered is important, and the experiments are extensive. The reviewers have asked many questions which the authors need to address in the revised manuscript.