Neural Bayesian Filtering
Neural Bayesian Filtering embeds beliefs and tracks multimodal posteriors by filtering in the embedding space.
Abstract
Reviews and Discussion
The paper proposes particle filtering in an embedded space. Distributions are represented via samples, which serve as input to an embedding function that maps the belief to an embedding vector; this vector conditions a normalizing flow, and resampling can then be done by sampling from the normalizing flow.
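The update loop described above can be sketched as follows. All functions here are hypothetical placeholders standing in for the paper's trained embedding network and conditional normalizing flow, not the authors' implementation:

```python
import numpy as np

def embed(particles, weights):
    # Toy stand-in for the learned embedding network: a weighted
    # mean of pointwise feature maps of the particles.
    feats = np.stack([particles, particles**2], axis=1)
    return np.average(feats, axis=0, weights=weights)

def sample_from_flow(embedding, n):
    # Stand-in for sampling from a normalizing flow conditioned on
    # the belief embedding; here just a Gaussian around the embedded mean.
    return np.random.normal(loc=embedding[0], scale=1.0, size=n)

def nbf_step(embedding, transition, likelihood, obs, n_particles):
    particles = sample_from_flow(embedding, n_particles)  # resample from flow
    particles = transition(particles)                     # simulate forward
    weights = likelihood(obs, particles)                  # reweight by observation
    weights = weights / weights.sum()
    return embed(particles, weights)                      # re-embed the belief
```

The key difference from a standard particle filter is the last and first steps: resampling draws from the flow's continuous density rather than from the empirical particle set.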
Strengths and Weaknesses
The idea of embedding the belief state is interesting. However, I would argue that the contribution is limited, as this embedding just serves as a novel resampling step in the particle filter method.
Further, I believe the paper would benefit from improving the clarity and the notation. I do not think is a good choice to represent the filtering distribution.
Additionally, there seems to be an error in the description of the method. When sampling the transitions from , the updated weights should not be reweighted with .
Questions
Why consider the policy and the system separately? The policy could be integrated and seen as part of the system, as is often done in filtering.
Limitations
Yes.
Final Justification
I thank the authors for their response. However, I still believe that the contribution is limited. Therefore, I keep the original score.
Formatting Concerns
No concerns
Additional Experiments
To help address reviewer feedback, we have conducted additional experiments described below. Due to discussion-phase constraints, we report results for a representative subset of configurations; full results will appear in the final version. We believe these experiments significantly strengthen our empirical evaluation. The rebuttal to Reviewer YTdG follows the description of these experiments.
Additional Experiment: Wall-Clock Time [AE1]
We benchmarked a single update step using time.perf_counter() on a 2019 MacBook Pro (Intel i7). Each experiment ran 5,000 updates on a randomly generated size-8 grid in both 2D and 3D. To reduce the impact of outliers, we removed any measurements below Q1–1.5·IQR or above Q3+1.5·IQR, where IQR = Q3–Q1. Results (in seconds):
| | 8-2D | 8-3D |
|---|---|---|
| PF (32) | 0.001177 ± 0.000150 | 0.001384 ± 0.000303 |
| PF (64) | 0.001947 ± 0.000429 | 0.002098 ± 0.000403 |
| PF (128) | 0.004259 ± 0.000393 | 0.004439 ± 0.000496 |
| NBF (16) | 0.001942 ± 0.000256 | 0.001878 ± 0.000128 |
| NBF (32) | 0.002658 ± 0.000298 | 0.002530 ± 0.000334 |
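The timing protocol described above can be reproduced with a helper along these lines (`step_fn` stands in for one filter update; this is a sketch of the measurement procedure, not the authors' exact script):

```python
import time
import numpy as np

def benchmark(step_fn, n_runs=5000):
    """Time step_fn with time.perf_counter() and drop Tukey outliers,
    i.e. measurements below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    times = np.empty(n_runs)
    for i in range(n_runs):
        t0 = time.perf_counter()
        step_fn()
        times[i] = time.perf_counter() - t0
    q1, q3 = np.percentile(times, [25, 75])
    iqr = q3 - q1
    kept = times[(times >= q1 - 1.5 * iqr) & (times <= q3 + 1.5 * iqr)]
    return kept.mean(), kept.std()
```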
To compare against fine-tuning methods, we also timed gradient updates for the recurrent model used in Section 5. Using a precomputed batch of size 32 (5,000 batches), we measured the average time per update (excluding batch preparation):
| | 8-2D | 8-3D |
|---|---|---|
| Recurrent | 0.008231 ± 0.000087 | 0.024640 ± 0.000429 |
Additional Experiment: Larger Grid without Exact Posteriors [AE2]
In the randomized setting, we evaluate on a size 15 grid with 3 dimensions and 4 obstacles (size 4 cubes). We report the mean JS divergence over 100 runs of 15 filtering steps, using a particle filter with 2,048 particles as a ground-truth proxy for training and evaluation:
| | 15-3D |
|---|---|
| PF (32) | 0.599612 ± 0.006426 |
| PF (64) | 0.567438 ± 0.007595 |
| PF (128) | 0.517764 ± 0.009198 |
| PF (256) | 0.454213 ± 0.010412 |
| NBF (16) | 0.409273 ± 0.005505 |
| NBF (32) | 0.398140 ± 0.006055 |
| NBF (64) | 0.407332 ± 0.007366 |
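For reference, the JS-divergence metric reported above can be computed for discrete beliefs as follows (a generic sketch; the exact implementation in the paper may differ, e.g. in log base or smoothing):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions
    over the same support (natural log, so bounded above by ln 2)."""
    p = np.asarray(p, dtype=float) + eps  # smoothing avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```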
Additional Experiment: Enhanced Recurrent Baseline [AE3]
We increased the Recurrent baseline’s capacity and provided the grid layout as input features in the randomized setting. On size-8 grids with 3 dimensions and two size-3 obstacle cubes, we compare:
- Recurrent (ORIGINAL) from Section 5
- Recurrent (MODIFIED): 4 hidden layers × 128 units, receiving k-hot obstacle encodings
We ran 500 episodes of length 15 under random policies and report the mean JS divergence averaged over each sequence:
| | 8-3D |
|---|---|
| Recurrent (ORIGINAL) | 0.228782 ± 0.009555 |
| Recurrent (MODIFIED) | 0.229193 ± 0.009485 |
| PF (32) | 0.465414 ± 0.006275 |
| PF (64) | 0.404413 ± 0.005083 |
| NBF (16) | 0.181717 ± 0.004085 |
| NBF (32) | 0.150387 ± 0.003521 |
Note: Both recurrent baselines perform well in early timesteps but degrade later, highlighting the challenge of belief modeling without explicit policy information despite increased capacity and additional grid features.
Rebuttal to Reviewer YTdG
We thank the reviewer for the helpful comments regarding clarity and notation. We will address them in the revision. The reviewer is also correct that the updated weights should not be reweighted by .
I would argue that the contribution is limited, as this embedding just serves as a novel resampling step in the particle filter method.
We respectfully disagree that the embedding merely serves as a resampling step for particle filtering. NBF's learned representation has numerous advantages over empirical distributions of particles found in a particle filter. Please see the comment on originality in the response to Reviewer HrMH for more details.
Why consider policy and system? The policy could be integrated and be seen as part of the system, like often done in filtering.
We believe that considering policy and system separately is well-motivated by learning in multi-player games, where the rest of the system is stationary but the policy is non-stationary and changes according to the learning dynamics specified by the algorithm.
I thank the authors for their response. However, I still believe that the contribution is limited. Therefore, I keep the original score.
This paper introduces Neural Bayesian Filtering (NBF), a method for state estimation in partially observed Markov systems with changing environments and controls. NBF builds a latent space over the set of posterior distributions, which is updated when a new observation arrives using a particle-filtering framework in the embedding space. Experiments in two partially observed environments show that it outperforms competing particle filtering and deep learning algorithms.
Strengths and Weaknesses
STRENGTHS
This paper tackles the important problem of modeling complex and multimodal belief states, which is central to many real-world applications involving partial observability. Neural Bayesian Filtering (NBF) offers a novel perspective by combining ideas from particle filtering and deep generative models. This hybrid approach enables the model to track distributions with changing environments and controls that are hard to handle with traditional methods, while aiming to mitigate known issues like particle impoverishment.
The paper is also well written and well structured. The motivation behind NBF is clearly explained, and the technical components are introduced step by step, making the method easy to follow.
Experiments show that the method outperforms particle filtering methods and recurrent models in the proposed experiments, particularly with changing environments and control.
WEAKNESSES
The main concern I have with this paper is how well the proposed approach scales to larger or more complex environments. NBF depends on the assumption that the embedding model is expressive enough to represent every belief state exactly. This is a strong requirement, and it's not clear how realistic it is in practice, especially in higher-dimensional or dynamic settings. The experiments also suggest that NBF’s performance drops as environments get more complex, as in the randomized Gridworld and larger Goofspiel variants (Figures 5 and 6). This points to a limitation in how well the belief approximation holds up under harder conditions. The authors do not test how well this assumption holds in more challenging environments beyond Gridworld and Goofspiel, which to me is a problem that can limit the impact this paper can have in the community.
Also, the belief embeddings themselves are computed using a weighted mean over sample embeddings (Section 3.1), which feels like a simplification. This approach might miss important structure in the distribution, like multimodality or higher-order interactions, which could be especially problematic in more complex scenarios.
Finally, the donut example in Figure 2 is useful for building intuition on the learned densities, but the paper could benefit from a more concrete illustration of the learned embeddings as well, to understand what the model is learning (to simplify the task for visualization purposes you could also fix the donut width).
Questions
Aside from the points raised in the weaknesses section above, I have two other questions
- In the randomized grid and policy experiment, you mention that NBF performs best overall despite using relatively few particles compared to PF. However, for a fair comparison, the number of particles alone isn’t the right metric—it would be more meaningful to compare models under a similar computational budget. In the Discussion, you note that “the additional cost of embedding and generating a much smaller set of particles with our model is insignificant compared to particle simulation costs.”. Could you clarify what number of PF particles would match NBF’s total computational cost? How does PF perform in that case, when compared on equal compute?
- The consistency guarantee of NBF (Theorem 4.1) assumes ϵ-global observation positivity. How generic or realistic is this assumption in practical settings? Could you provide examples or intuitions for environments where it might fail?
Limitations
Yes
Final Justification
The paper presents a well-motivated and clearly explained method that effectively combines particle filtering with deep generative models to model complex belief states, demonstrating strong performance in simpler partially observed environments.
However, as also pointed out by other reviewers, I believe the method’s applicability to more realistic or higher-dimensional environments remains limited, due to the strong reliance on expressive embedding models. These issues restrict the broader impact of the proposed approach.
Formatting Concerns
None
We thank the reviewer for the detailed review and insightful comments. We have added experiments (described in the rebuttal to Reviewer YTdG) to help address the reviewer's feedback and improve the paper.
NBF depends on the assumption that the embedding model is expressive enough to represent every belief state exactly. This is a strong requirement, and it's not clear how realistic it is in practice, especially in higher-dimensional or dynamic settings.
The reviewer is correct that Theorem 1 (NBF consistency) has strong requirements about the embedding model's expressiveness. Model expressiveness is an important factor in NBF's performance, but a perfect model is not strictly necessary to achieve good belief approximations in practice. The Approx Beliefs baseline in the GridWorld and Goofspiel experiments shows that the model has significant approximation error, yet NBF still outputs better approximations than the baseline particle filters, with fewer particles.
The experiments also suggest that NBF’s performance drops as environments get more complex, as in the randomized Gridworld and larger Goofspiel variants (Figures 5 and 6). This points to a limitation in how well the belief approximation holds up under harder conditions.
The reviewer is also correct to point out that NBF's performance drops as the complexity of the task increases. This is expected, as we keep the embedding model complexity and the number of particles consistent across differing sizes of Gridworld and Goofspiel. Standard particle filters are well known to have poor scalability as dimensionality increases (Thrun 2002), and though our experiments are on relatively small domains, we observe this happening. For example, on 8x8 grids, particle filters with more particles than states in the environment's state space fail to achieve belief approximations comparable in quality to NBF with 16 particles. Our experiments are designed to demonstrate that, in some domains, we can bypass these scaling issues given a suitable embedding model.
Also, the belief embeddings themselves are computed using a weighted mean over sample embeddings (Section 3.1), which feels like a simplification. This approach might miss important structure in the distribution, like multimodality or higher-order interactions, which could be especially problematic in more complex scenarios.
By computing the mean over pointwise embeddings, we are effectively computing expectations over feature maps, which in the case of characteristic kernel embeddings is shown to be sufficient to uniquely identify distributions (Song et al., 2009). Our method for embedding beliefs has a finite-dimensional feature space and doesn't carry the same guarantees, but that doesn't mean the embeddings we use cannot parameterize distributions with multi-modality or higher-order interactions. For example, consider a Gridworld with two obstacles in opposite corners. Given a uniform prior on the grid, upon observing a collision with an obstacle, the resulting posterior distribution is bimodal with significant mass concentrated near each obstacle. This and similar situations occur regularly in our Gridworld experiments from Section 5.
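To illustrate that weighted-mean embeddings can separate a multimodal belief from a unimodal one with the same mean, here is a toy sketch. The feature map `phi` is a hypothetical random-Fourier-feature stand-in, not the paper's learned embedding:

```python
import numpy as np

def mean_embedding(samples, weights, feature_map):
    """Weighted mean of pointwise feature maps: an empirical estimate
    of E_p[phi(X)], a finite-dimensional mean embedding of p."""
    feats = np.stack([feature_map(x) for x in samples])
    return np.average(feats, axis=0, weights=weights)

# Hypothetical feature map: random Fourier features of an RBF kernel.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 1))

def phi(x):
    z = W @ np.atleast_1d(x)
    return np.concatenate([np.cos(z), np.sin(z)])
```

A bimodal mixture with modes at ±3 and a standard normal share mean zero, yet their mean embeddings differ, so the embedding retains information about multimodality.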
However, for a fair comparison, the number of particles alone isn’t the right metric—it would be more meaningful to compare models under a similar computational budget.
We agree with the reviewer's concerns about fairness and evaluating methods under a similar computational budget. To address this, we have conducted the additional experiment AE1, measuring wall-clock time for NBF and particle filters of varying sizes. On size-8 grids, NBF with 32 particles is significantly faster than a particle filter with 128 particles.
The consistency guarantee of NBF (Theorem 4.1) assumes ϵ-global observation positivity. How generic or realistic is this assumption in practical settings?
ϵ-global positivity is a simplifying assumption that helps avoid the case in the analysis where all weights in the estimator are equal to zero. It is violated in both Goofspiel and Gridworld because certain observations have zero chance of occurring in some states. For example, in Goofspiel, if the opponent has already played all cards smaller than , there is zero chance that the player can play and observe that they played the higher card. In practice, this is not a problem outside of the unlikely event that all generated particles carry zero weight after a single step. This is an instant form of particle impoverishment, and could happen in some domains, but any particle-based approach that follows the paradigm laid out on lines 173-175 would be similarly affected. We did not observe this happening in our experiments.
Finally, the donut example in Figure 2 is useful for building intuition on the learned densities, but the paper could benefit from a more concrete illustration of the learned embeddings as well, to understand what the model is learning (to simplify the task for visualization purposes you could also fix the donut width).
Could the reviewer please clarify this point? We agree that visualizing what the model is learning would be interesting, but we use an embedding size of eight for the donut example (even though donuts are generated from three parameters in closed form), which makes the embeddings themselves difficult to visualize directly.
After carefully considering the authors' rebuttal to all reviewers and additional experiments, I have decided to increase my score to a borderline accept. The authors have addressed several of my concerns, particularly around computational comparisons and clarification of key assumptions. However, as also pointed out by other reviewers, I still believe the method’s applicability to more realistic or higher-dimensional environments remains limited, due to the strong reliance on expressive embedding models. These issues restrict the broader impact of the proposed approach.
This paper proposes a method for general Bayesian filtering in the setting where the full system (e.g. the transition function and observation function) is known and can be simulated. The main innovation is to pre-train distribution embeddings that can flexibly model posterior distributions. The key features of these embeddings are that one can encode a set of samples into an embedding and can draw samples from the distribution represented by each embedding. Practically, this is implemented using normalising flows.
To use these embeddings in a filtering setting, the authors use a simple update scheme similar to classical particle filters: first a set of particles is sampled from the embedded distribution, then the particles are simulated forward and re-weighted using the observation likelihood before finally being encoded into the updated posterior embedding.
The authors provide a theorem showing that their method is consistent if the embedding is expressive enough. Experimentally, NBF is evaluated on two synthetic environments, highlighting two benefits of the proposed method:
- NBF outperforms classical particle filters which suffer from impoverishment in higher dimensional cases
- NBF can incorporate information about the transition and policy directly, which is not possible for RNN-based approaches that are trained on fixed environments.
Strengths and Weaknesses
Strengths:
- The paper is clearly written and easy to understand.
- The method is intuitive and can be used as a drop-in replacement for any classical particle filter.
- The method effectively addresses the curse of dimensionality for particle filters.
- The limitations of the method are properly discussed in the text.
Weaknesses:
- The experimental evidence is somewhat lacking (see questions).
- The proposed method seems to be a simple improvement over the well-known particle filter with a different re-sampling scheme. While this seems to solve the impoverishment problem, I question the novelty and significance of the work.
The authors discuss the limitations of their method. I feel that the two listed below are significant and can limit the application of the method.
- Training embeddings of requires samples of all the possible posteriors induced by the problem. This limits the applicability of the method in realistic settings where it is intractable to compute these posteriors in the first place.
- It is unclear whether this method can scale to more complex and larger problems than ones investigated in the paper. Related to the previous point, I imagine getting representative samples of posterior distributions can be extremely difficult as the problem becomes more complex.
Disclaimer: Bayesian filtering is not my primary field. While I understand the contributions and the approach, I cannot claim that I have extensive knowledge of the current state-of-the-art. My initial recommendation is borderline reject but I would be open to revise my score if the authors can 1. clarify my questions on the fairness of the experiments and 2. make the case that the results are novel and of significance to the community.
Questions
I list three questions below - one general question on the applicability of the method and two specific questions about the choice of baseline in the experiments.
- Training embeddings: Training the distribution embeddings requires a training set of distributions. I believe this is crucial to the success of the method as it distills knowledge about what the possible posteriors are (otherwise one would not be able to compress generic distributions to a small embedding vector). The expressivity is also a crucial assumption in the theory. In the experiments, how are these posterior distributions collected? Would this be possible in practical situations?
- RNN baseline - available information: The authors argue that standard recurrent architectures cannot take advantage of information about and as they are trained on fixed environments. While this is a sensible claim that is supported by the experiments, it seems somewhat unfair that NBF has access to the underlying environment and can perfectly simulate it, but the RNN needs to infer all of this from context. One potentially fairer baseline could look at what happens if some embeddings of and are provided to the RNN as observations. For example, feeding in a one-hot embedding of where the obstacles are in the randomised grid, or an index of the selected policy in Goofspiel.
- RNN baseline - capacity: Related to the above point, the RNN baseline needs to do much more computation in the randomised setting, as it needs to infer the exact transitions and policies from context. As such, one would expect the required network size to be larger. However, judging from the appendix, in Goofspiel the sizes of the RNN and NBF are equal, and in grid world the size of the RNN is even smaller than NBF's. Can the authors explain the choice of network sizes? I understand that performing an extensive hyperparameter search can be prohibitive, but I do wonder what would happen if the RNN baseline were significantly larger.
Limitations
Yes. The limitations are thoroughly discussed, which is much appreciated.
Final Justification
During the rebuttal period, the authors have provided extra experiments to support their claims. This includes updated RNN baselines that better incorporate the structure of the problem, and new experiments where the belief embeddings are trained on simulated particles. The latter experiment, in particular, relaxes the need to have access to exact posteriors, which significantly broadens the applicability of the proposed method. With this, the authors make a good point: any time one can use a particle filter (offline), the proposed method can improve the online performance. Overall, my main concerns have been mostly addressed by the authors and I will increase my original score to borderline accept.
The reason why I still believe this is borderline is that, as other reviewers have pointed out, it remains unclear whether this approach will work at a larger scale. The present experiments only cover relatively simple tasks.
Formatting Concerns
None
We thank the reviewer for the detailed feedback and will use it to improve the paper. We have added experiments (described in the rebuttal to Reviewer YTdG) to help address the reviewer's comments.
Originality
First, we would like to make some broad clarifications on the originality of the work. NBF doesn't just provide a novel resampling scheme. Aside from the ability to sample from outside the support of the particle filter's empirical distribution, it also provides a continuous representation of the belief state that has numerous advantages over particle filters. For example, we are unaware of methods for evaluating densities of points outside the particle set, but NBF can query the density of any point in the sample space. NBF opens up new avenues for future work that builds on the ability to track belief embeddings over sequences of observations. One example is approximating value functions that map belief states to real numbers. Such value functions are key to many state-of-the-art depth-limited search methods, but it is not clear how to effectively learn them over sets of particles that do not encompass the full support of the belief distribution (Sustr et al, 2021). We would also like to reiterate Reviewer jFmS's comment that "The proposed approach is based on well-known ideas (vector embedding, normalising flow, particle filter), but the combination is new and interesting."
Training embeddings of requires samples of all the possible posteriors induced by the problem.
We don't need samples from all possible posterior distributions to achieve good generalization if there is some learnable underlying structure. For example, the set of donut distributions from Section 3.2 is infinitely large, but given samples from a sufficient number of example donuts, we can learn the structure and generalize to unseen donuts.
It is unclear whether this method can scale to more complex and larger problems than ones investigated in the paper. Related to the previous point, I imagine getting representative samples of posterior distributions can be extremely difficult as the problem becomes more complex
Building a training set has the same requirements as a standard particle filter: the ability to simulate the relevant environment dynamics and policies. We thank the reviewer for this question because this was not clear from the original set of experiments. As a result, we include a new experiment (AE2) to address this. Running a large particle filter offline and training on the approximate belief states results in a model with which NBF outperforms particle filters with an order of magnitude more particles. Collecting posteriors in this manner is more scalable than computing them exactly and still trades off more expensive offline computation for faster updates at test time. NBF with 16 particles significantly outperforms a standard particle filter with 256 particles in this domain, and we expect that this trend would continue to larger grid sizes.
RNN baseline - available information / capacity
We have also included another experiment (AE3) to help address the reviewer's concerns about fairness in comparison to the RNN baseline. In Experiment AE3, we train a larger Recurrent baseline in the 3-dimensional, size eight grid, and provide it with k-hot encoded obstacle locations. Increased capacity and obstacle locations do not help the RNN baseline perform significantly better in the randomized setting. Considering this baseline's good performance in the fixed setting, we suspect that differences in policies significantly impact the belief state, and it is generally not clear how to incorporate policy information into the model.
I feel that my concerns have been adequately addressed. The new experiment of training on particle-filter-generated data is helpful. I had updated the score and filled in the final justification earlier, but I was made aware that these are not visible to the authors - apologies for the delay.
This paper proposes a latent representation of a belief over a state space using a fixed length embedding vector. This embedded belief representation is then used to perform filtering in a particle filter framework. Using the proposed filter consists of two steps: The first is an offline training step, where a function that maps weighted samples to an embedding, as well as a function that samples from the distribution described by the embedding vector are learnt from samples of beliefs. The second step is online filtering, where the embedded belief is updated using a prediction and measurement update step. One filtering iteration starts with generating samples in the state space from the embedded belief, then the samples are propagated through the prediction step and reweighted in the measurement update step, after which the weighted samples are converted back to an embedded belief. The proposed approach is tested in two simple simulated scenarios with discrete state spaces, and outperforms a standard particle filter and an LSTM-based filter.
The main contributions claimed by the paper are the definition of the embedded belief and the way to learn the embedding and sampling networks, as well as the incorporation of the embedded beliefs into a particle-based filtering framework.
Strengths and Weaknesses
Quality:
Given the chosen approach of this paper (to embed posterior beliefs, and to convert between the embedded space and state space using samples), the development of the proposed approach is obvious and convincing. In my opinion, it would have been better to describe the development in Sec. 4 in terms of the general filtering framework of the Bayes filter (see e.g. Chap. 2 in Thrun et al. (2005)), but the given development is clear enough.
However, the argument for choosing the approach presented in this paper, as well as for the way to evaluate the proposed approach, has some weak points. Specifically:
- The proposed approach is presented as an alternative to the particle filter, and much of the motivation for developing the proposed approach is given as addressing two of the shortcomings of the standard particle filter, namely the need for a large number of particles in high-dimensional state spaces and particle impoverishment (see e.g. lines 9, 26-29, 88-95, and the choice of particle filter as baseline in Sec. 5). However, the paper only mentions the "vanilla" (standard) particle filter, and it completely ignores the substantial body of work on the particle filter that aims to address its shortcomings, namely techniques that address the need for a large number of samples in high-dimensional spaces (e.g., the well-known Rao-Blackwellised particle filter by Murphy et al. (2001) and much additional work based on it) and techniques that address particle impoverishment (e.g. Orguner and Gustafsson (2008) and Park et al. (2009)). It is therefore unclear whether the proposed approach constitutes a significant contribution given the current state of the research field.
- The choice of representing all beliefs that are expected to be encountered during filtering by the latent representation is not well motivated in the paper. Learning this representation seems like a daunting task for all but the simplest of scenarios, and the paper provides little indication of how difficult it would be for complex scenarios. In fact, the experiments only include simple scenarios with small, discrete state spaces, and even for these scenarios it seems to be difficult to learn an appropriate latent representation (as shown in Fig. 6). One cannot help but wonder whether the proposed approach has traded difficulty in one aspect of the particle filter for difficulty in another aspect, without necessarily improving the overall approach (except in some specific scenarios).
- The motivation given for not testing in more complex settings is not very convincing (lines 277-280). Instead of restricting the experiments to scenarios that are simple enough for the belief to be calculated analytically, one could e.g. demonstrate how well the proposed approach scales with problem complexity by using more complex scenarios and comparing the results of the proposed filter with those of a particle filter with a large number of particles (to provide an accurate estimate of the true belief, instead of a competing method). This would have been easy to do (i.e., it would not at all be a separate study) and would have contributed much to addressing the concerns about the applicability of the proposed approach.
Clarity:
The paper is well written, clear, and easy to follow. However, it is a bit unclear in a few places, including the following:
- The paper characterises the proposed method as "performing [an] update in the embedding space" (e.g. lines 179-180). This characterisation is a bit misleading, since it seems to imply that prediction and measurement update steps are calculations performed in the embedding space; however, the prediction and measurement update steps are actually performed by the propagation and reweighting of particles in the state space.
- The diagram in Fig. 3 is misleading, since it appears to show that there are multiple iterations performed for the update of the filter between time and . In fact, the proposed filter follows the standard 2-step procedure (with an additional embedding and sampling step): the prediction step followed by the measurement update step, of which the output is fed into the prediction step of the next time step.
- The description of the Goofspiel experiment (Sec. 5.2) is not very clear: It is unclear whether the experiment is performed with a full deck or a reduced deck (as described in lines 253-257), and the representation used (i.e., the definition of the state space) is unclear.
Significance:
The proposed approach is interesting, original, and achieves good results in some simple scenarios. However, due to the lack of proper contextualisation in some regards, the choice of methods to compare against, and the deficiencies in the experiments, there is significant uncertainty about what the impact of the paper will be. Specifically:
- No existing research on the particle filter that aims to address particle impoverishment and the requirement for a large number of particles in high-dimensional spaces is discussed in the paper, and the proposed approach is not evaluated against any of these approaches (see discussion in "Quality" above for specifics).
- There exist other approaches for embedding beliefs (see e.g. Song et al. (2009)), but these are not discussed in the paper, nor is the proposed approach evaluated against them.
- The proposed approach is not evaluated against other neural-based filtering approaches (e.g. that by Sokota et al. (2022)). Without an evaluation, the supposed high computation cost of these methods mentioned in lines 316-318 remains speculation.
- Since the proposed approach is only evaluated in simple scenarios with discrete state spaces, there are questions about its applicability to more complex scenarios, specifically about the difficulty of defining an expressive enough embedding, learning embeddings, and the cost of inference.
Originality:
Although the idea of embedding beliefs is not novel, the way in which the paper does it and incorporates it into a filtering framework seems to be. The proposed approach is based on well-known ideas (vector embedding, normalising flow, particle filter), but the combination is new and interesting.
Other:
- Line 62: In the classical filtering context, "control" is often understood to mean the commanded actions (e.g., the commands given to the actuators of a robot); however, "control" here refers to a policy, which is not necessarily the same as commanded actions. This might cause confusion for readers who are familiar with classical filtering.
- Line 70: To avoid any ambiguity, it should probably be stated that .
- Lines 77-78: is widely understood to mean "the prior distribution over "; to use it to refer to the posterior distribution over is probably not a good idea. In addition, in the context of filtering, there are posterior distributions at each time step, and not indicating the specific posterior distribution with the notation causes ambiguity (see e.g. line 108).
- Line 112: The meaning of is not properly defined.
- Lines 156 and 185: "the Appendix" should be "the appendix".
- In lines 207 and 208, "Approx Beliefs" is described as a filtering algorithm; however, it obviously is not.
- The caption of Fig. 4 mentions shaded areas, but no shaded areas are visible in the plots.
- In lines 233 and 234, it is unclear what "the belief model" refers to.
- Line 270: The meaning of is unclear.
References:
- Murphy, K. and Russell, S., 2001. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Sequential Monte Carlo methods in practice (pp. 499-515). New York, NY: Springer New York.
- Orguner, U. and Gustafsson, F., 2008. Risk-sensitive particle filters for mitigating sample impoverishment. IEEE Transactions on signal processing, 56(10), pp.5001-5012.
- Park, S., Hwang, J.P., Kim, E. and Kang, H.J., 2009. A new evolutionary particle filter for the prevention of sample impoverishment. IEEE Transactions on Evolutionary Computation, 13(4), pp.801-809.
- Thrun, S., Burgard, W. and Fox, D., 2005. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press.
- Song, L., Huang, J., Smola, A. and Fukumizu, K., 2009, June. Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 961-968).
Questions
- How difficult is it expected to be to define an expressive enough embedding and learn embeddings, and how costly is the inference expected to be for more complex scenarios and continuous state spaces?
- Are there any good reasons why the proposed approach is not evaluated against state-of-the-art versions of the particle filter, other neural-based filters, or the proposed belief embedding evaluated against other belief embeddings?
- Is there any good reason why the proposed approach is not evaluated in more complex scenarios?
Limitations
The proposed approach is described quite clearly and fairly. However, the applicability to and limitations in more complex scenarios remain unclear due to the simplicity of the scenarios used in the experiments.
Final Justification
The paper proposes a new and interesting approach to Bayesian filtering by using belief state embeddings. However, as also pointed out by the other reviewers, the main question that remains unanswered is whether an expressive enough belief state embedding can be learnt for complex scenarios. There is therefore much uncertainty about the future impact of this approach.
Formatting Concerns
N/A
We thank the reviewer for their detailed review. The additional references and valuable suggestions about the development will significantly improve the paper. We commit to implementing the clarity suggestions throughout. We have added experiments (described in the rebuttal to Reviewer YTdG) to help address the reviewer's comments.
However, the paper only mentions the "vanilla" (standard) particle filter, and it completely ignores the substantial body of work on the particle filter that aims to address its shortcomings
We thank the reviewer for their suggestions regarding particle filtering variants that help mitigate impoverishment and the need for large numbers of particles in high-dimensional environments. We commit to adding a discussion of the referenced papers, and the tradeoffs they make to mitigate impoverishment, to the related work section of our revision. As mentioned in the response to Reviewer HrMH's originality concerns, the belief embeddings maintained by NBF offer significant advantages over the empirical distributions of particles maintained by particle filters. The comparisons to the standard SIR filter help demonstrate that NBF mitigates impoverishment while also offering a novel representation of beliefs. Furthermore, unlike NBF, the proposed improvements to vanilla particle filters require developing domain-specific operators or exploiting the underlying structure of states in the environment. For example, Rao-Blackwellized particle filters rely on the ability to exploit the underlying structure of the state and analytically marginalize out parts of it, and the evolutionary particle filter (EPF) requires domain-dependent genetic operators.
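For concreteness, the standard SIR baseline referred to above can be sketched as a single prediction/reweight/resample step. This is a minimal illustration, not the paper's implementation; the `transition` and `likelihood` callables are hypothetical placeholders for the environment's models:

```python
import numpy as np

def sir_update(particles, weights, transition, likelihood, observation, rng):
    """One step of a vanilla SIR particle filter with multinomial resampling.
    `transition(x, rng)` samples x_t ~ p(x_t | x_{t-1}); `likelihood(y, x)`
    evaluates p(y_t | x_t). Both are illustrative placeholders."""
    # Prediction: propagate each particle through the transition model.
    particles = np.array([transition(x, rng) for x in particles])
    # Measurement update: reweight particles by the observation likelihood.
    weights = weights * np.array([likelihood(observation, x) for x in particles])
    weights = weights / weights.sum()
    # Resampling: draw a new particle set in proportion to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

With few particles, repeated resampling of this kind is exactly where impoverishment arises: after the `rng.choice` step, duplicated particles collapse the support of the empirical belief, which is the failure mode NBF's generative resampling is designed to avoid.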
There exist other approaches for embedding beliefs (see e.g. Song et al. (2009)), but these are not discussed in the paper
We thank the reviewer for pointing us to Song et al. (2009). Their approach uses a fixed RKHS kernel to embed conditional distributions, which gives the desirable property of embedding uniqueness when a characteristic kernel is used. However, it hinges on choosing a kernel up front, risking a mismatch with the structure of the target set of posteriors. Instead, we learn a compact, finite-dimensional feature map for embedding distributions, together with a generative model, end-to-end. This lets the model discover a latent embedding space suitable for generating samples from the set of posteriors needed by NBF. We will include a discussion of this trade-off in the revision.
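To make the contrast concrete, the fixed-kernel alternative embeds a sample set as an empirical RKHS mean embedding. The sketch below (one-dimensional states, an RBF kernel with a hand-picked bandwidth, both our choices for illustration) shows the quantity that Song et al.'s approach fixes up front, whereas NBF learns its feature map from data:

```python
import numpy as np

def rbf_mean_embedding(samples, eval_points, bandwidth=1.0):
    """Empirical kernel mean embedding mu_P(x) = (1/n) * sum_i k(x_i, x)
    under a fixed RBF kernel, in the spirit of Song et al. (2009).
    The kernel and bandwidth must be chosen before seeing the posteriors."""
    sq_dist = (samples[:, None] - eval_points[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=0)
```

A poorly chosen kernel here (e.g. a bandwidth far from the scale of the posteriors) smears or fragments the embedding, which is the mismatch risk noted above.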
The choice of representing all beliefs that are expected to be encountered during filtering by the latent representation is not well motivated in the paper.
We agree with the reviewer that, for some problems, learning this latent representation for an arbitrary target set of belief distributions may be challenging. However, we demonstrate that when we can learn an embedding model effectively, NBF yields a clear computational benefit during inference and a compelling alternative to methods that ignore potential underlying structure in the belief state. Discerning the limits of where NBF is applicable remains an interesting and open question. We will clarify this in the revised version.
one could e.g. demonstrate how well the proposed approach scales with problem complexity by using more complex scenarios and comparing the results of the proposed filter with that of a particle filter with a large number of particles (to provide an accurate estimate of the true belief, instead of a competing method)
We thank the reviewer for the insightful suggestion to use a large particle filter to approximate true beliefs in more complex settings. We have added a proof-of-concept in Additional Experiment AE2, where we use a large particle filter to generate training data and as an approximation of the ground truth for evaluation. Under these conditions, NBF with 16 particles significantly outperforms a particle filter with 256 particles. These results show that NBF doesn't require samples from all exact posteriors to outperform more computationally expensive particle filters in some domains.
The proposed approach is not evaluated against other neural-based filtering approaches (e.g. that by Sokota et al. (2022)). Without an evaluation, the supposed high computation cost of these methods mentioned in lines 316-318 remains speculation.
Experiment AE1 shows that the cost of performing a gradient update to our model is roughly 1.5x slower than a single step of NBF with 32 particles in Gridworld. Though the cost of an individual step may seem small, Belief Fine-Tuning uses 10,000 fine-tuning steps at every decision point (Sokota et al., 2022). Simpler environments might require significantly fewer gradient updates, but even 1000 or 100 updates add significant overhead relative to a step of NBF. This difference is potentially exacerbated in downstream tasks where beliefs are frequently updated.
how costly is the inference expected to be for more complex scenarios and continuous state spaces
The wall clock time experiments (AE1) also suggest that the cost of inference in NBF can be comparable to that of a particle filter with more particles. This, of course, depends on model size, the number of particles, and the complexity of simulating one step of the environment. Complex environments may require more expressive, slower models, but on the other hand, computation time may also be dominated by particle simulation.
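The AE1 timing protocol (repeated `time.perf_counter()` measurements with Tukey-fence outlier removal, as described above) can be sketched as follows; the timed `step_fn` stands in for one filter update:

```python
import time
import numpy as np

def benchmark(step_fn, n_runs=5000):
    """Time `step_fn` n_runs times, then drop measurements outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR] before summarizing, as in experiment AE1."""
    times = np.empty(n_runs)
    for i in range(n_runs):
        t0 = time.perf_counter()
        step_fn()
        times[i] = time.perf_counter() - t0
    q1, q3 = np.percentile(times, [25, 75])
    iqr = q3 - q1
    kept = times[(times >= q1 - 1.5 * iqr) & (times <= q3 + 1.5 * iqr)]
    return kept.mean(), kept.std()
```

The Tukey fences matter on a laptop-class machine, where background processes produce heavy-tailed timing outliers that would otherwise inflate the mean.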
While we only evaluate our work in discrete settings (outside of the toy donut example), we expect our approach to extend to continuous state spaces, since normalizing flows are naturally suited to continuous data; it is discrete state spaces that require additional techniques such as variational dequantization. Verifying this is an important next step.
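To illustrate why discrete states need this extra step, the simplest such technique is uniform dequantization, of which variational dequantization is a learned refinement (the uniform variant shown here is a standard trick, not the paper's specific recipe): integer states are lifted to a continuous space before flow training, and a floor recovers them exactly.

```python
import numpy as np

def dequantize(x_discrete, rng):
    """Uniform dequantization: add u ~ Uniform[0, 1) noise to integer data so a
    continuous-density model such as a normalizing flow can be trained on it.
    Variational dequantization instead learns the noise distribution q(u | x)."""
    return x_discrete.astype(np.float64) + rng.uniform(0.0, 1.0, size=x_discrete.shape)

def quantize(x_continuous):
    """Inverse map back to discrete states: flooring recovers the integers."""
    return np.floor(x_continuous).astype(np.int64)
```

Without some form of dequantization, a continuous flow trained directly on integer-valued states can place unbounded density on the discrete points, which is why the discrete experiments require this machinery while continuous state spaces would not.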
Thank you to the authors for their response. I have carefully read all the reviews and the authors' responses.
The authors argue that, unlike NBF, improvements to the vanilla PF require domain-specific knowledge and techniques. However, I would argue that the online phase of NBF does require domain-specific knowledge, which it acquires in the offline training phase when it learns the domain-specific belief state embeddings. I do acknowledge that acquiring this domain-specific knowledge by learning from data could be an advantage over having it supplied by a human designer, if the belief state embeddings can be tractably and reliably learnt. However, as things stand, it is not clear to me that this is the case.
I agree with the authors that if sufficiently expressive belief state embeddings can be learnt, then the NBF should outperform the vanilla PF, as well as address the shortcomings of the PF. The additional experiment AE2 does go some way towards demonstrating that the NBF can handle more complex scenarios; however, the question about whether expressive enough embeddings can be learnt still lingers, and it would not be answered before the NBF is demonstrated for much more complex scenarios such as continuous state spaces.
Lastly, I find the direct comparison of number of particles between the NBF and the PF a bit disingenuous, since the particles play different roles in the two filters:
- In the PF, the particles represent the belief distribution.
- In the NBF, the particles select the appropriate belief state embedding.
It is therefore fully expected that the NBF would require fewer particles than the PF for similar performance.
In conclusion, I appreciate the authors' efforts to run additional experiments and respond to my review in detail. My view of the manuscript has improved somewhat; however, it does not warrant an increase of the score to the next level.
We thank the reviewer for another thorough response. We fully agree that, in NBF’s offline training phase, we learn and encode domain-specific structure into our belief embeddings. This is exactly the goal of the belief embedding model, and it seems that we agree that if this model can be tractably and reliably learned, the potential advantages of NBF are clear.
We also agree with the reviewer that the roles of particles differ fundamentally between the two methods. In a standard particle filter, particles serve only to approximate the belief via weighted samples, with no alternative mechanism for representation. NBF's learned embeddings summarize beliefs and permit a substantially smaller online particle budget that mitigates both impoverishment and scaling issues with dimensionality. The wall clock time experiments we have added during this discussion phase demonstrate that, despite NBF's modest computational overhead, using a much smaller particle set with NBF can lead to a significant speedup. We will ensure these distinctions are articulated more explicitly in the revision.
The reviewers agree that the paper presents an interesting and technically solid idea—embedding belief states for Bayesian filtering—but its broader impact remains uncertain. Reviewers praise the clarity of writing, the intuitive formulation, and the novelty of combining embeddings, particle filters, and normalizing flows. They also recognize that NBF shows promising empirical results, outperforming standard particle filters and RNN baselines in simple settings. However, a consistent concern across reviews is that the experimental evaluation is too limited: the method is tested only on small, discrete domains, leaving open the question of whether it scales to high-dimensional or continuous state spaces. Reviewers also questioned the fairness of baselines, the limited discussion of related work, and whether the claimed benefits are truly novel compared to established particle filter variants. The authors’ rebuttal addresses several concerns with additional experiments and clarifications, thereby improving the confidence of some reviewers; however, doubts remain about the scalability and significance. Overall, while the paper is technically sound and introduces an interesting perspective, its contribution is judged as borderline due to limited empirical scope and unresolved questions about general applicability. Therefore, I’ll recommend rejecting the paper.