Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building
We propose a framework for map-building based on fragmentation and recall inspired by the remapping property of grid cells in the entorhinal cortex.
Abstract
Reviews and Discussion
This submission proposes a new method for mapping large and complex environments, inspired by rodent neurophysiology. The basic idea is that the brain represents maps as segmented components. The authors develop algorithms based on this intuition and test them in simulated environments. The authors compare the performance of their method to a method called Frontier (Yamauchi, 1997) and its variants. They report that their method has certain advantages in most of the scenarios tested.
Overall, I found this to be a nice paper. It is well written and the ideas are clearly explained. My main concern is about the benchmarking; that is, I am not sure whether the authors have compared against the state-of-the-art methods in the field that are relevant to this problem. The method in Yamauchi (1997), while popular, was proposed more than 25 years ago after all.
Strengths
- the paper is well written.
- using neurophysiological knowledge of the rodent hippocampus to inform the design of spatial navigation system is interesting
- the results seem to be promising
Weaknesses
- the improvement over the alternative methods seems to shrink in the more real-world-like applications. Can the authors comment on or provide an interpretation of this?
- Can the authors justify why the Frontier method by Yamauchi (1997) would be the most appropriate benchmark to have? It would be nice if it’s possible to include some other more recent methods.
Questions
- Please see the questions in the previous section.
- In addition, I am not sure if “grid cell-inspired” in the title is entirely appropriate given that the authors do not use grid cells in their model and the idea of representing space in segments seems to hold more generally for the hippocampal-parahippocampal representation in rodents.
Thank you for your insightful and constructive feedback on our manuscript. We greatly appreciate the time and effort you have dedicated to reviewing our work. We will revise the manuscript, taking into account each of your valuable comments. We would like to emphasize that the core contribution of this research is establishing a novel connection between neuroscience and SLAM, rather than proposing a state-of-the-art SLAM algorithm.
Re: the improvement over the alternative methods seems to shrink in the more real-world-like applications. Can the authors comment on or provide an interpretation of this?
In robot simulations, there is significant noise from the LIDAR sensor, path planning, actuation, and localization of the current position. These factors lead to increased memory usage in both the Frontier and FARMap methods. Consequently, the performance gap between these two methods is smaller than in the proposed environments. Additionally, we measured the memory usage of FARMap based on its peak usage during navigation. While the maximum usage of FARMap is similar to its counterpart in Environment 1 and in American settings, the overall usage is consistently lower. For instance, in AWS, FARMap explores 49% more of the environment while using only 37% more memory.
Re: Can the authors justify why the Frontier method by Yamauchi (1997) would be the most appropriate benchmark to have?
We selected the Frontier method (Yamauchi, 1997) as our baseline because it is a fundamental technique widely used in contemporary methods (Kulkarni et al., 2022) and is often employed as a benchmark (Chaplot et al., 2020). Its widespread adoption positions the Frontier method as an ideal candidate for integration with FARMap. This integration facilitates the possibility of employing various methods for local subgoal generation, thereby enhancing the versatility and applicability of FARMap.
Re: I am not sure if “grid cell-inspired” in the title is entirely appropriate given that the authors do not use grid cells in their model and the idea of representing space in segments seems to hold more generally for the hippocampal-parahippocampal representation in rodents.
As the reviewer pointed out, the concept of remapping is indeed associated with place cells in the hippocampus, not exclusively with grid cells (as noted in the second paragraph of our Introduction). However, our approach employs surprisal changes as a criterion for fragmentation, which correlates with the locations of actual grid cell remapping, as shown in existing studies (Klukas et al., 2021). Therefore, we titled our paper “grid cell-inspired”. We will change the paper title in light of your comment if the paper is accepted.
I would like to thank the authors for their response to my concerns and comments. Their response clarified the questions I had.
We are truly glad to confirm that every concern raised by the reviewer has been successfully addressed. Please let us know if you have any further questions or comments.
The paper proposes a method for mapping large spaces based on the concept of fragmentation and recall, where an agent builds local maps based on a clustering based on "surprise" and decides the next local map to explore. When a new local map is created, the previous local map is stored in a long-term memory and if the observation matches a previous local map, that local map is recalled. Experiments are performed in simulation, and compared with a classic frontier-based approach, as well as with a pre-trained neural SLAM.
Strengths
- the paper overall presents a technically sound method that is able to achieve exploration of unknown environments.
- the paper provides an interesting grounding of the proposed method with neuroscience, in proposing fragmentation and recall.
- the paper is overall clear, with a logical structure in presenting the different components of the proposed method.
Weaknesses
- while it is interesting to see the grounding of the proposed method in neuroscience, some of the general ideas are already present in other methods for exploration, in particular, reasoning topologically is captured by methods that use the generalized Voronoi graph or semantic maps to guide the exploration, and the long-term storage through pose graphs in SLAM, where loop closure is applied (discussed in graph-based SLAM appendix section), or curiosity-driven exploration. The paper should discuss the proposed method with respect to such methods.
- the paper's comparison is limited in considering only the standard frontier-based exploration, when in fact there are a number of exploration methods showing better performance than the standard one, both in terms of exploration, as well as planning time. Some examples both classic and learning based include:
Cao, C., Zhu, H., Choset, H., & Zhang, J. (2021, July). TARE: A Hierarchical Framework for Efficiently Exploring Complex 3D Environments. In Robotics: Science and Systems (Vol. 5).
Lindqvist, B., Agha-Mohammadi, A. A., & Nikolakopoulos, G. (2021, September). Exploration-RRT: A multi-objective path planning and exploration framework for unknown and unstructured environments. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 3429-3435). IEEE.
Shrestha, R., Tian, F. P., Feng, W., Tan, P., & Vaughan, R. (2019, May). Learned map prediction for enhanced mobile robot exploration. In 2019 International Conference on Robotics and Automation (ICRA) (pp. 1197-1204). IEEE.
Caley, J. A., Lawrance, N. R., & Hollinger, G. A. (2019). Deep learning of structured environments for robot search. Autonomous Robots, 43, 1695-1714.
- the gain in memory appears to be a major component of the proposed method, however, overall, the trend seems to be fairly close to the frontier-based approach and somewhat surprising given the use of local maps. In fact, for the realistic experiments, in AWS office, memory appears better for Frontier. The size of each local map might depend on the complexity of the environment, but it is worth discussing what affects the determination of the local map in practice.
A couple of minor presentation comments:
- to be more precise in assumptions and corresponding presentation of functions, it is worth mentioning that the robot is non-omnidirectional, as otherwise the indicator function for whether the frontier edge is spatially behind the agent wouldn't apply. In addition, for that function there would be a threshold to determine what "behind" means, with respect to the orientation of the robot.
- usually white pixels are used for free space, instead of black.
- "FRAGMENTAION" -> "FRAGMENTATION"
- "that work did not seriously explore" -> "that work did not explore in-depth"
- instead of calling "wall-clock time" it is better to characterize it with "planning time"
Questions
- please comment on how the memory usage changes with the environment complexity.
Thank you for your insightful and constructive feedback on our manuscript. We greatly appreciate the time and effort you have dedicated to reviewing our work. We will revise the manuscript, taking into account each of your valuable comments. We would like to emphasize that the core contribution of this research is establishing a novel connection between neuroscience and SLAM, rather than proposing a state-of-the-art SLAM algorithm.
Re: Some of the general ideas are already present in other methods for exploration. The paper should discuss the proposed method with respect to such methods.
We appreciate the reviewer's acknowledgment of our discussion on graph-based SLAM in Appendix A. The topological graph in FARMap and the Generalized Voronoi Graph (GVG)-based approaches are designed for different objectives. GVG-based methods primarily aim for rapid path planning through pre-computed paths and loop closure. In contrast, FARMap's topological graph is designed to surmount the limitations of local maps that cannot refer to other submaps. However, this does not mean that both methods cannot be used together. Rather, there is a potential synergy to use both as we mentioned in Appendix A.1.
Regarding curiosity-driven exploration [1,2], these methods employ future prediction error or surprisal as intrinsic rewards to address environments with sparse rewards. Conversely, FARMap utilizes surprisal for segmenting space in accordance with areas where grid cell remapping occurs.
[1] Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." ICML, 2017.
[2] Burda, Yuri, et al. "Exploration by random network distillation." ICLR, 2019.
Re: the paper's comparison is limited in considering only the standard frontier-based exploration.
Thank you for this suggestion. FARMap is designed to be versatile and can integrate with existing exploration methods. Exploration-RRT, which is a kind of multi-goal RRT, can work as a subgoal generator and path planner in FARMap. Additionally, Caley et al. and Shrestha et al. predict goal locations from the predicted map structure of indoor environments, so their methods can easily be adapted as local subgoal generators. On the other hand, TARE divides the space outside the current local planning horizon into multiple cuboid subspaces, each labeled as explored, exploring, or unexplored. In this case, FARMap can be combined with TARE in two ways: 1) employing only local path planning for the exploration strategy within local maps, and 2) combining both local and global path planning strategies in local maps, using larger fragments.
We also wish to highlight that we have conducted comparisons with learning-based exploration methods, such as Neural SLAM, in Section 5.3, and with curiosity-driven exploration methods, like RND, in Appendix J.
Re: for the realistic experiments, in AWS office, memory appears better for Frontier.
We acknowledge that in the AWS office scenario, FARMap shows higher memory usage than Frontier, in contrast to our proposed environments, where there is a significant gap in memory usage between FARMap and Frontier even though FARMap achieves better map coverage. However, it is important to consider that coverage directly impacts memory usage. In AWS, FARMap explores 49% more of the environment while using only 37% more memory, showcasing its efficiency in broader exploration.
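As a back-of-the-envelope check on the two percentages quoted above, the coverage obtained per unit of memory can be compared directly. The function name and the choice of "coverage per unit of memory" as an efficiency measure are illustrative assumptions, not metrics defined in the paper.

```python
def relative_coverage_per_memory(coverage_gain, memory_gain):
    """Ratio of coverage-per-memory between a method and its baseline.

    Both arguments are fractional increases over the baseline
    (0.49 means 49% more). A result above 1.0 means the method
    covers more area per unit of memory than the baseline.
    """
    return (1.0 + coverage_gain) / (1.0 + memory_gain)


# Figures quoted for the AWS office scenario: 49% more coverage, 37% more memory.
ratio = relative_coverage_per_memory(0.49, 0.37)
print(f"{ratio:.3f}")
```

Under this measure the ratio is 1.49 / 1.37, roughly 1.09, so the quoted numbers correspond to about 9% more coverage per unit of memory.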
Re: The size of each local map might depend on the complexity of the environment, but it is worth discussing what affects the determination of the local map in practice.
The size of local maps in FARMap is influenced by two key factors: the environmental complexity and the fragmentation threshold. In more complex environments, surprisal tends to be higher than in simpler environments, as the complexity often limits the field of view and increases the time required to move from one location to another. However, generating an excessive number of local maps can lead to unnecessary overhead in communication between Short-Term Memory (STM) and Long-Term Memory (LTM), and may also result in each map covering too small a region. Conversely, without any fragmentation, the benefits of using FARMap are diminished. To strike a balance, we employ the z-score of surprisal to regulate the number of fragments, minimizing excessive fragmentation in complex environments while ensuring sufficient fragmentation in simpler ones. The fragmentation threshold then modulates the number of fragments.
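A minimal sketch of a z-score-based fragmentation test is given here for illustration. It assumes the z-score is taken against the surprisal values recorded so far in the current local map; the function names, the use of the population standard deviation, and the handling of short histories are hypothetical choices, not details confirmed by the paper. The default threshold of 2.0 mirrors the z-score setting listed in Table R1.

```python
import statistics


def surprisal_z_score(history, current):
    """z-score of the current surprisal relative to past values (assumed scheme)."""
    if len(history) < 2:
        return 0.0  # assumption: too little data to standardize
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0.0:
        return 0.0  # assumption: a constant history never triggers fragmentation
    return (current - mean) / std


def should_fragment(history, current, threshold=2.0):
    """Declare a fracture point when the z-score exceeds the threshold."""
    return surprisal_z_score(history, current) > threshold


history = [0.1, 0.2, 0.1, 0.2]  # hypothetical surprisal values in [0, 1]
print(should_fragment(history, 0.3))  # unusually high surprisal
print(should_fragment(history, 0.2))  # within the usual range
```

Because the test is relative to the local statistics, the same threshold adapts to environments whose raw surprisal levels differ, which is the balancing effect described above.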
Thanks for the response to this and others' reviews, which gave clarity to some of the raised questions.
The response emphasizes that the core contribution of this research is on "establishing a novel connection between neuroscience and SLAM, rather than proposing a state-of-the-art SLAM algorithm". In that context, the paper's message should be modified accordingly. Currently it seems that instead it is about proposing a state-of-the-art SLAM algorithm, given also the emphasis on the performance in the results, in which case, the paper would not be enough, given that there are other recent methods that the proposed method should be compared with. Instead, if looking at that connection, it would be interesting to see how in practice new submaps are created -- perhaps with some toy examples and small breakdown of the experiments -- in realistic environments and whether that matches with the general understanding of neuroscience on how we construct models of the world. It would also be interesting not only to discuss high-level the parameters that affect the determination of local maps, but also how to set values in practice, perhaps in an automated way -- currently the appendix shows a sensitivity analysis.
Comparison with remapping location and fracture point
Thank you for your response and valuable feedback. In response to the reviewer's comments, we created simulated environments based on the studies by Derdikman et al. (2009) and Carpenter et al. (2015) and tested FARMap to generate fracture points. As shown in Figure, the fracture points align well with the actual remapping locations of the grid cells. These remapping locations are typically at bottlenecks or corners of the environments, where the unobserved area hidden by walls becomes apparent.
Re: It would also be interesting not only to discuss high-level the parameters that affect the determination of local maps, but also how to set values in practice, perhaps in an automated way.
In Appendix I, we demonstrate that FARMap is robust to variations in hyperparameters. This implies that there is no need for extensive hyperparameter tuning in practice; users can directly use the default parameters. However, the decaying factor and the fragmentation threshold can be set based on neuroscience or psychology studies (e.g., Hale et al., 2011).
Hale, Sandra, et al. "The structure of working memory abilities across the adult life span." Psychology and Aging 26.1 (2011): 92.
Thank you for the prompt response and including the connection with other neuroscience/psychology studies. Reframing the paper in that context can make the paper more aligned with the core direction of the paper, that is the connection between neuroscience and SLAM. One element that is worth elaborating in the paper then is also discussing the hyperparameters, such as decaying factor and fragmentation threshold, based on those studies and see whether they align with the values set. The non significant changes in performance might also come from the scale of those hyperparameters: while the analysis included values for example from 1 to 3 for the fragmentation threshold, practical values that can be derived from those studies will have a larger range.
We appreciate the rapid response of the reviewer. We are sincerely pleased that the reviewer acknowledges the connection. We will strengthen this aspect in the revised manuscript.
Re: One element that is worth elaborating in the paper then is also discussing the hyperparameters, such as decaying factor and fragmentation threshold, based on those studies and see whether they align with the values set
Thank you for pointing this out. Unfortunately, neither Derdikman et al. (2009) nor Carpenter et al. (2015) deals with the decaying factor or the fragmentation threshold, since their focus is to show empirically that fragmentation (remapping) happens in a continuous environment and how the grid cell representation looks or changes. To the best of our knowledge, there is no neuroscience work that addresses decaying factors or fragmentation criteria together with empirical studies. Rather, there is a separate line of work on decay theory in psychology that we can refer to for the decaying factor (please refer to the review paper by Ricker et al.).
Ricker, Timothy J., Evie Vergauwe, and Nelson Cowan. "Decay theory of immediate memory: From Brown (1958) to today (2014)." Quarterly Journal of Experimental Psychology 69.10 (2016): 1969-1995.
Re: The non significant changes in performance might also come from the scale of those hyperparameters: while the analysis included values for example from 1 to 3 for the fragmentation threshold, practical values that can be derived from those studies will have a larger range.
We would like to note that the fragmentation threshold is intended to reflect the complexity of the local map using a z-score. The range of 1 to 3 was chosen based on the statistical rule of thumb; the probabilities of a z-score exceeding 1 and 3 are 16% and 0.15%, respectively, in a Gaussian distribution. Note that we defined the fracture point as a high surprisal point. However, since the surprisal is a non-i.i.d. sample, the distribution might not be Gaussian. We plan to conduct a broader sensitivity analysis, although it cannot be added during the remaining rebuttal period.
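The tail probabilities behind this rule of thumb can be checked with the standard normal upper-tail formula P(Z > z) = 0.5 * erfc(z / sqrt(2)). The sketch below uses only the Python standard library; the function name is illustrative.

```python
import math


def gaussian_upper_tail(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (1.0, 2.0, 3.0):
    print(f"P(Z > {z}) = {gaussian_upper_tail(z):.4%}")
```

The exact upper tails are about 15.87% at z = 1 and about 0.135% at z = 3, which matches the quoted 16% and is close to the 0.15% that follows from the 68-95-99.7 rule.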
However, for Table 8, we conducted nine different fragmentation threshold tests directly using surprisal instead of z-scores in large environments. Note that the range of surprisal is [0, 1]. Table R1 shows that performance varies depending on the threshold; low thresholds lead to excessive fragmentation, while a very high threshold (0.9) reduces fragmentation but increases planning time. This demonstrates the challenge of defining the correct fragmentation threshold for each environment, as opposed to using a statistical approach (i.e., the z-score).
Table R1. Average coverage, memory usage, and time of FARMap with various fragmentation thresholds without using the z-score in a large environment.
| Fragmentation Threshold | Coverage | Memory | Time |
|---|---|---|---|
| 0.1 | 49.4 | 7.9 | 105.2 |
| 0.2 | 49.2 | 7.9 | 93.9 |
| 0.3 | 48.5 | 8.0 | 114.8 |
| 0.4 | 49.2 | 8.0 | 112.4 |
| 0.5 | 48.7 | 7.9 | 107.6 |
| 0.6 | 47.9 | 9.2 | 117.4 |
| 0.7 | 48.3 | 13.4 | 144.5 |
| 0.8 (chosen for Table 8) | 51.7 | 20.8 | 177.3 |
| 0.9 | 57.1 | 62.3 | 1011.9 |
| --- | --- | --- | --- |
| 2.0 (using z-score) | 56.6 | 31.4 | 352.5 |
The paper proposes an exploration method that performs submapping using a surprise mechanic to decide when to create novel submaps. The resulting approach is compared in 2D on synthetic and simulated environments against a basic frontier exploration method.
Strengths
The use of a non-uniform submap generation logic is interesting.
Weaknesses
While the idea of surprise-based submap creation is interesting, many aspects of the overall method are unclear.
What does the map representation look like? The information provided appears to be contradicting itself. The C-th channel is said to contain confidence information, but over what? Additionally, the C-th channel in the observation contains visibility information. However, later on, there is talk of occupancy and colors. The actual representation used by the maps is never explained concretely.
Another aspect that lacks clarity is the surprisal mechanic. It mentions uncertainty estimation yet provides the prediction error as an example. How can an error be used as uncertainty estimation? Equation 2 is also highly confusing as M and o are matrices with different dimensionality yet are multiplied together. How does this work? Furthermore, based on the text, the multiplied quantities represent different properties, making things even more confusing. As it is never made clear what local maps look like and how they are formed, the entire surprisal aspect is challenging to evaluate.
The recall aspect, which is paramount to reusing existing local maps efficiently, lacks any information regarding how it works. Does the system assume perfect localization and thus can just use the submap graph, or is there a place recognition system that reidentifies these local submaps?
While there is crucial information about core aspects missing or relegated to the appendix, there is plenty of detail regarding aspects that one could argue are less critical. For example, the detailed view integration above Section 3.4 or the exact description of the synthetic environment description in Section 4.
The experiments are not very convincing for several reasons. A major one is that a very basic frontier method is used, of which the details are unclear. The proposed method utilizes several heuristics to avoid making bad choices; are similar heuristics employed in the baseline? Another aspect is that the metrics used are unclear and hard to interpret. As an example, Table 2 shows memory usage with a unit of (k); what does this mean? The paper provides statistical information, which is good, though it might be better if either the standard deviation or quantile (likely the better choice) were used throughout rather than switching between the two. While the experiments section is quite long, there is little actual discussion of the results, which is usually the most exciting part of an experimental section.
From the description of the baseline method, it is unclear whether it also uses submaps. The results seem to imply so, as otherwise, the relative memory plots should show a value of 1 from my understanding. The paper also does not compare to contemporary exploration frameworks such as GBPlanner (referenced in the appendix) or work such as [1] that are evaluated on realistic robotic 3D setups and show impressive performance. In the absence of such baselines, it is impossible to evaluate the benefit of the proposed irregular submapping system.
While the idea of creating submaps in a more dynamic way than typical fixed-size grids is interesting, the amount of questions surrounding the proposed system and lack of comparison with recent methods makes it impossible to support the paper's publication in its current state.
[1] Schmid, Lukas, et al. "A unified approach for autonomous volumetric exploration of large scale environments under severe odometry drift." IEEE Robotics and Automation Letters 6.3 (2021): 4504-4511.
Questions
- Does the method assume perfect localization and if so how does it handle realistic uncertainty in pose?
- Is the baseline method utilizing a submapping approach as well, and if so how does it work?
Thank you for your insightful and constructive feedback on our manuscript. We greatly appreciate the time and effort you have dedicated to reviewing our work. We will revise the manuscript, taking into account each of your valuable comments. We would like to emphasize that the core contribution of this research is establishing a novel connection between neuroscience and SLAM, rather than proposing a state-of-the-art SLAM algorithm.
Re: What does the map representation look like?
We apologize for the confusion regarding the local map and surprisal. The local predictive spatial map is a three-dimensional tensor with height H, width W, and C+1 channels. At each coordinate (h, w), it holds a (C+1)-dimensional vector, where the first C dimensions (0 to C-1) denote color and the last dimension denotes confidence. If the observation is RGB, as in the proposed environments, C is 3. Similarly, C is 1 if the observation is in grayscale. On the other hand, if the observation comes from an occupancy sensor (i.e., LIDAR in Section 5.2 and the Neural SLAM module in Section 5.3), C is 0, which means that there is no color. We will clarify this in the revision.
The value of the C-th dimension represents the confidence of each cell observed during exploration. It is gradually decaying like a biological agent’s working memory as we mentioned in Section 3.1. We kindly remind you that the primary goal of this paper is to apply neuroscience theory and foundations to SLAM.
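To illustrate the idea of a decaying confidence channel, the following sketch applies one time step of decay to an H x W confidence grid. The multiplicative decay rule, the reset of currently visible cells to 1.0, and the names used are assumptions made for illustration; the actual update is the one in the authors' codebase.

```python
def decay_and_refresh(confidence, visible, decay=0.9):
    """Return a new H x W confidence grid after one time step.

    confidence: nested list of floats (0.0 means never observed).
    visible:    nested list of 0/1 flags for the current observation.
    decay:      hypothetical multiplicative decay factor in (0, 1].
    """
    updated = []
    for conf_row, vis_row in zip(confidence, visible):
        updated.append([
            1.0 if vis else conf * decay  # assumed rule: refresh if seen, else decay
            for conf, vis in zip(conf_row, vis_row)
        ])
    return updated


confidence = [[1.0, 0.5], [0.0, 0.0]]
visible = [[0, 0], [1, 0]]
print(decay_and_refresh(confidence, visible, decay=0.5))
```

Under these assumptions, cells that have never been observed stay at 0, which is consistent with confidence being nonzero only for cells seen before.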
Please refer to the attached codebase (lib/memories/memory.py) for the detailed actual implementation. If you have further questions, please let us know. We will be happy to answer them.
Re: Equation 2 is also highly confusing
The C-th channel of the local predictive map M, which has height H and width W, contains the confidence level of each cell. If a cell has been observed before, it holds a nonzero value; otherwise, its value is 0. On the other hand, the C-th channel of the spatially transformed observation o at the current time step (refer to Figure 3 for a visual representation) indicates whether a cell is currently visible (1) or not (0).
Therefore, the sum of the element-wise product of these two channels represents the summation of the confidence of each visible cell in the local predictive map before it is updated with the current observation. Similarly, the sum of the observation's C-th channel represents the number of visible cells in the current observation. Hence, Eq. (2) denotes the average confidence of a visible cell from the current observation.
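The quantity described here, the average confidence over currently visible cells, can be sketched as follows. The grids are plain nested lists standing in for the confidence channel of the local map and the visibility channel of the observation; the function name and the value returned when nothing is visible are assumptions, and the sketch does not claim to reproduce the exact form of Eq. (2).

```python
def average_visible_confidence(map_confidence, visibility):
    """Mean stored confidence over the cells that are visible right now.

    map_confidence: H x W nested list, 0.0 for never-observed cells.
    visibility:     H x W nested list of 0/1 flags from the current observation.
    """
    total = 0.0
    count = 0
    for conf_row, vis_row in zip(map_confidence, visibility):
        for conf, vis in zip(conf_row, vis_row):
            total += conf * vis  # confidence contributes only where the cell is visible
            count += vis         # number of visible cells
    if count == 0:
        return 0.0  # assumption: no visible cells means no confidence
    return total / count


map_confidence = [[1.0, 0.5, 0.0], [0.0, 0.5, 0.0]]
visibility = [[1, 1, 1], [0, 1, 0]]
print(average_visible_confidence(map_confidence, visibility))
```

In this example four cells are visible and their stored confidences sum to 2.0, so the average is 0.5. A low average means the agent is mostly seeing cells it has little or no memory of.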
Re: “Surprisal - How can an error be used as uncertainty estimation?”
Surprisal is usually defined as future prediction error or distribution difference in many research areas such as psychology [1], reinforcement learning [2], and neuroscience [3]. However, surprisal can also be interpreted as uncertainty (predicted variance), as demonstrated by Kendall et al. [4]. Therefore, we refer to it as ‘uncertainty estimation’ in our work. We will revise the terminology for clarity in our revised manuscript.
[1] Modirshanechi, Alireza, Johanni Brea, and Wulfram Gerstner. "A taxonomy of surprise definitions." Journal of Mathematical Psychology 110 (2022): 102712.
[2] Achiam, Joshua, and Shankar Sastry. "Surprise-based intrinsic motivation for deep reinforcement learning." arXiv preprint arXiv:1703.01732 (2017).
[3] Sinclair, Alyssa H., et al. "Prediction errors disrupt hippocampal representations and update episodic memories." PNAS 118.51 (2021): e2117625118.
[4] Kendall, Alex, Yarin Gal, and Roberto Cipolla. "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics." CVPR. 2018.
Re: Does the method assume perfect localization and if so how does it handle realistic uncertainty in pose?
We do not assume perfect localization. Results from Robot simulations using a LIDAR sensor (Section 5.2) and Habitat simulations with the Neural SLAM module, which predicts 2D observations from RGB egocentric observations (Section 5.3), demonstrate that our method works well in noisy, continuous environments without assuming perfect localization.
Re: Table 2 shows memory usage with a unit of (k), what does this mean?
In Table 2, 'k' denotes 1000, and the values represent the actual discovered size of the environment (in cell units, at the resolution mentioned in Appendix D.2). Regarding resolution, we use the default resolution of TurtleBot, 0.05 m x 0.05 m. We will modify the notation to (x1000) in future revisions and convert these values into square meters (m^2) for clarity.
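The proposed conversion from cell counts to square meters follows directly from the quoted resolution: each cell covers 0.05 m x 0.05 m = 0.0025 m^2. The helper below is an illustrative sketch with hypothetical names.

```python
CELL_SIDE_M = 0.05  # resolution quoted in the response: 0.05 m x 0.05 m per cell


def cells_to_square_meters(num_cells, cell_side_m=CELL_SIDE_M):
    """Area in m^2 covered by a number of square grid cells."""
    return num_cells * cell_side_m * cell_side_m


# One 'k' (1000 cells) in the table's unit corresponds to about 2.5 m^2.
print(round(cells_to_square_meters(1000), 6))
```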
Re: The paper provides statistical information which is good, though it might be better if either the standard deviation or quantile (likely the better choice) were used throughout rather than switching between the two.
Thank you for your comment. We employed a 95% confidence interval from bootstrapping for the proposed environments, as the results are aggregated from various maps not from the same map (distribution), making standard deviation inappropriate. Conversely, in the robot simulation in ROS, we can calculate a 95% confidence interval based on standard deviation as it was run multiple times in the same environments. We will adjust the notation of the 95% confidence interval in Table 2 accordingly.
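For readers unfamiliar with the bootstrap interval mentioned here, a minimal percentile-bootstrap sketch for the mean is shown below. The resample count, the seed, the percentile method, and the sample values are all assumptions chosen for illustration; the response does not state which bootstrap variant was used.

```python
import random
import statistics


def bootstrap_ci(values, num_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n))
        for _ in range(num_resamples)
    )
    lower = means[int((alpha / 2) * num_resamples)]
    upper = means[int((1 - alpha / 2) * num_resamples) - 1]
    return lower, upper


# Hypothetical per-map coverage scores, not taken from the paper.
coverage = [52.1, 48.3, 55.0, 50.7, 49.9, 53.4]
low, high = bootstrap_ci(coverage)
print(f"mean = {statistics.fmean(coverage):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The interval is built from the resampled means themselves, so it does not require the scores to come from a single Gaussian distribution, which is the motivation given for using it on results aggregated across different maps.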
Re: Is the baseline method utilizing a submapping approach as well, and if so how does it work?
The baseline method does not utilize a submap-based approach for identifying subgoals and planning. However, as mentioned in Appendix A.1, FARMap can be combined with any planners that employ graph- or submap-based approaches for path planning. For instance, GBPlanner uses a frontier method to select a subgoal, while employing a graph-based approach for rapid path planning. It can be integrated with FARMap by replacing the Planner component (such as Dijkstra’s algorithm or RRT) and the subgoal identification mechanism (Frontier) with those from GBPlanner.
Thank you for the comments; they definitely cleared up some aspects of the paper that were unclear to me while reading it.
We are delighted to have resolved the reviewer's concerns. Please let us know if you have any further questions or comments. Thank you very much.
We sincerely appreciate the reviewers for their constructive reviews and active participation during the author-reviewer discussion period. We are pleased that the reviewers have noted that most concerns have been addressed and they acknowledge our contribution to establishing a novel connection between neuroscience and SLAM. Following the comment from Reviewer 6iE9, we will emphasize this aspect more vividly.
This paper presents an algorithm for robotic navigation and exploration, inspired by neuroscience, specifically the fragmentation in grid cell maps in the rodent's hippocampus. Agents build local maps by using surprise information and setting subgoals for spatial exploration and then store them in a long-term memory (LTM). LTM maps with matching fracture points are matched and used as subgoals in a topological graph to guide global exploration. The method is evaluated on 2D simulations and compared with a frontier exploration baseline.
Strengths:
- Reviewers LKBe, 6iE9 praised the idea of submap generation.
- Grounding with neuroscience (6iE9,VpdJ) for map fragmentation and map recall from LTM.
- Clarity (6iE9, VpdJ).
- Promising results (6iE9, VpdJ)
Weaknesses:
- Benchmark on an old baseline (Yamauchi 1997) without more state-of-the-art exploration and mapping algorithms (LKBe, 6iE9, VpdJ) - although the authors pointed out a comparison with Neural SLAM
- No discussion of suitability of method for real-world applications (VpdJ)
- LKBe found the descriptions of the surprisal mechanic and map representation unclear, without information if localisation is perfect or not - these were clarified by the authors
- No discussion or references of similar ideas around topological maps for exploration, neural SLAM with loop closure, curiosity-driven exploration (6iE9)
- 6iE9 had questions about the analysis of map fragmentation methods, which were addressed by the authors
Based on the scores for this paper (3, 3, 6, despite the engagement of the reviewers in the rebuttals), the paper does not meet the bar for acceptance. I wish the authors best of luck for rewriting it following the reviewers' suggestions and resubmission.
Why not a higher score
The reviewers were not convinced to raise their scores during the rebuttal and discussion period.
Why not a lower score
N/A
Reject