PaperHub

ICLR 2025 · Decision: Rejected
Average rating: 4.3/10 from 4 reviewers (min 3, max 6, std 1.3) · Individual ratings: 3, 5, 6, 3
Confidence: 4.0 · Correctness: 2.3 · Contribution: 2.0 · Presentation: 2.0

3D Perception with Differentiable Map Priors

Submitted: 2024-09-28 · Updated: 2025-02-05
TL;DR

DMP enhances multi-view vision models with historical features that are learned end-to-end and stored in an efficient and scalable representation

Abstract

Keywords

autonomous driving · 3D object detection · mapping

Reviews and Discussion

Review
Rating: 3

The paper introduces Differentiable Map Priors (DMP), a framework aiming at enhancing 3D perception systems in autonomous vehicles by leveraging historical traversal data. Specifically, the prior knowledge is represented as a differentiable map that can be directly learned from training data. The learned spatial prior features can be fused with features from the current observations, which leads to improved object detection and semantic segmentation performance on the nuScenes dataset.

Strengths

  1. Integrating historical data is a promising practice for equipping autonomous vehicles with a more robust perception ability, especially when the onboard observation is less reliable (e.g. bad weather or long-range perception).

  2. Maintaining a learnable map prior offers a flexible solution to use the historical information according to training data and target tasks. In addition, the map prior is built on a multi-resolution hash table, which is compact and memory-efficient for real-world applications.

  3. The proposed map prior can be effortlessly incorporated with current detection and segmentation frameworks by feature fusion.

Weaknesses

  1. The novelty of this paper is unclear: The idea of integrating historical data with onboard data for autonomous vehicles has already been explored in works like HINDSIGHT[1] and NMP[2]. The Neural Map Prior proposed in NMP, although not in the form of hash tables, shares a similar design and usage as the Differentiable Map Prior in this work. The major difference is that NMP uses GRU updating and the Differentiable Map Prior in this work can be directly optimized, which does not form enough novelty from my point of view.

  2. The presentation of this paper can be improved: Some figures, tables, and descriptions are not informative and clear enough. For example, Figure 1 and Figure 5 both show examples from the nuScenes dataset to illustrate the large portion of overlapping traversals, which could be redundant to each other. In Figure 2, it is confusing what “global map” represents, is it a traditional map, or the differentiable map priors? Table 1 could also be improved. Firstly, it is unclear what bold numbers represent. Secondly, the table arrangement makes it very hard to tell the improvement brought by DMP, calculating delta values may help. Besides, the in-text citation style needs refinement for better readability.

  3. Experimental results do not demonstrate consistent and strong improvement: From Table 1, it is observed that incorporating DMP does not always lead to better results. For example, for BEVDet, mASE and mAOE are better without DMP. For BEVFormer, mASE, mAAE and mAVE do not improve with DMP.

  4. Ablations are not adequate: The ablation experiments currently presented do not cover the core of the method. For example, the Differentiable Map Prior itself is not ablated. In addition, the fusion between prior features and sensor features is not ablated.

[1] You, Yurong, et al. "Hindsight is 20/20: Leveraging past traversals to aid 3d perception." ICLR 2022.

[2] Xiong, Xuan, et al. "Neural map prior for autonomous driving." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

Questions

  1. Building Differentiable Map Prior with a multi-resolution hash map, how to avoid hash collision for finer resolutions? More specifically, if the number of grid points is larger than the number of entries in the hash table, how to assign entries to grid points?

  2. The paper mentions that at inference, the map prior will only be used if it is available. For the scenes that have never been traversed, the method will fall back to the baseline algorithm. How does the method get to know when to fall back to the baseline and when to use the map prior? Does it mean DMP explicitly keeps track of the positions that have been visited?

  3. Following the previous question, is Differentiable Map Prior available for all locations or just for the ones that have been visited? What embedding will be retrieved if a novel location is input?

  4. When training, are DMP and detectors trained together? Are all components in detectors trainable or not? If both DMP, detector encoder and decoder are trained together, how to make sure DMP is not bypassed?

  5. For the comparison with NMP, can you provide more details on how you modify it to use training prior during evaluation? From my point of view, NMP does not use separate priors for training and testing, and its original setting should be comparable with this paper. Results for original NMP should be included for a complete comparison.

  6. In Table 2, while NMP* and DMP have similar NDS and mAP, DMP shows a much higher mIoU. Can you provide mIoU for individual classes and explain this performance gap?

Ethics Concerns

Nil

Comment

We thank the reviewer for the detailed review and thoughtful questions about the technical implementation of our approach.

Presentation

We greatly appreciate the reviewer's constructive feedback and suggestions. While both Figure 1 and Figure 5 show examples of traversal overlap, they serve distinct purposes - Figure 1 provides a high-level motivating example while Figure 5 demonstrates the exact data distribution used in the official split and our experiments. For Figure 2, we thank the reviewer for pointing out the ambiguity of "global map" and have updated the caption to clarify that it refers to the learned map prior in the revised manuscript. Regarding Table 1, we have modified the in-text citation style and specified that bold numbers indicate the best performing model across all methods (up to the last significant digit).

Mixed Individual Metrics

Thank you for the keen eye! We'd like to clarify that metrics like mASE, mAOE, mAAE, and mAVE are computed only on true positive detections. In 3D object detection, these metrics may appear worse when a model achieves higher recall, as it successfully detects more challenging objects. This is why the community primarily relies on NDS and mAP as key indicators of overall system performance.

Here we show one particular instance from DETR3D [1]:

| Method | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
|---|---|---|---|---|---|---|---|
| Mono3D | 0.429 | 0.366 | 0.642 | 0.252 | 0.523 | 1.591 | 0.119 |
| DETR3D | 0.479 | 0.412 | 0.641 | 0.255 | 0.394 | 0.845 | 0.133 |

Despite achieving higher NDS and mAP, the mATE and mASE are comparable while the mAAE is worse. This pattern is also observed in other strong object detection architectures [2,3].

"How to avoid hash collision for finer resolutions?"

While hash collisions do occur at finer resolutions, we hypothesize the MLP decoder that follows the lookup learns to resolve these collisions during training. Such behavior has been noted in Instant-NGP [4].
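To make the collision behavior concrete, here is a minimal sketch of an Instant-NGP-style 2D spatial hash, using the prime constants from Müller et al. [4]; the function and variable names are ours, not the paper's:

```python
# Minimal sketch of a 2D spatial hash in the style of Instant-NGP [4].
# The per-dimension primes follow Mueller et al.; all names are our own.
T = 2 ** 16                 # max hash-table entries per level (bounds memory)
PI_1, PI_2 = 1, 2654435761  # primes from the Instant-NGP hash function

def hash_cell(ix: int, iy: int) -> int:
    """Map an integer 2D grid cell to a hash-table index.

    At fine levels the grid has more cells than T, so distinct cells
    can collide; the MLP decoder after the lookup is expected to
    disambiguate them using the concatenated multi-level features.
    """
    return ((ix * PI_1) ^ (iy * PI_2)) % T

# Two far-apart cells can map to the same entry (a collision):
idx_a = hash_cell(3, 7)
idx_b = hash_cell(3 + T, 7)  # differs only above bit 16, so it collides
```

Because the modulo keeps only the low 16 bits, any two cells whose coordinates differ only above that range share an entry, which is exactly the collision case the MLP must resolve.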

"How does the method know when to fall back to the baseline and when to use the map prior?"

In our experiments, we use a simple binary indicator (computed from historical data statistics) to determine whether the current location has been visited before.
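As an illustration only (not the paper's actual implementation), the fallback logic could look like this hypothetical sketch, where `visited_cells` would be precomputed from training-set ego poses and the 25 m bookkeeping cell size is our assumption:

```python
# Hypothetical sketch of the inference-time fallback described above.
# `visited_cells`, the cell size, and all names are our assumptions.
CELL_SIZE = 25.0  # coarse bookkeeping grid, independent of the hash levels

def to_cell(x: float, y: float) -> tuple:
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))

def get_prior_feature(x, y, visited_cells, prior_lookup, null_embedding):
    """Return the learned prior at (x, y), or a null embedding when the
    location was never traversed in training (baseline fallback)."""
    if to_cell(x, y) in visited_cells:
        return prior_lookup(x, y)
    return null_embedding

# Usage: a toy visited set and a constant stand-in prior
visited = {to_cell(10.0, 30.0)}
feat = get_prior_feature(12.0, 40.0, visited, lambda x, y: [1.0], [0.0])
```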

"are DMP and detectors trained together? ... how to make sure DMP is not bypassed?"

Our approach trains DMP and the detector end-to-end, maintaining all parameters as trainable. The model learns to balance the use of immediate perception and prior information - when current observations provide sufficient information, the model appropriately reduces its reliance on the prior. This adaptive behavior is similar to observations in multi-sensor fusion systems (lidar, camera, radar), where models learn to balance information from different sensor modalities.

References:

[1] Z. Wang, et al., "DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries," CoRL 2022.

[2] Y. Liu, et al., "PETR: Position Embedding Transformation for Multi-View 3D Object Detection," ECCV 2022.

[3] Z. Li, et al., "BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers," ECCV 2022.

[4] T. Müller, et al., "Instant neural graphics primitives with a multiresolution hash encoding," ACM Transactions on Graphics, 2022.

Comment

Thanks for the authors' reply, but it seems my comments on weaknesses are not clearly addressed.

Comment

We spent considerable time responding to the reviewer's comments and believe we have provided comprehensive responses to the concerns.

Regarding ablations, we intentionally designed the fusion module to be straightforward to focus on our core idea: that learned map priors enhance perception performance. We complement this with ablations on the map representation's memory capacity in Figure 7, which we consider the most critical aspect to ablate.

We welcome specific feedback on any particular points that the reviewer believes need additional clarification or elaboration!

Review
Rating: 5

This paper introduces map priors for 3D object detection. To obtain multi-scale map representation, this paper utilizes techniques like InstantNGP to encode the map as a hash map. Further, this paper presents a prior fusion module to fuse the map priors into 3D perception models, like BEVDet, BEVFormer. Experimental results demonstrate the effectiveness of the design.

优点

  • This paper introduces map information as the prior of the 3D perception model. The idea is interesting.

  • Main experiments in Table 1 demonstrate improvements. Experiments in Figure 6 are interesting, and demonstrate that map priors help multi-traversal circumstances.

Weaknesses

  1. Map representation.
  • The paper introduces the map priors for 3D perception. It is unclear whether the map representation is an HD-map or not. I suggest the authors clarify the map prior representation. If it is an HD-map, then the cost of this method will be a large issue. Further, introducing map priors means the driving regions are limited (only available in areas with HD maps), which may limit the potential of this method.

  • Ablation studies relating map representation to final NDS performance are also required. Can an SD map also bring such performance improvements?

  2. Comparisons with prior works related to map segmentation.
  • How does the method compare to other end-to-end driving methods, like UniAD [1] or VAD [2], especially in overall 3D object detection performance? These methods also introduce map segmentation as auxiliary supervision.

  • Comparisons with NMP in Table 2. I am not sure about the motivation of this experiment. Is it meant to show that the prior fusion technique is more advanced than the one in NMP? I think there is a gap between the settings of NMP and DMP: one is used to update the online map, while the other is proposed to utilize the map.

[1] Planning-oriented Autonomous Driving

[2] VAD: Vectorized Scene Representation for Efficient Autonomous Driving

  3. Missing related papers. Some papers share similar ideas with this paper, and detailed comparisons are suggested. Some discussion of the key similarities and differences is required.
  • Mind the map! Accounting for existing maps when estimating online HDMaps from sensors.

  • P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors

Questions

See weakness.

Ethics Concerns

N/A

Comment

We thank the reviewer for the thoughtful review and constructive suggestions for improving our work.

"Ablation studies related to map representation"

Thank you for the suggestion! We have revised the manuscript to explicitly specify that our map prior representation is learned and does not rely on HD-map data. HD-maps have been shown to improve object detection performance in prior work [1]. We are currently running additional experiments on HD-map enhanced camera-based baselines on nuScenes and aim to include these results before the end of the rebuttal period.

"Comparison to other end-to-end driving methods (UniAD or VAD)"

UniAD and VAD are end-to-end driving frameworks that target planning while incorporating multiple tasks including detection, tracking, map prediction, motion forecasting and planning. After carefully examining their published results and code, we find that unfortunately neither method reports their 3D object detection performance.

While our current work focuses on detection, the reviewer's suggestion inspires a promising future direction of extending our differentiable map prior to tackle end-to-end driving tasks.

Motivation of Table 2 Comparison

We agree with the reviewer that NMP's original setting of online map prediction differs from ours. However, their external memory approach serves as a strong baseline for improving perception using past information. The comparison highlights key differences in approach - while both methods leverage historical information for perception from multi-view camera inputs, we aim to demonstrate how our learned prior design improves performance compared to NMP's formulation.

Additional Related Works

Thank you for bringing these papers to our attention. While these papers share the high-level idea of incorporating map information, they differ in their objectives and technical approaches. MapEX and P-MapNet focus on estimating and generating HD maps by leveraging existing map priors (manually annotated HD maps, SD maps), while our work learns priors from historical traversals to improve 3D object detection performance. We have included a discussion of these works in our revised manuscript to better position our contributions within the literature.

References:

[1] B. Yang, M. Liang, and R. Urtasun, "HDNET: Exploiting HD Maps for 3D Object Detection," CoRL 2018.

Comment

Thanks for the authors' response.

Response 1: For the map representation concern, I understand the map representation is learned and modeled by InstantNGP. But the limited driving region problem with map data still exists. I still have the concern that, if the map prior is available, why do we need InstantNGP to model the map representation?

Response 2: I still think the comparison to UniAD or VAD is required as they also propose methods to utilize map priors.

Comment

We appreciate the reviewer's response and follow-up discussion.

"the limited driving region .. still exists"

While the reviewer raises a valid point about limited driving regions, this aligns with the practical reality of autonomous driving deployment! Most autonomous vehicles operate in geofenced areas that are repeatedly traversed (e.g., ride-hailing services in specific cities or autonomous trucks on fixed routes). In these scenarios, our method provides an easy and practical boost to perception performance.

"Learned vs Hand-crafted Map Priors"

The reviewer raises a good point that in scenarios where map priors are available, they should simply be used - we've conducted additional experiments detailing such a baseline and kindly refer the reviewer to the common response above. As shown in those results, while traditional map priors do provide modest improvements, our learned approach achieves higher performance while eliminating the need for manual map annotations.

"Comparisons with UniAD and VAD"

We thank the reviewer for the thoughtful suggestion regarding UniAD and VAD comparisons. These are indeed impressive works that have advanced the field, however, their map modules serve a fundamentally different purpose. They perform online semantic map segmentation as part of an end-to-end driving stack, focusing on planning. In contrast, our work focuses specifically on learning priors to enhance perception. Given these different goals and technical approaches, direct comparisons may not be most instructive.

However, we're open to exploring meaningful ways to compare our approaches. For instance, we could conduct trajectory prediction experiments if the reviewer thinks this would provide valuable insights.

Review
Rating: 6

The manuscript proposes to leverage the fact that autonomous vehicles (AV) will be deployed in areas that are and will be extensively observed during previous trips. That information is available during training and should allow learning priors over the map that can be leveraged during inference when the vehicle is deployed. The idea is to learn a world-aligned sparse, multi-resolution feature representation of the environments during training time by simply backpropagating into those features during normal training. During inference the learned features are retrieved given the vehicle's location and fed into the 3D perception system as priors. Two key architectures are investigated: BEV-based systems and DETR-style Transformer detectors. Both show improvement in their performance when map prior information is available. Encouragingly, the more traversals observed during training, the better the performance increase during inference.

优点

  • The key idea of learning spatial priors for AVs is simple and powerful. It is great to see this work execute on this idea. The proposed approach is elegant and straightforward to add to existing systems. This enhances potential impact.
  • The manuscript is well written and the illustrations are high quality and support the written text well.
  • The experiments are illustrative especially the ablation with respect to number of traversals in training set and performance at different distances. Both of those support the claim that a map prior is learned by showing that (1) when more prior observations exist the learned prior leads to a higher performance boost and (2) when sensor observations are weak (because things are far away) the learned prior can compensate and lift performance substantially.
  • The baseline experiments with NMP shows clearly the benefit of end-to-end learning the map prior during training with the model in the loop.

Weaknesses

  • A slight weakness is that the improvements of adding the map prior seem to be relatively minor in the overall evaluation. Likely this is due to the observations at close range providing sufficient information to not really need the learned prior (as shown in Fig. 8). I do wonder what happens if the image information provided to the model gets degraded in some way. Like downscaling significantly or blacking-out chunks of the images, or dropping out all but one frame? A limit experiment would be to provide NO image information to understand how much the models can do without any observations, only given the prior?
  • The sparse voxel hashmap representation for the prior is a great choice for a scalable representation. It would be good to add some statistics into the paper as to the required memory footprint per distance traveled (for example). This would (1) give the reader a better sense for actual practical memory footprints of the map and (2) support the claim that the prior map is represented in a scalable way.

Questions

  • It was not that obvious upon a first read how the sparse voxel datastructure is setup:
    • Is it 3D or only 2D (I assumed 2D but the word voxel was used?)?
    • How are the 4 levels spaced in voxel resolution 0.5m, ?, ?, 25m?
    • Am I right to assume that T=2^16 means that each side of the 2D grid is 2^8=256 pixels/voxels wide? so the highest resolution layer is 0.5m*256 = 128m wide?
  • How does the approach handle dynamic objects?

Ethics Concerns

NA

Comment

We thank the reviewer for the thorough review and the feedback on our work.

"What happens if the image information provided to the model gets degraded in some way"

Great idea! We performed experiments to simulate sensor corruption by zeroing out a random subset of the 6 cameras and reporting NDS.

| Cameras Dropped | BEVFormer | BEVFormer + Prior |
|---|---|---|
| 0 | 0.419 | 0.438 |
| 1 | 0.379 | 0.399 |
| 2 | 0.351 | 0.367 |
| 3 | 0.327 | 0.341 |
| 6 | 0.015 | 0.077 |

Here, 0 cameras dropped refers to the original results, and 6 cameras dropped is the limit experiment. As we can see, the map prior helps throughout, but more interestingly, the performance is non-zero in the limit.

Additional Hashmap Memory Statistics

We thank the reviewer for the suggestion! To help quantify the memory footprint, we use Boston (6.39 km²) as a concrete example. For our configuration with a 4-level hash table with feature dimension $d = 32$ (where each level contains an 8-dimensional vector), maximum number of embeddings $T = 2^{16}$ per level, and $r_{min} = 0.5\,m$, $r_{max} = 25\,m$, the map prior embedding (stored as 4-byte floats) requires only ~12.3 MB, where most of the memory savings come from the bounded hash table size $T$. In contrast, a "dense" 2D grid representation with $d = 32$ at 0.5 m resolution would require $6.39\,km^2 \times (1000\,m/km)^2 / 0.5^2 \times 32 \times 4\ \text{bytes} \approx 3.1\ \text{GB}$.
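A quick back-of-the-envelope check of the dense-grid figure above, under the stated assumptions (Boston at 6.39 km², 0.5 m cells, d = 32 features, 4-byte floats):

```python
# Verify the dense 2D grid memory estimate quoted above.
area_m2 = 6.39 * 1000 ** 2        # Boston: 6.39 km^2 in m^2
num_cells = area_m2 / 0.5 ** 2    # 0.5 m x 0.5 m cells
dense_bytes = num_cells * 32 * 4  # 32 float32 features per cell
print(round(dense_bytes / 2 ** 30, 2))  # -> 3.05 (GiB), i.e. the ~3.1 GB quoted
```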

"Is it (the learned prior) 3D or 2D?"

The hashmap operates in 2D BEV space. Since it builds upon representations from the neural rendering literature, we borrowed the term "voxel", but are happy to use "cells" throughout the paper if this better conveys the 2D nature of our approach.

"How are the 4 levels spaced in voxel resolution?"

The resolution grows exponentially per level, with resolutions computed as $r_i = r_{min} \cdot g^i$, where the growth rate $g = (r_{max}/r_{min})^{1/(L-1)}$. For the configuration used in our experiments:

  • Level 0: 0.5m
  • Level 1: ~1.8m
  • Level 2: ~6.7m
  • Level 3: 25m
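The spacing above can be reproduced directly from the stated formula (a small sketch using the configuration from our experiments):

```python
# Per-level resolutions: r_i = r_min * g**i, g = (r_max / r_min)**(1 / (L - 1))
r_min, r_max, L = 0.5, 25.0, 4
g = (r_max / r_min) ** (1 / (L - 1))   # growth rate, 50**(1/3) ~= 3.68
levels = [r_min * g ** i for i in range(L)]
print([round(r, 2) for r in levels])   # -> [0.5, 1.84, 6.79, 25.0]
```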

"Am I right to assume that $T = 2^{16}$ means that each side of the 2D grid is $2^8 = 256$ pixels/voxels wide?"

The parameter $T = 2^{16}$ controls the maximum size of the hash table embedding at each level, acting as an explicit bound on the memory footprint. The grid size is determined by the resolution of the level and the desired area of coverage.

"How does the approach handle dynamic objects?"

We kindly refer the reviewer to see our general response above.

Comment

Thank you for addressing my questions and additional clarifications. Upon reading the other reviews, I do agree with the concerns raised about map prior baselines. In particular, using the basemap from OpenStreetMap, or aggregating plain RGB values into the basemap, as baselines would strengthen the claim that the learned features capture something substantially stronger.

Comment

We thank the reviewer for their response and constructive feedback! We have conducted the map prior baseline experiments and have posted the results in a shared response above.

These findings should provide additional insight into our method's effectiveness, and we hope the reviewer will consider revising their score.

Review
Rating: 3

This paper presents an approach to learn spatial priors from geolocations. Specifically, the global map is represented as a multi-resolution hash map, which can be incorporated into an existing 3D detection architecture. Experimental results on the nuScenes dataset are reported.

Strengths

The proposed map prior leads to slightly better 3D object detection results on the nuScenes dataset.

Weaknesses

  1. The motivation of this paper is to use the historical traversals of autonomous vehicles to improve the 3D scene understanding task (line #16, line #33, etc). The proposed approach, however, does not use the historical information at all. It simply uses the geolocation within the global map to learn point-wise embeddings. More elaborations are needed here.

  2. The comparison with the prior work NMP is problematic. The motivation of NMP is to use the historical information to improve the downstream task. The learned prior serves as an external memory module, which can be learned and applied to both training and inference stages separately. Using the training prior during evaluation breaks the setting of NMP and makes it unnecessarily ineffective. And in the NMP paper, no 3D object detection experiments are conducted. It is not clear how it is adapted to such a task in this paper.

  3. The introduction is incredibly short. It is not clear how the proposed approach is different from previous approaches. As a result, it is hard to comprehensively gauge the significance of the proposed approach. A significant revision of the paper is needed.

Questions

  1. Why can the proposed map prior improve the object detection? Intuitively, it only learns prior knowledge based on the geolocation only. Why can it be useful? Showing more analysis and discussions would be very useful.

  2. How is NMP adapted for the object detection task? And how can the proposed approach work for map segmentation?

  3. In line #40, it is claimed that the authors "incorporate DMP into three distinct multi-view perception stacks". What does "stacks" mean here?

Comment

We thank the reviewer for their time and feedback.

"The proposed approach, however, does not use the historical information at all. It simply uses the geolocation within the global map to learn point-wise embeddings"

We respectfully believe there is a misunderstanding. We use historical information during training to build up a global map representation (map prior). This map prior is then used at inference to improve model performance.

These point-wise embeddings are precisely how we distill and store historical information from multiple traversals. Through end-to-end training, the spatial embeddings learn relevant features observed over time at that position, effectively encoding historical context.

"short introduction with unclear differentiation from previous approaches"

We aimed for conciseness in our introduction, and have expanded it to more explicitly differentiate our work from prior approaches. As the reviewer noted, the way we encode historical information as point-wise embeddings is the key differentiator from previous methods.

"Why can the proposed map prior improve object detection?"

Please see our general response above.

"How is NMP adapted for object detection?"

We adapted NMP by integrating the DETR detection head used in the BEVFormer and BEVFormer + DMP experiments. The detection head uses a set of learned query anchors to attend to BEV features through a series of transformer decoder layers to predict 3D boxes. We are happy to provide more details and include this in the Appendix.

"What does 'incorporate DMP into three distinct multi-view perception stacks' mean?"

This refers to integrating DMP with three different perception architectures - BEVDet, BEVFormer, and PETR - to evaluate its performance across various model designs and losses.

Comment

As the rebuttal period is nearing its end, we would value the reviewer's feedback on our earlier responses to your concerns. Please let us know if any points need further clarification.

Comment

We thank the reviewers for their insightful comments and suggestions. We have updated the manuscript to address:

  • Clarifications on the map prior vs. HD-maps
  • Extended introduction to better contrast with prior work
  • Updated figure captions for clarity
  • Improved Table 1's formatting and citation style
  • Included discussion of related map-based approaches

Here, we address a common question on the benefits of using map priors for object detection.

On map priors for object detection

Empirically, there is an improvement when using past observations of scenes to detect objects that are likely dynamic. This has been shown in prior work (HindSight [1]) and in Table 1.

A few potential explanations for this are:

  • a. Static parts of the scene do not change, so their inference is directly improved by the learned prior.
  • b. Past locations of dynamic objects indicate likely positions of future instances (e.g., cars always driving on the roads). This is a well-studied prior in human cognition [2,3]. If it's useful for humans, it's conceivably useful for models as well.

Once again, we sincerely thank the reviewers for their time and feedback. Please see the detailed responses to each reviewer below.

References:

[1] Y. You et al. "Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception." In ICLR, 2022.

[2] S. Charlton et al. "Driving on Familiar Roads: Automaticity and Inattention Blindness." In Transportation Research Part F: Traffic Psychology and Behaviour, 2013.

[3] P. Intini et al. "Route Familiarity in Road Safety: A Literature Review and an Identification Proposal." In Transportation Research Part F: Traffic Psychology and Behaviour, 2019.

Comment

We thank reviewers tfjT and 1jMR for bringing up the baseline comparison with traditional map priors. We have conducted additional experiments to directly compare DMP against these explicit map priors.

| Method | NDS | mAP |
|---|---|---|
| BEVDet | 0.338 | 0.262 |
| BEVDet + Map Prior | 0.344 | 0.267 |
| BEVDet + DMP | 0.381 | 0.302 |

| Method | NDS | mAP |
|---|---|---|
| BEVFormer | 0.419 | 0.320 |
| BEVFormer + Map Prior | 0.424 | 0.322 |
| BEVFormer + DMP | 0.438 | 0.348 |

In these experiments, we rasterize the map annotations for "road_divider", "lane_divider", "pedestrian_crossing", "road_segment", and "lane" classes to be used as the map prior. To ensure a fair comparison, we use a linear projection + the same Convolutional Fusion block from our method to integrate both these traditional map priors and our learned priors with the sensor features.
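For concreteness, here is a hypothetical sketch of the fusion interface described above; the channel sizes, class count, and exact block structure are our assumptions, not the paper's reported configuration:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: project a rasterized map prior to the BEV feature
# dimension, then fuse by concatenation + convolution. Channel sizes and
# the block structure are our assumptions for illustration.
class PriorFusion(nn.Module):
    def __init__(self, bev_ch: int = 256, prior_ch: int = 5):
        super().__init__()
        # linear projection of the 5 rasterized map classes to BEV channels
        self.proj = nn.Conv2d(prior_ch, bev_ch, kernel_size=1)
        # simple convolutional fusion over the concatenated features
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * bev_ch, bev_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, bev_feat, prior_raster):
        prior_feat = self.proj(prior_raster)
        return self.fuse(torch.cat([bev_feat, prior_feat], dim=1))

fusion = PriorFusion()
out = fusion(torch.zeros(1, 256, 50, 50), torch.zeros(1, 5, 50, 50))
# out has the same shape as the input BEV features: (1, 256, 50, 50)
```

The same fusion block can take either a rasterized HD-map prior or a learned prior as input, which is what makes the two settings directly comparable.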

The results demonstrate that while traditional map priors do provide modest improvements, our fully learned priors achieve stronger performance gains without requiring additional map annotations.

AC Meta-Review

The paper presents a differentiable map prior to improve 3D object detection and segmentation based on learned spatial priors from past traversals of a scene. The idea is interesting, simple and shown to be effective in experiments. However, similar motivations have been considered in prior works and while the differentiability of the proposed approach is appreciated, the paper does not sufficiently distinguish itself from previous works that originated the idea. Overall, given the several ways in which the positioning and experimentation can improve, the paper is not recommended for acceptance at ICLR. It is suggested that the authors incorporate all the review suggestions to improve the paper and resubmit to a future venue.

Additional Comments from Reviewer Discussion

Reviewer hPaK does not favor acceptance, since their query on novelty with respect to NMP and Hindsight was unanswered in the rebuttal. Even if some discussion on it is present in the main paper, a statement on distinctions beyond end-to-end differentiability would be valuable. Their observation on mixed improvements across metrics was also not sufficiently analyzed beyond pointing to prior works. While concerns raised by Reviewer W1RT are deemed by the AC as addressed by the rebuttal, Reviewer tfjT remains unconvinced by the rebuttal regarding choice of map representation and lack of comparisons to specific methods. While Reviewer 1jMR leans to accept, they share concerns with other reviewers on the strength of the map prior baselines. Overall, the reviewers do not support acceptance based on the several ways in which the paper can be improved.

Final Decision

Reject