PaperHub
4.3
/10
Rejected4 位审稿人
最低3最高8标准差2.2
3
3
3
8
3.0
置信度
ICLR 2024

Planning with an Ensemble of World Models

OpenReviewPDF
提交: 2023-09-16更新: 2024-02-11
TL;DR

This work propose a set of world models for better motion planning and evaluation of planners.

摘要

关键词
Motion PlanningEvaluating Motion PlanningWorld Models

评审与讨论

审稿意见
3

This paper introduces the discrepancy problem regarding the realism of a public motion planning simulation environment. The author argues that the existing simulation environment cannot accurately replicate city-specific driving behaviors and proposes to assess realism by measuring the planning performance when replaying recorded human driver logs within the simulation. While the behavior of agents in the existing simulation environment is governed by a set of hyperparameters, such as maximum acceleration and deceleration, the author proposes a city specific environment by tuning these hyper-parameters according to human motion planning performance. Then, a MPC based city-specific motion planner is introduced by identifying the city through map classification and utilizing city-specific world models. The authors conducted an evaluation using a public motion planning benchmark.

优点

This paper raises a valid problem on the lack of behavior realism in the NuPlan reactive simulation environment with statistics support their claim (Figure 1).

The city-specific NuPlan environment could be beneficial for the community.

缺点

Quantify the realism of driving behaviors for non-playable agents in simulation is a challenging problem. I can not agree with the author on measuring realism of the simulator using replayed human driving logs. The simulated agents may just behave more diversely in the simulation which could cause the replayed human driving logs to have a poor planning performance due to collisions. In fact, some could argue that realistic simulation requires the non-playable agents to be flexible and diverse while behaving realistically in distribution[1]. Extensive clarification on the reason for using replay human logs as an measurement of realism would be needed.

I think the proposed work depends heavily on the previous success of PDM-C planner which is not carefully discussed in this paper.

I am not sure if I would say the author proposed city-specific gyms "closely model real-world behavior statistics" in Figure 1. Maybe a KDE plot would help to see the difference.

[1] Simon Suo, Sebastian Regalado, Sergio Casas, and Raquel Urtasun. Trafficsim: Learning to simulate realistic multi-agent behaviors. In CVPR, 2021.

问题

Question mentioned in the weakness section would help clarify the reason for using replayed human driving logs to measure realism of the simulator.

Figure 2, are the blue lines represent the center line of each lane?

At the end of section 3, the author mentioned simulating "Boston-like driving behavior for driving logs collected in Pittsburgh." Could the author please explain in detail how this simulation was achieved?

评论

We thank the reviewer for the valuable feedback acknowledging the value of the problem and usefulness to the community.

Realism of simulator: Quantify the realism of driving behaviors for non-playable agents in simulation is a challenging problem....

We agree with the reviewer that the quantification of realism is a challenging problem. We address the above concern under Realism of simulator in the common rebuttal.

Novelty of adaptive planner: I think the proposed work depends heavily on the previous success of PDM-C planner which is not carefully discussed in this paper.

We would like to emphasize that we use PDM-C as a baseline planner as stated in Table 2. We propose a few modifications to the planner: a) we changed the predictive model of agents’ behavior from constant velocity used in PDM to IDM model (Section 4, Improving PDM-C’s world model), b) adapting the behavior model of agents based on the city (Section 4, City Identification for adaptive planning), c) Minor one: we extend the number of proposals from just lateral and longitudinal variations to other variations like offsets to max acc and max dec. Our primary focus of the work concerns to improve the agents’ reactive behaviors in Gym simulators in nuPlan evaluation benchmark (in C3 settings) to reflect the real-world human behavior. For this, we propose parametric city-specific Gyms for each city in the nuPlan dataset as tabulated in Table 4. The optimized parameters for city-specific-gyms provide several insights of the agents’ behavior in each city. Further, we wanted to model the aspects of city-specific behaviors in an MPC-based planner. We picked PDM as a baseline. The default PDM planner modeled the agents’ behavior in the world as a constant velocity predictive model. We modified the constant velocity model to an IDM model with optimized parameters for each city and we called it City-Driver. Our main contribution is not developing a new adaptive planner but re-purposing the existing s-o-t-a MPC-based planner (PDM) to make it city-adaptive. As a by-product, our proposed City-Driver achieves state-of-the-art results on the nuPlan benchmark, reducing test error from 6.4% to 4.8%. We will try to polish the paper to emphasize our main contribution and tone-down any over-claim of adaptive planner.

I am not sure if I would say the author proposed city-specific gyms "closely model real-world behavior statistics" in Figure 1. Maybe a KDE plot would help to see the difference.

Thanks for the suggestion. We can add the plot in the paper.

评论

Thank you for replying some of the questions from the reviewers. Overall, I tend to agree with reviewers 8uCs and np2p on the limited contributions of the proposed city-dependent environment and the presentation of the work. In particular, enforcing the setup of studying realism in simulations to Nuplan and the Nuplan benchmark alone with limited measurements of realism. The presentation of the work needs to focus more on the main contribution which may be improving the realism of the simulator. Backing up claims with sufficient data analysis (in addition to Figure 1) (like action distribution of non-ego agents, number of non-parked agents?) is crucial for establishing the validity of your argument.

审稿意见
3

This paper focuses on motion planning for autonomous vehicles (AVs).

The contributions are as follows.

First, it identifies a limitation/bug of the nuPlan simulator, a data-driven reactive simulator for motion planning for AVs. The limitation is that in the simulation, when controlling other vehicles with rule-based planners, they are always projected to the road, including the parked cars. This makes the simulation unrealistic. They address the limitation by allowing the other vehicles to follow their ground-truth trajectories while still being reactive. Experiments show that this makes the simulation more realistic, based on the metric to what extent can the simulator reproduce the behaviors of other vehicles when the ego-vehicle follows the ground-truth trajectory.

Second, the authors argue that each city has its own dynamics and challenges. As such, for better evaluation of the planners, we should make city-specific benchmark scenario. To do so, they optimize the behaviors of other agents for each city using city-pecific data. Experiments show that without surprise, the planner that uses a generic model performs in city-specific benchmark scenarios. To improve the performance, the authors propose to learn a classifer to identify the current city and use the corresponding model to do rollouts during planning. Experiments are performed to verify the effectiveness of this approach.

优点

The presentation of the ideas is good, with illustrating plots and examples. The paper is easy to read. Extensive experiments are performed to verify the claims.

缺点

My main concern is on the contributions of this paper.

The first contribution, which is to test and improve the existing simulators, is very limited. It focuses on a single pre-existing simulator. The realism of simulation is measured and improved on a single metric: accuracy of other agents if the ego-vehicle follows ground-truth. This metric ignores other important factors regarding the realism of simulation: what happens if the ego-vehicle does not follow ground-truth.

The second contribution, making a number of models for a number of scenarios and using a classifer to learn to use which model for the current scenario during planning, is also not novel. Moreover, I don't find this method promising. First of all, there is an issue of scalabilities. When there are more cities, you need to make more models, and the classifer needs to be bigger. Second, there is an issue of generalization. What if the vehicle is running in an unseen city. Third, apart from the city, there are many other factors that could affect the dynamics, such as if it is peak hour.

问题

  1. I understand that compared to C3, C4 allows the other agents/vehicles to traverse on the ground-truth trajectory without being projected to the road while remaining reactive. But how? I cannot find the details.
评论

Thanks for the valuable feedback and acknowledging the quality of presentations of ideas in the paper with appropriate plots and examples.

Realism of simulator: The first contribution, which is to test and improve the existing simulators, is very limited. It focuses on ...

We address the above concern under Realism of simulator in the common rebuttal. Kindly go over it. We can discuss the prospects of any new metrics for this case, if you have any ideas to throw. We can continue the distinction.

Scalability and generalizability of city-specific models: The second contribution, making a number of models for a number of scenarios and using a classifer to learn to use which model for the current scenario during planning, is also not novel...

We wish to emphasize that the city classifier is not a major contribution in this work. We addresses your above concern under Scalability and generalizability of city-specific models in the common rebuttal.

I understand that compared to C3, C4 allows the other agents/vehicles to traverse on the ground-truth trajectory without being projected to the road while remaining reactive. But how? I cannot find the details.

This is quite a simple change. Notable point is that the road graph is built using nodes and edges on the road lane centreline. And, this is different from the agent ground truth trajectory in the recorded log. For C3 simulation, the non-playable agents are projected to the closest road graph node and then IDM logic is employed to navigate. Here, even the parked cars are snapped to the road graph and it moves along on the road graph. For C4, the simple change we made is to restrict the agent path only on the ground truth trajectory. This way, scenario simulation in C4 appears more like a recorded log. On visible evidence is that this allowed parked cars to be as parked cars in C4.

审稿意见
3

The paper proposes an approach in which datasets that are turned into simulators can be evaluated for realism, for motion planning experiments. To this end, the work presents a way to measure how different the simulated environment is from real-world situations, and suggests the use of an adaptive planner for a reactive environment. The specific datasets and simulators here are CARLA and nuPlan, used for navigation in traffic.

优点

The question / problem the paper is addressing is interesting and useful. It starts from an investigation of statistical differences between places, and finds the nuPlan simulator doesn't reflect real-world human behavior.

I also like the work helps to spotlight city-specific insights, and ways to deal with such differences.

The formal presentation of the work is good.

缺点

The delineation of contributions seems somewhat ambiguous, particularly concerning the novel aspects beyond the insights into nuPlan. The introduction cites the development of an "adaptive-planner" as a major contribution. However, upon reading the paper, it appears that the problem is addressed using an existing planner rather than introducing a new one. This discrepancy between the stated contribution and the actual content may benefit from clarification.

As a result, presentation of the work itself is a probably one of the main weaknesses.

Additionally, the paper could be strengthened by discussing various methodologies to assess realism. Incorporating statistical measures such as summary statistics, event counts, time intervals between events, or probability distributions could offer a more comprehensive evaluation. The current approach might inadequately capture infrequent but significant events. It is also possible that the methods employed are not described in sufficient detail to fully convey their scope and impact.

As a more general idea I'm wondering if it would be worthwhile looking at the individual distributions more closely to investigate the performance of the default behaviors as a result of averaging out multiple modalities; potentially could be solved by combining GMM/MPC where the GMM model the behavior of the traffic. There's quite a bit of work on stochastic MPC in the context of autonomous driving, including non-gaussian trajectory planning, and insights into how this work applies to the given problem would be interesting. (possible relevant references eg https://ieeexplore.ieee.org/document/9133136 or https://arxiv.org/abs/2002.10999)

Minor comments:

  • I don't think introduction of the "Gym" terminology is necessary
  • If I understand correctly, the plots in figure 1 are from 2 simulators and the real world. The caption says using three different simulators. I assume that is a mistake.
  • the references in related works (trajectory optimization paragraph) should be in parenthesis (citep not citet).

问题

Is "city-driver" as the 2nd listed contribution mainly the objective function? I have the feeling I misunderstood this point.

评论

We thank the reviewers for the valuable feedback acknowledging the presentation of the work and city-specific insights.

Novelty of adaptive planner: The delineation of contributions...

We would like to emphasize that we use PDM-C as a baseline planner as stated in Table 2. We propose a few modifications to the planner: a) we changed the predictive model of agents’ behavior from constant velocity used in PDM to IDM model (Section 4, Improving PDM-C’s world model), b) adapting the behavior model of agents based on the city (Section 4, City Identification for adaptive planning), c) Minor one: we extend the number of proposals from just lateral and longitudinal variations to other variations like offsets to max acc and max dec. Our primary focus of the work concerns to improve the agents’ reactive behaviors in Gym simulators in nuPlan evaluation benchmark (in C3 settings) to reflect the real-world human behavior. For this, we propose parametric city-specific Gyms for each city in the nuPlan dataset as tabulated in Table 4. The optimized parameters for city-specific-gyms provide several insights of the agents’ behavior in each city. Further, we wanted to model the aspects of city-specific behaviors in an MPC-based planner. We picked PDM as a baseline. The default PDM planner modeled the agents’ behavior in the world as a constant velocity predictive model. We modified the constant velocity model to an IDM model with optimized parameters for each city and we called it City-Driver. Our main contribution is not developing a new adaptive planner but re-purposing the existing s-o-t-a MPC-based planner (PDM) to make it city-adaptive. As a by-product, our proposed City-Driver achieves state-of-the-art results on the nuPlan benchmark, reducing test error from 6.4% to 4.8%. We will try to polish the paper to emphasize our main contribution and tone-down any over-claim of adaptive planner.

Realism of simulators: Incorporating more statistical measures for evaluation... The current approach might inadequately capture infrequent but significant events...

We urge the reviewer to read the realism of simulators part in the common rebuttal comment. We aim to improve the realism of agent behaviors in a constrained setup. We would like to argue that the agents’ behaviors have limited scope or flexibility in nuPlan C3 evaluation settings.

We strongly believe that we covered a fair amount of possibilities of agent behaviors optimizing for the city-level behavior modeling. During the rebuttal period, we run the same optimization at the scenario level as shown in scalability and generalizability in the common comment. This will further capture more infrequent agent behaviors. However, we hypothesize that we can push the same optimization at the agent-level for more infrequent agent behaviors and bring out more significant events in lines with your comments. This is currently out of scope in this work which we proposed as future works. We agree with your comments on statistical measures. We currently provide the stats of minimum gap between agents as one of the reliable measure to show the difference between our proposed Gym and the Default-Gym in comparison with real-world behavior as in recorded data. We can try to compute more statistical measures and include them in the final paper.

As a more general idea I'm wondering if it would be worthwhile combining GMM/MPC where the GMM model the behavior of the traffic...

With the comment “averaging out multiple modalities”, our guess is that we average out many traffic behaviors in each city into a single city-level agent behavior model. You are right in pointing it out. To tackle this, instead of optimizing behavior model for each city, we learn to optimize the agent behaviors into different clusters. For more details, kindly refer scalability and generalizability part in the common rebuttal comment. We will look into the works and also cite in our works.

Is "city-driver" as the 2nd listed contribution mainly the objective function? I have the feeling I misunderstood this point.

City-Driver is our proposed city-adaptive planner where we repurpose MPC-based PDM planner with a city-specific agent behavior (predictive) model. This agent behavior model is a parametric IDM model optimized using objective function in Equation 1.

审稿意见
8

This paper introduces a novel approach to enhance motion planning in autonomous vehicles by using city-specific simulations. It addresses the limitations of standard simulators, which often fail to mimic real-world driving behaviors accurately, particularly in different urban contexts. The paper proposes the creation of city-specific gyms, such as Boston-Gym or Pittsburgh-Gym, that better capture the unique driving characteristics of each city. This approach leads to more realistic and effective training of motion planning algorithms. Additionally, the paper introduces 'City-Driver,' a model-predictive control-based planner that adapts to various driving conditions by using these city-specific world models, showing significant improvements in performance on the nuPlan benchmark. The research highlights the importance of considering local driving behaviors for accurate and efficient motion planning in autonomous vehicles. Extensive Testing: Demonstrates through extensive experiments that the City-Driver model achieves state-of-the-art results on the nuPlan benchmark, suggesting that adapting to city-specific driving characteristics is crucial for accurate motion planning.

优点

Innovative Approach: The paper introduces a novel concept of city-specific simulations for motion planning in autonomous vehicles, addressing a crucial gap in existing simulation methodologies.

Realistic Simulations: By creating city-specific gyms, such as Boston-Gym and Pittsburgh-Gym, the paper significantly enhances the realism of autonomous driving simulations, ensuring that they better reflect unique local driving behaviors.

Improved Accuracy: The introduction of the City-Driver model, a model-predictive control-based planner, demonstrates notable improvements in motion planning accuracy and performance, as evidenced by its results on the nuPlan benchmark.

Practical Application: The research directly addresses a practical challenge in autonomous vehicle development, offering solutions that could be integrated into real-world systems.

缺点

Complexity and Scalability: The approach of creating city-specific simulations could be complex and resource-intensive, potentially challenging to scale across numerous cities with distinct driving behaviors.

Generalization: While city-specific models offer increased accuracy, they might limit the generalizability of the motion planning algorithms. A system trained in one city might not perform as well in another without significant retraining.

问题

The reviewer really likes the perspective the authors propose regarding the intrinsic different distribution for driving among different cities. Several minor concerns are:

  • The nuplan dataset is not well balanced (more data collected in Las Vegas compared to the others). Is this going to affect the results to some extents (e.g., LV driver seems to be better than other cities)?
  • While the geolocation would impact the overall driving behaviors, it is quite implicit in some sense. Do the authors have an idea of how to measure this driving behavior distribution shift in a better way?
评论

We sincerely thank the reviewer for the positive comments about the paper.

Scalability and Generalizability: a) Complexity and Scalability: The approach of creating city-specific simulations could be complex and resource-intensive, potentially challenging to scale across numerous cities with distinct driving behaviors and b) Generalization: While city-specific models offer increased accuracy, they might limit the generalizability of the motion planning algorithms. A system trained in one city might not perform as well in another without significant retraining.

We address the above 2 concerns under Scalability and generalizability of city-specific models in the common rebuttal.

The nuplan dataset is not well balanced (more data collected in Las Vegas compared to the others). Is this going to affect the results to some extents (e.g., LV driver seems to be better than other cities)?

For all optimization and evaluation in the paper, we fetched the same number of scenarios from each city. For instance, we use 100 nuPlan test scenarios for every scenario type from each city for evaluation, amounting to 1400 scenarios per city. For optimization, we fetch 1400 nuPlan train scenarios from each city.

While the geolocation would impact the overall driving behaviors, it is quite implicit in some sense. Do the authors have an idea of how to measure this driving behavior distribution shift in a better way?

In the common rebuttal comment, we propose to learn clustering of driving behaviors instead of city based clusters. We believe that classifying and plotting the scenarios from cities in these cluster space may give a good idea of driving behavior distribution of the cities.

评论

Reviewers np2p and 2dAe acknowledge the presentation of the ideas and extensive experiments to verify the claims. Reviewer RiCh acknowledges the statistics provided for lack of behavior realism in nuPlan.

Reviewers 8uCs, np2p and RiCh recommend a reject rating. We summarize and address their major concerns below:

We summarize and address the concerns of reviewers below:

Realism of simulator. Reviewers np2p are concerned that simulator quality is measured with a single metric replaying ego trajectories; for example, RiCh suggests that diversity of simulated trajectories should also be a metric.

We wholeheartedly agree with reviewer RiCh’s comment that the quantification of simulator realism is an open problem. We would like to bring to your attention that default simulator used in the nuPlan’s benchmark is a deterministic simulator i.e, the agents use a world model that uses IDM logic to unroll their trajectories which do not exhibit any variability in their trajectories during the simulation. In this constrained setup, we aim to build a better deterministic simulator world model for agents that reflect human behavior as seen in recorded data. For this, we set up the ego planner to follow log data and the non-playable agents learn simulator world models by optimizing 5 metrics: goal progress, time to collision, driving area compliance, driving direction compliance, vehicle dynamics measures (Refer nuPlan metrics). Our proposed optimized simulator model not only fetches a high score for Log Replay planner improving from 0.93 to 0.95 as shown in Table 3 of the paper but also improves planning accuracy in the nuPlan’s benchmark, reducing SOTA error from 6% to 4% (Table 8) when our world models are used for non-playable agents within model predictive control (MPC) based planners. With regards to comment on the diversity, we believe that diversity in agent behaviors in nuPlan benchmark is enforced by hand-picking diverse test scenarios. In this work, we aim to just simulate the non-playable agents to match the recorded log data and not to generate a diverse set of scenarios from a particular log scenario as in [1]. There are few other works that focus on realism of simulators without using diversity measures as noted in Table 2 of [2]. That said, we are open to ideas to improve the realism of non-playable agent behaviors in the above setup or in general. We are quite happy to take the discussion forward.

Scalability and generalizability of city-specific models. Reviewers 2dAe and np2p are concerned about these.

The primary focus of the work is to improve the agent simulations and ego planning performance in nuPlan benchmark. The nuPlan dataset covers 4 cities for both training and evaluation. Hence, our approach focused tabulating results for these given cities. However, we appreciate your concerns for the general AV settings for scalability and generalizability. We will add the following discussion to the paper.

For general AV setup, without loss of generalizability, we can learn clustering of distinct driving behaviors from nuPlan dataset instead of learning city-specific models. With the given nuPlan recorded logs covering diverse, distinctive driving behaviors, we can build clusters of different driving behavior models. Given a scenario in a new city, we can classify it to a cluster which is the closest driving behavior of agents and use it as an agent behavior model in our planner. We build clusters of distinctive driving models based on kmeans clustering. We learn these clusters on the IDM parameter space using the objective function in Equation 1. We initialize clusters randomly and we ablate on number of clusters and iterations. With this cluster-based approach which we call Behavior-Gym, we see improvements in Gym performance on the realism metrics.

ModelsGym modelsCityC3
Human replayBehavior-GymAll96.94

Further it also improves the Driver model to have improved planning performance on nuPlan benchmark as shown below. Here, similar to the city classifier which we proposed in the paper, we employ a behavior classifier. During the inference of planners, we take the inputs of history of the ego and other agents in the scene and pass through LaneNet as described in Section 4 and output the cluster. The behavior model of that cluster is used as the agent behavior model in our proposed planner. Using this, we see that we can further improve the nuPlan benchmark by reducing the error from 6.4% to 4.6 %. We will add this to Table 8.

ModelsDriver modelsCityC2C3
OursBehavior-DriverAll95.0995.32

References:

[1] Simon et al. Trafficsim: Learning to simulate realistic multi-agent behaviors. In CVPR, 2021.

[2] Montali et al. The waymo open sim agents challenge. arXiv 2023.

AC 元评审

Summary of the Paper: The paper presents an approach to enhance motion planning in autonomous vehicles by creating city-specific simulations and an adaptive planner, City-Driver. It addresses the nuPlan simulator's limitations in accurately replicating diverse urban driving behaviors. By tuning hyperparameters to city-specific data, the paper proposes more realistic simulations and introduces a model-predictive control-based planner that adapts to varying driving conditions using city-specific world models. The paper's evaluation on the nuPlan benchmark demonstrates improvements in motion planning accuracy and performance, underscoring the significance of localized driving behaviors in autonomous vehicle development.

Strengths: The paper's primary strength lies in its approach to motion planning through city-specific simulations, addressing the gap in existing simulation methodologies. This approach enhances the realism of autonomous driving simulations, ensuring they reflect unique local driving behaviors. The introduction of the City-Driver model shows notable improvements in motion planning accuracy and performance. The paper is also practically oriented, directly addressing challenges in autonomous vehicle development, and offers solutions that could be integrated into real-world systems.

Weaknesses: The paper's approach of creating city-specific simulations raises concerns about complexity, scalability, and generalizability. The resource-intensive nature of this approach could also make it challenging to scale across multiple cities with distinct driving behaviors. Furthermore, the focus on city-specific models might limit the generalizability of motion planning algorithms.

为何不给更高分

As per the weakness described above. Due to the domain specific scope of the problem and method, I also think that it may be better to submit this work to a conference specialized to autonomous vehicle technology.

为何不给更低分

N/A

最终决定

Reject