A simulation-heuristics dual-process model for intuitive physics

Shiqian Li, Yuxi Ma, Bo Dai, Yujia Peng, Chi Zhang, Yixin Zhu

OpenReview PDF

Submitted: 2024-09-14 | Updated: 2024-12-01

Abstract

Keywords

Intuitive physics, physical reasoning, mental simulation, heuristic model

Reviews and Discussion

Review

Rating: 3 | Confidence: 3 | 2024-10-18

This work investigates the role of mental simulation and heuristics in human physics prediction. Whereas previous works have studied mental simulation and heuristic-driven physics prediction in isolation, this paper hypothesizes that humans make use of both of these strategies and switch between them depending on the context and problem difficulty. They design a new “pouring marble task” with more diverse physical properties. Humans are asked to judge the tilt angle needed to pour marbles from cups under various setups.

Strengths

Overall, the paper was well-written and easy to understand. The argument was very clearly laid out in the introduction, and the figures look nice.

Weaknesses

When I first read “We employ a grid search method to optimize both θ for the strategic transition and the noise parameters σ for the IPE, in addition to a group of heuristic parameters ω derived from linear regression,” I wasn’t quite sure what that meant. What objective was being optimized? Further on in the paper, the authors state “A grid search identified the boundary of 68.2 degrees in simulation time and a dynamic positional noise of 0.2 as optimal for mirroring human judgments.” Does this mean that the Simulation-Heuristics Model is fit to the human data? If so, are there separate cohorts for model fitting and model comparison?
I don’t understand the newly proposed heuristic model. As stated in the paper, previous work on heuristic models uses predefined rules or fits to human data. In this paper, the heuristic model is fit to simulation data. How is this an accurate representation of human heuristics, given that humans haven’t seen the examples this model has?
Considering this was only tested on a single problem (pouring), it seems premature to claim this as a general model for resource-efficient physics prediction. If the model were correct, we would expect to see this behavior replicated across a variety of tasks using some unified notion of a computational budget. Time is also not a very good proxy for simulation difficulty.
The authors report correlations and RMSE for different models (Heuristic, IPE, SHM), but I couldn't find any statistical tests that compared these models.
Would the findings in this paper not be consistent with a biased heuristic or simulation model? Is there any reason a miscalibrated physics simulation that got the friction, mass, etc. parameters wrong wouldn’t also lead to angle-dependent prediction error?
In Figure 1c, red is used for both the mean angle estimate and the A-shape points. This was confusing at first glance.
No code is provided, which limits reproducibility.
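To make the question about separate fitting and comparison cohorts concrete, here is a minimal sketch of a grid search over a boundary θ and a noise-like parameter σ that is fit on one cohort and scored on another. Everything in it is an assumption for illustration: the paper's objective is not stated in the quoted passage, so RMSE against human responses is assumed; `toy_model`, the parameter grids, and the cohort data are hypothetical and synthetic, not the authors' model or data.

```python
import itertools
import math


def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    )


def toy_model(cost, theta, sigma):
    # Hypothetical stand-in for a fitted model: tracks `cost` up to the
    # boundary `theta`, is flat beyond it, and is scaled by (1 + sigma).
    return min(cost, theta) * (1.0 + sigma)


def grid_search(trials, thetas, sigmas):
    """Return the (theta, sigma) pair with the lowest RMSE on `trials`."""
    costs = [c for c, _ in trials]
    humans = [h for _, h in trials]

    def loss(params):
        theta, sigma = params
        return rmse([toy_model(c, theta, sigma) for c in costs], humans)

    return min(itertools.product(thetas, sigmas), key=loss)


# Synthetic (cost, response) pairs generated with theta=60, sigma=0.1.
fit_cohort = [(20.0, 22.0), (50.0, 55.0), (80.0, 66.0), (100.0, 66.0)]
held_out_cohort = [(30.0, 33.0), (70.0, 66.0), (90.0, 66.0)]

# Fit on one cohort only.
best_theta, best_sigma = grid_search(
    fit_cohort, thetas=[40.0, 60.0, 80.0], sigmas=[0.0, 0.1, 0.2]
)

# Score on the cohort that played no part in fitting.
held_out_rmse = rmse(
    [toy_model(c, best_theta, best_sigma) for c, _ in held_out_cohort],
    [h for _, h in held_out_cohort],
)
```

Reporting `held_out_rmse` rather than the fitting-cohort error is what would separate model fitting from model comparison in this setup.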

Questions

See weaknesses.

Review

Rating: 3 | Confidence: 4 | 2024-11-04

This paper proposes a new framework, the Simulation-Heuristics Model (SHM), which is built on top of a linear heuristic model to replicate human predictions in place of time-expensive simulation.

Strengths

The paper asks original questions about how humans reason and how often they employ simulation versus heuristics to make predictions. The paper performs a user study to assess the impact and significance of the idea.

Weaknesses

The paper is poorly written and presented. Contributions and results, which should be highlighted and form the thesis of the paper, are buried in details. If I understand correctly, the primary idea in the paper is to approximate a noisy simulation (which is computed using equation 1) using a linear model (represented in equation 2) after a certain time threshold for the simulation is met. There are two issues here. First, I don't think physical simulations can be represented using a quartic equation. Were other heuristic models considered? Second, deciding on the time threshold after which the heuristic should kick in requires a held-out validation set. How was this done with only 43 participants?

Questions

How did other heuristic models do (for example, neural nets)? What are the inputs to this heuristic model?
Are the results from a user study of 43 participants statistically significant enough to claim that this dual process works better than the IPE? Was a held-out validation set used to compute the results, with the time threshold decided by grid search on a training set?
If the heuristic (equation 2) is supposed to "mimic/predict" the simulation (equation 1), how is it possible for SHM to outperform the IPE in practice? What data was the heuristic trained on?

Review

Rating: 8 | Confidence: 2 | 2024-11-07

This work presents a methodology for investigating the intuitive physics engine. A pouring-marble task is designed with various conditions, and the results show some interesting behavior in cognitive strategies. Inspired by this, a framework called SHM is proposed for human mental simulation that aligns more closely with human behavior.

Strengths

The research topic is interesting. Compared with previous work, the scenario is more complex, and the experiment shows the effectiveness of the new modeling approach.

Weaknesses

An important contribution claimed in this paper is that, compared with previous works that mainly focused on a single task, this work provides a systematic methodology for learning heuristics. However, there is only one task in this paper, albeit with varied conditions. I recommend adding another task with similar settings to show the general utility.

Questions

The modeling approach aims for a systematic methodology. How general is this model? Can it handle scenarios in which the boundary cannot be described with a single parameter?

Review

Rating: 3 | Confidence: 3 | 2024-11-09

The paper introduces a new framework, the Simulation-Heuristics Model (SHM), which conceptualizes intuitive physics as a dual process: Intuitive Physics Engine (IPE) dominates in short-term simulations, while a heuristic-based approach takes over when the IPE’s simulation extends beyond a certain time boundary.
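The switching rule summarized above can be sketched in a few lines. This is only an illustration of the dual-process idea as described in this summary, not the authors' implementation: `toy_ipe`, `toy_heuristic`, the feature tuple, and the weights are hypothetical stand-ins, and the boundary value 68.2 is borrowed from the grid-search result quoted in another review purely as an example.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def shm_predict(
    features: Features,
    simulation_cost: float,
    boundary: float,
    ipe: Callable[[Features], float],
    heuristic: Callable[[Features], float],
) -> float:
    """Use the simulator at or below the boundary, the heuristic above it."""
    if simulation_cost <= boundary:
        return ipe(features)
    return heuristic(features)


def toy_ipe(features: Features) -> float:
    # Hypothetical stand-in for a (noisy) physics simulation.
    return 2.0 * features[0]


def toy_heuristic(features: Features) -> float:
    # Hypothetical linear rule: intercept plus a weighted sum of features.
    weights, intercept = (1.5, 0.5), 10.0
    return intercept + sum(w * x for w, x in zip(weights, features))


# Same scene features, different simulation cost relative to the boundary.
short_case = shm_predict(
    (20.0, 4.0), simulation_cost=30.0, boundary=68.2,
    ipe=toy_ipe, heuristic=toy_heuristic,
)
long_case = shm_predict(
    (20.0, 4.0), simulation_cost=90.0, boundary=68.2,
    ipe=toy_ipe, heuristic=toy_heuristic,
)
```

With a single scalar boundary, the two strategies never blend; any softer transition would be a different design from the one described in this summary.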

Strengths

Weaknesses

- I don't think this work is suitable for submission to ICLR, as it lacks AI/ML elements and representation learning, and primarily consists of human experiments. I would recommend that the authors consider submitting it to CogSci or another more relevant conference.

- The task scenarios are too simple. It would be more convincing to see how SHM can help with other downstream real-world tasks.

- How well do existing VLMs perform on the proposed task?

Questions

Refer to the weaknesses.

Withdrawal Notice

2024-12-01

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.