On GitHub: jhamrick/mass-inference-mathpsych-2013
In collaboration with: Peter Battaglia, Tom Griffiths, Josh Tenenbaum
WALL-E the robot's job is to collect and compact trash. He makes these "trash cubes"...
...and has to stack them up into enormous structures.
At first glance, this seems trivial, but reasoning about the physical world in this way is actually quite challenging.
People can reason about the physical world in this way, but by and large we don't even realize when we are doing so. This ability to reason about the everyday physical world is what is called "intuitive physics".
Found photo: a soldier (?) rushing out of the way of an unstable bombed-out building in what seems to be post-blitz France. Flickr.com. Retrieved July 15th, 2013, http://www.flickr.com/photos/theeerin/8515832971/
We encounter situations in which we use intuitive physics in many aspects of our lives. On one end of the spectrum, intuitive physics helps us navigate our world safely. Like this soldier, you would probably instinctively run out of such an unstable and unsafe building. If you didn’t, there is a good chance you would be injured.
Grub, M. Gravity Glue. Retrieved February 21st, 2013, from http://www.gravityglue.com/
On the other end of the spectrum, intuitive physics evokes a sense of wonder and awe when our expectations are violated, as in the case of this rock balancing art structure.
So, these are the types of scenarios I am interested in: how do people reason about the physical world?
And, in particular, how do we infer the parameters of physical objects, such as mass or friction?
Lagerek, C. Forklift truck lifting large container high in the air. Shutterstock.com. Retrieved February 2nd, 2013, from http://www.shutterstock.com/pic.mhtml?id=5677750
For example, this scene should cause you to form strong inferences about the mass of the forklift, the storage container, or possibly both. In order for this scene to be stable, the forklift must be very heavy, and/or the storage container must be very light.
How is it that you are able to infer this parameter - mass - just by looking at a static image?
Todd & Warren (1982); Gilden & Proffitt (1989); Runeson, Juslin, & Olsson (2000); Hecht (1996); Sanborn, Mansinghka, & Griffiths (2013)
There is a long history of work studying how people infer mass in simple, 2D scenarios. For example, upon seeing two blocks collide (such as the ones in this figure), which do you think is heavier?
However, researchers have primarily focused on the limitations of people's reasoning, assuming that they have little to no knowledge of physics.
More recent work by Sanborn, Mansinghka, & Griffiths has reexamined these scenarios.
All of this previous work, however, has focused on simple 2D scenes, whereas the world we live in is 3D, rich, and complex.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
My previous work has looked at how people reason about more complex scenes, such as these towers of building blocks.
We showed people towers like these and asked them questions such as "will the tower fall?" or "in what direction will it fall?" and developed a model based on simulation to explain people's behavior.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
The intuition for the idea of simulation comes from the fact that we can imagine what will happen when the tower falls... we can practically "see" the movement of the blocks once it has started to fall.
So, from these two previous lines of research, we have two ingredients: simulation-based reasoning about complex 3D scenes, and rational inference of mass in simple 2D scenes.
The next logical question is: can people infer mass in complex scenes?
(2 possible trial orderings)
This is exactly what we asked people to do. We ran an experiment in which people saw 40 different towers, in two possible orderings. Each trial was structured as follows:
First, people saw the tower. We then asked them to answer the question, "will the tower fall?" After they answered, we showed them feedback: either binary text feedback, in which they just saw "will fall" or "won't fall" printed on the screen, or both text and visual feedback, where they actually got to see a movie of the tower falling or not falling. Finally, on some trials, we also asked people which color they thought was heavier. We additionally split people into groups based on mass ratio, which determined the feedback they would see.
For example, if yellow is heavier, this is a stable tower, so people would see feedback where the tower does not fall.
On the other hand, if red is heavier, then the tower is unstable, so people would see it fall.
So, depending on the mass ratio, people saw different feedback, which (ideally) would lead them to make different inferences about which color was heavier.
This is indeed what we found. This plot shows the proportion of correct "which color is heavier?" judgments after the 8th trial.
Despite the one anomalous condition, I want to stress how quickly most people seemed to infer the correct mass in general. This is the same plot, but for trial 1, and as you can see, after only this single trial of feedback, many people had already inferred the correct mass. There seems to be some very powerful one-shot learning going on here: people are quite sensitive to the feedback they are getting.
So, we asked the question "can people infer mass in complex scenes?", and the answer is yes, they can!
The next question is: how? What is the mechanism underlying people's ability to make these inferences?
Simulation-based physics knowledge (Hamrick et al., 2011; Battaglia et al., under review)
Rational approach to inferring physical parameters (Sanborn et al., 2009; 2013)
We developed a model that combines the two previous approaches I mentioned earlier: the simulation-based physics knowledge that I have helped to develop, and the rational approach to inferring physical parameters by Sanborn et al.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
I will first describe how the simulation component of the model works.
First, the model views the scene and forms an internal representation of the objects in the scene. However, it has some perceptual uncertainty, so when it perceives the scene there is some noise in the localization of objects.
To give you a sense of what I mean by perceptual noise, I am going to show you a short video of different perceptual samples that the model might take. You can tell that they all come from the same original tower, but that they are all also slightly different.
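The perceptual-noise step can be pictured as drawing jittered copies of the scene. Here is a minimal sketch in Python, with made-up block coordinates and an assumed noise parameter; this is an illustration, not the actual model code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tower: each row is the (x, y, z) center of one block.
tower = np.array([
    [ 0.0, 0.0, 0.5],
    [ 0.1, 0.0, 1.5],
    [-0.1, 0.1, 2.5],
])

def perceptual_sample(blocks, noise_sd=0.05):
    """Draw one noisy 'percept' by jittering each block's position.

    noise_sd is an assumed perceptual-noise parameter, not a value
    taken from the paper.
    """
    return blocks + rng.normal(0.0, noise_sd, size=blocks.shape)

# Several samples of the same tower: recognizably similar, slightly different.
samples = [perceptual_sample(tower) for _ in range(5)]
```

Each sample preserves the overall configuration of the tower while perturbing every block slightly, which is exactly the effect visible in the video.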
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
Ok, so once the model has drawn some perceptual samples, it runs these forward through a process of physical reasoning. This is actually a physical simulation, like that of a physics engine that you might find in a video game.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
Finally, the model takes the final states from the simulations and compares them to the initial states in order to come to a decision on the physical judgment, such as "will the tower fall?". By considering all the samples, the model can approximate a distribution for the probability that the tower will fall. We will take this distribution now and embed it in our inference framework to allow for belief updating.
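This step amounts to a Monte Carlo estimate: run the noisy simulation many times and average the binary outcomes. The sketch below uses an invented toy stand-in for the physics engine (the real model runs a rigid-body simulator), so the specific stability rule is for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_fall(percept):
    """Toy stand-in for a rigid-body simulation: returns True if the
    tower 'fell'. The instability rule here is invented; the real
    model compares initial and final simulated block positions.
    """
    instability = np.abs(percept[:, :2]).sum()  # total horizontal offset
    return rng.random() < min(1.0, instability)

def p_fall(tower, n_samples=2000, noise_sd=0.05):
    """Estimate P(fall) by averaging outcomes over noisy percepts."""
    fell = 0
    for _ in range(n_samples):
        percept = tower + rng.normal(0.0, noise_sd, size=tower.shape)
        fell += simulate_fall(percept)
    return fell / n_samples

tower = np.array([[0.0, 0.0, 0.5], [0.1, 0.0, 1.5], [-0.1, 0.1, 2.5]])
estimate = p_fall(tower)
```

Averaging over samples is what turns a deterministic simulator plus perceptual noise into a graded probability that the tower will fall.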
As before, the model views the scene...
And uses its physics knowledge to compute the probability that the tower will fall given a particular mass ratio. It can then integrate over mass ratios in order to marginalize out that variable, giving the distribution over whether the tower will fall, independent of the true mass ratio.
Next, the model produces a judgment of "fall" or "not fall" using this same distribution (engaging in a probability matching strategy).
Finally, after the model observes feedback, it can evaluate the likelihood of that feedback under the same probability distribution for whether the tower will fall that I described before. It then updates its beliefs using Bayes's Rule, giving a posterior distribution over masses, reflecting belief about which mass ratios are more likely.
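The belief update itself is just Bayes's rule on a set of candidate mass ratios. Below is a minimal sketch with an invented three-point grid and invented P(fall | r) values standing in for the IPE's simulation output:

```python
import numpy as np

# Hypothetical grid of mass ratios r, with a uniform prior p_0(r).
ratios = np.array([0.1, 1.0, 10.0])
belief = np.ones(len(ratios)) / len(ratios)

# Invented stand-in for the IPE: P(fall | tower, r) for one tower.
# In the full model these probabilities come from physical simulation.
p_fall_given_r = np.array([0.9, 0.5, 0.1])

def update_belief(belief, fell):
    """One step of Bayes's rule:
    p_t(r | tower, F_t) is proportional to p(F_t | tower, r) p_{t-1}(r).
    """
    likelihood = p_fall_given_r if fell else 1.0 - p_fall_given_r
    posterior = likelihood * belief
    return posterior / posterior.sum()

# Seeing this tower fall shifts belief toward the ratio that predicts falling.
belief = update_belief(belief, fell=True)
```

Repeating this update across trials, with P(fall | r) recomputed per tower, is what lets feedback accumulate into a confident belief about the true mass ratio.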
Guesses "will fall" 50% of the time, and "won't fall" the other 50%.
\[p(J) = 0.5\]
Uses IPE to make predictions, but ignores feedback/does not update beliefs about mass.
\[p_t(r\ |\ \mathrm{tower}, F_t) = p_{t-1}(r)\]
Uses IPE to make predictions and to update beliefs about mass.
\[p_t(r\ |\ \mathrm{tower}, F_t)\propto p(F_t\ |\ \mathrm{tower}, r)p_{t-1}(r)\]
In order to see how well our model could explain people's behavior, we compared it with two other models.
First, we computed the model fit for a model that guesses randomly – that is, it randomly says “will fall” 50% of the time, and “won’t fall” the other 50% of the time.
Next, we computed the fit for a model that has knowledge of physics, but doesn’t actually update its beliefs – in other words, it ignores all feedback.
Finally, we computed the fit for the full learning model, which has knowledge of physics and updates its belief about the true mass ratio after observing feedback on each trial.
For each of these models, we evaluated the likelihood of people's responses to the question of "will it fall?".
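Concretely, each model can be scored by the Bernoulli log-likelihood of the sequence of binary "will it fall?" responses under its per-trial predictions. The responses and model predictions below are made up purely to illustrate the computation:

```python
import numpy as np

def log_likelihood(responses, p_fall):
    """Bernoulli log-likelihood of binary responses (1 = 'will fall')
    under a model's per-trial probability of responding 'will fall'.
    """
    responses = np.asarray(responses, dtype=float)
    p = np.clip(np.asarray(p_fall, dtype=float), 1e-10, 1 - 1e-10)
    return float(np.sum(responses * np.log(p)
                        + (1.0 - responses) * np.log(1.0 - p)))

responses = [1, 0, 1, 1]            # hypothetical judgments on four trials
chance    = [0.5, 0.5, 0.5, 0.5]    # chance model: p(J) = 0.5 on every trial
learning  = [0.8, 0.2, 0.7, 0.9]    # hypothetical learning-model predictions

ll_chance = log_likelihood(responses, chance)
ll_learning = log_likelihood(responses, learning)
```

A model whose predicted probabilities track people's actual responses earns a higher log-likelihood than the flat chance baseline.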
If people are taking the knowledge they inferred about mass and generalizing it to other types of physical reasoning, then their responses to "will it fall?" should change over time as their beliefs about the latent mass parameter become more accurate.
So, we should see that the learning model does a better job of explaining their judgments than the fixed or the chance model.
If, however, people do not generalize their inferences, then we should see that the fixed model does better than the learning and chance models.
Here is chance performance -- it is the same for all conditions because again, it just assumes people guess "will fall" 50% of the time and "won't fall" the other 50%.
As we would expect, the learning model is better than chance at explaining people's behavior, except in the condition where people did not learn.
In most conditions, the learning model is also significantly better than the fixed model, indicating that people's knowledge does transfer to other types of physical judgments.
There were two other cases in which the fixed model and the learning model were equally good at explaining people's behavior, though both were better than chance.
In general, however, these results indicate that the learning model is at least a good starting point for explaining how people update their beliefs about mass over time and how they generalize that knowledge to other types of physical judgments.
If it is in fact the case that people were sensitive to the "anti-informativeness" of the confusing feedback in the \(r=0.1\)/binary/order 1 condition, then we should be able to vary this information content and see systematic changes in people's responses.
Relatedly, if it was the extra information in the videos that allowed people in the visual feedback condition to recover from the confusing feedback, then we would like to know which extra information they paid attention to. As I mentioned before, they could also be using features such as the direction of the fall or the number of blocks that fell, in addition to fall/not fall.
Finally, people seemed to be more sensitive to information that was present at the beginning of the experiment. An ideal Bayesian learner should not exhibit any order effects, so perhaps the primacy effect we saw was a result of resource limitations or imperfect belief updating.
This is an important step towards understanding how people learn about the environment around them, and gives a framework for modeling their behavior in physical scenarios.
In conclusion, we asked the question, "can people infer mass in complex scenes?". We determined that the answer is yes, and that they are very adept at doing so -- people can infer the correct mass after receiving only binary feedback, and sometimes they can make the correct inferences after only a single trial.
We developed a model which explained people's general behavior by combining approximate knowledge of Newtonian physics (as instantiated in a probabilistic simulation) in a rational framework for inferring unobservable parameters like mass.
This work is an important step in understanding and explaining how people behave and reason about the physical world, and provides a framework for studying how they learn about other parts of their environment (for example, other parameters like friction, or perhaps different dynamics entirely).
Battaglia, Hamrick, & Tenenbaum (under review)