On GitHub: jhamrick/mass-inference-mathpsych-2013
In collaboration with: Peter Battaglia, Tom Griffiths, Josh Tenenbaum
WALL-E the robot's job is to collect and compact trash. He makes these "trash cubes"...
...and has to stack them up into enormous structures.
At first glance, this seems trivial, but reasoning about the physical world in this way is actually quite challenging.
People can reason about the physical world in this way, but by and large we don't even realize when we are doing so. This ability to reason about the everyday physical world is what is called "intuitive physics".
Found photo: a soldier (?) rushing out of the way of an unstable bombed-out building in what seems to be post-blitz France. Flickr.com. Retrieved July 15th, 2013, http://www.flickr.com/photos/theeerin/8515832971/
We encounter situations in which we use intuitive physics in many aspects of our lives. On one end of the spectrum, intuitive physics helps us navigate our world safely. Like this soldier, you would probably instinctively run out of such an unstable and unsafe building. If you didn’t, there is a good chance you would be injured.
Grub, M. Gravity Glue. Retrieved February 21st, 2013, from http://www.gravityglue.com/
On the other end of the spectrum, intuitive physics evokes a sense of wonder and awe when our expectations are violated, as in the case of this rock balancing art structure.
So, these are the types of scenarios I am interested in: how do people reason about the physical world?
And, in particular, how do we infer the parameters of physical objects, such as mass or friction?
Lagerek, C. Forklift truck lifting large container high in the air. Shutterstock.com. Retrieved February 2nd, 2013, from http://www.shutterstock.com/pic.mhtml?id=5677750
For example, this scene should cause you to form strong inferences about the mass of the forklift, the storage container, or possibly both. In order for this scene to be stable, the forklift must be very heavy, and/or the storage container must be very light.
How is it that you are able to infer this parameter - mass - just by looking at a static image?
Todd & Warren (1982); Gilden & Proffitt (1989); Runeson, Juslin, & Olsson (2000); Hecht (1996); Sanborn, Mansinghka, & Griffiths (2013)
There is a long history of work studying how people infer mass in simple, 2D scenarios. For example, upon seeing two blocks collide (such as the ones in this figure), which do you think is heavier?
However, researchers have primarily focused on the limitations of people's reasoning, assuming that they have little to no knowledge of physics.
More recent work by Sanborn, Mansinghka, & Griffiths has reexamined these scenarios.
All of this previous work, however, has focused on simple 2D scenes, whereas the world we live in is 3D, rich, and complex.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
My previous work has looked at how people reason about more complex scenes, such as these towers of building blocks.
We showed people towers like these and asked them questions such as "will the tower fall?" or "in what direction will it fall?" and developed a model based on simulation to explain people's behavior.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
The intuition for the idea of simulation comes from the fact that we can imagine what will happen when the tower falls... we can practically "see" the movement of the blocks once it has started to fall.
So, from these two previous lines of research, we have two ingredients: simulation-based reasoning about complex 3D scenes, and rational inference of mass in simple 2D scenes.
The next logical question is: can people infer mass in complex scenes?
(2 possible trial orderings)
This is exactly what we asked people to do. We ran an experiment in which people saw 40 different towers, in two possible orderings. Each trial was structured as follows:
First, people saw the tower. We then asked them to answer the question, "will the tower fall?" After they answered, we showed them feedback: either binary text feedback, in which they just saw "will fall" or "won't fall" printed on the screen, or both text and visual feedback, where they actually got to see a movie of the tower falling or not falling. Finally, on some trials, we also asked people which color they thought was heavier. We additionally split people into groups based on mass ratio, which determined the feedback they would see.
For example, if yellow is heavier, this is a stable tower, so people would see feedback where the tower does not fall.
On the other hand, if red is heavier, then the tower is unstable, so people would see it fall.
So, depending on the mass ratio, people saw different feedback, which (ideally) would lead them to make different inferences about which color was heavier.
This is indeed what we found. This plot shows the proportion of correct "which color is heavier?" judgments after the 8th trial.
Despite the one anomalous condition, I want to stress how quickly most people seemed to infer the correct mass in general. This is the same plot, but for trial 1, and as you can see, after only this single trial of feedback, many people had already inferred the correct mass. There seems to be some very powerful one-shot learning going on here: people are quite sensitive to the feedback they are getting.
So, we asked the question "can people infer mass in complex scenes?", and the answer is yes, they can!
The next question is: how? What is the mechanism underlying people's ability to make these inferences?
Simulation-based physics knowledge (Hamrick et al., 2011; Battaglia et al., under review)
Rational approach to inferring physical parameters (Sanborn et al., 2009; 2013)
We developed a model that combines the two previous approaches I mentioned earlier: the simulation-based physics knowledge that I have helped to develop, and the rational approach to inferring physical parameters by Sanborn et al.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
I will first describe how the simulation component of the model works.
First, the model views the scene and forms an internal representation of the objects in the scene. However, it has some perceptual uncertainty, so when it perceives the scene there is some noise in the localization of objects.
To give you a sense of what I mean by perceptual noise, I am going to show you a short video of different perceptual samples that the model might take. You can tell that they all come from the same original tower, but that they are all also slightly different.
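The perceptual-noise step can be pictured as drawing jittered copies of the scene. Here is a minimal sketch in Python, with made-up block coordinates and an assumed noise parameter; this is an illustration, not the actual model code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tower: each row is the (x, y, z) center of one block.
tower = np.array([
    [ 0.0, 0.0, 0.5],
    [ 0.1, 0.0, 1.5],
    [-0.1, 0.1, 2.5],
])

def perceptual_sample(blocks, noise_sd=0.05):
    """Draw one noisy 'percept' by jittering each block's position.

    noise_sd is an assumed perceptual-noise parameter, not a value
    taken from the paper.
    """
    return blocks + rng.normal(0.0, noise_sd, size=blocks.shape)

# Several samples of the same tower: recognizably similar, slightly different.
samples = [perceptual_sample(tower) for _ in range(5)]
```

Each sample preserves the overall configuration of the tower while perturbing every block slightly, which is exactly the effect visible in the video.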
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
Ok, so once the model has drawn some perceptual samples, it runs these forward through a process of physical reasoning. This is actually a physical simulation, like that of a physics engine that you might find in a video game.
Hamrick, Battaglia, & Tenenbaum (2011); Battaglia, Hamrick, & Tenenbaum (under review)
Finally, the model takes the final states from the simulations and compares them to the initial states in order to come to a decision on the physical judgment, such as "will the tower fall?". By considering all the samples, the model can approximate a distribution for the probability that the tower will fall. We will take this distribution now and embed it in our inference framework to allow for belief updating.
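This step amounts to a Monte Carlo estimate: run the noisy simulation many times and average the binary outcomes. The sketch below uses an invented toy stand-in for the physics engine (the real model runs a rigid-body simulator), so the specific stability rule is for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_fall(percept):
    """Toy stand-in for a rigid-body simulation: returns True if the
    tower 'fell'. The instability rule here is invented; the real
    model compares initial and final simulated block positions.
    """
    instability = np.abs(percept[:, :2]).sum()  # total horizontal offset
    return rng.random() < min(1.0, instability)

def p_fall(tower, n_samples=2000, noise_sd=0.05):
    """Estimate P(fall) by averaging outcomes over noisy percepts."""
    fell = 0
    for _ in range(n_samples):
        percept = tower + rng.normal(0.0, noise_sd, size=tower.shape)
        fell += simulate_fall(percept)
    return fell / n_samples

tower = np.array([[0.0, 0.0, 0.5], [0.1, 0.0, 1.5], [-0.1, 0.1, 2.5]])
estimate = p_fall(tower)
```

Averaging over samples is what turns a deterministic simulator plus perceptual noise into a graded probability that the tower will fall.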
As before, the model views the scene...
And uses its physics knowledge to compute the probability that the tower will fall given a particular mass ratio. It can then integrate over mass ratios in order to marginalize out that variable, giving the distribution over whether the tower will fall, independent of the true mass ratio.
Next, the model produces a judgment of "fall" or "not fall" using this same distribution (engaging in a probability matching strategy).
Finally, after the model observes feedback, it can evaluate the likelihood of that feedback under the same probability distribution for whether the tower will fall that I described before. It then updates its beliefs using Bayes's Rule, giving a posterior distribution over masses, reflecting belief about which mass ratios are more likely.
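The belief update itself is just Bayes's rule on a set of candidate mass ratios. Below is a minimal sketch with an invented three-point grid and invented P(fall | r) values standing in for the IPE's simulation output:

```python
import numpy as np

# Hypothetical grid of mass ratios r, with a uniform prior p_0(r).
ratios = np.array([0.1, 1.0, 10.0])
belief = np.ones(len(ratios)) / len(ratios)

# Invented stand-in for the IPE: P(fall | tower, r) for one tower.
# In the full model these probabilities come from physical simulation.
p_fall_given_r = np.array([0.9, 0.5, 0.1])

def update_belief(belief, fell):
    """One step of Bayes's rule:
    p_t(r | tower, F_t) is proportional to p(F_t | tower, r) p_{t-1}(r).
    """
    likelihood = p_fall_given_r if fell else 1.0 - p_fall_given_r
    posterior = likelihood * belief
    return posterior / posterior.sum()

# Seeing this tower fall shifts belief toward the ratio that predicts falling.
belief = update_belief(belief, fell=True)
```

Repeating this update across trials, with P(fall | r) recomputed per tower, is what lets feedback accumulate into a confident belief about the true mass ratio.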
Guesses "will fall" 50% of the time, and "won't fall" the other 50%.
\[p(J) = 0.5\]
Uses IPE to make predictions, but ignores feedback/does not update beliefs about mass.
\[p_t(r\ |\ \mathrm{tower}, F_t) = p_{t-1}(r)\]
Uses IPE to make predictions and to update beliefs about mass.
\[p_t(r\ |\ \mathrm{tower}, F_t)\propto p(F_t\ |\ \mathrm{tower}, r)p_{t-1}(r)\]
In order to see how well our model could explain people's behavior, we compared it with two other models.
First, we computed the model fit for a model that guesses randomly – that is, it randomly says “will fall” 50% of the time, and “won’t fall” the other 50% of the time.
Next, we computed the fit for a model that has knowledge of physics, but doesn’t actually update its beliefs – in other words, it ignores all feedback.
Finally, we computed the fit for the full learning model, which has knowledge of physics and updates its belief about the true mass ratio after observing feedback on each trial.
For each of these models, we evaluated the likelihood of people's responses to the question of "will it fall?".
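Concretely, each model can be scored by the Bernoulli log-likelihood of the sequence of binary "will it fall?" responses under its per-trial predictions. The responses and model predictions below are made up purely to illustrate the computation:

```python
import numpy as np

def log_likelihood(responses, p_fall):
    """Bernoulli log-likelihood of binary responses (1 = 'will fall')
    under a model's per-trial probability of responding 'will fall'.
    """
    responses = np.asarray(responses, dtype=float)
    p = np.clip(np.asarray(p_fall, dtype=float), 1e-10, 1 - 1e-10)
    return float(np.sum(responses * np.log(p)
                        + (1.0 - responses) * np.log(1.0 - p)))

responses = [1, 0, 1, 1]            # hypothetical judgments on four trials
chance    = [0.5, 0.5, 0.5, 0.5]    # chance model: p(J) = 0.5 on every trial
learning  = [0.8, 0.2, 0.7, 0.9]    # hypothetical learning-model predictions

ll_chance = log_likelihood(responses, chance)
ll_learning = log_likelihood(responses, learning)
```

A model whose predicted probabilities track people's actual responses earns a higher log-likelihood than the flat chance baseline.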
If people are taking the knowledge they inferred about mass and generalizing it to other types of physical reasoning, then their responses to "will it fall?" should change over time as their beliefs about the latent mass parameter become more accurate.
So, we should see that the learning model does a better job of explaining their judgments than the fixed or the chance model.
If, however, people do not generalize their inferences, then we should see that the fixed model does better than the learning and chance models.
Here is chance performance -- it is the same for all conditions because again, it just assumes people guess "will fall" 50% of the time and "won't fall" the other 50%.
As we would expect, the learning model is better than chance at explaining people's behavior, except in the condition where people did not learn.
In most conditions, the learning model is also significantly better than the fixed model, indicating that people's knowledge does transfer to other types of physical judgments.
There were two other cases in which the fixed model and the learning model were equally good at explaining people's behavior, though both were better than chance.
In general, however, these results indicate that the learning model is at least a good starting point for explaining how people update their beliefs about mass over time and how they generalize that knowledge to other types of physical judgments.
If it is in fact the case that people were sensitive to the "anti-informativeness" of the confusing feedback in the \(r=0.1\)/binary/order 1 condition, then we should be able to vary this information content and see systematic changes in people's responses.
Relatedly, if it was the extra information in the videos that allowed people in the visual feedback condition to recover from the confusing feedback, then we would like to know which extra information they paid attention to. As I mentioned before, they could also be using features such as the direction of the fall or the number of blocks that fell, in addition to fall/not fall.
Finally, people seemed to be more sensitive to information that was present at the beginning of the experiment. An ideal Bayesian learner should not exhibit any order effects, so perhaps the primacy effect we saw was a result of resource limitations or imperfect belief updating.
This is an important step towards understanding how people learn about the environment around them, and gives a framework for modeling their behavior in physical scenarios.
In conclusion, we asked the question, "can people infer mass in complex scenes?". We determined that the answer is yes, and that they are very adept at doing so -- people can infer the correct mass after receiving only binary feedback, and sometimes they can make the correct inferences after only a single trial.
We developed a model which explained people's general behavior by combining approximate knowledge of Newtonian physics (as instantiated in a probabilistic simulation) in a rational framework for inferring unobservable parameters like mass.
This work is an important step in understanding and explaining how people behave and reason about the physical world, and provides a framework for studying how they learn about other parts of their environment (for example, other parameters like friction, or perhaps different dynamics entirely).
Battaglia, Hamrick, & Tenenbaum (under review)