Acting as Inverse Inverse Planning

Great storytellers know how to take us on a journey. They direct characters to act -- not necessarily in the most rational way -- but rather in a way that leads to interesting situations, and ultimately creates an impactful experience for audience members looking on. If audience experience is what matters most, then can we help artists and animators *directly* craft such experiences, independent of the concrete character actions needed to evoke those experiences? In this paper, we offer a novel computational framework for such tools. Our key idea is to optimize animations with respect to *simulated* audience members' experiences. To simulate the audience, we borrow an established principle from cognitive science: that human social intuition can be modeled as "inverse planning," the task of inferring an agent's (hidden) goals from its (observed) actions. Building on this model, we treat storytelling as "*inverse* inverse planning," the task of choosing actions to manipulate an inverse planner's inferences. Our framework is grounded in literary theory, naturally capturing many storytelling elements from first principles. We give a series of examples to demonstrate this, with supporting evidence from human subject studies.


INTRODUCTION
The ACM SIGGRAPH vision statement is "enabling everyone to tell their stories." But why do stories need to be "told"? Why is a well-told story more impactful than the everyday events occurring all around us? The answer is that storytelling is done for an audience: unlike the dull, indifferent progression of real life, stories are propelled by deliberate artistic choices made solely to craft the audience's experience. Storytellers can intentionally build suspense by withholding information, or cause a surprise "twist" by revealing it at the right time, and they can violate the laws of physics at will. This insight suggests a novel declarative interface for computational storytelling. Instead of authoring a low-level sequence of events, what if a storyteller could directly specify the high-level desired audience experience? The computer would then automatically synthesize a concrete animation that evokes that experience. In this paper, we present precisely such an algorithm. We optimize animations to create the desired experience in simulated audience-members, modeled computationally using principled frameworks grounded in modern cognitive science. Because these computational models are well-tuned to human intuitions, the optimized animations create the desired effect in human audiences.

Table 1.
  Inverse rendering | Inverse planning [Baker et al. 2009]
  Depiction task (desired percept → synthetic stimulus): Inverse inverse rendering | Inverse inverse planning
  Theoretical proposal: Durand et al. [2002] | Kukkonen [2014]
  Concrete computational model: Chandra et al. [2022] | This paper
What models should we use? A long line of work from the computational cognitive science community, originating with Baker et al. [2009], has modeled human social cognition with Bayesian inference. This model posits that we always have uncertainty about the goals of people we see around us; however, their actions reveal information about those goals. For example, if the baby in the figure above moves south, we get the sense it probably wants the candy, though it might also want the cupcake. If we model people as approximately-rational planners, where P(action | goal) is highest for the best action to achieve a goal, then action understanding is inverse planning: the problem of inferring P(goal | action).
In this paper, we propose that storytellers add another recursive layer: they choose actions to persuade inverse-planners of certain goals. An actor portraying the baby might choose to move west to emphasize that it wants the cupcake, not the candy, even though moving west is a sub-optimal choice for an audience-agnostic rational agent. More generally, just as Baker et al. cast "action understanding as inverse planning," we cast "acting as inverse inverse planning." This framework is remarkably flexible: it provides a single unifying language (optimization over Bayesian inference) in which storytellers can express a variety of storytelling tasks from first principles. For example, a storyteller might ask the system for argmax_action P(baby wants cupcake | action). The optimizer could then automatically suggest west as the best solution.
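The baby example can be made concrete with a few lines of Python. The following is only a minimal sketch: the Q-values, the softmax temperature, and all function names are illustrative assumptions rather than values or code from our system. It shows how the action that is best for a rational planner (south) can differ from the action that best persuades an inverse planner (west).

```python
import math

# Hypothetical Q-values for the baby example: Q[goal][action].
# These numbers are illustrative assumptions, not values from the paper.
Q = {
    "candy":   {"south": 2.0, "west": 0.0},
    "cupcake": {"south": 1.5, "west": 1.0},
}
GOALS = list(Q)
ACTIONS = ["south", "west"]


def likelihood(action, goal, temperature=1.0):
    """Softmax-rational planner: P(action | goal) is proportional to exp(Q / temperature)."""
    weights = {a: math.exp(Q[goal][a] / temperature) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def posterior(action):
    """Inverse planning: P(goal | action) by Bayes' rule with a uniform prior."""
    joint = {g: likelihood(action, g) / len(GOALS) for g in GOALS}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


def best_action_for(goal):
    """Inverse inverse planning: argmax over actions of P(goal | action)."""
    return max(ACTIONS, key=lambda a: posterior(a)[goal])
```

Under these assumed numbers, a baby that wants the cupcake would rationally move south (its Q-value is higher), but `best_action_for("cupcake")` selects west, because south is even more strongly associated with the candy.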
More broadly, our model naturally captures elements of character, setting, plot, irony, flashbacks, and more.
"Inverse inverse planning" is closely related to the graphics community's past work in depiction of static scenes. In a seminal SIGGRAPH course, Durand et al. [2002] abstractly cast the task of visual "depiction" as "inverse inverse rendering." An artist paints the canvas to cause viewers ("inverse renderers") to infer the scene they seek to convey, even if the resulting painting is not a faithful, physically-accurate rendering of that scene. For example, a caricature exaggerates facial features to emphasize the subject's identity to a viewer. Recently, Chandra et al. [2022] concretely applied "inverse inverse rendering" to produce new perceptual experiences (illusions) by optimizing over Bayesian inverse-rendering models of human vision. Here, we extend the same framework to animation, which is often called the "illusion of life" [Thomas et al. 1995]. We illustrate the analogy between these lines of work in Table 1.
Inverse inverse planning is also grounded in ideas from literary theory. For example, Kukkonen [2014] abstractly casts storytelling as "probability design," thinking of narratives as sequences of observations presented to a Bayesian audience. We provide the first concrete computational evidence supporting that vision. Similarly, American writer Kurt Vonnegut's master's thesis (unpublished, but see the linked lecture video) argues that all stories have simple "shapes" defined by the trajectory of the protagonist's fortunes over time [Kiley et al. 2016; Vonnegut 2005]. Reagan et al. [2016] apply statistical analyses to a corpus of novels to extract representative story shapes, and suggest that future work investigate the "opposite direction" of generating a story that matches a given dramatic arc.
Our work here takes a first step in that direction (Section 3.1.5).
Our core contribution in this paper is a novel cognitively-inspired formalism for storytelling and animation in computer graphics. We begin by reviewing how cognitive scientists treat "action understanding as inverse planning," outlining a concrete model proposed by Ullman et al. [2009] (Section 2). Next, we show how to implement "inverse inverse planning" on top of this model, and demonstrate the remarkable flexibility of our system with a wide variety of example applications (Section 3). We then offer some evidence from IRB-approved human subject studies, showing that inverse inverse planning is more effective than audience-agnostic "naïve" planning for depiction tasks (Section 3.4). For example, in one study viewers were 9× likelier to correctly discern the relationship between two characters when presented with animations created by our method, as compared to naïve planning baselines. Finally, we discuss additional related work (Section 4) and conclude with prospects for future work (Section 5). Sample code for implementing our examples is available in the supplementary materials and online at https://people.csail.mit.edu/kach/a2i2p/.

INVERSE PLANNING
Humans have an astonishing intuitive ability to make inferences about other intelligent agents. Consider the well-known 90-second animation by psychologists Heider and Simmel [1944], which shows three simple shapes moving in 2D. Although these simple geometric shapes do not look like humans, nearly everyone attributes complex human traits to them: viewers consistently report that the shapes have certain beliefs, desires, emotional states, and moralities. A long line of work from the cognitive science community, originating with Baker et al. [2007, 2009], has modeled this intuition with Bayesian inference. These models posit that when we observe an agent take an action, we infer that agent's goal by applying Bayes' rule: P(goal | action) ∝ P(action | goal) P(goal). Here, the likelihood P(action | goal) can be estimated by imagining the agent's planning process: the more optimal an action is for the agent, the likelier it should be. P(goal) reflects our prior beliefs about goals.
This model is considered "inverse planning" in the sense that while planning outputs a plan for a goal, inverse planning infers a goal from a plan. Inverse planning has been extended in a variety of directions, such as in modeling agents that reason about each other via a "theory of mind" [Baker et al. 2008; Tauber and Steyvers 2011; Ullman et al. 2009], and agents who engage in human-like planning, which is not always optimal [Zhi-Xuan et al. 2020]. Recent work has shown that inverse planning can account for human judgements of human kinematic motions [Qian et al. 2021], judgements made by young children [Pesowski et al. 2020], and actions humans take to influence each other [Goodman and Frank 2016; Ho et al. 2021, 2022; Radkani et al. 2022; Shafto et al. 2014; Yoon et al. 2016].
For much of this paper, we will discuss a concrete world model proposed by Ullman et al. [2009], which has been tuned to match human intuitions and demonstrated to do so via human subject studies. In the rest of this section, we briefly review this world model and how inverse planning functions in it.
World model. Ullman et al.'s world (Figure 1) consists of two agents moving in a maze on a grid. In our animations, we will stylize them as two characters: a robot and an enchanted animate cheese cube in a kitchen. The two agents can move north / south / east / west through the kitchen or stay in place. The cheese is "weak" and only succeeds in moving 60% of the time. However, the robot is "strong" and can push the cheese cube along. A table in the kitchen blocks the cheese's motions, but can be moved by the robot. Finally, the kitchen floor has two special tiles, pink and green.
The two characters can have a variety of natural goals in the kitchen. The cheese and the robot could each "want" to sit on either the pink or green tile (think of the tiles as "MacGuffins"). Additionally, because the robot is strong enough to move the cheese by pushing it around, it could want to "help" or "hinder" the cheese from reaching its goal.
Planning. These dynamics and goals can be formalized as a multi-agent Markov Decision Process or MDP (for a detailed introduction to MDPs, we refer readers to a recent textbook by Kochenderfer et al. [2022]). The state space S encodes the positions of the robot, cheese, and table in the kitchen. The action space A for each agent is {←, →, ↑, ↓, stay}. The transition function for each agent encodes how each action affects the state as described above (the transition function for the cheese is stochastic because the action may fail).
Finally, the reward function for each agent captures the agents' goals. The cheese and the robot each receive a fixed reward if they are on their respective goal tiles (pink or green), and pay a small cost for moving instead of staying in place. In addition, the robot receives a "social reward" based on the cheese's reward on this turn. Specifically, if the cheese earns reward $R_{\text{cheese}}$, then the robot earns a bonus reward $\alpha_{\text{robot}} \cdot R_{\text{cheese}}$, where $\alpha_{\text{robot}}$ is positive if the robot is helping, negative if the robot is hindering, and zero if the robot is neutral. $\alpha_{\text{robot}}$ can be chosen from $\{-3, -1, 0, +1, +3\}$, expressing the range from "highly adversarial" to "indifferent" to "highly helpful."

Having written this concrete reward function, Ullman et al. compute optimal policies for the two agents by running value iteration [Bellman 1966], setting the discount factor $\gamma = 0.99$ and extracting $Q$-functions from the computed value functions. They use a hierarchical softmax strategy, first computing a policy for the cheese assuming the robot moves uniformly at random, and then computing a policy for the robot assuming the cheese selects actions with probabilities given by the softmax of its $Q$-function with temperature $\beta = 2.0$. This allows for two recursive levels of "theory of mind" in the planner: the robot models the cheese modeling the robot.
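To illustrate the planning step, the following is a minimal single-agent sketch of value iteration followed by a softmax policy. It is not Ullman et al.'s two-agent kitchen: the one-dimensional corridor, the reward magnitudes, and all names are assumptions made for illustration. Only the discount factor (0.99) and the temperature (2.0) are taken from the text, and dividing the Q-values by the temperature is our reading of "softmax with temperature."

```python
import math

N_STATES = 5          # corridor cells 0..4 (hypothetical toy world)
GOAL = 4              # index of the goal tile
ACTIONS = {"left": -1, "stay": 0, "right": +1}
GAMMA = 0.99          # discount factor, as in the text
TEMPERATURE = 2.0     # softmax temperature, as in the text
GOAL_REWARD, MOVE_COST = 1.0, 0.1   # assumed magnitudes


def step(state, action):
    """Deterministic transition; walls clamp the agent to the corridor."""
    return min(max(state + ACTIONS[action], 0), N_STATES - 1)


def reward(state, action):
    """Reward for being on the goal tile, minus a small cost for moving."""
    on_goal = GOAL_REWARD if state == GOAL else 0.0
    return on_goal - (MOVE_COST if action != "stay" else 0.0)


def q_value(V, state, action):
    """Q(s, a) extracted from a value function V."""
    return reward(state, action) + GAMMA * V[step(state, action)]


def value_iteration(tol=1e-9):
    """Repeat Bellman backups until the value function stops changing."""
    V = [0.0] * N_STATES
    while True:
        new_V = [max(q_value(V, s, a) for a in ACTIONS) for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(V, new_V)) < tol:
            return new_V
        V = new_V


def softmax_policy(V, state):
    """P(a | s) proportional to exp(Q(s, a) / temperature)."""
    weights = {a: math.exp(q_value(V, state, a) / TEMPERATURE) for a in ACTIONS}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

In this toy world, the most probable action next to the goal is to step onto it, and the most probable action on the goal is to stay, but every action retains nonzero probability. That residual probability is what lets an inverse planner assign graded likelihoods to sub-optimal behavior.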
Figure 1 shows a sample animation we generated by running these optimal policies naïvely for a helpful robot in a random scenario (i.e. in a random state, with random goals and a positive social-reward coefficient, $\alpha_{\text{robot}} > 0$). As we argue in the caption, this is a poor depiction of helpfulness; instead, it inadvertently conveys indifference. Next, we will quantitatively capture why the depiction fails in this way, by using inverse planning to model how humans experience these animations.
Inverse planning. Ullman et al.'s inverse planner makes inferences about agents' (hidden) goals from (observed) actions. Let a hypothesis be a tuple:

$$H = \langle g_{\text{cheese}},\ g_{\text{robot}},\ \alpha_{\text{robot}} \rangle$$

where $g_{\text{cheese}}$ and $g_{\text{robot}}$ are the characters' goal tiles and $\alpha_{\text{robot}}$ is the robot's social-reward coefficient. Above, we showed that for fixed $H$, we can use value iteration to compute $Q^H_{\text{robot}}(s, a)$ and $Q^H_{\text{cheese}}(s, a)$ for state $s \in S$ and action $a \in A$. Assuming the softmax-rational model above, this induces a probability distribution over each character $c$'s actions:

$$P(s \to a \mid H) \propto \exp\left( Q^H_c(s, a) / \beta \right) \qquad (1)$$

Now, if we observe an agent take action $a$ from state $s$, we can apply Bayes' rule to calculate the probability of the characters' goals:

$$P(H \mid s \to a) \propto P(s \to a \mid H)\, P(H) \qquad (2)$$

Here, $P(s \to a \mid H)$ is the likelihood given by Equation (1) and $P(H)$ is our prior belief about the characters' goals. Before the animation plays, we assume a uniform prior over $H$, i.e. $P(H) \propto 1$, reflecting our ignorance about the characters' goals. Each time we observe a character take an action, we update our belief about hypothesis $H$. To implement this inference, we maintain a mapping from hypotheses to probabilities. When we observe an action, we update the conditional probability of each hypothesis using Equation (2), re-normalizing so the distribution sums to 1.

We can see inverse planning in action in the bottom of Figure 1. The first three plots show how $P(g_{\text{cheese}})$, $P(g_{\text{robot}})$, and $P(\alpha_{\text{robot}})$ change as the animation plays and more actions are observed (plots #4 and #5 are introduced in Section 3.2). As expected, the model is not confident that the robot is helping. In the next section, we show how to use inverse inverse planning to create animations that do effectively depict helping (and other scenarios).
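The "mapping from hypotheses to probabilities" can be sketched as a plain dictionary that is multiplied by a likelihood and re-normalized after each observation. In the sketch below, the two likelihood functions are made-up stand-ins for the softmax likelihood computed from Q-functions, and every name and number other than the set of alignment values is an assumption for illustration.

```python
from itertools import product

GOALS = ["pink", "green"]
ALIGNMENTS = [-3, -1, 0, +1, +3]   # social-reward coefficients from the text

# A hypothesis is a tuple (cheese goal, robot goal, robot alignment).
HYPOTHESES = list(product(GOALS, GOALS, ALIGNMENTS))


def uniform_prior():
    """Before the animation plays, every hypothesis is equally likely."""
    return {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}


def update(belief, likelihood):
    """One Bayes step: multiply by the likelihood, then re-normalize."""
    unnormalized = {h: p * likelihood(h) for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def marginal(belief, predicate):
    """Total probability of the hypotheses that satisfy a predicate."""
    return sum(p for h, p in belief.items() if predicate(h))


def cheese_moves_toward_pink(h):
    """Assumed likelihood of seeing the cheese step toward pink."""
    return 0.8 if h[0] == "pink" else 0.2


def robot_pushes_toward_pink(h):
    """Assumed likelihood of seeing the robot push the cheese toward pink."""
    cheese_goal, _robot_goal, alignment = h
    if alignment == 0:
        return 0.2                      # an indifferent robot rarely pushes
    push_serves_cheese = cheese_goal == "pink"
    return 0.6 if (alignment > 0) == push_serves_cheese else 0.1
```

Applying `update` with the first likelihood and then the second raises the marginal belief that the robot is helping (alignment greater than zero) from the prior value of 0.4 to 0.625 under these assumed numbers. The push alone is weak evidence; it becomes informative once the cheese's own motion has revealed its goal.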

INVERSE INVERSE PLANNING
Now that we can model an audience's experience of an animation by Bayesian inverse planning, we can create new animations by inverse inverse planning, that is, by optimizing over Bayesian inference.
To be precise, we optimize over scripts of length $T$, where a script $\sigma$ is given by an initial state $s_0$ and a sequence of valid transitions. Where a transition is stochastic (recall that the cheese only succeeds in moving with probability 0.6), we allow our optimizer to choose whether or not the move succeeds.

Suppose, like in the previous section, that we wanted an animation that depicts the robot as helping the cheese. We can express this task in a simple and natural objective function over scripts:

$$J_{\text{help}}(\sigma) = \sum_{t=1}^{T} P(\alpha_{\text{robot}} > 0 \mid \sigma_{1:t})$$

The objective $J_{\text{help}}$ is maximized for scripts where at every time $t$, based on observing the animation up to time $t$ (i.e. $\sigma_{1:t}$), a viewer has a strong belief that the robot is helping (i.e. $\alpha_{\text{robot}} > 0$). Notice that $J_{\text{help}}$ does not say anything about $g_{\text{cheese}}$, $g_{\text{robot}}$, or even the initial positions of the characters in $s_0$. If we were instead making animations by simulating optimal agents, we would have to specify all of these parameters up-front, even though they are unrelated to the abstract goal of depicting "helping." In this way, inverse inverse planning allows for modular, high-level reasoning about the essence of a story in a way that planning itself does not.

Figure 1. Frames: The robot begins to move, while the cheese stays on the pink goal. The robot reaches the green goal tile and stops. Both characters are motionless for the rest of the animation. (top) Suppose we animate a robot that is helping the cheese reach its goal, by having both characters follow their optimal policies ("naïve planning," Section 2). This produces a poor depiction: it is not clear that the robot wants to help the cheese, only that it wants to go to green. (bottom) The inverse planner agrees. It infers that the cheese wants pink (first plot), and that the robot wants green (second plot; notice the bump at $t = 10$ when the robot reaches green and stays). But it remains uncertain about the robot's alignment (third plot), because the robot's behavior is consistent with both ambivalence to the cheese and wanting to help the cheese (but doing nothing because the cheese is already at pink). Can we do better? Yes: by inverse inverse planning! (Figure 2).

Figure 2. Frames: The cheese moves towards pink. The robot pushes the cheese along. It deposits the cheese on pink. Finally, it recedes to the corner. Using inverse inverse planning, we optimize animations that maximize the inverse planner's belief that the robot is helpful (Section 3). This finds a much more effective depiction. (bottom) Now, it is clear that the robot is helping, and indeed the model is confident of that from the beginning (third plot).
To optimize scripts for a given objective like $J_{\text{help}}$, we use beam search. We sample a set of random initial states to seed our candidate scripts. For each candidate script, we independently run beam search over transition sequences, where the search heuristic is the objective applied to the current script "prefix." The number of initial states sampled and the beam width are hyperparameters of the algorithm. Unless otherwise noted, for the examples below we used 500 states with beam width 1 (i.e. greedy search).
Aside from some smaller details (see Section 3.2), this is all we need to inverse inverse plan. When we run the optimizer on $J_{\text{help}}$ with $T = 15$, we get a rendered animation within just a couple of minutes (Figure 2). The cheese moves towards pink, and the robot pushes it along. Upon watching this animation, a rational viewer would infer that the cheese wanted pink (because of its initial motion towards pink), and that the robot wanted to help the cheese (because it pushed the cheese to pink and stepped back afterwards). This animation is a significantly more effective depiction of helping than the one we generated earlier by naïve planning (Figure 1).
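The beam search over scripts described above can be sketched generically as follows. This is a minimal illustration rather than our released implementation: the representation of a script as a tuple, the callback names `successors` and `objective`, and the toy number-line world used to exercise the search are all assumptions.

```python
def beam_search(initial_states, successors, objective, length, beam_width=1):
    """Search for a high-scoring script, seeded independently from each initial state.

    A script is a tuple (s0, t1, ..., tk): an initial state followed by
    transitions. `successors(script)` lists the valid next transitions and
    `objective(script)` scores a (possibly partial) script prefix.
    """
    best = None
    for s0 in initial_states:
        beam = [(s0,)]
        for _ in range(length):
            candidates = [script + (t,) for script in beam for t in successors(script)]
            if not candidates:
                break
            # Keep only the highest-scoring prefixes; beam_width=1 is greedy search.
            candidates.sort(key=objective, reverse=True)
            beam = candidates[:beam_width]
        top = max(beam, key=objective)
        if best is None or objective(top) > objective(best):
            best = top
    return best


# Hypothetical toy world: the state is an integer position, transitions are
# steps of -1, 0, or +1, and the objective prefers ending near position 3.
def position(script):
    return script[0] + sum(script[1:])


def toy_objective(script):
    return -abs(position(script) - 3)
```

For instance, `beam_search([0, 5], lambda script: [-1, 0, 1], toy_objective, length=4)` returns a script of four transitions that ends at position 3. In the kitchen setting, `objective` would instead run the inverse planner over the script prefix, which is why wider beams are more expensive.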

Applications
We now show a wide variety of additional examples of encoding classic storytelling elements as inverse inverse planning. We encourage readers to imagine how they would depict these scenes themselves before looking at our system's outputs, available in the supplementary video (timestamps in text) and summarized in Figures 3 and 4. All outputs shown are from the same random seed (0). Simple variations emerge with different seeds. Optimization and rendering take less than two minutes on a server with 44 CPUs.
3.1.1 Character. Above, we showed how to depict the character of a helpful robot. Similarly, we can ask for an animation of a hindering robot. In the generated animation, the cheese first moves to green. Then the robot pushes it into a corner and blocks the way to green (Fig 3A/1:40). In comparison, regular planning from a random initial state yields an animation where the robot moves to green, blocking the cheese. To an observer it is unclear whether the robot is intentionally hindering, or indifferent to the cheese and wanting green for itself (Fig 3B/1:16). Inverse inverse planning avoids this ambiguity because the robot never moves onto green.
3.1.2 Plot twists. We next consider the "plot twist," a storytelling device where an unexpected event radically alters the audience's expectations. For example, a classic plot twist reveals that a seemingly friendly character was adversarial all along. Here, we ask for an animation where the robot appears to be helpful at first, but at $t = T/2$ is revealed to be hindering instead.
In the generated animation, the robot "helpfully" pushes the cheese to pink. However, upon reaching pink it continues pushing, trapping the cheese along the wall. This surprising action reveals that the robot's true intention was to hinder all along (Fig 3C/2:39). We can also ask for the reverse, a video where the robot appears to be hindering but was helping all along. Now, the cheese moves to pink and the robot approaches as if to push it off (hindering). However, the cheese continues moving, revealing that it wanted green all along. The robot helpfully pushes it there (Fig 3D/3:14).
3.1.3 Irony. Next, we consider dramatic irony, which occurs when the audience has a different understanding of a situation than the characters in that situation. Because our system explicitly models the audience's understanding, we can straightforwardly express scenes with dramatic irony. Here, we design an objective function for scenes where the robot appears to be trying to help, but mistakenly hinders because of its false belief about the cheese's goal. We use conditional probability to express that the robot should appear to be helpful if the cheese had a different goal.
In the generated animation, the cheese moves to green, but the robot pushes it off and towards pink. When the cheese tries to move back, the robot "helpfully" guides it back to pink (Fig 3E/3:36).
3.1.4 Flashbacks. Nonlinear discourse is a storytelling technique where information in a story is revealed out of chronological order. For example, a "flashback" can be used to re-contextualize a scene, giving it heightened significance or new meaning. Here, we show an example of using inverse inverse planning to design flashbacks. Imagine we saw a glimpse of the robot pushing the cheese east, away from the pink and green goals. Can we show a flashback animation that casts this action as helping? Let $f(\sigma)$ be the script $\sigma$ with a single transition appended, in which the robot moves east while the cheese stays. We can now apply the objective function for "helping" over $f(\sigma)$, with an additional term that ensures that the robot pushes the cheese when it moves east (i.e. the cheese is directly to the east of the robot):

$$J_{\text{flashback-help}}(\sigma) = J_{\text{help}}(f(\sigma)) + \begin{cases} 1 & \text{cheese pushed at } t = T + 1 \\ 0 & \text{else} \end{cases}$$

For this example, we raise the beam width to 100 because we observed that greedy search was easily trapped in local optima. In the generated flashback, the cheese is trying to go all the way around the room to pink because the table is blocking a door along the shorter path. This casts the robot's pushing as helping (Fig 4G/4:03). Of course, we can instead substitute $J_{\text{hinder}}$ to find a flashback that casts the action as hindering. In the generated flashback, the cheese tries to move directly to pink, casting the robot's push as hindering (Fig 4H/4:18).
3.1.5 Narrative arc. Recall from the introduction Vonnegut's theory that the "shape" of a story is the trajectory of the main character's fortunes. We would like to optimize for animations where the robot's fortunes decline and then rise again, creating a "story arc." To heighten the effect, we add a mechanism for characters' fortunes to change based on external events. Since ancient times, storytellers have propelled or resolved plots by introducing a new element from outside the world of the story. For example, a heavenly chariot descends from the sky to save the characters in Euripides' tragedy Medea. Literary theorists refer to this pattern as "deus ex machina," because gods and divine interventions ("deus") were once lowered onto stages with mechanical contraptions ("ex machina"). We create the possibility for "deus ex machina" by creating a special type of transition $\langle \text{deus}, x, y \rangle$ where the obstructing table "falls from the sky" into the kitchen at position $(x, y)$. This transition can only be used once, and only in worlds where the table is not already present. Note that the characters' learned policies do not account for the possibility of this transition occurring, nor do audiences know to expect it: it is a surprise from "outside" the fictional world.
With this enhancement, we can search for stories where the value function of the robot (learned by value iteration) correlates with the rise-fall-rise of 1.5 periods of a sinusoid. We do not specify anything else about the story. However, we introduce a new term to enforce that the characters' goals are consistent. Otherwise, we might get stories where the robot's apparent fortune changes because its apparent goal changes. To enforce this consistency, we minimize the KL-divergence between our beliefs about the characters' goals before and after each observed action, that is, between successive random variables $H_{1:t}$, where $H_{1:t}$ has probability distribution $P(H \mid \sigma_{1:t})$.
In the generated animation, the robot starts helping the cheese to pink. However, the table falls onto pink at the last moment. Then the robot moves the table out of the way, allowing the cheese to finally reach pink (Fig 3F/4:32).
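The two ingredients of the narrative-arc objective can be sketched as follows. This is an illustration under stated assumptions: we use Pearson correlation as the measure of "correlates with," we pick the phase of the sinusoid so that it rises, falls, and rises again, and we sum KL-divergences from the earlier belief to the later one. The text does not pin down these choices, and all function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def arc_target(length):
    """Rise-fall-rise template: 1.5 periods of a sinusoid (phase is an assumption).

    Requires length >= 2.
    """
    return [-math.cos(3 * math.pi * t / (length - 1)) for t in range(length)]


def arc_score(values):
    """How well a trajectory of the character's values follows the template."""
    return pearson(values, arc_target(len(values)))


def kl_divergence(p, q, floor=1e-12):
    """KL(p || q) for two beliefs given as dicts over the same hypotheses."""
    return sum(pi * math.log(pi / max(q[h], floor)) for h, pi in p.items() if pi > 0)


def consistency_penalty(beliefs):
    """Sum of KL-divergences between successive beliefs (to be minimized)."""
    return sum(kl_divergence(b0, b1) for b0, b1 in zip(beliefs, beliefs[1:]))
```

A combined objective would then reward `arc_score` of the robot's value trajectory while subtracting a weighted `consistency_penalty`; the relative weight is a further hyperparameter that this sketch does not attempt to specify.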

Implementation details
To get high-quality results, we need to account for some storytelling-specific details that Ullman et al. did not need to address.
First, we would like characters to appear rational. Ullman et al.'s model presupposed that agents in the animations were acting rationally, because it was only tested on hand-designed stories featuring rational agents. However, we run our model on arbitrary scripts with potentially-irrational behavior. Thus, we need to add hypotheses for irrational agents. We augment our hypothesis tuple with two boolean variables $r_{\text{robot}}$ and $r_{\text{cheese}}$, which track whether each character is rational, positing in the likelihood function that irrational agents select actions uniformly at random. Additionally, when the cheese is irrational, we only include hypotheses where the robot is indifferent to it. Now that we can infer the rationality of each character, we automatically add a rationality term, $J_{\text{rational}}(\sigma) = \sum_t P(r_{\text{robot}} \land r_{\text{cheese}} \mid \sigma_{1:t})$, to the artist-provided objective function. Additionally, we implicitly condition the artist's probability calculations on $r_{\text{robot}} \land r_{\text{cheese}}$ because they likely have rational characters in mind when designing the story.
Similarly, we must ensure that the environment behaves plausibly (Kukkonen calls this verisimilitude). Recall that the cheese only succeeds in moving with probability 0.6. Our videos must faithfully reflect this. If we show the cheese try and fail to move ten times consecutively, audiences would either dismiss the video as implausible and contrived, or update their belief about the success probability. We avoid such pathological cases by introducing an environmental consistency term $J_{\text{env}}(\sigma) = -(\hat{p}(\sigma) - 0.6)^2$, where $\hat{p}(\sigma)$ is the proportion of times the cheese successfully moves in $\sigma$.
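As a small worked example of the environmental consistency term, the sketch below scores a list of move outcomes for the cheese. Treating the proportion as "successes divided by attempts" and returning zero when the cheese never attempts a move are our assumptions for illustration, as is the function name.

```python
def env_consistency(move_outcomes, success_prob=0.6):
    """Negative squared gap between the observed and nominal success rates.

    `move_outcomes` holds one boolean per attempted cheese move (True if the
    move succeeded). An empty list scores 0.0 (an assumed convention).
    """
    if not move_outcomes:
        return 0.0
    p_hat = sum(move_outcomes) / len(move_outcomes)
    return -(p_hat - success_prob) ** 2
```

A script in which three of five attempts succeed incurs no penalty, while ten consecutive failures score -0.36, the worst value attainable when the nominal success rate is 0.6.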
Finally, we must address the possibility that goals change over time. An observer might also discount or simply forget past evidence. For example, if the robot acted adversarially in the first part of an animation, but then sat still for a long time, we might be uncertain whether it is still adversarial. If it then begins moving irrationally, we would immediately perceive it as irrational, discounting past rational behavior. Following Baker et al. [2009], we account for this by positing that after each turn, with some small probability $\varepsilon = 10^{-5}$, all latent variables are reassigned uniformly at random. This is easily implemented by adjusting $P(H) \mapsto P(H) \cdot (1 - \varepsilon) + (1 - P(H))/(N - 1) \cdot \varepsilon$, where $N$ is the number of hypotheses. This adjustment discounts past evidence, forcing the inverse inverse planner to continue providing new evidence throughout the story.
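The forgetting adjustment is a one-line map over the belief dictionary. The sketch below is illustrative (the function name and the dictionary representation are assumptions) and applies the formula from the text directly; it requires at least two hypotheses.

```python
def forget(belief, epsilon=1e-5):
    """Apply P(H) -> P(H) * (1 - eps) + (1 - P(H)) / (N - 1) * eps to every hypothesis.

    Each hypothesis keeps (1 - eps) of its mass and receives an equal share of
    the eps mass leaked by the other N - 1 hypotheses, so the total stays 1.
    """
    n = len(belief)
    return {h: p * (1 - epsilon) + (1 - p) / (n - 1) * epsilon for h, p in belief.items()}
```

With an exaggerated rate of 0.3 and three hypotheses, a fully confident belief of (1, 0, 0) becomes (0.7, 0.15, 0.15). With the much smaller rate used in the text, the effect of a single step is tiny, but no hypothesis can ever be ruled out permanently.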

Miming in a physics world
Finally, we depart from this grid-world setting and move to a more naturalistic physics-based setting. We were inspired by the short animated film Sisyphus [Jankovics 1974], which depicts the character Sisyphus from Greek mythology pushing a heavy boulder up a hill. By exaggerating Sisyphus' movements, the animation creates a dramatic impression of the boulder's immense weight. We wondered if inverse inverse planning could evoke such an effect: that is, make a character "mime" a heavy object.
To model the scenario, we created a small physics-based environment consisting of a mass-spring system, and used it to design a "Luxo lamp"-style hopper similar to that of Witkin and Kass [1988]. We attached the hopper to a box on a hill (Figure 6a).
For planning, we built a differentiable physics simulator for this environment and used it to train a controller to pull the box up the hill using the Short-Horizon Actor-Critic algorithm [Xu et al. 2022]. Actor-critic algorithms jointly train two neural networks: a policy $\pi(s; \theta)$ that computes actuations for the agent at state $s$, and a value function $V(s; \phi)$ that computes the optimal long-term reward attainable from $s$. The learned parameters of these neural networks are $\theta$ and $\phi$. We optimized actor-critic pairs for two box weights, light (0.1) and heavy (0.5), to obtain $\theta_{\{0.1, 0.5\}}$ and $\phi_{\{0.1, 0.5\}}$.
Next, we used inverse planning to model a viewer's impression of the box weight. Following Battaglia et al. [2013]'s work on Bayesian models of human physics perception, we compared each frame of the observed trajectory against hypothetical simulations of the physical system using $\theta_{0.1}$ and $\theta_{0.5}$.
Finally, we created videos of the hopper "miming" pulling a heavy box with inverse inverse planning. We used gradient descent to optimize a trajectory that maximizes the inverse planner's confidence that the box is heavy (even though the box was actually light in the simulator). We parameterized trajectories by residuals over the optimal policy parameterized by $\theta_{0.1}$: for each time $t$ we optimize a residual actuation that is added to the optimal actuation given by $\pi(s_t; \theta_{0.1})$. This results in the hopper "pretending" to struggle as it pulls the box (supplementary video, 5:30). Our human subject studies (Section 3.4.2) confirm that the hopper indeed convinces viewers that the box is heavier than it truly is.

Stumbling.
We additionally replicate the "shapes of stories" example from Section 3.1.5 in this domain. We use gradient descent to optimize a trajectory in which the value function $V(s_t; \phi_{0.5})$ dips from time $t_1$ to time $t_2$. As before, we optimize residuals over the optimal policy parameterized by $\theta_{0.5}$. The resulting animation shows the hopper "stumble" at time $t_1$ and "recover" at time $t_2$ (Figure 6c; supplementary video, 7:05).

Human subject studies
We empirically evaluated our method on both the "kitchen" and "hill" domains. Our guiding question was: Is our inverse inverse planner more effective at depicting desired conditions than a regular naïve planner? To investigate this question, we designed two experiments.

Kitchen.
We sought to answer whether inverse inverse planning better depicts the robot's relationship with the cheese than naïve planning. We generated 20 animations each of helping, hindering, and indifference (i.e. $\alpha_{\text{robot}} = 0$) using random seeds 0-19, using both inverse inverse planning and naïve planning. We recruited 98 online participants (English-speaking, 80% male, average age 40, min 18, max 71) and showed each participant a randomly shuffled subset of these animations. Note that participants were not aware that the videos could have come from two different algorithms, or even that they were computer-generated at all. For each video, we asked participants to report whether the robot was helping the cheese, hindering it, indifferent to it, or whether the animation was unclear. We measured the proportion of responses that matched the desired depiction target.
Figure 5 shows the results of this experiment. When depicting helping, the average inverse inverse planning animation caused 73% of viewers to report "helping," while the average naïve planning animation only caused 6% to do so ($p \ll 0.01$). When depicting hindering, inverse inverse planning was also significantly better than naïve planning (62% vs. 29%; $p \ll 0.01$). Both methods were equivalently effective at depicting indifference (73% vs. 75%, n.s.). This is because naïve planning animations often have no interaction between the characters, so viewers easily infer indifference. In summary, we found that inverse inverse planning is indeed better at depicting the robot's relationship with the cheese, especially for conditions that are challenging for naïve planning to depict.

Hill.
We sought to answer whether the hopper from Section 3.3 convincingly "mimes" a heavy box. We recruited 35 online participants (English-speaking, 57% male, average age 34, min 19, max 74) and showed them each a series of pairs of animations. Each animation was randomly either an "honest" hopper with a heavy or light box, or a "mime" with a light box pretending that the box is heavy. Each video had a different-colored box to emphasize that the boxes may have different weights. Note that participants were not aware that the videos could have come from two different algorithms, or that the hoppers could mime. Participants were asked to select which of the two animations had a heavier box.
Figure 6b shows the results of this experiment. As expected, participants were at chance (50%) when the animations had the same condition, and between heavy and light boxes they selected the heavy box 97% of the time (p ≪ 0.01). The mime convinced 95.7% of viewers that its box was heavier than the light box, despite being of the same (light) weight (p ≪ 0.01). Furthermore, it convinced 68.6% of viewers that it was heavier than the heavy box despite being 5× lighter (p < 0.01). We conclude that the mime successfully convinces viewers that the box is heavier than it truly is.
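The text above does not say which statistical test produced these p-values. As a minimal sketch, assuming a two-alternative forced choice is compared against 50% chance with an exact one-sided binomial test, the computation could look as follows. The function name `binomial_tail` and the counts are hypothetical and are not the study's data.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts for illustration only (NOT the counts from the study):
# 48 of 70 responses pick the mime's box as the heavier one.
p_value = binomial_tail(successes=48, trials=70)
print(f"one-sided p = {p_value:.4f}")
```

An exact test avoids the normal approximation, which can be unreliable when the number of responses per condition is small.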

ADDITIONAL RELATED WORK
Section 2 reviewed related work from cognitive science, on which this paper directly builds. Here we study links to other work in reinforcement learning, computer graphics, and textual storytelling.
Inverse reinforcement learning. The reinforcement learning community has long sought to automatically learn reward functions by "inverse reinforcement learning" [Arora and Doshi 2021; Ng et al. 2000; Ramachandran and Amir 2007]. These methods can be applied to influence observers, much as we influence an audience's experience of a story here. For example, Dragan [2015] seeks to make a robot's motion "legible" to humans by optimizing for an ideal Bayesian observer's ability to predict the robot's intent. Conversely, Pattanayak et al. [2022] use "inverse-inverse reinforcement learning" to have agents strategically fool adversarial viewers about their true intentions.
Computer animation. A variety of techniques [Wampler et al. 2010; Won et al. 2021] create animations of competing agents (e.g. fight scenes) by computing optimal actions via planning algorithms (analogous to "naïve planning" in this paper). This often leads to believable behavior, but little to no artistic control. Others [Funge et al. 1999; Kapadia et al. 2016; Shum et al. 2010] add various means for artists to guide the animation towards a desired state. These systems provide low-level artistic control over the space of outcomes, but still no high-level control over the animation's story arc. Won et al. [2014] rank generated animations to show a human a diverse set of candidates to review. In comparison, inverse inverse planning allows for a higher level of artistic control, automatically selecting the best result by modeling a human reviewer.
In interactive settings, the graphics community has long approached creating believable characters by requiring artists to manually give characters a large repertoire of high-level "goals" and low-level "behaviors" to express and accomplish those goals [Cassell et al. 2001; Hayes-Roth et al. 1997; Loyall 1997; Perlin and Goldberg 1996; Rousseau and Hayes-Roth 1998]. In comparison, inverse inverse planning can automatically derive "behaviors" to depict goals. Some systems enforce structure on generated animations by searching over an "evaluation function" that proxies for aesthetic quality [Mateas 1999; Mateas and Stern 2003; Weyhrauch 1997]. Such evaluation functions require significant low-level story-specific engineering effort for the artist to specify (dozens of pages of heuristics). In comparison, our method provides a general, principled tool for artists to write simple high-level "evaluation functions."
Textual story generation. As in graphics, the predominant approaches in textual story generation are planning- or logical-search-based [Lebowitz 1985; Martens et al. 2013; Meehan 1977; Riedl and Young 2010]. A variety of ad-hoc heuristics have emerged for modeling specific aspects of audience experience to guide planning. For example, suspense can be measured by counting how few options a character has to solve a problem [Bae and Young 2008; Cheong and Young 2006; Gerrig and Bernardo 1994], and characters can be made "believable" by ensuring that their actions' intents are visible to the reader and motivationally consistent [Riedl and Young 2004; Szilas 2003]. In relation to this line of work, inverse inverse planning is a general, principled framework that flexibly models a variety of audience inferences using Bayesian statistics. Audience modeling remains a challenge for newer language-model-based approaches to textual story generation [Kreminski and Martens 2022].

LIMITATIONS AND FUTURE WORK
Scalability. The inverse inverse planner demonstrated in this paper is limited primarily by its scalability. The amount of computation needed grows with the size of the state space and number of characters (for "planning"), the size of the audience's hypothesis space (for "inverse planning"), and the complexity and length of scripts (for "inverse inverse planning"). This is because, following Ullman et al., we used slow-but-exact algorithms at every level for precision and robustness (e.g. value iteration for planning, enumerative inference for inverse planning). Nonetheless, all of the examples shown in this paper were generated within just a couple of minutes. This was possible because there were several opportunities for optimization: we precomputed the results of value iteration offline, and we parallelized beam search across many cores.
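To make the three levels of computation concrete, the following is a minimal sketch on a hypothetical one-dimensional corridor, not the kitchen domain used in this paper. It assumes a Boltzmann-rational agent with an assumed rationality parameter `BETA`, a uniform prior over two candidate goals, and exhaustive enumeration of action sequences in place of the beam search described above. All names (`value_iteration`, `goal_posterior`, `most_legible_actions`) are illustrative.

```python
import math
from itertools import product

N_STATES = 7                 # hypothetical corridor of tiles 0..6
ACTIONS = (-1, 0, 1)         # step left, stay, step right
GOALS = (0, N_STATES - 1)    # the audience's (tiny) hypothesis space
BETA = 2.0                   # assumed rationality parameter of the agent model


def step(state, action):
    """Deterministic transition, clamped to the ends of the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def value_iteration(goal, sweeps=50):
    """Planning: reward -1 for each step taken from a non-goal tile."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            0.0 if s == goal else max(-1.0 + values[step(s, a)] for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return values


# Value tables are computed once up front and reused by every query below.
VALUES = {goal: value_iteration(goal) for goal in GOALS}


def action_likelihood(state, action, goal):
    """Boltzmann-rational policy: P(action | state, goal) ∝ exp(BETA * Q)."""
    reward = 0.0 if state == goal else -1.0
    weights = {
        a: math.exp(BETA * (reward + VALUES[goal][step(state, a)])) for a in ACTIONS
    }
    return weights[action] / sum(weights.values())


def goal_posterior(start, actions):
    """Inverse planning: enumerate every goal and apply Bayes' rule."""
    scores = {goal: 1.0 for goal in GOALS}  # uniform prior (unnormalized)
    state = start
    for action in actions:
        for goal in GOALS:
            scores[goal] *= action_likelihood(state, action, goal)
        state = step(state, action)
    total = sum(scores.values())
    return {goal: score / total for goal, score in scores.items()}


def most_legible_actions(start, target_goal, horizon):
    """Inverse inverse planning by brute force over all action sequences."""
    return max(
        product(ACTIONS, repeat=horizon),
        key=lambda actions: goal_posterior(start, actions)[target_goal],
    )


print(goal_posterior(start=3, actions=(1, 1)))
print(most_legible_actions(start=3, target_goal=6, horizon=3))
```

Even in this toy setting the cost structure is visible: planning scales with the number of states, inference with the number of goals, and the outer search with the number of candidate action sequences, which grows exponentially in the horizon.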
Having laid this groundwork with exact algorithms, we expect approximate algorithms to help scale this framework to larger domains.As a first step, our "miming" domain demonstrates ways to scale inverse inverse planning to a large (indeed, continuous) state space: (1) approximating value functions with actor/critic neural networks, and (2) using gradient descent for the optimization step.
Approximate algorithms may additionally help scale inverse planning to larger hypothesis spaces, allowing us to optimize over a richer space of audience inferences. For example, Bayesian inverse planning can be approximated using Sequential Monte Carlo (SMC) methods, which sample from the posterior distribution instead of exhaustively integrating over all hypotheses. Zhi-Xuan et al. [2020, Table 1(a)] show that compared to Ullman et al.'s exact method (which we implement here), SMC runs 1-2 orders of magnitude faster across a benchmark of four different planning domains. Alternatively, the results of inverse planning can be approximated using amortized inference or deep learning. Malik and Isik [2022, Table 5(b)] train a spatiotemporal graph neural network to perform goal inference 4 orders of magnitude faster than the Bayesian inverse planner of Netanyahu et al. [2021]. However, their method has roughly 10% lower test accuracy than Bayesian inverse planning, suggesting that it might not generalize to match human intuitions well on the type of "surprising" out-of-distribution stories we seek here. What is the right balance between efficiency and accuracy? We leave further investigation in these directions to future work. A more efficient inverse inverse planner could enable not only scaling to larger problems, but also sophisticated real-time applications, for example in interactive fiction [Laurel 1986, p. 153] and human-computer improvisation [Pinhanez 1999].
Tools for artists. In this paper, we showed how the language of optimization targets over Bayesian posteriors can be used to express a desired audience experience. But of course, we do not expect artists to write these mathematical expressions manually; rather, we expect them to select and customize predefined story patterns (just as they do not manually program shaders/BRDFs, but rather select and customize predefined materials/textures). This is possible because our optimization targets are abstract and portable across multiple domains. For example, we reuse the story arc pattern in both the kitchen and box-pulling domains (Sections 3.1.5 and 3.3.1). Thus, there is much scope for future graphics/HCI work in designing intuitive interfaces for artists to select optimization targets. For example, an artist might select the story arc pattern from a library and then "draw" the desired arc on a tablet, or even describe it in natural language.
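As a rough illustration of what a portable story pattern could look like, the sketch below scores how closely any per-frame trace (for instance, a posterior belief or a value estimate over time) follows an arc that an artist has drawn as a handful of control points. The mean-squared-error form and the names `resample` and `arc_loss` are assumptions made for this example; they are not the optimization targets defined in Section 3.

```python
from typing import Sequence


def resample(arc: Sequence[float], length: int) -> list[float]:
    """Linearly interpolate the drawn arc to `length` evenly spaced samples."""
    if length == 1:
        return [arc[0]]
    samples = []
    for i in range(length):
        position = i * (len(arc) - 1) / (length - 1)
        lower = int(position)
        upper = min(lower + 1, len(arc) - 1)
        fraction = position - lower
        samples.append(arc[lower] * (1 - fraction) + arc[upper] * fraction)
    return samples


def arc_loss(trace: Sequence[float], target_arc: Sequence[float]) -> float:
    """Mean squared deviation between a per-frame trace and the desired arc."""
    target = resample(target_arc, len(trace))
    return sum((x - y) ** 2 for x, y in zip(trace, target)) / len(trace)


# Hypothetical "fall then recover" arc drawn as three control points.
dip_arc = [1.0, 0.0, 1.0]
flat_trace = [1.0, 1.0, 1.0, 1.0, 1.0]      # a trace with no dip
dipping_trace = [1.0, 0.5, 0.0, 0.5, 1.0]   # a trace that follows the arc

print(arc_loss(flat_trace, dip_arc))
print(arc_loss(dipping_trace, dip_arc))
```

Because the function only sees a list of numbers, the same pattern applies unchanged whether the trace comes from a discrete inverse planner or from a learned value function, which is the sense of portability this sketch is meant to convey.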
Modeling emotion. Emotion is at the heart of storytelling. A promising future direction is to augment the audience model to reason about emotion: either to evoke certain emotions in the audience, or to use characters' visible emotions as degrees of freedom in story design. For example, if a villain captures a hero, showing the hero's sidekick smile covertly could make the audience infer that the sidekick was secretly aligned with the villain all along. In turn, this could evoke anger at the betrayal in the audience. Bayesian models analogous to inverse planning can capture some human emotional reasoning [Houlihan et al. 2022; Ong et al. 2015, 2019a, 2019b; Saxe and Houlihan 2017], and our ongoing work investigates methods for optimizing over these models.

CONCLUSION
We presented inverse inverse planning: a principled computational framework for storytelling and animation, which is grounded in classic ideas from computer graphics, cognitive science, and literary theory, and goes beyond naïve simulation of rational agents. Building on an established model of social cognition ("inverse planning"), we showed how to optimize animations to evoke specific audience experiences ("inverse inverse planning"). We demonstrated the remarkable flexibility of inverse inverse planning by using it to capture a variety of storytelling elements, and presented experimental evidence that the resulting animations were effective depictions.
Our work lights the path to a first-principles approach to storytelling that treats audience experience as the ultimate desideratum. We do not rely on large datasets to learn storytelling techniques statistically, nor do we require ad-hoc manual encodings of those techniques. Rather, we show that those techniques emerge naturally as answers when we ask the right computational questions.
is not uniquely determined by   −1 and ⟨ robot  ,  cheese  ⟩ because state transitions may be non-deterministic. For example, recall how the cheese only
The cheese moves itself onto the pink goal tile and stops.
Figure 1: (top) Suppose we animate a robot that is helping the cheese reach its goal, by having both characters follow their optimal policies ("naïve planning," Section 2). This produces a poor depiction: it is not clear that the robot wants to help the cheese, only that it wants to go to green. (bottom) The inverse planner agrees. It infers that the cheese wants pink (first plot), and that the robot wants green (second plot; notice the bump at t = 10 when the robot reaches green and stays). But it remains uncertain about the robot's alignment (third plot), because the robot's behavior is consistent with both ambivalence to the cheese and wanting to help the cheese (but doing nothing because the cheese is already at pink). Can we do better? Yes: by inverse inverse planning! (Figure 2).
Figure 2: (top) Using inverse inverse planning, we optimize animations that maximize the inverse planner's belief that the robot is helpful (Section 3). This finds a much more effective depiction. (bottom) Now, it is clear that the robot is helping, and indeed the model is confident of that from the beginning (third plot).

Table 1: "Inverse inverse planning" is analogous to past graphics work on "inverse inverse rendering."