How Far Should You Look? The Effects of Reward Sparsity on Resource-Rational Planning

Jan 23, 2025

The Breadth-Depth Dilemma

Decision-making under resource constraints is a fundamental cognitive challenge. Living beings must balance the benefits of extensive planning against the costs of mentally simulating potential outcomes.

In this project, we investigated how reward sparsity (how rare good options are) influences optimal planning strategies. We adopted a two-phase framework where agents first allocate a limited sampling capacity to “imagine” decision paths, and then commit to a route.
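Here is a minimal Monte Carlo sketch of the two phases in a deliberately simplified toy environment. Everything in it is an assumption for illustration, not necessarily the thesis's model. There are `n_routes` candidate routes. Each one independently pays 1 with probability `p` and 0 otherwise. In phase one the agent imagines `capacity` distinct routes and observes their payoffs. In phase two it commits to an imagined good route if it found one; otherwise it guesses among the routes it has not examined. The function names are hypothetical.

```python
import random


def simulate_episode(p: float, capacity: int, n_routes: int, rng: random.Random) -> int:
    """Play one episode of a toy two-phase planner (hypothetical model).

    Each of n_routes routes independently pays 1 with probability p, else 0.
    Phase 1: imagine `capacity` distinct routes and observe their payoffs.
    Phase 2: commit to a good imagined route if one exists; otherwise pick
    one of the unexamined routes at random (a blind guess).
    """
    if not 0 <= capacity <= n_routes:
        raise ValueError("capacity must lie in [0, n_routes]")
    payoffs = [1 if rng.random() < p else 0 for _ in range(n_routes)]

    # Phase 1: imagine the first `capacity` routes. Routes are i.i.d.,
    # so which ones are imagined does not affect the outcome.
    imagined = payoffs[:capacity]
    if any(imagined):
        return 1  # Phase 2: commit to a good route found while planning.

    # Phase 2 fallback: guess among the unexamined routes, if any remain.
    unexamined = payoffs[capacity:]
    return rng.choice(unexamined) if unexamined else 0


def mean_reward(p: float, capacity: int, n_routes: int = 20,
                trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of expected gross reward (no thinking cost yet)."""
    rng = random.Random(seed)
    total = sum(simulate_episode(p, capacity, n_routes, rng) for _ in range(trials))
    return total / trials
```

Under these assumptions the expected gross reward has the closed form $1-(1-p)^{C+1}$ for $C$ smaller than the number of routes, because the agent fails only if all $C$ imagined routes and the fallback guess are bad. The simulation should agree with it up to sampling noise. For example, `mean_reward(0.3, 2)` should come out near $1 - 0.7^3 = 0.657$.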

The Impact of Thinking Costs

We introduced a linear cost for every “mental sample” the agent takes. Our model reveals that the optimal amount of planning is highly sensitive to the cost of information.

**Optimal Sampling Capacity (Z-axis)** as a function of environmental richness (X-axis) and sampling cost (Y-axis). As the cost of thinking increases (moving up the Y-axis) or rewards become sparser (moving left on the X-axis), the optimal strategy collapses to ‘zero planning’, that is, pure exploitation.
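With a linear cost, the agent effectively maximizes net reward $R(C) - kC$. Here $R(C)$ is the expected gross reward obtained with sampling capacity $C$, and $k$ is the cost per mental sample. Both symbols are introduced here for illustration. Below is a minimal sketch that assumes $R$ is available as a function and searches every capacity up to a hypothetical cap `c_max`. As a stand-in for $R$ it uses $1-(1-p)^{C+1}$, the gross reward of a hypothetical toy model. In that model each route is independently good with probability $p$, and the agent guesses blindly if none of its $C$ imagined routes is good. The toy model is far simpler than the one behind the figure.

```python
from typing import Callable

GrossReward = Callable[[int], float]  # maps sampling capacity C to R(C)


def net_reward(gross_reward: GrossReward, capacity: int, cost: float) -> float:
    """Net reward under a linear thinking cost: R(C) - cost * C."""
    return gross_reward(capacity) - cost * capacity


def optimal_capacity(gross_reward: GrossReward, cost: float, c_max: int) -> int:
    """Return C* = argmax of R(C) - cost * C over C = 0, 1, ..., c_max.

    Exact ties are broken toward the smaller capacity (less thinking).
    """
    return max(
        range(c_max + 1),
        key=lambda cap: (net_reward(gross_reward, cap, cost), -cap),
    )


def toy_gross_reward(p: float) -> GrossReward:
    """Hypothetical toy R(C): fail only if all C imagined routes and the
    fallback guess are bad, each independently with probability 1 - p."""
    return lambda cap: 1.0 - (1.0 - p) ** (cap + 1)
```

For example, with $p = 0.5$ and $k = 0.05$ this returns $C^* = 3$ in the toy model. A fourth sample would raise expected reward by only $0.5^5 \approx 0.031$, which is less than its cost. Because the cost enters as $-kC$, raising $k$ can never increase $C^*$.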

Key Findings

The “Giving Up” Threshold: In environments where positive rewards are extremely sparse ($p \le 0.1$), agents rationally choose not to plan at all once the sampling cost exceeds a certain threshold. Beyond that threshold, the optimal sampling capacity drops to $C^* \approx 0$.
Bounded Rationality: Even in “rich” environments where rewards are abundant, the optimal amount of planning remains surprisingly modest. The agent settles on a “sweet spot” where the marginal gain of one more mental simulation equals its marginal cost, often stopping far short of exhaustive exploration.
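Both findings come down to comparing the marginal gain of one more simulation with its marginal cost. Take the same kind of hypothetical toy model: each route is independently good with probability $p$, and the agent makes a blind guess if no imagined route is good. Going from $C$ to $C+1$ samples then adds $\Delta(C) = p(1-p)^{C+1}$ to expected reward. Because $\Delta(C)$ shrinks geometrically, the agent should keep sampling only while $\Delta(C)$ exceeds the per-sample cost. It should not plan at all once the cost reaches the first gain, $\Delta(0) = p(1-p)$. The following minimal sketch uses these assumptions and hypothetical function names:

```python
def marginal_gain(p: float, capacity: int) -> float:
    """Toy-model gain in expected reward from going to capacity + 1 samples."""
    return p * (1.0 - p) ** (capacity + 1)


def greedy_capacity(p: float, cost: float, c_max: int = 1000) -> int:
    """Keep imagining while the next simulation is worth more than it costs.

    Because marginal_gain never increases with capacity, stopping at the
    first sample that is not worth its cost maximizes R(C) - cost * C
    (ties go to the smaller capacity); c_max caps the loop when cost is 0.
    """
    capacity = 0
    while capacity < c_max and marginal_gain(p, capacity) > cost:
        capacity += 1
    return capacity


def giving_up_cost(p: float) -> float:
    """Per-sample cost at or above which the toy agent does not plan (C* = 0)."""
    return marginal_gain(p, 0)  # equals p * (1 - p)
```

For $p = 0.1$, for instance, the toy giving-up cost is $0.1 \times 0.9 = 0.09$. The greedy loop matches direct maximization of net reward only because the gains diminish monotonically. That property is worth checking before reusing the rule in another model.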

A Phase Transition in Strategy

We observed a distinct “tipping point” in optimal behavior. Below a critical level of environmental richness, the agent relies on pure exploitation (guessing). Once richness exceeds roughly $p = 0.5$, the agent rapidly ramps up exploratory planning.

**(Left)** Net reward vs. sampling capacity. **(Right)** The optimal sampling capacity ($C^*$) shows a sharp, non-linear increase as environmental richness ($p$) exceeds 0.5. This suggests a phase transition between an exploitation-dominated regime and an exploration-dominated regime.
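A simple way to locate such a tipping point in a parameter sweep is to find where the optimal capacity jumps most between neighboring richness values, or where it first becomes nonzero. The minimal sketch below assumes $C^*$ has already been computed on an ascending grid of $p$ values. The function names are hypothetical, and this is not necessarily how the figure was analyzed.

```python
from typing import Optional, Sequence, Tuple


def tipping_point(p_grid: Sequence[float], c_star: Sequence[int]) -> Tuple[float, int]:
    """Find the largest jump in optimal capacity along a richness sweep.

    Returns the midpoint of the two neighboring p values that bracket the
    jump, and the jump size in samples (the first such jump if tied).
    """
    if len(p_grid) != len(c_star) or len(p_grid) < 2:
        raise ValueError("need at least two matching (p, C*) pairs")
    jumps = [c_star[i + 1] - c_star[i] for i in range(len(c_star) - 1)]
    i = max(range(len(jumps)), key=lambda k: jumps[k])
    return (p_grid[i] + p_grid[i + 1]) / 2, jumps[i]


def onset_of_planning(p_grid: Sequence[float], c_star: Sequence[int]) -> Optional[float]:
    """Smallest swept p (grid assumed ascending) with any planning, else None."""
    return next((p for p, c in zip(p_grid, c_star) if c > 0), None)
```

The largest-jump criterion is crude and depends on grid spacing. Refining the grid around the reported midpoint gives a sharper estimate.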

Implications

These results suggest that “deep”, exhaustive planning is often suboptimal in realistic scenarios where thinking is costly. Instead, the brain likely uses a context-sensitive planning schedule. It minimizes cognitive effort in barren environments and increases depth selectively, only when the environment promises a high return on investment.

Work: Master’s Thesis, University of Pennsylvania.