NVIDIA AI Introduces ASPIRE: A Self-Improving Robotics Framework Reaching 31% Zero-Shot on LIBERO-Pro Long Tasks

Traditional robot programming is hard to scale. It requires orchestrating multimodal perception, physical contact dynamics, diverse configurations, and execution failures by hand. Code-as-policy systems let language models compose these into executable robot programs. That makes robot behavior inspectable, editable, and debuggable.

But existing robotic coding agents run in naive execution environments. They receive only coarse, task-level feedback. A failed rollout signals that the task failed, not why. The root cause can be perception, motion planning, grasping, contact dynamics, or long-horizon coordination. These systems also discard fixes once a task ends. So the agent solving its hundredth task is no more experienced than at its first.

A team of researchers from NVIDIA, University of Michigan, UIUC, UC Berkeley, and CMU introduces ASPIRE (Agentic Skill Programming through Iterative Robot Exploration). It is a continual learning system that writes and refines robot control programs. It also distills validated fixes into a reusable, transferable skill library.

How ASPIRE works

ASPIRE runs an open-ended learning loop with three components. It uses a coordinator–actor architecture. A central coordinator manages the shared skill library and dispatches actor coding agents to tasks. Actors do not exchange full chat histories or raw trajectories. Only distilled skills move between them.

Closed-loop robot execution engine: This replaces coarse rollout feedback with per-primitive multimodal traces. For each perception, planning, and control call, it stores inputs, outputs, and return status. It also stores RGB keyframes, overlays, grasp candidates, object poses, and motion-planning results. The agent inspects only the calls implicated by a failure. It then localizes the fault and validates a repair through re-execution.

Skill library: Reusable knowledge is rarely an entire task program. So the library stores heterogeneous fixes. These include localization heuristics, perception prompts, grasping constraints, motion primitives, and debugging workflows. Each skill is compact in-context guidance. It holds a failure signature, a when-to-apply condition, a repair strategy, and often a code sketch. The coordinator admits only patterns that pass debug validation and API-policy checks.

Evolutionary search: Trace-guided debugging alone can collapse into local repair loops. The agent keeps patching the same failed strategy. To broaden exploration, ASPIRE proposes K candidate programs each round. Candidates condition on top-performing prior programs and their remaining failure traces. The next round explores distinct strategies rather than refining one solution.

In simulation, the coding agent is Claude Code with Claude Opus 4.6 and a 1M-token context window. Programs are written in CaP-X, an open-source code-as-policy framework built on MuJoCo Playground. The agent cannot read simulator ground truth. Reading physics-engine state or asset files like .bddl, .xml, or .urdf is forbidden. The rule is simple. If a real robot with a camera could do it, it is allowed.