Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective

AAAI 2026

*Equal contributions
¹FPT Software AI Center   ²University of Arkansas
³University of Stuttgart   ⁴Carnegie Mellon University
⁵Aalborg University   ⁶University of Liverpool
⁷German Research Center for Artificial Intelligence (DFKI)
⁸Max Planck Research School for Intelligent Systems (IMPRS-IS)

Abstract

As embodied agents operate in increasingly complex environments, the ability to perceive, track, and reason about individual object instances over time becomes essential, especially in tasks requiring sequenced interactions with visually similar objects. In these non-Markovian settings, key decision cues are often hidden in object-specific histories rather than in the current scene. Without persistent memory of prior interactions, such as what has been interacted with, where it has been, or how it has changed, visuomotor policies may fail, repeat past actions, or overlook completed ones. To surface this challenge, we introduce LIBERO-Mem, a non-Markovian task suite for stress-testing robotic manipulation under object-level partial observability. It combines short- and long-horizon object tracking with temporally sequenced subgoals, requiring reasoning beyond the current frame. However, naïve vision-language-action (VLA) models struggle in such settings, with token scaling quickly becoming intractable, even for tasks spanning just a few hundred frames. We propose Embodied-SlotSSM, a slot-centric VLA framework built for temporal scalability. It maintains spatio-temporally consistent slot identities and leverages them through two mechanisms: (1) slot-state-space modeling for reconstructing short-term history, and (2) a relational encoder that aligns the input tokens with action decoding. Together, these components enable temporally grounded, context-aware action prediction. Experiments establish Embodied-SlotSSM's baseline performance on LIBERO-Mem and on general benchmarks, offering a scalable solution for non-Markovian reasoning in object-centric robotic policies.
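
The token-scaling concern can be made concrete with a little arithmetic. The sketch below is illustrative only: the tokens-per-frame and slot counts are assumed values, not figures from the paper, and the helper names are hypothetical. It compares the context length of a policy that keeps every past frame's visual tokens with a fixed-size object-centric state.

```python
def history_tokens(num_frames: int, tokens_per_frame: int) -> int:
    """Context length when every past frame's visual tokens are kept."""
    return num_frames * tokens_per_frame


def attention_pairs(context_len: int) -> int:
    """Token pairs scored by dense self-attention (quadratic in length)."""
    return context_len * context_len


TOKENS_PER_FRAME = 256  # assumed value, illustrative only
NUM_SLOTS = 8           # assumed value, illustrative only

for frames in (1, 100, 300):
    full = history_tokens(frames, TOKENS_PER_FRAME)
    print(f"{frames:>3} frames: {full:>6} history tokens, "
          f"{attention_pairs(full):>13,} attention pairs; "
          f"fixed slot state: {NUM_SLOTS} tokens")
```

Under these assumed numbers, 300 frames already yield 76,800 history tokens, while a fixed set of slots stays the same size regardless of episode length.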

Task	Task Description	Subtask Goals	Types
Task 1	robot to pick up the bowl and place it back on the plate	bowl lifted → bowl on plate	OM
Task 2	robot to lift the bottle and put it down on the plate	bottle lifted → bottle on plate	OM
Task 3	robot to lift the bowl and place it back on the plate 3 times	(bowl lifted → bowl on plate) × 3	OM, OS
Task 4	robot to pick up the bottle and put it down on the plate 3 times	(bottle lifted → bottle on plate) × 3	OM, OS
Task 5	robot to lift the bowl and place it back on the plate 5 times	(bowl lifted → bowl on plate) × 5	OM, OS
Task 6	robot to pick up the bowl and put it on the plate 7 times	(bowl lifted → bowl on plate) × 7	OM, OS
Task 7	robot to swap 2 bowls on their plates using the rotation rule	bowl 1 on plate 3 → bowl 2 on plate 1 → bowl 1 on plate 2	OM, OR
Task 8	robot to swap 3 bowls on their plates using the rotation rule	bowl 1 on plate 4 → bowl 2 on plate 1 → bowl 3 on plate 2 → bowl 1 on plate 3	OM, OR
Task 9	robot to put bowl in closest basket and move basket to the middle	bowl 1 in basket 1 → basket 1 in center	OM, OO
Task 10	robot to put bowl in closest basket and move empty basket to middle	bowl 1 in basket 1 → basket 2 in center	OM, OO
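
To illustrate why these subgoal sequences are non-Markovian, here is a minimal sketch of a progress tracker for ordered, repeated subgoals. It is not the benchmark's actual success checker: `SubgoalTracker` and its methods are hypothetical names, and subgoal events are assumed to arrive as plain strings matching the "Subtask Goals" column.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubgoalTracker:
    """Tracks progress through an ordered subgoal cycle repeated `repeats` times."""
    subgoals: List[str]
    repeats: int = 1
    index: int = 0  # number of subgoals completed so far

    @property
    def done(self) -> bool:
        return self.index >= len(self.subgoals) * self.repeats

    def expected(self) -> Optional[str]:
        """Next subgoal in the sequence, or None once the task is complete."""
        if self.done:
            return None
        return self.subgoals[self.index % len(self.subgoals)]

    def observe(self, event: str) -> bool:
        """Advance only when `event` is the next expected subgoal."""
        if not self.done and event == self.expected():
            self.index += 1
            return True
        return False


# Task 3: (bowl lifted -> bowl on plate) x 3
tracker = SubgoalTracker(["bowl lifted", "bowl on plate"], repeats=3)
for cycle in range(3):
    tracker.observe("bowl lifted")
    tracker.observe("bowl on plate")
    print(f"after cycle {cycle + 1}: done={tracker.done}")
```

After each cycle the scene looks the same (the bowl is on the plate), so only the counter held in the tracker distinguishes one completed repetition from three. A policy without memory has no equivalent of `index` to consult.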

LIBERO-Mem is a non-Markovian task suite for stress-testing robotic manipulation under object-level partial observability. It combines short- and long-horizon object tracking with temporally sequenced subgoals, requiring reasoning beyond the current frame.