Constructing the Umwelt: Cognitive Planning through Belief-Intent Co-Evolution
Mar 15, 2025
Shiyao Sang

Abstract
This paper challenges a prevailing epistemological assumption in end-to-end autonomous driving: that high-performance planning necessitates high-fidelity world reconstruction. Inspired by cognitive science, we propose the Mental Bayesian Causal World Model (MBCWM) and instantiate it as the Tokenized Intent World Model (TIWM), a novel cognitive computing architecture. Its core premise is that intelligence emerges not from pixel-level objective fidelity, but from the Cognitive Consistency between the agent’s internal intentional world and physical reality. By synthesizing von Uexküll’s Umwelt theory, the neural assembly hypothesis, and the triple causal model (integrating symbolic deduction, probabilistic induction, and force dynamics) into an end-to-end embodied planning system, we demonstrate the feasibility of this paradigm on the nuPlan benchmark. Open-loop validation results indicate that our Belief-Intent Co-Evolution mechanism improves planning performance. Crucially, in closed-loop simulations, the system exhibits emergent human-like cognitive behaviors, including map affordance understanding, free exploration, and self-recovery strategies. We identify Cognitive Consistency as the core learning mechanism: during long-term training, belief (state understanding) and intent (future prediction) spontaneously form a self-organizing equilibrium through implicit computational replay, achieving semantic alignment between internal representations and physical world affordances. Based on this, we propose two fundamental hypotheses: (H1) The efficacy of embodied intelligence depends less on parameter scale or data volume than on the degree of semantic alignment between internal attentional dynamics and physical affordances; (H2) The Intent Token serves as a functional neural unit, computationally analogous to the biological neural assembly.
TIWM offers a neuro-symbolic, cognition-first alternative to reconstruction-based planners, establishing a new direction: planning as active understanding, not passive reaction.
Type
Publication
arXiv