RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception
X-Humanoid Team
X-Humanoid

RobotPan. A 360° surround-view robotic vision system for real-time rendering, reconstruction, and streaming.

Abstract

Surround-view perception is increasingly important for navigation and loco-manipulation, especially in human-in-the-loop settings such as teleoperation, data collection, and emergency takeover. However, current robot visual interfaces often rely on narrow forward-facing views, suffer from motion-induced jitter that causes simulator sickness in head-mounted displays, and require cumbersome manual switching among multiple on-board cameras. We introduce a surround-view robotic vision system that combines six cameras with LiDAR to provide full 360° visual coverage, while meeting the geometric and real-time constraints of embodied deployment. We further present RobotPan, a feed-forward framework that predicts metric-scaled and compact 3D Gaussians from calibrated sparse-view inputs for real-time rendering, reconstruction, and streaming. RobotPan lifts multi-view features into a unified spherical coordinate representation and decodes Gaussians using hierarchical spherical voxel priors, allocating fine resolution near the robot and coarser resolution at larger radii to reduce redundancy without sacrificing fidelity. To support long sequences, an online fusion scheme updates dynamic content and selectively refreshes the appearance of static regions, preventing unbounded growth of the Gaussian set. Finally, we release a multi-sensor dataset tailored to 360° novel view synthesis and metric 3D reconstruction for robotics, covering navigation, manipulation, and locomotion on real platforms. Experiments show that RobotPan achieves competitive quality against prior feed-forward reconstruction and view-synthesis methods while producing substantially fewer Gaussians, enabling practical real-time embodied deployment.

System Overview

RobotPan system overview

RobotPan is a surround-view robotic vision system engineered for real-time embodied perception. Integrating six RGB cameras and a central LiDAR, the system achieves full 360° visual coverage during complex robot operations. From calibrated sparse multi-view observations, RobotPan predicts metric-scaled, compact 3D Gaussians, a representation that directly supports real-time surround-view rendering, novel view synthesis, metric depth estimation, and sparse-view dense reconstruction. By jointly enforcing geometric consistency and real-time rendering, the system serves as a practical visual interface for humanoid robots across teleoperation, navigation, and loco-manipulation tasks.
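The hierarchical spherical voxel prior described in the abstract, with fine resolution near the robot and coarser resolution at larger radii, can be sketched as a spherical grid whose radial bins widen geometrically with distance. This is a minimal illustration of the idea, not the paper's implementation; every function name, bin count, and growth factor below is an assumption chosen for clarity.

```python
import numpy as np

def radial_bin_edges(r_min=0.5, r_max=20.0, n_bins=16, growth=1.2):
    """Radial bin edges that widen geometrically with distance, so
    voxels are fine near the robot and coarse far away.
    All parameter values are illustrative, not from the paper."""
    widths = growth ** np.arange(n_bins)
    widths = widths / widths.sum() * (r_max - r_min)  # normalize to span [r_min, r_max]
    return r_min + np.concatenate([[0.0], np.cumsum(widths)])

def spherical_voxel_index(xyz, edges, n_az=64, n_el=16):
    """Map 3D points (N, 3) in the robot frame to (radial, azimuth,
    elevation) voxel indices on the spherical grid."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.linalg.norm(xyz, axis=1)
    az = np.arctan2(y, x)                                      # [-pi, pi)
    el = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # [-pi/2, pi/2]
    i_r = np.clip(np.searchsorted(edges, r) - 1, 0, len(edges) - 2)
    i_az = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    i_el = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    return np.stack([i_r, i_az, i_el], axis=1)
```

Because near-field bins are narrow and far-field bins are wide, a fixed budget of voxels (and hence Gaussians) is concentrated where the robot interacts with the scene, which is one way to realize the coarse-to-fine radial allocation the abstract describes.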

Tiangong3 Robot

Tiangong3 demonstrates advanced whole-body motion control, showcasing agile locomotion, dynamic balance, and coordinated limb movements across diverse terrains and tasks.

Tiangong3 Robot sensor layout

Tiangong3 Robot and its RobotPan Vision System sensor layout. Left: full-body view of the robot highlighting the head region. Middle: close-up front and side views of the robot head. Right: orthographic projections of the head-mounted sensing system. The diagrams illustrate a compact sensor arrangement with a central LiDAR and a ring of six RGB cameras spaced at 60° intervals, providing full 360° coverage for robot visual perception.
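The ring geometry above, six cameras at equal 60° intervals, can be sketched as follows. The helper names and the field-of-view check are illustrative assumptions, not the released calibration code; the only facts taken from the layout description are the camera count and spacing.

```python
import math

def camera_ring_yaws(n_cameras=6):
    """Yaw angles (radians) for a ring of evenly spaced cameras.
    With n_cameras=6 this yields the 60-degree spacing shown in the
    sensor-layout diagram."""
    return [2.0 * math.pi * i / n_cameras for i in range(n_cameras)]

def yaw_to_rotation(yaw):
    """3x3 rotation about the vertical (z) axis for a camera whose
    optical axis points along `yaw` in the head frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def covers_full_circle(n_cameras, hfov_deg):
    """Full 360-degree coverage requires each camera's horizontal FOV
    to be at least the angular spacing between neighbors (ignoring
    parallax from the cameras' offset centers)."""
    return hfov_deg >= 360.0 / n_cameras
```

For six cameras the spacing is 60°, so any lens with a horizontal FOV of at least 60° closes the ring; wider lenses additionally give the overlap that multi-view reconstruction benefits from.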

Scene Results

We evaluate RobotPan across diverse real-world scenarios covering navigation, manipulation, and locomotion. From calibrated sparse-view inputs captured by the surround-view system, RobotPan produces compact 3D Gaussians for real-time 360° rendering, novel view synthesis, and metric-scale depth estimation. The following results demonstrate the system's performance in representative scenes.

Robot in Action

Official demonstrations of the Tiangong robot platform, showcasing its full-body motion control, dynamic agility, and robust locomotion capabilities. Click any thumbnail to watch the full video on Bilibili.

Thomas Flare

Thomas flare — full-body dynamic rotation demonstrating extreme balance and coordinated joint control.

Leap of Faith

"Leap of Faith" — bold high-altitude jump showcasing agile locomotion and robust landing control.

Tiangong 3.0 Demo

Full capability demo — versatile whole-body control across diverse tasks with dexterous manipulation.

BibTeX

@misc{robotpan2026,
  title         = {RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception},
  author        = {TODO},
  year          = {2026},
  eprint        = {TODO},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {TODO}
}