Interactive visualization. Hover or tap to move the split.
The interactive visualization below compares our point cloud with industry-standard structured-light and ground-truth (GT) point clouds for the reconstruction of transparent objects. Hover within the visualization to compare the point clouds.
The interactive visualization below compares our point cloud with industry-standard structured-light and GT point clouds for the reconstruction of a metal bin. Hover within the visualization to compare the point clouds.
The interactive visualization below compares our point cloud with industry-standard structured-light and GT point clouds for a scene with a highly reflective dark background. Hover within the visualization to compare the point clouds.

Abstract

We present a novel multi-camera, multi-modal vision system designed for industrial robotics applications. The system generates high-quality 3D point clouds, with a focus on improving completeness and reducing hallucinations for collision avoidance across diverse geometries, materials, and lighting conditions. Our system incorporates several key advancements: (1) a modular and scalable Plenoptic Stereo Vision Unit that captures high-resolution RGB, polarization, and infrared (IR) data for enhanced scene understanding; (2) an Auto-Calibration Routine that enables the seamless addition and automatic registration of multiple stereo units, expanding the system's capabilities; and (3) a Deep Fusion Stereo Architecture, a state-of-the-art deep learning architecture that effectively fuses multi-baseline and multi-modal data for superior reconstruction accuracy. We demonstrate the impact of each design decision through rigorous testing, showing improved performance across varying lighting, geometry, and material challenges. To benchmark our system, we create an extensive industrial-robotics-inspired dataset featuring sub-millimeter-accurate ground-truth 3D reconstructions of scenes with challenging elements such as sunlight, deep bins, transparency, reflective surfaces, and thin objects. Our system surpasses state-of-the-art high-resolution structured light on this dataset. We also demonstrate generalization to non-robotics polarization datasets.
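To make the fusion idea concrete, below is a minimal PyTorch sketch of one way features from the three modalities could be combined before stereo matching. The module, channel sizes, and the simple concatenate-then-project fusion are illustrative assumptions for exposition only; they are not the actual Deep Fusion Stereo Architecture.

```python
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    """Hypothetical sketch: encode each modality separately, then fuse
    by concatenation and a learned 1x1 projection. Channel widths and
    layer choices are illustrative assumptions, not the paper's design."""

    def __init__(self, feat_ch: int = 32, n_modalities: int = 3):
        super().__init__()
        # One lightweight encoder per modality (RGB, polarization, IR).
        self.encoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3 if i == 0 else 1, feat_ch, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(n_modalities)
        )
        # Project the concatenated modality features back to a shared width.
        self.fuse = nn.Conv2d(feat_ch * n_modalities, feat_ch, 1)

    def forward(self, rgb, pol, ir):
        feats = [enc(x) for enc, x in zip(self.encoders, (rgb, pol, ir))]
        return self.fuse(torch.cat(feats, dim=1))

# Usage: one 480x640 frame per modality (RGB 3-channel, pol/IR 1-channel).
fused = MultiModalFusion()(torch.rand(1, 3, 480, 640),
                           torch.rand(1, 1, 480, 640),
                           torch.rand(1, 1, 480, 640))
print(fused.shape)  # torch.Size([1, 32, 480, 640])
```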

Auto Calibration

Our IR-dot-based automatic calibration pipeline registers multiple units to each other without requiring multiple images or calibration targets.
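As a rough illustration of the registration step, the sketch below estimates the rigid transform between two units from matched 3D IR-dot positions using the Kabsch algorithm. The dot detection, triangulation, and cross-unit matching are assumed to have already happened, and nothing here reflects the pipeline's actual implementation.

```python
import numpy as np

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Rigid transform (R, t) minimizing ||R @ src_i + t - dst_i|| over
    matched Nx3 point sets, via SVD of the cross-covariance (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Hypothetical example: dots_a / dots_b are the same IR dots triangulated
# independently in each unit's coordinate frame.
rng = np.random.default_rng(0)
dots_a = rng.uniform(-1, 1, size=(50, 3))
c, s = np.cos(0.3), np.sin(0.3)
true_R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
true_t = np.array([0.1, -0.2, 0.05])
dots_b = dots_a @ true_R.T + true_t
R, t = kabsch(dots_a, dots_b)
print(np.allclose(R, true_R), np.allclose(t, true_t))  # True True
```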

Deep Plenoptic Stereo Network Visualization Results

Interactive visualization. Hover or tap to move the split.
The interactive visualization below shows the impact of adding the IR modality to the reconstruction of a cluttered bin, compared with using RGB-only cameras. Hover within the visualization to compare the point clouds.
The interactive visualization below compares our point cloud with the DPSNet reconstruction of a scene containing a thin metallic bin. Hover within the visualization to compare the point clouds.

Deep Plenoptic Stereo Network Results