Challenges of Depth Estimation for Transparent Objects

Context and methodology

This dataset was created to investigate the limitations of current methods for depth estimation of transparent objects. The aim is to highlight the advantages and disadvantages of the different types of approaches and to quantitatively evaluate the expected error. We collected diverse data to empirically investigate the reliability of different methods that provide depth for transparent objects. By using glass and plastic objects, both filled with liquid and empty, properties such as opacity and index of refraction are varied. The selected objects also vary significantly in shape and size, with a mix of transparent and non-transparent materials. Additionally, scene properties, including viewing angle, object arrangement, support-plane texture, and lighting, are varied to create diverse evaluation scenarios.

Technical details

The dataset is collected by moving a camera attached to a robot arm around a scene. The same viewpoints are collected for every scene, and the camera poses are obtained through inverse kinematics of the robot arm. We use 3D-DAT (https://github.com/markus-suchi/3D-DAT) for annotation, placing object models in the virtual 3D scene and manually correcting their poses based on their reprojection error in the different RGB views. To obtain the 3D object models, the physical objects are coated with a matte spray paint after the scenes have been collected, and a high-quality depth sensor (Photoneo MotionCam-3D scanner, https://www.photoneo.com/) is used to reconstruct them.

The set of 15 objects used in our experiments is illustrated in Figure 3 and includes plastic and glass objects, filled or empty, with a variety of shapes and sizes. A total of 32 scenes is collected using an Intel RealSense D435 (https://www.intelrealsense.com/depth-camera-d435i/), saving both the RGB image and the depth image at a resolution of 1280 × 720 pixels. The robot arm performs a circular motion around the scene with the camera oriented toward the scene center, placing the camera at four different heights and corresponding polar angles (68°, 60°, 48° and 33°). For each circle, either 16 or 26 views are collected, resulting in a total of 64 or 104 views per scene. The light is uniform and comes from the top of the scene. For seven scenes, we add a strong light projector to the side of the scene, producing caustics and other refraction and reflection effects at the interfaces of transparent objects. Six scenes also have a textured background instead of a uniform one, and the number of distractors in the scene is varied.

For each scene in the "scenes/" folder, the structure is as follows:

rgb/ contains the color images
depth/ contains the depth obtained with the RealSense D435 camera
groundtruth_handeye.txt contains the camera poses of each viewpoint (each line contains a pose in TUM format: id, tx, ty, tz, rx, ry, rz, rw, with id being the current view, tx, ty, tz the translation, and rx, ry, rz, rw the rotation as a quaternion)
poses.yaml contains the scene object annotations in the same world reference frame as the camera poses
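As an illustration only, the following is a minimal Python sketch of how the per-scene files listed above might be read. The file names match the list, but the per-view image naming scheme (view id as a PNG file name) and the depth scale (millimetres) are assumptions and should be checked against the actual data.

```python
import numpy as np
import imageio.v3 as iio
from scipy.spatial.transform import Rotation


def load_camera_poses(scene_dir):
    """Parse groundtruth_handeye.txt (TUM format: id tx ty tz rx ry rz rw)
    into a dict mapping view id -> 4x4 camera pose matrix."""
    poses = {}
    with open(f"{scene_dir}/groundtruth_handeye.txt") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 8:
                continue  # skip empty or malformed lines
            view_id = parts[0]
            tx, ty, tz, rx, ry, rz, rw = map(float, parts[1:])
            T = np.eye(4)
            # scipy expects the quaternion in scalar-last order (x, y, z, w)
            T[:3, :3] = Rotation.from_quat([rx, ry, rz, rw]).as_matrix()
            T[:3, 3] = [tx, ty, tz]
            poses[view_id] = T
    return poses


def load_view(scene_dir, view_id):
    """Load the RGB and depth image of one view.
    File naming and the millimetre depth scale are assumptions."""
    rgb = iio.imread(f"{scene_dir}/rgb/{view_id}.png")
    depth = iio.imread(f"{scene_dir}/depth/{view_id}.png")
    return rgb, depth.astype(np.float32) / 1000.0  # depth in metres
```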

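To compare an estimated depth map against the recorded sensor depth in the same world frame as the object annotations in poses.yaml, the depth can be back-projected and transformed with the camera pose loaded above. The sketch below assumes a pinhole model with intrinsics fx, fy, cx, cy (not part of the files listed here; they would have to come from the camera calibration) and that the poses map camera coordinates to world coordinates, which should be verified against the dataset.

```python
import numpy as np


def depth_to_world_points(depth_m, T_cam_to_world, fx, fy, cx, cy):
    """Back-project a depth map (in metres) to a 3D point cloud and transform
    it into the world frame shared by the camera poses and object annotations."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_world = (T_cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world[z.reshape(-1) > 0]  # drop pixels with no depth reading
```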
Identifier
DOI https://doi.org/10.48436/3b85h-y6t96
Related Identifier IsVersionOf https://doi.org/10.48436/f2t7d-ay992
Metadata Access https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:3b85h-y6t96
Provenance
Creator Weibel, Jean-Baptiste Nicolas
Publisher TU Wien
Contributor Weibel, Jean-Baptiste Nicolas
Publication Year 2024
Rights Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccess true
Contact Weibel, Jean-Baptiste Nicolas (TU Wien)
Representation
Resource Type Dataset
Version 1.0.0
Discipline Other