Performance of RGB-D camera for different object types in greenhouse conditions

Ola Ringdahl¹, Polina Kurtser², and Yael Edan³
¹Department of Computing Science, Umeå University, Sweden
²Örebro University, Sweden
³Ben-Gurion University of the Negev, Israel

RGB-D cameras play an increasingly important role in the localization and autonomous navigation of mobile robots. Reasonably priced commercial RGB-D cameras have recently been developed for operation in greenhouse and outdoor conditions. They can be employed for various agricultural and horticultural operations such as harvesting, weeding, pruning and phenotyping. However, the depth information extracted from these cameras varies significantly between objects and sensing conditions. This paper presents an evaluation protocol applied to a commercially available Fotonic F80 time-of-flight RGB-D camera for eight different object types. Autonomous sweet pepper harvesting served as a case study of an exemplary agricultural task. Each of the chosen objects is an item that an autonomous agricultural robot may need to detect and localize to perform well. A total of 340 rectangular regions of interest (ROIs), 30-100 per object type, were marked to extract two performance measures: point cloud density and variability around the center of mass. An additional 570 ROIs (57 marked manually and 513 replicated) were generated to evaluate the repeatability and accuracy of the point cloud. A statistical analysis was performed to evaluate the significance of differences between object types. The results show that different objects have significantly different point densities. As expected, metallic materials and black-colored objects had significantly lower point density than organic and other artificial materials introduced to the scene. The point cloud variability measures showed no significant differences between object types, except for the metallic knife, which presented significant outliers in the collected measures.
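The two per-ROI measures named above can be illustrated with a short sketch. The abstract does not give their exact formulas, so the following makes assumptions: point density is taken here as the fraction of ROI pixels that carry a valid depth return, and variability around the center of mass as the mean Euclidean distance of the valid points from their centroid. The ROI representation (a flat list of `(x, y, z)` tuples, with `None` for pixels without depth) and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, assumed units


def point_density(roi: Sequence[Optional[Point]]) -> float:
    """Fraction of ROI pixels with a valid 3D point (assumed definition)."""
    if not roi:
        raise ValueError("ROI is empty")
    valid = sum(1 for p in roi if p is not None)
    return valid / len(roi)


def center_of_mass(points: Sequence[Point]) -> Point:
    """Unweighted centroid of a non-empty set of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def variability_around_com(roi: Sequence[Optional[Point]]) -> float:
    """Mean distance of the valid points from their centroid (assumed measure)."""
    points = [p for p in roi if p is not None]
    if not points:
        raise ValueError("ROI has no valid points")
    com = center_of_mass(points)
    return sum(math.dist(p, com) for p in points) / len(points)


# Hypothetical 4-pixel ROI: two pixels with depth, two without.
roi = [(0.0, 0.0, 1.0), (0.02, 0.0, 1.0), None, None]
print(point_density(roi))  # 0.5
```

Under these assumptions, a dark or metallic object that returns few valid pixels shows up as a low `point_density`, while a noisy surface raises `variability_around_com`; per-object samples of either value could then be fed to a statistical test.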
The accuracy and repeatability analysis showed that errors of 1-3 cm arise from the difficulty a human has in annotating exactly the same area, and errors of up to ±4 cm arise because the sensor does not generate exactly the same point cloud when sensing a fixed object.
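One way to quantify the two error sources described above is to compare the centers of mass of ROIs that should describe the same fixed object. ROIs annotated repeatedly by hand on the same frame isolate annotation error. Assuming the replicated ROIs are a single ROI applied to repeated frames of a static scene, they isolate sensor variation. The sketch below assumes the centers of mass are already computed (in cm) and reports the largest pairwise distance and each center's deviation from their mean. The function names and this choice of spread measure are assumptions, not the paper's stated procedure.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in cm, assumed units


def mean_point(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty set of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def max_pairwise_distance(coms: Sequence[Point]) -> float:
    """Largest distance between any two centers of mass of the same object."""
    if len(coms) < 2:
        raise ValueError("need at least two repetitions")
    return max(math.dist(a, b) for a, b in combinations(coms, 2))


def deviations_from_mean(coms: Sequence[Point]) -> List[float]:
    """Distance of each center of mass from the mean center of mass."""
    if not coms:
        raise ValueError("no centers of mass given")
    center = mean_point(coms)
    return [math.dist(c, center) for c in coms]


# Hypothetical repeated annotations of one fixed object (cm).
annotated = [(0.0, 0.0, 100.0), (3.0, 0.0, 100.0), (0.0, 4.0, 100.0)]
print(max_pairwise_distance(annotated))  # 5.0
```

Running the same functions once on the manually repeated ROIs and once on the replicated ROIs keeps the annotation and sensor contributions separate, which mirrors the distinction drawn in the abstract.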