Conference on Robot Learning (CoRL) 2023
Our method is able to find and grasp the target object using a standard two-finger robot gripper, even in the presence of noise from vision sensor data in real-world settings.
Cluttered shelf with 3–4 occluding objects
Cluttered shelf with 5–6 occluding objects
Finding and grasping a target object on a cluttered shelf, especially when the target is occluded by other unknown objects and initially invisible, remains a significant challenge in robotic manipulation. While there have been advances in finding the target object by rearranging surrounding objects using specialized tools, developing algorithms that work with standard robot grippers remains an unresolved issue. In this paper, we introduce a novel framework for finding and grasping the target object using a standard gripper, employing pushing and pick-and-place actions. To achieve this, we introduce two indicator functions: (i) an existence function, determining the potential presence of the target, and (ii) a graspability function, assessing the feasibility of grasping the identified target. We then formulate a model-based optimal control problem. The core component of our approach involves leveraging a 3D recognition model, enabling efficient estimation of the proposed indicator functions and their associated dynamics models. Our method succeeds in finding and grasping the target object using a standard robot gripper in both simulations and real-world settings. In particular, we demonstrate the adaptability and robustness of our method in the presence of noise in real-world vision sensor data.
In this paper, we propose a novel optimal control framework for mechanical search with practical algorithms that leverage 3D reconstruction models. First, we introduce two indicator functions (denoting the target's candidate pose by \(x\in \mathrm{SE}(3)\)): (i) an existence function \(f(x)\) that indicates whether the target can be present at \(x\) and (ii) a graspability function \(g(x)\) that indicates whether the target at \(x\) is graspable. The objective then becomes to rearrange the surrounding objects until only one candidate pose remains at which the target can both exist and be grasped, i.e., there exists a unique \(x^*\in \mathrm{SE}(3)\) such that \(f(x^*)=1\) and \(g(x^*)=1\); this condition leads to a straightforward definition of a cost function in terms of \(f\) and \(g\).
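As a concrete but hypothetical illustration of this cost, one literal encoding of the uniqueness condition over a discretized set of candidate poses might look as follows. The name `search_cost`, the toy pose list, and the toy indicators are our own for exposition, not the paper's implementation:

```python
def search_cost(poses, f, g):
    """Hypothetical sketch: the cost is zero exactly when a unique pose x*
    with f(x*) = 1 and g(x*) = 1 remains among the candidate poses."""
    feasible = [x for x in poses if f(x) == 1 and g(x) == 1]
    return abs(len(feasible) - 1)

# Toy 1-D stand-in for SE(3) candidate poses:
poses = [0, 1, 2, 3]
g = lambda x: 1 if x in (1, 2) else 0          # poses 1 and 2 admit a grasp

f_before = lambda x: 1 if x in (1, 2) else 0   # target could be at 1 or 2
f_after = lambda x: 1 if x == 2 else 0         # after rearrangement: only 2

print(search_cost(poses, f_before, g))  # -> 1 (two feasible poses remain)
print(search_cost(poses, f_after, g))   # -> 0 (a unique feasible pose x*)
```

Rearranging occluders shrinks the set where \(f\) can be one, driving this count-based cost toward zero.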
Second, we leverage a 3D object recognition model to efficiently estimate the functions \(f\) and \(g\) together with their corresponding dynamics models, which allows us to formulate a tractable model-based optimal control problem. Specifically, we employ a recent 3D recognition model rooted in superquadric primitives. Notably, the superquadric representation enables rapid collision checks, depth-image rendering, and the use of pushing dynamics models, leading to effective estimates of \(f\), \(g\), and their dynamics. To mitigate accumulated estimation errors during optimal control, we adopt model predictive control with a short time horizon.
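The rapid collision checks rest on the standard inside-outside function of a superquadric, whose level set \(F=1\) is the primitive's surface. The sketch below is a generic formulation in the primitive's canonical frame, with `superquadric_io` and `collides` as hypothetical names rather than the paper's code:

```python
import numpy as np

def superquadric_io(points, scale, eps):
    """Inside-outside function of a superquadric in its canonical frame.
    points: (N, 3) array; scale = (a1, a2, a3) semi-axes; eps = (e1, e2)
    shape exponents. Returns F with F < 1 inside, F = 1 on the surface,
    and F > 1 outside the primitive."""
    a1, a2, a3 = scale
    e1, e2 = eps
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    xy = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + np.abs(z / a3) ** (2 / e1)

def collides(points, scale, eps, margin=1.0):
    """Cheap point-vs-superquadric collision test: any query point with
    F < margin is treated as penetrating the primitive."""
    return bool(np.any(superquadric_io(points, scale, eps) < margin))

# With eps = (1, 1) and unit scale, the primitive is the unit sphere:
print(collides(np.array([[0.0, 0.0, 0.0]]), (1, 1, 1), (1, 1)))  # -> True
print(collides(np.array([[2.0, 0.0, 0.0]]), (1, 1, 1), (1, 1)))  # -> False
```

Because the check is a closed-form expression evaluated in batch, candidate placements and gripper poses can be screened against every reconstructed primitive without meshes or general-purpose collision libraries.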
Through experiments conducted in both simulations and real-world scenarios, we have validated the effectiveness of our 3D reconstruction-based approach. Our method successfully identifies and grasps the target object using a standard two-finger robot gripper, even in the presence of noise from vision sensor data in real-world settings. Real-world manipulation videos are provided below.
@inproceedings{kim2023leveraging,
title={Leveraging 3D Reconstruction for Mechanical Search on Cluttered Shelves},
author={Kim, Seungyeon and Kim, Young Hun and Lee, Yonghyeon and Park, Frank C},
booktitle={Conference on Robot Learning},
pages={822--848},
year={2023},
organization={PMLR}
}