A research team conducted a study to improve how well robots performing multiple peg-in-hole assembly adapt to different working scenarios, including variations in object geometry and pose. Using a flexible and reusable sequential control policy framework, they explored how to apply artificial intelligence technology more efficiently in industrial settings. Compared with a single control policy, their framework trained more efficiently, converged faster, and achieved a higher success rate on long-term multiple peg-in-hole assembly tasks.
The team’s work was published in the journal CAAI Artificial Intelligence Research on November 22, 2024.
Previous research has achieved high precision in peg-in-hole assembly. As the name suggests, peg-in-hole assembly is a robotics task in which the robot inserts a peg into a hole. It is a fundamental task, widely used in many areas of automated manufacturing. However, generalization to changing working scenarios remains underexplored. To address this, the team proposed a sequential control policy, or SeqPolicy, that performs multiple peg-in-hole assembly flexibly while maintaining a high success rate. A sequential control policy describes a system in which steps are taken in a predetermined order: the system moves to the next step only once the previous step has been completed successfully. A picking step is included to simulate the uncertainty of a random grasping pose.
“In this framework, reinforcement learning is used to train the control policy through trial and error, making the policy robust to dynamic environments. However, policy learning is challenging in the long-term task because of the low sample efficiency, which makes the control policy require numerous trials to discover an effective strategy as expert-desired,” said Xinyu Liu, a researcher from the Technical University of Denmark. For example, training sometimes gets stuck in sub-optimal behaviors, which causes the overall task to fail. SeqPolicy therefore divides a long-term task into three primitive skills, picking, aligning, and insertion, each of which is simple and fast to train.
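As a rough illustration of the sequential idea (not the authors' implementation), the sketch below runs three placeholder skills in a fixed order and advances only when the previous one reports success. The `Skill` class, the `run_sequence` function, the state keys, and the toy success rules are all hypothetical names introduced here; in the study, each skill would be a control policy trained with reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


@dataclass
class Skill:
    name: str
    # Hypothetical stand-in for rolling out a trained policy:
    # takes the current state, returns (new_state, success_flag).
    execute: Callable[[State], Tuple[State, bool]]


def run_sequence(skills: List[Skill], state: State) -> Tuple[State, List[str], bool]:
    """Run skills in a fixed order; stop at the first one that fails."""
    completed: List[str] = []
    for skill in skills:
        state, ok = skill.execute(state)
        if not ok:
            return state, completed, False
        completed.append(skill.name)
    return state, completed, True


# Toy skills: each one only checks that its predecessor finished.
def pick(state: State) -> Tuple[State, bool]:
    return dict(state, grasped=1.0), True


def align(state: State) -> Tuple[State, bool]:
    if state.get("grasped", 0.0) < 1.0:
        return state, False
    return dict(state, aligned=1.0), True


def insert(state: State) -> Tuple[State, bool]:
    if state.get("aligned", 0.0) < 1.0:
        return state, False
    return dict(state, inserted=1.0), True


PIPELINE = [Skill("picking", pick), Skill("aligning", align), Skill("insertion", insert)]
```

Because each skill is a separate object, one skill could in principle be retrained or swapped without touching the others, which is the reuse argument the team makes for small specialized policies.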
Additionally, important intermediate states, such as the lifting height and the initial aligning direction, can be guided by expert knowledge to meet specific requirements in different scenarios. This modular pipeline can train the control policy efficiently and adapt to diverse scenarios flexibly. “Instead of relying on a single super control policy to handle everything, we propose building a team with several small and specialized control policies, each focused on primitive skills,” said Liu.
“Artificial intelligence and robotics are popular topics, and impressive results are shown, but achieving both strong generalization ability, like handling diverse tasks, and high precision, such as doing each task perfectly, with limited cost, including computational resources and sim2real deployment, is a challenge,” said Liu. For primitive skills such as aligning and insertion in peg-in-hole assembly, policy learning is simple and performs stably. Using several small, specialized control policies to carry out the assembly task together is low-cost and flexible, and it meets the particular demands of industrial applications. These small policies are also reusable.
Looking ahead, the team sees room to improve the framework and several directions to explore based on this study. An immediate challenge is policy chaining: transferring automatically from one specialized skill or control policy to the next. Currently, the policies are linked manually by defining the initial state of each task based on the end state of the previous one. Developing a seamless policy chaining method would make the system more autonomous.
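The manual linking described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the authors' code: the state fields (`peg_height`, `peg_yaw`, `grasped`), the function `initial_state_from`, and the numeric values are all hypothetical. The point is only that the start state of each skill is derived by hand from the end state of the previous one, with expert-chosen values such as the lifting height filled in.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RobotState:
    peg_height: float  # metres above the table (hypothetical field)
    peg_yaw: float     # radians (hypothetical field)
    grasped: bool


def initial_state_from(
    end_state: RobotState,
    *,
    lift_height: Optional[float] = None,
    align_yaw: Optional[float] = None,
) -> RobotState:
    """Derive the next skill's start state from the previous skill's end state.

    Expert-chosen intermediate values override the corresponding fields;
    every other field is carried over unchanged.
    """
    nxt = end_state
    if lift_height is not None:
        nxt = replace(nxt, peg_height=lift_height)
    if align_yaw is not None:
        nxt = replace(nxt, peg_yaw=align_yaw)
    return nxt


# Hand-written link between the picking and aligning skills (illustrative values).
picking_end = RobotState(peg_height=0.02, peg_yaw=0.3, grasped=True)
aligning_start = initial_state_from(picking_end, lift_height=0.10, align_yaw=0.0)
```

An automatic chaining method would replace this hand-written link with something learned or inferred, which is the open problem the team identifies.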
Another vital direction is refining how robots "see" and understand the spatial information of the objects. “The way we define what the robot observes, its observation space, is critical for deciding when to switch from one skill to another. Experiments in our study showed progress, but multimodal fusion technology could significantly improve the system's adaptability in diverse scenes like factories or labs,” said Chao Zeng, a co-author of this work.
“Our ultimate goal is to develop a general framework for robotic assembly with several specialization control policies and a ‘manager model’ that determines the expert models based on the states and extracts the universal observation for these control policies. With this framework, robotic assembly could be conducted efficiently and flexibly,” said Zeng.
The research team includes Xinyu Liu from the Technical University of Denmark; Chao Zeng and Chenguang Yang from the University of Liverpool; and Jianwei Zhang from the University of Hamburg.
The research is partially funded by the UKRI Guarantee funding for Horizon Europe MSCA Postdoctoral/Individual Fellowships.
About CAAI Artificial Intelligence Research
CAAI Artificial Intelligence Research (CAAI AIR) is an Open Access, peer-reviewed scholarly journal, published by Tsinghua University Press, released exclusively on SciOpen. CAAI AIR aims to publish the state-of-the-art achievements in the field of artificial intelligence and its applications, including knowledge intelligence, perceptual intelligence, machine learning, behavioral intelligence, brain and cognition, AI chips and applications, etc. Original research and review articles on but not limited to the above topics are welcome. The journal is completely Open Access with no article processing fees for authors.
About SciOpen
SciOpen is an open access resource of scientific and technical content published by Tsinghua University Press and its publishing partners. SciOpen provides end-to-end services across manuscript submission, peer review, content hosting, analytics, identity management, and expert advice to ensure each journal’s development. By digitalizing the publishing process, SciOpen widens the reach, deepens the impact, and accelerates the exchange of ideas.
Journal
CAAI Artificial Intelligence Research
Article Title
Reinforcement Learning-Based Sequential Control Policy for Multiple Peg-in-Hole Assembly
Article Publication Date
22-Nov-2024