While we rarely question the importance of human vision, far fewer people are familiar with the concept of computer vision. Like human vision, computer vision concerns how computers ‘see.’ Specifically, it refers to how computers recognize, identify and analyze images and video. Computer vision is used in many sectors for its ability to monitor and analyze visual data in ways that go beyond what human vision can do. These include the medical, agricultural and industrial sectors, where early tumor detection, early pest detection and fine quality control can save money and, most importantly, lives. One of the most challenging tasks in computer vision is camouflaged object detection (COD): recognizing, identifying and analyzing an object in an image or video that is difficult to differentiate from its background. In its initial stages, for example, a cancerous cell may not look all that different from a healthy cell.
Since 2023, research on COD that uses deep learning, a type of machine learning, has surged, creating a large body of work that had not yet been surveyed. To address this, a research group at Tsinghua University undertook an extensive review of the COD literature to catalogue and analyze the current state of the field.
Chunming He, one of the paper’s authors and a researcher at Duke University, said that their paper “provides the most comprehensive review to date on COD methods, emphasizing both theoretical frameworks and practical contributions. The survey covers advancements across image-level and video-level COD, examining traditional and deep learning approaches.”
The review was published in CAAI Artificial Intelligence Research (DOI: 10.26599/AIR.2024.9150044) on December 31, 2024. The corresponding repository is available on GitHub: https://github.com/ChunmingHe/awesome-concealed-object-segmentation.
COD is applied to both images and video. Methods fall into two main categories: traditional approaches and newer deep learning approaches. Traditional COD analyzes cues such as color, texture and intensity to separate and recognize hidden objects in images and video. However, as the Tsinghua group noted, traditional methods are limited in many situations, including low-resolution images and cases where the object blends strongly into the background.
That is where deep learning comes in. Researchers needed a way to teach computers to recognize and analyze camouflaged objects across a wider range of conditions, with more flexibility and in greater detail, without human technicians constantly labelling and sorting the data. Deep learning uses an artificial neural network that loosely mimics the neural networks of the human brain, allowing a computer to learn, and keep learning on its own, how to perform specific tasks such as COD.
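To make that limitation concrete, the following is a minimal Python sketch, not a method taken from the survey. It compares two simple cues, mean intensity and intensity variance (a crude stand-in for texture), between an object region and its background in a tiny grayscale image. The function names, the toy images and the `min_gap` threshold are all illustrative assumptions, and color is left out for brevity.

```python
from statistics import mean, pvariance


def region_stats(image, mask, inside=True):
    """Return (mean intensity, variance) over pixels inside or outside the mask."""
    values = [
        pixel
        for img_row, mask_row in zip(image, mask)
        for pixel, flag in zip(img_row, mask_row)
        if bool(flag) == inside
    ]
    return mean(values), pvariance(values)


def contrast(image, mask):
    """Absolute object/background gap in mean intensity and in variance."""
    obj_mean, obj_var = region_stats(image, mask, inside=True)
    bg_mean, bg_var = region_stats(image, mask, inside=False)
    return abs(obj_mean - bg_mean), abs(obj_var - bg_var)


def is_detectable(image, mask, min_gap=20):
    """Hypothetical rule: the object is 'detectable' if the intensity gap is large."""
    return contrast(image, mask)[0] >= min_gap


def make_image(bg, obj, mask):
    """Build a grayscale image with value `obj` inside the mask and `bg` outside."""
    return [[obj if flag else bg for flag in row] for row in mask]


# A 2x2 object in the middle of a 4x4 image (1 = object pixel).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
salient = make_image(bg=10, obj=200, mask=mask)       # object stands out
camouflaged = make_image(bg=100, obj=102, mask=mask)  # object nearly matches background

print(contrast(salient, mask), is_detectable(salient, mask))
print(contrast(camouflaged, mask), is_detectable(camouflaged, mask))
```

In this toy setting the camouflaged object differs from its background by only two intensity levels, so a fixed threshold on a simple cue misses it, which is the kind of failure that motivates learned features.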
Among the many deep learning approaches to COD that the researchers reviewed are novel task-setting strategies. These strategies pose COD in new task settings so that a model learns to handle unfamiliar situations and data, improving its abilities in novel and complex scenes. For example, referring-based COD can be used when a specific object is of interest. In that scenario, target references, in the form of images or text, are supplied to the model so that it can learn to recognize the specific target, and exposure to more novel objects improves its overall performance.
Collaborative COD does something very similar, but it introduces multiple images of an object or a class of objects so that the model can recognize a wider, more varied group of objects that still belong to one class. Again, this increases the overall accuracy of the model. Both strategies take advantage of deep learning's strengths to refine, extend and strengthen a model's ability to detect and analyze camouflaged objects.
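The matching idea behind referring-based COD can be illustrated with a toy Python sketch. It assumes that a reference (image or text) and every pixel of the scene have already been turned into feature vectors, and it marks the pixels whose features point in nearly the same direction as the reference. In practice those features are learned by deep networks; the hand-written vectors, the function names and the 0.9 threshold below are assumptions for illustration only and do not reproduce any method from the survey.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def refer_mask(feature_map, reference, threshold=0.9):
    """Mark pixels whose feature vector is at least `threshold`-similar to the reference."""
    return [
        [1 if cosine(features, reference) >= threshold else 0 for features in row]
        for row in feature_map
    ]


# Hypothetical 2x2 scene with a 2-dimensional feature vector per pixel.
feature_map = [
    [(1.0, 0.0), (0.9, 0.1)],
    [(0.0, 1.0), (1.0, 1.0)],
]
reference = (1.0, 0.0)  # stand-in for an embedded reference image or text prompt

print(refer_mask(feature_map, reference))
```

Only the two top pixels are close enough to the reference to be selected; the same scene queried with a different reference vector would yield a different mask, which is what lets the reference steer detection toward a specific target.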
Despite its immense promise, the use of deep learning in COD faces several issues. One is the scarcity of data that can be used to train these systems. “To mitigate the scarcity of data, leveraging deep generative models to synthesize diverse, realistic camouflaged images will bolster training effectiveness by dataset augmentation, enhancing model robustness in dealing with camouflaged scenarios,” said Fengyang Xiao, first author and researcher at Tsinghua Shenzhen International Graduate School, Tsinghua University.
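The augmentation workflow Xiao describes can be sketched in a few lines of Python. The sketch does not use a deep generative model: as a clearly simplified stand-in, a random generator produces small grayscale images in which a square patch differs only slightly from a noisy background, together with the matching ground-truth mask. All names, sizes and intensity ranges are assumptions chosen for illustration.

```python
import random


def synthesize_camouflaged(size=8, obj_size=3, bg_level=100,
                           max_offset=5, noise=3, seed=None):
    """Return (image, mask) for a noisy background containing a low-contrast square.

    The patch's base intensity differs from the background by at most `max_offset`.
    """
    rng = random.Random(seed)
    top = rng.randrange(size - obj_size + 1)
    left = rng.randrange(size - obj_size + 1)
    offset = rng.randint(-max_offset, max_offset)
    image, mask = [], []
    for r in range(size):
        img_row, mask_row = [], []
        for c in range(size):
            inside = top <= r < top + obj_size and left <= c < left + obj_size
            level = bg_level + (offset if inside else 0)
            img_row.append(level + rng.randint(-noise, noise))
            mask_row.append(1 if inside else 0)
        image.append(img_row)
        mask.append(mask_row)
    return image, mask


def augment(dataset, n_synthetic, seed=0):
    """Return a new list: the original (image, mask) pairs plus synthetic ones."""
    synthetic = [synthesize_camouflaged(seed=seed + i) for i in range(n_synthetic)]
    return list(dataset) + synthetic


real_samples = []  # placeholder for a scarce set of labelled (image, mask) pairs
training_set = augment(real_samples, n_synthetic=3)
print(len(training_set))
```

A real pipeline would replace `synthesize_camouflaged` with a trained generative model, but the surrounding step is the same: synthetic image and mask pairs are appended to the scarce labelled data before training.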
Looking forward, the next steps in COD research for He “involve addressing the identified limitations in COD methods by exploring innovative research directions, such as unsupervised and weakly supervised learning, multi-task and multi-modal approaches, and leveraging large-scale vision-language models. The ultimate goal is to develop robust, efficient, and generalized COD models capable of tackling diverse and complex real-world scenarios, ensuring broader applicability and advancing the field of computer vision.”
Other contributors include Sujie Hu, Yuqi Shen, Chengyu Fang, Longxiang Tang and Xiu Li, all from Tsinghua Shenzhen International Graduate School, Tsinghua University; Jinfa Huang, School of Electrical and Computer Engineering, Peking University; and Ziyun Yang, Department of Biomedical Engineering, Duke University. Fengyang Xiao is also affiliated with the Department of Biomedical Engineering, Duke University.
Funding and Contributions
This work was supported by the STI 2030-Major Projects (No. 2021ZD0201404). The authors express their sincere appreciation to Dr. Deng-Ping Fan for his insightful comments, which greatly improved the quality of this paper.
About CAAI Artificial Intelligence Research
CAAI Artificial Intelligence Research (CAAI AIR) is an Open Access, peer-reviewed scholarly journal, published by Tsinghua University Press, released exclusively on SciOpen. CAAI AIR aims to publish the state-of-the-art achievements in the field of artificial intelligence and its applications, including knowledge intelligence, perceptual intelligence, machine learning, behavioral intelligence, brain and cognition, AI chips and applications, etc. Original research and review articles on but not limited to the above topics are welcome. The journal is completely Open Access with no article processing fees for authors.
About SciOpen
SciOpen is an open access resource of scientific and technical content published by Tsinghua University Press and its publishing partners. SciOpen provides end-to-end services across manuscript submission, peer review, content hosting, analytics, identity management, and expert advice to ensure each journal’s development. By digitalizing the publishing process, SciOpen widens the reach, deepens the impact, and accelerates the exchange of ideas.
Journal: CAAI Artificial Intelligence Research
Article Title: A Survey of Camouflaged Object Detection and Beyond
Article Publication Date: 31-Dec-2024