Article Highlight | 27-Jun-2024

A survey of synthetic data augmentation methods in machine vision

Beijing Zhongke Journal Publishing Co. Ltd.

Currently, deep learning is the most important technique for solving many complex machine vision problems. State-of-the-art deep learning models typically contain a very large number of parameters that need to be learned to characterize a wide range of visual phenomena. Moreover, as a result of the enormous appearance variations of real-world objects and scenes, there is often a need to introduce various variations of the available data during training. Consequently, training deep learning models requires a very large amount of annotated data to guarantee good generalization performance and avoid overfitting. However, data collection and annotation are often time-consuming and costly exercises. Instead of attempting to collect very large quantities of annotated data, it is often more practical to create new samples artificially. Data augmentation (DA) is the process of creating new data to artificially extend the training set. Typically, augmentation involves performing transformations on the original data that alter their visual characteristics in a particular way but preserve their labels. Data augmentation can thus be seen as a means of simulating real-world behavior such as the visual appearance of objects and scenes under different viewing angles, pose variations, object deformations, lens distortions and other camera artifacts. In practice, there are several situations in which the training of machine learning (ML) models may require data augmentation. The most common scenarios include the following:

 

- The quantity of training data available is too small to train a deep learning model;
- Adequate training data exist but are of perceptually poor quality (e.g., low resolution, hazy or blurry);
- The available training data are not representative of the target data (e.g., lack adequate appearance variations);
- The proportions of the various classes are skewed (imbalanced data);
- Data are available only for one condition (e.g., bright daylight) but models must perform inference under different conditions (e.g., night, rainy or foggy weather);
- There is no practical way to access data for training (e.g., due to excessive cost or restrictions).

 

The first four problems can be adequately solved by manipulating the existing data to produce additional data that enhance the overall performance of the trained model. In the case of the last two problems, however, the only viable solution is to create new training data.
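The label-preserving transformations mentioned above can be sketched in a few lines. The following is a minimal toy illustration (the function names are our own, not from the survey), operating on a grayscale image represented as a list of pixel rows:

```python
import random

def hflip(img):
    """Horizontally flip an image given as a list of pixel rows."""
    return [row[::-1] for row in img]

def adjust_brightness(img, delta):
    """Shift every pixel value by delta, clipping to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def augment(img, rng=random):
    """Apply one randomly chosen label-preserving transformation."""
    ops = [hflip, lambda im: adjust_brightness(im, rng.randint(-30, 30))]
    return rng.choice(ops)(img)
```

Neither operation changes the image's label, which is precisely what makes such transformations usable for augmentation.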

 

As discussed earlier, the most common approach to data augmentation is to transform training data in various ways. However, in application scenarios where no training data exist naturally, or where their collection is too costly, it often becomes impractical to create additional training data using the aforementioned methods. Moreover, many computer vision tasks are use-case specific, requiring task-specific data formats and annotation schemes. This makes it difficult for broadly annotated, publicly available, large-scale datasets to meet the specific requirements of these tasks. In these cases, the only viable approach is to generate training data from scratch. Modern image synthesis methods can simulate different kinds of task-specific, real-world variability in the synthesized data. They are particularly useful in applications such as autonomous driving and navigation, pose estimation, affordance learning, and object grasping and manipulation, where obtaining camera-based images is time-consuming and expensive. Moreover, in some applications, bitmap pixel images may simply be unsuitable. Data synthesis methods can readily support nonstandard image modalities such as point clouds and voxels. Approaches based on 3D modelling also provide more scalable resolutions as well as flexible content and labelling schemes that are adapted to the specific use-case.

 

Data augmentation approaches based on data synthesis are becoming increasingly important in the wake of severe data scarcity in many machine learning domains. In addition, the requirements of emerging machine vision applications such as autonomous driving, robotics and virtual reality are becoming increasingly difficult to meet using traditional transformation-based augmentation. For this reason, data synthesis has become an important means of providing quality training data for machine learning applications. However, while many surveys on data augmentation exist, very few address synthetic data augmentation methods. This work, published in Machine Intelligence Research by researchers from Ghana, is motivated by the lack of adequate discussion of this important class of techniques in the scientific literature. Consequently, researchers aim to provide an in-depth treatment of synthetic data augmentation methods to enrich the current literature on data augmentation. Researchers discuss the various issues in data synthesis in detail, including concise information about the main principles, use-cases, and limitations of the various approaches.

 

In this work, researchers first provide a broad overview of data augmentation methods in Section 2. Several survey works have explored data augmentation in detail. Different from these surveys, researchers focus mainly on synthetic data augmentation methods. They consider that this narrow scope enables a much more detailed treatment of important issues while keeping the survey relatively concise.

 

A concise taxonomy of synthetic data augmentation approaches is provided in Section 3. This section introduces four main classes of synthetic data generation techniques that are commonly used: generative modelling, computer graphics modelling, neural rendering, and neural style transfer. A detailed classification of synthetic data augmentation approaches is depicted in this section.

 

Furthermore, in Sections 4–7, researchers explore in detail the various techniques for synthesizing data for machine vision tasks: generative modelling, computer graphics modelling, neural rendering, and neural style transfer (NST). For each of these main classes of methods, they discuss the important principles, approaches, use-cases, and limitations, and compare the advantages and disadvantages of the classes of data synthesis methods.

 

Section 4 is about generative modelling. Generative modelling methods are a class of deep learning techniques that exploit special deep neural network architectures to learn holistic representations of the underlying data categories and generate useful synthetic data for training deep learning models. Generally, they work by learning a plausible statistical distribution of the target data, using noise or examples of the target data as input. This knowledge of the training data distribution enables them to generate complex representations. Examples of generative models include Boltzmann machines (BMs), restricted Boltzmann machines (RBMs), generative adversarial networks (GANs), variational autoencoders (VAEs), autoregressive models and deep belief networks (DBNs). Currently, GANs and VAEs, together with their many variants, are the most popular neural network architectures for generative modelling.
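The underlying principle, estimate the distribution of the target data, then sample new points from it, can be illustrated with a deliberately simple toy example (ours, not the survey's): a one-dimensional Gaussian stands in for the distribution that a deep generative model would learn.

```python
import random
import statistics

def fit_gaussian(samples):
    """'Learn' the data distribution: estimate mean and std from examples."""
    return statistics.mean(samples), statistics.stdev(samples)

def synthesize(samples, n, rng=random):
    """Draw n new synthetic samples from the fitted distribution."""
    mu, sigma = fit_gaussian(samples)
    return [rng.gauss(mu, sigma) for _ in range(n)]
```

Real generative models such as GANs and VAEs replace this fixed parametric form with a distribution represented implicitly by deep networks, which is what lets them capture the complexity of natural images.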

 

Section 5 is about computer graphics modelling. An increasingly promising line of work that aims to address the data scarcity problem exploits computer graphics tools to synthesize training data. Computer graphics tools are capable of creating 2D and 3D objects as well as whole complex scenes. The procedure for synthesizing data using computer-aided design (CAD) techniques involves complex processes such as modelling, rigging, texturing, and animating the generated 3D objects. Game engines provide more advanced modelling capabilities that can be used to create large, interactive scenes and virtual environments that span whole cities.
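One key attraction of graphics-based synthesis is that annotations come for free from the generation parameters. The following toy sketch (our own, vastly simpler than a CAD or game-engine pipeline) "renders" a white square at a random position and emits its bounding-box label alongside the image:

```python
import random

def synth_sample(size=32, obj=6, rng=random):
    """Render a white square on a black canvas at a random position.

    Returns the image (list of pixel rows) and its bounding-box
    label (x, y, w, h), known exactly from the generation parameters.
    """
    x = rng.randrange(size - obj)
    y = rng.randrange(size - obj)
    img = [[0] * size for _ in range(size)]
    for r in range(y, y + obj):
        for c in range(x, x + obj):
            img[r][c] = 255
    return img, (x, y, obj, obj)
```

In a real pipeline the renderer would be a graphics engine and the labels could be pixel-accurate segmentation masks, depth maps or 3D poses, all derived directly from the scene description.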

 

Section 6 introduces neural rendering. Another common way to synthesize new training data for visual recognition tasks is by neural rendering. The aim of neural rendering is to realize the scene rendering process using deep learning models. Unlike traditional scene rendering based on 3D graphical modelling, the neural rendering process can be accomplished in both forward and backward directions. In the forward direction, 2D images are generated from 3D scenes and additional scene parameters. In the backward direction, the pixel image is translated into a realistic 3D scene.
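The forward direction can be grounded with the classical pinhole-camera projection that a neural renderer learns to approximate. This is a simplified, non-neural sketch of our own for illustration only:

```python
def project(points3d, f=1.0):
    """Forward rendering step: map 3D camera-space points (X, Y, Z)
    to 2D image-plane coordinates (f*X/Z, f*Y/Z) under a pinhole model."""
    return [(f * X / Z, f * Y / Z) for X, Y, Z in points3d]
```

A neural renderer replaces such fixed functions with learned, differentiable ones, which is also what makes the backward direction, recovering a 3D scene from pixels, tractable.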

 

Neural style transfer (NST), introduced in Section 7, is a method for synthesizing novel images similar to GAN-based style transfer. However, in contrast with generative modelling approaches, neural style transfer exploits conventional feed-forward convolutional neural networks for data synthesis.
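In the classic NST formulation of Gatys et al., the "style" of an image is captured by Gram matrices of convolutional feature maps, i.e., channel-wise inner products. A minimal sketch, with plain Python lists standing in for flattened feature maps:

```python
def gram_matrix(features):
    """features: C flattened feature maps (lists of floats).
    Returns the C x C matrix of channel-wise inner products,
    which summarizes feature co-occurrence ("style")."""
    C = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(C)] for i in range(C)]
```

Style transfer then optimizes a new image so that its Gram matrices match those of the style image while its raw features match those of the content image.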

 

Researchers summarize the main features of common synthetic datasets in Section 8. In Section 9, they discuss the effectiveness of synthetic data augmentation across machine vision domains, drawing on a large number of published works.

 

Researchers present a summary of the main issues in Section 10. This section describes four main classes of approaches: generative modelling, data synthesis by means of computer graphics tools, neural rendering approaches that utilize deep learning models to simulate the 3D modelling process, and neural style transfer, which relies on combining different hierarchical levels of convolutional features to synthesize new image data. The different data synthesis approaches have unique characteristics that define their scope of application.

 

Promising directions for future research are proposed in Section 11. Researchers outline the most promising future research directions:

- Modelling multiple sensory modalities in an integrated and adaptive way;
- Towards more effective and efficient representation and training;
- Towards synthesis and representation of context-relevant scene properties;
- Simulating less intuitive augmentation schemes.

 

Synthetic data augmentation is a way to overcome data scarcity in practical machine learning applications by creating artificial samples from scratch. This survey explores the most important approaches to generating synthetic data for training computer vision models. Researchers present detailed coverage of the methods, unique properties, application scenarios, and important limitations of data synthesis methods for extending training data. They also summarize the main features, generation methods, supported tasks and application domains of common publicly available, large-scale synthetic datasets. Finally, researchers investigate the effectiveness of data synthesis approaches to data augmentation. The survey shows that synthetic data augmentation methods provide an effective means to obtain good generalization performance in situations where it is difficult to access real data for training. Moreover, for tasks such as optical flow, depth estimation and visual odometry, where photorealism plays no role in inference, training with synthetic data sometimes yields better performance than training with real data.

 

 

See the article:

A Survey of Synthetic Data Augmentation Methods in Machine Vision

http://doi.org/10.1007/s11633-022-1411-7

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.