News Release

RealFuVSR: Feature enhanced real-world video super-resolution

Peer-Reviewed Publication

Beijing Zhongke Journal Publishing Co. Ltd.

Overview of RealFuVSR


The input image first passes through the preprocessing module. The processed image then passes to our alignment module. After bidirectional propagation and cascade residual upsampling, we finally obtain our high-resolution image. The network is trained end-to-end. Red represents backward propagation, and blue represents forward propagation.


Credit: Beijing Zhongke Journal Publishing Co. Ltd.
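To make the data flow in the figure concrete, the following is a minimal PyTorch sketch of such a pipeline: per-frame preprocessing, backward then forward propagation of a hidden state, and upsampling of each frame. All module names and layer choices here are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class RealFuVSRSketch(nn.Module):
    """Illustrative stand-in for the pipeline in the figure, not the authors' code."""
    def __init__(self, ch=64):
        super().__init__()
        self.preprocess = nn.Conv2d(3, ch, 3, padding=1)          # preprocessing module (stand-in)
        self.backward_prop = nn.Conv2d(2 * ch, ch, 3, padding=1)  # backward branch (red)
        self.forward_prop = nn.Conv2d(2 * ch, ch, 3, padding=1)   # forward branch (blue)
        self.upsample = nn.Sequential(                            # stand-in for cascade residual upsampling (4x)
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.ReLU(True),
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.ReLU(True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, frames):                                    # frames: (B, T, 3, H, W)
        b, t, _, h, w = frames.shape
        feats = [self.preprocess(frames[:, i]) for i in range(t)]
        hidden = torch.zeros_like(feats[0])
        back = [None] * t
        for i in range(t - 1, -1, -1):                            # backward propagation
            hidden = self.backward_prop(torch.cat([feats[i], hidden], 1))
            back[i] = hidden
        hidden = torch.zeros_like(feats[0])
        outs = []
        for i in range(t):                                        # forward propagation, then upsampling
            hidden = self.forward_prop(torch.cat([back[i], hidden], 1))
            outs.append(self.upsample(hidden))
        return torch.stack(outs, 1)                               # (B, T, 3, 4H, 4W)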

As a popular topic in recent years, video super-resolution (VSR) has been regarded as a challenging task because supplementary information must be collected across video frames for recovery. It aims to recover a more realistic, high-quality video from a low-quality video with unknown degradations (i.e., compression, downsampling, blurring, or noise).
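For context, low-quality training inputs for real-world VSR are commonly synthesized by applying exactly such degradations to clean frames. The sketch below is our own illustration of that idea with arbitrary parameters, not the paper's degradation pipeline; it chains blur, downsampling, additive noise, and JPEG compression.

import io
import numpy as np
from PIL import Image, ImageFilter

def degrade(hr: Image.Image, scale: int = 4) -> Image.Image:
    """Synthesize a low-quality frame from a clean RGB frame (illustrative only)."""
    lr = hr.filter(ImageFilter.GaussianBlur(radius=1.5))                   # blur
    lr = lr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)  # downsample
    arr = np.asarray(lr).astype(np.float32)
    arr += np.random.normal(0.0, 5.0, arr.shape)                           # Gaussian noise
    lr = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    buf = io.BytesIO()
    lr.save(buf, format="JPEG", quality=40)                                # compression artifacts
    return Image.open(io.BytesIO(buf.getvalue()))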

Spatial alignment, a vital component of VSR, is responsible for aligning highly relevant but unaligned features for subsequent recovery. Many methods have been proposed to solve the VSR alignment problem. Earlier methods typically used optical flow to predict the motion field between a reference (neighboring) frame and the target frame, and then used the corresponding motion field to warp the reference frame toward the target frame. Subsequent approaches have used more complex implicit alignment methods. For example, TDAN used deformable convolution to align frames at the feature level; this proved viable, but the training process was unstable. EDVR uses multiscale deformable convolution for alignment. In RBPN, multiple projection modules aggregate multiple frames, which improves performance but also adds complexity to the model.
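The flow-based alignment described above can be summarized in a few lines: predict a motion field, then warp the neighboring frame toward the target with it. The sketch below is our simplification of the warping step; the flow itself would come from an optical-flow network (e.g., SPyNet, as used in BasicVSR) and is treated here as a given input.

import torch
import torch.nn.functional as F

def flow_warp(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) in pixel offsets (dx, dy)."""
    b, _, h, w = frame.shape
    # base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), 0).float().to(frame.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                          # shift by the motion field
    # normalize to [-1, 1] as grid_sample expects
    cx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    cy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack((cx, cy), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(frame, norm_grid, align_corners=True)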

BasicVSR is a robust backbone. However, its effectiveness is constrained by the precision of the optical flow estimation, and incorrectly aligned features affect the alignment of the next frame. In real-world VSR, error information accumulates during propagation, which may amplify noise and hinder video restoration.

In this work, we redesigned BasicVSR by means of deformable convolution, a multi-scale feature extraction module (MSF), a cascade residual upsampling module, and a simulation of real-world degradation. Using these methods, hidden state information can be propagated and aggregated more effectively.
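As a rough illustration of the multi-scale feature extraction idea, features can be computed at several resolutions and fused back with a residual connection. The exact MSF design is given in the paper; the structure below is our assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MSFSketch(nn.Module):
    """Hypothetical multi-scale feature extraction block (illustrative only)."""
    def __init__(self, ch=64):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1) for _ in range(3)
        )
        self.fuse = nn.Conv2d(3 * ch, ch, 1)  # 1x1 fusion conv

    def forward(self, x):
        outs = []
        for scale, conv in zip((1, 2, 4), self.branches):
            y = F.avg_pool2d(x, scale) if scale > 1 else x   # downscale branch input
            y = conv(y)
            if scale > 1:                                    # upscale back to input size
                y = F.interpolate(y, size=x.shape[-2:], mode="bilinear",
                                  align_corners=False)
            outs.append(y)
        return x + self.fuse(torch.cat(outs, 1))             # residual fusion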

The contributions of this study are as follows:

• We propose a new video restoration model, RealFuVSR, which can extract and fuse features from multiple scales and eliminate confusing artifacts during propagation.

• RealFuVSR uses advanced alignment and upsampling methods to restore high-quality frames while keeping the number of parameters moderate.

Qualitative and quantitative evaluations of our model showed that RealFuVSR can recover high-quality videos with richer textures and details. Our RealFuVSR model outperforms the most recent Real-BasicVSR and Real-ESRGAN models.

