Figure 2. The proposed ViT-based neural network for image reconstruction.
Caption
Vision Transformer (ViT) is a leading-edge machine learning technique that excels at global feature reasoning, owing to its novel structure of multistage transformer blocks with overlapped ‘patchify’ modules. This structure lets it learn image features efficiently in a hierarchical representation, so it can address the multiplexing property and avoid the limitations of conventional CNN-based deep learning, thereby enabling better image reconstruction.
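To illustrate what an overlapped ‘patchify’ step means, here is a minimal sketch, not the authors' implementation: a 2D image is split into square patches whose stride is smaller than the patch size, so neighbouring patches share pixels. The helper name `overlapped_patchify`, the patch size of 3 and the stride of 2 are hypothetical choices made only for illustration; the module in the proposed network may be built differently.

```python
from typing import List

Image = List[List[float]]


def overlapped_patchify(image: Image, patch: int, stride: int) -> List[List[float]]:
    """Split a 2D image into flattened patch vectors (hypothetical helper).

    With stride < patch, neighbouring patches overlap, so pixels near a
    patch border contribute to more than one token.
    """
    h, w = len(image), len(image[0])
    if patch > h or patch > w or stride < 1:
        raise ValueError("patch must fit in the image and stride must be >= 1")
    tokens = []
    # Slide a patch x patch window over the image in steps of `stride`.
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            # Flatten the window in row-major order into one token.
            token = [image[top + i][left + j] for i in range(patch) for j in range(patch)]
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # 5x5 test image whose pixel values are 0..24 in row-major order.
    img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
    tokens = overlapped_patchify(img, patch=3, stride=2)
    print(len(tokens))                 # 4 (window origins (0,0), (0,2), (2,0), (2,2))
    print(tokens[0][2], tokens[1][0])  # 2.0 2.0 (pixel shared by two patches)
```

Because the stride (2) is smaller than the patch size (3), image column 2 falls inside both the first and the second patch. With the stride equal to the patch size, the patches would tile the image without overlap, as in the patch embedding of the original ViT.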
Credit
Xiuxi Pan from Tokyo Tech
Usage Restrictions
None
License
Original content