Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment
@article{Pfister2021SelfSupervisedMP,
  title={Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment},
  author={Jan Pfister and Konstantin Kobs and Andreas Hotho},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2021},
  pages={816-825}
}
Neural networks for Image Aesthetic Assessment are usually initialized with weights of pretrained ImageNet models and then trained using a labeled image aesthetics dataset. We argue that the ImageNet classification task is not well-suited for pretraining, since content-based classification is designed to make the model invariant to features that strongly influence the image's aesthetics, e.g. style-based features such as brightness or contrast. We propose to use self-supervised aesthetic-aware…
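The abstract is truncated, so the exact pretext tasks are not spelled out here; the following is only a minimal sketch of the general idea of aesthetic-aware self-supervised pretraining: apply a random style edit (e.g. brightness and contrast) to an unlabeled image and train the backbone with multi-task heads to regress the edit parameters. All names (`StylePretextModel`, `make_pretext_batch`) and the chosen distortions are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical sketch: pretrain a backbone to predict the strength of a random
# style edit, so the learned features stay sensitive to brightness/contrast.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF
from torchvision.models import resnet50

class StylePretextModel(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)   # train from scratch, no ImageNet weights
        backbone.fc = nn.Identity()         # keep the 2048-d feature vector
        self.backbone = backbone
        # one regression head per style attribute (multi-task)
        self.brightness_head = nn.Linear(2048, 1)
        self.contrast_head = nn.Linear(2048, 1)

    def forward(self, x):
        feats = self.backbone(x)
        return self.brightness_head(feats), self.contrast_head(feats)

def make_pretext_batch(images):
    """Apply random brightness/contrast edits and return images plus edit factors."""
    b = torch.empty(images.size(0)).uniform_(0.5, 1.5)   # brightness factors
    c = torch.empty(images.size(0)).uniform_(0.5, 1.5)   # contrast factors
    distorted = torch.stack([
        TF.adjust_contrast(TF.adjust_brightness(img, b[i].item()), c[i].item())
        for i, img in enumerate(images)
    ])
    return distorted, b.unsqueeze(1), c.unsqueeze(1)

model = StylePretextModel()
criterion = nn.MSELoss()
images = torch.rand(4, 3, 224, 224)                      # dummy unlabeled batch
x, target_b, target_c = make_pretext_batch(images)
pred_b, pred_c = model(x)
loss = criterion(pred_b, target_b) + criterion(pred_c, target_c)  # multi-task loss
loss.backward()
```

After such pretraining, the backbone features could be fine-tuned on a labeled aesthetics dataset in place of ImageNet-pretrained weights.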
3 Citations
CLIP knows image aesthetics
- Computer Science, Frontiers in Artificial Intelligence
- 2022
Comparing features extracted by CLIP with features obtained from the last layer of a comparable ImageNet classification model suggests that CLIP is better suited as a base model for IAA methods than ImageNet-pretrained networks.
Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2
- Computer Science, ArXiv
- 2022
A new version of the Aesthetic Multi-Attributes Network (AMANv2), based on the BUTD model and the VLPSA model, is proposed for aesthetic attribute captioning, i.e., assessing aesthetic attributes such as composition, lighting usage, and color arrangement.
Considering User Agreement in Learning to Predict the Aesthetic Quality
- Computer Science, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2022
A re-adapted multi-task attention network is proposed to predict both the mean opinion score and its standard deviation in an end-to-end manner, together with a new confidence-interval ranking loss that encourages the model to focus on image pairs whose difference in aesthetic scores is less certain.
References
Showing 1-10 of 37 references
Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning
- Computer Science, AAAI
- 2020
This paper designs two novel pretext tasks that identify the types and parameters of editing operations applied to synthetic instances, and evaluates the learned features with a one-layer linear classifier on binary aesthetic classification.
Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment
- Computer Science, ACM Multimedia
- 2018
A novel multi-patch aggregation method for image aesthetic assessment uses an attention-based mechanism that adaptively adjusts the weight of each patch during training to improve learning efficiency; it outperforms existing methods by a large margin.
A-Lamp: Adaptive Layout-Aware Multi-patch Deep Convolutional Neural Network for Photo Aesthetic Assessment
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
An Adaptive Layout-Aware Multi-Patch Convolutional Neural Network (A-Lamp CNN) architecture for photo aesthetic assessment that can accept arbitrarily sized images and learn from both fine-grained details and holistic image layout simultaneously.
NIMA: Neural Image Assessment
- Computer Science, IEEE Transactions on Image Processing
- 2018
The proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks and can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline.
Revisiting Self-Supervised Visual Representation Learning
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
NICER: Aesthetic Image Enhancement with Humans in the Loop
- Computer Science, ArXiv
- 2020
This work proposes the Neural Image Correction & Enhancement Routine (NICER), a neural-network-based approach to no-reference image enhancement that supports a fully automatic, semi-automatic, or fully manual, interactive and user-centered process, and shows that NICER can improve image aesthetics even without user interaction.
Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021
An extensive review is provided of deep-learning-based self-supervised visual feature learning methods, a subset of unsupervised learning methods, for learning general image and video features from large-scale unlabeled data without any human-annotated labels.
AVA: A large-scale database for aesthetic visual analysis
- Computer Science, 2012 IEEE Conference on Computer Vision and Pattern Recognition
- 2012
A new large-scale database for conducting Aesthetic Visual Analysis: AVA, which contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style is introduced.
A Simple Framework for Contrastive Learning of Visual Representations
- Computer Science, ICML
- 2020
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
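As a concrete illustration of the contrastive objective summarized above, the following is a minimal NT-Xent-style loss over two augmented views of a batch; it is a compact sketch for clarity, not the reference implementation, and the function name and temperature value are assumptions.

```python
# Minimal sketch of an NT-Xent (normalized temperature-scaled cross entropy) loss.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, d) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # (2N, d), unit-norm rows
    sim = z @ z.t() / temperature                          # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float('-inf'))                  # exclude self-similarity
    # the positive for sample i is its other augmented view (i + n, or i - n)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))   # dummy projections
```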
Scaling and Benchmarking Self-Supervised Visual Representation Learning
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.