Abstract

Visual saliency models have recently begun to incorporate deep learning to achieve predictive capacity much greater than that of previous unsupervised methods. However, most existing models predict saliency without explicit knowledge of global scene semantic information. We propose a model (MxSalNet) that incorporates global scene semantic information in addition to local information gathered by a convolutional neural network. Our model is formulated as a mixture of experts. Each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network that uses global scene information to predict these weights. The expert networks and the gating network are trained simultaneously in an end-to-end manner. We show that our mixture formulation leads to improved performance over an otherwise identical non-mixture model that does not incorporate global scene information. Additionally, we show that our model achieves better performance than several other visual saliency models.
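
To make the mixture-of-experts formulation described in the abstract concrete, below is a minimal PyTorch-style sketch of the general idea: several expert networks each produce a saliency map, a gating network turns a global image descriptor into softmax weights, and the final map is the weighted mixture, trainable end-to-end. Everything here is an illustrative assumption, not the authors' MxSalNet: the class name MixtureSaliencyNet, the tiny convolutional experts, the layer sizes, and the pooled-RGB "global" descriptor (which stands in for genuine scene semantic features) are all made up for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureSaliencyNet(nn.Module):
    """Illustrative mixture-of-experts saliency sketch (NOT the authors' MxSalNet)."""
    def __init__(self, num_experts=4):
        super().__init__()
        # Each expert maps an RGB image to a single-channel saliency map in [0, 1].
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 1), nn.Sigmoid(),
            )
            for _ in range(num_experts)
        ])
        # Gating network: a global image descriptor -> softmax weights over experts.
        # Here the descriptor is just globally pooled RGB, a placeholder for
        # the global scene semantic features used in the paper.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(3, 16), nn.ReLU(),
            nn.Linear(16, num_experts),
        )

    def forward(self, x):
        # Expert saliency maps, stacked to shape (B, E, H, W).
        maps = torch.stack([expert(x).squeeze(1) for expert in self.experts], dim=1)
        # Gating weights of shape (B, E), summing to 1 per image.
        w = F.softmax(self.gate(x), dim=1)
        # Weighted mixture of expert outputs -> final saliency map (B, H, W).
        return (w[:, :, None, None] * maps).sum(dim=1)

model = MixtureSaliencyNet()
saliency = model(torch.randn(2, 3, 64, 64))   # -> shape (2, 64, 64)

Because the gating weights multiply the expert outputs inside the forward pass, gradients flow through both the experts and the gate, so a single loss on the mixed map trains them simultaneously, which is the end-to-end property the abstract refers to.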

Original language: English (US)
Journal: IEEE Transactions on Image Processing
DOI: 10.1109/TIP.2018.2834826
State: Accepted/In press - May 8, 2018

Fingerprint

  • Semantics
  • Deep neural networks
  • Neural networks
  • Deep learning

Keywords

  • Adaptation models
  • Biological system modeling
  • Computational modeling
  • Context modeling
  • Machine learning
  • Predictive models
  • Visualization

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design

Cite this

Dodge, Samuel; Karam, Lina. Visual Saliency Prediction Using a Mixture of Deep Neural Networks. In: IEEE Transactions on Image Processing, May 8, 2018. DOI: 10.1109/TIP.2018.2834826.

Research output: Contribution to journal › Article
