Abstract

For tasks involving language and vision, current state-of-the-art methods tend not to leverage additional information that might be available to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering, for which large diagnostic datasets have been proposed to test a system’s capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information, in the form of spatial knowledge, to aid visual reasoning. We propose a framework that combines recent advances in knowledge distillation (the teacher-student framework), relational reasoning, and probabilistic logical languages to incorporate such knowledge into existing neural networks for Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding of the question in the form of a mask that is provided directly to the teacher network. The student network learns from the ground-truth information as well as the teacher's prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher network using attention. Empirically, we show that both methods improve test accuracy over a state-of-the-art approach on a publicly available dataset.
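The abstract states that a probabilistic logical language produces a spatial mask that is given directly to the teacher network, but it does not say how the mask enters the network. The sketch below shows one plausible reading, assuming a Relation Network-style teacher that sums a pair function g over all ordered object pairs: each pair's contribution is scaled by a mask weight in [0, 1]. The names `masked_relation_sum` and `displacement`, the toy coordinates, and the hand-set mask are hypothetical; in the paper the mask is derived with probabilistic logic and g would be a learned network.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def masked_relation_sum(
    objects: Sequence[Vector],
    mask: Sequence[Sequence[float]],
    g: Callable[[Vector, Vector], Vector],
    dim: int,
) -> Vector:
    """Relation-Network-style pooling (assumed design): sum g(o_i, o_j) over
    all ordered pairs, scaling each pair's contribution by mask[i][j]."""
    out = [0.0] * dim
    for i, o_i in enumerate(objects):
        for j, o_j in enumerate(objects):
            weight = mask[i][j]
            if weight == 0.0:
                continue  # pair ruled out by the (hypothetical) spatial mask
            for k, value in enumerate(g(o_i, o_j)):
                out[k] += weight * value
    return out


def displacement(a: Vector, b: Vector) -> Vector:
    """Toy pair function, a hypothetical stand-in for a learned MLP: a - b."""
    return [x - y for x, y in zip(a, b)]


# Toy object centres (x, y); illustrative values only.
objects = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]

# Hypothetical mask for a question about the (object 0, object 1) relation.
focused = [[0.0] * 3 for _ in range(3)]
focused[0][1] = 1.0
# No spatial knowledge: every pair counts equally.
uniform = [[1.0] * 3 for _ in range(3)]

print(masked_relation_sum(objects, focused, displacement, 2))  # [-1.0, -2.0]
print(masked_relation_sum(objects, uniform, displacement, 2))  # [0.0, 0.0]
```

With the uniform mask the antisymmetric toy pair terms cancel to zero, whereas the focused mask keeps only the relation the question asks about, which is one way to see why a question-specific mask could help the teacher.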

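The student is described as learning from both the ground-truth answer and the teacher's prediction. A common way to combine the two, assumed here rather than taken from the paper, is the temperature-scaled knowledge-distillation loss: a cross-entropy term on the true answer plus a KL-divergence term that pulls the student's softened answer distribution towards the teacher's. The weighting `alpha`, the temperature, the toy logits, and the function names are illustrative assumptions; the paper's exact objective may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,
    temperature: float = 2.0,
) -> float:
    """(1 - alpha) * CE(label, student) + alpha * T^2 * KL(teacher_T || student_T)."""
    # Hard term: cross-entropy against the ground-truth answer.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: match the teacher's temperature-softened answer distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # T^2 keeps the soft-term gradients on a scale comparable to the hard term.
    return (1.0 - alpha) * hard + alpha * temperature ** 2 * soft


# Toy 3-way answer space; the (hypothetical) teacher is confident in answer 0.
student = [1.0, 0.5, 0.2]
teacher = [3.0, 0.1, -1.0]
print(distillation_loss(student, teacher, label=0))
```

In a teacher-student setup of this kind, typically only the student is used at test time, so privileged information such as the spatial mask does not have to be available for unseen questions.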
Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 227-235
Number of pages: 9
ISBN (Electronic): 9781728119755
DOI: 10.1109/WACV.2019.00030
State: Published - Mar 4 2019
Event: 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019 - Waikoloa Village, United States
Duration: Jan 7 2019 to Jan 11 2019

Publication series

Name: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019

Conference

Conference: 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Country: United States
City: Waikoloa Village
Period: 1/7/19 to 1/11/19

Fingerprint

Distillation
Masks
Students
Neural networks

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Aditya, S., Saha, R., Yang, Y., & Baral, C. (2019). Spatial knowledge distillation to aid visual reasoning. In Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019 (pp. 227-235). [8658731] (Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WACV.2019.00030

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

@inproceedings{62fe6a0dba9a4f4597325901bcbc8b86,
title = "Spatial knowledge distillation to aid visual reasoning",
abstract = "For tasks involving language and vision, current state-of-the-art methods tend not to leverage additional information that might be available to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering, for which large diagnostic datasets have been proposed to test a system’s capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information, in the form of spatial knowledge, to aid visual reasoning. We propose a framework that combines recent advances in knowledge distillation (the teacher-student framework), relational reasoning, and probabilistic logical languages to incorporate such knowledge into existing neural networks for Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding of the question in the form of a mask that is provided directly to the teacher network. The student network learns from the ground-truth information as well as the teacher's prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher network using attention. Empirically, we show that both methods improve test accuracy over a state-of-the-art approach on a publicly available dataset.",
author = "Somak Aditya and Rudra Saha and Yezhou Yang and Chitta Baral",
year = "2019",
month = "3",
day = "4",
doi = "10.1109/WACV.2019.00030",
language = "English (US)",
series = "Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "227--235",
booktitle = "Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019",

}

TY - GEN

T1 - Spatial knowledge distillation to aid visual reasoning

AU - Aditya, Somak

AU - Saha, Rudra

AU - Yang, Yezhou

AU - Baral, Chitta

PY - 2019/3/4

Y1 - 2019/3/4

AB - For tasks involving language and vision, current state-of-the-art methods tend not to leverage additional information that might be available to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering, for which large diagnostic datasets have been proposed to test a system’s capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information, in the form of spatial knowledge, to aid visual reasoning. We propose a framework that combines recent advances in knowledge distillation (the teacher-student framework), relational reasoning, and probabilistic logical languages to incorporate such knowledge into existing neural networks for Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding of the question in the form of a mask that is provided directly to the teacher network. The student network learns from the ground-truth information as well as the teacher's prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher network using attention. Empirically, we show that both methods improve test accuracy over a state-of-the-art approach on a publicly available dataset.

UR - http://www.scopus.com/inward/record.url?scp=85063574220&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063574220&partnerID=8YFLogxK

U2 - 10.1109/WACV.2019.00030

DO - 10.1109/WACV.2019.00030

M3 - Conference contribution

T3 - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019

SP - 227

EP - 235

BT - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019

PB - Institute of Electrical and Electronics Engineers Inc.

ER -