2 Scopus citations

Abstract

For tasks involving language and vision, the current state-of-the-art methods tend not to leverage any additional information that might be present to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering where large diagnostic datasets have been proposed to test a system’s capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information in the form of spatial knowledge to aid in visual reasoning. We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding about the question in the form of a mask that is directly provided to the teacher network. The student network learns from the ground-truth information as well as the teachers prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teachers network using attention. Empirically, we show that both the methods improve the test accuracy over a state-of-the-art approach on a publicly available dataset.

Original languageEnglish (US)
Title of host publicationProceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages227-235
Number of pages9
ISBN (Electronic)9781728119755
DOIs
StatePublished - Mar 4 2019
Event19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019 - Waikoloa Village, United States
Duration: Jan 7 2019Jan 11 2019

Publication series

NameProceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019

Conference

Conference19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019
CountryUnited States
CityWaikoloa Village
Period1/7/191/11/19

    Fingerprint

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Aditya, S., Saha, R., Yang, Y., & Baral, C. (2019). Spatial knowledge distillation to aid visual reasoning. In Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019 (pp. 227-235). [8658731] (Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/WACV.2019.00030