Sequence-kernel based sparse representation for amateur video summarization

Zheshen Wang, Mrityunjay Kumar, Jiebo Luo, Baoxin Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Automatic video summarization is critical for facilitating fast browsing and efficient management of multimedia data. Compared to well-edited videos with predefined structures (e.g., movies) or constrained contents (e.g., news or sports videos), upon which existing methods focus, the main challenges of summarizing unconstrained amateur or consumer videos include dealing with extremely diverse contents without any pre-imposed structure and typically mediocre video quality. To address these challenges, we explore a signal-reconstruction-based approach relying only on visual content. In particular, we propose a sequence-kernel-based sparse representation approach for directly summarizing consumer videos. A dictionary of subsequences is first constructed from clustered frames with importance ranking scores of extracted high-level semantics. Video summarization is formulated to seek an optimal combination of the dictionary elements that robustly represents the original video. Weighted-sequence distance is exploited to compute the approximation error, and the kernel-based feature-sign algorithm is used to estimate the sparse coefficients. A linear combination over the dictionary with the obtained optimal sparse coefficients is output as the final summary video. Extensive experiments are performed on 18 videos with subjective ratings from 7 evaluators. Results obtained by the proposed approach compare favorably with two existing methods both visually and quantitatively, validating its effectiveness.
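
The reconstruction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper uses a weighted-sequence distance and a kernel-based feature-sign algorithm, whereas this sketch assumes a plain squared-error objective, 0.5*||y - Dx||^2 + lam*||x||_1, and solves it with iterative soft-thresholding (ISTA), a simpler solver swapped in for illustration. The names `sparse_code`, `soft_threshold`, `atoms`, and `video`, and the toy three-dimensional feature vectors, are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(D, y, lam=0.1, iters=500):
    """ISTA for min_x 0.5*||y - D x||^2 + lam*||x||_1.

    D is a list of dictionary atoms, each a list of floats with len(y)
    entries; y is the signal to reconstruct. Returns one coefficient per atom.
    This is an illustrative stand-in, not the kernel feature-sign algorithm.
    """
    n_atoms, dim = len(D), len(y)
    # Safe step size: the squared Frobenius norm of D upper-bounds the
    # Lipschitz constant of the gradient of the squared-error term.
    step = 1.0 / sum(a * a for atom in D for a in atom)
    x = [0.0] * n_atoms
    for _ in range(iters):
        # Residual of the current reconstruction: r = D x - y.
        r = [sum(x[j] * D[j][i] for j in range(n_atoms)) - y[i]
             for i in range(dim)]
        # Gradient of the squared-error term with respect to each coefficient.
        g = [sum(D[j][i] * r[i] for i in range(dim)) for j in range(n_atoms)]
        # Gradient step followed by soft-thresholding, which produces sparsity.
        x = [soft_threshold(x[j] - step * g[j], step * lam)
             for j in range(n_atoms)]
    return x


# Toy data (assumed): three candidate subsequences described by 3-D features,
# and one feature vector standing in for the whole video.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
video = [2.0, 0.05, 0.0]

coeffs = sparse_code(atoms, video, lam=0.1)
selected = [j for j, c in enumerate(coeffs) if c != 0.0]
print(selected)  # prints [0]
```

Atoms with nonzero coefficients are the ones a summary would keep. Raising `lam` tends to drive more coefficients to zero and so shortens the selection.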

Original language: English (US)
Title of host publication: MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11
Pages: 31-36
Number of pages: 6
DOI: https://doi.org/10.1145/2072508.2072516
State: Published - 2011
Event: 2011 ACM Multimedia Conference, MM'11 and Co-Located Workshops - Joint Workshop on Modeling and Representing Events, JMRE'11 - Scottsdale, AZ, United States
Duration: Nov 28 2011 to Dec 1 2011

Other

Other: 2011 ACM Multimedia Conference, MM'11 and Co-Located Workshops - Joint Workshop on Modeling and Representing Events, JMRE'11
Country: United States
City: Scottsdale, AZ
Period: 11/28/11 to 12/1/11

Fingerprint

Glossaries
Signal reconstruction
Sports
Semantics
Experiments

Keywords

  • Sparse representation
  • Video summarization

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction

Cite this

Wang, Z., Kumar, M., Luo, J., & Li, B. (2011). Sequence-kernel based sparse representation for amateur video summarization. In MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11 (pp. 31-36) https://doi.org/10.1145/2072508.2072516

Sequence-kernel based sparse representation for amateur video summarization. / Wang, Zheshen; Kumar, Mrityunjay; Luo, Jiebo; Li, Baoxin.

MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11. 2011. p. 31-36.

Wang, Z, Kumar, M, Luo, J & Li, B 2011, Sequence-kernel based sparse representation for amateur video summarization. in MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11. pp. 31-36, 2011 ACM Multimedia Conference, MM'11 and Co-Located Workshops - Joint Workshop on Modeling and Representing Events, JMRE'11, Scottsdale, AZ, United States, 11/28/11. https://doi.org/10.1145/2072508.2072516
Wang Z, Kumar M, Luo J, Li B. Sequence-kernel based sparse representation for amateur video summarization. In MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11. 2011. p. 31-36 https://doi.org/10.1145/2072508.2072516
Wang, Zheshen ; Kumar, Mrityunjay ; Luo, Jiebo ; Li, Baoxin. / Sequence-kernel based sparse representation for amateur video summarization. MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11. 2011. pp. 31-36
@inproceedings{1a52db9deb5c447ea8b68e5113b0bf55,
title = "Sequence-kernel based sparse representation for amateur video summarization",
abstract = "Automatic video summarization is critical for facilitating fast browsing and efficient management of multimedia data. Compared to well-edited videos with predefined structures (e.g., movies) or constrained contents (e.g., news or sports videos), upon which existing methods focus, the main challenges of summarizing unconstrained amateur or consumer videos include dealing with extremely diverse contents without any pre-imposed structure and typically mediocre video quality. To address these challenges, we explore a signal-reconstruction-based approach relying only on visual content. In particular, we propose a sequence-kernel-based sparse representation approach for directly summarizing consumer videos. A dictionary of subsequences is first constructed from clustered frames with importance ranking scores of extracted high-level semantics. Video summarization is formulated to seek an optimal combination of the dictionary elements that robustly represents the original video. Weighted-sequence distance is exploited to compute the approximation error, and the kernel-based feature-sign algorithm is used to estimate the sparse coefficients. A linear combination over the dictionary with the obtained optimal sparse coefficients is output as the final summary video. Extensive experiments are performed on 18 videos with subjective ratings from 7 evaluators. Results obtained by the proposed approach compare favorably with two existing methods both visually and quantitatively, validating its effectiveness.",
keywords = "Sparse representation, Video summarization",
author = "Zheshen Wang and Mrityunjay Kumar and Jiebo Luo and Baoxin Li",
year = "2011",
doi = "10.1145/2072508.2072516",
language = "English (US)",
isbn = "9781450309967",
pages = "31--36",
booktitle = "MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11",

}

TY - GEN
T1 - Sequence-kernel based sparse representation for amateur video summarization
AU - Wang, Zheshen
AU - Kumar, Mrityunjay
AU - Luo, Jiebo
AU - Li, Baoxin
PY - 2011
Y1 - 2011
AB - Automatic video summarization is critical for facilitating fast browsing and efficient management of multimedia data. Compared to well-edited videos with predefined structures (e.g., movies) or constrained contents (e.g., news or sports videos), upon which existing methods focus, the main challenges of summarizing unconstrained amateur or consumer videos include dealing with extremely diverse contents without any pre-imposed structure and typically mediocre video quality. To address these challenges, we explore a signal-reconstruction-based approach relying only on visual content. In particular, we propose a sequence-kernel-based sparse representation approach for directly summarizing consumer videos. A dictionary of subsequences is first constructed from clustered frames with importance ranking scores of extracted high-level semantics. Video summarization is formulated to seek an optimal combination of the dictionary elements that robustly represents the original video. Weighted-sequence distance is exploited to compute the approximation error, and the kernel-based feature-sign algorithm is used to estimate the sparse coefficients. A linear combination over the dictionary with the obtained optimal sparse coefficients is output as the final summary video. Extensive experiments are performed on 18 videos with subjective ratings from 7 evaluators. Results obtained by the proposed approach compare favorably with two existing methods both visually and quantitatively, validating its effectiveness.

KW - Sparse representation
KW - Video summarization
UR - http://www.scopus.com/inward/record.url?scp=84555196210&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84555196210&partnerID=8YFLogxK
U2 - 10.1145/2072508.2072516
DO - 10.1145/2072508.2072516
M3 - Conference contribution
AN - SCOPUS:84555196210
SN - 9781450309967
SP - 31
EP - 36
BT - MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11
ER -