TY - GEN
T1 - Sequence-kernel based sparse representation for amateur video summarization
AU - Wang, Zheshen
AU - Kumar, Mrityunjay
AU - Luo, Jiebo
AU - Li, Baoxin
PY - 2011
Y1 - 2011
N2 - Automatic video summarization is critical for facilitating fast browsing and efficient management of multimedia data. Compared to well-edited videos with predefined structures (e.g., movies) or constrained contents (e.g., news or sports videos), upon which existing methods focus, the main challenges of summarizing unconstrained amateur or consumer videos include dealing with extremely diverse contents without any pre-imposed structure and typically mediocre video quality. To address these challenges, we explore a signal-reconstruction-based approach relying only on visual content. In particular, we propose a sequence-kernel-based sparse representation approach for directly summarizing consumer videos. A dictionary of subsequences is first constructed from clustered frames with importance ranking scores of extracted high-level semantics. Video summarization is formulated to seek an optimal combination of the dictionary elements that robustly represents the original video. Weighted-sequence distance is exploited to compute the approximation error, and the kernel-based feature-sign algorithm is used to estimate the sparse coefficients. A linear combination over the dictionary with the obtained optimal sparse coefficients is output as the final summary video. Extensive experiments are performed on 18 videos with subjective ratings from 7 evaluators. Results obtained by the proposed approach compare favorably with two existing methods both visually and quantitatively, validating its effectiveness.
KW - Sparse representation
KW - Video summarization
UR - http://www.scopus.com/inward/record.url?scp=84555196210&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84555196210&partnerID=8YFLogxK
U2 - 10.1145/2072508.2072516
DO - 10.1145/2072508.2072516
M3 - Conference contribution
AN - SCOPUS:84555196210
SN - 9781450309967
T3 - MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11
SP - 31
EP - 36
BT - MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JMRE 2011 Workshop, J-MRE'11
T2 - 2011 ACM Multimedia Conference, MM'11 and Co-Located Workshops - Joint Workshop on Modeling and Representing Events, JMRE'11
Y2 - 28 November 2011 through 1 December 2011
ER -