Extracting key frames from consumer videos using bi-layer group sparsity

Zheshen Wang, Mrityunjay Kumar, Jiebo Luo, Baoxin Li

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

6 Citations (Scopus)

Abstract

Extracting key frames from unconstrained consumer videos remains much more challenging than extracting them from well-edited videos with predefined structures (e.g., news or sports videos), because consumer videos have extremely diverse content (no pre-imposed structure) and uncontrolled quality (e.g., due to poor lighting or camera shake). To exploit the spatio-temporal correlation present in the video for key frame extraction, we propose a bi-layer group sparse representation in which the input video frames are first segmented into homogeneous patches and group sparsity is imposed at two levels simultaneously: (i) patch-to-frame, and (ii) frame-to-sequence. The grouped sparse coefficients are then combined with frame quality scores to generate key frames. Extensive experiments are performed on videos from actual end users. The results of the proposed approach compare favorably with those of existing methods, which supports its effectiveness.
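
As a rough illustration of how group sparsity can pick representative frames, the sketch below solves a single-layer, row-sparse self-representation problem by proximal gradient descent and then weights each frame's coefficient energy by a quality score. It is not the authors' bi-layer formulation: the patch-to-frame layer is omitted, and the function names, the feature matrix X, the regularization weight lam, and the multiplicative combination with quality scores are all assumptions made for illustration.

```python
import numpy as np


def row_soft_threshold(C, tau):
    """Proximal operator of tau * sum_i ||C[i, :]||_2 (shrinks whole rows)."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return C * scale


def select_key_frames(X, quality, lam=0.5, n_keys=2, n_iter=300):
    """Rank frames by group-sparse self-representation (illustrative only).

    X       : (d, n) array, one feature column per frame (assumed input).
    quality : (n,) array of per-frame quality scores (assumed to lie in [0, 1]).
    Minimizes 0.5 * ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2, so that only
    a few rows of C (i.e., a few frames) are used to reconstruct all frames.
    """
    n = X.shape[1]
    G = X.T @ X                                    # Gram matrix of the frames
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)  # 1 / Lipschitz constant
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                           # gradient of the data term
        C = row_soft_threshold(C - step * grad, step * lam)
    importance = np.linalg.norm(C, axis=1)         # energy of each frame's row
    score = importance * quality                   # assumed combination rule
    keys = np.argsort(-score)[:n_keys]
    return keys, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 6))                    # toy features for 6 frames
    quality = np.array([1.0, 0.9, 0.2, 1.0, 0.8, 0.5])
    keys, score = select_key_frames(X, quality)
    print("selected frame indices:", keys)
```

Penalizing the Euclidean norm of each row of C drives entire rows to zero, which is what makes the selection act on whole frames rather than on individual coefficients.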

Original language: English (US)
Title of host publication: MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops
Pages: 1505-1508
Number of pages: 4
DOI: https://doi.org/10.1145/2072298.2072051
State: Published - 2011
Event: 19th ACM International Conference on Multimedia, ACM Multimedia 2011, MM'11 - Scottsdale, AZ, United States
Duration: Nov 28 2011 - Dec 1 2011

Other

Other: 19th ACM International Conference on Multimedia, ACM Multimedia 2011, MM'11
Country: United States
City: Scottsdale, AZ
Period: 11/28/11 - 12/1/11

Fingerprint

Sports
Lighting
Cameras
Experiments

Keywords

  • Consumer video
  • Group sparsity
  • Key frame extraction

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction

Cite this

Wang, Z., Kumar, M., Luo, J., & Li, B. (2011). Extracting key frames from consumer videos using bi-layer group sparsity. In MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops (pp. 1505-1508) https://doi.org/10.1145/2072298.2072051

@inproceedings{606db2e54463458a9a3fa6100becfca0,
title = "Extracting key frames from consumer videos using bi-layer group sparsity",
keywords = "Consumer video, Group sparsity, Key frame extraction",
author = "Zheshen Wang and Mrityunjay Kumar and Jiebo Luo and Baoxin Li",
year = "2011",
doi = "10.1145/2072298.2072051",
language = "English (US)",
isbn = "9781450306164",
pages = "1505--1508",
booktitle = "MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops",

}

TY - GEN

T1 - Extracting key frames from consumer videos using bi-layer group sparsity

AU - Wang, Zheshen

AU - Kumar, Mrityunjay

AU - Luo, Jiebo

AU - Li, Baoxin

PY - 2011

Y1 - 2011

KW - Consumer video

KW - Group sparsity

KW - Key frame extraction

UR - http://www.scopus.com/inward/record.url?scp=84455201798&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84455201798&partnerID=8YFLogxK

U2 - 10.1145/2072298.2072051

DO - 10.1145/2072298.2072051

M3 - Conference contribution

AN - SCOPUS:84455201798

SN - 9781450306164

SP - 1505

EP - 1508

BT - MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops

ER -