YouTubeCat: Learning to categorize wild web videos

Zheshen Wang; Ming Zhao; Yang Song; Sanjiv Kumar; Baoxin Li

doi:10.1109/CVPR.2010.5540125

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

58 Scopus citations

Abstract

Automatic categorization of videos in a Web-scale unconstrained collection such as YouTube is a challenging task. A key issue is how to build an effective training set in the presence of missing, sparse or noisy labels. We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages. The data from such disparate sources has different properties and labeling quality, and thus fusing them in a coherent fashion is another practical challenge. We propose a fusion framework in which each data source is first combined with the manually-labeled set independently. Then, using the hierarchical taxonomy of the categories, a Conditional Random Field (CRF) based fusion strategy is designed. Based on the final fused classifier, category labels are predicted for the new videos. Extensive experiments on about 80K videos from 29 most frequent categories in YouTube show the effectiveness of the proposed method for categorizing large-scale wild Web videos.

Original language	English (US)
Title of host publication	2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Pages	879-886
Number of pages	8
DOIs	https://doi.org/10.1109/CVPR.2010.5540125
State	Published - 2010
Event	2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010 - San Francisco, CA, United States; Duration: Jun 13 2010 → Jun 18 2010

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Other

Other	2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Country/Territory	United States
City	San Francisco, CA
Period	6/13/10 → 6/18/10

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition

Cite this

Wang, Z., Zhao, M., Song, Y., Kumar, S., & Li, B. (2010). YouTubeCat: Learning to categorize wild web videos. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010 (pp. 879-886). Article 5540125 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). https://doi.org/10.1109/CVPR.2010.5540125

YouTubeCat: Learning to categorize wild web videos. / Wang, Zheshen; Zhao, Ming; Song, Yang et al.
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010. 2010. p. 879-886 5540125 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Wang, Z, Zhao, M, Song, Y, Kumar, S & Li, B 2010, YouTubeCat: Learning to categorize wild web videos. in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010., 5540125, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 879-886, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, United States, 6/13/10. https://doi.org/10.1109/CVPR.2010.5540125

@inproceedings{8a3a28acee2344b28e51c05beb51c722,

title = "YouTubeCat: Learning to categorize wild web videos",

abstract = "Automatic categorization of videos in a Web-scale unconstrained collection such as YouTube is a challenging task. A key issue is how to build an effective training set in the presence of missing, sparse or noisy labels. We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages. The data from such disparate sources has different properties and labeling quality, and thus fusing them in a coherent fashion is another practical challenge. We propose a fusion framework in which each data source is first combined with the manually-labeled set independently. Then, using the hierarchical taxonomy of the categories, a Conditional Random Field (CRF) based fusion strategy is designed. Based on the final fused classifier, category labels are predicted for the new videos. Extensive experiments on about 80K videos from 29 most frequent categories in YouTube show the effectiveness of the proposed method for categorizing large-scale wild Web videos.",

author = "Zheshen Wang and Ming Zhao and Yang Song and Sanjiv Kumar and Baoxin Li",

year = "2010",

doi = "10.1109/CVPR.2010.5540125",

language = "English (US)",

isbn = "9781424469840",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

pages = "879--886",

booktitle = "2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010",

note = "2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010 ; Conference date: 13-06-2010 Through 18-06-2010",

}

TY - GEN

T1 - YouTubeCat: Learning to categorize wild web videos

T2 - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010

AU - Wang, Zheshen

AU - Zhao, Ming

AU - Song, Yang

AU - Kumar, Sanjiv

AU - Li, Baoxin

PY - 2010

Y1 - 2010

AB - Automatic categorization of videos in a Web-scale unconstrained collection such as YouTube is a challenging task. A key issue is how to build an effective training set in the presence of missing, sparse or noisy labels. We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages. The data from such disparate sources has different properties and labeling quality, and thus fusing them in a coherent fashion is another practical challenge. We propose a fusion framework in which each data source is first combined with the manually-labeled set independently. Then, using the hierarchical taxonomy of the categories, a Conditional Random Field (CRF) based fusion strategy is designed. Based on the final fused classifier, category labels are predicted for the new videos. Extensive experiments on about 80K videos from 29 most frequent categories in YouTube show the effectiveness of the proposed method for categorizing large-scale wild Web videos.

UR - http://www.scopus.com/inward/record.url?scp=77955988704&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955988704&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2010.5540125

DO - 10.1109/CVPR.2010.5540125

M3 - Conference contribution

AN - SCOPUS:77955988704

SN - 9781424469840

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 879

EP - 886

BT - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010

Y2 - 13 June 2010 through 18 June 2010

ER -
