Abstract

Today, multimedia data are produced in massive quantities by a diverse spectrum of applications, including entertainment, surveillance, e-commerce, the web, and social media. Social media data, in particular, have three challenging characteristics: they are enormous in size, often multi-faceted, and dynamic. Tensors (multi-dimensional arrays) are widely used to represent such high-order data. Consequently, a system dealing with social media data needs to scale with the tensor volume as well as with the number and diversity of the data facets. This necessitates highly parallelizable, and in many cases cloud-based, frameworks for scalable processing and efficient analysis of large media and social media collections. Most multimedia applications share a few core operations, including integration/fusion, classification, clustering, graph analysis, near-neighbor search, and similarity search. When performed naively, however, these core operations are often very costly, because the number of objects and object features that need to be considered can be prohibitive. Keeping this cost manageable requires avoiding redundant work. Thus, for next-generation cloud-based massive media processing and analysis systems to have transformative impact, the fundamental principles that govern their design must include an awareness of the utility of data and features to a particular analysis task. Recently, the observation that a significant class of data processing applications (though not all) can be expressed in terms of a small set of primitives that are, in many cases, easy to parallelize has led to frameworks, such as MapReduce, which have been successfully applied in the data processing, mining, and information retrieval domains. Yet, in many other domains (including many aggregation and join tasks that are hard to parallelize), these frameworks significantly lag behind traditional solutions.
In particular, many multimedia and social media analysis tasks fall into this category of applications that pose significant challenges. In this talk, I will present an overview of recent developments in the area of scalable multimedia and social media retrieval and analysis in the cloud, and of our own efforts [1, 2, 3, 4, 5, 6] to build a scalable data processing middleware, called RanKloud, that is specifically sensitive to the needs and requirements of multimedia and social media analysis applications. RanKloud avoids waste by intelligently partitioning the data and allocating them to available resources so as to minimize data replication and indexing overheads and to prune superfluous low-utility processing. It also includes a tensor-based relational data model to support the complete lifecycle of the data (from collection to analysis), involving various integration and other manipulation steps. RanKloud also addresses the computational cost of various multi-dimensional data analysis operations, including decomposition and structural change detection, by (a) leveraging a priori background knowledge (or metadata) about one or more domain dimensions and (b) extending compressed sensing (CS) to tensor data to encode the observed tensor streams in the form of compact descriptors. RanKloud will extend the scope of cloud-based systems to the delivery of efficient, large-scale analysis over data with variable utility and, thus, will enable new and efficient applications, tools, and systems for multimedia and social media retrieval and analysis.

Original language: English (US)
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 1-2
Number of pages: 2
DOIs: https://doi.org/10.1145/2064730.2064732
State: Published - 2011
Event: 9th Workshop on Large-Scale and Distributed Systems for Information Retrieval, LSDS-IR'11 - Glasgow, United Kingdom
Duration: Oct 28 2011 → Oct 28 2011

Other

Other: 9th Workshop on Large-Scale and Distributed Systems for Information Retrieval, LSDS-IR'11
Country: United Kingdom
City: Glasgow
Period: 10/28/11 → 10/28/11

Fingerprint

Multimedia
Social media
Costs
Media analysis
Clustering
Structural change
Lag
Similarity search
Join
Surveillance
Resources
Task analysis
Replication
Metadata
Manipulation
Electronic commerce
Decomposition
World Wide Web
Life cycle
Indexing

Keywords

  • analysis
  • compressed sensing
  • data partitioning
  • mapreduce
  • multimedia
  • multiresolution
  • parallel processing
  • retrieval
  • social media
  • tensor decomposition

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Candan, K. (2011). RanKloud: Scalable multimedia and social media retrieval and analysis in the cloud. In International Conference on Information and Knowledge Management, Proceedings (pp. 1-2) https://doi.org/10.1145/2064730.2064732

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{7774a07b146b422abd58d5fb4952a1e1,
title = "RanKloud: Scalable multimedia and social media retrieval and analysis in the cloud",
keywords = "analysis, compressed sensing, data partitioning, mapreduce, multimedia, multiresolution, parallel processing, retrieval, social media, tensor decomposition",
author = "Kasim Candan",
year = "2011",
doi = "10.1145/2064730.2064732",
language = "English (US)",
isbn = "9781450309592",
pages = "1--2",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",

}

TY - GEN

T1 - RanKloud

T2 - Scalable multimedia and social media retrieval and analysis in the cloud

AU - Candan, Kasim

PY - 2011

Y1 - 2011

KW - analysis

KW - compressed sensing

KW - data partitioning

KW - mapreduce

KW - multimedia

KW - multiresolution

KW - parallel processing

KW - retrieval

KW - social media

KW - tensor decomposition

UR - http://www.scopus.com/inward/record.url?scp=83255176114&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=83255176114&partnerID=8YFLogxK

U2 - 10.1145/2064730.2064732

DO - 10.1145/2064730.2064732

M3 - Conference contribution

AN - SCOPUS:83255176114

SN - 9781450309592

SP - 1

EP - 2

BT - International Conference on Information and Knowledge Management, Proceedings

ER -