Gbase: An efficient analysis platform for large graphs

U. Kang; Hanghang Tong; Jimeng Sun; Ching Yung Lin; Christos Faloutsos

doi:10.1007/s00778-012-0283-9

Arizona State University

Research output: Contribution to journal › Article › peer-review

44 Scopus citations

Abstract

Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How to store such large graphs efficiently? What are the core operations/queries on those graphs? How to answer the graph queries quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for parallel, distributed settings and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using MapReduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and we show that our proposed Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.

Original language	English (US)
Pages (from-to)	637-650
Number of pages	14
Journal	VLDB Journal
Volume	21
Issue number	5
DOIs	https://doi.org/10.1007/s00778-012-0283-9
State	Published - Oct 1 2012

Keywords

Compression
Distributed computing
Graph
Indexing

ASJC Scopus subject areas

Information Systems
Hardware and Architecture

Cite this

@article{7f73e08723fc4143918ef1de72c371f7,
title = "Gbase: An efficient analysis platform for large graphs",
abstract = "Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How to store such large graphs efficiently? What are the core operations/queries on those graphs? How to answer the graph queries quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for parallel, distributed settings and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using MapReduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and we show that our proposed Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.",
keywords = "Compression, Distributed computing, Graph, Indexing",
author = "U. Kang and Hanghang Tong and Jimeng Sun and Lin, {Ching Yung} and Christos Faloutsos",
year = "2012",
month = oct,
day = "1",
doi = "10.1007/s00778-012-0283-9",
language = "English (US)",
volume = "21",
pages = "637--650",
journal = "VLDB Journal",
issn = "1066-8888",
publisher = "Springer New York",
number = "5",
}

TY - JOUR
T1 - Gbase
T2 - An efficient analysis platform for large graphs
AU - Kang, U.
AU - Tong, Hanghang
AU - Sun, Jimeng
AU - Lin, Ching Yung
AU - Faloutsos, Christos
PY - 2012/10/1
Y1 - 2012/10/1
N2 - Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How to store such large graphs efficiently? What are the core operations/queries on those graphs? How to answer the graph queries quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for parallel, distributed settings and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using MapReduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and we show that our proposed Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.
AB - Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How to store such large graphs efficiently? What are the core operations/queries on those graphs? How to answer the graph queries quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for parallel, distributed settings and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using MapReduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and we show that our proposed Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.
KW - Compression
KW - Distributed computing
KW - Graph
KW - Indexing
UR - http://www.scopus.com/inward/record.url?scp=84866450568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866450568&partnerID=8YFLogxK
U2 - 10.1007/s00778-012-0283-9
DO - 10.1007/s00778-012-0283-9
M3 - Article
AN - SCOPUS:84866450568
SN - 1066-8888
VL - 21
SP - 637
EP - 650
JO - VLDB Journal
JF - VLDB Journal
IS - 5
ER -
