Gbase

An efficient analysis platform for large graphs

U. Kang, Hanghang Tong, Jimeng Sun, Ching Yung Lin, Christos Faloutsos

Research output: Contribution to journal (Article)

33 Citations (Scopus)

Abstract

Graphs appear in numerous applications, including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How can such large graphs be stored efficiently? What are the core operations and queries on those graphs? How can graph queries be answered quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for a parallel, distributed setting and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using MapReduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and show that Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.

Original language: English (US)
Pages (from-to): 637-650
Number of pages: 14
Journal: VLDB Journal
Volume: 21
Issue number: 5
DOI: https://doi.org/10.1007/s00778-012-0283-9
State: Published - Oct 2012
Externally published: Yes

Fingerprint

Recommender systems
Internet
Proteins
Experiments

Keywords

  • Compression
  • Distributed computing
  • Graph
  • Indexing

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems

Cite this

Gbase: An efficient analysis platform for large graphs. / Kang, U.; Tong, Hanghang; Sun, Jimeng; Lin, Ching Yung; Faloutsos, Christos.

In: VLDB Journal, Vol. 21, No. 5, 10.2012, p. 637-650.

Kang, U, Tong, H, Sun, J, Lin, CY & Faloutsos, C 2012, 'Gbase: An efficient analysis platform for large graphs', VLDB Journal, vol. 21, no. 5, pp. 637-650. https://doi.org/10.1007/s00778-012-0283-9
Kang, U.; Tong, Hanghang; Sun, Jimeng; Lin, Ching Yung; Faloutsos, Christos. / Gbase: An efficient analysis platform for large graphs. In: VLDB Journal. 2012; Vol. 21, No. 5. pp. 637-650.
@article{7f73e08723fc4143918ef1de72c371f7,
title = "Gbase: An efficient analysis platform for large graphs",
keywords = "Compression, Distributed computing, Graph, Indexing",
author = "U. Kang and Hanghang Tong and Jimeng Sun and Lin, {Ching Yung} and Christos Faloutsos",
year = "2012",
month = "10",
doi = "10.1007/s00778-012-0283-9",
language = "English (US)",
volume = "21",
pages = "637--650",
journal = "VLDB Journal",
issn = "1066-8888",
publisher = "Springer New York",
number = "5",

}

TY - JOUR

T1 - Gbase

T2 - An efficient analysis platform for large graphs

AU - Kang, U.

AU - Tong, Hanghang

AU - Sun, Jimeng

AU - Lin, Ching Yung

AU - Faloutsos, Christos

PY - 2012/10

Y1 - 2012/10

KW - Compression

KW - Distributed computing

KW - Graph

KW - Indexing

UR - http://www.scopus.com/inward/record.url?scp=84866450568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866450568&partnerID=8YFLogxK

U2 - 10.1007/s00778-012-0283-9

DO - 10.1007/s00778-012-0283-9

M3 - Article

VL - 21

SP - 637

EP - 650

JO - VLDB Journal

JF - VLDB Journal

SN - 1066-8888

IS - 5

ER -