G-Finder: Approximate Attributed Subgraph Matching

Lihui Liu; Boxin Du; Jiejun Xu; Hanghang Tong

doi:10.1109/BigData47090.2019.9006525

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

24 Scopus citations

Abstract

Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.
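The abstract describes indexing candidate vertices and then pruning false candidates early. The sketch below illustrates that general filter-then-refine idea only; it is not the paper's LOOKUP-TABLE or its filtering strategy, whose details are not given in this record. All names (`build_label_index`, `filter_candidates`, `max_missing`) and the tolerance rule are assumptions made for illustration: a data vertex stays a candidate for a query vertex if it has the same label and its 1-hop neighborhood lacks at most `max_missing` of the neighbor labels the query vertex expects.

```python
from collections import defaultdict


def build_label_index(labels):
    """Map each attribute value to the set of vertices carrying it."""
    index = defaultdict(set)
    for vertex, label in labels.items():
        index[label].add(vertex)
    return index


def neighbor_labels(adj, labels, vertex):
    """Return the set of labels found in the 1-hop neighborhood of a vertex."""
    return {labels[n] for n in adj.get(vertex, ())}


def filter_candidates(q_adj, q_labels, d_adj, d_labels, max_missing=1):
    """Hypothetical candidate filter (not the paper's method).

    For each query vertex, keep data vertices with the same label whose
    neighborhood misses at most `max_missing` of the query vertex's
    neighbor labels, a crude stand-in for tolerating missing edges.
    """
    index = build_label_index(d_labels)
    candidates = {}
    for u, label in q_labels.items():
        required = neighbor_labels(q_adj, q_labels, u)
        kept = set()
        for v in index.get(label, ()):
            missing = required - neighbor_labels(d_adj, d_labels, v)
            if len(missing) <= max_missing:
                kept.add(v)
        candidates[u] = kept
    return candidates


# Toy query: a path a - b - c with labels A, B, C.
q_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
q_labels = {"a": "A", "b": "B", "c": "C"}

# Toy data graph: an exact copy (1 - 2 - 3) plus a partial copy (5 - 4).
d_adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
d_labels = {1: "A", 2: "B", 3: "C", 4: "B", 5: "A"}

exact = filter_candidates(q_adj, q_labels, d_adj, d_labels, max_missing=0)
relaxed = filter_candidates(q_adj, q_labels, d_adj, d_labels, max_missing=1)
print(exact["b"], relaxed["b"])
```

With `max_missing=0` only vertex 2 survives as a candidate for `b`; raising the tolerance to 1 also admits vertex 4, which has an `A` neighbor but no `C` neighbor. A full matcher would follow this filter with a refinement step that scores and ranks complete candidate subgraphs to return the top-k.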

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors	Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	513-522
Number of pages	10
ISBN (Electronic)	9781728108582
DOIs	https://doi.org/10.1109/BigData47090.2019.9006525
State	Published - Dec 2019
Event	2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States; Duration: Dec 9 2019 → Dec 12 2019

Publication series

Name	Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference	2019 IEEE International Conference on Big Data, Big Data 2019
Country/Territory	United States
City	Los Angeles
Period	12/9/19 → 12/12/19

Keywords

approximate matching
subgraph index
subgraph matching

ASJC Scopus subject areas

Artificial Intelligence
Computer Networks and Communications
Information Systems
Information Systems and Management

Access to Document

10.1109/BigData47090.2019.9006525

Cite this

Liu, L., Du, B., Xu, J., & Tong, H. (2019). G-Finder: Approximate Attributed Subgraph Matching. In C. Baru, J. Huan, L. Khan, X. T. Hu, R. Ak, Y. Tian, R. Barga, C. Zaniolo, K. Lee, & Y. F. Ye (Eds.), Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 (pp. 513-522). Article 9006525 (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData47090.2019.9006525

G-Finder: Approximate Attributed Subgraph Matching. / Liu, Lihui; Du, Boxin; Xu, Jiejun et al.
Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019. ed. / Chaitanya Baru; Jun Huan; Latifur Khan; Xiaohua Tony Hu; Ronay Ak; Yuanyuan Tian; Roger Barga; Carlo Zaniolo; Kisung Lee; Yanfang Fanny Ye. Institute of Electrical and Electronics Engineers Inc., 2019. p. 513-522 9006525 (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019).

Liu, L, Du, B, Xu, J & Tong, H 2019, G-Finder: Approximate Attributed Subgraph Matching. in C Baru, J Huan, L Khan, XT Hu, R Ak, Y Tian, R Barga, C Zaniolo, K Lee & YF Ye (eds), Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019., 9006525, Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019, Institute of Electrical and Electronics Engineers Inc., pp. 513-522, 2019 IEEE International Conference on Big Data, Big Data 2019, Los Angeles, United States, 12/9/19. https://doi.org/10.1109/BigData47090.2019.9006525

Liu L, Du B, Xu J, Tong H. G-Finder: Approximate Attributed Subgraph Matching. In Baru C, Huan J, Khan L, Hu XT, Ak R, Tian Y, Barga R, Zaniolo C, Lee K, Ye YF, editors, Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 513-522. 9006525. (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019). doi: 10.1109/BigData47090.2019.9006525

Liu, Lihui ; Du, Boxin ; Xu, Jiejun et al. / G-Finder : Approximate Attributed Subgraph Matching. Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019. editor / Chaitanya Baru ; Jun Huan ; Latifur Khan ; Xiaohua Tony Hu ; Ronay Ak ; Yuanyuan Tian ; Roger Barga ; Carlo Zaniolo ; Kisung Lee ; Yanfang Fanny Ye. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 513-522 (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019).

@inproceedings{13ad43ce7da24970b6c4b35c4c928193,

title = "G-Finder: Approximate Attributed Subgraph Matching",

abstract = "Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.",

keywords = "approximate matching, subgraph index, subgraph matching",

author = "Lihui Liu and Boxin Du and Jiejun Xu and Hanghang Tong",

year = "2019",

month = dec,

doi = "10.1109/BigData47090.2019.9006525",

language = "English (US)",

series = "Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "513--522",

editor = "Chaitanya Baru and Jun Huan and Latifur Khan and Hu, {Xiaohua Tony} and Ronay Ak and Yuanyuan Tian and Roger Barga and Carlo Zaniolo and Kisung Lee and Ye, {Yanfang Fanny}",

booktitle = "Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019",

note = "2019 IEEE International Conference on Big Data, Big Data 2019 ; Conference date: 09-12-2019 Through 12-12-2019",

}

TY - GEN

T1 - G-Finder

T2 - 2019 IEEE International Conference on Big Data, Big Data 2019

AU - Liu, Lihui

AU - Du, Boxin

AU - Xu, Jiejun

AU - Tong, Hanghang

PY - 2019/12

Y1 - 2019/12

AB - Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.

KW - approximate matching

KW - subgraph index

KW - subgraph matching

UR - http://www.scopus.com/inward/record.url?scp=85081290751&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85081290751&partnerID=8YFLogxK

U2 - 10.1109/BigData47090.2019.9006525

DO - 10.1109/BigData47090.2019.9006525

M3 - Conference contribution

T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

SP - 513

EP - 522

BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

A2 - Baru, Chaitanya

A2 - Huan, Jun

A2 - Khan, Latifur

A2 - Hu, Xiaohua Tony

A2 - Ak, Ronay

A2 - Tian, Yuanyuan

A2 - Barga, Roger

A2 - Zaniolo, Carlo

A2 - Lee, Kisung

A2 - Ye, Yanfang Fanny

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 9 December 2019 through 12 December 2019

ER -
