Abstract

Bug tracking systems, which help track reported software bugs, are widely used in software development and maintenance. In these systems, identifying the source files relevant to a given bug report among a large number of source files is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, existing methods usually consider these two types of similarity separately and largely ignore historical bug fixes. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. The proposed model is built upon three key observations. First, supervised modeling can effectively make use of existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating these three observations, STMLOCATOR utilizes historical fixes in a supervised way and learns both the textual and the semantic similarities between bug reports and source files. We further consider a special type of bug report that contains stack traces, and propose a variant of STMLOCATOR tailored to such reports. Experimental evaluations on three real data sets demonstrate that STMLOCATOR can achieve up to 23.6% improvement in prediction accuracy over its best competitors and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on bug reports with stack traces.
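The second and third observations (report words recurring in the relevant files, and longer files tending to have more bugs), together with the stack-trace case, can be illustrated with a plain information-retrieval baseline. The Python sketch below is not STMLOCATOR, which learns textual and semantic similarities in a supervised way from historical fixes. It is a hypothetical ranking that multiplies a term-frequency cosine similarity by an assumed log-length prior and adds an assumed fixed bonus for files named in Java stack-trace frames. The function names, the prior, the frame pattern, and the bonus value are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical pattern for a Java stack-trace frame such as
# "at org.example.Foo.bar(Foo.java:42)"; captures the file name "Foo.java".
STACK_FRAME = re.compile(r"\bat\s+[\w$.<>]+\((\w+\.java):\d+\)")


def tokenize(text):
    """Split camelCase identifiers, lower-case, and keep alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # "parseConfig" -> "parse Config"
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def length_prior(num_tokens, max_tokens):
    """Assumed prior: a weight in [0, 1] that grows with the log of file length."""
    if max_tokens <= 0:
        return 1.0
    return math.log1p(num_tokens) / math.log1p(max_tokens)


def stack_trace_files(report):
    """File names that appear in Java stack-trace frames of a bug report."""
    return set(STACK_FRAME.findall(report))


def rank_files(report, files, trace_bonus=1.0):
    """Rank candidate files (name -> source text) for a bug report, best first.

    score = cosine(report, file) * length_prior(file), plus trace_bonus when the
    file is named in a stack-trace frame. Illustrative only, not STMLOCATOR.
    """
    query = Counter(tokenize(report))
    vectors = {name: Counter(tokenize(src)) for name, src in files.items()}
    max_len = max((sum(v.values()) for v in vectors.values()), default=0)
    traced = stack_trace_files(report)
    scores = {}
    for name, vec in vectors.items():
        score = cosine(query, vec) * length_prior(sum(vec.values()), max_len)
        if name in traced:
            score += trace_bonus  # stack-trace case: boost files seen in frames
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Usage with two toy source files and a report containing one stack frame.
files = {
    "ConfigParser.java": "class ConfigParser { void parseConfig(String path) { readFile(path); } }",
    "Logger.java": "class Logger { void log(String msg) { } }",
}
report = "NullPointerException when parsing config file\n  at app.ConfigParser.parseConfig(ConfigParser.java:12)"
print(rank_files(report, files))  # ['ConfigParser.java', 'Logger.java']
```

An additive bonus (rather than a multiplicative one) is used so that a file named in a stack frame is still promoted even when it shares no words with the report text.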

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Data Mining, ICDM 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 607-616
Number of pages: 10
ISBN (Electronic): 9781538691588
DOI: 10.1109/ICDM.2018.00076
State: Published - Dec 27 2018
Event: 18th IEEE International Conference on Data Mining, ICDM 2018 - Singapore, Singapore
Duration: Nov 17 2018 to Nov 20 2018

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2018-November
ISSN (Print): 1550-4786

Conference

Conference: 18th IEEE International Conference on Data Mining, ICDM 2018
Country: Singapore
City: Singapore
Period: 11/17/18 to 11/20/18

Keywords

  • Bug localization
  • Bug report
  • Supervised topic modeling

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Wang, Y., Yao, Y., Tong, H., Huo, X., Li, M., Xu, F., & Lu, J. (2018). Bug Localization via Supervised Topic Modeling. In 2018 IEEE International Conference on Data Mining, ICDM 2018 (pp. 607-616). [8594885] (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2018-November). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2018.00076

Bug Localization via Supervised Topic Modeling. / Wang, Yaojing; Yao, Yuan; Tong, Hanghang; Huo, Xuan; Li, Min; Xu, Feng; Lu, Jian.

2018 IEEE International Conference on Data Mining, ICDM 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 607-616 8594885 (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2018-November).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Wang, Y, Yao, Y, Tong, H, Huo, X, Li, M, Xu, F & Lu, J 2018, Bug Localization via Supervised Topic Modeling. in 2018 IEEE International Conference on Data Mining, ICDM 2018, 8594885, Proceedings - IEEE International Conference on Data Mining, ICDM, vol. 2018-November, Institute of Electrical and Electronics Engineers Inc., pp. 607-616, 18th IEEE International Conference on Data Mining, ICDM 2018, Singapore, Singapore, 11/17/18. https://doi.org/10.1109/ICDM.2018.00076
Wang Y, Yao Y, Tong H, Huo X, Li M, Xu F et al. Bug Localization via Supervised Topic Modeling. In 2018 IEEE International Conference on Data Mining, ICDM 2018. Institute of Electrical and Electronics Engineers Inc. 2018. p. 607-616. 8594885. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2018.00076
Wang, Yaojing ; Yao, Yuan ; Tong, Hanghang ; Huo, Xuan ; Li, Min ; Xu, Feng ; Lu, Jian. / Bug Localization via Supervised Topic Modeling. 2018 IEEE International Conference on Data Mining, ICDM 2018. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 607-616 (Proceedings - IEEE International Conference on Data Mining, ICDM).
@inproceedings{507e478d01004fab9e5014924b82db91,
title = "Bug Localization via Supervised Topic Modeling",
abstract = "Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6{\%} improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2{\%} on those bug reports with stack-traces.",
keywords = "Bug localization, Bug report, Supervised topic modeling",
author = "Yaojing Wang and Yuan Yao and Hanghang Tong and Xuan Huo and Min Li and Feng Xu and Jian Lu",
year = "2018",
month = "12",
day = "27",
doi = "10.1109/ICDM.2018.00076",
language = "English (US)",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "607--616",
booktitle = "2018 IEEE International Conference on Data Mining, ICDM 2018",

}

TY - GEN

T1 - Bug Localization via Supervised Topic Modeling

AU - Wang, Yaojing

AU - Yao, Yuan

AU - Tong, Hanghang

AU - Huo, Xuan

AU - Li, Min

AU - Xu, Feng

AU - Lu, Jian

PY - 2018/12/27

Y1 - 2018/12/27

N2 - Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.

AB - Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.

KW - Bug localization

KW - Bug report

KW - Supervised topic modeling

UR - http://www.scopus.com/inward/record.url?scp=85061393845&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061393845&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2018.00076

DO - 10.1109/ICDM.2018.00076

M3 - Conference contribution

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 607

EP - 616

BT - 2018 IEEE International Conference on Data Mining, ICDM 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -