Using undiagnosed data to enhance computerized breast cancer analysis with a three stage data labeling method

Wenqing Sun, Tzu Liang Tseng, Bin Zheng, Flemin Lure, Teresa Wu, Giulio Francia, Sergio Cabrera, Jianying Zhang, Miguel Vélez-Reyesv, Wei Qian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

A novel three-stage Semi-Supervised Learning (SSL) approach is proposed for improving the performance of computerized breast cancer analysis with undiagnosed data. The three stages are: (1) instance selection, which is rarely used in SSL or computerized cancer analysis systems, (2) feature selection, and (3) a newly designed 'Divide Co-training' data labeling method. 379 suspicious early breast cancer area samples from 121 mammograms were used in our research. The proposed 'Divide Co-training' method generates two classifiers by splitting the original diagnosed dataset (labeled data), and labels the undiagnosed data (unlabeled data) when the two classifiers reach an agreement. The highest AUC (Area Under Curve, also called Az value) using labeled data only was 0.832, and it increased to 0.889 when undiagnosed data were included. The results indicate that the instance selection module can eliminate atypical or noisy data and enhance the subsequent semi-supervised data labeling performance. An analysis of different data sizes shows that AUC and accuracy increase as either the diagnosed or the undiagnosed data grow, and reach the best improvement (ΔAUC = 0.078, ΔAccuracy = 7.6%) with 40 of labeled data and 300 of unlabeled data.
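The agreement-based labeling step described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation: the nearest-centroid classifier, the alternating per-class split, the single labeling pass, the function names, and the toy two-feature data are all hypothetical. The instance selection and feature selection stages are omitted.

```python
from collections import Counter


def train_centroids(samples):
    """Fit a nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def split_in_two(labeled):
    """Assumed split rule: alternate the samples of each class between two halves."""
    halves = ([], [])
    seen = Counter()
    for features, label in labeled:
        halves[seen[label] % 2].append((features, label))
        seen[label] += 1
    return halves


def divide_co_training(labeled, unlabeled):
    """Train one classifier per half; label an unlabeled sample only if both agree."""
    half_a, half_b = split_in_two(labeled)
    clf_a, clf_b = train_centroids(half_a), train_centroids(half_b)
    newly_labeled, still_unlabeled = [], []
    for features in unlabeled:
        label_a, label_b = predict(clf_a, features), predict(clf_b, features)
        if label_a == label_b:
            newly_labeled.append((features, label_a))
        else:
            still_unlabeled.append(features)
    return labeled + newly_labeled, still_unlabeled


# Toy data (hypothetical): two features per suspicious region.
labeled = [
    ((0.0, 0.0), "benign"), ((1.0, 1.0), "benign"),
    ((9.0, 9.0), "malignant"), ((10.0, 10.0), "malignant"),
]
unlabeled = [(0.5, 0.5), (9.5, 9.5), (5.0, 5.0)]

enlarged, remaining = divide_co_training(labeled, unlabeled)
print(len(enlarged), remaining)
```

In this toy run the two clear-cut samples receive labels because both classifiers agree, while the ambiguous sample at (5.0, 5.0) stays unlabeled because the classifiers disagree. The enlarged labeled set would then be used to train the final classifier.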

Original language: English (US)
Title of host publication: Medical Imaging 2014
Subtitle of host publication: Computer-Aided Diagnosis
Publisher: SPIE
ISBN (Print): 9780819498281
DOIs: https://doi.org/10.1117/12.2043708
State: Published - Jan 1 2014
Event: Medical Imaging 2014: Computer-Aided Diagnosis - San Diego, CA, United States
Duration: Feb 18 2014 to Feb 20 2014

Publication series

Name: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
Volume: 9035
ISSN (Print): 1605-7422

Other

Other: Medical Imaging 2014: Computer-Aided Diagnosis
Country: United States
City: San Diego, CA
Period: 2/18/14 to 2/20/14

Keywords

  • Computerized breast cancer analysis
  • Semi-supervised learning
  • Undiagnosed data

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Biomaterials
  • Atomic and Molecular Physics, and Optics
  • Radiology, Nuclear Medicine and Imaging

Cite this

Sun, W., Tseng, T. L., Zheng, B., Lure, F., Wu, T., Francia, G., Cabrera, S., Zhang, J., Vélez-Reyesv, M., & Qian, W. (2014). Using undiagnosed data to enhance computerized breast cancer analysis with a three stage data labeling method. In Medical Imaging 2014: Computer-Aided Diagnosis [90350T] (Progress in Biomedical Optics and Imaging - Proceedings of SPIE; Vol. 9035). SPIE. https://doi.org/10.1117/12.2043708