TY - GEN
T1 - Diagnosing Concept Drift with Visual Analytics
AU - Yang, Weikai
AU - Li, Zhen
AU - Liu, Mengchen
AU - Lu, Yafeng
AU - Cao, Kelei
AU - Maciejewski, Ross
AU - Liu, Shixia
N1 - Funding Information:
This research was funded by the National Key R&D Program of China (Nos. 2018YFB1004300, 2019YFB1405703), the National Natural Science Foundation of China (Nos. 61761136020, 61672307, 61672308, 61872389), and TC190A4DA/3. Work by Maciejewski was partially sponsored by the U.S. National Science Foundation award number 1939725.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - Concept drift is a phenomenon in which the distribution of a data stream changes over time in unforeseen ways, causing prediction models built on historical data to become inaccurate. While a variety of automated methods have been developed to identify when concept drift occurs, there is limited support for analysts who need to understand and correct their models when drift is detected. In this paper, we present a visual analytics method, DriftVis, to support model builders and analysts in the identification and correction of concept drift in streaming data. DriftVis combines a distribution-based drift detection method with a streaming scatterplot to support the analysis of drift caused by the distribution changes of data streams and to explore the impact of these changes on the model's accuracy. A quantitative experiment and two case studies on weather prediction and text classification have been conducted to demonstrate our proposed tool and illustrate how visual analytics can be used to support the detection, examination, and correction of concept drift.
KW - Concept drift
KW - change detection
KW - scatterplot
KW - streaming data
KW - t-SNE
UR - http://www.scopus.com/inward/record.url?scp=85095477279&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095477279&partnerID=8YFLogxK
U2 - 10.1109/VAST50239.2020.00007
DO - 10.1109/VAST50239.2020.00007
M3 - Conference contribution
AN - SCOPUS:85095477279
T3 - Proceedings - 2020 IEEE Conference on Visual Analytics Science and Technology, VAST 2020
SP - 12
EP - 23
BT - Proceedings - 2020 IEEE Conference on Visual Analytics Science and Technology, VAST 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE Conference on Visual Analytics Science and Technology, VAST 2020
Y2 - 25 October 2020 through 30 October 2020
ER -