Real vs. simulated: Questions on the capability of simulated datasets on building fault detection for energy efficiency from a data-driven perspective

Jiajing Huang; Jin Wen; Hyunsoo Yoon; Ojas Pradhan; Teresa Wu; Zheng O'Neill; Kasim Selcuk Candan

doi:10.1016/j.enbuild.2022.111872


Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

The literature on building Automatic Fault Detection and Diagnosis (AFDD) focuses mainly on simulated system data because real building data are expensive and difficult to obtain and analyze. There is a lack of validation of the performance and scalability of data-driven AFDD approaches that use simulated data, and of how that performance compares with results on real building data. In this study, we conduct two sets of experiments to address these questions. First, we evaluate data-driven fault detection strategies on real and simulated building data separately. We observe that fault detection performance is not affected by the fault detection strategy, the size of the training data, or the number of cross-validation folds when the training and blind test data come from the same source, namely, either simulated or real building data. Next, we conduct a cross-dataset study, in which we develop the model using simulated data and test it on real building data. The results indicate that a model trained on simulated data does not generalize to fault detection on real building data. A Kolmogorov-Smirnov test confirms that statistical differences exist between the simulated and real building data, and it is also used to identify a subset of features that are similar across the two datasets. Using this subset of features, the cross-dataset experiments show improved fault detection for most fault cases. We conclude that even if a system produces simulated data with the same fault symptoms from a physical-analysis perspective, not all features from simulated datasets are beneficial for AFDD; from a machine learning perspective, only a subset of the features contains valuable information.
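The abstract does not say exactly how the Kolmogorov-Smirnov (KS) test was turned into a feature-selection rule. A minimal sketch, assuming a per-feature two-sample KS test and a hypothetical significance level `alpha = 0.05`, keeps the features for which equality of the simulated and real distributions cannot be rejected. The helper names, the feature names (`supply_air_temp`, `fan_power`) and the synthetic values are illustrative assumptions, not taken from the paper, and the p-value uses the large-sample (asymptotic) Kolmogorov distribution. SciPy's `scipy.stats.ks_2samp` computes the same statistic with exact or asymptotic p-values; the standard-library version below only keeps the sketch self-contained.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)| over the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    n, m = len(sa), len(sb)
    d = 0.0
    # The ECDF difference only changes at observed values, so evaluating it at
    # every pooled sample point is enough to find the supremum.
    for x in sa + sb:
        fa = bisect_right(sa, x) / n
        fb = bisect_right(sb, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int) -> float:
    """Large-sample two-sided p-value Q_KS(lam), with lam = sqrt(n*m/(n+m)) * d."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam <= 0.0:
        return 1.0
    if lam < 1.18:
        # For small lam, this series for the Kolmogorov CDF converges quickly.
        y = math.exp(-math.pi ** 2 / (8.0 * lam ** 2))
        cdf = math.sqrt(2.0 * math.pi) / lam * sum(y ** ((2 * j - 1) ** 2) for j in range(1, 6))
        return min(1.0, max(0.0, 1.0 - cdf))
    # For larger lam, the alternating survival-function series converges quickly.
    x = math.exp(-2.0 * lam ** 2)
    q = 2.0 * sum((-1) ** (j - 1) * x ** (j * j) for j in range(1, 6))
    return min(1.0, max(0.0, q))


def similar_features(
    sim: Dict[str, Sequence[float]],
    real: Dict[str, Sequence[float]],
    alpha: float = 0.05,  # hypothetical significance level, not taken from the paper
) -> List[str]:
    """Keep features whose simulated and real distributions the KS test cannot tell apart."""
    keep = []
    for name, sim_values in sim.items():
        if name not in real:
            continue
        real_values = real[name]
        d = ks_statistic(sim_values, real_values)
        p = ks_pvalue_asymptotic(d, len(sim_values), len(real_values))
        if p >= alpha:  # fail to reject H0: same distribution
            keep.append(name)
    return keep


# Hypothetical feature names and synthetic values, for illustration only.
sim = {
    "supply_air_temp": [i / 100 for i in range(100)],
    "fan_power": [i / 100 for i in range(100)],
}
real = {
    "supply_air_temp": [i / 100 + 0.005 for i in range(100)],  # nearly identical
    "fan_power": [i / 100 + 5.0 for i in range(100)],  # completely shifted
}
print(similar_features(sim, real))  # prints ['supply_air_temp']
```

Failing to reject the null hypothesis is only weak evidence of similarity, and with the large sample counts typical of building automation data even small distribution differences become statistically significant, so a threshold on the statistic D itself may be a more practical selection rule than a p-value cutoff.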

Original language	English (US)
Article number	111872
Journal	Energy and Buildings
Volume	259
DOIs	https://doi.org/10.1016/j.enbuild.2022.111872
State	Published - Mar 15 2022

Keywords

Building AFDD
Machine learning
Real
Similarity
Simulated

ASJC Scopus subject areas

Civil and Structural Engineering
Building and Construction
Mechanical Engineering
Electrical and Electronic Engineering


Cite this

@article{e7629a49ea774b4dad1e839087480ea0,

title = "Real vs. simulated: Questions on the capability of simulated datasets on building fault detection for energy efficiency from a data-driven perspective",

abstract = "The literature on building Automatic Fault Detection and Diagnosis (AFDD) focuses mainly on simulated system data because real building data are expensive and difficult to obtain and analyze. There is a lack of validation of the performance and scalability of data-driven AFDD approaches that use simulated data, and of how that performance compares with results on real building data. In this study, we conduct two sets of experiments to address these questions. First, we evaluate data-driven fault detection strategies on real and simulated building data separately. We observe that fault detection performance is not affected by the fault detection strategy, the size of the training data, or the number of cross-validation folds when the training and blind test data come from the same source, namely, either simulated or real building data. Next, we conduct a cross-dataset study, in which we develop the model using simulated data and test it on real building data. The results indicate that a model trained on simulated data does not generalize to fault detection on real building data. A Kolmogorov-Smirnov test confirms that statistical differences exist between the simulated and real building data, and it is also used to identify a subset of features that are similar across the two datasets. Using this subset of features, the cross-dataset experiments show improved fault detection for most fault cases. We conclude that even if a system produces simulated data with the same fault symptoms from a physical-analysis perspective, not all features from simulated datasets are beneficial for AFDD; from a machine learning perspective, only a subset of the features contains valuable information.",

keywords = "Building AFDD, Machine learning, Real, Similarity, Simulated",

author = "Jiajing Huang and Jin Wen and Hyunsoo Yoon and Ojas Pradhan and Teresa Wu and Zheng O'Neill and {Selcuk Candan}, Kasim",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier B.V.",

year = "2022",

month = mar,

day = "15",

doi = "10.1016/j.enbuild.2022.111872",

language = "English (US)",

volume = "259",

journal = "Energy and Buildings",

issn = "0378-7788",

publisher = "Elsevier BV",

}

TY  - JOUR

T1  - Real vs. simulated: Questions on the capability of simulated datasets on building fault detection for energy efficiency from a data-driven perspective

AU  - Huang, Jiajing

AU  - Wen, Jin

AU  - Yoon, Hyunsoo

AU  - Pradhan, Ojas

AU  - Wu, Teresa

AU  - O'Neill, Zheng

AU  - Selcuk Candan, Kasim

PY  - 2022/3/15

Y1  - 2022/3/15

N2  - The literature on building Automatic Fault Detection and Diagnosis (AFDD) focuses mainly on simulated system data because real building data are expensive and difficult to obtain and analyze. There is a lack of validation of the performance and scalability of data-driven AFDD approaches that use simulated data, and of how that performance compares with results on real building data. In this study, we conduct two sets of experiments to address these questions. First, we evaluate data-driven fault detection strategies on real and simulated building data separately. We observe that fault detection performance is not affected by the fault detection strategy, the size of the training data, or the number of cross-validation folds when the training and blind test data come from the same source, namely, either simulated or real building data. Next, we conduct a cross-dataset study, in which we develop the model using simulated data and test it on real building data. The results indicate that a model trained on simulated data does not generalize to fault detection on real building data. A Kolmogorov-Smirnov test confirms that statistical differences exist between the simulated and real building data, and it is also used to identify a subset of features that are similar across the two datasets. Using this subset of features, the cross-dataset experiments show improved fault detection for most fault cases. We conclude that even if a system produces simulated data with the same fault symptoms from a physical-analysis perspective, not all features from simulated datasets are beneficial for AFDD; from a machine learning perspective, only a subset of the features contains valuable information.

AB  - The literature on building Automatic Fault Detection and Diagnosis (AFDD) focuses mainly on simulated system data because real building data are expensive and difficult to obtain and analyze. There is a lack of validation of the performance and scalability of data-driven AFDD approaches that use simulated data, and of how that performance compares with results on real building data. In this study, we conduct two sets of experiments to address these questions. First, we evaluate data-driven fault detection strategies on real and simulated building data separately. We observe that fault detection performance is not affected by the fault detection strategy, the size of the training data, or the number of cross-validation folds when the training and blind test data come from the same source, namely, either simulated or real building data. Next, we conduct a cross-dataset study, in which we develop the model using simulated data and test it on real building data. The results indicate that a model trained on simulated data does not generalize to fault detection on real building data. A Kolmogorov-Smirnov test confirms that statistical differences exist between the simulated and real building data, and it is also used to identify a subset of features that are similar across the two datasets. Using this subset of features, the cross-dataset experiments show improved fault detection for most fault cases. We conclude that even if a system produces simulated data with the same fault symptoms from a physical-analysis perspective, not all features from simulated datasets are beneficial for AFDD; from a machine learning perspective, only a subset of the features contains valuable information.

KW  - Building AFDD

KW  - Machine learning

KW  - Real

KW  - Similarity

KW  - Simulated

UR  - http://www.scopus.com/inward/record.url?scp=85123865522&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85123865522&partnerID=8YFLogxK

U2  - 10.1016/j.enbuild.2022.111872

DO  - 10.1016/j.enbuild.2022.111872

M3  - Article

AN  - SCOPUS:85123865522

SN  - 0378-7788

VL  - 259

JO  - Energy and Buildings

JF  - Energy and Buildings

M1  - 111872

ER  -
