TY - JOUR
T1 - Plan B
T2 - Design Methodology for Cyber-Physical Systems Robust to Timing Failures
AU - Khayatian, Mohammad
AU - Mehrabian, Mohammadreza
AU - Andert, Edward
AU - Grimsley, Reese
AU - Liang, Kyle
AU - Hu, Yi
AU - McCormack, Ian
AU - Joe-Wong, Carlee
AU - Aldrich, Jonathan
AU - Iannucci, Bob
AU - Shrivastava, Aviral
N1 - Funding Information:
This work was partially supported by funding from NIST Award 70NANB19H144, and by National Science Foundation grants CNS 1525855, CPS 1645578, and CPS 1646235.
Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/9/7
Y1 - 2022/9/7
N2 - Many Cyber-Physical Systems (CPS) have timing constraints that must be met by the cyber components (software and the network) to ensure safety. Checking whether a CPS meets its timing requirements is tedious, especially when the system is distributed and the software and/or the underlying computing platforms are complex. Furthermore, the system design is brittle, since a timing failure can still happen (e.g., network failure, soft-error bit flip). In this article, we propose a new design methodology called Plan B, in which the timing constraints of the CPS are monitored at runtime and a proper backup routine is executed when a timing failure happens to ensure safety. We provide a model for expressing the desired timing behavior using a set of timing constructs in C/C++ code and for efficiently monitoring them at runtime. We showcase the effectiveness of our approach by conducting experiments on three case studies: (1) the full software stack for autonomous driving (Apollo), (2) a multi-agent system with 1/10th-scale model robots, and (3) a quadrotor for a search-and-rescue application. We show that the system remains safe and stable even when faults are intentionally injected to cause a timing failure. We also demonstrate that the system can achieve graceful degradation when a less extreme timing failure happens.
AB - Many Cyber-Physical Systems (CPS) have timing constraints that must be met by the cyber components (software and the network) to ensure safety. Checking whether a CPS meets its timing requirements is tedious, especially when the system is distributed and the software and/or the underlying computing platforms are complex. Furthermore, the system design is brittle, since a timing failure can still happen (e.g., network failure, soft-error bit flip). In this article, we propose a new design methodology called Plan B, in which the timing constraints of the CPS are monitored at runtime and a proper backup routine is executed when a timing failure happens to ensure safety. We provide a model for expressing the desired timing behavior using a set of timing constructs in C/C++ code and for efficiently monitoring them at runtime. We showcase the effectiveness of our approach by conducting experiments on three case studies: (1) the full software stack for autonomous driving (Apollo), (2) a multi-agent system with 1/10th-scale model robots, and (3) a quadrotor for a search-and-rescue application. We show that the system remains safe and stable even when faults are intentionally injected to cause a timing failure. We also demonstrate that the system can achieve graceful degradation when a less extreme timing failure happens.
KW - Cyber-physical systems
KW - time-sensitive systems
KW - worst-case execution time
UR - http://www.scopus.com/inward/record.url?scp=85141050961&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141050961&partnerID=8YFLogxK
U2 - 10.1145/3516449
DO - 10.1145/3516449
M3 - Article
AN - SCOPUS:85141050961
VL - 6
JO - ACM Transactions on Cyber-Physical Systems
JF - ACM Transactions on Cyber-Physical Systems
SN - 2378-962X
IS - 3
M1 - 21
ER -