TY - JOUR
T1 - Unraveled — A semi-synthetic dataset for Advanced Persistent Threats
AU - Myneni, Sowmya
AU - Jha, Kritshekhar
AU - Sabur, Abdulhakim
AU - Agrawal, Garima
AU - Deng, Yuli
AU - Chowdhary, Ankur
AU - Huang, Dijiang
N1 - Funding Information:
Dr. Dijiang Huang received his B.S. degree from Beijing University of Posts and Telecommunications, Beijing, China, and his M.S. and Ph.D. degrees from the University of Missouri-Kansas City, Kansas City, MO, USA, in 1995, 2001, and 2004, respectively. He is an Associate Professor with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA. His research interests include computer networking, security, and privacy. He is an Associate Editor of the Journal of Network and Systems Management (JNSM) and an Editor of IEEE Communications Surveys and Tutorials. He has served as an organizer for many international conferences and workshops. His research was supported by the NSF, ONR, ARO, NATO, and Consortium of Embedded System (CES). He was the recipient of the ONR Young Investigator Program (YIP) Award.
Funding Information:
We would like to thank the DevilSec team, specifically Nathan Smith and Austin Ballard, for carrying out the planned APT attack and sharing their professional expertise to make it more realistic. All authors are thankful to our research grant sponsor, the Arizona State University Graduate and Professional Student Association (ASU GPSA).
Publisher Copyright:
© 2023
PY - 2023/5
Y1 - 2023/5
N2 - Unraveled is a novel cybersecurity dataset capturing Advanced Persistent Threat (APT) attacks, a kind of data not previously available in the public domain. Existing cybersecurity datasets lack coherent information about the features of sophisticated and persistent cyber attacks, including attack planning and deployment, the stealthiness of the attacker(s), and longer dormant periods between attack activities. Our APT attack scenario in Unraveled is implemented on a real network deployed on a cloud platform to emulate an organization's network. The new dataset provides comprehensive network-flow and host-level log information covering both normal user traffic and cyber-attack traffic. To emulate realistic network traffic scenarios, Unraveled also includes attacks at different skill levels, reflecting a typical organization's threat posture and drawing on APT attack information from a well-known APT attack database, MITRE's APT-group database. Furthermore, we design and develop an Employee Behavior Generation (EBG) model to emulate the traffic and activities of multiple normal employees over a 6-week period based on their pre-defined business functions. Using well-known machine learning models for anomaly detection, we show that the APT attack activities in Unraveled are rarely detected, indicating the need for more effective solutions that are based on datasets representing real-world APT attacks.
AB - Unraveled is a novel cybersecurity dataset capturing Advanced Persistent Threat (APT) attacks, a kind of data not previously available in the public domain. Existing cybersecurity datasets lack coherent information about the features of sophisticated and persistent cyber attacks, including attack planning and deployment, the stealthiness of the attacker(s), and longer dormant periods between attack activities. Our APT attack scenario in Unraveled is implemented on a real network deployed on a cloud platform to emulate an organization's network. The new dataset provides comprehensive network-flow and host-level log information covering both normal user traffic and cyber-attack traffic. To emulate realistic network traffic scenarios, Unraveled also includes attacks at different skill levels, reflecting a typical organization's threat posture and drawing on APT attack information from a well-known APT attack database, MITRE's APT-group database. Furthermore, we design and develop an Employee Behavior Generation (EBG) model to emulate the traffic and activities of multiple normal employees over a 6-week period based on their pre-defined business functions. Using well-known machine learning models for anomaly detection, we show that the APT attack activities in Unraveled are rarely detected, indicating the need for more effective solutions that are based on datasets representing real-world APT attacks.
KW - Advanced Persistent Threats
KW - Dataset
KW - Threat detection
UR - http://www.scopus.com/inward/record.url?scp=85150027764&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150027764&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2023.109688
DO - 10.1016/j.comnet.2023.109688
M3 - Article
AN - SCOPUS:85150027764
SN - 1389-1286
VL - 227
JO - Computer Networks
JF - Computer Networks
M1 - 109688
ER -