SyML: Guiding symbolic execution toward vulnerable states through pattern learning

Nicola Ruaro; Kyle Zeng; Lukas Dresel; Mario Polino; Tiffany Bao; Andrea Continella; Stefano Zanero; Christopher Kruegel; Giovanni Vigna

doi:10.1145/3471621.3471865

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Exploring many execution paths in a binary program is essential to discovering new vulnerabilities. Dynamic Symbolic Execution (DSE) can trigger complex input conditions and enables an accurate exploration of a program while providing extensive crash replayability and semantic insights. However, scaling this type of analysis to complex binaries is difficult. Current methods suffer from the path explosion problem, despite many attempts to mitigate it (e.g., by merging paths when appropriate). In general, the problem remains unsolved, and most bugs discovered through such techniques are shallow. We propose a novel approach to the path explosion problem: a smart triaging system that leverages supervised machine learning to replicate human expertise and steer exploration toward vulnerable paths. Our approach monitors execution traces in vulnerable programs and extracts relevant features (register and memory accesses, function complexity, and system calls) to guide the symbolic exploration. We train models to learn the patterns of vulnerable paths from the extracted features, and we use their predictions to discover interesting execution paths in new programs. We implement our approach in a tool called SyML and evaluate it on the Cyber Grand Challenge (CGC) dataset, a well-known dataset of vulnerable programs, and on 3 real-world Linux binaries. We show that the knowledge collected from the analysis of vulnerable paths, without any explicit prior knowledge about vulnerability patterns, is transferable to unseen binaries and outperforms prior work in path prioritization by triggering more, and different, unique vulnerabilities.
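
The triage idea described in the abstract (learn from features of vulnerable paths, then use predictions to decide which pending path to explore next) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SyML's implementation: the feature names are borrowed from the abstract's list, while the numeric values, the logistic-regression scorer, and the names train_logistic, score, prioritize, and QueuedState are all hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field

# Hypothetical per-path features, named after the abstract's list.
# All values below are made-up, normalized to [0, 1], for illustration only.
FEATURES = ("mem_accesses", "reg_accesses", "func_complexity", "syscalls")


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=0.1):
    """Fit weights and a bias with plain batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def score(model, x):
    """Predicted probability that a path with features x is 'vulnerable'."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


@dataclass(order=True)
class QueuedState:
    neg_score: float                  # negated so the min-heap pops the best first
    name: str = field(compare=False)  # excluded from ordering


def prioritize(model, pending):
    """Yield pending state names, highest predicted score first."""
    heap = [QueuedState(-score(model, feats), name) for name, feats in pending.items()]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap).name


# Toy training set: label 1 = trace that reached a vulnerable state, 0 = did not.
TRAIN_X = [
    [0.9, 0.8, 0.7, 0.7],
    [0.8, 0.9, 0.9, 0.8],
    [0.1, 0.2, 0.1, 0.2],
    [0.2, 0.1, 0.3, 0.1],
]
TRAIN_Y = [1, 1, 0, 0]

model = train_logistic(TRAIN_X, TRAIN_Y)

# Pending symbolic states in an unseen program, with their extracted features.
PENDING = {
    "state_a": [0.15, 0.20, 0.20, 0.15],
    "state_b": [0.85, 0.90, 0.80, 0.75],
    "state_c": [0.50, 0.50, 0.50, 0.50],
}

order = list(prioritize(model, PENDING))
print(order)
```

The priority queue stands in for the scheduler of a symbolic execution engine: instead of exploring pending states in discovery order, the explorer repeatedly pops the state the model rates as most likely to lead to a vulnerability. A real system would recompute features and scores as states advance.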

Original language	English (US)
Title of host publication	Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021
Publisher	Association for Computing Machinery
Pages	456-468
Number of pages	13
ISBN (Electronic)	9781450390583
DOIs	https://doi.org/10.1145/3471621.3471865
State	Published - Oct 6 2021
Event	24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021 - Virtual, Online, Spain
Duration	Oct 6 2021 → Oct 8 2021

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021
Country/Territory	Spain
City	Virtual, Online
Period	10/6/21 → 10/8/21

Keywords

Machine learning
Symbolic execution
Vulnerability discovery

ASJC Scopus subject areas

Software
Human-Computer Interaction
Computer Vision and Pattern Recognition
Computer Networks and Communications

Access to Document

10.1145/3471621.3471865

Cite this

Ruaro, N., Zeng, K., Dresel, L., Polino, M., Bao, T., Continella, A., Zanero, S., Kruegel, C., & Vigna, G. (2021). SyML: Guiding symbolic execution toward vulnerable states through pattern learning. In Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021 (pp. 456-468). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3471621.3471865

SyML: Guiding symbolic execution toward vulnerable states through pattern learning. / Ruaro, Nicola; Zeng, Kyle; Dresel, Lukas et al.
Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021. Association for Computing Machinery, 2021. p. 456-468 (ACM International Conference Proceeding Series).

Ruaro, N, Zeng, K, Dresel, L, Polino, M, Bao, T, Continella, A, Zanero, S, Kruegel, C & Vigna, G 2021, SyML: Guiding symbolic execution toward vulnerable states through pattern learning. in Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 456-468, 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021, Virtual, Online, Spain, 10/6/21. https://doi.org/10.1145/3471621.3471865

Ruaro N, Zeng K, Dresel L, Polino M, Bao T, Continella A et al. SyML: Guiding symbolic execution toward vulnerable states through pattern learning. In Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021. Association for Computing Machinery. 2021. p. 456-468. (ACM International Conference Proceeding Series). doi: 10.1145/3471621.3471865

@inproceedings{c7072bd3d3f74e27a0f158bf347ddd64,
title = "SyML: Guiding symbolic execution toward vulnerable states through pattern learning",

abstract = "Exploring many execution paths in a binary program is essential to discover new vulnerabilities. Dynamic Symbolic Execution (DSE) is useful to trigger complex input conditions and enables an accurate exploration of a program while providing extensive crash replayability and semantic insights. However, scaling this type of analysis to complex binaries is difficult. Current methods suffer from the path explosion problem, despite many attempts to mitigate this challenge (e.g., by merging paths when appropriate). Still, in general, this challenge is not yet surmounted, and most bugs discovered through such techniques are shallow. We propose a novel approach to address the path explosion problem: A smart triaging system that leverages supervised machine learning techniques to replicate human expertise, leading to vulnerable path discovery. Our approach monitors the execution traces in vulnerable programs and extracts relevant features - register and memory accesses, function complexity, system calls - to guide the symbolic exploration. We train models to learn the patterns of vulnerable paths from the extracted features, and we leverage their predictions to discover interesting execution paths in new programs. We implement our approach in a tool called SyML, and we evaluate it on the Cyber Grand Challenge (CGC) dataset - a well-known dataset of vulnerable programs - and on 3 real-world Linux binaries. We show that the knowledge collected from the analysis of vulnerable paths, without any explicit prior knowledge about vulnerability patterns, is transferrable to unseen binaries, and leads to outperforming prior work in path prioritization by triggering more, and different, unique vulnerabilities.",

keywords = "Machine learning, Symbolic execution, Vulnerability discovery",
author = "Nicola Ruaro and Kyle Zeng and Lukas Dresel and Mario Polino and Tiffany Bao and Andrea Continella and Stefano Zanero and Christopher Kruegel and Giovanni Vigna",
note = "Publisher Copyright: {\textcopyright} 2021 Owner/Author.; 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021 ; Conference date: 06-10-2021 Through 08-10-2021",
year = "2021",
month = oct,
day = "6",
doi = "10.1145/3471621.3471865",
language = "English (US)",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
pages = "456--468",
booktitle = "Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021",
}

TY - GEN
T1 - SyML
T2 - 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021
AU - Ruaro, Nicola
AU - Zeng, Kyle
AU - Dresel, Lukas
AU - Polino, Mario
AU - Bao, Tiffany
AU - Continella, Andrea
AU - Zanero, Stefano
AU - Kruegel, Christopher
AU - Vigna, Giovanni
PY - 2021/10/6
Y1 - 2021/10/6

AB - Exploring many execution paths in a binary program is essential to discover new vulnerabilities. Dynamic Symbolic Execution (DSE) is useful to trigger complex input conditions and enables an accurate exploration of a program while providing extensive crash replayability and semantic insights. However, scaling this type of analysis to complex binaries is difficult. Current methods suffer from the path explosion problem, despite many attempts to mitigate this challenge (e.g., by merging paths when appropriate). Still, in general, this challenge is not yet surmounted, and most bugs discovered through such techniques are shallow. We propose a novel approach to address the path explosion problem: A smart triaging system that leverages supervised machine learning techniques to replicate human expertise, leading to vulnerable path discovery. Our approach monitors the execution traces in vulnerable programs and extracts relevant features - register and memory accesses, function complexity, system calls - to guide the symbolic exploration. We train models to learn the patterns of vulnerable paths from the extracted features, and we leverage their predictions to discover interesting execution paths in new programs. We implement our approach in a tool called SyML, and we evaluate it on the Cyber Grand Challenge (CGC) dataset - a well-known dataset of vulnerable programs - and on 3 real-world Linux binaries. We show that the knowledge collected from the analysis of vulnerable paths, without any explicit prior knowledge about vulnerability patterns, is transferrable to unseen binaries, and leads to outperforming prior work in path prioritization by triggering more, and different, unique vulnerabilities.

KW - Machine learning
KW - Symbolic execution
KW - Vulnerability discovery
UR - http://www.scopus.com/inward/record.url?scp=85117733433&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85117733433&partnerID=8YFLogxK
U2 - 10.1145/3471621.3471865
DO - 10.1145/3471621.3471865
M3 - Conference contribution
AN - SCOPUS:85117733433
T3 - ACM International Conference Proceeding Series
SP - 456
EP - 468
BT - Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021
PB - Association for Computing Machinery
Y2 - 6 October 2021 through 8 October 2021
ER -
