SyML: Guiding symbolic execution toward vulnerable states through pattern learning

Nicola Ruaro, Kyle Zeng, Lukas Dresel, Mario Polino, Tiffany Bao, Andrea Continella, Stefano Zanero, Christopher Kruegel, Giovanni Vigna

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Exploring many execution paths in a binary program is essential to discovering new vulnerabilities. Dynamic Symbolic Execution (DSE) is useful for triggering complex input conditions and enables an accurate exploration of a program while providing extensive crash replayability and semantic insights. However, scaling this type of analysis to complex binaries is difficult. Current methods suffer from the path explosion problem, despite many attempts to mitigate it (e.g., by merging paths when appropriate). In general, this challenge has not yet been surmounted, and most bugs discovered through such techniques are shallow. We propose a novel approach to address the path explosion problem: a smart triaging system that leverages supervised machine learning techniques to replicate human expertise and steer exploration toward vulnerable paths. Our approach monitors the execution traces of vulnerable programs and extracts relevant features (register and memory accesses, function complexity, system calls) to guide the symbolic exploration. We train models to learn the patterns of vulnerable paths from the extracted features, and we leverage their predictions to discover interesting execution paths in new programs. We implement our approach in a tool called SyML and evaluate it on the Cyber Grand Challenge (CGC) dataset, a well-known dataset of vulnerable programs, and on 3 real-world Linux binaries. We show that the knowledge collected from the analysis of vulnerable paths, without any explicit prior knowledge about vulnerability patterns, is transferable to unseen binaries and outperforms prior work in path prioritization by triggering more, and different, unique vulnerabilities.
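The idea of using a learned score to decide which pending path to explore next can be illustrated with a minimal sketch. This is not SyML's implementation: the State class, the feature names, the logistic scoring function, and the weights are all hypothetical, chosen only to mirror the feature families named in the abstract. A real system would fit the model on labeled execution traces and drive an actual symbolic execution engine.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical feature names, loosely following the abstract's feature families.
FEATURES = ("mem_accesses", "reg_accesses", "func_complexity", "syscalls")

# Illustrative weights only; a real system would learn them from labeled traces.
WEIGHTS = {"mem_accesses": 0.8, "reg_accesses": 0.1,
           "func_complexity": 0.3, "syscalls": 0.5}


@dataclass
class State:
    """Toy stand-in for a symbolic execution state (assumed structure)."""
    name: str
    features: Dict[str, float]
    successors: List["State"] = field(default_factory=list)


def score(state: State, weights: Dict[str, float], bias: float = 0.0) -> float:
    """Logistic score in (0, 1); higher means the path looks more promising."""
    z = bias + sum(weights[f] * state.features.get(f, 0.0) for f in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def explore(root: State, scorer: Callable[[State], float], budget: int) -> List[str]:
    """Best-first exploration: always step the highest-scored pending state."""
    order: List[str] = []
    counter = 0  # tie-breaker so heapq never has to compare State objects
    heap = [(-scorer(root), counter, root)]  # negate: heapq is a min-heap
    while heap and len(order) < budget:
        _, _, state = heapq.heappop(heap)
        order.append(state.name)
        for nxt in state.successors:
            counter += 1
            heapq.heappush(heap, (-scorer(nxt), counter, nxt))
    return order


# Tiny hand-built path tree: one memory-heavy branch, one trivial branch.
deep = State("parse_body", {"mem_accesses": 5, "func_complexity": 3})
a = State("parse_header", {"mem_accesses": 2, "syscalls": 1}, [deep])
b = State("print_usage", {"reg_accesses": 1})
root = State("entry", {}, [b, a])

visit_order = explore(root, lambda s: score(s, WEIGHTS), budget=3)
print(visit_order)
```

With a budget of three steps, the sketch spends all of them on the memory-heavy branch and never visits the low-scored one, which is the intended effect of score-based prioritization under path explosion.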

Original language: English (US)
Title of host publication: Proceedings of 2021 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021
Publisher: Association for Computing Machinery
Pages: 456-468
Number of pages: 13
ISBN (Electronic): 9781450390583
State: Published - Oct 6 2021
Externally published: Yes
Event: 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021 - Virtual, Online, Spain
Duration: Oct 6 2021 to Oct 8 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 24th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2021
Country/Territory: Spain
City: Virtual, Online
Period: 10/6/21 to 10/8/21

Keywords

  • Machine learning
  • Symbolic execution
  • Vulnerability discovery

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
