BYTEWEIGHT: Learning to recognize functions in binary code

Tiffany Bao, Jonathan Burket, Maverick Woo, Rafael Turner, David Brumley

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

68 Citations (Scopus)

Abstract

Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, with four different optimization levels, and across two different operating systems. In our experiments with 2,200 binaries, we found that BYTEWEIGHT missed 44,621 functions in comparison with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.
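The comparison in the abstract can be restated as ratios. The sketch below is only illustrative arithmetic on the four counts quoted above; the dictionary names and the `reduction_factor` helper are hypothetical and do not come from the paper or its tooling.

```python
# Error counts quoted in the abstract for the 2,200-binary data set.
missed = {"IDA": 266_672, "BYTEWEIGHT": 44_621}
misidentified = {"IDA": 459_247, "BYTEWEIGHT": 43_992}


def reduction_factor(counts, baseline="IDA", tool="BYTEWEIGHT"):
    """Return how many times fewer errors `tool` made than `baseline`.

    Hypothetical helper for illustration; it simply divides the two counts.
    """
    return counts[baseline] / counts[tool]


missed_factor = reduction_factor(missed)
misidentified_factor = reduction_factor(misidentified)

print(f"missed functions: {missed_factor:.2f}x fewer than IDA")
print(f"misidentified functions: {misidentified_factor:.2f}x fewer than IDA")
```

On these figures, BYTEWEIGHT missed roughly six times fewer functions than IDA and misidentified roughly ten times fewer.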

Original language: English (US)
Title of host publication: Proceedings of the 23rd USENIX Security Symposium
Publisher: USENIX Association
Pages: 845-860
Number of pages: 16
ISBN (Electronic): 9781931971157
State: Published - Jan 1 2014
Event: 23rd USENIX Security Symposium - San Diego, United States
Duration: Aug 20 2014 to Aug 22 2014

Publication series

Name: Proceedings of the 23rd USENIX Security Symposium

Conference

Conference: 23rd USENIX Security Symposium
Country: United States
City: San Diego
Period: 8/20/14 to 8/22/14

Fingerprint

Binary codes
Reverse engineering
Flow control

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Safety, Risk, Reliability and Quality

Cite this

Bao, T., Burket, J., Woo, M., Turner, R., & Brumley, D. (2014). BYTEWEIGHT: Learning to recognize functions in binary code. In Proceedings of the 23rd USENIX Security Symposium (pp. 845-860). (Proceedings of the 23rd USENIX Security Symposium). USENIX Association.

BYTEWEIGHT: Learning to recognize functions in binary code. / Bao, Tiffany; Burket, Jonathan; Woo, Maverick; Turner, Rafael; Brumley, David.

Proceedings of the 23rd USENIX Security Symposium. USENIX Association, 2014. p. 845-860 (Proceedings of the 23rd USENIX Security Symposium).

Bao, T, Burket, J, Woo, M, Turner, R & Brumley, D 2014, BYTEWEIGHT: Learning to recognize functions in binary code. in Proceedings of the 23rd USENIX Security Symposium. Proceedings of the 23rd USENIX Security Symposium, USENIX Association, pp. 845-860, 23rd USENIX Security Symposium, San Diego, United States, 8/20/14.
Bao T, Burket J, Woo M, Turner R, Brumley D. BYTEWEIGHT: Learning to recognize functions in binary code. In Proceedings of the 23rd USENIX Security Symposium. USENIX Association. 2014. p. 845-860. (Proceedings of the 23rd USENIX Security Symposium).
Bao, Tiffany; Burket, Jonathan; Woo, Maverick; Turner, Rafael; Brumley, David. / BYTEWEIGHT: Learning to recognize functions in binary code. Proceedings of the 23rd USENIX Security Symposium. USENIX Association, 2014. pp. 845-860 (Proceedings of the 23rd USENIX Security Symposium).
@inproceedings{66f15b6d944541eb828dec4dea0e204a,
title = "{BYTEWEIGHT}: Learning to recognize functions in binary code",
abstract = "Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, with four different optimization levels, and across two different operating systems. In our experiments with 2,200 binaries, we found that BYTEWEIGHT missed 44,621 functions in comparison with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.",
author = "Tiffany Bao and Jonathan Burket and Maverick Woo and Rafael Turner and David Brumley",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
series = "Proceedings of the 23rd USENIX Security Symposium",
publisher = "USENIX Association",
pages = "845--860",
booktitle = "Proceedings of the 23rd USENIX Security Symposium",

}

TY - GEN

T1 - BYTEWEIGHT

T2 - Learning to recognize functions in binary code

AU - Bao, Tiffany

AU - Burket, Jonathan

AU - Woo, Maverick

AU - Turner, Rafael

AU - Brumley, David

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, with four different optimization levels, and across two different operating systems. In our experiments with 2,200 binaries, we found that BYTEWEIGHT missed 44,621 functions in comparison with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.

UR - http://www.scopus.com/inward/record.url?scp=85076265022&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076265022&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85076265022

T3 - Proceedings of the 23rd USENIX Security Symposium

SP - 845

EP - 860

BT - Proceedings of the 23rd USENIX Security Symposium

PB - USENIX Association

ER -