TY - GEN
T1 - Detecting and characterizing web bot traffic in a large e-commerce marketplace
AU - Xu, Haitao
AU - Li, Zhao
AU - Chu, Chen
AU - Chen, Yuanmi
AU - Yang, Yifan
AU - Lu, Haifeng
AU - Wang, Haining
AU - Stavrou, Angelos
N1 - Funding Information:
Acknowledgement. We would like to thank the anonymous reviewers for their valuable feedback. This work was partially supported by the U.S. NSF grant CNS-1618117 and DARPA XD3 Project HR0011-16-C-0055.
Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - A certain amount of web traffic is attributed to web bots on the Internet. Web bot traffic has raised serious concerns among website operators, because they usually consume considerable resources at web servers, resulting in high workloads and longer response time, while not bringing in any profit. Even worse, the content of the pages it crawled might later be used for other fraudulent activities. Thus, it is important to detect web bot traffic and characterize it. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis on the characteristics of web bot traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient based decision tree to calculate the likelihood of being a bot IP, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied on Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns of web bots different from normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection for protecting valuable web contents.
AB - A certain amount of web traffic is attributed to web bots on the Internet. Web bot traffic has raised serious concerns among website operators, because they usually consume considerable resources at web servers, resulting in high workloads and longer response time, while not bringing in any profit. Even worse, the content of the pages it crawled might later be used for other fraudulent activities. Thus, it is important to detect web bot traffic and characterize it. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis on the characteristics of web bot traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient based decision tree to calculate the likelihood of being a bot IP, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied on Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns of web bots different from normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection for protecting valuable web contents.
UR - http://www.scopus.com/inward/record.url?scp=85051846639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051846639&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-98989-1_8
DO - 10.1007/978-3-319-98989-1_8
M3 - Conference contribution
AN - SCOPUS:85051846639
SN - 9783319989884
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 143
EP - 163
BT - Computer Security - 23rd European Symposium on Research in Computer Security, ESORICS 2018, Proceedings
A2 - Zhou, Jianying
A2 - Soriano, Miguel
A2 - Lopez, Javier
PB - Springer Verlag
T2 - 23rd European Symposium on Research in Computer Security, ESORICS 2018
Y2 - 3 September 2018 through 7 September 2018
ER -