Detecting and characterizing web bot traffic in a large e-commerce marketplace

Haitao Xu, Zhao Li, Chen Chu, Yuanmi Chen, Yifan Yang, Haifeng Lu, Haining Wang, Angelos Stavrou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

A certain amount of web traffic on the Internet is attributed to web bots. Web bot traffic has raised serious concerns among website operators because it usually consumes considerable resources at web servers, resulting in high workloads and longer response times, while not bringing in any profit. Even worse, the content of the crawled pages might later be used for fraudulent activities. Thus, it is important to detect and characterize web bot traffic. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of its characteristics. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient-based decision tree to calculate the likelihood of being a bot IP, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied on the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns that distinguish web bots from normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection and protect valuable web content.
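
The abstract does not say how the threshold in module (3) is estimated. As a rough illustration only, the sketch below shows one plausible reading: given per-IP bot-likelihood scores (such as a tree model from module (2) might produce) and a small set of IPs believed to be human, pick the smallest cutoff that flags no more than a chosen share of those known-human IPs. The function names, the `max_human_flag_rate` parameter, and the sample IPs and scores are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, Iterable, List


def estimate_threshold(scores: Dict[str, float],
                       known_human_ips: Iterable[str],
                       max_human_flag_rate: float = 0.01) -> float:
    """Smallest cutoff such that at most the allowed share of
    known-human IPs has a score strictly above it (hypothetical rule)."""
    if not 0.0 <= max_human_flag_rate < 1.0:
        raise ValueError("max_human_flag_rate must be in [0, 1)")
    human_scores = sorted(scores[ip] for ip in known_human_ips if ip in scores)
    if not human_scores:
        raise ValueError("need at least one scored known-human IP")
    # How many known-human IPs we tolerate flagging as bots.
    allowed = int(max_human_flag_rate * len(human_scores))
    # IPs are flagged when score > cutoff, so the cutoff sits at the
    # (allowed + 1)-th highest known-human score.
    return human_scores[len(human_scores) - allowed - 1]


def flag_bots(scores: Dict[str, float], cutoff: float) -> List[str]:
    """Return the IPs whose bot-likelihood score exceeds the cutoff."""
    return sorted(ip for ip, s in scores.items() if s > cutoff)


if __name__ == "__main__":
    # Made-up scores for illustration; real scores would come from the model.
    demo_scores = {"10.0.0.1": 0.02, "10.0.0.2": 0.10, "10.0.0.3": 0.35,
                   "10.0.0.4": 0.91, "10.0.0.5": 0.97, "10.0.0.6": 0.40}
    demo_humans = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    cutoff = estimate_threshold(demo_scores, demo_humans, 0.0)
    print(cutoff, flag_bots(demo_scores, cutoff))
```

Tying the cutoff to known-human scores puts a direct bound on how much legitimate traffic is misclassified, which matches the stated aim of recovering non-bot traffic; the paper itself may use a different criterion.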

Original language: English (US)
Title of host publication: Computer Security - 23rd European Symposium on Research in Computer Security, ESORICS 2018, Proceedings
Editors: Jianying Zhou, Miguel Soriano, Javier Lopez
Publisher: Springer Verlag
Pages: 143-163
Number of pages: 21
ISBN (Print): 9783319989884
DOIs: https://doi.org/10.1007/978-3-319-98989-1_8
State: Published - Jan 1 2018
Event: 23rd European Symposium on Research in Computer Security, ESORICS 2018 - Barcelona, Spain
Duration: Sep 3 2018 to Sep 7 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11099 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd European Symposium on Research in Computer Security, ESORICS 2018
Country: Spain
City: Barcelona
Period: 9/3/18 to 9/7/18

Fingerprint

Electronic Commerce
World Wide Web
Websites
Traffic
Response time (computer systems)
Decision trees
Feature extraction
Profitability
Servers
Internet
Expectation Maximization
Web Server
Traffic Flow
Comparative Analysis
Decision tree
Feature Selection
Response Time
Workload
Profit
Likelihood

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Xu, H., Li, Z., Chu, C., Chen, Y., Yang, Y., Lu, H., ... Stavrou, A. (2018). Detecting and characterizing web bot traffic in a large e-commerce marketplace. In J. Zhou, M. Soriano, & J. Lopez (Eds.), Computer Security - 23rd European Symposium on Research in Computer Security, ESORICS 2018, Proceedings (pp. 143-163). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11099 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-98989-1_8

@inproceedings{1ee426975c3d46d2ba6ef5c2d76384c5,
title = "Detecting and characterizing web bot traffic in a large e-commerce marketplace",
author = "Haitao Xu and Zhao Li and Chen Chu and Yuanmi Chen and Yifan Yang and Haifeng Lu and Haining Wang and Angelos Stavrou",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-319-98989-1_8",
language = "English (US)",
isbn = "9783319989884",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "143--163",
editor = "Jianying Zhou and Miguel Soriano and Javier Lopez",
booktitle = "Computer Security - 23rd European Symposium on Research in Computer Security, ESORICS 2018, Proceedings",

}

TY - GEN

T1 - Detecting and characterizing web bot traffic in a large e-commerce marketplace

AU - Xu, Haitao

AU - Li, Zhao

AU - Chu, Chen

AU - Chen, Yuanmi

AU - Yang, Yifan

AU - Lu, Haifeng

AU - Wang, Haining

AU - Stavrou, Angelos

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85051846639&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051846639&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-98989-1_8

DO - 10.1007/978-3-319-98989-1_8

M3 - Conference contribution

AN - SCOPUS:85051846639

SN - 9783319989884

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 143

EP - 163

BT - Computer Security - 23rd European Symposium on Research in Computer Security, ESORICS 2018, Proceedings

A2 - Zhou, Jianying

A2 - Soriano, Miguel

A2 - Lopez, Javier

PB - Springer Verlag

ER -