Preprocessing uncertain user profile data

Inferring user's actual age from ages of the user's neighbors

Sung Hyuk Park, Sang Han, Soon Young Huh, Hojin Lee

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

7 Citations (Scopus)

Abstract

User profile data (for example, age and sex) is usually self-reported, so it is prone to human error and bias. For example, a user may be reluctant to give a company private information such as his or her actual age at subscription, and may therefore leave the age field blank or enter a random number to avoid unwanted privacy intrusion. However, inaccurate or uncertain user profile data undermines the integrity of a company's marketing and operational intelligence: targeting customers based on uncertain profile data is less effective than targeting them based on accurate profile data. Companies therefore preprocess user profile data as part of an effort to keep it accurate. This paper presents a study of preprocessing uncertain user profile data with a proposed simple collaborative learning algorithm. We demonstrate that a user's accurate profile information can be inferred from the profile information of the user's social network neighbors. In particular, we address how a communication service company can verify whether a user's reported age is true. We implement the algorithm using mobile network data. The dataset contains anonymized user data from a large Korean mobile company, covering 174,071 users' demographic profiles and their communication histories. To construct a mobile social network among users, we collect 3G voice call histories including 561,787 unique call receivers who belong to the same service carrier. Results show that the prediction accuracy of the proposed method on voice network data is 97%, far higher than 53%, the best accuracy among competing methods, indicating that our method effectively detects users whose self-reported age differs greatly from their actual age.
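The abstract does not spell out the collaborative learning algorithm itself. As a rough illustration of the general idea (estimating a user's age from the ages of the people they call, then flagging large discrepancies), the following is a minimal Python sketch under assumptions of its own: the call graph is treated as undirected, a user's estimate is the median of neighbors' reported ages, and a user is flagged when the reported age is more than a tolerance away from that estimate. The function names, the median rule, and the `tolerance` and `min_neighbors` defaults are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import median


def build_call_graph(call_records):
    """Build an undirected neighbor map from (caller, receiver) pairs.

    Self-calls are skipped; repeated calls between the same pair
    collapse into a single edge in this simplified sketch.
    """
    neighbors = defaultdict(set)
    for caller, receiver in call_records:
        if caller == receiver:
            continue
        neighbors[caller].add(receiver)
        neighbors[receiver].add(caller)
    return neighbors


def infer_age(user, neighbors, reported_age, min_neighbors=3):
    """Estimate a user's age as the median reported age of their neighbors.

    Neighbors without a reported age (e.g. call receivers with no profile)
    are ignored. Returns None when too few neighbors have a known age.
    """
    ages = [reported_age[n] for n in neighbors.get(user, ()) if n in reported_age]
    if len(ages) < min_neighbors:
        return None
    return median(ages)


def flag_suspicious(neighbors, reported_age, tolerance=10, min_neighbors=3):
    """Return (user, reported, estimated) for users whose reported age is
    more than `tolerance` years away from the neighbor-based estimate."""
    flagged = []
    for user, age in reported_age.items():
        estimate = infer_age(user, neighbors, reported_age, min_neighbors)
        if estimate is not None and abs(age - estimate) > tolerance:
            flagged.append((user, age, estimate))
    return flagged


if __name__ == "__main__":
    # Toy data (invented for illustration): user "u" reports 70,
    # but everyone they call reports an age around 30.
    calls = [
        ("u", "n1"), ("u", "n2"), ("u", "n3"), ("u", "n4"),
        ("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4"),
    ]
    ages = {"u": 70, "n1": 30, "n2": 32, "n3": 28, "n4": 31}
    graph = build_call_graph(calls)
    print(flag_suspicious(graph, ages))  # prints [('u', 70, 30.5)]
```

The median is used here rather than the mean so that a single outlying neighbor does not drag the estimate: in the toy data, user `n1`'s neighbors report 70, 32 and 28, so the mean (about 43.3) would put `n1` more than 10 years from its reported 30 and flag it falsely, while the median (32) does not.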

Original language: English (US)
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 1619-1624
Number of pages: 6
DOI: https://doi.org/10.1109/ICDE.2009.154
State: Published - 2009
Externally published: Yes
Event: 25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration: Mar 29 2009 to Apr 2 2009


Fingerprint

Industry
Learning algorithms
Communication
Marketing
Wireless networks

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Park, S. H., Han, S., Huh, S. Y., & Lee, H. (2009). Preprocessing uncertain user profile data: Inferring user's actual age from ages of the user's neighbors. In Proceedings - International Conference on Data Engineering (pp. 1619-1624). [4812584] https://doi.org/10.1109/ICDE.2009.154

@inproceedings{ffe453b6e89b4de6b3cef0b0f69d0360,
title = "Preprocessing uncertain user profile data: Inferring user's actual age from ages of the user's neighbors",
author = "Park, {Sung Hyuk} and Sang Han and Huh, {Soon Young} and Hojin Lee",
year = "2009",
doi = "10.1109/ICDE.2009.154",
language = "English (US)",
isbn = "9780769535456",
pages = "1619--1624",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN

T1 - Preprocessing uncertain user profile data

T2 - Inferring user's actual age from ages of the user's neighbors

AU - Park, Sung Hyuk

AU - Han, Sang

AU - Huh, Soon Young

AU - Lee, Hojin

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=67649663874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67649663874&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2009.154

DO - 10.1109/ICDE.2009.154

M3 - Conference contribution

SN - 9780769535456

SP - 1619

EP - 1624

BT - Proceedings - International Conference on Data Engineering

ER -