TY - GEN
T1 - Preprocessing uncertain user profile data
T2 - 25th IEEE International Conference on Data Engineering, ICDE 2009
AU - Park, Sung Hyuk
AU - Han, Sang Pil
AU - Huh, Soon Young
AU - Lee, Hojin
PY - 2009/7/8
Y1 - 2009/7/8
N2 - User profile data (for example, age and sex) is usually self-reported by users, so it is prone to human errors or biases. For example, a user can be reluctant to provide a company with private information such as his/her actual age upon subscription, so the user either does not fill in the age column or puts in some random numbers to avoid unwanted privacy intrusion. However, inaccurate or uncertain user profile data undermines the integrity of a company's marketing or operational intelligence. Targeting customers based on uncertain user profile data will not be as effective as targeting customers based on accurate user profile data. Thus, companies perform preprocessing on user profile data as part of an effort to maintain the accuracy of their user profile data. This paper presents a study of preprocessing uncertain user profile data based on a proposed simple collaborative learning algorithm. We demonstrate that a user's accurate profile information can be inferred from the profile information of the user's social network neighbors. In particular, we address the issue of how a communication service company can verify whether a user's reported age is true or not. We implement a simple collaborative learning algorithm using mobile network data. The dataset contains anonymized user data from a large Korean mobile company, capturing 174,071 users' demographic profiles and their communication histories. To construct a mobile social network among users, we collect 3G voice call histories including 561,787 unique call receivers who belong to the same service carrier. Results reveal that the prediction accuracy of the proposed method based on voice network data is 97%, which is very high compared to 53%, the best accuracy among competing methods, and indicate that our method effectively detects users with a great discrepancy between self-reported age and actual age.
AB - User profile data (for example, age and sex) is usually self-reported by users, so it is prone to human errors or biases. For example, a user can be reluctant to provide a company with private information such as his/her actual age upon subscription, so the user either does not fill in the age column or puts in some random numbers to avoid unwanted privacy intrusion. However, inaccurate or uncertain user profile data undermines the integrity of a company's marketing or operational intelligence. Targeting customers based on uncertain user profile data will not be as effective as targeting customers based on accurate user profile data. Thus, companies perform preprocessing on user profile data as part of an effort to maintain the accuracy of their user profile data. This paper presents a study of preprocessing uncertain user profile data based on a proposed simple collaborative learning algorithm. We demonstrate that a user's accurate profile information can be inferred from the profile information of the user's social network neighbors. In particular, we address the issue of how a communication service company can verify whether a user's reported age is true or not. We implement a simple collaborative learning algorithm using mobile network data. The dataset contains anonymized user data from a large Korean mobile company, capturing 174,071 users' demographic profiles and their communication histories. To construct a mobile social network among users, we collect 3G voice call histories including 561,787 unique call receivers who belong to the same service carrier. Results reveal that the prediction accuracy of the proposed method based on voice network data is 97%, which is very high compared to 53%, the best accuracy among competing methods, and indicate that our method effectively detects users with a great discrepancy between self-reported age and actual age.
UR - http://www.scopus.com/inward/record.url?scp=67649663874&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649663874&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2009.154
DO - 10.1109/ICDE.2009.154
M3 - Conference contribution
AN - SCOPUS:67649663874
SN - 9780769535456
T3 - Proceedings - International Conference on Data Engineering
SP - 1619
EP - 1624
BT - Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
Y2 - 29 March 2009 through 2 April 2009
ER -