TY - JOUR
T1 - Location Prediction for Tweets
AU - Huang, Chieh Yang
AU - Tong, Hanghang
AU - He, Jingrui
AU - Maciejewski, Ross
N1 - Funding Information:
This work is supported by NSF (IIS-1651203 and IIS-1715385), ARO (W911NF-16-1-0168), and DHS (2017-ST-061-QA0001).
Publisher Copyright:
Copyright © 2019 Huang, Tong, He and Maciejewski.
PY - 2019/5/24
Y1 - 2019/5/24
N2 - Geographic information provides important insight for many data mining and social media systems. However, users are reluctant to provide such information due to concerns such as inconvenience and privacy. In this paper, we aim to develop a deep-learning-based solution to predict geographic information for tweets. Current approaches have two major limitations: (a) they struggle to model long-term information, and (b) it is hard to explain to end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve results on informal language, we treat subwords as features in our model. Lastly, the model is trained jointly on city and country labels to incorporate the information coming from both. Experiments on the W-NUT 2016 Geo-tagging shared task show that our proposed model is competitive with state-of-the-art systems in accuracy while achieving a better distance measure than existing approaches.
AB - Geographic information provides important insight for many data mining and social media systems. However, users are reluctant to provide such information due to concerns such as inconvenience and privacy. In this paper, we aim to develop a deep-learning-based solution to predict geographic information for tweets. Current approaches have two major limitations: (a) they struggle to model long-term information, and (b) it is hard to explain to end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve results on informal language, we treat subwords as features in our model. Lastly, the model is trained jointly on city and country labels to incorporate the information coming from both. Experiments on the W-NUT 2016 Geo-tagging shared task show that our proposed model is competitive with state-of-the-art systems in accuracy while achieving a better distance measure than existing approaches.
KW - data mining
KW - deep learning
KW - joint training
KW - location prediction
KW - multi-head self-attention mechanism
KW - tweets
UR - http://www.scopus.com/inward/record.url?scp=85118723747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118723747&partnerID=8YFLogxK
U2 - 10.3389/fdata.2019.00005
DO - 10.3389/fdata.2019.00005
M3 - Article
AN - SCOPUS:85118723747
SN - 2624-909X
VL - 2
JO - Frontiers in Big Data
JF - Frontiers in Big Data
M1 - 5
ER -