Location Prediction for Tweets

Chieh Yang Huang; Hanghang Tong; Jingrui He; Ross Maciejewski

doi:10.3389/fdata.2019.00005

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Geographic information provides important insight for many data mining and social media systems. However, users are often reluctant to provide such information because of concerns such as inconvenience and privacy. In this paper, we develop a deep learning based solution to predict geographic information for tweets. Current approaches have two major limitations: (a) they struggle to model long-term information, and (b) it is hard to explain to end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve results on informal language, we treat subwords as a feature in our model. Lastly, the model is trained jointly on city and country labels to incorporate the information coming from different labels. Experiments on the W-NUT 2016 Geo-tagging shared task show that our proposed model is competitive with state-of-the-art systems in terms of accuracy while achieving a better distance measure than existing approaches.

Original language	English (US)
Article number	5
Journal	Frontiers in Big Data
Volume	2
DOIs	https://doi.org/10.3389/fdata.2019.00005
State	Published - May 24 2019

Keywords

data mining
deep learning
joint training
location prediction
multi-head self-attention mechanism
tweets

ASJC Scopus subject areas

Computer Science (miscellaneous)
Information Systems
Artificial Intelligence

Access to Document

10.3389/fdata.2019.00005

Cite this

@article{d5289bc163ef4c1a9b756ad0edb858a1,
  title = "Location Prediction for Tweets",
  abstract = "Geographic information provides an important insight into many data mining and social media systems. However, users are reluctant to provide such information due to various concerns, such as inconvenience, privacy, etc. In this paper, we aim to develop a deep learning based solution to predict geographic information for tweets. The current approaches bear two major limitations, including (a) hard to model the long term information and (b) hard to explain to the end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve the result on informal language, we treat subword as a feature in our model. Lastly, the model is trained jointly with the city and country to incorporate the information coming from different labels. The experiment performed on W-NUT 2016 Geo-tagging shared task shows our proposed model is competitive with the state-of-the-art systems when using accuracy measurement, and in the meanwhile, leading to a better distance measure over the existing approaches.",
  keywords = "data mining, deep learning, joint training, location prediction, multi-head self-attention mechanism, tweets",
  author = "Huang, Chieh Yang and Tong, Hanghang and He, Jingrui and Maciejewski, Ross",
  note = "Funding Information: This work is supported by NSF (IIS-1651203 and IIS-1715385), ARO (W911NF-16-1-0168), and DHS (2017-ST-061-QA0001). Publisher Copyright: Copyright {\textcopyright} 2019 Huang, Tong, He and Maciejewski.",
  year = "2019",
  month = may,
  day = "24",
  doi = "10.3389/fdata.2019.00005",
  language = "English (US)",
  volume = "2",
  journal = "Frontiers in Big Data",
  issn = "2624-909X",
  publisher = "Frontiers Media S. A.",
}

TY  - JOUR
T1  - Location Prediction for Tweets
AU  - Huang, Chieh Yang
AU  - Tong, Hanghang
AU  - He, Jingrui
AU  - Maciejewski, Ross
N1  - Funding Information: This work is supported by NSF (IIS-1651203 and IIS-1715385), ARO (W911NF-16-1-0168), and DHS (2017-ST-061-QA0001). Publisher Copyright: Copyright © 2019 Huang, Tong, He and Maciejewski.
PY  - 2019/5/24
Y1  - 2019/5/24
N2  - Geographic information provides an important insight into many data mining and social media systems. However, users are reluctant to provide such information due to various concerns, such as inconvenience, privacy, etc. In this paper, we aim to develop a deep learning based solution to predict geographic information for tweets. The current approaches bear two major limitations, including (a) hard to model the long term information and (b) hard to explain to the end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve the result on informal language, we treat subword as a feature in our model. Lastly, the model is trained jointly with the city and country to incorporate the information coming from different labels. The experiment performed on W-NUT 2016 Geo-tagging shared task shows our proposed model is competitive with the state-of-the-art systems when using accuracy measurement, and in the meanwhile, leading to a better distance measure over the existing approaches.
AB  - Geographic information provides an important insight into many data mining and social media systems. However, users are reluctant to provide such information due to various concerns, such as inconvenience, privacy, etc. In this paper, we aim to develop a deep learning based solution to predict geographic information for tweets. The current approaches bear two major limitations, including (a) hard to model the long term information and (b) hard to explain to the end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve the result on informal language, we treat subword as a feature in our model. Lastly, the model is trained jointly with the city and country to incorporate the information coming from different labels. The experiment performed on W-NUT 2016 Geo-tagging shared task shows our proposed model is competitive with the state-of-the-art systems when using accuracy measurement, and in the meanwhile, leading to a better distance measure over the existing approaches.
KW  - data mining
KW  - deep learning
KW  - joint training
KW  - location prediction
KW  - multi-head self-attention mechanism
KW  - tweets
UR  - http://www.scopus.com/inward/record.url?scp=85118723747&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85118723747&partnerID=8YFLogxK
U2  - 10.3389/fdata.2019.00005
DO  - 10.3389/fdata.2019.00005
M3  - Article
AN  - SCOPUS:85118723747
SN  - 2624-909X
VL  - 2
JO  - Frontiers in Big Data
JF  - Frontiers in Big Data
M1  - 5
ER  -
