TY - GEN
T1 - Geospatial data mining on the web
T2 - 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
AU - Li, WenWen
AU - Goodchild, Michael
AU - Church, Richard L.
AU - Zhou, Bin
PY - 2012/12/1
Y1 - 2012/12/1
AB - Identifying location-based information from the WWW, such as street addresses of emergency service facilities, has become increasingly popular. However, current Web-mining tools such as Google's crawler are designed to index webpages on the Internet rather than to treat location information at a finer granularity as an indexable object. This always leads to low recall of the search results. To retrieve location-based information from the ever-expanding Internet, where Web data are largely unstructured, there is a need for an effective Web-mining mechanism that can extract the desired spatial data from the right webpages within the right scope. In this paper, we report our efforts towards automated location-information retrieval by developing a knowledge-based Web-mining tool, CyberMiner, that adopts (1) a geospatial taxonomy to determine the starting URLs and domains for spatial Web mining, (2) a rule-based forward and backward screening algorithm for efficient address extraction, and (3) inductive-learning-based semantic analysis to discover patterns of street addresses of interest. The retrieval of the locations of all fire stations within Los Angeles County, California, is used as a case study.
KW - Emergency service facilities
KW - Inductive learning
KW - Information extraction
KW - Information retrieval
KW - Location-based services
KW - Ontology
KW - Web data mining
UR - http://www.scopus.com/inward/record.url?scp=84872710346&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872710346&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-35527-1_46
DO - 10.1007/978-3-642-35527-1_46
M3 - Conference contribution
AN - SCOPUS:84872710346
SN - 9783642355264
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 552
EP - 563
BT - Advanced Data Mining and Applications - 8th International Conference, ADMA 2012, Proceedings
Y2 - 15 December 2012 through 18 December 2012
ER -