Similarity join for big geographic data

Yasin Silva, Jason M. Reed, Lisa M. Tsosie, Timothy A. Matti

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Similarity Join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold ε. Multiple application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer’s town. MapReduce, a framework for processing very large datasets using large computer clusters, constitutes an answer to the requirements of processing massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.
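
As a rough illustration of the operation the abstract describes, the sketch below joins a list of 2-D points with itself and returns every pair closer than the threshold. It is not the chapter's algorithm: the function names (naive_join, map_point, reduce_cell, grid_join), the planar Euclidean distance, and the grid partitioning with cell side equal to the threshold are all assumptions made for this example, and the map, shuffle, and reduce steps are simulated in a single process rather than run on a MapReduce cluster.

```python
from collections import defaultdict
from itertools import combinations, product
from math import dist, floor


def naive_join(points, eps):
    """Baseline: compare every pair and keep those closer than eps."""
    return {
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) < eps
    }


def map_point(idx, point, eps):
    """Map step: emit the point for its home grid cell and the 8 neighbouring cells."""
    cx, cy = floor(point[0] / eps), floor(point[1] / eps)
    for dx, dy in product((-1, 0, 1), repeat=2):
        is_home = dx == 0 and dy == 0
        yield (cx + dx, cy + dy), (idx, point, is_home)


def reduce_cell(records, eps):
    """Reduce step: join the cell's home points against every record in the cell."""
    for i, p, p_is_home in records:
        if not p_is_home:
            continue
        for j, q, _ in records:
            # Requiring i < j means each qualifying pair is emitted by one cell only.
            if i < j and dist(p, q) < eps:
                yield (i, j)


def grid_join(points, eps):
    """Simulate map -> shuffle (group by cell) -> reduce in one process."""
    groups = defaultdict(list)
    for idx, point in enumerate(points):
        for cell, record in map_point(idx, point, eps):
            groups[cell].append(record)
    return {pair for records in groups.values() for pair in reduce_cell(records, eps)}
```

Because the cell side equals the threshold, two points closer than the threshold always fall in the same or adjacent cells, so each cell can be processed independently, which is the property a parallel framework would exploit. For example, grid_join([(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)], 1.0) returns {(0, 1)}. Real geographic coordinates would likely need a geodesic distance instead of the planar one assumed here.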

Original language: English (US)
Title of host publication: Geographical Information Systems
Subtitle of host publication: Trends and Technologies
Publisher: CRC Press
Pages: 20-49
Number of pages: 30
ISBN (Electronic): 9781466596955
ISBN (Print): 9781466596931
DOI: https://doi.org/10.1201/b16871
State: Published - Jan 1 2014

ASJC Scopus subject areas

  • Computer Science (all)
  • Earth and Planetary Sciences (all)
  • Engineering (all)

Cite this

Silva, Y., Reed, J. M., Tsosie, L. M., & Matti, T. A. (2014). Similarity join for big geographic data. In Geographical Information Systems: Trends and Technologies (pp. 20-49). CRC Press. https://doi.org/10.1201/b16871

@inbook{64f13ff369d14a62a10bc24337b848e3,
title = "Similarity join for big geographic data",
abstract = "Similarity Join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold ε. Multiple application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer’s town. MapReduce, a framework for processing very large datasets using large computer clusters, constitutes an answer to the requirements of processing massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.",
author = "Yasin Silva and Reed, {Jason M.} and Tsosie, {Lisa M.} and Matti, {Timothy A.}",
year = "2014",
month = "1",
day = "1",
doi = "10.1201/b16871",
language = "English (US)",
isbn = "9781466596931",
pages = "20--49",
booktitle = "Geographical Information Systems",
publisher = "CRC Press",

}

TY - CHAP

T1 - Similarity join for big geographic data

AU - Silva, Yasin

AU - Reed, Jason M.

AU - Tsosie, Lisa M.

AU - Matti, Timothy A.

PY - 2014/1/1

Y1 - 2014/1/1

AB - Similarity Join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold ε. Multiple application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer’s town. MapReduce, a framework for processing very large datasets using large computer clusters, constitutes an answer to the requirements of processing massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.

UR - http://www.scopus.com/inward/record.url?scp=85054744067&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054744067&partnerID=8YFLogxK

U2 - 10.1201/b16871

DO - 10.1201/b16871

M3 - Chapter

SN - 9781466596931

SP - 20

EP - 49

BT - Geographical Information Systems

PB - CRC Press

ER -