SocioMap: A tool for exploring, translating, and merging across complex sociopolitical categories

Project: Research project

Project Details


SocioMap: A tool for exploring, linking, and merging complex, evolving sociopolitical categories

Project Summary. Anthropologists and other social scientists are increasingly leveraging complex, multi-scale data from diverse, worldwide sources to understand the causes and consequences of social and cultural variability, social stratification, migration, economic development, and violent conflict. This work frequently requires merging data across multiple datasets by fine-grained sociopolitical categories (e.g., ethnicities, cultures, languages, religions, or provinces). However, different datasets often encode corresponding sociopolitical categories in disparate formats and at different scales (e.g., Guatemala Indigenous vs. Maya vs. K'iche'). These diverse encodings must be translated across datasets before merging. Within one country, building, documenting and sharing these translations across datasets is cumbersome but tractable. At global scales across thousands of finer-grained entities, the combinatorial complexity creates thorny challenges for manual reconciliation and for transparent documentation and sharing of researcher decisions. This challenge is exacerbated because many sociopolitical categories are defined by multiple contextual factors, such as shared language, religion, and population history.
This project proposes to build and disseminate a beta version of SocioMap, a user-friendly set of tools to help translate sociopolitical categories and classification schemes across multiple, external datasets. This project will focus on four kinds of categories (ethnicities, languages, religions, and subdistricts) that are commonly used in existing analyses. The beta version will be populated with a critical mass of these categories (> 10K ethnicities and religions, > 10K languages and dialects, and > 40K subdistricts) and translations between these categories across dozens of common standards and hundreds of demographic surveys and censuses worldwide. SocioMap's tools will help users: (1) explore contextual information about specific sociopolitical categories, (2) translate and share new classification schemes from datasets, standards, and published studies, and (3) merge novel combinations of datasets for researchers' custom research needs. For the last function, SocioMap would automatically generate syntax (e.g., R, SPSS) to merge datasets of interest. Crucially, SocioMap does not store observational data. Rather, it is an interactive dictionary of keys to help users merge observational data from diverse external datasets. Thus, SocioMap complements existing datasets storing observational data, such as HRAF and D-PLACE, by providing tools for linking these datasets with social, cultural, and demographic data from a diverse range of other datasets. SocioMap is designed to grow by permitting registered users to add new classification schemes for re-use in future projects. The project builds on our team's 7-year effort to integrate and translate data from multiple datasets by thousands of ethnicities, languages, religions and subdistricts, as well as prior funded work building and piloting a prototype system.
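The idea of a dictionary of keys that links datasets coded at different scales can be illustrated with a minimal Python sketch. It is not SocioMap's implementation: the hierarchy, the crosswalk entries, the dataset names (survey_a, census_b), the local labels, and the toy values are all hypothetical, chosen only to mirror the Guatemala example above. The sketch translates each dataset's local label to a shared category and then pairs rows at the finest category that contains both.

```python
# Hypothetical category hierarchy: child -> parent (None marks a root).
PARENT = {
    "K'iche'": "Maya",
    "Maya": "Guatemala Indigenous",
    "Guatemala Indigenous": None,
}

# Hypothetical crosswalk: (dataset name, local label) -> shared category.
CROSSWALK = {
    ("survey_a", "QUICHE"): "K'iche'",
    ("census_b", "maya"): "Maya",
}


def ancestors(category):
    """Return the category followed by all of its ancestors, finest first."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT.get(category)
    return chain


def common_category(cat_a, cat_b):
    """Return the finest category that contains both inputs, or None."""
    chain_b = set(ancestors(cat_b))
    for candidate in ancestors(cat_a):
        if candidate in chain_b:
            return candidate
    return None


def merge_rows(rows_a, rows_b, name_a, name_b):
    """Pair rows from two datasets whose labels share a category."""
    merged = []
    for row_a in rows_a:
        cat_a = CROSSWALK.get((name_a, row_a["group"]))
        for row_b in rows_b:
            cat_b = CROSSWALK.get((name_b, row_b["group"]))
            if cat_a is None or cat_b is None:
                continue  # this label has no translation yet
            shared = common_category(cat_a, cat_b)
            if shared is not None:
                merged.append({"category": shared, name_a: row_a, name_b: row_b})
    return merged


# Toy rows with made-up values, for illustration only.
rows_a = [{"group": "QUICHE", "score": 1}]
rows_b = [{"group": "maya", "count": 2}]
result = merge_rows(rows_a, rows_b, "survey_a", "census_b")
```

Here the survey row coded at the K'iche' level and the census row coded at the Maya level are paired under the shared category Maya, while the observational values stay in their original rows and only the keys are translated.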
Intellectual Merit: By facilitating efficient and transparent translation across a wide array of fine-grained sociopolitical categories, SocioMap will help users unlock, merge, and analyze a wider range of economic, social and cultural data from diverse datasets and conduct analyses at finer-grained scales than would be practically feasible without these tools. SocioMap's repository of documented categories and translations would also help users to design new analyses, find existing datasets with data for relevant specific categories, and compare results across studies, thereby encouraging new analyses of publicly funded data and re-assessments of the robustness and reproducibility of past findings. By storing the coding schemes created by registered users, SocioMap will also advance open access and transparency. SocioMap will also create a foundation for work with other entities (e.g., political parties, NGOs, industry and occupation classifications, and firms) and for reconciling sociopolitical categories over longer historical periods.
Broader Impacts: Infrastructure. SocioMap will both leverage existing data investments and stimulate future systematic data collection efforts at finer-grained population levels. Education. This project will train an anthropology graduate student in database management and code development in Cypher, R and JavaScript. It will also train over 20 undergraduates in using the beta version to assist in comparative data analysis. Public Policy. SocioMap's user-friendly interface will assist policymakers in analyzing population data at multiple scales to assess temporal changes in social, economic, and cultural indicators at fine grains and to analyze determinants of population disparities in these indicators.
Status: Not started
Effective start/end date: 6/1/21 to 5/31/23


  • National Science Foundation (NSF): $155,528.00