Maup

Topics

[AM-08-097] An Introduction to Spatial Data Mining

The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets (e.g., GPS trajectories of smartphones). Spatial data mining is societally important, with applications in public health, public safety, climate science, and other fields. For example, in epidemiology, spatial data mining helps to find areas with a high concentration of disease incidents to manage disease outbreaks. Computational methods are needed to discover spatial patterns since the volume and velocity of spatial data exceed the ability of human experts to analyze them. Spatial data have unique characteristics, such as spatial autocorrelation and spatial heterogeneity, which violate the i.i.d. (independent and identically distributed) assumption of traditional statistical and data mining methods. Therefore, traditional methods may miss patterns or yield spurious ones, and such errors are costly in societal applications. Further, there are additional challenges such as the MAUP (Modifiable Areal Unit Problem), as illustrated by a recent court case debating gerrymandering in elections. In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, collocation detection, spatial prediction, and spatial outlier detection. Hotspot detection methods use domain information to accurately model more active and high-density areas. Collocation detection methods find objects whose instances are in proximity to each other in a location. Spatial prediction approaches explicitly model the neighborhood relationship of locations to predict target variables from input features. Spatial outlier detection methods find data that differ from their neighbors. Lastly, we describe future research and trends in spatial data mining.
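
As a rough illustration of the spatial outlier idea in this abstract (data that differ from their neighbors), the sketch below compares each grid cell with the mean of its edge-sharing neighbors and flags cells whose standardized difference is large. The grid values, the rook neighborhood, the function names, and the threshold of 2.0 are all assumptions made for the example; the entry itself may describe other methods.

```python
import math

def rook_neighbors(r, c, nrows, ncols):
    """Cells sharing an edge with (r, c) that fall inside the grid."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < nrows and 0 <= j < ncols]

def spatial_outliers(grid, threshold=2.0):
    """Flag cells whose value differs strongly from the mean of their neighbors.

    The threshold is an assumed cutoff on the standardized difference.
    Assumes the differences are not all identical (non-zero spread).
    """
    nrows, ncols = len(grid), len(grid[0])
    diffs = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = rook_neighbors(r, c, nrows, ncols)
            nbr_mean = sum(grid[i][j] for i, j in nbrs) / len(nbrs)
            diffs[(r, c)] = grid[r][c] - nbr_mean
    mean = sum(diffs.values()) / len(diffs)
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs.values()) / len(diffs))
    return [cell for cell, d in diffs.items() if abs(d - mean) / sd > threshold]

# Hypothetical data: a smooth surface (value = row + column) with one planted anomaly.
grid = [[float(r + c) for c in range(5)] for r in range(5)]
grid[2][2] = 20.0

outliers = spatial_outliers(grid)
print(outliers)
```

Note that the planted cell would not stand out in a non-spatial test on a larger or rougher surface where values of 20 occur elsewhere; it is only unusual relative to its surroundings, which is what the neighborhood comparison captures.
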
[CV-03-005] Statistical Mapping (Enumeration, Normalization, Classification)

Proper communication of spatial distributions, trends, and patterns in data is an important component of a cartographer's work. Geospatial data is often large and complex, and due to inherent limitations of size, scalability, and sensitivity, cartographers are often required to work with data that is abstracted, aggregated, or simplified from its original form. Working with data in this manner serves to clarify cartographic messages, expedite design decisions, and assist in developing narratives, but it also introduces a degree of abstraction and subjectivity in the map that can make it easy to infer false messages from the data and ultimately can mislead map readers. This entry introduces the core topics of statistical mapping in cartography. First, we define enumeration and the aggregation of data to units of enumeration. Next, we introduce the importance of data normalization (or standardization) to more truthfully communicate cartographically and, lastly, discuss common methods of data classification and how cartographers bin data into groups that simplify communication.
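
The normalization and classification steps named above can be sketched in a few lines. The example below is only an illustration: the enumeration units, counts, and populations are invented, and equal-interval and quantile breaks are chosen here as two commonly used classification schemes, not necessarily the ones the entry emphasizes.

```python
# Hypothetical enumeration units: (name, case count, population).
units = [
    ("A", 120, 60000),
    ("B", 30, 5000),
    ("C", 200, 250000),
    ("D", 45, 15000),
    ("E", 10, 1000),
    ("F", 80, 40000),
]

def rate_per_1000(count, population):
    """Normalize a raw count by population to get a rate per 1,000 people."""
    return 1000.0 * count / population

def equal_interval_breaks(values, k):
    """Upper class limits for k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]

def quantile_breaks(values, k):
    """Upper class limits that place roughly equal numbers of units in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (n * i) // k - 1)] for i in range(1, k + 1)]

def classify(value, breaks):
    """Index of the first class whose upper limit is at least the value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # guard against floating-point rounding at the top limit

rates = {name: rate_per_1000(count, pop) for name, count, pop in units}
ei_breaks = equal_interval_breaks(list(rates.values()), 3)
q_breaks = quantile_breaks(list(rates.values()), 3)

highest_count = max(units, key=lambda u: u[1])[0]
highest_rate = max(rates, key=rates.get)

for name in rates:
    print(name, rates[name], classify(rates[name], ei_breaks), classify(rates[name], q_breaks))
```

In this made-up data, the unit with the most cases (C) has the lowest rate, while the smallest unit (E) has the highest, which is the kind of reversal that mapping raw counts would hide. The two classification schemes also assign some units to different classes, so the choice of scheme changes the visual message.
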
[FC-07-026] Problems of Scale and Zoning

Spatial data are often encoded within a set of spatial units that exhaustively partition a region, where individual-level data are aggregated, or continuous data are summarized, over a set of spatial units. Such is the case with census data aggregated to enumeration units for public dissemination. Partitioning schemes can vary by scale, where one partitioning scheme spatially nests within another, or by zoning, where two partitioning schemes have the same number of units but the unit shapes and boundaries differ. The Modifiable Areal Unit Problem (MAUP) refers to the fact that the nature of spatial partitioning can affect the interpretation and results of visualization and statistical analysis. Generally, coarser scales of data aggregation tend to have stronger observed statistical associations among variables. The ecological fallacy refers to the assumption that an individual has the same attributes as the aggregate group to which it belongs. Combining spatial data with different partitioning schemes to facilitate analysis is often problematic. Areal interpolation may be used to estimate data over small areas or ecological inference may be used to infer individual behaviors from aggregate data. Researchers may also perform analyses at multiple scales as a point of comparison.
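
The scale effect described above (stronger observed associations at coarser aggregation) can be reproduced with simulated data. The sketch below is a toy setup, not taken from the entry: individuals are placed along a one-dimensional transect, two variables share a weak spatial trend plus independent noise, and zones are formed from consecutive blocks of individuals. The sample size, noise level, zone sizes, and seed are all assumed values.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def aggregate(values, zone_size):
    """Mean of each consecutive block of zone_size values (one block = one zone)."""
    return [
        sum(values[i:i + zone_size]) / zone_size
        for i in range(0, len(values), zone_size)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
location = [i / n for i in range(n)]  # position along the transect, 0 to 1

# Both variables follow the same gentle spatial trend, buried in individual noise.
x = [s + rng.gauss(0.0, 1.0) for s in location]
y = [s + rng.gauss(0.0, 1.0) for s in location]

r_individual = pearson(x, y)
r_zones = {
    size: pearson(aggregate(x, size), aggregate(y, size))
    for size in (10, 100)
}

print("individual level:", round(r_individual, 3))
for size, r in r_zones.items():
    print(f"zones of {size}:", round(r, 3))
```

Averaging within zones cancels much of the individual noise while keeping the shared spatial trend, so the zone-level correlation is far higher than the individual-level one. Reading the zone-level figure as a statement about individuals would be an instance of the ecological fallacy.
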
[GS-02-020] Aggregation of Spatial Entities and Legislative Redistricting

The partitioning of space is an essential consideration for the efficient allocation of resources. In the United States and many other countries, this parcelization of sub-regions for political and legislative purposes results in what are referred to as districts. A district is an aggregation of smaller, spatially bound units, along with their statistical properties, into larger spatially bound units. When a district has the primary purpose of representation, individuals who reside within that district make up a constituency. Redistricting is often required as populations of constituents shift over time or as the resources that service areas change. Administrative challenges with creating districts have been greatly aided by increasing utilization of GIS. However, with these advances in geospatial methods, political disputes over the way in which districts are drawn increasingly snare the process in legal battles, often centered on the topic of gerrymandering. This chapter focuses on the redistricting process within the United States and how the aggregation of representative spatial entities presents a mix of political, technical, and legal challenges.
[GS-01-027] GIS&T for Equity and Social Justice

A geographic information system (GIS) can be used effectively for activities, programs, and analyses focused on equity and social justice (ESJ). Many types of inequities exist in society, but race and space are key predictors of inequity. A key concept of social justice is that any person born into society, no matter where they were born or live, will have an equitable opportunity to achieve successful life outcomes and to thrive. Geographic information science and its technologies (GIS&T) provide powerful tools to analyze equity and social justice issues and help government agencies apply an equity lens to every aspect of their administration. Given the reliance on spatial data to represent and analyze matters of ESJ, the use of these tools is necessary, logical, and appropriate. Some types of analyses and mapping commonly used with ESJ programs require careful attention to how data are combined and represented; without it, they risk misleading or false conclusions. Such outcomes could build mistrust when trust is most needed. A GIS-supported lifecycle for ESJ is presented that includes stages of exploratory issue analysis, community feedback, pro-equity programs analysis, management monitoring and stakeholder awareness, program performance metrics, and effectiveness analysis.
[AM-03-011] Spatial Statistics

Spatial statistics is dedicated to describing and modeling georeferenced data through the application of statistical theories and methods. Unlike conventional statistical approaches, which often assume independence among observations, spatial statistical techniques account for the locational aspects of observations in addition to their attributes. Modeling georeferenced data with conventional non-spatial statistical approaches can lead to biased and unreliable results. This article first discusses measurements of spatial arrangements including mean center and standard distance deviation. It then reviews statistical methods for the types of spatial data: point data, geostatistical data, and areal data. Following this, it examines Bayesian spatial models, which offer a flexible framework for incorporating spatial dependence. Finally, the article concludes with a discussion of ongoing challenges in spatial statistics, including potential limitations of area-unit-based observations, computational limitations, and issues related to data uncertainty.
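
The two descriptive measures mentioned above can be computed directly. The sketch below uses the usual unweighted definitions (the mean center is the average of the coordinates; the standard distance is the root mean squared distance of points from the mean center) on four invented points. It assumes planar coordinates and the function names are illustrative only.

```python
import math

def mean_center(points):
    """Average x and average y of a set of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

# Hypothetical planar coordinates: the corners of a 2 x 2 square.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

center = mean_center(points)
spread = standard_distance(points)
print(center, spread)
```

For these four corners the mean center is (1, 1) and every point lies at distance sqrt(2) from it, so the standard distance is also sqrt(2). The measure plays the role for point patterns that the standard deviation plays for a single variable.
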
[AM-04-067] Gridding, Interpolation, and Contouring

Gridding is the act of taking a field of measurements and discretizing it into a regular tessellation, often either a lattice of squares or hexagons. Gridding can either discretize continuous phenomena or aggregate discrete instances; in either case, gridding serves conceptually to assist analysis, for example in finding local minima or maxima (i.e., "hotspots"). The process of gridding often involves interpolation, which is the rational estimation of unknown data values within the bounds of known values. Contouring refers to the creation of isolines throughout a data surface, often one represented by a grid. This section describes gridding, interpolation, and contouring, highlighting a few example methods by which interpolation is frequently done in geospatial analysis.
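
To make gridding and interpolation concrete, the sketch below fills a square lattice with estimates from scattered samples using inverse distance weighting (IDW). IDW is picked here only as a familiar example; the abstract does not say which methods the entry covers. The sample points, the power of 2, and the grid layout are assumptions.

```python
import math

# Hypothetical samples: (x, y, measured value).
samples = [
    (0.0, 0.0, 10.0),
    (4.0, 0.0, 20.0),
    (0.0, 4.0, 30.0),
    (4.0, 4.0, 40.0),
]

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    Each sample is weighted by 1 / distance**power, so nearby samples dominate.
    If (x, y) coincides with a sample, that sample's value is returned as is.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def grid_idw(samples, xmin, ymin, cell, ncols, nrows, power=2.0):
    """Rows of IDW estimates taken at the centers of a regular square lattice."""
    return [
        [
            idw(xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell, samples, power)
            for c in range(ncols)
        ]
        for r in range(nrows)
    ]

grid = grid_idw(samples, xmin=0.0, ymin=0.0, cell=1.0, ncols=4, nrows=4)
for row in grid:
    print([round(v, 1) for v in row])
```

Because IDW is a weighted average, every estimate stays between the smallest and largest sample values, which matches the description of interpolation as estimation within the bounds of known values. Contouring would then trace isolines through a grid like this one; that step is usually left to a plotting or GIS library and is not shown here.
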
