Spatial statistics

Topics

[AM-03-019] Exploratory Spatial Data Analysis (ESDA)

Exploratory Spatial Data Analysis (ESDA) is a crucial methodology within spatial statistics, designed to uncover and interpret spatial patterns, trends, and relationships within geographic datasets. Unlike traditional Exploratory Data Analysis (EDA), which focuses solely on data attributes without considering spatial context, ESDA integrates spatial information to explore how geographical factors influence data patterns. ESDA is a critical phase in the spatial data science pipeline that occurs after data collection and before modeling. It combines statistical techniques and visualizations to examine the data’s structure, detect patterns, spot outliers, and investigate relationships between variables. This exploratory phase is essential for cleaning and preparing the data: it helps identify potential issues such as missing values or biases, informs the selection of appropriate models and techniques, and ensures that subsequent steps in the research pipeline are grounded in a thorough understanding of the data's characteristics. This chapter provides an overview of ESDA, situates it within the broader spatial data science pipeline, differentiates it from aspatial EDA, explains its core methodologies, and discusses its implications for understanding spatial datasets. It aims to give readers a comprehensive introduction to ESDA techniques and to lay the groundwork for advanced spatial data analysis.
[AM-03-022] Global Measures of Spatial Association

Spatial association broadly describes how the locations and values of samples or observations vary across space. Similarity in both the attribute values and locations of observations can be assessed using measures of spatial association based upon the first law of geography. In this entry, we focus on the measures of spatial autocorrelation that assess the degree of similarity between attribute values of nearby observations across the entire study region. These global measures assess spatial relationships by combining spatial proximity, as captured in the spatial weights matrix, with attribute similarity, as captured by variable covariance (as in Moran’s I) or squared difference (as in Geary’s C). For categorical data, the join count statistic provides a global measure of spatial association. Two visualization approaches for spatial autocorrelation measures are Moran scatterplots and variograms (also known as semi-variograms).
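As a rough illustration of how a weights matrix combines with covariance (Moran’s I) or squared differences (Geary’s C), the following minimal Python sketch computes both statistics from their standard textbook formulas. The function names, the toy weights matrix `W`, and the two value vectors are hypothetical and are not taken from the entry; no significance testing is attempted.

```python
def morans_i(values, weights):
    """Global Moran's I: weighted cross-products of deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def gearys_c(values, weights):
    """Global Geary's C: weighted squared differences between paired values."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    ss = sum((v - mean) ** 2 for v in values)
    return (n - 1) * sq_diff / (2 * s0 * ss)


# Hypothetical example: four locations along a line, binary adjacency weights.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = [1, 2, 3, 4]        # neighbors have similar values
alternating = [1, 4, 1, 4]  # neighbors have dissimilar values

print("trend:      ", morans_i(trend, W), gearys_c(trend, W))
print("alternating:", morans_i(alternating, W), gearys_c(alternating, W))
```

In this toy case the smooth trend gives a positive Moran’s I and a Geary’s C below 1, while the alternating pattern gives a negative Moran’s I and a Geary’s C above 1, which is the usual reading of the two statistics.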
[AM-03-023] Local Measures of Spatial Association

Local measures of spatial association are statistics used to detect variations of a variable of interest across space when the spatial relationship of the variable is not constant across the study region, known as spatial non-stationarity or spatial heterogeneity. Unlike global measures that summarize the overall spatial autocorrelation of the study area in one single value, local measures of spatial association identify local clusters (observations nearby have similar attribute values) or spatial outliers (observations nearby have different attribute values). Like global measures, local indicators of spatial association (LISA), including local Moran’s I and local Geary’s C, incorporate both spatial proximity and attribute similarity. Getis-Ord Gi*, another popular local statistic, identifies spatial clusters at various significance levels, known as hot spots (unusually high values) and cold spots (unusually low values). This so-called “hot spot analysis” has been extended to examine spatiotemporal trends in data. Bivariate local Moran’s I describes the statistical relationship between one variable at a location and a spatially lagged second variable at neighboring locations, and geographically weighted regression (GWR) allows regression coefficients to vary at each observation location. Visualization of local measures of spatial association is critical, allowing researchers of various disciplines to easily identify local pockets of interest for future examination.
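To make the cluster versus outlier distinction concrete, here is a minimal Python sketch of local Moran’s I using the common formulation in which each location’s deviation from the mean is multiplied by the weighted sum of its neighbors’ deviations. The function name, the quadrant labels (HH, LL, HL, LH), and the toy data are illustrative assumptions; a real analysis would also need a permutation-based or analytical significance test, which is omitted here.

```python
def local_morans_i(values, weights):
    """Return (local Moran's I per location, quadrant label per location)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]       # deviations from the mean
    m2 = sum(d * d for d in z) / n       # second moment (variance with divisor n)
    # Spatial lag: weighted sum of the neighbors' deviations
    lag = [sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]
    stats = [z[i] * lag[i] / m2 for i in range(n)]
    # First letter: the location itself; second letter: its neighbors
    labels = [("H" if zi > 0 else "L") + ("H" if li > 0 else "L")
              for zi, li in zip(z, lag)]
    return stats, labels


# Hypothetical example: five locations along a line, binary adjacency weights.
W5 = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
values = [8, 9, 1, 9, 8]  # a low value surrounded by high values

stats, labels = local_morans_i(values, W5)
for i, (s, lab) in enumerate(zip(stats, labels)):
    print(i, round(s, 3), lab)
```

Positive local values (HH or LL) suggest a location resembles its neighbors, while negative values (HL or LH) flag candidate spatial outliers, such as the middle location in this toy example.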
[AM-03-026] Spatial Sampling for Spatial Analysis

Spatial sampling is a key estimation method in spatial analysis, where known sample points are used to predict values at unknown locations. The goal is to improve prediction accuracy by ensuring high-quality samples. This requires controlling two main factors: the location of the samples and the sample size. Sample locations must be carefully chosen within the study area to balance distribution, cost, efficiency, and prediction accuracy. The sample size should ideally be optimized, as collecting too many samples may not provide additional useful information and can increase costs. The challenge is finding the right balance between sample quantity and quality.
[AM-03-032] Spatial Autoregressive Models

Regression analysis is a statistical technique commonly used in the social and physical sciences to model relationships between variables. To make unbiased, consistent, and efficient inferences about real-world relationships, a researcher using regression analysis relies on a set of assumptions about the process generating the data used in the analysis and the errors produced by the model. Several of these assumptions are frequently violated when the real-world process generating the data is spatially structured, which creates dependence among the observations and spatial structure in the model errors. To avoid the confounding effects of spatial dependence, spatial autoregressive models include spatial structures that specify the relationships between observations and their neighbors. These structures are most commonly specified using a weights matrix that can take many forms and be applied to different components of the spatial autoregressive model. When properly specified, these structures account for the effects of spatial dependence on the model estimates and allow researchers to make reliable inferences. While spatial autoregressive models are commonly used in spatial econometric applications, they have wide applicability for modeling spatially dependent data.
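For orientation, two widely used specifications show how the weights matrix can enter different components of the model. The notation below is the conventional one in the spatial econometrics literature and is an assumption here, not something stated in the entry: W is the spatial weights matrix, y the response vector, X the matrix of explanatory variables, β the regression coefficients, ρ and λ the spatial autoregressive parameters, and ε an independent error term.

```latex
% Spatial lag model: the weights matrix is applied to the response
y = \rho W y + X\beta + \varepsilon

% Spatial error model: the weights matrix is applied to the error term
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

Setting ρ = 0 or λ = 0 reduces either form to the standard linear regression model.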
[AM-08-097] An Introduction to Spatial Data Mining

The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets (e.g., GPS trajectories of smartphones). Spatial data mining is societally important, with applications in public health, public safety, climate science, etc. For example, in epidemiology, spatial data mining helps to find areas with a high concentration of disease incidents to manage disease outbreaks. Computational methods are needed to discover spatial patterns since the volume and velocity of spatial data exceed the ability of human experts to analyze it. Spatial data has unique characteristics, like spatial autocorrelation and spatial heterogeneity, which violate the i.i.d. (independent and identically distributed) assumption of traditional statistical and data mining methods. Therefore, using traditional methods may miss patterns or may yield spurious patterns, which are costly in societal applications. Further, there are additional challenges such as the MAUP (Modifiable Areal Unit Problem), as illustrated by a recent court case debating gerrymandering in elections. In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, collocation detection, spatial prediction, and spatial outlier detection. Hotspot detection methods use domain information to accurately model more active and high-density areas. Collocation detection methods find objects whose instances are in proximity to each other in a location. Spatial prediction approaches explicitly model the neighborhood relationship of locations to predict target variables from input features. Spatial outlier detection methods find data that differ from their neighbors. Lastly, we describe future research and trends in spatial data mining.
[FC-05-037] Spatial Autocorrelation
The scientific term spatial autocorrelation describes Tobler’s first law of geography: everything is related to everything else, but nearby things are more related than distant things. Spatial autocorrelation has a:
- past characterized by scientists’ non-verbal awareness of it, followed by its formalization;
- present typified by its dissemination across numerous disciplines, its explication, its visualization, and its extension to non-normal data; and
- anticipated future in which it becomes a standard feature of data analytic software packages, as well as a routinely considered feature of space-time data and spatial optimization practice.
Positive spatial autocorrelation constitutes the focal point of its past and present; one expectation is that negative spatial autocorrelation will become a focal point of its future.
[PD-05-031] PySAL and Spatial Statistics Libraries

As spatial statistics are essential to geographical inquiry, accessible and flexible software offering relevant functionality is highly desirable. The Python Spatial Analysis Library (PySAL) represents an endeavor towards this end. It is an open-source Python library and ecosystem hosting a wide array of spatial statistical and visualization methods. Since its first public release in 2010, PySAL has been applied to address various research questions, used as teaching material in regular classes and conference workshops serving a wide audience, and integrated into general GIS software such as ArcGIS and QGIS. This entry first gives an overview of the history and recent development of PySAL. This is followed by a discussion of PySAL’s new hierarchical structure and of two different modes of accessing PySAL’s functionality to perform various spatial statistical tasks, including exploratory spatial data analysis, spatial regression, and geovisualization. Next, the entry discusses how to find and use materials for studying and applying PySAL’s spatial statistical functions, and how to get involved with the PySAL community as a user and prospective developer. The entry ends with a brief discussion of the future development of PySAL.
[AM-03-058] Hot Spots and Getis-Ord Gi* Analysis

A common goal in spatial analysis is the identification of regions containing unusually high or low values. These areas may be called hot spots if the values are high and cold spots if the values are low. These hot/cold spots indicate where the effects of spatial heterogeneity are greatest. Point density, heat, and choropleth maps all highlight these areas in one way or another. However, due to the limitations of subjective symbolization, statistical methods of hot spot detection are common. Some, like Moran’s I, simply identify the pattern for the entire study area. Local methods display the location and magnitude of individual high and low clusters. Getis-Ord Gi* analysis is the local method most associated with the term hot spots and it is the focus of the second half of the article. Getis-Ord Gi* combines the logic of a probability map with moving windows, kernels and/or adjacency weights. The result is an output surface showing neighborhoods with means significantly above or below the global mean. A primary concern is the correct parameterization, especially the correct conceptualization of spatial relationships. Spatiotemporal variants, limitations, and future directions of hot spot analysis are briefly discussed.
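The idea of comparing a neighborhood’s values against the global mean can be sketched in a few lines of Python. This sketch uses the commonly cited z-score form of the Gi* statistic, in which each location counts as its own neighbor; the function name, the toy weights matrix, and the data are hypothetical, and choosing the weights (the conceptualization of spatial relationships) is left entirely to the analyst.

```python
import math


def getis_ord_gi_star(values, weights):
    """Gi* z-score for each location; weights[i][i] should be nonzero."""
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation (divisor n)
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        local_sum = sum(weights[i][j] * values[j] for j in range(n))
        # Expected local sum under no clustering is mean * w_sum
        denom = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores


# Hypothetical example: five locations along a line; each location is
# a neighbor of itself and of the locations directly adjacent to it.
W_star = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
counts = [1, 1, 1, 10, 10]  # high values concentrated at one end

for i, g in enumerate(getis_ord_gi_star(counts, W_star)):
    print(i, round(g, 3))
```

Large positive scores point to candidate hot spots and large negative scores to candidate cold spots; whether a score is significant depends on the chosen significance level and any multiple-testing correction.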
[AM-03-063] Regression Fundamentals

Regression analysis is a common statistical tool used to model relationships between variables and to explore the influencing factors underlying observed spatial data patterns. This entry focuses on the most basic form of regression model: linear regression. The notations, inference, assumptions, and diagnostics of linear regression are introduced, and interpretations of linear regression results are demonstrated using an empirical example in R software. The entry concludes with a brief discussion of the challenges of applying standard linear regression to spatial data.
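As a small companion to this description, the sketch below fits a simple linear regression (one explanatory variable) by ordinary least squares and returns the residuals and R-squared that basic diagnostics rely on. It is written in Python rather than the R used in the entry’s empirical example, and the function name and data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, residuals, r_squared


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, residuals, r_squared = fit_simple_ols(x, y)
print("slope:", slope, "intercept:", intercept, "R^2:", r_squared)
```

With spatial data, the residuals returned here are the natural place to look for leftover spatial structure, for example by mapping them or computing a measure of spatial autocorrelation on them.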