[DM-06-086] Vector-to-Raster and Raster-to-Vector Conversions

Spatial data can be represented in vector or raster form. The vector spatial data model is coordinate-based and represents geographic features as points, lines, and polygons. The raster spatial data model is pixel-based and represents geographic phenomena as an organized matrix of cells. Each model possesses advantages, disadvantages, and tradeoffs in how data can be manipulated, analyzed, and rendered. As a result, GIS professionals often need to work between data models to achieve their analytical goals. Vector-to-raster and raster-to-vector conversions are fundamental spatial data manipulation processes used to transform one model of spatial data representation into the other to extend the utility of a spatial dataset. Vector-to-raster conversion, also known as rasterization, is the process of converting vector points, lines, and polygons into a surface of gridded cells or pixels. Advanced rasterization techniques, such as spatial interpolation and density mapping, can be used to predict raster surfaces at unsampled locations based on known values of nearby vector spatial data inputs. Raster-to-vector conversion, also known as vectorization, is the process of converting gridded cell- or pixel-based data into vector points, lines, and polygons. While powerful, these conversion processes also have implications for geographic accuracy and potential feature loss.

Tags

heatmap
interpolation
raster
rasterization
spatial resolution
vector
vectorization

Author and citation

Nelson, J. (2024). Vector-to-Raster and Raster-to-Vector Conversions. The Geographic Information Science & Technology Body of Knowledge (2024 Edition), John P. Wilson (Ed.). DOI: 10.22224/gistbok/2024.1.7

Explanation

  1. Refresher on Vector & Raster Spatial Data Models
  2. Vector-to-Raster Conversions
  3. Raster-to-Vector Conversions

1. Refresher on Vector & Raster Spatial Data Models

Spatial data can be represented in vector or raster form, each possessing advantages, disadvantages, and tradeoffs in how data can be manipulated, analyzed, and rendered. Vector data are stored as coordinate pairs whereas raster data are stored in pixels. The vector-raster discussion (and debate) in Geographic Information Science is comprehensive and dates to at least the 1980s (e.g., Peuquet 1984; Gahegan & Roberts, 1988; Goodchild 1989). The merits of space bounding versus space filling representations and their respective object- versus field-based concepts have been analyzed in detail for different types of geographic phenomena, spatial scales, and analytical use cases (Couclelis 2005).

Vector representations are useful for encoding discrete data that possess a high degree of geographic accuracy. The vector model can store many attributes for a single feature and can preserve topological relationships among features, facilitating advanced spatial operations (e.g., buffer, intersect, etc.) and statistical analyses (e.g., geographically weighted regression, spatial autocorrelation analysis, etc.). The spatial detail and scalability of vector data also tend to result in more aesthetically pleasing cartographic outputs. Raster representations, on the other hand, enable the analysis of continuous data types across large geographic areas (e.g., aspect, slope, and hillshade analysis) in addition to being able to encode discrete data types, such as soil classifications. By constraining spatial resolution to standardized grid cells and not preserving topological relationships or storing more than one attribute, the raster data model can support powerful and intuitive mathematical modeling using map algebra (Tomlin 1994; see also Grid Operations and Map Algebra). For a comprehensive overview of vector and raster data models, formats, and sources, refer to BoK entries on Classic Vector Data Models (forthcoming), the Raster Data Model, Vector Formats and Sources, and Raster Formats and Sources.

Given that vector and raster data models each support the representation of different spatial data types and subsequently different analyses and use cases, GIS professionals often need to work between data models to achieve their analytical goals. Vector-to-raster and raster-to-vector conversions are fundamental spatial data manipulation processes that enable GIS professionals to transform one model of spatial data representation into the other to extend the utility of a spatial dataset. For example, a vector-to-raster conversion might be used to transform vector polygons containing information about different rock types into raster format to support the creation of bedrock classifications for a geology map. Alternatively, a raster-to-vector conversion can be used to transform a raster digital elevation model into vector contour lines for a topographic hiking map. Figure 1 provides a visual illustration of the results of the conversion process with vector points, lines, and polygons presented on the left and corresponding raster spatial data representations presented on the right.

Figure 1. Visual illustration of vector (left) and raster (right) representations of point, line, and areal geographic features. Source: author.

 

2. Vector-to-Raster Conversions

Vector-to-raster conversion — also known as rasterization — is the process of converting vector points, lines, and polygons into a surface of gridded cells or pixels. The process includes the following key steps:

  1. Select an input vector layer (points, lines, or polygons) that will be converted to raster format.
  2. Select a field from the input vector layer to carry over into the new raster layer. Most GIS software and tools require that the field be numeric; for multipoint and polyline datasets, z- or m-values can be used. Importantly, this step defines the sole attribute field that will be used to set pixel values.
  3. Define the spatial resolution and extent of the output raster. Horizontal and vertical resolution can be set in the georeferenced units of the GIS software application environment, or the raster size can be set as a width and height in pixels. The spatial extent of the output raster defaults to the minimum coverage of the selected vector layer being converted but can also be calculated from another layer in the GIS environment or the current map layout.
  4. Define output data and file types. GIS software allows users to customize the numeric data type of the burn value, i.e., the value written to each output cell (e.g., byte, int16, float32, etc.), and specify the raster format (e.g., .tif, .img, .crf, etc.).
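
The resolution and extent choices in step 3 determine the dimensions of the output grid. As a minimal sketch (the function names and the example extent below are hypothetical, and a north-up grid with square cells is assumed), the relationship can be expressed as:

```python
import math

def grid_shape(xmin, ymin, xmax, ymax, cell_size):
    """Return (rows, cols) needed to cover the extent at the given cell size."""
    cols = math.ceil((xmax - xmin) / cell_size)
    rows = math.ceil((ymax - ymin) / cell_size)
    return rows, cols

def cell_index(x, y, xmin, ymax, cell_size):
    """Map a coordinate to (row, col); row 0 is the top (north) edge."""
    col = int((x - xmin) // cell_size)
    row = int((ymax - y) // cell_size)
    return row, col

# Hypothetical extent: a 100 x 60 unit bounding box rasterized at 10-unit cells.
rows, cols = grid_shape(0.0, 0.0, 100.0, 60.0, 10.0)
example_cell = cell_index(25.0, 55.0, 0.0, 60.0, 10.0)
```

Halving the cell size in this sketch doubles both dimensions, so the number of cells grows fourfold.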

2.1 Points-to-Raster

At the simplest, most fundamental level, a vector data point representing one geographic feature (e.g., a single tree) can be converted into a single raster cell. The value associated with that raster cell could be the height of the tree, its age, etc. The process of converting vector points to raster assumes that raster cells will be assigned the value of the point found within the cell. Raster cells may also be given a value of 0 or 1 to delineate the presence (or not) of a point feature. Cells that do not contain any points will be assigned a no data value. An issue arises when more than one point falls within a single raster cell. In that case, the cell is typically assigned the most frequent attribute value found across all point features within that cell. If no single value is most frequent, the cell is typically assigned the value of the feature with the lowest feature identification number. In either scenario, attribute values of any other data point(s) found within the cell are disregarded. Some GIS software also allows users to specify cell assignment based on summary statistics, such as the sum, mean, standard deviation, maximum, minimum, range, or count of attribute values for all points within each cell. If maintaining the distinct number of features during the conversion process is essential, users can specify a higher spatial resolution (i.e., smaller cell size) to ensure that a single raster cell does not contain more than one point.
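
The cell assignment rules described above can be illustrated with a short sketch. This is not any particular GIS package's implementation: the function name, the `rule` parameter, and the sample tree data are hypothetical, `None` stands in for a no data value, and ties under the most-frequent rule go to the earliest point in the input (which matches the lowest feature identification number only if points are supplied in that order).

```python
from collections import Counter, defaultdict

NODATA = None  # stand-in for a no data value

def points_to_raster(points, xmin, ymax, cell_size, rows, cols, rule="most_frequent"):
    """points: iterable of (x, y, value). Returns a rows x cols list of lists."""
    cells = defaultdict(list)
    for x, y, value in points:
        col = int((x - xmin) // cell_size)
        row = int((ymax - y) // cell_size)
        if 0 <= row < rows and 0 <= col < cols:
            cells[(row, col)].append(value)

    grid = [[NODATA] * cols for _ in range(rows)]
    for (row, col), values in cells.items():
        if rule == "most_frequent":
            # On ties, most_common keeps first-encountered order,
            # so the earliest point in the input wins.
            grid[row][col] = Counter(values).most_common(1)[0][0]
        elif rule == "mean":
            grid[row][col] = sum(values) / len(values)
        elif rule == "count":
            grid[row][col] = len(values)
        else:
            raise ValueError(f"unknown rule: {rule}")
    return grid

# Hypothetical trees as (x, y, height); three of them share the top-left cell.
trees = [(1.0, 9.0, 12), (2.0, 8.5, 12), (3.0, 9.5, 20), (15.0, 2.0, 7)]
by_mode = points_to_raster(trees, 0.0, 10.0, 5.0, 2, 4)
by_mean = points_to_raster(trees, 0.0, 10.0, 5.0, 2, 4, rule="mean")
by_count = points_to_raster(trees, 0.0, 10.0, 5.0, 2, 4, rule="count")
```

With a 5-unit cell, the three trees in the top-left cell collapse into one value; rerunning the sketch with a smaller cell size separates them, at the cost of a larger grid.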

2.2 Lines-to-Raster

A vector line representing a linear geographic feature (e.g., street segment) can be converted into a series of adjacent raster cells. The value associated with these raster cells could be a numeric road type classification, total traffic volume, etc. (Fig. 2). The process of converting vector lines to raster typically assumes that raster cells will be assigned the attribute value of the line that intersects each cell. Alternatively, raster cells may be given a value of 0 or 1 to delineate the presence (or not) of a linear feature. Cells that do not intersect any line features will be assigned a no data value. To address cases in which more than one linear feature intersects a given raster cell, some GIS software allows users to specify cell assignment based on the feature with the greatest length within the cell. However, if maintaining the distinct number of features during the conversion process is important, users can specify a higher spatial resolution to ensure that a single raster cell does not reflect more than one linear feature.
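
A rough sketch of the length-based assignment rule follows. It approximates the length of each line inside each cell by sampling short sub-segments rather than computing exact intersections, so it is an illustration under that assumption and not how production GIS tools necessarily work. The function name, the `step_fraction` parameter, and the street data are hypothetical.

```python
import math
from collections import defaultdict

def lines_to_raster(lines, xmin, ymax, cell_size, rows, cols, step_fraction=0.1):
    """lines: list of (value, [(x, y), ...]). Length per cell is approximated by sampling."""
    step = cell_size * step_fraction
    length_in_cell = defaultdict(float)  # (row, col, line index) -> approximate length
    for idx, (_, vertices) in enumerate(lines):
        for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
            seg_len = math.hypot(x1 - x0, y1 - y0)
            n = max(1, math.ceil(seg_len / step))
            for i in range(n):
                t = (i + 0.5) / n  # midpoint of each sub-segment
                x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                col = int((x - xmin) // cell_size)
                row = int((ymax - y) // cell_size)
                if 0 <= row < rows and 0 <= col < cols:
                    length_in_cell[(row, col, idx)] += seg_len / n

    # For each cell, keep the line with the greatest approximate length inside it.
    best = {}
    for (row, col, idx), length in length_in_cell.items():
        if length > best.get((row, col), (0.0, None))[0]:
            best[(row, col)] = (length, idx)

    grid = [[None] * cols for _ in range(rows)]  # None stands in for no data
    for (row, col), (_, idx) in best.items():
        grid[row][col] = lines[idx][0]
    return grid

# Hypothetical streets as (functional class, vertices) on a 4 x 4 grid of unit cells.
streets = [
    (1, [(0.5, 2.5), (2.6, 2.5)]),  # class 1: horizontal street
    (2, [(2.2, 0.0), (2.2, 4.0)]),  # class 2: vertical street crossing it
]
street_grid = lines_to_raster(streets, 0.0, 4.0, 1.0, 4, 4)
```

In the cell where the two streets cross, the vertical street covers more length and therefore sets the cell value; the horizontal street's class is lost in that cell.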

Figure 2. A linear street network symbolized via functional class specification depicted in vector format (top) with results from raster conversion (bottom). Source: author.

 

2.3 Polygons-to-Raster

A vector polygon representing an areal geographic feature (e.g., building footprint) can be converted into a cluster of raster cells with associated values that could delineate building age, capacity, etc. Raster cells typically inherit the attribute value of the polygon found at the center of each cell, but assignment can also be designated based on partial or majority containment of the polygon(s) within a given raster cell. Raster cells may also be given a value of 0 or 1 to delineate the presence (or not) of an areal feature. Cells that do not meet the criteria for polygon value assignment are given a no data value.
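
The cell-center rule can be sketched with a standard ray-casting point-in-polygon test. The sketch assumes simple polygons without holes, uses `None` as a stand-in for no data, and lets the first matching polygon win where polygons overlap; the function names and the building footprint are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices (closing vertex optional)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def polygons_to_raster(polygons, xmin, ymax, cell_size, rows, cols):
    """polygons: list of (value, ring). A cell takes the value of the polygon at its center."""
    grid = [[None] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            cx = xmin + (col + 0.5) * cell_size
            cy = ymax - (row + 0.5) * cell_size
            for value, ring in polygons:
                if point_in_polygon(cx, cy, ring):
                    grid[row][col] = value
                    break
    return grid

# Hypothetical building footprint carrying its construction year as the attribute.
building = (1975, [(0.6, 0.6), (2.6, 0.6), (2.6, 2.4), (0.6, 2.4)])
footprint = polygons_to_raster([building], 0.0, 3.0, 1.0, 3, 3)
```

Although the footprint overlaps all nine cells, it contains only two cell centers, so only those two cells receive a value. A partial or majority containment rule would give a different result for the same input.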

2.4 Beyond-the-Basics

More advanced vector-to-raster conversions typically involve spatial interpolation techniques that are used for estimating raster cell values from primarily vector points (Meng et al., 2013). While not a necessary step in the process, these interpolated raster surfaces can be symbolized as isoline maps to visualize geographic phenomena that vary over space and have no distinct boundaries (e.g., elevation, rainfall, slope). Spatial interpolation is conceptually grounded in the first law of geography that states: “Everything is related to everything else, but near things are more related than distant things” (Tobler 1970). Interpolation techniques can be classified as deterministic or geostatistical and take global or local form. Deterministic interpolation techniques derive raster surfaces based on the predetermined spatial context (i.e., extent of similarity or degree of smoothness) of the input vector dataset, whereas geostatistical (or stochastic) techniques take into consideration spatial dependence and quantify the spatial autocorrelation (see Spatial Autocorrelation) among the measured spatial data values, as well as provide accuracy measures for the predictions. Global interpolation uses all available vector data points to estimate values and tends to result in a smoother surface. Local interpolation uses a subset of the entire dataset to estimate unknown values on a neighborhood-by-neighborhood basis with the goal of better capturing smaller-area surface variations in the input dataset. Table 1 provides a summary of five common point-to-raster interpolation techniques found in GIS software.

 

Table 1: Characteristics of five common point-to-raster interpolation techniques: Inverse Distance Weighting (IDW), Natural Neighbor, Spline, Polynomial, and Kriging.

| Interpolation Technique | Classification | Form | Conceptual Overview | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Inverse Distance Weighting (IDW) | deterministic | local or global (if all input data points are specified in neighborhood designation) | estimates a raster cell value using a weighted average of the values of nearby input spatial data; points closer to the raster cell being predicted have greater influence than points farther away | intuitive to implement and works well for homogeneous, widely distributed data | potential to over-smooth surface output and generate artifacts in areas possessing high spatial variability |
| Natural Neighbor (Sibson 1981) | deterministic | local | estimates a raster cell value by finding a close subset of input spatial data, generating a Voronoi tessellation, and applying weights to nearby values based on proportionate areas | computationally efficient and capable of modeling complex spatial relationships | potential to introduce over- and under-fitting issues when input data are sparse or poorly represent the geographic phenomenon being measured |
| Spline | deterministic | global or local | estimates a raster cell value by fitting a mathematical function to the input spatial data points that a) minimizes overall surface curvature and b) directly passes through the input data points | works well with input data possessing high spatial variability and complex spatial patterns | potential to introduce over- and under-fitting issues when input data are sparse or poorly represent the geographic phenomenon being measured |
| Polynomial | deterministic | global or local | estimates a raster cell value by fitting one polynomial mathematical function to the entire dataset (global approach) or many polynomials to specified neighborhood designations (local approach) | works well for fitting surfaces that vary slowly over space and performing trend surface analysis to assess long-range geographic patterns and processes | sensitive to outlier values and neighborhood distance thresholds |
| Kriging (Matheron 1963; Oliver & Webster, 1990) | geostatistical | global or local | predicts a raster cell value by creating variograms and covariance functions to estimate the spatial autocorrelation of the input data points; spatial weights are based on a combination of the overall spatial arrangement of the measured points and the distance between the measured points and the prediction location | capable of modeling complex, highly variable spatial relationships when spatial autocorrelation exists in input dataset | computationally intensive if input dataset is large and densely distributed |
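
To make the first row of Table 1 concrete, the following is a minimal sketch of inverse distance weighting in its global form, where every sample contributes to every cell. The function names, the default power of 2, and the gauge readings are assumptions for illustration; GIS packages typically add neighborhood search options that are omitted here.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, zi) samples."""
    numerator = 0.0
    denominator = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return float(zi)  # prediction location coincides with a sample
        w = 1.0 / d ** power  # closer samples receive larger weights
        numerator += w * zi
        denominator += w
    return numerator / denominator

def idw_surface(samples, xmin, ymax, cell_size, rows, cols, power=2.0):
    """Estimate a value at the center of every cell in a rows x cols grid."""
    return [
        [idw(xmin + (c + 0.5) * cell_size, ymax - (r + 0.5) * cell_size, samples, power)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical rain gauges as (x, y, rainfall).
gauges = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0)]
midpoint = idw(2.0, 0.0, gauges)      # equidistant from both gauges -> 15.0
near_first = idw(1.0, 0.0, gauges)    # pulled toward the nearer gauge
surface = idw_surface(gauges, 0.0, 2.0, 1.0, 2, 4)
```

Raising `power` makes the nearest samples dominate, which sharpens local detail; lowering it moves each estimate toward the overall mean.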

 

Areal interpolation techniques have also been developed to transform spatial data from areas with known values (i.e., source zones) into new areas with unknown values (i.e., target zones) (Lam 1983; see also Areal Interpolation). Areal interpolation enables GIS professionals to harmonize spatial data that has been aggregated at many different levels (e.g., census units) and make predictions about geographic phenomena (e.g., population totals, cancer rates, etc.) across different levels of aggregation (Fig. 3). One example of areal interpolation is the pycnophylactic technique, which can be used to convert vector polygons to raster surfaces using an iterative, local neighborhood approach that maximizes smooth renderings while preserving source zone volumes (Tobler 1979). Pycnophylactic interpolation can output intuitive and aesthetically pleasing raster surfaces known as isopleth maps, but it assumes that no sharp or irrelevant boundaries (e.g., river, mountain range, etc.) exist in the target zones.
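
The idea of smoothing while preserving source zone volumes can be sketched as a loop that alternates a neighborhood-averaging step with a per-zone rescaling step. This is a simplified illustration and not Tobler's exact formulation: the equal blend with the 4-neighbor mean, the multiplicative rescaling, the fixed iteration count, and all names and data are assumptions. Source zones are assumed to be already rasterized into a grid of zone identifiers.

```python
def pycnophylactic(zone_ids, zone_totals, iterations=50):
    """Smooth a zone-based surface while keeping each zone's total unchanged."""
    rows, cols = len(zone_ids), len(zone_ids[0])

    def zone_sums(surface):
        sums = {}
        for r in range(rows):
            for c in range(cols):
                z = zone_ids[r][c]
                sums[z] = sums.get(z, 0.0) + surface[r][c]
        return sums

    # Start by spreading each zone's total evenly over its cells.
    counts = zone_sums([[1.0] * cols for _ in range(rows)])
    surface = [[zone_totals[z] / counts[z] for z in row] for row in zone_ids]

    for _ in range(iterations):
        # Smoothing step: blend each cell with the mean of its 4-neighbors.
        smoothed = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                nbrs = [surface[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]
                mean_nbr = sum(nbrs) / len(nbrs) if nbrs else surface[r][c]
                smoothed[r][c] = 0.5 * surface[r][c] + 0.5 * mean_nbr
        # Volume-preserving step: rescale each zone back to its known total.
        sums = zone_sums(smoothed)
        surface = [
            [smoothed[r][c] * zone_totals[zone_ids[r][c]] / sums[zone_ids[r][c]]
             if sums[zone_ids[r][c]] > 0 else 0.0
             for c in range(cols)]
            for r in range(rows)
        ]
    return surface

# Hypothetical source zones: zone 1 holds 40 people, zone 2 holds 120.
zones = [[1, 1, 2, 2],
         [1, 1, 2, 2]]
totals = {1: 40.0, 2: 120.0}
population = pycnophylactic(zones, totals)
zone1_sum = sum(population[r][c] for r in range(2) for c in (0, 1))
zone2_sum = sum(population[r][c] for r in range(2) for c in (2, 3))
```

In the result, cells in zone 1 that border zone 2 take higher values than cells farther away, yet each zone still sums to its original total.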

 

Figure 3. Example vector-to-raster-to-vector areal interpolation workflow, in which spatial data values aggregated at one level (school zones) are first transformed into a smooth, continuous raster surface then used to predict values at another level of aggregation (census blocks). Map image reproduced with permission. Copyright © 2024 Esri.

 

In addition to interpolation techniques, density tools can also be used to convert vector points or lines into raster surfaces or heatmaps that depict the intensity and magnitude of geographic phenomena using variation in color hue and brightness. Density measures are used to calculate a magnitude per unit area raster surface based on the number of features that are within a specified neighborhood designation; a kernel function can be applied to output a smoother surface (see Kernels and Density Estimation). Figure 4 illustrates a raster heatmap that was created from anonymized vector GPS points collected by bicyclists who use the Strava fitness tracking application.
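
A kernel density surface of the kind described above can be sketched as follows. The quartic kernel is an assumption chosen for illustration (the text does not name a specific kernel), and the function name, bandwidth, and point data are hypothetical.

```python
import math

def kernel_density(points, xmin, ymax, cell_size, rows, cols, bandwidth):
    """Density per unit area at each cell center, using a quartic kernel."""
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx = xmin + (c + 0.5) * cell_size
            cy = ymax - (r + 0.5) * cell_size
            total = 0.0
            for px, py in points:
                u2 = ((cx - px) ** 2 + (cy - py) ** 2) / bandwidth ** 2
                if u2 < 1.0:  # only points inside the search radius contribute
                    total += (3.0 / math.pi) * (1.0 - u2) ** 2
            grid[r][c] = total / bandwidth ** 2
    return grid

# Hypothetical GPS fixes: a cluster of three points and one isolated point.
fixes = [(1.5, 1.5), (1.6, 1.4), (1.4, 1.6), (4.5, 0.5)]
heat = kernel_density(fixes, 0.0, 3.0, 1.0, 3, 5, bandwidth=1.5)
```

Cells near the cluster receive the highest values, the cell holding the isolated point receives a lower value, and cells beyond the bandwidth of every point stay at zero. A larger bandwidth produces a smoother, more generalized surface.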

Figure 4. Screenshot of the global raster heatmap of bicycling activity on Strava centered on greater London, UK. Source: https://www.strava.com/maps/global-heatmap.

 

3. Raster-to-Vector Conversions

Raster-to-vector conversion — also known as vectorization — is the process of converting gridded cell- or pixel-based data into vector points, lines, or polygons. The process includes the following key steps:

  1. Select an input raster layer, typically of type integer or floating point.
  2. Select and provide a name for the field (or “band” if working with satellite imagery) from the input raster layer that will be transformed into an attribute table for the new vector layer.
  3. Define the output data type, commonly a feature class, comma separated values (csv) file, or XYZ ASCII file format.

3.1 Raster-to-Points

A raster surface such as a digital elevation model (DEM) can be converted into vector points with an attribute table containing geographic location and, in this case, elevation information. The location of each point reflects the centroid of each raster cell for which a data value exists. Cells containing no data values will not be converted into points.
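
A minimal sketch of this conversion is shown below. The function name, the `-9999` no data sentinel, and the sample elevation grid are assumptions for illustration.

```python
def raster_to_points(grid, xmin, ymax, cell_size, nodata=-9999):
    """Return (x, y, value) tuples at the center of every cell that holds data."""
    points = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == nodata:
                continue  # no data cells produce no point
            x = xmin + (c + 0.5) * cell_size
            y = ymax - (r + 0.5) * cell_size
            points.append((x, y, value))
    return points

# Hypothetical 2 x 2 DEM with 30-unit cells and one no data cell.
dem = [[120.0, -9999],
       [118.5, 121.0]]
elevation_points = raster_to_points(dem, 500000.0, 4100000.0, 30.0)
```

Each tuple maps directly to a row in an output table with x, y, and elevation columns, such as a CSV or XYZ file.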

3.2 Raster-to-Lines

Raster surfaces can also be converted into lines. For example, raster representations of stream flow derived from hydrological modeling could be converted into vector stream network centerlines to then generate attributes on stream length, flow magnitude, etc. The geographic accuracy of the output vector lines depends on the spatial resolution of the input raster data. If the resolution of the input raster is low, the output vector lines may appear jagged, in which case results may be insufficient for detailed spatial analysis and/or large-scale (small-area) mapping projects. To remedy this issue, GIS software provides conversion parameters and post-processing tools for simplifying vector outputs by removing small fluctuations and superfluous bends, while attempting to preserve the overarching shape of the linear features. 
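
One widely used way to remove small fluctuations from a jagged line is the Douglas-Peucker algorithm, sketched below. The text does not name a specific simplification method, so this is offered only as one possible approach; the function names, the tolerance values, and the staircase line are hypothetical.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def simplify(line, tolerance):
    """Douglas-Peucker: drop vertices closer than `tolerance` to the trend line."""
    if len(line) < 3:
        return list(line)
    dmax, index = 0.0, 0
    for i in range(1, len(line) - 1):
        d = perpendicular_distance(line[i], line[0], line[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [line[0], line[-1]]
    # Keep the farthest vertex and simplify the two halves independently.
    left = simplify(line[: index + 1], tolerance)
    right = simplify(line[index:], tolerance)
    return left[:-1] + right

# Hypothetical staircase line traced along the edges of unit raster cells.
stairs = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
smooth = simplify(stairs, 1.0)
partial = simplify(stairs, 0.5)
```

With a tolerance of one cell width, the staircase collapses to a single diagonal segment between its endpoints. A smaller tolerance keeps more of the original vertices.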

3.3 Raster-to-Polygons

Lastly, raster surfaces can be converted into polygons. A raster surface representing land cover classification, for example, could be converted into vector polygons to enable GIS professionals to analyze areal coverage of different land type designations and/or spatially join other attributes, such as population density or weather factors, to enrich understanding of anthropogenic and natural factors across the landscape. Similar to the raster-to-line conversion process, the quality of the boundaries of the output vector polygons also depends on the resolution of the input raster. Figure 5 illustrates the significance of raster resolution on the vector polygon outputs. The boundaries of the vector polygon output may look like a staircase if the input raster resolution is low. GIS software provides conversion parameters and post-processing tools for simplifying polygon boundaries and supports the creation of multipart features.
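
The first stage of a raster-to-polygon conversion can be sketched as grouping connected cells of equal value into regions and collecting the cell edges that form each region's boundary. The sketch stops short of chaining those edges into closed polygon rings, assumes 4-connectivity, and uses `None` as a stand-in for no data; the function name and the land cover grid are hypothetical.

```python
from collections import deque

def raster_regions(grid, xmin, ymax, cell_size):
    """Group 4-connected cells of equal value; return value, area, and boundary edges."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or grid[r0][c0] is None:
                continue
            value = grid[r0][c0]
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            cells, edges = 0, []
            while queue:
                r, c = queue.popleft()
                cells += 1
                x0 = xmin + c * cell_size
                x1 = x0 + cell_size
                y1 = ymax - r * cell_size
                y0 = y1 - cell_size
                sides = {
                    (r - 1, c): ((x0, y1), (x1, y1)),  # top edge
                    (r + 1, c): ((x0, y0), (x1, y0)),  # bottom edge
                    (r, c - 1): ((x0, y0), (x0, y1)),  # left edge
                    (r, c + 1): ((x1, y0), (x1, y1)),  # right edge
                }
                for (rr, cc), edge in sides.items():
                    same = 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == value
                    if not same:
                        edges.append(edge)  # edge lies on the region boundary
                    elif not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            regions.append({"value": value, "area": cells * cell_size ** 2, "boundary": edges})
    return regions

# Hypothetical land cover classes on a 3 x 3 grid of unit cells.
land_cover = [[1, 1, 2],
              [1, 2, 2],
              [3, 3, 2]]
regions = raster_regions(land_cover, 0.0, 3.0, 1.0)
```

Every boundary edge is either horizontal or vertical, which is why polygons derived from coarse rasters have the staircase appearance described above.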

Figure 5. Example raster-to-polygon output depicting low raster input resolution (top) and high raster input resolution (bottom). Source: author.

 
