Geospatial data are increasingly accessible as knowledge graphs developed in line with the Semantic Web vision and the Linked Data principles, and this is becoming a prominent feature of the Web. The multitude of available geospatial knowledge graphs demonstrates their indispensable role within the Web of Data Cloud. Such graphs serve as central hubs that interconnect events, people, and objects, offering an ever-growing semantic representation of the wealth of geospatial information. The resulting resources and capabilities are being leveraged to use, consume, and capitalize on geospatial information through knowledge graphs. Geospatial knowledge graphs are stored and managed by triple stores, also known as RDF stores or knowledge bases. SPARQL and GeoSPARQL are semantic query languages used to retrieve and process knowledge graphs. Developments and experiences in the GIScience community have demonstrated the feasibility of expressing queries across diverse knowledge graphs to retrieve and process geospatial data from disparate, distributed sources. These efforts have facilitated the consumption of geospatial knowledge graphs through lightweight web applications and GIS applications.
Vilches-Blázquez, L. (2025). Geospatial Semantic Queries. The Geographic Information Science & Technology Body of Knowledge (2025 Edition), John P. Wilson (ed.). DOI: 10.22224/gistbok/2025.1.7.
Many application domains related to GIScience have historically used spatially extended relational databases to perform geospatial reasoning (Battle & Kolas, 2012). Spatially extended databases combine efficient, stable data storage and retrieval with geospatial computation and indexing. These advantages enable fast and accurate responses to queries such as "Which locations within a 2 km radius of a city's neighbors are likely to be flooded?". Despite their widespread adoption, relational databases continue to present challenges: queries with numerous joins across entities; queries with variable properties used to connect data within a table or to combine data from several tables; inference on datasets to support reasoning that identifies errors and derives new conclusions from the available data; qualitative spatial reasoning (Weiss, Karras, & Bernstein, 2008; Battle & Kolas, 2012); and the sharing of data schemas across multiple repositories for a common domain understanding.
These limitations pose significant challenges for using and integrating data from disparate relational databases. In most cases, datasets are isolated from one another: they employ distinct data models, use ad hoc vocabularies, and are occasionally disseminated in non-standard formats by different owners (Vilches-Blázquez et al., 2014). However, as more datasets become available online, the ability to extract information from combinations of these datasets is increasingly important for building a more complete understanding of the data.
Over the past two decades, the Semantic Web, an extension of the existing Web in which information is given well-defined meaning to better enable computers and people to work together (Berners-Lee, Hendler, & Lassila, 2001), has been increasingly used to address some long-standing problems in the geospatial domain (Huang & Harrie, 2019). More recently, Knowledge Graphs (KGs) have emerged as an extension of Semantic Web practices and have been adopted by companies such as Google, IBM, Facebook, and Microsoft (Noy et al., 2019). KGs capture and convey knowledge of the real world using graph-based representations to facilitate the creation, reuse, and retrieval of human- and machine-readable structured data (Cimiano & Paulheim, 2017; Hogan et al., 2021). In this way, KGs have become one of the main ways to integrate diverse data (Hitzler et al., 2020), enabling the handling and linking of multiple heterogeneous datasets within a single system (Krötzsch & Thost, 2016; Bellomarini, Sallinger, & Vahdati, 2020; Vilches-Blázquez & Saavedra, 2022; Rowland et al., 2022).
The vision of the Semantic Web is that knowledge should be represented as a graph using the Resource Description Framework (RDF), the standard knowledge representation language for this Web. Using RDF as the normal form of knowledge representation has several advantages, such as avoiding proprietary formats, harmonizing heterogeneous formats (e.g., databases, shapefiles, spreadsheets, and CSV files), and facilitating data integration by breaking down data silos (Vilches-Blázquez et al., 2014). This provides a reliable infrastructure for sharing, publishing, and querying structured data on the Semantic Web (McDonald & Levine-Clark, 2017). In this context, a basic triple in a graph can be interpreted as two nodes representing entities (e.g., "River" and "Sea") linked by a relation (e.g., "flow into"). Such triples (e.g., :River :flowInto :Sea) allow disparate datasets to be interconnected, generating graph-based representations known as KGs (Bellomarini, Sallinger, & Vahdati, 2020).
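As a minimal sketch of the triple model described above, the following Python snippet stores a few triples as (subject, relation, object) tuples and matches simple patterns against them. Only :River :flowInto :Sea comes from the text; the other triples and the helper names match and neighbours are hypothetical and purely illustrative. A real deployment would rely on an RDF library or a triple store rather than a Python set.

```python
# A tiny in-memory "graph": each triple is (subject, relation, object).
# Only (":River", ":flowInto", ":Sea") comes from the text; the other two
# triples are hypothetical and show how shared nodes link statements together.
GRAPH = {
    (":River", ":flowInto", ":Sea"),
    (":River", ":crosses", ":City"),
    (":City", ":locatedIn", ":Country"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def neighbours(graph, node):
    """Nodes reachable from `node` by following one outgoing relation."""
    return sorted({obj for (_, _, obj) in match(graph, s=node)})


print(match(GRAPH, p=":flowInto"))
print(neighbours(GRAPH, ":River"))
```

Because ":City" appears both as an object and as a subject, the three statements form a connected graph rather than three isolated records, which is the property that makes interlinking datasets possible.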
An increasing amount of geospatial data is now available in the form of KGs that adhere to the Linked Data principles (Berners-Lee, 2006): (1) use Uniform Resource Identifiers (URIs) as names for things; (2) use the HyperText Transfer Protocol (HTTP) so that people can look up those names (URIs); (3) when someone looks up a URI, provide useful information using standards such as RDF and the SPARQL Protocol and RDF Query Language (SPARQL); and (4) include links to other URIs so that people can discover more things.
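Principles (1) to (3) can be illustrated with a small sketch: a thing is named by an HTTP URI, and a client looks that name up while asking for an RDF representation. The snippet below only prepares the request and does not send it. The DBpedia URI and the assumption that the server answers an Accept header of text/turtle with Turtle are illustrative assumptions, and build_lookup_request is a hypothetical helper.

```python
from urllib.request import Request


def build_lookup_request(uri: str, rdf_format: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF description.

    The Accept header expresses the preferred RDF serialization; whether a
    given server honours it is an assumption, not something checked here.
    """
    return Request(uri, headers={"Accept": rdf_format}, method="GET")


# Illustrative URI naming a thing (principle 1) over HTTP (principle 2).
request = build_lookup_request("http://dbpedia.org/resource/Munich")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup. The links contained in the returned description are what principle (4) refers to.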
This approach to publishing geospatial data adheres to the best practices for publishing, discovering, and using spatial data on the Web (Tandy et al., 2023; van den Brink et al., 2019). These practices include the adoption of the Linked Data principles by geospatial resources, as well as their modeling through ontologies, their representation in RDF, their indexing by search engines, and their integration with other resources (Heath & Bizer, 2011). These actions are fundamental to the transformation of geospatial data into geospatial KGs. Furthermore, these best practices (Tandy et al., 2023) support the FAIR principles (Wilkinson et al., 2016), a set of guidelines that emphasize the ability of computational systems to Find, Access, Interoperate, and Reuse data with no or minimal human intervention. For a detailed description of the FAIR principles, see the FAIR Principles page of the GO FAIR initiative.
Given these best practices and the Linked Data and FAIR principles, a wide range of resources, capabilities, and methodologies are being leveraged in the GIScience community to create, exploit, consume, and capitalize on geospatial information through KGs. The multitude of available geospatial KGs exemplifies their indispensable role within the Web of Data Cloud, where they act as central nodes that connect events, people, and objects (Mai et al., 2019) and provide an ever-growing semantic representation of the richness of geospatial information (Jovanovik, Homburg, & Spasić, 2021).
Geospatial KGs are stored and managed by triple stores, also known as RDF stores or knowledge bases. Triple stores are better equipped to handle several kinds of problems that relational databases were not designed for, including the issues mentioned above: queries with multiple joins across entities, inference on datasets (Weiss, Karras, & Bernstein, 2008), qualitative spatial reasoning, and the sharing of data schemas across multiple repositories.
In this context, SPARQL and GeoSPARQL are semantic query languages used to retrieve and process KGs. SPARQL is defined by the World Wide Web Consortium (W3C) and can retrieve and manipulate data stored in RDF. It can express queries across a range of data sources, whether the data are stored natively as RDF or exposed as RDF through middleware. A typical example is a query that retrieves cities and their associated countries from DBpedia, a cross-domain knowledge base that remains a core component of the Web of Data Cloud (Bizer et al., 2009).
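A query of this kind could be sketched as follows. This is a hedged illustration rather than the author's original query: the endpoint URL https://dbpedia.org/sparql and the DBpedia ontology terms dbo:City and dbo:country are assumptions about the current DBpedia deployment, and build_query_url is a hypothetical helper that encodes the query as a SPARQL Protocol GET request without sending it.

```python
from urllib.parse import urlencode

# Assumed public endpoint; adjust if the service is hosted elsewhere.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Basic graph pattern: every ?city typed as dbo:City together with the
# resource it points to through dbo:country (both terms are assumptions
# about the DBpedia ontology).
CITY_COUNTRY_QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?city ?country
WHERE {
  ?city a dbo:City ;
        dbo:country ?country .
}
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL Protocol GET URL (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(DBPEDIA_ENDPOINT, CITY_COUNTRY_QUERY)
print(url)
```

The SELECT clause names the variables to return, the WHERE clause contains the triple patterns to match against the graph, and LIMIT keeps the result set small while experimenting.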
The SPARQL 1.1 Federated Query extension enables users to construct queries over distributed data and to share data schemas through ontologies (see related topics) on the Web. It relies on the same language layer (RDF) and allows data to be retrieved from disparate KGs distributed across multiple SPARQL endpoints, which are services that accept SPARQL queries against data stored in RDF. An example is a federated query that retrieves information from two knowledge bases, Wikidata and LinkedGeoData, using geometric information to locate Automated Teller Machines (ATMs) belonging to the Bankcard-Servicenetz interbank network in Munich, Germany. The Wikidata initiative has compiled a collection of examples of (federated) queries.
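The structure of such a federated query can be sketched as below. The SERVICE keyword is part of SPARQL 1.1 Federated Query; everything else is an assumption made for illustration: the LinkedGeoData endpoint URL, the lgdo: and geom: vocabulary terms, the use of label matching to join the two sources, and the placeholder values passed for the membership property and the network entity (which are not real identifiers for Bankcard-Servicenetz). The sketch also omits the spatial restriction to Munich, since the filter function to use depends on the remote endpoint.

```python
from string import Template

# $remote_endpoint, $membership_property and $network_entity are filled in
# by the caller; all vocabulary terms below are illustrative assumptions.
FEDERATED_TEMPLATE = Template("""\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX geom: <http://geovocab.org/geometry#>

SELECT ?atm ?wkt ?bank
WHERE {
  # Remote part: evaluated by the second endpoint.
  SERVICE <$remote_endpoint> {
    ?atm a lgdo:Atm ;
         lgdo:operator ?operator ;
         geom:geometry [ geo:asWKT ?wkt ] .
  }
  # Local part: evaluated by the endpoint that receives the query.
  ?bank rdfs:label ?label ;
        $membership_property $network_entity .
  FILTER(STR(?label) = STR(?operator))
}
""")


def build_federated_query(remote_endpoint, membership_property, network_entity):
    """Fill the template; raises KeyError if a placeholder is missing."""
    return FEDERATED_TEMPLATE.substitute(
        remote_endpoint=remote_endpoint,
        membership_property=membership_property,
        network_entity=network_entity,
    )


query = build_federated_query(
    remote_endpoint="http://linkedgeodata.org/sparql",  # assumed endpoint
    membership_property="wdt:P463",                     # assumed property
    network_entity="wd:NETWORK_ITEM",                   # placeholder, not a real ID
)
print(query)
```

The design point is that one query text is sent to a single endpoint, which delegates the SERVICE block to the remote endpoint and joins the returned bindings with its own data.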
The Open Geospatial Consortium (OGC) has defined GeoSPARQL (Car et al., 2024) as a means of describing and querying geospatial data on the Semantic Web. The GeoSPARQL standard includes simple examples that represent a feature and its associated geometry.
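As a minimal sketch (not the example from the standard itself), the snippet below writes a feature and its point geometry in Turtle using the GeoSPARQL terms geo:Feature, geo:Geometry, geo:hasGeometry, and geo:asWKT. The ex: namespace, the identifier "munich", the coordinates, and the helper feature_to_turtle are hypothetical. Coordinates are written in longitude, latitude order, which is the default axis order for WKT literals without an explicit coordinate reference system.

```python
def feature_to_turtle(feature_id: str, lon: float, lat: float) -> str:
    """Serialize one point feature and its geometry as Turtle text."""
    return (
        "@prefix geo: <http://www.opengis.net/ont/geosparql#> .\n"
        "@prefix ex: <http://example.org/> .\n"
        "\n"
        # The feature is the real-world thing being described.
        f"ex:{feature_id} a geo:Feature ;\n"
        f"    geo:hasGeometry ex:{feature_id}_geom .\n"
        "\n"
        # The geometry is a separate node carrying the WKT serialization.
        f"ex:{feature_id}_geom a geo:Geometry ;\n"
        f'    geo:asWKT "POINT({lon} {lat})"^^geo:wktLiteral .\n'
    )


turtle = feature_to_turtle("munich", 11.5, 48.1)  # illustrative coordinates
print(turtle)
```

Keeping the feature and its geometry as two linked nodes allows one feature to carry several geometries, for example at different levels of detail.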
The GeoSPARQL 1.1 specification comprises three principal elements: (i) a compact spatial domain OWL (Web Ontology Language) ontology, which enables the direct representation of geometrical forms in conjunction with spatial entities and the establishment of relationships between features through the use of spatial relations; (ii) SPARQL extension function definitions, which facilitate the calculation of relations between spatial objects; and (iii) a range of supplementary resources, including vocabularies of Simple Feature types and data validators.
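Element (ii), the SPARQL extension functions, can be illustrated with a hedged sketch that builds a query using geof:sfWithin to keep only features whose geometry lies within a rectangular search area. The function geof:sfWithin and the geo: terms belong to GeoSPARQL; the helpers bbox_to_wkt and within_query and the bounding-box values are hypothetical, and the sketch assumes that the target store implements the GeoSPARQL functions.

```python
from string import Template

WITHIN_TEMPLATE = Template("""\
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt
WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  # Extension function: true when ?wkt lies within the search area.
  FILTER(geof:sfWithin(?wkt, "$area"^^geo:wktLiteral))
}
""")


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon ring for a lon/lat bounding box."""
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first vertex to close the ring
    ]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def within_query(area_wkt: str) -> str:
    """Build a GeoSPARQL query restricted to the given WKT area."""
    return WITHIN_TEMPLATE.substitute(area=area_wkt)


area = bbox_to_wkt(11.0, 48.0, 12.0, 49.0)  # illustrative bounding box
print(within_query(area))
```

Other simple-feature relations defined by GeoSPARQL, such as geof:sfIntersects or geof:sfContains, could be substituted in the FILTER in the same way.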
Developments and experiences in the GIScience community have demonstrated the feasibility of expressing federated queries over diverse KGs to retrieve and process geospatial data from disparate, distributed sources. Along these lines, Páez and Vilches-Blázquez (2022) proposed an approach to request, retrieve, and consume (geospatial) KGs available on different, distributed platforms that support the SPARQL 1.1 and GeoSPARQL standards. Their work provides various examples of federated queries using GeoSPARQL. In addition, this approach enables the consumption of geospatial KGs through lightweight web applications or a GIS application such as QGIS.
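One way query results might be handed to a lightweight web map or a desktop GIS is by converting them to GeoJSON. The sketch below is an assumption-laden illustration and not the approach of Páez and Vilches-Blázquez (2022): it takes a response in the SPARQL 1.1 Query Results JSON format, parses plain WKT points with a regular expression, and skips anything else (including WKT literals prefixed with a CRS IRI). The sample bindings, the variable names, and the helper bindings_to_geojson are hypothetical.

```python
import re

# Matches plain 2D points such as "POINT(11.5 48.1)"; other geometry types
# and literals that start with a CRS IRI are deliberately not handled.
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def bindings_to_geojson(results: dict, geom_var: str = "wkt") -> dict:
    """Convert SPARQL JSON result bindings into a GeoJSON FeatureCollection."""
    features = []
    for row in results["results"]["bindings"]:
        found = _POINT.match(row[geom_var]["value"])
        if found is None:
            continue  # skip rows whose geometry is not a plain WKT point
        lon, lat = float(found.group(1)), float(found.group(2))
        properties = {k: v["value"] for k, v in row.items() if k != geom_var}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical response with one usable row and one malformed geometry.
SAMPLE = {
    "head": {"vars": ["atm", "wkt"]},
    "results": {"bindings": [
        {"atm": {"type": "uri", "value": "http://example.org/atm/1"},
         "wkt": {"type": "literal", "value": "POINT(11.5 48.1)"}},
        {"atm": {"type": "uri", "value": "http://example.org/atm/2"},
         "wkt": {"type": "literal", "value": "not a geometry"}},
    ]},
}

collection = bindings_to_geojson(SAMPLE)
print(collection)
```

The resulting dictionary can be serialized with json.dumps and loaded as a vector layer by tools that read GeoJSON.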
Explain the main concepts and elements of SPARQL
Perform SPARQL queries on existing knowledge graphs
Present the GeoSPARQL ontology
Transform geospatial data to RDF using the GeoSPARQL ontology
Explain the SPARQL extension contained in GeoSPARQL
Execute diverse types of geospatial semantic queries