Semi-structured text

Topics

  • [FC-03-032] Semantic Information Elicitation

    The past few decades have been characterized by an exponential growth of digital information resources. A considerable amount of this information is semi-structured, such as XML files and metadata records, or unstructured, such as scientific reports, news articles, and historical archives. These resources contain a wealth of latent knowledge in a form intended mainly for human readers. Semantic information elicitation refers to a set of related processes (semantic information extraction, linking, and annotation) that aim to make this knowledge explicit, helping computer systems make sense of the content and supporting ontology construction, information organization, and knowledge discovery.

    In the context of GIScience research, semantic information extraction aims to process unstructured and semi-structured resources and identify specific types of information: places, events, topics, geospatial concepts, and relations. These may then be linked to ontologies and knowledge bases to enrich the original content with well-defined meaning, provide access to information not explicit in the original sources, and support semantic annotation and search. Semantic analysis and visualization techniques are further employed to explore aspects latent in these sources, such as the historical evolution of cities, the progression of phenomena and events, and people’s perception of places and landscapes.
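
    As a minimal sketch of extraction, linking, and annotation working together, the Python below finds place names from a small gazetteer in free text, links each mention to a knowledge-base entry, and records character offsets that could serve as annotations. The gazetteer contents, the `kb:place/...` identifiers, and the names `PlaceAnnotation` and `annotate_places` are hypothetical. Real systems typically use trained named entity recognizers, far larger gazetteers, and context-based disambiguation rather than exact string matching.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: surface form -> (knowledge-base id, lat, lon).
# The "kb:place/..." identifiers are placeholders, not real KB URIs.
GAZETTEER = {
    "New York": ("kb:place/new_york", 40.71, -74.01),
    "York": ("kb:place/york", 53.96, -1.08),
    "Paris": ("kb:place/paris", 48.86, 2.35),
}


@dataclass
class PlaceAnnotation:
    surface: str      # text as it appears in the document
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    entity_id: str    # linked (hypothetical) knowledge-base identifier
    lat: float
    lon: float


# Longest names first, so that at a given position "New York" wins over "York";
# \b keeps "York" from matching inside words such as "Yorkshire".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in sorted(GAZETTEER, key=len, reverse=True))
    + r")\b"
)


def annotate_places(text: str) -> list[PlaceAnnotation]:
    """Extract gazetteer place names from text and link each to its KB entry."""
    annotations = []
    for match in _PATTERN.finditer(text):
        entity_id, lat, lon = GAZETTEER[match.group(0)]
        annotations.append(
            PlaceAnnotation(match.group(0), match.start(), match.end(), entity_id, lat, lon)
        )
    return annotations


if __name__ == "__main__":
    sample = "The museum moved from York to New York before opening a branch in Paris."
    for ann in annotate_places(sample):
        print(ann.surface, ann.start, ann.end, ann.entity_id)
```

    Trying longer names first is a crude way to prefer "New York" over the "York" it contains. It does not resolve genuinely ambiguous names (York in England versus York in Pennsylvania, for example), which would need surrounding context or a separate disambiguation step.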

  • [DM-03-074] Modeling Semi-Structured and Unstructured Spatial Data

    This chapter surveys semi-structured and unstructured geospatial data, emphasizing their formats, challenges, and analytical approaches. Semi-structured data formats, such as JSON, do not follow rigid schemas but retain internal organization that supports spatial processing. These formats underpin many widely used datasets, including OpenStreetMap, and can represent both object-based and network-based spatial models. Unstructured data, including text, imagery, sensor streams, and point clouds, lack standardized formatting and must be transformed or enriched before spatial analysis is possible. For instance, crowdsourced or drone-collected imagery can be processed using Structure from Motion (SfM) to reconstruct 3D surfaces and terrain models. Textual data, such as social media posts or institutional reports, can be mined for geographic content using natural language processing techniques like named entity recognition and geoparsing. The chapter also considers recent developments in AI, including deep learning methods for image classification, segmentation of point clouds, and modeling spatiotemporal patterns from sensor data. Finally, it discusses the emerging role of multimodal models that integrate visual and textual information in geospatial workflows. Together, these tools and methods enable the use of increasingly diverse data sources in spatial analysis, broadening both the scope and depth of geographic inquiry.
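
    The object-based and network-based views of semi-structured data can be sketched in Python. The JSON below is a hypothetical, hand-made sample whose layout (an `elements` array of nodes and ways with optional, free-form `tags`) is loosely modeled on OpenStreetMap exports; treat the exact field names as an assumption. The code reads tagged nodes as point objects and turns ways tagged `highway` into a weighted street graph, using the haversine great-circle distance as the edge weight. The function names are illustrative.

```python
import json
import math
from collections import defaultdict

# Hypothetical OSM-style sample (field layout is an assumption): nodes carry
# coordinates, ways list node ids, and "tags" are optional free-form key/values.
RAW = """
{"elements": [
  {"type": "node", "id": 1, "lat": 51.5000, "lon": -0.1200},
  {"type": "node", "id": 2, "lat": 51.5010, "lon": -0.1200},
  {"type": "node", "id": 3, "lat": 51.5010, "lon": -0.1185,
   "tags": {"amenity": "cafe", "name": "Corner Cafe"}},
  {"type": "node", "id": 4, "lat": 51.4995, "lon": -0.1210},
  {"type": "way", "id": 10, "nodes": [1, 2, 3], "tags": {"highway": "residential"}},
  {"type": "way", "id": 11, "nodes": [3, 1], "tags": {"highway": "footway", "surface": "gravel"}},
  {"type": "way", "id": 12, "nodes": [1, 4], "tags": {"barrier": "fence"}}
]}
"""

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_elements(raw):
    """Parse the JSON text and separate nodes (keyed by id) from ways."""
    nodes, ways = {}, []
    for element in json.loads(raw)["elements"]:
        if element["type"] == "node":
            nodes[element["id"]] = element
        elif element["type"] == "way":
            ways.append(element)
    return nodes, ways


def tagged_points(nodes, key):
    """Object-based view: nodes whose tags contain `key`, as (name, lat, lon)."""
    return [
        (node["tags"].get("name", "unnamed"), node["lat"], node["lon"])
        for node in nodes.values()
        if key in node.get("tags", {})  # tags are optional, so never assume they exist
    ]


def street_graph(nodes, ways):
    """Network-based view: undirected adjacency map {node_id: {neighbour_id: metres}}
    built from consecutive node pairs of ways tagged as highways."""
    graph = defaultdict(dict)
    for way in ways:
        if "highway" not in way.get("tags", {}):
            continue  # e.g. fences are not part of the street network
        ids = way["nodes"]
        for a, b in zip(ids, ids[1:]):
            d = haversine_m(nodes[a]["lat"], nodes[a]["lon"], nodes[b]["lat"], nodes[b]["lon"])
            graph[a][b] = d
            graph[b][a] = d
    return dict(graph)


if __name__ == "__main__":
    nodes, ways = split_elements(RAW)
    print(tagged_points(nodes, "amenity"))
    for node_id, neighbours in sorted(street_graph(nodes, ways).items()):
        print(node_id, {n: round(d, 1) for n, d in neighbours.items()})
```

    Because tags are optional and vary from element to element, the code reads them with `.get(..., {})` instead of assuming a fixed schema; this is the practical consequence of working with semi-structured rather than strictly tabular data.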