The scientific and engineering advancements of the 21st century pose grand computing challenges in managing big data, using complex algorithms to extract information and knowledge from big data, and simulating complex and dynamic physical and social phenomena. Cloud computing emerged as a new computing model with the potential to address these computing challenges. This entry first introduces the concept, features and service models of cloud computing. Next, the generalized architecture and service models of spatial cloud computing are elaborated to identify the characteristics, components, development and applications of spatial cloud computing for geospatial sciences.
Huang, Q. (2020). Spatial Cloud Computing. The Geographic Information Science & Technology Body of Knowledge (2nd Quarter 2020 Edition), John P. Wilson (ed.). DOI: 10.22224/gistbok/2020.2.7.
Cloud computing: a computing model for enabling ubiquitous, convenient, and on-demand network access to a shared pool of configurable computing resources (e.g., servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell and Grance 2009).
Spatial cloud computing: the cloud computing paradigm that is driven by geospatial sciences, and optimized by spatiotemporal principles for enabling geospatial science discoveries and cloud computing within distributed computing environment (Yang et al. 2011b).
Infrastructure as a Service (IaaS): As the most popular cloud service, IaaS provisions on-demand computing power, storage, networks, and other fundamental computing resources on which the cloud consumer is able to deploy and run arbitrary software. IaaS enables users to obtain, access, and control a cloud server as if it were a local server.
Platform as a Service (PaaS): PaaS provides cloud service capability for application development and deployment onto the cloud infrastructure based on a set of programming languages, libraries, services, and tools, configured as a solution by the provider. PaaS often spans the entire lifecycle of application development, including coding, testing, deployment, runtime, hosting and delivery (Hackett 2016).
Software as a Service (SaaS): As the most used cloud service, SaaS provides various capabilities of sophisticated applications that are traditionally delivered through the Web browser to end users (Armbrust et al. 2010).
2.1 Concepts
While the idea of cloud computing can be traced back to the 1950s, the conceptual model was formally proposed in the 1980s, development started in the 1990s, and successful cloud services only became popular within the past decade (Yang and Huang 2013, Voas and Zhang 2009). Driven by the cost-efficiency, auto-scaling and flexibility of the cloud, many organizations have migrated their information technology (IT) systems to cloud computing, while more IT enterprises are providing cloud services with their products (Armbrust et al. 2010). Given the heterogeneity of cloud services and the need for guidance to industry and agencies on offering or consuming cloud services, the National Institute of Standards and Technology (NIST) officially identified the standards for cloud computing, and defined cloud computing as “a model for enabling ubiquitous, convenient, and on-demand network access to a shared pool of configurable computing resources (e.g., servers, storage, networks, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell and Grance 2009).
Cloud computing is now considered a general term for anything that delivers hosted services over the Internet (Attaran and Woods 2019). It began with email services and was then expanded to include many other computing capabilities and resources as services (Banerjee et al. 2011). To date, the industry offers many different types of cloud services, ranging from the infrastructure level, such as Amazon Elastic Compute Cloud (Amazon EC2), to the application level, such as email and document sharing. In particular, cloud computing is often provided through three types of service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
2.2 Key Features
NIST outlines five essential characteristics of cloud computing: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service (Mell and Grance 2009). These five characteristics differentiate cloud computing from other distributed computing models, such as grid computing (Foster and Kesselman 2003).
2.3 Deployment Models
Based on how exclusively the computing resources are served to a cloud consumer, cloud computing platforms are often categorized into four types: public cloud, private cloud, community cloud and hybrid cloud (Liu et al. 2011).
Figure 1. Cloud platform types and software solutions. Image source: author.
3. Spatial Cloud Computing: Enabling Geospatial Applications with Cloud Computing
3.1 Concept
Compared with existing support for geospatial science research and applications, such as parallel or grid computing technologies that deliver only computing power, geospatial scientists could benefit more from cloud computing, since computing power is only one of its capabilities. However, it remains a significant challenge to fully exploit cloud computing to support geospatial science communities, mostly because geospatial applications differ from common applications in the IT field (e.g., accounting) and place specific requirements on the cloud computing platform. In particular, geospatial science problems are characterized by spatial constraints and principles at temporal and spatial scales (Yang et al. 2011b). A cloud computing platform that supports geospatial science applications should take those spatial principles and constraints into consideration to better leverage and optimize cloud computing infrastructure and services. Accordingly, Yang et al. (2011a) officially defined spatial cloud computing as “the cloud computing paradigm that is driven by geospatial sciences, and optimized by spatiotemporal principles for enabling geospatial science discoveries and cloud computing within distributed computing environment”.
Spatial cloud computing optimizes the selection of cloud data centers, schedules the computing tasks by minimizing delay and cost, and maximizes the performance of the computing tasks (Figure 2). To make full use of the elasticity, scalability, and high-end computing capabilities offered by cloud computing for a geospatial application, several spatial and spatiotemporal patterns need to be considered and integrated: 1) the physical location of computing resources, 2) the distribution of data, 3) the dynamic access of users at different locations and times, and 4) the study area of the application. In fact, a key technique for making big spatial data applications perform well is to consider the location, time, computing capabilities, data, and user characteristics (i.e., context) by leveraging these spatiotemporal patterns (Yang et al. 2017). For example, location-aware applications have been reported to outperform those without location-aware capability by a factor of 3-11 (Kozuch et al. 2009).
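As a rough illustration of how the physical location of computing resources and the locations of users might enter data center selection, the following Python sketch picks the site with the lowest demand-weighted mean great-circle distance to user clusters. It is a minimal sketch under stated assumptions, not the method of the cited works: the site names, coordinates, user weights, and the use of distance as a proxy for network delay are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def pick_data_center(data_centers, user_groups):
    """Return the site name with the lowest demand-weighted mean distance.

    data_centers: dict mapping site name -> (lat, lon)
    user_groups:  list of (lat, lon, weight), weight = expected request share
    Assumption: distance stands in for network delay; a real platform
    would measure latency and also account for cost and data location.
    """
    total_weight = sum(w for _, _, w in user_groups)

    def mean_distance(site):
        lat, lon = data_centers[site]
        weighted = sum(w * haversine_km(lat, lon, ulat, ulon)
                       for ulat, ulon, w in user_groups)
        return weighted / total_weight

    return min(data_centers, key=mean_distance)

# Hypothetical sites and user clusters, for illustration only.
data_centers = {
    "dc_washington": (38.9, -77.0),
    "dc_dublin": (53.3, -6.3),
    "dc_singapore": (1.3, 103.8),
}
user_groups = [
    (40.7, -74.0, 500),   # many users near New York
    (51.5, -0.1, 200),    # fewer users near London
    (35.7, 139.7, 100),   # a small group near Tokyo
]
best = pick_data_center(data_centers, user_groups)
```

With these made-up inputs the heavily weighted cluster near New York dominates, so the sketch selects the nearby site. Adding time-of-day weights to `user_groups` would be one way to reflect the dynamic access pattern mentioned above.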
3.2 Architecture
Figure 2 shows a generalized architecture for implementing a spatial cloud computing platform. To address the computing challenges posed by geospatial science models, and the big data challenges arising from observations and model output, the design and development of a spatial cloud platform should consider three aspects: (1) the underlying computing infrastructure, (2) the computing and geospatial functions independent of the domain applications, and (3) the application-level functions and interfaces directly accessible by the users (Figure 2).
Figure 2. A generalized architecture for implementing a spatial cloud computing platform. Image source: author.
First, the computing infrastructure can integrate both traditional high-performance cluster infrastructure and scalable cloud resources, which could be provisioned from a private cloud platform, a public cloud platform, or both. By leveraging cloud resources as the underlying computing infrastructure, a spatial cloud computing platform can scale up automatically to run scientific models and handle massive spatiotemporal data management, access, processing, analysis and visualization for different domain science applications. However, the popularity of cloud computing has produced many cloud vendors and cloud computing platforms, each with its own strengths and limitations. Meanwhile, many cloud-enabling tools and technologies (e.g., Eucalyptus, Cloudstack, and OpenNebula) are capable of transforming an organization's existing infrastructure into a private or a hybrid cloud (Huang et al. 2013b).
While all major public or private cloud resources can contribute to building a large-scale, flexible, dynamic computing pool, cloud platforms and solutions vary widely, making the selection and design of cloud infrastructure a major challenge. In particular, each platform may adopt different IT technologies (e.g., virtualization, storage) and have different computational capacities, scalability, price rules, security mechanisms, reliability, customization degree, usability and geographic distribution of cloud regions (Gui et al. 2014). As such, an in-depth evaluation based on these platform-specific factors, along with application features (e.g., data volume, data transfer speed, data communication and access frequency, computing intensity) and requirements (e.g., CPU, memory, storage, network, bandwidth, OS type, geolocation), should be performed to implement a platform that can satisfy the application requirements, minimize the computing cost, and maximize computation capacity provisioning.
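One simple way to picture such an evaluation is to filter candidate offerings by the application's hard requirements and then choose the cheapest feasible one. The Python sketch below does only that; it is an assumption-laden illustration rather than the evaluation procedure of Gui et al. (2014). The `Offering` fields, the offering names, and all prices are hypothetical, and a fuller evaluation would also weigh factors such as reliability, security, and usability.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Offering:
    """A hypothetical cloud offering; fields and values are illustrative."""
    name: str
    vcpus: int
    memory_gb: int
    storage_gb: int
    region: str
    hourly_cost: float

def select_offering(offerings: List[Offering], min_vcpus: int,
                    min_memory_gb: int, min_storage_gb: int,
                    allowed_regions: Set[str]) -> Optional[Offering]:
    """Return the cheapest offering that meets every hard requirement.

    Returns None when no offering is feasible.
    """
    feasible = [
        o for o in offerings
        if o.vcpus >= min_vcpus
        and o.memory_gb >= min_memory_gb
        and o.storage_gb >= min_storage_gb
        and o.region in allowed_regions
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.hourly_cost)

# Made-up candidates from imaginary public and private platforms.
candidates = [
    Offering("public_small", 2, 8, 100, "north_america", 0.10),
    Offering("public_large", 16, 64, 500, "north_america", 0.80),
    Offering("private_medium", 8, 32, 1000, "on_premises", 0.35),
    Offering("public_medium_eu", 8, 32, 500, "europe", 0.30),
]
choice = select_offering(candidates, min_vcpus=8, min_memory_gb=32,
                         min_storage_gb=400,
                         allowed_regions={"north_america", "on_premises"})
```

Here the geolocation constraint excludes the cheapest 8-vCPU option, so the sketch falls back to the next cheapest offering that satisfies all requirements.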
Second, the key component of a spatial cloud computing platform should offer both computing and geospatial services that enable the data, computing and model resources to be integrated within a cloud-based cyberinfrastructure environment. This component is often defined as spatial cloud computing middleware (SCCM), which hides the complexity of computing and data processing from the end users. The computing service provides a variety of functions to manage and leverage the underlying multi-sourced computing infrastructure, such as computing task scheduling, computing resource communication and management, interoperability among local IT infrastructure and different clouds, cloud resource operation and manipulation, cloud security control, and user authentication and authorization. Several essential computing functions to enable on-demand and flexible computing power of cloud computing are briefly introduced below.
Depending on the maturity of the cloud solutions adopted at the computing infrastructure level (Figure 2), the design and implementation of SCCM would differ considerably. Most public and private cloud solutions support the aforementioned computing functions to a certain degree, whereas their performance (e.g., the performance of virtualization technology in launching a VM) and implementability (i.e., the ease and feasibility of implementation and customization) vary greatly (Huang et al. 2013b). For example, while many cloud services provide elasticity mechanisms, utilizing the auto-balancing or auto-scaling capabilities of cloud computing to achieve elasticity requires complex configuration and development. For EC2 cloud infrastructure, users need to configure a complex JavaScript Object Notation (JSON) template file with many sophisticated parameters to define the resource scaling rules (e.g., when, where and how to scale up a VM). Private cloud platforms built on open source cloud solutions (e.g., Eucalyptus, Cloudstack) do not even support auto-scaling through the web console or user interface. However, both public and private cloud platforms provide APIs to implement those capabilities. To address this gap and enable easy use of cloud computing, the SCCM may implement an advanced cloud load-balancing function to elastically provide cloud resources for different data analysis and computing requirements. With this function enabling the easy or even automatic definition of parameters for scaling rules (e.g., the cloud regions in which the application scales up more resources, and the maximum number of VMs to be scaled up), the platform will support data analytics and visualization with specified computing resource information, security groups, and elastic rules.
To construct a spatial computing environment different from a common IT cloud platform, a key issue is how to incorporate the underlying resources through SCCM to support geospatial science applications. In particular, while managing, organizing, and scheduling both computing resources and instances, SCCM should apply the spatial principles and constraints to better leverage cloud computing performance for geospatial science problems. In addition to the computing capabilities common in a general cloud computing platform, geospatial services should be incorporated to provide a collection of spatial functions, and to address the issues of data and service integration and interoperability across different models, organizations and science domains. The geospatial service often includes the following functions:
Finally, the underlying computing power and services are often accessible through a user-friendly spatial cloud portal. This portal serves as a web-based spatial gateway for leveraging the underlying computing infrastructure and services, which are hidden from the cloud consumers, to support different domain applications. While the application-level functions may vary across different scientific problems (e.g., air quality, water), common functions include data access, data visualization, model configuration, model-run tracking and analysis, and data dissemination, which facilitate model runs, scientific discovery, and the sharing of results among the geospatial science communities. While cloud users (i.e., consumers) can access the cloud services through spatial cloud portals, only local users and administrators can directly access the private physical servers through the computing resource management interface or command line interface.
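To make the notion of a scaling rule more concrete, the Python sketch below models a rule with a few of the parameters mentioned above (when to scale, and the maximum number of VMs) and computes the number of VMs the rule would request. It is a minimal sketch under stated assumptions: the `ScalingRule` structure, its field names, and the CPU thresholds are hypothetical and do not reproduce the template format or API of EC2 or of any other cloud platform.

```python
from dataclasses import dataclass

@dataclass
class ScalingRule:
    """A hypothetical scaling rule; not any vendor's template format."""
    scale_up_cpu: float    # average CPU utilization (0-1) above which a VM is added
    scale_down_cpu: float  # average CPU utilization (0-1) below which a VM is removed
    min_vms: int           # never run fewer VMs than this
    max_vms: int           # never run more VMs than this

def desired_vm_count(current_vms: int, avg_cpu: float, rule: ScalingRule) -> int:
    """Return the VM count the rule asks for, given the current load.

    The pool changes by at most one VM per evaluation and is clamped
    to the [min_vms, max_vms] range.
    """
    if avg_cpu > rule.scale_up_cpu:
        target = current_vms + 1
    elif avg_cpu < rule.scale_down_cpu:
        target = current_vms - 1
    else:
        target = current_vms
    return max(rule.min_vms, min(rule.max_vms, target))

# Illustrative thresholds: add a VM above 80% CPU, remove one below 30%.
rule = ScalingRule(scale_up_cpu=0.8, scale_down_cpu=0.3, min_vms=1, max_vms=4)
after_spike = desired_vm_count(current_vms=2, avg_cpu=0.9, rule=rule)
```

A middleware layer could generate rules like this from a few user-facing choices and then translate them into calls to whatever scaling API the underlying platform exposes; that translation step is platform-specific and is not shown here.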
3.3 Spatial Cloud Service Models
In addition to the three cloud services (IaaS, PaaS, and SaaS) defined by NIST (Mell and Grance 2009), several cloud services were conceptualized and developed specifically in the geospatial science fields and are essential to geospatial applications, including Data as a Service (DaaS), Model as a Service (MaaS), Geoprocessing as a Service (GaaS), and Workflow as a Service (WaaS). These service models aim to enhance the delivery of data and data processing (DaaS), promote the sharing and interoperability of models (MaaS), enhance geoprocessing capabilities (GaaS), and ease the procedure of model configuration and runs (WaaS), and therefore greatly help the geospatial sciences share and reuse data, models, and knowledge across communities.