[CP-01-008] Spatial Cloud Computing

The scientific and engineering advancements in the 21st century pose grand computing challenges in managing big data, using complex algorithms to extract information and knowledge from big data, and simulating complex and dynamic physical and social phenomena. Cloud computing emerged as a new computing model with the potential to address these computing challenges. This entry first introduces the concept, features and service models of cloud computing. Next, the generalized architecture and service models of spatial cloud computing are elaborated to identify the characteristics, components, development and applications of spatial cloud computing for geospatial sciences.

Author and citation

Huang, Q. (2020). Spatial Cloud Computing. The Geographic Information Science & Technology Body of Knowledge (2nd Quarter 2020 Edition), John P. Wilson (ed.). DOI: 10.22224/gistbok/2020.2.7.

Explanation

Definitions
Cloud Computing Overview
Spatial Cloud Computing: Enabling Geospatial Applications with Cloud Computing

1. Definitions

Cloud computing: a computing model for enabling ubiquitous, convenient, and on-demand network access to a shared pool of configurable computing resources (e.g., servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell and Grance 2009).

Spatial cloud computing: the cloud computing paradigm that is driven by geospatial sciences, and optimized by spatiotemporal principles for enabling geospatial science discoveries and cloud computing within distributed computing environment (Yang et al. 2011b).

Infrastructure as a Service (IaaS): As the most popular cloud service, IaaS provisions on-demand computing power, storage, networks, and other fundamental computing resources where the cloud consumer is able to deploy and run arbitrary software. IaaS enables users to obtain, access, and control a cloud server as a local server.

Platform as a Service (PaaS): PaaS provides cloud service capability for application development and deployment onto the cloud infrastructure based on a set of programming languages, libraries, services, and tools, configured as a solution by the provider. PaaS often spans the entire lifecycle of application development, including coding, testing, deployment, runtime, hosting and delivery (Hackett 2016).

Software as a Service (SaaS): As the most used cloud service, SaaS provides various capabilities of sophisticated applications that are traditionally delivered through the Web browser to end users (Armbrust et al. 2010).

2. Cloud Computing Overview

2.1 Concepts

While the idea of cloud computing can be traced back to the 1950s, the conceptual model was formally proposed in the 1980s, development started in the 1990s, and successful cloud services only became popular within the past decade (Yang and Huang 2013, Voas and Zhang 2009). Driven by the cost-efficiency, auto-scaling and flexibility of the cloud, many organizations have migrated their information technology (IT) systems to cloud computing, while more IT enterprises are providing cloud services with their products (Armbrust et al. 2010). Given the heterogeneity of cloud services and the need for guidance to industry and agencies that offer or consume cloud services, the National Institute of Standards and Technology (NIST) officially identified the standards for cloud computing, and defined cloud computing as “a model for enabling ubiquitous, convenient, and on-demand network access to a shared pool of configurable computing resources (e.g., servers, storage, networks, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell and Grance 2009).

Cloud computing is now considered a general term for anything that delivers hosted services over the Internet (Attaran and Woods 2019). It began with serving email, and then expanded to include many other computing capabilities and resources as services (Banerjee et al. 2011). To date, the industry offers many different types of cloud services ranging from the infrastructure level, such as Amazon Elastic Compute Cloud (Amazon EC2), to the application level, such as email and document sharing. In particular, cloud computing is often provided through three types of service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

IaaS is the most popular cloud service, provisioning computing power, storage, networks, and other fundamental computing resources where the cloud consumer (i.e., user) can deploy and run arbitrary software. IaaS enables users to obtain, access, and control a cloud server as a local server. A popular IaaS representative is the EC2 service.
PaaS enables the development and deployment of applications onto the cloud infrastructure based on a set of programming languages, libraries, services, and tools, packaged as a solution supported by the cloud provider. PaaS often spans the entire lifecycle of application development, including coding, testing, deployment, runtime, hosting and delivery (Hackett 2016). A popular representative of PaaS is Windows Azure with Visual Studio provided by Microsoft.
SaaS is the most used cloud service, offering various capabilities of sophisticated applications that are traditionally delivered through the Web browser to end users (Armbrust et al. 2010). In this case, the cloud provider handles all aspects of the application, such as development, performance, maintenance, security, and availability. A well-known SaaS example is Google Apps, a suite of cloud products for communication and collaboration, including Gmail, Docs, Drive, Calendar and others.

2.2 Key Features

NIST outlines five essential characteristics of cloud computing: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service (Mell and Grance 2009). These five characteristics differentiate cloud computing from other distributed computing models, such as grid computing (Foster and Kesselman 2003).

On-demand self-service: Cloud computing often has a large computing resource pool for users to access on demand at the back-end. Such a pool gives users unprecedented computing power with minimal management effort and interaction with the cloud provider.
Broad network access: Cloud resources are available over the network and can be accessed through a simple web interface, and different types of network terminals (e.g., mobile phones, laptops and personal digital assistants).
Resource pooling: The cloud resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to the user and application requirements (Liu et al. 2011). Traditional IT systems used only 10% to 30% of their available computing power, whereas desktop computers utilize less than 5% of their full capacity (Marston et al. 2011). Cloud computing significantly improves the sharing of computing resources across organizations and boosts the utilization of computing resources up to 80% (Yang et al. 2011a). Meanwhile, with the shared resource model, cloud computing reduces the cost for both cloud providers and consumers to provide, purchase, operate, and maintain the computing resources.
Rapid elasticity: Within cloud services, applications can be configured to elastically acquire more resources to handle spike workloads and rapidly release the resources when the loads decrease (Huang et al. 2010). For example, EC2 provides an auto-scaling service, allowing cloud consumers to scale EC2 computing capacity up or down automatically according to pre-defined conditions, such as central processing unit (CPU) utilization and the number of concurrent user accesses.
Measured service: Cloud resource usage is automatically monitored, controlled, reported, and charged, creating transparency of the consumed services for both cloud provider and consumer. In general, cloud vendors charge computing usage by the hour without long-term commitments from consumers. In addition, cloud consumers can reserve cloud resources for the long term at a low, wholesale price. Further, some providers (e.g., Amazon EC2) even offer a bidding option for consumers to bid on unused cloud resources at an even lower price.
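
The rapid elasticity item above describes scaling capacity up or down according to pre-defined conditions such as CPU utilization. The following is a minimal sketch of such a rule in Python. The `ScalingRule` class, the `desired_instances` function, and all threshold values are hypothetical names and numbers chosen for illustration; they do not reproduce any provider's actual auto-scaling API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical threshold rule; all values are illustrative assumptions."""
    scale_up_cpu: float = 0.75    # add capacity above this average CPU utilization
    scale_down_cpu: float = 0.25  # release capacity below this utilization
    min_instances: int = 1
    max_instances: int = 10


def desired_instances(current: int, avg_cpu: float, rule: ScalingRule) -> int:
    """Return the instance count after applying the rule once."""
    if avg_cpu > rule.scale_up_cpu:
        target = current + 1
    elif avg_cpu < rule.scale_down_cpu:
        target = current - 1
    else:
        target = current
    # Clamp to the configured bounds so the pool never empties or overgrows.
    return max(rule.min_instances, min(rule.max_instances, target))
```

A monitoring loop could call `desired_instances` at a fixed interval and acquire or release one instance at a time. Changing capacity by a single instance per evaluation is a simple design choice that avoids large swings when the load fluctuates around a threshold.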

2.3 Deployment Models

Based on how exclusively the computing resources are served to a cloud consumer, cloud computing platforms are often categorized into four types: public cloud, private cloud, community cloud and hybrid cloud (Liu et al. 2011).

Figure 1. Cloud platform types and software solutions. Image source: author.

A public cloud is available for open access and use by the general public. Such a cloud system is usually provided, managed, and operated by a company, and the cloud resources are charged under a pay-as-you-go model. Therefore, the public cloud is also known as a commercial cloud. To date, the public cloud is probably the most popular and mature type of cloud offering.
A community cloud often serves a group of cloud consumers in a specific community with similar interests and concerns, such as mission, performance, security, privacy, compliance, and jurisdiction. It may be initiated for a single organization or multiple organizations, and managed internally or by a third party.
A private cloud infrastructure is exclusively provisioned for a single organization. While offering many of the same economic and operational benefits as the public cloud, the private cloud also allows companies or organizations to have full control over their computing infrastructure. Several open-source solutions are available to transform private physical infrastructure into a private cloud or to build a community cloud, such as Eucalyptus, CloudStack, OpenStack, and OpenNebula (Huang et al. 2013b).
Hybrid clouds are often established to meet specific concerns or needs with a composition of two or more of the aforementioned clouds (public, community or private cloud). IT software enterprises may build a hybrid cloud with two private cloud systems, where one serves as the official product system, and the other as the development and testing platform. Public and private clouds are also often combined to achieve both cost-efficiency and on-demand computing power. To facilitate the access and management of different private and public clouds, a few services and solutions have been developed. For example, enStratius delivers brokerage for more than 10 cloud platforms.

3. Spatial Cloud Computing: Enabling Geospatial Applications with Cloud Computing

3.1 Concept

Compared with current support for geospatial science research and applications, such as parallel computing or grid computing technologies that deliver only computing power, cloud computing could benefit geospatial scientists more, since computing power is only one of its capabilities. However, it remains a significant challenge to fully exploit cloud computing to support geospatial science communities, mostly because geospatial applications differ from common applications in the IT field (e.g., accounting) and place specific requirements on the cloud computing platform. In particular, geospatial science problems are characterized by spatial constraints and principles at temporal and spatial scales (Yang et al. 2011b). A cloud computing platform that supports geospatial science applications should take those spatial principles and constraints into consideration to better leverage and optimize cloud computing infrastructure and services. Accordingly, Yang et al. (2011a) officially defined spatial cloud computing as “the cloud computing paradigm that is driven by geospatial sciences, and optimized by spatiotemporal principles for enabling geospatial science discoveries and cloud computing within distributed computing environment”.

Spatial cloud computing optimizes the selection of cloud data centers, schedules the computing tasks by minimizing delay and cost, and maximizes the performance of the computing tasks (Figure 2). To make full use of the elasticity, scalability, and high-end computing capabilities offered by cloud computing for a geospatial application, several spatial and spatiotemporal patterns need to be considered and integrated: 1) the physical location of computing resources, 2) the distribution of data, 3) the dynamic access of users at different locations and times, and 4) the study area of the application. In fact, a key technique for making big spatial data applications perform well is to consider the location, time, computing capabilities, data, and user characteristics (i.e., context) by leveraging these spatiotemporal patterns (Yang et al. 2017). For example, location-aware applications have been reported to outperform those without location-aware capability by a factor of 3 to 11 (Kozuch et al. 2009).
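
As one illustration of using the physical location of computing resources and the locations of users together, the sketch below picks the data center with the smallest request-weighted mean great-circle distance to a set of users. This is a simplified assumption: distance stands in for network delay, and cost, data distribution, and access times are ignored. The function names and the input layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pick_data_center(centers, users):
    """Pick the center with the smallest request-weighted mean distance to users.

    centers: {name: (lat, lon)}; users: [(lat, lon, request_count)].
    Distance is used here as a rough proxy for network delay (an assumption).
    """
    total = sum(weight for _, _, weight in users)

    def mean_distance(name):
        clat, clon = centers[name]
        return sum(w * haversine_km(clat, clon, ulat, ulon) for ulat, ulon, w in users) / total

    return min(centers, key=mean_distance)
```

With two hypothetical centers, `{"east": (39.0, -77.5), "west": (45.6, -121.2)}`, and most requests coming from near Washington, DC, for example `[(38.9, -77.0, 80), (34.0, -118.2, 20)]`, the function selects `"east"`. A fuller scheduler would add terms for data transfer and price to the same objective.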

3.2 Architecture

Figure 2 shows a generalized architecture to implement a spatial cloud computing platform. To address the computing challenges posed by geospatial science models and the big data challenges from observations and model output, the design and development of a spatial cloud platform should consider three aspects: (1) the underlying computing infrastructure, (2) the computing and geospatial functions independent of the domain applications, and (3) the application level functions and interfaces directly accessible by the users.

Figure 2. A generalized architecture for implementing a spatial cloud computing platform. Image source: author.

First, the computing infrastructure can integrate both traditional high performance cluster infrastructure and scalable cloud resources, which could be provisioned from a private cloud platform, a public cloud platform or both. By leveraging cloud resources as underlying computing infrastructure, a spatial cloud computing platform can scale up automatically to run the scientific models, and handle the massive spatiotemporal data management, access, processing, analysis and visualization for different domain science applications. However, the popularity of cloud computing has produced many cloud vendors and cloud computing platforms, each with its own unique strengths and limitations. Meanwhile, many cloud-enabling tools and technologies (e.g., Eucalyptus, CloudStack, and OpenNebula) are capable of transforming an organization's existing infrastructure to a private or a hybrid cloud (Huang et al. 2013b).

While all major public or private cloud resources can contribute to building a large-scale, flexible, dynamic computing pool, cloud platforms and solutions vary widely, making the selection and design of cloud infrastructure a major challenge. In particular, each platform may adopt different IT technologies (e.g., virtualization, storage) and have different computational capacities, scalability, price rules, security mechanisms, reliability, customization degree, usability and geographic distribution of cloud regions (Gui et al. 2014). As such, an in-depth evaluation based on these platform-specific factors, along with application features (e.g., data volume size, data transfer speed, data communication and access frequency, computing intensity) and requirements (e.g., CPU, memory, storage, network, bandwidth, OS type, geolocation), should be performed to implement a platform that can satisfy the application requirements, minimize the computing cost, and maximize computation capacity provisioning.
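
A very reduced form of this evaluation can be sketched as a filter-then-minimize step: discard offerings that fail the application requirements, then choose the cheapest remaining one. The `Offering` fields and the `cheapest_match` function are hypothetical, and only three requirement types (CPU, memory, geolocation) and one objective (hourly cost) are modeled; a real evaluation would weigh many more of the factors listed above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Offering:
    """One hypothetical VM offering; real catalogs carry many more attributes."""
    platform: str
    region: str
    vcpus: int
    memory_gb: int
    hourly_cost: float


def cheapest_match(offerings: Iterable[Offering], min_vcpus: int,
                   min_memory_gb: int, allowed_regions: Set[str]) -> Optional[Offering]:
    """Return the lowest-cost offering that meets the requirements, or None."""
    feasible = [
        o for o in offerings
        if o.vcpus >= min_vcpus
        and o.memory_gb >= min_memory_gb
        and o.region in allowed_regions
    ]
    return min(feasible, key=lambda o: o.hourly_cost, default=None)
```

Returning `None` when nothing qualifies makes the unsatisfiable case explicit, so a caller can relax a requirement or fall back to another platform.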

Second, the key component of a spatial cloud computing platform should offer both computing and geospatial services that enable the data, computing and model resources to be integrated within a cloud-based cyberinfrastructure environment. This component is often defined as spatial cloud computing middleware (SCCM), hiding all the complexity of computing and data processing from the end users. The computing service provides a variety of functions to manage and leverage the underlying multi-sourced computing infrastructure, such as computing task scheduling, computing resource communication and management, interoperability among local IT infrastructure and different clouds, cloud resource operation and manipulation, cloud security control, and user authentication and authorization. Several essential computing functions that enable the on-demand and flexible computing power of cloud computing are briefly introduced below.

Virtualization: As one of the most important enabling technologies of cloud computing, virtualization enables cloud computing to create a dynamic number of computing instances (i.e., virtual machines [VMs]) above the physical infrastructure based on the needs of the application. Virtualization also makes a computing system capable of acquiring, operating, or releasing VMs in a manner such that each VM operates within its own unique system environment (e.g., operating system, applications) and the crash of any VM will not result in total system failure.
Networking: As the VMs and services should be publicly accessible over the network, the SCCM should accordingly provide networking capabilities, such as dynamically setting up public/private IPs, MAC addresses and domain names.
Scheduling: When customers request to create a VM or deploy an application, the SCCM needs to determine which physical machine should be utilized. Such functionality is called scheduling. In addition to local physical infrastructure, the SCCM can also connect to public cloud resources, such as Amazon EC2, to construct a hybrid cloud computing platform. While running computing tasks on local servers is typically free, leveraging cloud platforms has a cost. As such, the scheduler should consider the spatiotemporal patterns of computing resources, data, users and applications, as well as the costs of data access and processing, to better support geospatial science applications.
Security: A cloud solution should provide different levels of security for cloud operation and network isolation. Cloud operation security encompasses all security mechanisms related to the access and operation of a cloud platform. These include access to the cloud itself by users of different groups, such as cloud administrators, and cloud consumers. Without network isolation, tenants (i.e., cloud consumers) could be exposed to a large part of the network, access data on the network that does not belong to them, or invoke side-channel tenant attacks (Oracle 2012). A network design with proper resource control and security ensures these issues are well addressed.
Communication, monitoring, and management: In order to determine which physical machine is available, the SCCM should know the available memory and computing capacity of each physical machine. Thus, the SCCM should incorporate the capability of communicating with, monitoring and managing the physical computing resources. In addition, the SCCM needs to provide the same communicating, monitoring and managing capability for VMs and other cloud resources (e.g., storage) to ensure cloud operations.
Load balance: In geospatial applications, elasticity is especially essential since they may require the allocation of scalable computing resources dynamically. For example, responding to natural disasters (e.g., earthquakes, wildfires and tsunamis) requires elastically bringing up more computing resources to handle the spikes in requests from the public and decision makers (Huang et al. 2013a).
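
The scheduling and monitoring functions above can be illustrated with a small placement sketch. It assumes the SCCM already knows each host's free CPUs and memory (the monitoring function), treats local servers as zero-cost and public cloud hosts as priced, and uses distance to the input data as a tie-breaker. The `Host` fields and the `schedule_vm` function are hypothetical, and the "cost first, then proximity" ordering is only one possible policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """A hypothetical physical machine or cloud capacity pool tracked by the SCCM."""
    name: str
    free_cpus: int
    free_memory_gb: int
    hourly_cost: float          # 0.0 for local servers, > 0 for public cloud capacity
    distance_to_data_km: float  # distance between the host and the input data


def schedule_vm(hosts: List[Host], cpus: int, memory_gb: int) -> Optional[Host]:
    """Place a VM on a feasible host, preferring low cost, then data proximity."""
    feasible = [h for h in hosts if h.free_cpus >= cpus and h.free_memory_gb >= memory_gb]
    if not feasible:
        return None
    chosen = min(feasible, key=lambda h: (h.hourly_cost, h.distance_to_data_km))
    # Reserve the resources so later requests see updated availability.
    chosen.free_cpus -= cpus
    chosen.free_memory_gb -= memory_gb
    return chosen
```

With this policy, requests fill free local capacity first and spill over to the public cloud only when no local host can fit them, which matches the hybrid arrangement described in the scheduling item.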

Depending on the maturity of the cloud solutions adopted at the computing infrastructure level (Figure 2), the design and implementation of the SCCM would differ considerably. Most public and private cloud solutions support the aforementioned computing functions to a certain degree, whereas their performance (e.g., the performance of virtualization technology to launch a VM) and implementability (i.e., the ease and possibility of implementation and customization) vary greatly (Huang et al. 2013b). For example, while many cloud services provide elasticity mechanisms, utilizing the auto-balancing or auto-scaling capabilities of cloud computing to achieve elasticity requires complex configuration and development. For EC2 cloud infrastructure, users need to configure a complex JavaScript Object Notation (JSON) template file with many sophisticated parameters to define the resource scaling rules (e.g., when, where and how to scale up a VM). Private cloud platforms built on open source cloud solutions (e.g., Eucalyptus, CloudStack) do not even support auto-scaling through the web console or user interface. However, both public and private cloud platforms provide APIs to implement those capabilities. To address this gap and enable easy use of cloud computing, the SCCM may implement an advanced cloud load balance function to elastically provide cloud resources for different data analysis and computing requirements. With this function enabling the easy or even automatic definition of parameters for scaling rules (e.g., cloud regions for the application to scale up more resources, and the maximum number of VMs to be scaled up), the platform will support data analytics and visualization with specified computing resource information, security groups, and elastic rules.
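
To make the idea of an "easy definition of parameters for scaling rules" concrete, the sketch below builds a small JSON document from a handful of user-facing inputs (regions, maximum number of VMs, a CPU threshold). The field names form a hypothetical schema invented for this illustration; they are not the EC2 template format or any other provider's format, and an SCCM would still need to translate such a document into provider API calls.

```python
import json


def build_scaling_config(regions, max_vms, cpu_threshold=0.75, min_vms=1):
    """Build a scaling-rule document from a few user-facing parameters.

    The field names are a hypothetical schema, not a real provider template.
    """
    if not regions:
        raise ValueError("at least one region is required")
    if not 0.0 < cpu_threshold <= 1.0:
        raise ValueError("cpu_threshold must be in (0, 1]")
    if min_vms < 1 or max_vms < min_vms:
        raise ValueError("need 1 <= min_vms <= max_vms")
    return {
        "regions": list(regions),
        "min_vms": min_vms,
        "max_vms": max_vms,
        "scale_up_when": {"metric": "cpu_utilization", "above": cpu_threshold},
    }


def to_json(config):
    """Serialize the rule with stable key order for storage or submission."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Validating the inputs up front keeps malformed rules from reaching the cloud platform, where configuration errors are usually harder to diagnose.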

To construct a spatial computing environment different from the common IT cloud platform, a key issue is how to incorporate the underlying resources through the SCCM to support geospatial science applications. In particular, while managing, organizing, and scheduling both computing resources and instances, the SCCM should apply spatial principles and constraints to better leverage cloud computing performance for geospatial science problems. In addition to the computing capabilities common in a general cloud computing platform, geospatial services should be incorporated to provide a collection of spatial functions, and to address the issues of data and service integration and interoperability across different models, organizations and science domains. The geospatial service often includes the following functions:

Isolate and componentize core GIS data processing functions (e.g., geospatial data reprojection and format conversion), spatial analysis, and spatial statistics as services.
Standardize the interfaces to achieve data interoperability among communities.
Provide community tools that have consensus from different domain users for handling unique scientific data, such as Shapefile, GeoJSON, NetCDF, GRIB, and HDF-EOS datasets.
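
As a small example of the kind of core function that could be componentized as a service, the sketch below reprojects a longitude/latitude pair from WGS 84 (EPSG:4326) to spherical Web Mercator (EPSG:3857) and wraps a point as a GeoJSON Feature. The choice of these two operations and the function names are assumptions made for illustration; a production service would more likely delegate to an established projection library and support many coordinate systems and formats.

```python
import json
import math

WGS84_RADIUS_M = 6378137.0  # semi-major axis used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon, lat):
    """Reproject WGS 84 degrees (EPSG:4326) to Web Mercator meters (EPSG:3857)."""
    if not -90.0 < lat < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = WGS84_RADIUS_M * math.radians(lon)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def point_feature(lon, lat, properties=None):
    """Format conversion: wrap a coordinate pair as a GeoJSON Feature string."""
    feature = {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties or {},
    }
    return json.dumps(feature)
```

Exposing each function behind a standardized interface, as the second item suggests, lets different communities call the same reprojection or conversion step without sharing code.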

Finally, the underlying computing power and services are often accessible through a user-friendly spatial cloud portal. This portal serves as a web-based spatial gateway for leveraging the underlying computing infrastructure and services, which are hidden from the cloud consumers, to support different domain applications. While the application level functions may vary across different scientific problems (e.g., air quality, water), common functions include data access, data visualization, model configuration, model-run tracking and analysis, and data dissemination, which facilitate model runs, scientific discovery, and results sharing among the geospatial science communities. While a cloud user (i.e., consumer) can access the cloud services through spatial cloud portals, only local users and administrators can directly access the private physical servers through the computing resource management interface or the command line interface.

3.3 Spatial Cloud Service Models

In addition to the three cloud services (IaaS, PaaS, and SaaS) defined by NIST (Mell and Grance 2009), several cloud services have been conceptualized and developed specifically in geospatial science fields and are essential to geospatial applications, including Data as a Service (DaaS), Model as a Service (MaaS), Geoprocessing as a Service (GaaS), and Workflow as a Service (WaaS). These service models aim to enhance the delivery of data and data processing (DaaS), promote the sharing and interoperability of models (MaaS), enhance geoprocessing capabilities (GaaS), and ease the procedure of model configuration and runs (WaaS), and therefore greatly help the geospatial sciences share and reuse data, models, and knowledge across communities.

DaaS may refer to “Database as a service” for improved data storage, management and query (Mateljan, Cisic and Ogrizovic 2010), and “Discovery as a service” (Elgazzar, Hassanein and Martin 2014) for facilitating web service discovery in the literature. For example, Mateus et al. (2016) introduced the concepts of cloud spatial data warehouses and spatial online analytical processing as a service to better host databases, process analytical workloads and deliver database as a service. However, DaaS often means “Data as a Service”, addressing the issues of data discoverability, accessibility, utilizability, quality of services, pricing, and security. As data has become the enabling technology for many innovations, DaaS emerged to support data storage, discovery, access, and utilization, and deliver data and data processing on demand to end users without geographical and scalability limitations (Rajesh, Swapna and Reddy 2012). DaaS enables data-oriented innovations by better allocating and optimizing data, processing, computing resources and cloud operations.
The MaaS concept was first introduced by Roman et al. (2009) to improve model integration and interoperability across various disciplines and fields. This concept was then extended and detailed by Li et al. (2017), where cloud computing services were leveraged to address the model computability challenges over the Internet for the public. Specifically, MaaS builds and publishes various geospatial models as services, which can be accessed with an online interactive interface. Using a global climate change model, a MaaS prototype was developed to demonstrate how MaaS automates the processes of setting up the computing environment, configuring and running models, and managing model outputs (Li et al. 2017). Similarly, Wen et al. (2017) introduced a model-service deployment strategy that enables modelling participants to conveniently collaborate and make full use of modelling and computational resources across an open web environment. The Soil and Water Assessment Tool (SWAT) model is employed to illustrate the model-service deployment process and to demonstrate the collaborative process when performing modelling with resources provided by different stakeholders.
GaaS (Huang, Li and Li 2017) supports the processing of geospatial data before they can be used for modeling, data analysis and data mining. GaaS brings scalable, on-demand, and cost-effective geoprocessing services to geospatial users, and addresses big data challenges raised in the geospatial science domain. Handling big data requires high performance data processing tools enabling the extraction of knowledge from the unprecedented amount of data (Bellettini et al. 2013). However, existing IT infrastructure systems (e.g., grid computing) for data processing fall short in addressing many challenges (e.g., performance, data storage, and fault tolerance) while processing multi-sourced, heterogeneous, large-scale data, especially stream data (Zhang et al. 2015). As such, big data processing and analytics platforms based on cloud computing (Bellettini et al. 2013), MapReduce-based data processing frameworks (Ye et al. 2012), and open standards and interfaces (e.g., OGC standards) are widely used for massive data processing, displaying and sharing. For example, Kharouf et al. (2017) proposed a cloud-based geoprocessing architecture based on OGC standards. Within this architecture model, data management and geoprocessing can be bound together to store and process geospatial data from heterogeneous sources, and to provide an on-demand service based on the requirements of the end user in a cloud platform. Yue et al. (2013) compared geoprocessing in two cloud computing platforms, Microsoft Windows Azure and Google App Engine, from the perspective of data storage, architecture model, and development environment, recommended applications of hybrid geoprocessing clouds, and suggested an interoperable solution for geoprocessing cloud services.
WaaS (Huang et al. 2017) eases the procedure of model configuration and facilitates scientific model runs. While MaaS enables the computability and accessibility of a single model, WaaS allows geoscientists to build a model or service workflow and submit it for processing in the cloud system. The workflow recruits the needed models for a scientific application or task on the fly. Traditionally, researchers and scientists needed to build a large number of complex algorithms, models and applications tailored to their specific studies (Neuschwander and Coughlan 2002). However, the goals achieved by these efforts are not always completely different despite the differences in research objectives. If the models and applications are built as cloud services, which in turn can be shared, reused and customized by others through a WaaS, duplicated efforts can be largely eliminated. For example, Tang et al. (2017) developed a cyber-enabled spatial decision support system (SDSS) framework, integrating scientific workflows and cloud computing, to facilitate the computationally challenging fieldwork design that requires the quick selection of base camps and plots for the inventory of mangroves. Scientific workflows enable the automation of data and modeling tasks in the SDSS, whereas cloud computing provides on-demand computational support for interoperation among stakeholders for collaborative scenario evaluation for the fieldwork design of mangrove inventory. Similarly, Das et al. (2016) proposed a cloud-based geospatial orchestration framework to access and orchestrate spatial services to process complex GIS queries that involve massive geospatial data distributed geographically over multiple data centers and a collection of GIS operations (e.g., filtration, buffer creation, intersection). Such queries require a sequence of geospatial web services or models to execute in several virtual machines. Alternatively, Tan et al. (2016) introduced the concept of Agent-as-a-Service (AaaS)-based geospatial service aggregation, encompassing the mechanisms and algorithms for geospatial Web Processing Service (WPS) generation, geoprocessing and aggregation. An AaaS infrastructure allows separately hosted services and data to work together without transferring a large volume of spatial data, enriches geospatial service resources in the distributed environment by utilizing the agent cloning, migration and service regeneration capabilities of the AaaS, and enables the migration of services to target computing nodes to complete a task.
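
The WaaS idea of chaining models or services into a workflow can be sketched as a dependency graph executed in topological order. The example below uses Python's standard `graphlib` module (Python 3.9 or later); the `run_workflow` function and the step layout are hypothetical, each step is a plain local function standing in for a remote model or geoprocessing service, and nothing here reflects the implementation of the systems cited above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies, initial):
    """Run workflow steps in dependency order and collect their outputs.

    steps: {name: function(results_dict) -> value}, each a stand-in for a service call
    dependencies: {name: set of step names that must finish first}
    initial: dict of input values visible to every step
    """
    results = dict(initial)
    graph = {name: dependencies.get(name, set()) for name in steps}
    # static_order() yields each step only after all of its predecessors;
    # it raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        results[name] = steps[name](results)
    return results
```

A toy three-step chain named after the filtration, buffer, and intersection operations mentioned above could be declared as `dependencies = {"buffer": {"filter"}, "intersect": {"buffer"}}`, with each step reading its predecessor's output from the shared results dictionary. Declaring dependencies rather than a fixed sequence leaves room for independent steps to be dispatched to separate virtual machines.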

References

Learning outcomes

Describe the concepts and characteristics of cloud computing.
Describe the service models of spatial cloud computing, as well as the goals and key functions of each service model.
Discuss the differences between cloud computing and spatial cloud computing.
Explain the generalized architecture of spatial cloud computing, and the functions of each component.
Review different cloud service models.
Summarize the concepts of spatial cloud computing.
