Big data

Topics

  • [CP-05-023] Google Earth Engine

    Google Earth Engine (GEE) is a cloud-based platform for planetary-scale geospatial data analysis and communication.  By placing more than 17 petabytes of Earth science data in the same easy-to-use application as the tools needed to access, filter, analyze, and export them, GEE lets users explore and scale up analyses in both space and time without the hassles traditionally encountered in big data analysis.  Constant development and refinement have made GEE one of the most advanced and accessible cloud-based geospatial analysis platforms available, and its near-real-time data ingestion and flexible interface mean users can go from observation to presentation in a single window.

  • [CP-05-031] Apache Hadoop and Spark

    Apache Hadoop and Apache Spark are two leading frameworks for distributed big data processing that have significantly impacted geospatial analytics. Both systems use clusters of commodity hardware in a shared-nothing architecture to scale out horizontally, allowing massive spatial datasets to be processed in parallel. Hadoop popularized the MapReduce programming model and excels at batch processing of very large files. Spark is a newer engine that builds on some of Hadoop’s concepts but introduces in-memory data processing and a more flexible execution model, often yielding faster performance for many tasks. This entry focuses on the differences between Hadoop’s disk-based MapReduce approach and Spark’s in-memory approach, especially in the context of spatial (vector and raster) data processing. We also highlight several systems that extend Hadoop or Spark specifically for spatial data, and discuss emerging trends toward integrating big data frameworks with higher-level query processing.

  • [CV-05-019] Big Data Visualization

    As new information and communication technologies have altered so many aspects of our daily lives over the past decades, they have simultaneously stimulated a shift in the types of data that we collect, produce, and analyze. This changing data landscape is often referred to as "big data." Big data is distinguished from "small data" not only by its high volume but also by the velocity, variety, exhaustivity, resolution, relationality, and flexibility of the datasets. This entry discusses the visualization of big spatial datasets. Because many such datasets contain geographic attributes or are situated and produced within geographic space, cartography takes on a pivotal role in big data visualization. Visualization of big data is frequently and effectively used to communicate and present information, but it is in making sense of big data, that is, in generating new insights and knowledge, that visualization is becoming an indispensable tool, making cartography vital to understanding geographic big data. Although visualization of big data presents several challenges, human experts can use visualization in general, and cartography in particular, aided by interfaces and software designed for this purpose, to effectively explore and analyze big data.

  • [DC-02-004] Social Media Platforms

    Social media are a group of interactive Web 2.0 Internet-based applications that allow users to create and exchange user-generated content via virtual communities. Social media platforms have large user populations that generate massive volumes of digital footprints, which are valuable data sources for observing and analyzing human activities and behavior. This entry focuses on social media platforms that provide spatial information in different forms for Geographic Information Science and Technology (GIS&T) research. These platforms can be grouped into six categories: microblogging sites, social networking sites, content sharing sites, product and service review sites, collaborative knowledge sharing sites, and others. Four methods are available for capturing data from social media platforms: Web Application Programming Interfaces (Web APIs), Web scraping, digital participant recruitment, and direct data purchasing. This entry first gives an overview of the history, opportunities, and challenges related to social media platforms. Each category of social media platform is then introduced in detail, including platform features, well-known platform examples, and data capturing processes.

  • [PD-01-020] Real-time GIS Programming and Geocomputation

    Streaming data generated continuously by sensor networks, mobile devices, social media platforms, and other edge devices pose significant challenges to existing computing platforms, which must deliver high-throughput, low-latency data processing as well as scalable computing. This entry introduces a real-time computing and programming platform for time-critical GIS (Geographic Information System) applications. The platform integrates advanced streaming data processing software, such as Apache Kafka and Spark Streaming, to enable real-time data analytics. It can also be extended with GeoAI (Geospatial Artificial Intelligence) machine learning models that leverage both historical and streaming data for real-time prediction and intelligent geospatial analytics. Two real-time geospatial applications, flood simulation and climate data visualization, are introduced to demonstrate how real-time programming and computing can help tackle real-world problems with important societal impacts.
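
To make the idea of real-time geospatial analytics in the last entry more concrete, the following is a minimal sketch of one common streaming operation: counting point events per grid cell within fixed (tumbling) time windows. It is plain Python over an in-memory iterable and does not use the Apache Kafka or Spark Streaming APIs. The names `Event`, `grid_cell`, and `tumbling_window_counts`, the 60-second window, and the 1-degree cell size are all assumptions made for illustration, not part of the platform the entry describes.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

Cell = Tuple[int, int]


class Event(NamedTuple):
    """Hypothetical point observation, e.g. a sensor reading or geotagged post."""
    timestamp: float  # seconds since an arbitrary epoch
    lon: float        # longitude in decimal degrees
    lat: float        # latitude in decimal degrees


def grid_cell(lon: float, lat: float, cell_deg: float) -> Cell:
    """Map a coordinate to the integer index of its square grid cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def tumbling_window_counts(
    events: Iterable[Event], window_s: int, cell_deg: float
) -> Dict[int, Dict[Cell, int]]:
    """Count events per grid cell in non-overlapping windows of window_s seconds.

    Keys of the result are window start times; values map cell index to count.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for ev in events:
        # Each event belongs to exactly one window: the one containing its timestamp.
        window_start = math.floor(ev.timestamp / window_s) * window_s
        counts[window_start][grid_cell(ev.lon, ev.lat, cell_deg)] += 1
    return {start: dict(cells) for start, cells in counts.items()}


# Illustrative data only: three events near San Francisco, one near Paris.
sample = [
    Event(0.0, -122.42, 37.77),
    Event(12.5, -122.41, 37.78),
    Event(61.0, -122.42, 37.77),
    Event(75.0, 2.35, 48.86),
]
result = tumbling_window_counts(sample, window_s=60, cell_deg=1.0)
```

In a production streaming system the same grouping logic would run continuously over an unbounded stream and emit each window as it closes; this sketch processes a finite list so that the windowing and spatial binning steps are easy to inspect.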