Lowering the barriers for accessing distributed geospatial big data to advance spatial data science: the PolarHub solution
School of Geographical Sciences and Urban Planning
Arizona State University
Tempe, AZ 85287-5302, United States
Data is the crux of science. The widespread availability of big data today is of particular importance for fostering new forms of geospatial innovation. This paper reports a state-of-the-art solution to a key cyberinfrastructure research problem: providing ready access to big, distributed geospatial data resources on the Web. We first formulate this data-access problem and introduce its indispensable elements, namely identifying the cyber-location, spatial and temporal coverage, theme, and quality of a dataset. We then propose strategies to tackle each data-access issue and make the data more discoverable and usable for geospatial data users and decision makers. Among these strategies is large-scale web crawling, a key technique for automatically collecting online geospatial data, which are highly distributed, intrinsically heterogeneous, and known to be dynamic. To better understand the content and scientific meaning of the data, the solution incorporates space-time filtering, ontology-based thematic classification, and service quality evaluation. To serve a broad scientific user community, these techniques are integrated into an operational data crawling system, PolarHub, which is also an important cyberinfrastructure building block for effective data discovery. A series of experiments demonstrates the high performance of the PolarHub system. We expect this work to contribute significantly to building the theoretical and methodological foundation for data-driven geography and the emerging field of spatial data science.
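Of the techniques named in the abstract, space-time filtering is the most mechanical, so a small sketch may help make it concrete. The Python below is an illustrative assumption, not PolarHub's actual implementation: it supposes each crawled dataset is described by a hypothetical `Record` holding a WGS84 bounding box and a start/end date, and it keeps records whose extent intersects a query box (here, roughly the Arctic Circle and northward) and whose time span overlaps the query interval. The URLs are placeholders, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BBox:
    """Hypothetical axis-aligned bounding box in decimal degrees (WGS84)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        # Assumption: neither box crosses the antimeridian.
        return not (
            self.max_lon < other.min_lon
            or self.min_lon > other.max_lon
            or self.max_lat < other.min_lat
            or self.min_lat > other.max_lat
        )


@dataclass
class Record:
    """Hypothetical metadata record for one crawled geospatial service."""
    url: str
    bbox: BBox
    start: date
    end: date


def space_time_filter(records, query_bbox, q_start, q_end):
    """Keep records that intersect query_bbox and overlap [q_start, q_end]."""
    return [
        r
        for r in records
        if r.bbox.intersects(query_bbox) and r.start <= q_end and r.end >= q_start
    ]


# Usage: find Arctic datasets covering 2012-2013 (placeholder records).
arctic = BBox(-180.0, 66.5, 180.0, 90.0)
records = [
    Record("https://example.org/wms/sea-ice", BBox(-60, 70, 20, 85),
           date(2010, 1, 1), date(2015, 12, 31)),
    Record("https://example.org/wms/tropics", BBox(-20, -10, 20, 10),
           date(2012, 1, 1), date(2012, 12, 31)),
    Record("https://example.org/wms/old-arctic", BBox(0, 70, 40, 80),
           date(1990, 1, 1), date(1999, 12, 31)),
]
hits = space_time_filter(records, arctic, date(2012, 1, 1), date(2013, 12, 31))
print([r.url for r in hits])  # prints ['https://example.org/wms/sea-ice']
```

Here the tropical record fails the spatial test and the 1990s Arctic record fails the temporal test, so only one record survives. A production crawler would typically read these extents from service capabilities documents rather than hard-code them.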
Dr. Wenwen Li is Associate Professor of Computational Spatial Sciences at Arizona State University. She is the founding director of the Cyberinfrastructure and Computational Intelligence Lab (CICI, http://cici.lab.asu.edu) and Associate Director of the Spatial Analysis Research Center (SPARC). Her research focuses on developing new theories, methods, and tools to advance spatial data interoperability, real-time visualization of big data, and high-performance spatial analytics. She has published over 70 papers in leading journals and conference proceedings and has served as the principal investigator (PI) of multiple national projects funded by NSF, the United States Geological Survey, and the Open Geospatial Consortium. She received a 2015 NSF CAREER award, NSF's most prestigious award in support of early-career faculty.