Clustering and Analyzing Air Pollution Data using Self-Organizing Maps

Today Enrico Steiger is presenting the following paper at the AGILE 2016 conference in Helsinki:

Lee, M., Steiger, E., Zipf, A. (2016): Clustering and Analyzing Air Pollution Data using Self-Organizing Maps. 19th AGILE Conference on Geographic Information Science. Helsinki, Finland.

In Geographic Information Science, the growing availability of spatial data has paved the way for increased research into spatial data mining techniques. This paper presents a Spatial Self-Organizing Map (Spatial SOM) for analysing high-dimensional and complex spatial datasets. The results of the selected case study with air pollution data for the US demonstrate that Spatial SOMs are an effective and efficient analysis tool with the ability to explore, detect, and visualize latent spatial structures and characteristics within large datasets.
In this study, our objective was to contribute insights into overcoming the challenges of spatial big data. Specifically, we aimed to study how spatial SOM algorithms can be applied and assessed for handling large datasets, and how latent structures can be uncovered in big data with spatial dependence.
The detailed analysis of the Spatial SOM result demonstrates that the Spatial SOM is an effective tool for detecting clusters with spatial dependence in the data. It is able to detect variation in chemical species across the continental United States. In conclusion, the case study validates the effectiveness of the Spatial SOM as an analysis tool for discovering hidden relationships not only in general attribute features but also in spatial features.
Using a case study with air pollution data collected from sensors, we demonstrated that Spatial SOMs can detect underlying latent spatial and chemical structures and covariates. In other SOM variants, spatial attributes are usually treated like any other attribute and given equal weight. In a Spatial SOM, however, more weight is given to the spatial attributes, so that input vectors (observations) are first aggregated largely by geographical proximity and then fine-tuned by their chemical features.
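
To make the weighting idea concrete, here is a minimal Python sketch of a best-matching-unit search in which the spatial components of each vector count more than the chemical ones. It is not the implementation from the paper: the function name `weighted_bmu`, the parameter `spatial_weight`, the weighted squared Euclidean distance, and the toy numbers are all assumptions chosen for illustration.

```python
import numpy as np


def weighted_bmu(x, codebook, n_spatial, spatial_weight=5.0):
    """Return the index of the best-matching unit for observation x.

    Assumption for this sketch: the first `n_spatial` components of every
    vector are (normalized) coordinates, the rest are chemical features,
    and the distance is a weighted squared Euclidean distance.
    """
    weights = np.ones(codebook.shape[1])
    weights[:n_spatial] = spatial_weight  # emphasize geographic proximity
    # Weighted squared distance from x to every unit in the codebook.
    d2 = ((codebook - x) ** 2 * weights).sum(axis=1)
    return int(np.argmin(d2))


# Hypothetical two-unit codebook: columns are (x, y, species_a, species_b).
codebook = np.array([
    [0.0, 0.0, 0.9, 0.9],  # unit 0: closer in space, different chemistry
    [1.0, 1.0, 0.1, 0.1],  # unit 1: farther in space, identical chemistry
])
observation = np.array([0.3, 0.3, 0.1, 0.1])

# Equal weights: the chemical match dominates.
bmu_equal = weighted_bmu(observation, codebook, n_spatial=2, spatial_weight=1.0)
# Heavier spatial weight: the geographically closer unit is chosen.
bmu_spatial = weighted_bmu(observation, codebook, n_spatial=2, spatial_weight=5.0)
```

With equal weights the observation maps to the chemically identical but more distant unit. With the heavier spatial weight it maps to the geographically closer unit, which is the "proximity first, chemistry second" behaviour described above.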