Exploring spatiotemporal and semantic clusters of Twitter data using unsupervised neural networks

The investigation of human activity patterns from location-based social networks like Twitter is a promising example of how to infer relationships and latent information for the characterization of urban structures. While there is a growing body of research performing spatial analysis on social media data, the high dimensionality, complexity and granularity of social media information still constitute an unresolved research issue. More precisely, user-generated datasets are multi-scale in nature, which limits the applicability of commonly used geospatial analysis methods.

Therefore, in a recently accepted paper, we propose an unsupervised neural network approach that discovers collective human activities from the geospatial, temporal and semantic characteristics of unstructured georeferenced tweets, using a combined geographic hierarchical self-organizing map (Geo-H-SOM).

The figure below shows semantically classified tweets posted within one year, vertically extruded as spikes and aggregated for two exemplary topics, T2-train and T3-game.

The results of our Geo-H-SOM method, which we validated in a case study, demonstrate the ability to explore, abstract and cluster high-dimensional geospatial and semantic information from crowdsourced data, overcoming limitations of previous, purely geospatial analysis approaches. In our case study, we show that similarities among spatiotemporal and semantic information reveal latent human activity patterns and are a proxy indicator for the characterization of underlying urban structures. Our combined Geo-SOM/H-SOM model considers tweets to be “similar” if they are close to each other in semantic space, in geographic space and in the time domain.
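To make the notion of "similar" more concrete, the following Python sketch combines a geographic, a temporal and a semantic distance into one score. It is only an illustration and not the Geo-H-SOM procedure from the paper: the `Tweet` fields, the topic-vector representation, the scale parameters and the equal weights are all assumptions chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Tweet:
    # Hypothetical representation, assumed for illustration only.
    lat: float
    lon: float
    hour: float               # time of day in hours, 0 <= hour < 24
    topics: Sequence[float]   # e.g. topic weights from a topic model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def cyclic_hour_gap(h1, h2):
    """Gap in hours on a 24-hour clock, so 23:00 and 01:00 are 2 hours apart."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (nu * nv))


def combined_distance(a: Tweet, b: Tweet,
                      geo_scale_km: float = 1.0,
                      time_scale_h: float = 1.0,
                      weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of scaled geographic, temporal and semantic distances.

    The scales and weights are assumptions, not values from the paper.
    """
    d_geo = haversine_km(a.lat, a.lon, b.lat, b.lon) / geo_scale_km
    d_time = cyclic_hour_gap(a.hour, b.hour) / time_scale_h
    d_sem = cosine_distance(a.topics, b.topics)
    w_geo, w_time, w_sem = weights
    return w_geo * d_geo + w_time * d_time + w_sem * d_sem
```

Two tweets score as similar only when all three terms are small. Note that a weighted sum is just one possible way to combine the three domains; the paper's model combines them through the Geo-SOM and H-SOM structure rather than a single formula like this one.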

SOMs learn and recognize patterns in a given dataset and can handle multiple input variables from diverse information sources (like Twitter) without explicit knowledge about urban structures. This gives them great potential for modeling and predicting behavior and relationships in various application domains, including the investigation of user activities and collective activity structures, the study and forecasting of human mobility flows, and event detection and prediction in disease, health and disaster management.
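For readers unfamiliar with SOMs, the basic learning rule can be sketched in a few lines of plain Python. This is a generic, minimal self-organizing map and not the Geo-H-SOM from the paper; the grid size, the linearly decaying learning rate and neighbourhood width, and the function names `train_som` and `best_matching_unit` are assumptions made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda k: sum((wi - xi) ** 2 for wi, xi in zip(weights[k], x)))


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every unit with a randomly chosen input sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                   # learning rate decays to 0
            sigma = max(sigma0 * (1 - frac), 0.3)   # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma * sigma))
                # Pull the unit towards the sample, more strongly near the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Two small, well-separated groups of 2-D points as toy input.
data = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (0.1, 0.1),
        (5.0, 5.1), (5.1, 5.0), (5.0, 5.0), (5.1, 5.1)]
som = train_som(data)
unit_a = best_matching_unit(som, (0.0, 0.0))
unit_b = best_matching_unit(som, (5.0, 5.0))
```

After training, inputs from the two groups map to different units on the grid, which is the sense in which a SOM clusters data without labels. In a setting like the one described here, the input vectors would carry spatial, temporal and semantic features instead of toy coordinates.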

Steiger, E., Resch, B., Zipf, A. (accepted 2015): Exploration of spatiotemporal and semantic clusters of Twitter data using unsupervised neural networks. International Journal of Geographical Information Science (IJGIS), volume and issue pending, pp. pending, Taylor & Francis. (Published online) doi:10.1080/13658816.2015.1099658