How good can predictions from check-ins be?

Check-in data, such as that provided by Foursquare, is one kind of social media feed that has gained considerable interest in recent years. This interest is partly due to its high degree of semantic detail, given that users check in at places which are categorized by a relatively well-defined taxonomy. One prevalent associated task, in research as well as in practical applications, is predicting human behaviour from such datasets. Assessing the success of predictions is, however, complicated, because the theoretical predictability inherent to such datasets has remained largely unknown so far. In a recent study published in Geoinformatica, we investigated the bounds of this inherent predictability for Foursquare datasets, with respect to their power in forecasting future check-ins in space and time. We found that, for three exemplary yet representative datasets from Chicago, New York City and Los Angeles, the predictability lies in the interval [27%, 92%]. This result indicates that even the worst prediction algorithms will reach a certain level of accuracy. Further, it allows the relative performance of prediction algorithms to be estimated. We also investigated the influence of check-in frequencies on predictability. Our results show that the check-in frequency of individual users has little or no effect. That is, the majority of users tend to be relatively regular in their online check-in behaviour. In contrast, the check-in frequencies associated with places and time slots are negatively correlated with predictability. In other words, the cumulative mixture of people contributing to places and time slots leads to an increased level of semantic complexity, which in turn lowers predictability. However, the latter outcome also indicates great leverage effects and therefore offers potential for improving prediction algorithms.
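To give a feel for how an upper bound on predictability can be computed, here is a minimal sketch. It assumes an approach that is common in the predictability literature: estimate the entropy of a check-in sequence and solve Fano's inequality for the maximum achievable accuracy. This summary does not state that the study used exactly this procedure, so treat it as an illustration only. The function names `shannon_entropy` and `max_predictability` are hypothetical, and the entropy used here ignores temporal order, which is a simplification.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (in bits) of the category frequencies in a sequence.

    Assumption: temporal order is ignored, so this is the
    time-uncorrelated entropy of the check-ins.
    """
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_predictability(entropy, n_states, tol=1e-9):
    """Solve Fano's equation for the upper bound p on prediction accuracy:

        entropy = H_b(p) + (1 - p) * log2(n_states - 1)

    where H_b is the binary entropy. Solved by bisection on [1/n_states, 1],
    where the right-hand side decreases from log2(n_states) to 0.
    """
    if n_states < 2:
        return 1.0  # a single possible state is always predictable

    def fano(p):
        h_b = 0.0
        if 0.0 < p < 1.0:
            h_b = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
        return h_b + (1.0 - p) * math.log2(n_states - 1)

    # Clamp the entropy to the range that the equation can represent.
    entropy = min(max(entropy, 0.0), math.log2(n_states))
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano(mid) > entropy:
            lo = mid  # right-hand side too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical check-in categories for one user (made-up data).
    checkins = ["cafe", "office", "cafe", "gym", "office", "cafe", "bar", "office"]
    s = shannon_entropy(checkins)
    print(max_predictability(s, n_states=len(set(checkins))))
```

A user who visits every category equally often yields the lowest bound (1 divided by the number of categories), while a user who always checks in at the same kind of place yields a bound of 1.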

Li, M., Westerholt, R., Fan, H. et al. (2016): Assessing spatiotemporal predictability of LBSN: a case study of three Foursquare datasets. Geoinformatica. Online First, Nov 2016. doi:10.1007/s10707-016-0279-5.

You can also check out a freely accessible version here: