Do people communicate about their whereabouts? Investigating the relation between user-generated text messages and Foursquare check-in places

The social functionality of places (e.g. school, restaurant) partly determines human behavior and reflects a region’s functional configuration. Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces. Assuming that the functions of places influence how people verbalize, one possibility is to link those functions to verbal representations such as users’ postings in location-based social networks (LBSNs). In a recently published study, we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached to check-in places. We investigate Foursquare because its categorization hierarchy provides rich a priori semantic knowledge about check-in places, which enables a reliable verification of the semantic concepts identified from user-generated text snippets. A latent semantic analysis (LSA) is conducted on a large Foursquare check-in dataset. The results confirm that attached text messages can represent semantic concepts, demonstrating a strong correspondence to the official Foursquare venue categorization. To further elaborate on the representativeness of text messages, this work also investigates the textual terms to quantify their ability to represent semantic concepts (i.e., representativeness), and the semantic concepts to quantify how well they can be represented by text messages (i.e., representability). The results shed light on featured terms with strong locational characteristics, as well as on distinctive semantic concepts with potentially strong impacts on human verbalization.
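The general shape of such an analysis can be sketched in a few lines. The following toy example (our own illustration, not the paper's pipeline; the messages and vocabulary are invented) builds a small term-document matrix from hypothetical check-in snippets and factorizes it with SVD to obtain low-dimensional "concept" vectors per message:

```python
# A minimal LSA sketch (toy illustration, not the paper's pipeline):
# build a term-document count matrix from hypothetical check-in
# messages and factorize it with SVD to get latent concept vectors.
import numpy as np

messages = [
    "pasta wine dinner",        # hypothetical restaurant check-ins
    "pizza pasta dinner",
    "laps pool swimming",       # hypothetical pool check-ins
    "swimming pool practice",
]

vocab = sorted({t for m in messages for t in m.split()})
X = np.array([[m.split().count(t) for t in vocab] for m in messages], float)

# Truncated SVD: keep k latent dimensions ("semantic concepts")
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vecs = U[:, :k] * s[:k]  # each message mapped into the latent space

print(doc_vecs.shape)  # (4, 2): one 2-d concept vector per message
```

In practice one would use a weighted matrix (e.g. TF-IDF) and far more dimensions; the sketch only shows where the latent concept space comes from.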

Furthermore, we found that some terms are strongly associated with (and representative of) certain semantic concepts. In this study, we proposed an entropy-based approach to quantify the representativeness of terms (RQ 2), and identified representative terms such as justkeepswimming (Pool) and bowlathon (Bowling Alley), as well as unrepresentative terms such as just, really, time, and lol that may appear ubiquitously at any location.
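The intuition behind an entropy-based score can be sketched as follows (our reconstruction of the general idea, not the paper's exact formula; the category counts are invented): a term whose occurrences concentrate in one venue category has low entropy over categories, and thus high representativeness.

```python
# Sketch of an entropy-based representativeness score: a term
# concentrated in one venue category has low entropy, hence a
# score near 1; a term spread evenly over categories scores near 0.
import math

def representativeness(category_counts):
    """Map a term's counts per venue category to 1 - normalized entropy."""
    total = sum(category_counts.values())
    probs = [c / total for c in category_counts.values() if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(category_counts))
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0

# Hypothetical counts over three categories
focused = representativeness({"Pool": 98, "Restaurant": 1, "Bowling Alley": 1})
diffuse = representativeness({"Pool": 34, "Restaurant": 33, "Bowling Alley": 33})
print(focused, diffuse)  # high score vs. near-zero score
```

A term like justkeepswimming would behave like the first case, while fillers like lol would behave like the second.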

Finally, under the assumption that some semantic concepts have a stronger impact on users’ verbalizations and can thus be better represented by textual snippets due to their linguistic uniqueness, we proposed an approach based on cosine similarity to quantify the representability of semantic concepts (RQ 3). The representability scores were verified with a prediction experiment, and the results show that prediction precision is highly correlated with the representability score assigned by our approach.
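One way to picture a cosine-similarity-based score (our own illustration, not necessarily the paper's exact method; the vectors are invented) is to measure how tightly a concept's message vectors cluster around their centroid: a concept whose messages are linguistically homogeneous scores high.

```python
# Sketch: score a semantic concept by the mean cosine similarity
# between each of its message vectors and the concept's centroid.
import numpy as np

def representability(message_vectors):
    """message_vectors: (n_messages, n_dims) array for one concept."""
    centroid = message_vectors.mean(axis=0)
    norms = np.linalg.norm(message_vectors, axis=1) * np.linalg.norm(centroid)
    sims = message_vectors @ centroid / norms
    return sims.mean()

# Hypothetical latent vectors: a tight concept vs. a scattered one
tight = np.array([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]])
scattered = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
print(representability(tight), representability(scattered))
```

Under this sketch, a concept like Pool with its distinctive vocabulary would resemble the tight cluster, while a concept whose messages are generic chatter would resemble the scattered one.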

In general, our study comprehensively investigates the possibility of obtaining semantic knowledge about geographic locations from text messages. The findings on the representativeness of terms and the representability of semantic concepts can be used to improve the LSA model or other text mining approaches, e.g. by tuning the weighting scheme.
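As a hypothetical illustration of such tuning (the matrix and weights below are invented, not results from the study), representativeness scores could be used to down-weight ubiquitous terms before factorization by scaling the corresponding term columns:

```python
# Hypothetical weighting-scheme tuning: scale each term column of a
# term-document matrix by an assumed representativeness score, so
# ubiquitous terms contribute less to the factorization.
import numpy as np

# Rows = documents; columns = terms ("lol", "justkeepswimming")
X = np.array([[3.0, 0.0],
              [2.0, 1.0],
              [3.0, 2.0]])
term_weights = np.array([0.1, 0.9])  # assumed representativeness scores

X_weighted = X * term_weights  # broadcasts over columns
print(X_weighted)
```

The weighted matrix would then feed into LSA (or any other factorization) in place of the raw counts.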

It should be pointed out that the way we quantify representativeness is scale-dependent. For example, the term dinner is representative of a generic restaurant, but not of a specific type of restaurant. It can be expected that when semantic concepts are described at a coarser conceptual scale (e.g. without distinguishing exact restaurant types), the same term would exhibit much higher representativeness.

With the Foursquare dataset, an LSA model has been constructed from reliable prior knowledge. In principle, this model can detect latent semantic concepts of places from text messages of other sources such as Twitter tweets, and its feasibility has already been demonstrated in our prediction experiment. However, users may use different LBSN platforms for different reasons and in different scenarios, which may affect the model’s performance in cross-dataset usage. It would nevertheless be interesting to apply the model to datasets from other platforms, since such a comparison might reveal variations in usage patterns across platforms.

More details can be found in the full article (pages 159-172).

Here you can also find the other contributions of the GSIS Special Issue Crowdsourcing for Urban Geoinformatics: Geo-Spatial Information Science (GSIS), Volume 23, Issue 3, Taylor & Francis.