Analyzing the Changesets of OSM Newcomers

A new rate limit is introduced

The OpenStreetMap (OSM) API recently introduced a rate limit, which is designed to limit the impact of a particular kind of map vandalism: new users signing up and mass editing OSM data, e.g. by deleting many thousands of OSM objects or removing particular tags from a large number of OSM objects. As a first step, the rate limit is implemented by limiting the number of edits new accounts are allowed to make. For the details of the implementation you might want to take a look at the pull request which introduced this feature in a change to the config/settings.yml.

In short: On their first day of using OSM, a new user is only allowed to upload 1,000 edits per hour. Adding a new object to OSM often results in multiple edits from the database perspective. As such, 1,000 edits is roughly equivalent to adding 200 square buildings to OSM. The rate limit ramps up in a non-linear manner to 100,000 edits per hour over the first week after a user starts editing.
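The building estimate can be reproduced with a short calculation. This is only a sketch: it assumes that a simple four-cornered building is uploaded as four new nodes plus one new way, and that every created object counts as one edit. Neither assumption is spelled out in the text above.

```python
# Assumption (not stated in the text): a simple four-cornered building
# is uploaded as 4 new nodes plus 1 new way, each counted as one edit.
NODES_PER_BUILDING = 4
WAYS_PER_BUILDING = 1
EDITS_PER_BUILDING = NODES_PER_BUILDING + WAYS_PER_BUILDING

FIRST_DAY_LIMIT = 1_000  # edits per hour on a user's first day


def buildings_within_limit(limit: int, edits_per_building: int = EDITS_PER_BUILDING) -> int:
    """Number of whole buildings that fit under an hourly edit limit."""
    return limit // edits_per_building


print(buildings_within_limit(FIRST_DAY_LIMIT))  # 200
```

Buildings with more corners, or edits that also add tags to existing objects, would use up the budget faster.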

We conducted the following analysis to shed some light on the effect of this measure. The topic caught our attention when the Missing Maps community started to report that their organized humanitarian mapping activities were affected by the rate limit. We wanted to find out whether these reports were just a few exceptional cases or whether a larger number of mappers is affected.

OSM is a great project, and its community has embraced openness from the very beginning. Becoming an OSM contributor is as simple as signing up and agreeing to share your contributions under OpenStreetMap’s license. In accordance with OSM’s guidelines it is therefore important to carefully weigh any measure which restricts this openness. At the same time it is clear that vandalism can also pose a serious threat to OSM.

The main goal of this analysis is to provide some evidence on the following questions:

  • How many new users join OSM every day?
  • How many of the new users are likely to be affected by the rate limit?
  • Is there a rate limit which prevents vandalism, but will not affect regular mapping?

Our analysis is based on the OSM changeset database. You can set up such a database following the approach described in this GitHub repository. The database we were using contains data up to 2023-10-22. We would like to approach our questions from the following point of view:

  • If the rate limit had already been in place on 2023-01-01, how many users would have been affected by it as of 2023-10-22?

We have intentionally chosen a date range which covers only the time before the rate limit was introduced. Because any changeset which is blocked by the rate limit is no longer uploaded to the OSM database, we lack data about how many changesets are currently affected. However, using historic data from OSM we can estimate the effect of the rate limit.

How many new users join OSM every day?

First, we need to find out how many new OSM users actually contributed in 2023. Then, we will compare this to the overall number of OSM users who contributed in 2023. Let’s take a look at the numbers’ distribution over time.

Figure 1: Number of new OSM mappers per day

From Figure 1 we can learn that on average roughly 450 new users join OSM every day. The figure also tells us that there were some outliers in February 2023 and in August 2023. Nevertheless, we rarely see more than 1,000 new users joining OSM on a single day. In total more than 130,000 OSM users made their first edit in 2023. Overall, close to 6,000 users actively contribute to OSM per day on average. However, this value fluctuates over time.

Let’s summarize these results and try to provide a rough answer to our first question:

  • Out of an average number of 5,906 active mappers per day there are roughly 449 new users.
  • 7.6% of the daily OSM users are newcomers who could potentially be affected by the newly introduced rate limit, depending on how much they map on their first day.
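The share of newcomers among the daily active mappers could be derived from changeset records roughly as follows. This is a minimal sketch on made-up rows, not the query used for this analysis; the `(user_id, day)` row layout and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy rows: (user_id, day on which a changeset was created).
changesets = [
    (1, date(2022, 12, 30)),
    (1, date(2023, 1, 1)),
    (2, date(2023, 1, 1)),
    (3, date(2023, 1, 2)),
    (2, date(2023, 1, 2)),
]


def daily_newcomer_share(rows):
    """Return {day: (active_users, new_users, share_of_new_users)}."""
    rows = list(rows)

    # A user's first day is the earliest day with any changeset.
    first_day = {}
    for user, day in rows:
        if user not in first_day or day < first_day[user]:
            first_day[user] = day

    # Collect the distinct users that were active on each day.
    active = defaultdict(set)
    for user, day in rows:
        active[day].add(user)

    result = {}
    for day, users in sorted(active.items()):
        new = sum(1 for u in users if first_day[u] == day)
        result[day] = (len(users), new, new / len(users))
    return result


for day, stats in daily_newcomer_share(changesets).items():
    print(day, stats)

# The averages reported above give the same kind of ratio:
print(round(449 / 5906 * 100, 1))  # 7.6
```

Note that "new" here means "first changeset ever on that day", which requires the full changeset history and not only the changesets from 2023.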

How many of the new users are likely to be affected by the rate limit?

Simple approach: edits per day

Here we look at the new users. We treat as “suspects” those users who made their first edit since 2023-01-01 and made more than 1000 edits on their first day. These users could have hit the rate limit if it had already been in place. This is still a rough measure, as we take edits per day into account instead of edits per hour, but it has the benefit of relatively quick results with simple SQL queries. To see the effects of adjusting the rate limit, we ran the analysis with different values: 1000 / 1500 / 2000 / 2500 / 5000.
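The analysis itself relied on SQL queries; the logic of the simple approach can be illustrated in Python as below. This is a sketch under assumptions: the `(user_id, created_at, num_changes)` row layout, the function names, and the use of calendar days for "first day" are illustrative choices, and the rows are made up.

```python
from collections import defaultdict
from datetime import date, datetime

THRESHOLDS = (1000, 1500, 2000, 2500, 5000)


def first_day_totals(rows, since=date(2023, 1, 1)):
    """Sum the edits each new user made on the calendar day of their first changeset.

    rows: iterable of (user_id, created_at: datetime, num_changes: int).
    Users whose first changeset is older than `since` are ignored.
    """
    rows = list(rows)
    first_day = {}
    for user, created_at, _ in rows:
        day = created_at.date()
        if user not in first_day or day < first_day[user]:
            first_day[user] = day

    totals = defaultdict(int)
    for user, created_at, num_changes in rows:
        if first_day[user] >= since and created_at.date() == first_day[user]:
            totals[user] += num_changes
    return dict(totals)


def suspects_per_threshold(rows, thresholds=THRESHOLDS):
    """Count users whose first-day edits exceed each threshold."""
    totals = first_day_totals(rows)
    return {t: sum(1 for n in totals.values() if n > t) for t in thresholds}


# Hypothetical toy data.
rows = [
    (1, datetime(2023, 2, 1, 9, 0), 600),
    (1, datetime(2023, 2, 1, 15, 0), 700),
    (2, datetime(2023, 2, 3, 10, 0), 3000),
    (3, datetime(2023, 2, 5, 10, 0), 200),
    (3, datetime(2023, 2, 6, 10, 0), 9000),   # second day, not counted
    (4, datetime(2022, 12, 1, 10, 0), 8000),  # joined before 2023, ignored
]
print(suspects_per_threshold(rows))
```

User 1 in the toy data shows why this is only an upper bound: 1,300 edits on the first day, but spread over six hours, so an hourly limit of 1,000 would not have been hit.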

Figure 2: Daily “suspicious” new OSM mappers (simple approach: edits/day)

We now know that in total there could be up to 5,095 (out of 132,423) new users who could have been affected by a rate limit of 1000 edits per day. As expected this number gets smaller if you set a higher rate limit.

Advanced approach: edits per hour with moving time window

For the suspects identified above, we checked the edits that were made within a rolling window of one hour. If a user's changesets hit the limit within such a window, the respective changesets count as affected. This comes very close to how the OSM API implements the rate limit.
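A rolling one-hour window of this kind can be sketched as follows. This is an illustration, not the implementation of the OSM API nor the exact code of this analysis. It assumes one timestamp and one edit count per changeset, processes a single user at a time, and keeps counting changesets that would have been blocked, since historic data contains them all.

```python
from collections import deque
from datetime import datetime, timedelta


def affected_changesets(changesets, limit=1000, window=timedelta(hours=1)):
    """Return ids of changesets that push a user's edits over the limit.

    changesets: list of (changeset_id, created_at, num_changes) for ONE user.
    A changeset is affected if the edits of all changesets within the
    preceding window, including itself, add up to more than `limit`.
    """
    affected = []
    recent = deque()  # (created_at, num_changes) still inside the window
    running = 0       # sum of num_changes in `recent`

    for cs_id, created_at, num_changes in sorted(changesets, key=lambda c: c[1]):
        # Drop changesets that are one window or more in the past.
        while recent and recent[0][0] <= created_at - window:
            running -= recent.popleft()[1]
        recent.append((created_at, num_changes))
        running += num_changes
        if running > limit:
            affected.append(cs_id)
    return affected


# Hypothetical toy data for a single user.
t0 = datetime(2023, 3, 1, 10, 0)
user_changesets = [
    (1, t0, 400),
    (2, t0 + timedelta(minutes=20), 500),
    (3, t0 + timedelta(minutes=40), 300),  # 1,200 edits within 40 minutes
    (4, t0 + timedelta(hours=2), 100),     # window has emptied again
]
print(affected_changesets(user_changesets))  # [3]
```

The deque keeps the pass over a user's changesets linear after sorting, which matters when the check is repeated for several thousand users and several thresholds.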

Furthermore, we checked the users who have been blocked by the OSM Data Working Group (DWG). In the time frame of this analysis the DWG blocked 895 users who supposedly vandalized OSM. These users created 63,285 changesets.

Finally, we can check whether the changesets created by users who were blocked by the OSM Data Working Group overlap with the changesets affected by the OSM rate limit.

Figure 3: Daily “suspicious” new OSM mappers (advanced approach: edits/hour)

From the table and figure above we can learn two things:

  • 3,304 users would have hit the rate limit. As such, we estimate that a rate limit of 1000 edits per hour will affect about 2.5% of new OSM users.
  • Out of the affected users only a small number (294 out of 3,304) have actually been blocked by the OSM Data Working Group.

There are about 601 users who were blocked by the OSM Data Working Group, but would not have been affected by the rate limit. As the DWG usually acts on request, this might be expected. Non-newcomers can also be blocked for various reasons. It is important to note that the purpose of the rate limit was never to catch all vandals, but rather to limit the amount of damage one particular kind of vandalism can cause.
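The comparison between blocked and rate-limited users is a plain set overlap. The sketch below uses made-up user ids and a hypothetical helper name; the last lines only re-derive figures from the numbers reported above.

```python
def compare_blocked_and_affected(blocked: set, affected: set) -> dict:
    """Split users into the three groups discussed above."""
    return {
        "blocked_and_affected": len(blocked & affected),
        "blocked_only": len(blocked - affected),
        "affected_only": len(affected - blocked),
    }


# Hypothetical user ids, for illustration only.
blocked_by_dwg = {101, 102, 103}
hit_rate_limit = {103, 104, 105, 106}
print(compare_blocked_and_affected(blocked_by_dwg, hit_rate_limit))

# Consistency check on the reported totals:
print(895 - 294)                    # 601 users blocked but not rate limited
print(3304 - 294)                   # 3010 users rate limited but not blocked
print(round(294 / 3304 * 100, 1))   # 8.9 (% of affected users also blocked)
```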

Insights per OSM changeset hashtag

Here we check to what extent the rate limit has an effect on organized mapping activities, such as those carried out by humanitarian organizations using the HOT Tasking Manager. At mapathons, people who are often new to OSM and do not yet have an account come together to map. These mapathons aim to map a lot of buildings and roads during the event, which usually takes 1-3 hours. Many participants of a mapathon create their account at the event and start mapping after a short training session of 15-60 minutes. When the rate limit was introduced, several mappers participating in mapathons started to report that they or some of their co-mappers could not upload data to OSM due to the rate limit.

We analyzed the number of users and the number of changesets that would have hit the rate limit and filtered by the hashtags that were used. The hashtag hotosm-project-* is used to flag all changesets that belong to any HOT Tasking Manager project. We also used the hashtag missingmaps, which is widely used by many humanitarian organizations.
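Matching changesets against these two hashtags could look like the sketch below. It assumes that the hashtags of a changeset are already available as a list of strings; the labels, the normalization step, and the function names are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# Label -> shell-style pattern; the patterns follow the hashtags named above.
PATTERNS = {
    "hot_tasking_manager": "hotosm-project-*",
    "missingmaps": "missingmaps",
}


def normalize(tag: str) -> str:
    """Strip whitespace and a leading '#', and lower-case the hashtag."""
    return tag.strip().lstrip("#").lower()


def classify_hashtags(hashtags) -> set:
    """Return the labels whose pattern matches at least one hashtag."""
    tags = [normalize(t) for t in hashtags]
    return {
        label
        for label, pattern in PATTERNS.items()
        if any(fnmatchcase(t, pattern) for t in tags)
    }


print(classify_hashtags(["#hotosm-project-1234", "#MissingMaps"]))
print(classify_hashtags(["#mapathon"]))  # set()
```

A changeset can carry both hashtags at once, so the two groups overlap and their counts should not simply be added up.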

From the table above you can see that organized humanitarian mapping is very likely to be affected by the new rate limit:

  • About 1,900 users who used the changeset hashtag hotosm-project-* would have hit the rate limit. This is the majority (58%) of all users affected by the rate limit (1,904 out of 3,304 users). 790 users would have exceeded the rate limit by more than 500 edits.
  • Users participating in humanitarian mapping are responsible for a relatively small amount of the changesets (9,462 out of 58,193) that would have been blocked by the rate limit.
  • The effect of the rate limit on organized humanitarian mapping through the HOT Tasking Manager is stronger than the effect of the rate limit on users that were blocked by the OSM Data Working Group.

Which strategy can we choose to pick a rate limit which prevents vandalism, but will not affect regular mapping?

Higher Rate Limit

Our results show that adjusting the rate limit could have a positive impact on its effectiveness. Setting a higher rate limit than the one that is implemented right now could reduce the number of users and the number of changesets that are unintentionally blocked by the rate limit. For example, a rate limit of 2,000 edits per hour would reduce the number of affected HOT Tasking Manager users from 1,904 to 312.

Setting the rate limit to 5,000 edits per hour would reduce the overall number of affected users drastically from 3,304 to 352. At the same time, about 50% of the changesets affected by the current limit would still be blocked.
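Expressed as relative reductions, the two examples above work out as follows. The helper name is an assumption; the inputs are the user counts quoted in this section.

```python
def reduction_percent(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return round((before - after) / before * 100, 1)


# HOT Tasking Manager users, limit raised from 1,000 to 2,000 edits per hour.
print(reduction_percent(1904, 312))  # 83.6

# All affected users, limit raised from 1,000 to 5,000 edits per hour.
print(reduction_percent(3304, 352))  # 89.3
```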

Global versus regional rate limit

To investigate alternative options to prevent vandalism in OSM we wanted to see how the affected changesets are distributed globally. The first of our two maps shows the affected changesets for a limit of 1000 edits per hour, the second one for a limit of 5000 edits per hour.

Figure 4: Heat Map of potentially blocked changesets (1000 edits / hour)

Figure 5: Heat Map of potentially blocked changesets (5000 edits / hour)

Figure 6: Number of affected changesets per country (rate limit 1000 & 5000)

It is interesting to see that most affected changesets are located in only a few countries (Fig. 6). No matter which rate limit you choose, Ukraine would stand out on the map. This might indicate that instead of a global rate limit, which treats all changesets the same no matter where they are located, it might be useful to apply a rate limit only to certain pre-defined regions.
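To make the idea of a regional rate limit more concrete, here is a minimal sketch. Everything in it is hypothetical: the bounding box is a placeholder and not a proposed region, the function names are made up, and using the center point of a changeset is just one possible way to assign a changeset to a region.

```python
from typing import Optional

# Hypothetical regions as (min_lon, min_lat, max_lon, max_lat).
# The values are placeholders for illustration, not a proposal.
RESTRICTED_REGIONS = {
    "example_region": (20.0, 40.0, 30.0, 50.0),
}


def in_bbox(lon: float, lat: float, bbox) -> bool:
    """True if the point lies inside the bounding box (borders included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def effective_limit(lon: float, lat: float, regional_limit: int = 1000) -> Optional[int]:
    """Hourly edit limit for a changeset centered at (lon, lat), or None."""
    for bbox in RESTRICTED_REGIONS.values():
        if in_bbox(lon, lat, bbox):
            return regional_limit
    return None  # outside all restricted regions: no limit in this sketch


def would_be_blocked(edits_in_window: int, lon: float, lat: float) -> bool:
    """Apply the regional limit to the edits counted in the current window."""
    limit = effective_limit(lon, lat)
    return limit is not None and edits_in_window > limit


print(would_be_blocked(1500, 25.0, 45.0))  # True: inside the placeholder region
print(would_be_blocked(1500, 0.0, 0.0))    # False: outside, no limit applies
```

A real implementation would have to decide how to treat changesets that span several regions, which this sketch ignores.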

Conclusion

All in all we can see that a large number of users are very likely affected by the introduction of the OSM API rate limit. The results show that about 2.5% of all new OSM mappers are affected by the rate limit. The majority of these affected mappers make their first edits during a mapathon or using the HOT Tasking Manager.

Getting the rate limit right is a very difficult task. The goal of the rate limit was to make large scale vandalism much harder to accomplish, thus making it possible for the volunteers to keep track and clean up behind the vandals. Here we argue that the current limit should be subject to change to reduce the number of affected users while maintaining its goal to limit the extent of vandalism in OSM.

By adjusting the rate limit one could reduce the number of affected mappers, but still “catch” most of the affected changesets doing harm to OSM. We suggest adjusting the rate limit to 2500 or 5000 edits per hour. Implementing a location-aware rate limit could further minimize the number of false positives.

However, we understand that there is no perfect rate limit or perfect approach to limit the extent of vandalism in OSM. Nevertheless, it is important to find the right balance: measures that counteract vandalism should not accidentally stop newcomers from joining OSM and making their first edits.

The OSM community is currently discussing many further ideas on how to react to the threat of vandalism seen in the past months. From our perspective it is important that any proposed measure and its effects are evaluated before its implementation. Here we were able to show some of the unwanted side effects of the current rate limit, even though, as of now, we still lack knowledge and data about how effective the current rate limit is at limiting the extent of vandalism. The insights presented in this blog post are needed so that we can compare the current approach to all the other ideas and measures discussed.

Get in touch with us via ohsome@heigit.org if you have further questions about our analysis and its results. You can find a Jupyter notebook with the data analysis workflow on GitHub.

