Tuesday, March 28, 2023

New gitlab.freedesktop.org spamfighting abilities

As of today, gitlab.freedesktop.org allows anyone with a GitLab Developer role or above to remove spam issues. If you are reading this article a while after it's published, it's best to refer to the damspam README for up-to-date details. I'm going to start with the TLDR first.

For Maintainers

Create a personal access token with API access and save the token value as $XDG_CONFIG_HOME/damspam/user.token Then run the following commands with your project's full path (e.g. mesa/mesa, pipewire/wireplumber, xorg/lib/libX11):

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ damspam request-webhook foo/bar
# clean up, no longer needed.
$ pip uninstall damspam
$ rm $XDG_CONFIG_HOME/damspam/user.token
The damspam command will file an issue in the freedesktop/fdo-bots repository. This issue will be automatically processed by a bot and should be done by the time you finish the above commands, see this issue for an example. Note: the issue processing requires a git push to an internal repo - if you script this for multiple repos please put a sleep(30) in to avoid conflicts.

Once the request has been processed (and again, this should be instant), any issue in your project that gets assigned the label Spam will be processed automatically by damspam. See the next section for details.

For Developers

Once the maintainer for your project has requested the webhook, simply assign the Spam label to any issue that is spam. The issue creator will be blocked (i.e. cannot login), this issue and any other issue filed by the same user will be closed and made confidential (i.e. they are no longer visible to the public). In the future, one of the GitLab admins can remove that user completely but meanwhile, they and their spam are gone from the public eye and they're blocked from producing more. This should happen within seconds of assigning the Spam label.

For GitLab Admins

Create a personal access token with API access for the @spambot user and save the token value as $XDG_CONFIG_HOME/damspam/spambot.token. This is so you can operate as spambot instead of your own user. Then run the following command to remove all tagged spammers:

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ damspam purge-spammers
The last command will list any users that are spammers (together with an issue that should make it simple to check whether it is indeed spam) and after interactive confirmation purge them as requested. At the time of writing, the output looks like this:
$ damspam purge-spammers
0: naughtyuser              : https://gitlab.freedesktop.org/somenamespace/project/-/issues/1234: [STREAMING@TV]!* LOOK AT ME
1: abcuseless               : https://gitlab.freedesktop.org/somenamespace/project/-/issues/4567: ((@))THIS STREAM IS IMPORTANT
2: anothergit               : https://gitlab.freedesktop.org/somenamespace/project/-/issues/8778: Buy something, really
3: whatawasteofalife        : https://gitlab.freedesktop.org/somenamespace/project/-/issues/9889: What a waste of oxygen I am
Purging a user means a full delete including all issues, MRs, etc. This is nonrecoverable!
Please select the users to purge:
[q]uit, purge [a]ll, or the index: 
     
Purging the spammers will hard-delete them and remove anything they ever did on gitlab. This is irreversible.

How it works

There are two components at play here: hookiedookie, a generic webhook dispatcher, and damspam which handles the actual spam issues. Hookiedookie provides an HTTP server and "does things" with JSON data on request. What it does is relatively generic (see the Settings.yaml example file) but it's set up to be triggered by a GitLab webhook and thus receives this payload. For damspam the rules we have for hookiedookie come down to something like this: if the URL is "webhooks/namespace/project" and damspam is set up for this project and the payload is an issue event and it has the "Spam" label in the issue labels, call out to damspam and pass the payload on. Other rules we currently use are automatic reload on push events or the rule to trigger the webhook request processing bot as above.

This is also the reason a maintainer has to request the webhook. When the request is processed, the spambot installs a webhook with a secret token (a uuid) in the project. That token will be sent as header (a standard GitLab feature). The project/token pair is also added to hookiedookie and any webhook data must contain the project name and matching token, otherwise it is discarded. Since the token is write-only, no-one (not even the maintainers of the project) can see it.

damspam gets the payload forwarded but is otherwise unaware of how it is invoked. It checks the issue, fetches the data needed, does some safety check and if it determines that yes, this is spam, then it closes the issue, makes it confidential, blocks the user and then recurses into every issue this user ever filed. Not necessarily in that order. There are some safety checks, so you don't have to worry about it suddenly blocking every project member.

Why?

For a while now, we've suffered from a deluge of spam (and worse) that makes it through the spam filters. GitLab has a Report Abuse feature for this but it's... woefully incomplete. The UI guides users to do the right thing - as reporter you can tick "the user is sending spam" and it automatically adds a link to the reported issue. But: none of this useful data is visible to admins. Seriously, look at the official screenshots. There is no link to the issue, all you get is a username, the user that reported it and the content of a textbox that almost never has any useful information. The link to the issue? Not there. The selection that the user is a spammer? Not there.

For an admin, this is frustrating at best. To verify that the user is indeed sending spam, you have to find the issue first. Which, at best, requires several clicks and digging through the profile activities. At worst you know that the user is a spammer because you trust the reporter but you just can't find the issue for whatever reason.

But even worse: reporting spam does nothing immediately. The spam stays up until an admin wakes up, reviews the abuse reports and removes that user. Meanwhile, the spammer can happily keep filing issues against the project. Overall, it is not a particularly great situation.

With hookiedookie and damspam, we're now better equipped to stand against the tide of spam. Anyone who can assign labels can help fight spam and the effect is immediate. And it's - for our use-cases - safe enough: if you trust someone to be a developer on your project, we can trust them to not willy-nilly remove issues pretending they're spam. In fact, they probably could've deleted issues beforehand already anyway if they wanted to make them disappear.

Other instances

While we're definitely aiming at gitlab.freedesktop.org, there's nothing in particular that requires this instance. If you're the admin for a public gitlab instance feel free to talk to Benjamin Tissoires or me to check whether this could be useful for you too, and what changes would be necessary.