Google’s Jigsaw unit is releasing the source code for Harassment Manager, an open source anti-harassment tool. The application, which is aimed at journalists and other public figures, uses Jigsaw’s Perspective API to let users sort through potentially abusive comments on social media platforms like Twitter. It is debuting as source code for developers to build on, with a functional application for Thomson Reuters Foundation journalists due to follow in June.
Harassment Manager currently works with Twitter’s API to combine moderation options, such as hiding tweet replies and muting or blocking users, with a bulk filtering and reporting system. Perspective assesses the “toxicity” of messages based on elements such as threats, insults, and profanity. The tool organises messages into queues on a dashboard, where users can deal with them in batches rather than one at a time through Twitter’s default moderation tools. They can opt to blur the text of the messages so they don’t have to read each one while doing so, and they can search for keywords in addition to using the automatically generated queues.
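To make the queueing idea concrete, here is a minimal Python sketch of how scored messages might be sorted into dashboard queues, blurred, and searched. It is not Harassment Manager’s actual code: the `ScoredMessage` type, the queue names, and the 0.8 and 0.5 cut-offs are all hypothetical, and the toxicity scores are assumed to have been obtained already from a classifier such as Perspective.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    author: str
    text: str
    toxicity: float  # assumed: a 0.0-1.0 score already returned by a classifier


# Hypothetical cut-offs, chosen only for illustration (highest first).
QUEUE_THRESHOLDS = [("high", 0.8), ("medium", 0.5), ("low", 0.0)]


def assign_queue(score):
    """Return the name of the first queue whose minimum the score meets."""
    for name, minimum in QUEUE_THRESHOLDS:
        if score >= minimum:
            return name
    return "low"  # fallback for out-of-range (negative) scores


def build_queues(messages):
    """Group messages by queue, most toxic first within each queue."""
    queues = {name: [] for name, _ in QUEUE_THRESHOLDS}
    for message in messages:
        queues[assign_queue(message.toxicity)].append(message)
    for queue in queues.values():
        queue.sort(key=lambda m: m.toxicity, reverse=True)
    return queues


def blur(text):
    """Mask every non-whitespace character so the text need not be read."""
    return "".join(ch if ch.isspace() else "█" for ch in text)


def search(messages, keyword):
    """Case-insensitive keyword search across message text."""
    needle = keyword.lower()
    return [m for m in messages if needle in m.text.lower()]


if __name__ == "__main__":
    inbox = [
        ScoredMessage("a", "you are awful", 0.91),
        ScoredMessage("b", "nice piece", 0.05),
        ScoredMessage("c", "this is dumb", 0.60),
    ]
    for name, queue in build_queues(inbox).items():
        print(name, [(m.author, blur(m.text)) for m in queue])
```

Grouping first and acting on a whole queue afterwards is what would let a user mute, block, or report many accounts in one step instead of handling each reply individually.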
Users can also receive a single report containing abusive messages, which serves as a paper trail for their employer or, in the case of unlawful content such as direct threats, law enforcement. Users will not be able to download a standalone programme for the time being. Instead, developers can freely create apps that incorporate its capabilities, and partners like the Thomson Reuters Foundation will launch services that use it.
Jigsaw announced Harassment Manager on International Women’s Day, emphasising that the tool is especially relevant to female journalists who endure gender-based abuse. It highlighted input from “journalists and activists with large Twitter presences” as well as non-profits like the International Women’s Media Foundation and the Committee To Protect Journalists. In a Medium post, the team says it hopes developers can tailor the tool for other at-risk social media users. “Our hope is that this technology provides a resource for people who are facing harassment online, especially female journalists, activists, politicians and other public figures, who deal with disproportionately high toxicity online,” the post states.
Google has previously used Perspective to automate moderation. It released a browser extension called Tune in 2019 that let social media users avoid seeing messages with a high likelihood of being toxic, and numerous commenting platforms (including Vox Media’s Coral) have used it to supplement human moderation. However, as we noted around the release of Perspective and Tune, the language analysis model has never been perfect. It sometimes misclassifies satirical content or fails to detect abusive messages, and Jigsaw-style AI can inadvertently associate terms like “blind” or “deaf”, which aren’t necessarily negative, with toxicity. Jigsaw itself has also been accused of having a toxic workplace culture, though Google has disputed the claims.
Harassment Manager, unlike AI-powered moderation on platforms like Twitter and Instagram, is not a platform-side moderation feature. It appears to be a sorting tool for dealing with the sometimes overwhelming volume of social media feedback, and it could be useful for people well outside journalism, even if they can’t use it for now.