Twitter started this experiment last May with a limited set of users on iOS. Now it's expanding to all users on Android and iOS. The company said the prompts will cover potentially harmful or offensive replies, such as insults, strong language, or hateful remarks, in English for now.

If the app's algorithm detects such a reply, it will ask the user to reconsider before sending. You can edit or delete the tweet, but if you're determined, you can still send it, profanities and all.

Twitter admitted that in last year's test, the algorithm failed to contextually distinguish between mean replies, sarcasm, and friendly banter. The team has since made changes based on that behavior, but there's still a chance the algorithm will get it wrong. In that case, you can tap the "Did we get this wrong?" link to submit your feedback. Twitter also considers how frequently you interact with the person you're replying to, to gauge whether the reply is mean or just meant as a joke.

Twitter said this method of prompting yielded encouraging results in its tests: 34% of people who saw a prompt chose to alter or delete their reply. That also means 66% sent their reply anyway. Plus, there are ways to modify words and fool the algorithm into treating a reply as clean. And since the feature doesn't cover languages other than English, multilingual users can still get away with abusive replies.

Despite all these hiccups, Twitter's new feature is a positive step toward reducing toxicity on the platform, even if it only brings down hateful comments by a few notches.
