According to Statista-gathered data, 18 million hate speech content items were actioned by a single major social media platform in the second quarter of 2023 alone. Yet this is far from the peak of 31.5 million content pieces actioned in the second quarter of 2021. And these numbers are only the tip of the iceberg when it comes to Internet hate speech.
Data from the Anti-Defamation League shows that 52% of US adults have been harassed online at some point in their lives. The same report states that 33% of adults and 51% of teens experienced online harassment during the preceding twelve months.
Fighting hate speech and online harassment has become an important element of legislation around the world, with multiple countries penalizing expressions of hatred toward people based on factors such as skin color, sexual orientation, religion, or ethnic background. Yet fighting it, especially at the scale mentioned above, is extremely difficult. But not impossible, especially for AI-powered tools.
How AI can support the process
Internet hate propagation and online hate speech have been issues since nearly the dawn of the online world, with the first examples emerging in the early 1990s. Initially, the problem was largely ignored, yet time has shown that isolated internet communities can inspire violence in the real world as well as online.
This can be tackled by moderation teams, which are usually understaffed, considering the nearly endless amount of content they need to scan. AI can support them by:
Automatic reading of content
Machines are tireless and ever-vigilant. Thus, an AI-powered system can scan content continuously and in a fully automated way, which increases the probability of spotting hateful content. With AI systems in place, it is much more feasible to process the flood of content fast enough to prevent hateful messages from wreaking havoc.
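To make this concrete, below is a minimal sketch of such a scanning loop. The classifier is a pluggable placeholder (the toy keyword function merely stands in for a real model), and the threshold value is an illustrative assumption:

```python
# Minimal sketch of an automated scanning loop. Any model that maps text
# to a hate-speech probability can be plugged in as the classifier.
from typing import Callable, Iterable, Iterator, Tuple

def scan_stream(
    messages: Iterable[str],
    classify: Callable[[str], float],   # returns probability that text is hateful
    threshold: float = 0.8,             # assumed cut-off; tune on validation data
) -> Iterator[Tuple[str, float]]:
    """Yield (message, score) pairs for messages the model considers hateful."""
    for text in messages:
        score = classify(text)
        if score >= threshold:
            yield text, score

# Trivial keyword-based stand-in for a real trained model:
def toy_classify(text: str) -> float:
    return 1.0 if "hate" in text.lower() else 0.0

for msg, score in scan_stream(["hello there", "I hate you"], toy_classify):
    print(f"flagged ({score:.2f}): {msg}")
```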
Flagging suspicious messages
When processing the flow of language, the automated system can pre-process messages and provide the moderation team with flagged items for later manual inspection. Instead of reading through everything, the team can focus on the final stage of the work and thereby increase efficiency.
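As an illustration, a flagging step might look like the sketch below, where everything above a low suspicion threshold lands in a priority queue so that moderators see the most suspicious messages first. The function names and thresholds are assumptions for illustration, not part of any specific production system:

```python
# Sketch of a review queue: the model pre-scores everything, moderators only
# see messages above a low "suspicion" bar, highest scores first.
import heapq

def build_review_queue(scored_messages, suspicion_threshold=0.3):
    """scored_messages: iterable of (score, text) pairs; returns a heap."""
    flagged = [(-score, text) for score, text in scored_messages
               if score >= suspicion_threshold]
    heapq.heapify(flagged)   # most suspicious message is popped first
    return flagged

queue = build_review_queue([(0.9, "borderline slur"), (0.1, "nice day"), (0.5, "hmm")])
while queue:
    neg_score, text = heapq.heappop(queue)
    print(f"needs review ({-neg_score:.2f}): {text}")
```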
Real-time reaction
Last but not least, some messages are toxic enough to be automatically removed by the system. This task is traditionally performed by automated systems looking for keywords in a text, which, as a downside, struggle to identify more deeply coded or obfuscated hate messages. AI-powered systems can deal with such coded language far more effectively.
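A sketch of how simple obfuscation can be undone before matching is shown below; the leetspeak map and the blocklist entry are illustrative placeholders, and a real system would pair this normalization with an ML classifier for deeper coded language:

```python
# Sketch of keyword matching hardened against simple obfuscation
# (leetspeak substitutions, separator characters).
import re

LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})
BLOCKLIST = {"slurword"}   # placeholder entry, not a real term list

def normalize(text: str) -> str:
    text = text.lower().translate(LEET)
    return re.sub(r"[^a-z]", "", text)   # drop separators like '.' or '_'

def should_auto_remove(text: str) -> bool:
    flat = normalize(text)
    return any(term in flat for term in BLOCKLIST)

print(should_auto_remove("s.l.u.r.w.0.r.d"))   # True: the obfuscation is undone
```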
Challenges for AI Hate Speech Detection
While powerful, modern LLMs and AI-powered solutions still face challenges that affect their performance.
Cultural aspects
One of the challenges regarding hate speech is the fact that it can be highly contextual and follow non-obvious patterns. Some communities, for example, may see a term as offensive when used by someone outside of their community and non-offensive when used internally. Likewise, in particular communities, words normally recognized as vulgar can be meant as praise when used in the right context.
False positives
The high contextuality of hate speech can lead to over-vigilance, with every suspicious message flagged as hateful, which may paralyze the flow of conversation. Users may also get annoyed when their messages are constantly flagged and, sometimes, their accounts suspended.
False negatives
The opposite of the above, false negatives are harmful messages that manage to sneak through. Their existence may be a source of great frustration for users, especially when paired with false positives – why are harmless comments flagged as suspicious while harmful ones are left unchecked?
Yet the modern development of machine learning-based solutions can tackle the challenge of contextuality – all due to the hard work and brilliant minds of Tooploox researchers!
Tooploox research paper
To tackle the challenge of the contextuality of speech and hate speech, the Tooploox team has built a system that correlates a message with the speaker’s pre-identified profile to better understand the speaker’s true intentions.
Our approach
One of the main challenges with hate speech classification is that the majority of detection systems work in a binary, 0-1 manner, considering a particular word or sentence either hateful or not. In reality, hate speech is better seen as a spectrum: speakers may have no intent to offend, and some terms are derogatory in one community but not in another.
This approach had to be split into two separate stages:
- Building speaker profiles using existing information about them
- Correlating the user’s speech patterns with their profile so that the system can assess the sentiment of a particular sentence or word used.
The stages above are represented in the model by the Profile Extractor, which identifies and retrieves the matching profile, and the Text Encoder, which analyzes the text itself.
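As a rough illustration of the two-component idea, the PyTorch sketch below fuses a profile vector with a text embedding before classification. The dimensions, the use of plain concatenation, and the MLP head are assumptions made for clarity, not the exact architecture from the paper:

```python
# Minimal sketch: combine a Text Encoder embedding with a Profile Extractor
# vector before classifying. Dimensions and fusion method are illustrative.
import torch
import torch.nn as nn

class ProfileAwareClassifier(nn.Module):
    def __init__(self, text_dim=768, profile_dim=32, hidden=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + profile_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # single logit: how hateful the text is
        )

    def forward(self, text_emb, profile_emb):
        # text_emb: (batch, text_dim) from the Text Encoder
        # profile_emb: (batch, profile_dim) from the Profile Extractor
        fused = torch.cat([text_emb, profile_emb], dim=-1)
        return torch.sigmoid(self.head(fused))

model = ProfileAwareClassifier()
scores = model(torch.randn(4, 768), torch.randn(4, 32))  # -> (4, 1) in [0, 1]
```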
Profile making with the Profile Extractor
The Profile Extractor delivers information about the user's profile by analyzing their general tone in conversations. This includes deviations from the most common patterns, metadata about the user, and their unique identifier. The profile also includes historical evaluations and other features where available. Obviously, the more information available, the richer and more reliable the profile.
The system later analyzes the gathered information to create a corresponding profile.
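A minimal sketch of what such profile building could look like is given below; the chosen features (typical tone, deviation from the community average, volatility, and amount of history) are illustrative assumptions rather than the paper's actual feature set:

```python
# Sketch: turn a user's message history into a fixed-size profile vector.
import numpy as np

def build_profile(past_scores: list, community_mean: float) -> np.ndarray:
    scores = np.asarray(past_scores) if past_scores else np.zeros(1)
    return np.array([
        scores.mean(),                    # user's typical tone
        scores.mean() - community_mean,   # deviation from the common pattern
        scores.std(),                     # volatility of tone
        np.log1p(len(past_scores)),       # how much history backs the profile
    ])

profile = build_profile([0.1, 0.2, 0.05], community_mean=0.15)
print(profile)
```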
Language analysis with Text Encoder
The language is analyzed with transformer language models, in which every token has its own vector representation of meaning. To assess the sentiment of a whole message, the token outputs are weighted and summed, and the result is divided by the number of tokens (a form of mean pooling). This function can be fine-tuned later as necessary.
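The sketch below shows this kind of pooling with a standard Hugging Face encoder: token vectors are masked, summed, and divided by the token count. The model name is only an example, and the plain attention-mask weighting is a simplification of the weighting described above:

```python
# Sketch of mean pooling over transformer token embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state       # (1, tokens, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)          # ignore padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean over tokens

sentence_vec = embed("this is a perfectly polite sentence")  # shape (1, 768)
```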
Datasets used
One of the challenges regarding hate speech recognition is the availability of data that contains both hate and non-hate speech in a way that allows the difference between them to be spotted and examined. The system required such datasets for both training and validation. The datasets used include (but aren't limited to):
- Wikipedia-Detox – a crowd-sourced dataset containing one million annotations covering 100,000 discussions, where hate speech and personal attacks appear alongside polite and neutral exchanges (a label-aggregation sketch follows this list).
- Emotion Simple – a dataset consisting of over 100 texts (opinions posted on various websites) marked on ten scales by 5,365 annotators. The marked aspects include joy, fear, surprise, disgust, trust, and anger.
- Emotion Meanings – a vast collection of 6000 assessed words that are placed on Robert Plutchik’s wheel of emotions.
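For crowd-sourced data like Wikipedia-Detox, a single label per comment can be derived from the many individual annotations by majority vote, as in the sketch below. The file and column names follow the publicly released TSVs but should be treated as assumptions here:

```python
# Sketch: collapse per-annotator labels into one majority-vote label per
# comment. Assumes a TSV with rev_id and attack (0/1) columns.
import pandas as pd

annotations = pd.read_csv("attack_annotations.tsv", sep="\t")
labels = (
    annotations.groupby("rev_id")["attack"]
    .mean()        # fraction of annotators who saw an attack
    .gt(0.5)       # majority vote -> boolean label
    .rename("is_attack")
)
print(labels.head())
```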
The effect
The experiments performed by the Tooploox research team showed that adding personalization and profile extraction significantly reduces the uncertainty of hate speech recognition. As a result, the number of false positives and false negatives dropped significantly in the examined text samples.
Practical applications
The system can be easily applied to multiple use cases, especially where detecting sentiment and hate speech is crucial. These include:
- Social media hate speech recognition – hate speech on social media can be used as a form of cyberviolence against individuals or social groups. The system can spot early signs that hate is on the rise.
- Marketing sentiment analysis – accurate emotion identification and sentiment analysis can be crucial for marketing teams, allowing specialists to reshape their communication efforts to attract customers.
- Extremist content detection – global security can be enhanced by spotting rises in extremist content in public internet spaces, preventing individuals from joining or engaging in malicious activities.
Summary
The system can be used in various contexts and use cases, improving the quality of discussions, limiting the impact of hate speech, and encouraging users to be more polite to each other. More about the solution can be found in the research paper.