Trust and Safety policies rely on two primary enforcement methods: content moderation and behavioral moderation. In many cases, content moderation is the primary method due to a variety of factors, including available third-party support, market expertise and the nature of keyword-based technologies. As the tactics of malicious actors seeking to evade content moderation continue to evolve, it’s essential that online safety professionals develop the skills and understanding to employ a behavioral moderation approach to safeguard their communities and platforms.
How does content moderation work?
Content moderation focuses on the identification and removal of violative content, including hate speech, explicit material, mis- and disinformation, and other pieces of user-generated content that breach a platform’s policies. These policies are codified around prohibited content categories, the enforcement of which typically relies on a combination of automated detection (including machine learning classifiers) and manual review by human moderators.
Content moderation is generally an item-by-item process. Each piece of content posted by a user on a social media platform or online service is reviewed on its own merits, by automated systems or human moderators, to determine whether any action needs to be taken. When violative content is discovered, the flagged item can then be labeled with a particular risk. The categories of violative risk are determined by the platform in accordance with its own community policies and can include pornography, abuse, hate speech, profanity, spam and other harmful content.
A diverse range of user-generated content types uploaded to a particular platform or service can be reviewed for risk, from text posts and comments to images, videos, audio, usernames and profile information.
A single piece of content can be labeled with multiple risks (for example, a written post that includes both hate speech and profanity), and moderation actions are then determined on the basis of the platform’s particular terms of service.
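To make the item-by-item flow concrete, here is a minimal Python sketch of the labeling step described above. The keyword checks and the policy table are hypothetical placeholders; a production system would rely on machine learning classifiers and human review rather than simple string matching.

```python
# Minimal sketch of item-by-item content moderation: each piece of
# user-generated content is labeled with zero or more risk categories,
# and an action is chosen from the platform's own policy table.
# The classifier and policy mapping below are illustrative assumptions.

from dataclasses import dataclass, field

# Hypothetical policy table: risk label -> enforcement action.
POLICY_ACTIONS = {
    "hate_speech": "remove",
    "profanity": "limit_visibility",
    "spam": "remove",
    "explicit_material": "remove",
}

@dataclass
class ContentItem:
    item_id: str
    text: str
    risk_labels: list = field(default_factory=list)

def classify(item: ContentItem) -> list:
    """Placeholder classifier: a real system would combine ML models and
    keyword/hash matching. Trivial keyword checks stand in here."""
    labels = []
    lowered = item.text.lower()
    if "example_slur" in lowered:
        labels.append("hate_speech")
    if "buy now" in lowered:
        labels.append("spam")
    return labels

def moderate(item: ContentItem) -> list:
    # A single item can carry multiple risk labels; each label maps to an
    # action defined by the platform's terms of service.
    item.risk_labels = classify(item)
    return [POLICY_ACTIONS[label] for label in item.risk_labels if label in POLICY_ACTIONS]

actions = moderate(ContentItem("post-123", "Buy now!! Limited offer"))
print(actions)  # ['remove'] because the item was labeled as spam
```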
How does behavioral moderation work?
In contrast, behavioral moderation goes beyond analyzing individual user-generated posts to appraise risk signals derived from user behavior patterns and interactions with problematic communities or potential victims on the platform. Because some risks, such as child grooming or bullying and harassment, only manifest as harmful user behaviors, platforms adopting an online trust and safety approach that focuses solely on content may be unable to adequately identify and mitigate such behavioral risks.
The behavioral moderation approach can be employed to detect bad actors such as trolls and harassers, as well as coordinated disinformation campaigns and other malicious communities misusing the social media platform for their own purposes.
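As one illustration of how a behavioral signal differs from a content signal, the sketch below flags possible coordination when several distinct accounts share the same URL within a short time window. The window, threshold and input format are illustrative assumptions, not a description of any specific platform’s detection logic.

```python
# Sketch of a coordination signal: many distinct accounts posting the same
# URL within a short window. Thresholds and data shapes are assumptions.

from collections import defaultdict
from datetime import datetime, timedelta

# Each post: (account_id, shared_url, timestamp)
posts = [
    ("acct_1", "http://example.com/claim", datetime(2024, 3, 1, 12, 0)),
    ("acct_2", "http://example.com/claim", datetime(2024, 3, 1, 12, 2)),
    ("acct_3", "http://example.com/claim", datetime(2024, 3, 1, 12, 3)),
    ("acct_4", "http://example.com/other", datetime(2024, 3, 1, 15, 0)),
]

WINDOW = timedelta(minutes=10)   # assumed coordination window
MIN_ACCOUNTS = 3                 # assumed minimum cluster size

def find_coordinated_clusters(posts):
    by_url = defaultdict(list)
    for account, url, ts in posts:
        by_url[url].append((ts, account))
    clusters = []
    for url, events in by_url.items():
        events.sort()
        # Slide over the events and collect distinct accounts inside the window.
        for i, (start, _) in enumerate(events):
            accounts = {a for t, a in events[i:] if t - start <= WINDOW}
            if len(accounts) >= MIN_ACCOUNTS:
                clusters.append((url, sorted(accounts)))
                break
    return clusters

print(find_coordinated_clusters(posts))
# [('http://example.com/claim', ['acct_1', 'acct_2', 'acct_3'])]
```

No individual post in this example would necessarily violate a content policy; the risk only becomes visible when the accounts’ behavior is viewed together.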
An enforcement strategy that incorporates behavioral moderation focuses on understanding the conduct and context of the user being moderated. One of the most commonly employed methodologies is to review the number of moderation actions enforced against a particular user and assign a rating or score to their account. Enforcement actions can then be taken once the user reaches a particular risk threshold, based on the type of risk(s) involved, the frequency of violations by a single account and the user’s broader conduct on the platform.
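The sketch below illustrates this scoring-and-threshold idea, assuming illustrative severity weights and an escalation ladder; real platforms would tune these values to their own policies and risk categories.

```python
# Sketch of user-level risk scoring: each past moderation action on an
# account adds weight by risk type, and an enforcement step is taken once
# the account crosses a threshold. Weights, thresholds and action names
# are illustrative assumptions.

from dataclasses import dataclass

# Assumed severity weights per risk type.
RISK_WEIGHTS = {"spam": 1, "profanity": 1, "harassment": 3, "hate_speech": 4, "grooming": 10}

# Assumed escalation ladder: (minimum score, enforcement action).
THRESHOLDS = [(10, "suspend_account"), (6, "restrict_features"), (3, "warn_user")]

@dataclass
class UserRecord:
    user_id: str
    violations: list  # risk-type strings from past moderation actions

def risk_score(user: UserRecord) -> int:
    # Frequency matters: every recorded violation adds its weight to the total.
    return sum(RISK_WEIGHTS.get(v, 1) for v in user.violations)

def enforcement_action(user: UserRecord):
    score = risk_score(user)
    for minimum, action in THRESHOLDS:
        if score >= minimum:
            return action
    return None

repeat_offender = UserRecord("user_42", ["profanity", "harassment", "harassment"])
print(risk_score(repeat_offender), enforcement_action(repeat_offender))  # 7 restrict_features
```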
Organizations seeking to incorporate behavioral moderation into their trust and safety framework will also need to develop platform policies that clearly identify harmful or violative conduct and inform their users which content and behaviors fall within the scope of policy enforcement.
Crucially, the behavioral approach can identify risk signals that may not typically be flagged as harmful by the traditional content moderation approach. Such signals can include:
- Asking for information
- Offering something for sale
- Sharing personal information
- Complimenting another user
- Discussing a mutual interest or hobby
- Referencing a particular website or payment provider, among others
When each of these signals is considered individually by a platform, it may appear innocuous. However, when appraised holistically, in conjunction with other risk signals examining a user’s conduct on the platform, they can reveal a user engaging in harmful activity. The continual detection of harmful behaviors like child grooming, extortion or spamming can only be achieved at scale with this type of approach.
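As a rough illustration of this holistic appraisal, the sketch below escalates an account for human review when several distinct signals from the list above are directed at the same recipient within a short window. The signal names, window and threshold are assumptions made for the example, not a grooming-detection model.

```python
# Sketch of combining individually innocuous signals into a behavioral risk
# assessment: escalate when several distinct signals target the same
# recipient within a review window. All values are illustrative assumptions.

from datetime import datetime, timedelta

# Observed signals for one account: (signal_name, target_account, timestamp)
signals = [
    ("compliment",            "minor_account_9", datetime(2024, 3, 2, 18, 0)),
    ("discuss_mutual_hobby",  "minor_account_9", datetime(2024, 3, 2, 18, 5)),
    ("ask_personal_info",     "minor_account_9", datetime(2024, 3, 2, 18, 20)),
    ("reference_payment_app", "minor_account_9", datetime(2024, 3, 2, 18, 40)),
]

WINDOW = timedelta(hours=1)   # assumed review window
MIN_DISTINCT_SIGNALS = 3      # assumed escalation threshold

def should_escalate(signals) -> bool:
    """Escalate for human review when several distinct risk signals are
    directed at the same recipient within the review window."""
    if not signals:
        return False
    signals = sorted(signals, key=lambda s: s[2])
    start = signals[0][2]
    per_target = {}
    for name, target, ts in signals:
        if ts - start <= WINDOW:
            per_target.setdefault(target, set()).add(name)
    return any(len(names) >= MIN_DISTINCT_SIGNALS for names in per_target.values())

print(should_escalate(signals))  # True: four distinct signals aimed at one user
```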
Elevating trust and safety with behavioral intelligence
Resolver has over 20 years of experience providing fully managed online Trust and Safety solutions to a diverse range of social platforms and online services.
In our experience, while the traditional content approach remains integral to identifying fundamental threats affecting an online platform or service, most out-of-the-box content moderation tools do not support effective behavioral approaches. Moreover, platforms that have attempted to build in-house capacity for behavioral approaches have often struggled to scale their operations when starting from an ad-hoc, reactive footing. Some of the most common challenges faced by platforms when attempting to implement behavioral moderation include:
- Observing bad actor behavior that falls outside of content-based policy enforcement approaches and struggling to align a combination of content, actor and behavior policies
- Attempting to create platform policies premised on identifying harmful behavior alongside the traditional content-based approach, but struggling to scale enforcement
- A lack of depth in human intelligence expertise across different risk verticals, limiting the adoption of behavioral moderation to only high-harm, low-volume threat environments
Rising threats to user safety on social media platforms
Meanwhile, the volume and severity of online harms facing users on social media platforms and online services continue to rise. According to data from Meta’s Community Standards Enforcement Report, the company took action on a staggering 20.3 million pieces of child endangerment content across two of its platforms in just the three-month period between October and December 2023, while TikTok removed over 1.3 billion fake likes on its platform over the same time frame. A 2023 survey by the Anti-Defamation League found that 53% of Americans had experienced harassment on social media, a 12-point increase from the previous year.
Evaluating the growing volume of user-generated content to detect harmful behaviors can prove challenging for traditional human review systems, contributing to inconsistencies, delays and potential oversights that can have a corrosive impact on the user experience and platform reputation over the long term. An online trust and safety strategy solely focused on content moderation can often miss the ways in which bad actor communities continuously evolve their violative behaviors to evade scrutiny from social media platforms. Such communities often adapt their methodologies to exploit gaps in platform policies and evade enforcement against their online assets.
Addressing this proliferation of user-generated content that lies at the “borderline” between different platform policies requires experienced moderators with a deeper understanding of the overlapping nature of different risk verticals on their platform. According to Henry Adams, Account Executive for Trust and Safety Partnerships at Resolver, “A trust and safety framework that is solely focussed on content risks can lead to missing the signals of insidious behavioral abuse. Taking a holistic approach to user protection by investing in behavior detection is critical to community safety, and can enable entire bad actor network takedowns.”
Consequently, striking the right balance between automated systems and human oversight is crucial to maintaining accountability and ensuring that moderation decisions are fair and just while also ensuring that platforms are able to maintain a safe and vibrant online community of users.
A path forward
In the dynamic and ever-evolving digital threat landscape, a comprehensive strategy blending automated detection with threat actor intelligence drawn from a team of experienced trust and safety professionals is integral to effectively protecting virtual environments. Adopting a behavioral moderation approach by integrating insights drawn from analyzing complex harmful user interactions on the platform is not without its challenges. In particular, issues surrounding privacy, algorithmic bias, ethical use of user data and transparent reporting mechanisms that can explain the rationale behind moderation decisions must be carefully considered. However, the potential rewards of augmenting content moderation with behavioral intelligence are immense.
Resolver Platform Risk Intelligence and Moderation for Platforms offers partners a fully managed and proactive solution for Trust and Safety teams that need to elevate their content moderation strategies with up-to-date threat actor intelligence, scale moderation measures that reflect their unique identity and community of users, and remain compliant with the latest online safety regulations such as the Digital Services Act and the Online Safety Bill.