Signals Aren't Decisions
The gap between what moderation systems flag and what should be done about it.
Most moderation systems already do a decent job of generating signals. But without enough context -- around norms, intent, or usage -- those signals often translate into more work, not less.
That gap matters today -- and it becomes more important as the volume of user-generated content continues to grow.
Here's a typical example, pulled from a review site:

A user writes: "it's a scam… little kid with a chip on its shoulder… disgusting attitude"
A traditional moderation system flags this as:
- bullying
- minor-related content
- elevated risk

At a glance, that classification isn't unreasonable. The words are there.
But in context, this sits within a page of Trustpilot reviews -- customers complaining about a financial service. The tone is harsh, but typical for that setting. No one reading the page would interpret this as a child safety issue.
A more contextual analysis reflects that distinction:


The page is classified as "caution," driven primarily by finance and legal topic matches, along with a generally negative tone -- not by any actual safety violation.
"Overwhelmingly negative customer reviews… the aggressive tone triggers flags, but the content consists of consumer grievances."
The signal isn't wrong -- but it isn't sufficient. Without context, it creates a review task that doesn't need to exist.
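To make the split between a signal and a decision concrete, here is a minimal sketch in Python. It is not how any particular moderation product works: the keyword lists, the Context fields, and the decision labels are all hypothetical, and a real detector would be a trained classifier rather than a word match. The point is only that detect() answers "what does this match?" while decide() needs extra inputs to answer "what should happen?"

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists standing in for a real detector.
CATEGORY_TERMS = {
    "bullying": {"disgusting", "scam"},
    "minor-related content": {"kid", "boy", "teenager"},
}

@dataclass
class Context:
    page_type: str                # assumed label, e.g. "consumer_reviews"
    refers_to_actual_minor: bool  # assumed to come from upstream analysis

def detect(text: str) -> set[str]:
    """Signal step: which categories do the words match?"""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {cat for cat, terms in CATEGORY_TERMS.items() if words & terms}

def decide(signals: set[str], ctx: Context) -> str:
    """Decision step: the same signals can lead to different outcomes."""
    if "minor-related content" in signals and ctx.refers_to_actual_minor:
        return "escalate"
    if ctx.page_type == "consumer_reviews":
        # Harsh language is typical here; no review task is created.
        return "no_action"
    return "review" if signals else "no_action"

review = "it's a scam… little kid with a chip on its shoulder… disgusting attitude"
signals = detect(review)
decision = decide(signals, Context("consumer_reviews", refers_to_actual_minor=False))
```

In this sketch the review still matches both categories, so the signal is preserved. What changes is that the context turns it into "no_action" instead of a queue item.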
A similar pattern shows up in other cases.
In a Better Business Bureau complaint, a user writes:
"little boy hangs up on customers… mad little teenager"
Again, the presence of words like "boy" and "teenager" leads to escalation.

But the context is straightforward: a frustrated customer describing poor service. It's an insult, not a reference to a minor in any meaningful sense.
Across the page, the underlying issues are familiar -- shipping delays, refunds, unresponsive support. The language occasionally becomes personal, but the content itself is routine.


Here too, the system is picking up on something real -- but without context, that signal turns into additional work rather than a useful decision.
That doesn't mean the signals are wrong, or that everything being flagged is benign.
In another case, a complaints page shows repeated allegations of fraud, legal disputes, and consistent escalation across multiple users. The signals aren't just present -- they're dense and persistent across entries.

The distinction from earlier examples isn't that the content is inherently dangerous -- it's that the signals are consistent and reinforcing rather than isolated or incidental.
In cases like this, signals are more likely to correspond to something that warrants attention, even if the underlying content is still primarily complaints and allegations.
This is where signals begin to translate into a decision -- and where moderation effort is actually warranted.
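One way to picture "dense and persistent" versus "isolated or incidental" is to measure how many distinct users on a page raise the same signal. The sketch below is an illustration under assumptions, not a recommended policy: the signal names, the (user, signals) input shape, and both thresholds are made up for the example.

```python
def signal_persistence(entries):
    """entries: list of (user_id, set_of_signals).
    Returns {signal: share of distinct users whose entries carry it}."""
    users = {user for user, _ in entries}
    users_by_signal = {}
    for user, signals in entries:
        for signal in signals:
            users_by_signal.setdefault(signal, set()).add(user)
    return {s: len(us) / len(users) for s, us in users_by_signal.items()}

def persistent_signals(entries, min_share=0.5, min_users=3):
    """Signals raised by at least min_share of users, on pages with
    at least min_users distinct users. Both thresholds are hypothetical."""
    users = {user for user, _ in entries}
    if len(users) < min_users:
        return []
    shares = signal_persistence(entries)
    return sorted(s for s, share in shares.items() if share >= min_share)

# Isolated: one stray match among otherwise routine complaints.
isolated_page = [
    ("u1", {"minor-related content"}),
    ("u2", set()),
    ("u3", set()),
    ("u4", {"negative tone"}),
]

# Persistent: several users independently raising the same allegations.
persistent_page = [
    ("u1", {"fraud", "legal dispute"}),
    ("u2", {"fraud"}),
    ("u3", {"fraud", "legal dispute"}),
    ("u4", {"legal dispute"}),
]

isolated_result = persistent_signals(isolated_page)
persistent_result = persistent_signals(persistent_page)
```

Under these assumptions the first page yields nothing to act on, while the second surfaces both allegations because they recur across users. Counting distinct users rather than raw matches keeps one angry reviewer from looking like a pattern.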
Taken together, these cases point to a more specific issue than "false positives" or "over-flagging."
Detection systems answer a narrow question: what categories does this content match? But moderation requires a different one: what is this content, in context -- and what should happen as a result? Those are not the same problem.
Without context, signals expand the review surface. With context, they collapse it.
Moderation doesn't fail because systems can't find signals. It fails when those signals aren't grounded in enough context to be useful -- and when that happens, the result isn't just inaccuracy, it's more work.