Signals Aren't Decisions

The gap between what moderation systems flag and what to do about it.

Most moderation systems already do a decent job of generating signals. But without enough context -- around norms, intent, or usage -- those signals often translate into more work, not less.

That gap matters today -- and it becomes more important as the volume of user-generated content continues to grow.

Here's a typical example, pulled from a review site:

Trustpilot review: "Don't use this company, it's a scam, little kid with a chip on its shoulder"

A user writes: "it's a scam… little kid with a chip on its shoulder… disgusting attitude"

A traditional moderation system flags this as:

bullying
minor-related content
elevated risk

API result flagging the text as High risk: minor_implicitly_mentioned and minor_explicitly_mentioned

At a glance, that classification isn't unreasonable. The words are there.

But in context, this sits within a page of Trustpilot reviews -- customers complaining about a financial service. The tone is harsh, but typical for that setting. No one reading the page would interpret this as a child safety issue.

A more contextual analysis reflects that distinction:

Contextual analysis summary: page dominated by negative reviews, caution rating

Sentiment: 0% positive, -100% negative, mean valence -0.91

The page is classified as "caution," driven primarily by finance and legal topic matches, along with a generally negative tone -- not by any actual safety violation.

"Overwhelmingly negative customer reviews… the aggressive tone triggers flags, but the content consists of consumer grievances."

The signal isn't wrong -- but it isn't sufficient. Without context, it creates a review task that doesn't need to exist.

A similar pattern shows up in other cases.

In a Better Business Bureau complaint, a user writes:

"little boy hangs up on customers… mad little teenager"

Again, the presence of words like "boy" and "teenager" leads to escalation.

API result: "mad little teenager" and "little boy hangs up on customers" flagged as High risk
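To make that escalation mechanism concrete, here is a minimal sketch of surface-level term matching. It is a deliberate simplification: `MINOR_TERMS` and `match_minor_terms` are hypothetical names, and the word list is invented for illustration, not taken from any real moderation API. Production detectors are usually more sophisticated than a word list, but the failure mode shown here is the same: the match fires on the word, not on what the word refers to.

```python
import re

# Hypothetical term list for illustration only; not any real API's vocabulary.
MINOR_TERMS = {"kid", "kids", "child", "boy", "girl", "teen", "teenager"}


def match_minor_terms(text: str) -> list[str]:
    """Return minor-related terms found in the text, in order of appearance.

    Pure surface matching: it cannot tell whether a word refers to an
    actual minor or is being used as an insult about an adult.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in MINOR_TERMS]


for review in [
    "it's a scam… little kid with a chip on its shoulder… disgusting attitude",
    "little boy hangs up on customers… mad little teenager",
]:
    hits = match_minor_terms(review)
    print("High risk" if hits else "no flag", hits)
```

Both quoted reviews produce hits (`kid`, then `boy` and `teenager`), which is exactly the kind of signal that reads as a child safety issue without the surrounding page.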

But the context is straightforward: a frustrated customer describing poor service. It's an insult, not a reference to a minor in any meaningful sense.

Across the page, the underlying issues are familiar -- shipping delays, refunds, unresponsive support. The language occasionally becomes personal, but the content itself is routine.

Contextual analysis: BBB vehicle parts retailer classified as caution

Sentiment: +30% positive, -70% negative, mean valence -0.24

Again, the system is picking up on something real -- but without context, that signal turns into additional work rather than a useful decision.

That doesn't mean the signals are wrong, or that everything being flagged is benign.

In another case, a complaints page shows repeated allegations of fraud, legal disputes, and consistent escalation across multiple users. The signals aren't just present -- they're dense and persistent across entries.

Contextual analysis: Echelon Services BBB profile classified as unsafe

The distinction from the earlier examples isn't that the content is inherently dangerous -- it's that the signals are consistent and reinforcing rather than isolated or incidental.

In cases like this, signals are more likely to correspond to something that warrants attention, even if the underlying content is still primarily complaints and allegations.

This is where signals begin to translate into a decision -- and where moderation effort is actually warranted.
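One way to express "dense and persistent" versus "isolated or incidental" is a page-level rule that looks across entries instead of at each one alone. The sketch below is a hypothetical illustration, not the method behind the analyses above: `Entry`, `page_decision`, the two thresholds, and the sample entries are all invented. It escalates only when a large share of entries carry signals *and* at least one signal category recurs across several of them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One review or complaint on a page, plus the categories a detector matched."""
    text: str
    signals: set[str] = field(default_factory=set)


def page_decision(entries: list[Entry],
                  min_density: float = 0.5,
                  min_recurrence: int = 2) -> str:
    """Hypothetical rule: escalate only when signals are dense and reinforcing.

    density    = share of entries carrying at least one signal
    recurrence = number of entries in which each signal category appears
    Thresholds are illustrative assumptions, not calibrated values.
    """
    flagged = [e for e in entries if e.signals]
    if not flagged:
        return "no_action"
    density = len(flagged) / len(entries)
    counts = Counter(s for e in flagged for s in e.signals)
    recurring = {s for s, n in counts.items() if n >= min_recurrence}
    if density >= min_density and recurring:
        return "escalate"       # consistent signals reinforcing each other
    return "deprioritize"       # isolated or incidental signals


# Invented entries, loosely shaped like the pages described above.
complaints = [
    Entry("alleges fraud on a deposit", {"fraud"}),
    Entry("another customer alleges fraud, filed a dispute", {"fraud", "legal"}),
    Entry("legal action pending", {"legal"}),
]
reviews = [
    Entry("mad little teenager", {"minor"}),
    Entry("shipping took six weeks"),
    Entry("still waiting on a refund"),
    Entry("support never answers"),
]
print(page_decision(complaints))  # escalate
print(page_decision(reviews))     # deprioritize
```

The per-entry detector output is identical in kind for both pages; what changes the outcome is how the signals are distributed across the page.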

Taken together, these cases point to a more specific issue than "false positives" or "over-flagging."

Detection systems answer a narrow question: what categories does this content match? But moderation requires a different one: what is this content, in context -- and what should happen as a result? Those are not the same problem.

Without context, signals expand the review surface. With context, they collapse it.

Moderation doesn't fail because systems can't find signals. It fails when those signals aren't grounded in enough context to be useful -- and when that happens, the result isn't just inaccuracy; it's more work.
