Clustering and Narrative Detection

Clustering and narrative detection operate after event normalization and publication.

The clustering worker examines recent eligible events and groups related observations using stable keys and relationship logic.

When new events match an existing cluster, that cluster is updated rather than recreated. Each update records member relationships, time range, source breakdown, entities, aggregate confidence, and importance.
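The update-in-place behavior can be sketched as follows. This is a minimal illustration, not NataPulse's actual implementation: the `Event` and `Cluster` types, the `stable_key` function (here simply the sorted entity set), and `upsert_cluster` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    source: str
    entities: frozenset
    observed_at: datetime


@dataclass
class Cluster:
    key: str
    member_ids: set = field(default_factory=set)
    source_counts: dict = field(default_factory=dict)  # source -> member count
    entities: set = field(default_factory=set)
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None


def stable_key(event: Event) -> str:
    # Assumption: the key depends only on the event's entities, so the same
    # entities always map to the same cluster regardless of arrival order.
    return "|".join(sorted(event.entities))


def upsert_cluster(clusters: dict, event: Event) -> Cluster:
    """Update the matching cluster in place, creating it only if it is missing."""
    key = stable_key(event)
    cluster = clusters.get(key)
    if cluster is None:
        cluster = clusters[key] = Cluster(key=key)
    if event.event_id in cluster.member_ids:
        return cluster  # already a member; reprocessing changes nothing
    cluster.member_ids.add(event.event_id)
    cluster.source_counts[event.source] = cluster.source_counts.get(event.source, 0) + 1
    cluster.entities.update(event.entities)
    if cluster.first_seen is None or event.observed_at < cluster.first_seen:
        cluster.first_seen = event.observed_at
    if cluster.last_seen is None or event.observed_at > cluster.last_seen:
        cluster.last_seen = event.observed_at
    return cluster
```

Because membership is checked by `event_id`, feeding the same event twice leaves the source breakdown and time range unchanged.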

A persistent backend cluster normally requires multiple related events. Single observations can still appear in the live interface while the system waits for corroboration.

A cluster becomes more informative when it connects independent domains. For example:

SEC filing
+ financial news
+ social statement
+ price anomaly
= one cross-source developing situation

This does not mean that all sources receive equal weight. The primary filing can anchor the fact, while social and market data explain reaction and interpretation.
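One way to picture unequal source weighting is the sketch below. The weights in `SOURCE_WEIGHTS`, the source-type labels, and the noisy-OR combination rule are assumptions made for illustration only; the actual weighting scheme is not described here.

```python
# Hypothetical per-source weights: a primary filing counts fully,
# social statements count least. These values are illustrative only.
SOURCE_WEIGHTS = {"filing": 1.0, "news": 0.6, "market": 0.4, "social": 0.2}


def aggregate_confidence(observations):
    """Combine (source_type, confidence) pairs with a weighted noisy-OR.

    Each observation contributes weight * confidence; the result is the
    probability that at least one weighted observation "holds".
    """
    miss = 1.0
    for source_type, confidence in observations:
        weight = SOURCE_WEIGHTS.get(source_type, 0.0)  # unknown sources add nothing
        miss *= 1.0 - weight * confidence
    return 1.0 - miss


def anchor_source(observations):
    """Return the source type with the strongest weighted contribution."""
    return max(
        observations,
        key=lambda obs: SOURCE_WEIGHTS.get(obs[0], 0.0) * obs[1],
    )[0]
```

With these assumed weights, a single confident filing outweighs several fully confident social statements, which matches the idea that the filing anchors the fact while other sources add context.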

NataPulse guards against several common failure modes:

  • entity-less noise dominating a cluster;
  • long text receiving excessive importance from repeated weak words;
  • one low-trust social observation creating a high-priority cluster;
  • a single quantitative signal creating a cluster by itself;
  • duplicate stories inflating source breadth.
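The guards above could look roughly like the following sketch. Every name here (`Observation`, `fingerprint`, `min_trust`, the `"normal"` and `"high"` priority labels) is hypothetical, and the thresholds and damping function are assumptions rather than NataPulse's actual rules.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    source: str        # e.g. "sec", "news", "social", "market"
    kind: str          # "text" or "quantitative"
    trust: float       # assumed per-source trust in [0.0, 1.0]
    entities: frozenset
    fingerprint: str   # assumed hash of normalized content; equal for duplicates


def source_breadth(observations):
    """Count distinct sources after collapsing duplicate stories."""
    seen_fingerprints = set()
    sources = set()
    for obs in observations:
        if obs.fingerprint in seen_fingerprints:
            continue  # a duplicate story must not add a new source
        seen_fingerprints.add(obs.fingerprint)
        sources.add(obs.source)
    return len(sources)


def text_importance(tokens, weak_words=frozenset()):
    """Dampen repetition: each distinct term adds log(1 + count); weak words add nothing."""
    counts = Counter(t for t in tokens if t not in weak_words)
    return sum(math.log1p(c) for c in counts.values())


def may_create_cluster(observations):
    """Reject entity-less groups and a lone quantitative signal."""
    obs = list(observations)
    if not any(o.entities for o in obs):
        return False
    if len(obs) == 1 and obs[0].kind == "quantitative":
        return False
    return True


def priority_ceiling(observations, min_trust=0.5):
    """A single low-trust observation cannot reach high priority."""
    obs = list(observations)
    if len(obs) == 1 and obs[0].trust < min_trust:
        return "normal"
    return "high"
```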

Narrative logic examines the movement of events and clusters over time. It can evaluate:

  • recent growth;
  • persistence;
  • entity and source breadth;
  • corroboration;
  • trend direction;
  • materiality;
  • cluster count;
  • historical comparison.
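A few of these signals (recent growth, persistence, and trend direction) can be derived from simple time-bucketed counts. The sketch below assumes daily event counts per narrative; the `window` and `flat_band` parameters and the `NarrativeSignal` type are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NarrativeSignal:
    growth: float       # relative change of the recent window vs. the prior window
    persistence: float  # share of days with at least one event
    direction: str      # "rising", "falling", or "flat"


def narrative_signal(daily_counts, window=3, flat_band=0.1):
    """Summarize a series of daily event counts (oldest first)."""
    if len(daily_counts) < 2 * window:
        raise ValueError("need at least two full windows of history")
    recent = sum(daily_counts[-window:])
    prior = sum(daily_counts[-2 * window:-window])
    # Divide by at least 1 so a silent prior window does not cause division by zero.
    growth = (recent - prior) / max(prior, 1)
    persistence = sum(1 for c in daily_counts if c > 0) / len(daily_counts)
    if growth > flat_band:
        direction = "rising"
    elif growth < -flat_band:
        direction = "falling"
    else:
        direction = "flat"
    return NarrativeSignal(growth, persistence, direction)
```

For example, `narrative_signal([1, 0, 2, 3, 5, 7])` compares a prior window total of 3 with a recent total of 15, giving a growth of 4.0 and a "rising" direction.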

The resulting public narrative card provides a concise trend signal and evidence counts. Users can then investigate the underlying clusters or launch Deep Research.

Scoring and clustering can be re-run when logic improves. Idempotent derivation and stable identifiers allow historical records to be re-evaluated without fabricating new activity.
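To illustrate why stable identifiers make re-runs safe, the sketch below derives each record's ID deterministically from its stable key, so a re-run overwrites the same record instead of adding a new one. The namespace string, the `rederive` function, the record fields, and the `logic_version` label are assumptions for this example, not a description of the actual storage layer.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "clusters.example.invalid")


def derived_id(stable_key: str) -> str:
    """Same key in, same ID out, on every run."""
    return str(uuid.uuid5(NAMESPACE, stable_key))


def rederive(store: dict, events: list, score_fn, logic_version: str) -> dict:
    """Recompute derived records in place.

    The original observation time is carried over unchanged, so a re-run
    never looks like new activity; only the score and version change.
    """
    for event in events:
        record_id = derived_id(event["key"])
        store[record_id] = {
            "key": event["key"],
            "observed_at": event["observed_at"],
            "score": score_fn(event),
            "logic_version": logic_version,
        }
    return store
```

Running `rederive` twice with different scoring functions leaves the number of records and their `observed_at` values unchanged, while scores reflect the newer logic.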