Methodology V2 - Phase 2

Effective 2026-02-17T06:03:00Z, the sentiment scoring pipeline has been updated with a crypto relevance filter. This completes the two-phase Methodology V2 deployment that began on 2026-02-16 with the sentiment model upgrade.

What Changed

A relevance filter has been added to the pipeline between deduplication and sentiment scoring. Posts collected from X (Twitter) are now classified for crypto relevance before they reach the sentiment models. Non-crypto content is removed, and only posts determined to be about cryptocurrency proceed to sentiment scoring.

The filter uses a dedicated classification model trained on crypto and non-crypto examples. Posts that do not pass the filter are stored separately for auditing, so no data is silently discarded.
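The filtering step described above can be sketched as a simple partition. This is a minimal illustration, not the production code: the function name split_by_relevance, the post layout, and the keyword-based toy_classifier are all hypothetical stand-ins, since the actual classification model is not described here.

```python
from typing import Callable, Iterable

Post = dict


def split_by_relevance(
    posts: Iterable[Post], is_crypto: Callable[[Post], bool]
) -> tuple[list[Post], list[Post]]:
    """Partition posts into (relevant, rejected) without dropping any."""
    relevant: list[Post] = []
    rejected: list[Post] = []
    for post in posts:
        (relevant if is_crypto(post) else rejected).append(post)
    return relevant, rejected


def toy_classifier(post: Post) -> bool:
    """Hypothetical keyword rule standing in for the real classification model."""
    text = post["text"].lower()
    return any(word in text for word in ("token", "blockchain", "wallet"))


posts = [
    {"id": 1, "text": "Staking rewards hit my wallet today"},
    {"id": 2, "text": "Watching the sun set over the harbor"},
]
relevant, rejected = split_by_relevance(posts, toy_classifier)
# `relevant` would go on to sentiment scoring; `rejected` would be kept for audit.
```

Returning both lists, rather than only the survivors, mirrors the statement that rejected posts are retained rather than dropped.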

Why This Matters

Coin-specific search queries on X often return posts that mention a token’s name in a non-crypto context. These off-topic posts previously passed through to sentiment scoring, introducing noise into aggregated metrics. The relevance filter removes this noise at the source, before sentiment scores are computed.

Impact on Data

All data produced after the cutover timestamp reflects the updated pipeline:

  • crypto_filter_enabled field added to ai_sentiment metadata (value: true)
  • crypto_filter_model field added to ai_sentiment metadata
  • Post volumes (posts_total) will be lower than the Phase 1 baseline, as irrelevant posts are excluded
  • Sentiment signal quality improves, because aggregated scores reflect only crypto-relevant discussion
  • The query volume per coin has been increased to compensate for filtered posts, maintaining comparable coverage of relevant content
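As a rough illustration of how a consumer might read the new metadata fields, the sketch below checks whether a record reports the filter as enabled. The record layout is an assumption: it supposes ai_sentiment is a nested mapping on each record. The sample values, including the posts_total numbers and the model name, are made up for the example.

```python
def produced_with_filter(record: dict) -> bool:
    """True when the record's ai_sentiment metadata reports the filter as on."""
    meta = record.get("ai_sentiment") or {}
    return meta.get("crypto_filter_enabled") is True


# Hypothetical records; all values are illustrative only.
phase1_like = {"posts_total": 120, "ai_sentiment": {}}
phase2_like = {
    "posts_total": 85,
    "ai_sentiment": {
        "crypto_filter_enabled": True,
        "crypto_filter_model": "example-relevance-model",  # placeholder name
    },
}

flags = [produced_with_filter(r) for r in (phase1_like, phase2_like)]
```

Checking for the literal value True, rather than mere presence of the key, keeps the check correct if the field is ever written with a false value.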

Cutover Details

Detail               Value
Phase 2 cutover      2026-02-17T06:03:00Z
Phase 1 cutover      2026-02-16T05:14:00Z
Version identifier   v2.0
Regime field         methodology_regime: "v2"

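The methodology_regime field (introduced in Phase 1) continues to identify all V2 data. Researchers can use this field to distinguish between V1 and V2 methodology without date-based filtering. Because the value is the same in both phases, the field does not by itself separate Phase 1 data from Phase 2 data.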
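One way a researcher might label records is sketched below: the regime field separates V2 from earlier data, and the Phase 2 cutover timestamp splits V2 into its two phases. Several details are assumptions: that methodology_regime sits at the top level of each record, that records from before V2 carry some other value or no value, that each record has a timestamp field in ISO 8601 UTC form, and that the cutover boundary is inclusive.

```python
from datetime import datetime

# Phase 2 cutover from the table above, as a timezone-aware UTC datetime.
PHASE2_CUTOVER = datetime.fromisoformat("2026-02-17T06:03:00+00:00")


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 string, accepting a trailing 'Z' on any Python 3.7+."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def label(record: dict) -> str:
    """Return 'pre-v2', 'v2-phase1', or 'v2-phase2' for one record."""
    if record.get("methodology_regime") != "v2":
        return "pre-v2"
    when = parse_utc(record["timestamp"])  # 'timestamp' is a hypothetical field name
    return "v2-phase2" if when >= PHASE2_CUTOVER else "v2-phase1"


labels = [
    label({"timestamp": "2026-02-10T00:00:00Z"}),
    label({"methodology_regime": "v2", "timestamp": "2026-02-16T12:00:00Z"}),
    label({"methodology_regime": "v2", "timestamp": "2026-02-17T06:03:00Z"}),
]
```

The regime check comes first, so the V1 versus V2 decision never depends on dates; the timestamp is consulted only to split V2 into phases.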

Pipeline Summary (V2 Complete)

The full V2 pipeline, now active:

  1. Collect posts from X (Twitter) via coin-specific queries
  2. Deduplicate against previously seen posts
  3. Filter for crypto relevance (new in Phase 2)
  4. Score remaining posts with the primary sentiment model
  5. Validate scores with the secondary confidence model
  6. Aggregate into hourly and cycle-level metrics
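The ordering of steps 2 through 5 can be sketched as a single pass over collected posts. This is only an illustration of the stage order under stated assumptions: run_cycle, the id field, and the three callables (is_crypto, score, confidence) are hypothetical placeholders for the real models; attaching a confidence value is a guess at what validation produces; and aggregation (step 6) is left out because its metrics are not specified here.

```python
from typing import Callable, Iterable

Post = dict


def run_cycle(
    posts: Iterable[Post],
    seen_ids: set,
    is_crypto: Callable[[Post], bool],
    score: Callable[[Post], float],
    confidence: Callable[[Post], float],
) -> tuple[list[Post], list[Post]]:
    """Apply deduplication, relevance filtering, scoring, and validation in order."""
    scored: list[Post] = []
    rejected: list[Post] = []
    for post in posts:                      # step 1 output: collected posts
        if post["id"] in seen_ids:          # step 2: skip previously seen posts
            continue
        seen_ids.add(post["id"])
        if not is_crypto(post):             # step 3: relevance filter
            rejected.append(post)           # kept for audit, never scored
            continue
        scored.append(
            {**post, "score": score(post), "confidence": confidence(post)}  # steps 4-5
        )
    return scored, rejected


collected = [
    {"id": 1, "text": "a"},
    {"id": 1, "text": "a"},  # duplicate within the batch
    {"id": 2, "text": "b"},
    {"id": 3, "text": "c"},  # already seen in an earlier cycle
]
seen = {3}
scored, rejected = run_cycle(
    collected, seen, lambda p: p["id"] == 1, lambda p: 0.5, lambda p: 0.9
)
```

Because the filter runs before scoring, rejected posts never reach either model, which matches the stated goal of removing noise before sentiment scores are computed.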

Phase 2 cutover timestamp: 2026-02-17T06:03:00Z.