ACCESS

Explorer (Tier 1)

Flat, analysis-ready daily table with aggregated X (Twitter) sentiment.

$8/month

This dataset captures 2-hour observation sessions of cryptocurrency trading pairs on Binance. Each row represents one session where a coin was tracked for market conditions and social sentiment.

Key characteristics

  • Flat schema — 19 columns, no nested structs, ready for analysis
  • Essential metrics only — price, spread, score, sentiment summary
  • X (Twitter) sentiment — post counts and mean score
  • Daily granularity — one parquet file per UTC day

Use cases

  • Quick sentiment screening across coins
  • Correlation studies between price and social activity
  • Lightweight backtesting inputs
  • Learning/prototyping with minimal data complexity

Overview

Property | Value
Format | Apache Parquet (zstd compressed)
Granularity | Daily (one file per UTC day)
Schema Version | v7
Columns | 19 flat fields

R2 Layout

tier1/daily/
└── YYYY-MM-DD/
    ├── data.parquet
    └── manifest.json
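
As a minimal sketch of how the layout above maps a UTC date to object keys, the helper below builds both paths for one day. The function name `daily_keys` and the returned dictionary shape are illustrative assumptions; only the `tier1/daily/YYYY-MM-DD/` structure and the two file names come from the layout.

```python
from datetime import date

# Prefix and file names are taken from the layout above.
PREFIX = "tier1/daily"


def daily_keys(day: date) -> dict:
    """Return the object keys for one UTC day (hypothetical helper)."""
    folder = f"{PREFIX}/{day.isoformat()}"  # isoformat() yields YYYY-MM-DD
    return {
        "data": f"{folder}/data.parquet",
        "manifest": f"{folder}/manifest.json",
    }


keys = daily_keys(date(2025, 1, 31))
print(keys["data"])      # tier1/daily/2025-01-31/data.parquet
print(keys["manifest"])  # tier1/daily/2025-01-31/manifest.json
```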

Field Reference (19 columns)

Identity & Timing (6)

Field | Type | Description
symbol | string | The cryptocurrency trading pair symbol in BASEQUOTE format (e.g., ‘ALTUSDC’ where ALT is the base asset and USDC is the quote currency). This is the exchange symbol used on Binance spot market. The symbol identifies which coin is being tracked in this watchlist session. Pattern: 3-15 uppercase alphanumeric characters. Source: Captured at session admission from the universe scanning loop in the main bot.
snapshot_ts | string | UTC timestamp of when this watchlist entry was first created/admitted. Format: ‘YYYY-MM-DDTHH:MM:SS.sssZ’ (ISO 8601 with Z suffix). This marks the start of the 2-hour tracking session. Source: Generated by datetime.now(timezone.utc) in core/watchlist_builder.py at the moment of entry construction.
meta_added_ts | string | UTC timestamp (ISO 8601) of when this session was admitted to the active watchlist. This is the official session start time. Format: ‘YYYY-MM-DDTHH:MM:SS.ssssssZ’. Source: Set by admit_session() in core/watchlist_v2_session.py at the moment of watchlist insertion.
meta_expires_ts | string | UTC timestamp (ISO 8601) when this session will expire. Calculated as added_ts + TTL (default 2 hours / 7200 seconds). After this time, the entry is archived and removed from active watchlist. Source: Computed as added_ts + WATCHLIST_TTL_HOURS in admit_session().
meta_duration_sec | double | Total session duration in seconds from admission to archival. Calculated as (expired_ts - added_ts).total_seconds(). Typically ~7200 seconds (2 hours) but may vary slightly based on archiver timing. Source: Computed in _archive_entry() in core/watchlist_archiver.py.
meta_archive_schema_version | int64 | The schema version of the entry at the time of archival. Copied from schema_version to preserve the original schema even if migrations occur. Used for archive compatibility. Current production version is 7. Source: Set by _archive_entry() from meta.schema_version.
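
The timestamp fields above are stored as strings, so they need parsing before any date arithmetic. The sketch below parses both documented formats (millisecond and microsecond precision) and computes the nominal session length from the admission and expiry timestamps. The helper names `parse_utc` and `nominal_ttl_seconds` are assumptions for illustration, and the sample timestamps are made up. Note that meta_duration_sec itself is measured to archival time, so it may differ slightly from this nominal value.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SS.fZ' into a timezone-aware UTC datetime.

    %f accepts 1 to 6 fractional digits, so this covers both the
    millisecond (snapshot_ts) and microsecond (meta_*_ts) formats.
    """
    naive = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


def nominal_ttl_seconds(added_ts: str, expires_ts: str) -> float:
    """Seconds between admission and scheduled expiry (hypothetical helper)."""
    return (parse_utc(expires_ts) - parse_utc(added_ts)).total_seconds()


# Made-up example values in the documented formats.
added = "2025-01-31T10:00:00.000000Z"
expires = "2025-01-31T12:00:00.000000Z"
print(nominal_ttl_seconds(added, expires))  # 7200.0
```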

Spot Market (4)

Field | Type | Description
spot_mid | double | Mid-price calculated as (bid + ask) / 2, in quote currency (USD for USDC pairs). This is the theoretical fair price at the moment of snapshot. Source: Computed from Binance spot API bookTicker endpoint bid/ask prices in build_spot_raw_v5().
spot_spread_bps | double | Bid-ask spread expressed in basis points (1 bps = 0.01%). Formula: 10000 * (ask - bid) / mid. Lower values indicate tighter, more liquid markets. Example: 7.5 bps means the spread is 0.075% of the mid price. Source: Computed in build_spot_raw_v5() from bid/ask prices.
spot_range_pct_24h | double | 24-hour price range as a percentage of mid price. Formula: 100 * (high24 - low24) / mid. Measures intraday volatility; higher values indicate more price movement. Example: 5.46 means the 24h high-low range was 5.46% of current price. Source: Computed from Binance spot API ticker/24hr high/low prices.
spot_ticker24_chg | double | 24-hour price change as a signed percentage. Positive = price increased, negative = price decreased over last 24 hours. Example: -0.149 means price dropped 0.149% in 24h. Source: Binance spot API ticker/24hr endpoint ‘priceChangePercent’ field.
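
The three formulas in the table above can be written out directly. This is a sketch of the arithmetic only, not the production build_spot_raw_v5() code; the function name `spot_metrics` and the input values are assumptions chosen to keep the numbers easy to check.

```python
def spot_metrics(bid: float, ask: float, high24: float, low24: float) -> dict:
    """Apply the documented formulas to raw quote data (hypothetical helper)."""
    mid = (bid + ask) / 2
    return {
        "spot_mid": mid,
        # 1 bps = 0.01%, hence the factor of 10000
        "spot_spread_bps": 10000 * (ask - bid) / mid,
        "spot_range_pct_24h": 100 * (high24 - low24) / mid,
    }


# Illustrative inputs: a 1.0-wide quote around 100 and a 10.0-wide daily range.
m = spot_metrics(bid=99.5, ask=100.5, high24=105.0, low24=95.0)
print(m)
```

With these inputs the mid is 100, so the spread works out to 100 bps (1% of mid) and the 24h range to 10% of mid.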

Derived Metrics (2)

Field | Type | Description
derived_liq_global_pct | double | Global liquidity percentile rank across ALL trading pairs in the universe. Range: 0-100 percentile. Measures this coin’s 24h quote volume relative to all other coins. Formula: percentile_rank(this_coin_qv, all_coins_qv) * 100. Higher = more liquid than peers. Example: 31.7 means this coin has more volume than ~32% of all coins. Source: Computed in base_scorer.py using bisect_right on sorted global volume baseline.
derived_spread_bps | double | Current bid-ask spread in basis points. Formula: ((ask - bid) / mid) * 10000. Example: 7.56 bps = 0.0756% spread. NOTE: This field is updated by the sampler every ~10 seconds during the 2-hour session, so the archived value reflects the LAST sample before expiry. Source: Computed in watchlist_sampler.py from live bid/ask prices.
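
To illustrate how a percentile rank can be obtained with bisect_right on a sorted baseline, here is a minimal sketch. The description above names bisect_right but does not give the exact normalization, so dividing the insertion index by the baseline length is an assumption, as are the function name `liquidity_percentile` and the sample volumes.

```python
from bisect import bisect_right


def liquidity_percentile(quote_volume: float, sorted_volumes: list) -> float:
    """Percentile rank (0-100) of quote_volume within an ascending baseline.

    Assumed normalization: share of baseline values <= quote_volume.
    """
    if not sorted_volumes:
        raise ValueError("baseline must not be empty")
    rank = bisect_right(sorted_volumes, quote_volume)  # count of values <= quote_volume
    return 100.0 * rank / len(sorted_volumes)


baseline = [10.0, 20.0, 30.0, 40.0]  # made-up 24h quote volumes, sorted ascending
print(liquidity_percentile(20.0, baseline))  # 50.0
```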

Scoring (1)

Field | Type | Description
score_final | double | A composite quality score (0-100) derived from weighted individual factor scores. Key factors include Price Action (Momentum, Volatility), Liquidity Health (Spread Efficiency, Depth), and Order Flow (Taker Buy/Sell Pressure). This metric acts as a quality filter: higher scores (≥60) indicate tradeable, liquid assets with strong market interest, while lower scores filter out predominantly illiquid or noise-heavy pairs. Source: base_scorer.py.

X (Twitter) Sentiment (6)

Field | Type | Description
sentiment_posts_total | int64 | Total number of tweets collected and analyzed for this cryptocurrency during the most recent scraping cycle. A ‘cycle’ is one complete pass through all tracked coins in the scraper’s round-robin schedule. The duration of a cycle varies based on API rate limits and queue size (typically ~49 minutes). Source: Sum of all tweets for this coin in the cycle.
sentiment_posts_pos | int64 | Count of tweets classified as POSITIVE sentiment using lexicon-based analysis. A tweet is positive if its sentiment score > 0.1 on a scale of -1.0 to +1.0. The lexicon matches terms from categories: positive_general (bullish, moon, hodl, gains, etc.) and pump_hype (pump, breakout, rally, etc.). Source: Computed by aggregation_utils.aggregate_sentiment_metrics().
sentiment_posts_neu | int64 | Count of tweets classified as NEUTRAL sentiment using lexicon-based analysis. A tweet is neutral if its sentiment score is between -0.1 and +0.1 (inclusive) on a scale of -1.0 to +1.0. These tweets contain no strong positive or negative sentiment signals. Source: Computed by aggregation_utils.aggregate_sentiment_metrics().
sentiment_posts_neg | int64 | Count of tweets classified as NEGATIVE sentiment using lexicon-based analysis. A tweet is negative if its sentiment score < -0.1 on a scale of -1.0 to +1.0. The lexicon matches terms from categories: negative_general (bearish, crash, dump, etc.), fud_fear (fud, scam, rugpull, etc.), and scam_rug. Source: Computed by aggregation_utils.aggregate_sentiment_metrics().
sentiment_mean_score | double | Mean of hybrid_score_3class values across all tweets. Each tweet gets -1.0 (negative), 0.0 (neutral), or +1.0 (positive) from the hybrid two-model AI system. Range: -1.0 to +1.0. Positive values indicate overall positive sentiment, negative values indicate overall negative. Example: 0.67 means sentiment skews positive. The hybrid system uses a primary DistilBERT model for classification and a referee model for confidence calibration and overrides. Source: statistics.mean(hybrid_score_3class values). Note: Sentiment means are typically positive due to structural properties of crypto social discourse; interpret alongside post counts, balance, and silence indicators.
sentiment_is_silent | bool | True if zero tweets were collected for this coin during this scraping cycle. A silent coin may indicate low market interest or search query issues. Source: Determined by build_cycle_bucket() when posts_total == 0.
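
The thresholds and the mean described above can be sketched as follows. This is an illustration of the documented rules, not the code in aggregation_utils: the function names `classify`, `count_classes`, and `mean_3class` are assumptions, the sample scores are made up, and scores exactly at ±0.1 are treated as neutral because the positive and negative rules use strict inequalities.

```python
from statistics import mean


def classify(score: float, threshold: float = 0.1) -> str:
    """Map a lexicon score in [-1.0, +1.0] to 'pos', 'neu', or 'neg'."""
    if score > threshold:
        return "pos"
    if score < -threshold:
        return "neg"
    return "neu"  # includes the boundary values -0.1 and +0.1


def count_classes(lexicon_scores: list) -> dict:
    """Tally per-class tweet counts, mirroring the sentiment_posts_* fields."""
    counts = {"pos": 0, "neu": 0, "neg": 0}
    for s in lexicon_scores:
        counts[classify(s)] += 1
    return counts


def mean_3class(labels: list):
    """Mean of per-tweet labels in {-1.0, 0.0, +1.0}; None when no tweets."""
    return mean(labels) if labels else None


print(count_classes([0.5, 0.05, -0.3, 0.1]))  # {'pos': 1, 'neu': 2, 'neg': 1}
print(mean_3class([1.0, 1.0, 0.0]))           # about 0.67
```

The two inputs are kept separate on purpose: the counts come from lexicon scores, while the mean is taken over the three-class labels produced by the hybrid model.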

Notes

  • Tier 1 is intended for research and analysis. It is not real-time data and is not presented as a trading signal.
  • When sentiment_is_silent is true, sentiment fields can be NULL.
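
Because sentiment fields can be NULL for silent coins, downstream calculations should guard against missing values. The sketch below computes the share of positive posts for one row represented as a plain dictionary; the function name `positive_share` and the dictionary representation are assumptions, and NULL is modelled as Python's None.

```python
def positive_share(row: dict):
    """Fraction of positive posts, or None when the row is silent or incomplete."""
    total = row.get("sentiment_posts_total")
    pos = row.get("sentiment_posts_pos")
    # Silent rows, NULL counts, and zero totals all yield no ratio.
    if row.get("sentiment_is_silent") or not total or pos is None:
        return None
    return pos / total


silent = {"sentiment_is_silent": True, "sentiment_posts_total": None, "sentiment_posts_pos": None}
active = {"sentiment_is_silent": False, "sentiment_posts_total": 8, "sentiment_posts_pos": 6}
print(positive_share(silent))  # None
print(positive_share(active))  # 0.75
```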

Documentation

Download the schema reference and quick-start guide for this tier: