ACCESS

Data Access

Choose your tier and get daily crypto market and sentiment datasets, updated every 24 hours.

149,824 entries archived · 61 days of data · Updated daily
Tier 1

Explorer

Lightweight sentiment context

$8 / month
  • 19 flat columns
  • Sentiment summary (positive/neutral/negative + mean)
  • Spot price + spread + 24h change
  • Final liquidity score
  • Daily parquet files (~175 KB/day)

Archive access: Up to 6 months rolling history

Best for: Quick screening, correlation studies, learning

View schema →
Tier 2

Analyst

Full market microstructure + sentiment

$18 / month
  • 8 nested column groups
  • Order book depth at multiple levels
  • Rich sentiment details (last 1-hour cycle)
  • Derived analytics (depth imbalance, flow)
  • Daily parquet files (~0.8 MB/day)

Archive access: Up to 6 months rolling history

Best for: Research, dashboards, analysis workflows

View schema →
Tier 3

Researcher

Complete data for ML and backtesting

$38 / month
  • 12 nested column groups
  • 700+ price samples per entry (~10s intervals)
  • Futures data (funding rate, open interest)
  • Multi-window sentiment (current + trailing)
  • Daily parquet files (~10 MB/day)

Archive access: Up to 6 months rolling history

Best for: ML training, backtesting, quantitative research

View schema →

Tier Details

Full schema references for each tier. These pages are intended to be read before subscribing.

Explorer (Tier 1) Flat, analysis-ready daily table with aggregated X (Twitter) sentiment.

Key Capabilities

  • Flat schema — 19 columns, no nested structs, ready for analysis immediately.
  • Essential metrics only: price, spread, liquidity score, sentiment summary.
  • X (Twitter) sentiment: post counts and mean score from AI classification.
  • Daily granularity: one parquet file per UTC day.

Loading Tier 1 (Simple)

import pyarrow.parquet as pq

# Read entire day - no complex parsing needed
table = pq.read_table("tier1/daily/2026-01-18/data.parquet")
df = table.to_pandas()
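
Because the schema is flat, column pruning works well for quick screens. A minimal sketch, assuming illustrative column names (symbol, sentiment_mean, liquidity_score); check the Tier 1 schema reference for the exact names.

import pyarrow.parquet as pq

# Read only the columns needed for a quick sentiment screen
# (column names here are illustrative -- see the Tier 1 schema reference)
cols = ["symbol", "sentiment_mean", "liquidity_score"]
screen = pq.read_table(
    "tier1/daily/2026-01-18/data.parquet", columns=cols
).to_pandas()

# Rank by mean sentiment for a first-pass shortlist
top = screen.sort_values("sentiment_mean", ascending=False).head(10)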
Analyst (Tier 2) Structured daily export with nested market microstructure and rich X (Twitter) sentiment.

Key Capabilities

  • Nested column groups that keep related fields (order book, scores, sentiment) together.
  • Derived analytics: depth imbalance, flow metrics.
  • Multi-component scores: spread, depth, liquidity, flow → final.
  • X (Twitter) sentiment: detailed AI classification & engagement data.

Accessing Nested Fields (Tier 2)

# Accessing nested score components
df['final_score'] = df['scores'].apply(lambda x: x['final'])
df['liquidity'] = df['scores'].apply(lambda x: x['liq'])

# Accessing post counts from the last ~1-hour sentiment cycle
df['tweets'] = df['twitter_sentiment_last_cycle'].apply(
    lambda x: x['posts_total']
)
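
If you prefer flat columns for analysis, a nested group can be expanded in one pass. A minimal sketch, assuming the 'scores' column holds dicts as in the example above:

import pandas as pd

# df is a loaded Tier 2 daily DataFrame (see the Tier 1 loading example)
# Expand the nested 'scores' group into flat, prefixed columns
scores_flat = pd.json_normalize(df['scores'].tolist()).add_prefix('score_')
df = pd.concat([df.reset_index(drop=True), scores_flat], axis=1)

json_normalize also handles deeper nesting by joining field names with dots, which keeps derived columns traceable to their source group.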
Researcher (Tier 3) Our most granular dataset. Includes full market microstructure, derivatives, multi-window sentiment, and 10s price samples.

Key Capabilities

  • High-frequency samples: 700+ price points per session (every ~10s).
  • Futures data: funding rates, open interest, trader positioning.
  • Machine Learning ready: designed specifically for model training.
  • Backtesting optimized: full intraday granularity.

Working with Time-Series (Tier 3)

# Extract the high-freq price series for backtesting
entry = df.iloc[0]
price_series = entry['spot_prices'] 

# Each 'p' has ts, mid, bid, ask, spread_bps
timestamps = [p['ts'] for p in price_series]
mid_prices = [p['mid'] for p in price_series]
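
For backtesting, the raw samples are often easier to handle as a time-indexed series. A minimal sketch building on the lists above; the unit of 'ts' (epoch milliseconds here) is an assumption to verify against the Tier 3 schema.

import pandas as pd

# Build a time-indexed series from the raw samples
# ('ts' is assumed to be epoch milliseconds -- confirm in the Tier 3 schema)
prices = pd.Series(
    mid_prices,
    index=pd.to_datetime(timestamps, unit='ms', utc=True),
    name='mid',
)

# Resample to 1-minute bars for a coarser backtest grid
one_min = prices.resample('1min').last().ffill()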

What's Included

📦
Daily Parquet Files

New data published every 24 hours in analysis-ready format

📚
Rolling Historical Data

Up to 6 months of archives, expanding as data accumulates

🔗
Signed Download Links

Time-limited secure URLs via Patreon, refreshed regularly

📖
Documentation + Examples

Schema reference, Python quickstart, and DuckDB examples

Frequently Asked Questions

How do I access the data after subscribing?

After subscribing on Patreon, download links are posted weekly for your tier. Links refresh each week and grant access to private files hosted on Cloudflare R2 with fast global delivery.

What format is the data in?

All tiers provide Apache Parquet files (zstd compressed). This format is widely supported by Python (PyArrow, Pandas), DuckDB, and most data tools.
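
For example, DuckDB can query the daily files directly, including many days at once via a glob pattern. A minimal sketch; the directory layout and column names (observed_at, sentiment_mean) are illustrative.

import duckdb

# Scan every daily file under the Tier 1 folder in a single query
# (path pattern and column names are placeholders -- adapt to your tier)
daily_sentiment = duckdb.sql("""
    SELECT date_trunc('day', observed_at) AS day,
           avg(sentiment_mean)            AS avg_sentiment
    FROM 'tier1/daily/*/data.parquet'
    GROUP BY day
    ORDER BY day
""").df()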

How often is the data updated?

New daily files are published every 24 hours. You get access to historical archives plus all future updates while subscribed.

Can I upgrade or downgrade my tier?

Yes. You can change your tier at any time through Patreon. Upgrades take effect immediately; downgrades apply at the next billing cycle.

How much historical data is available?

Access includes a rolling history of up to six months. As the dataset continues to accumulate, the available historical window expands automatically.

What does the data cover?

Each entry is a ~2-hour observation session for a cryptocurrency trading pair on Binance. We track spot market data and aggregate X (Twitter) sentiment using a custom-trained DistilBERT model with a hybrid two-model arbitration system.

Is this real-time data?

No. This is historical batch data updated daily. It is designed for research, analysis, and backtesting—not live trading signals.

Are sentiment scores reliable?

Sentiment scores reflect language patterns in crypto social media, which skew positive. They are preserved as observed rather than normalized. For interpretation guidance, see Research Methodology.

When is sentiment captured relative to price monitoring?

Sentiment is captured in a ~1-hour window immediately prior to the 2-hour price monitoring session. When a coin enters the active watchlist, it is assigned the aggregated sentiment from the preceding Twitter scrape cycle. This creates a natural lead-time between sentiment observation and price evolution.
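
That ordering can be sanity-checked in code. A rough illustration, assuming hypothetical timestamp fields sentiment_window_end and session_start; the real field names are in the tier schema references.

# df is a loaded daily DataFrame; both field names below are hypothetical
lead_time = df['session_start'] - df['sentiment_window_end']

# Sentiment precedes price monitoring, so the lead time should be non-negative
print(lead_time.describe())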

This dataset is descriptive only. It documents observed patterns in market snapshots and social sentiment, but makes no claims about predictive value. No correlation between sentiment and forward returns is implied or claimed. Data is provided for research purposes.