Technical · standalone repo · June 2026

Counts as the backbone, top posts as the flesh, static JSON as the product.

The whole system is a cron-driven pipeline that ends in JSON files on a CDN. No public API, no server in the read path, no filtered stream. Cheap to run, trivially cacheable, and simple enough that a third party could operate it with their own X credentials if we ever open it up.

Cloudflare Workers + R2 + Pages | X API v2 · counts + recent search | LLM enrichment | No public API | Own repo, Sift-independent | S3-compatible by construction

Architecture

Five stages, all on Cloudflare. The X keys live only in the collector Worker's secrets; everything public is a derived artifact. The site never talks to a backend: it fetches versioned JSON from the CDN and renders tweets via official X embeds client-side.

Worker · cron

1 · Collect

Hourly counts per entity (volume backbone). Every 2-4h: recent-search pulls of new posts per entity, relevancy-sorted, capped.

R2

2 · Raw store

Append-only raw API responses: raw/x_official/<entity>/<date>/<run>/. Replayable forever.

Worker · queue

3 · Enrich

LLM batch classification: sentiment, themes, author class. Dedup, RT/spam handling, language filter. Idempotent per post ID.

Worker

4 · Rollup

Window aggregates per entity (1h/6h/24h/7d/30d), cross-entity mindshare, deltas, top posts/voices.

R2 → CDN

5 · Publish

Atomic JSON swap: public/index.json + per-entity summaries. Site + reads hydrate from these.

Design rule: every stage reads only the previous stage's artifact. Kill any Worker mid-run and the next cron heals; replay any day from raw. State lives in R2 objects plus a small D1 working set (the hot relational tier, next section), nothing else.
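A minimal sketch of the publish stage's atomic swap, assuming an in-memory stand-in for R2. It follows the public/runs/<runId>/ layout described in the storage section; `MemoryStore`, `publishRun`, `rollback`, `readArtifact`, and the `public/manifest.json` key are illustrative names, not the pipeline's real ones.

```typescript
// In-memory stand-in for an R2 bucket (assumption: the real store is R2).
class MemoryStore {
  private objects = new Map<string, string>();
  put(key: string, body: string): void {
    this.objects.set(key, body);
  }
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
}

interface RunManifest {
  runId: string;
  prefix: string; // where this run's immutable files live
}

const MANIFEST_KEY = "public/manifest.json"; // hypothetical pointer location

// Write every artifact under a fresh run prefix, then flip the pointer last.
// Readers only follow the manifest, so they see the old run or the new run
// in full, never a half-written mix.
function publishRun(
  store: MemoryStore,
  runId: string,
  files: Record<string, unknown>,
): RunManifest {
  const prefix = `public/runs/${runId}/`;
  for (const [name, data] of Object.entries(files)) {
    store.put(prefix + name, JSON.stringify(data));
  }
  const manifest: RunManifest = { runId, prefix };
  store.put(MANIFEST_KEY, JSON.stringify(manifest)); // the single swap step
  return manifest;
}

// Rollback is repointing the manifest at an older run that still exists.
function rollback(store: MemoryStore, runId: string): void {
  const manifest: RunManifest = { runId, prefix: `public/runs/${runId}/` };
  store.put(MANIFEST_KEY, JSON.stringify(manifest));
}

// What a reader does: resolve the manifest, then fetch a file from that run.
function readArtifact(store: MemoryStore, name: string): unknown {
  const raw = store.get(MANIFEST_KEY);
  if (raw === undefined) throw new Error("nothing published yet");
  const manifest = JSON.parse(raw) as RunManifest;
  const body = store.get(manifest.prefix + name);
  if (body === undefined) throw new Error(`missing ${name} in ${manifest.runId}`);
  return JSON.parse(body);
}
```

A Worker killed halfway through the file loop leaves the manifest untouched, which is why the next cron run can simply start over.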

Storage: three tiers, and no database in the read path

Do we want a relational database? Yes: a small one, in the middle, that we can lose without losing anything. The shape of this product is "append-only events in, windowed aggregates out, static files served": that wants a lake + a hot working set + a CDN, not a big always-on database. The rule that matters most: readers never touch a database. The site is JSON on a CDN; a database outage can degrade freshness, never availability.

| Tier | What lives there | Properties | Size & retention |
|---|---|---|---|
| R2 · the lake | Raw API responses, append-only, exactly as fetched (raw/<source>/<entity>/<date>/<run>/) | Source of truth. Immutable, replayable forever: any chart is reproducible from raw + prompt/score versions. Never queried at runtime. | ~1-3 GB/month at 20 entities; never deleted |
| D1 (SQLite) · the hot working set | enriched_posts (flat, queryable: entity, created_at, sentiment, themes, capabilities, tags, author, conversation, engagement), hourly_counts, authors, classification cache, budget ledger, entity state, events | The relational tier: every rollup is SQL over indexed windows instead of re-reading JSONL files. Disposable by design: it's a materialized view of raw; losing it costs a replay, not data. | ~500k hot rows ≈ 0.5 GB at 20 entities; rows pruned past 90 days (frozen launches and published rollups don't need them); 10 GB D1 ceiling leaves 10-20× headroom |
| Public JSON · the product | Immutable public/runs/<runId>/ artifacts + a tiny manifest pointer, on the CDN | The only thing readers touch. Atomic by construction (manifest swap), rollback is repointing, cache-perfect (immutable files + no-store manifest). | ~10 MB per run; runs older than 48h cleaned |

Why not Postgres / ClickHouse now

The math doesn't ask for it: ~475k posts/month, ~500k hot rows, single-writer pipeline, zero concurrent readers (publishes are the only consumer). SQLite with two indexes does this without breaking a sweat, costs nothing, and adds no external service, no connection pooling, no second ops surface. Reaching for a "real" database here would be architecture cosplay.

The escape hatch, pre-built

Rollups are pure functions that take rows; only the data-access layer knows it's D1. The swap triggers are explicit: > 50 tracked entities, > 2-3 GB hot, cross-entity queries slowing publishes, or wanting ad-hoc analyst SQL over history. Then: Postgres (Neon via Hyperdrive) behind the same interface, backfilled by replaying raw. A contained change, not a migration project.
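A sketch of what "rollups are pure functions that take rows" could look like. `RowSource`, `InMemoryRows`, and `rollupSentiment` are hypothetical names, the row shape is trimmed to what the example needs, and the net-score definition ((pos − neg) / mentions) is an assumption. A D1- or Postgres-backed source would be async; the sketch stays synchronous for brevity.

```typescript
type Sentiment = "positive" | "neutral" | "negative";

interface EnrichedRow {
  postId: string;
  entity: string;
  createdAt: number; // epoch ms
  sentiment: Sentiment;
}

// The only seam that knows which database sits underneath.
interface RowSource {
  rowsInWindow(entity: string, fromMs: number, toMs: number): EnrichedRow[];
}

class InMemoryRows implements RowSource {
  private rows: EnrichedRow[];
  constructor(rows: EnrichedRow[]) {
    this.rows = rows;
  }
  rowsInWindow(entity: string, fromMs: number, toMs: number): EnrichedRow[] {
    return this.rows.filter(
      (r) => r.entity === entity && r.createdAt >= fromMs && r.createdAt < toMs,
    );
  }
}

interface SentimentRollup {
  mentions: number;
  pos: number;
  neu: number;
  neg: number;
  netScore: number; // assumption: (pos - neg) / mentions, 0 for an empty window
}

// Pure: same rows in, same rollup out. No I/O, no clock.
function rollupSentiment(rows: EnrichedRow[]): SentimentRollup {
  let pos = 0;
  let neu = 0;
  let neg = 0;
  for (const r of rows) {
    if (r.sentiment === "positive") pos++;
    else if (r.sentiment === "negative") neg++;
    else neu++;
  }
  const mentions = rows.length;
  const netScore = mentions === 0 ? 0 : (pos - neg) / mentions;
  return { mentions, pos, neu, neg, netScore };
}
```

Swapping D1 for Postgres then means writing one more `RowSource` implementation; `rollupSentiment` never changes.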

The X API in 2026: what changed and what it costs

Verified against current docs (June 2026). The headline: in February 2026, X replaced the Basic/Pro tiers with pay-per-use for new developers (~$0.005 per post read, 2M reads/month ceiling; reads deduplicated within a 24h UTC window). Pro ($5k/mo, 1M reads) survives only for grandfathered subscribers. We run on our Enterprise contract; the design stays within pay-per-use limits so anyone could run it.

| Fact | Detail | Design consequence |
|---|---|---|
| Counts endpoints | /2/tweets/counts/recent exists on all paid access; 300 req/15min; does not consume the post-read cap (billed ~$0.005/req on pay-per-use) | Counts are the hourly backbone. 20 entities × hourly ≈ 14.4k req/mo: trivial on Enterprise, ~$72/mo pay-per-use |
| Recent search | 100 results/page, sort_order=relevancy or recency, 450 req/15min, 512-char queries (4,096 on Enterprise) | Top-posts pulls fit easily; alias groups must fit 512 chars for third-party portability |
| No engagement operators | min_faves / min_retweets do not exist in API v2 at any tier (web UI only) | Pull relevancy-sorted + recency pages, rank locally on public_metrics. Re-fetch metrics for yesterday's top posts once daily (engagement matures for ~24h) |
| Full-archive search | Pay-per-use and Enterprise; 500 results/page | Used once per new entity for 30-day backfill, then never again |
| Embeds | publish.x.com/oembed free, no auth; widgets.js works but occasionally renders blank | Embeds-only display, styled link-out card as fallback |
| Derived data | Aggregate analysis that doesn't store personal data is explicitly permitted; raw content redistribution is not | Public JSON = IDs + aggregates. Tweet text never ships in public artifacts (see Compliance) |

Monthly budget, 20 entities

| Stream | Cadence | Posts read / month | Pay-per-use cost |
|---|---|---|---|
| Counts backbone (volume, mindshare) | hourly, all entities | 0 (not capped) | ~$72 |
| Content pulls, baseline | every 4h · ~100 new posts/entity/pull avg, capped 1,000/entity/day | ~360k | ~$1,800 |
| Launch mode (1-2 entities/mo) | every 30-60 min for 72h · ~750/hr | ~54k | ~$270 |
| Daily metrics re-fetch (top 100/entity) | daily | ~60k | ~$300 (24h dedup shaves this) |
| Total | | ~475k | ~$2.4k/mo · free on our Enterprise |

Hard rule: per-entity daily read caps enforced in the collector (default 1,000/day baseline, 20,000/day launch mode). Without caps, a busy news week at 20 entities can triple consumption. A third party at reduced cadence (counts + 1 pull/day) runs the whole thing for ~$300/mo.
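The budget table can be reproduced with a few lines of arithmetic, and the hard rule with a small ledger. This is a sketch under stated assumptions: a 30-day month, one launch per month, the ~$0.005 unit price quoted above, and a hypothetical `ReadLedger` class. Nothing here is the collector's actual code.

```typescript
// Price in thousandths of a dollar so the arithmetic stays in integers.
const PRICE_MILLIS_PER_UNIT = 5; // ~$0.005 per post read or counts request
const ENTITIES = 20;
const DAYS = 30; // assumption: 30-day month

function usd(units: number): number {
  return (units * PRICE_MILLIS_PER_UNIT) / 1000;
}

const countsRequests = ENTITIES * 24 * DAYS;            // hourly counts
const baselineReads = ENTITIES * (24 / 4) * 100 * DAYS; // every 4h, ~100 posts/pull
const launchReads = 72 * 750;                           // one 72h launch at ~750/hr
const refetchReads = ENTITIES * 100 * DAYS;             // top 100/entity, daily

const postsRead = baselineReads + launchReads + refetchReads;
const monthlyUsd =
  usd(countsRequests) + usd(baselineReads) + usd(launchReads) + usd(refetchReads);

console.log({ countsRequests, postsRead, monthlyUsd });
// → { countsRequests: 14400, postsRead: 474000, monthlyUsd: 2442 }

// Per-entity daily cap: returns how many of `wanted` reads may still be spent.
class ReadLedger {
  private spent = new Map<string, number>();

  grant(entity: string, day: string, wanted: number, cap: number): number {
    const key = `${entity}:${day}`;
    const used = this.spent.get(key) ?? 0;
    const granted = Math.max(0, Math.min(wanted, cap - used));
    this.spent.set(key, used + granted);
    return granted;
  }
}
```

The collector would ask the ledger before every pull and trim its page count to the grant, so a busy week degrades coverage instead of blowing the budget.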

Collection: query design

Each entity's registry entry compiles to numbered queries (the raw keys in the existing blob already follow this pattern). Aliases are OR-grouped, retweets excluded from content pulls but included in counts, and ambiguous names get guard terms.

// registry/entities/fable-5.json — compiled to queries below
{
  "slug": "fable-5", "kind": "model", "name": "Claude Fable 5", "lab": "anthropic",
  "aliases": ["Fable 5", "Claude Fable", "fable-5", "Fable Anthropic"],
  "guards": ["-aesop -disney"],            // disambiguation for generic words
  "official_handles": ["claudeai", "ClaudeDevs", "AnthropicAI"],
  "launch": { "flag": true, "until": "2026-06-12T18:00:00Z" }
}

// counts query (RTs included — volume is volume)
("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney lang:en

// content query (RTs excluded — themes/top posts come from originals)
("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney -is:retweet lang:en
// → 2 pulls per cycle: sort_order=relevancy (top) + sort_order=recency since last_seen_id
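
One way the registry-to-query compilation could work, as a sketch. `compileQueries` and `RegistryEntry` are hypothetical names. The hand-written queries above use three of the four registry aliases, so the usage example passes those three to reproduce them exactly. Aliases containing double quotes are not handled.

```typescript
interface RegistryEntry {
  aliases: string[];
  guards: string[]; // each string may hold several "-term" tokens
}

interface CompiledQueries {
  counts: string;  // retweets included
  content: string; // retweets excluded
}

const MAX_QUERY_CHARS = 512; // pay-per-use limit; Enterprise allows 4,096

function compileQueries(entry: RegistryEntry, lang: string = "en"): CompiledQueries {
  // OR-group the aliases, each as a quoted phrase.
  const aliasGroup = "(" + entry.aliases.map((a) => `"${a}"`).join(" OR ") + ")";
  const guards = entry.guards.join(" ").trim();
  const join = (parts: string[]): string =>
    parts.filter((p) => p.length > 0).join(" ");

  const counts = join([aliasGroup, guards, `lang:${lang}`]);
  const content = join([aliasGroup, guards, "-is:retweet", `lang:${lang}`]);

  // Fail at compile time rather than at the API if a query cannot be ported.
  for (const q of [counts, content]) {
    if (q.length > MAX_QUERY_CHARS) {
      throw new Error(`query is ${q.length} chars, over the ${MAX_QUERY_CHARS}-char limit`);
    }
  }
  return { counts, content };
}

const compiled = compileQueries({
  aliases: ["Fable 5", "Claude Fable", "fable-5"],
  guards: ["-aesop -disney"],
});
console.log(compiled.counts);
// → ("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney lang:en
```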

Enrichment: LLM classification

One batched LLM pass per content pull, ~50 posts per call, classification cached by (post ID, prompt version) in D1 so nothing is ever classified twice. Cheap-fast model class (Gemini Flash / Haiku tier); the per-post cost is fractions of a cent and the whole month of classification costs less than one day of X API.
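A sketch of the cache-then-batch flow, assuming an in-memory map in place of the D1 cache table and a synchronous stand-in for the LLM call. `CachedClassifier`, `BatchClassifier`, and the `Label` shape are illustrative; a real classifier call would be async.

```typescript
interface Post {
  id: string;
  text: string;
}

interface Label {
  sentiment: "positive" | "neutral" | "negative";
}

// Stand-in for the LLM call: takes a batch, returns one label per post.
type BatchClassifier = (batch: Post[]) => Label[];

class CachedClassifier {
  private cache = new Map<string, Label>(); // assumption: D1 table in production
  private classifyBatch: BatchClassifier;
  private batchSize: number;
  llmCalls = 0;

  constructor(classifyBatch: BatchClassifier, batchSize: number = 50) {
    this.classifyBatch = classifyBatch;
    this.batchSize = batchSize;
  }

  classify(posts: Post[], promptVersion: string): Map<string, Label> {
    const key = (id: string): string => `${id}:${promptVersion}`;

    // Skip anything already cached for this prompt version, and dedup
    // repeated IDs within the same pull.
    const seen = new Set<string>();
    const misses = posts.filter((p) => {
      if (seen.has(p.id) || this.cache.has(key(p.id))) return false;
      seen.add(p.id);
      return true;
    });

    // One LLM call per batch of cache misses.
    for (let i = 0; i < misses.length; i += this.batchSize) {
      const batch = misses.slice(i, i + this.batchSize);
      const labels = this.classifyBatch(batch);
      this.llmCalls++;
      batch.forEach((p, j) => this.cache.set(key(p.id), labels[j]));
    }

    const out = new Map<string, Label>();
    for (const p of posts) out.set(p.id, this.cache.get(key(p.id)) as Label);
    return out;
  }
}
```

Because the prompt version is part of the key, bumping it makes every lookup miss without deleting the old labels.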

| Dimension | Output | Notes |
|---|---|---|
| Sentiment | positive / neutral / negative + confidence | Per-post, about the entity (not general mood). Published as shares + net score. Neutral-heavy news days are expected and annotated, not hidden. |
| Themes | like/dislike + topic from a controlled vocabulary, emergent topics flagged for review | Vocabulary v1: novelty, quality, speed, ux, demo, coding, agents, writing, pricing, limits, trust_safety, bugs, confusion, benchmarks. Each theme keeps its top N post IDs as receipts (drives the drill-in), plus an optional sub-facet ("70% of pricing posts cite token limits") when one sub-pattern crosses half the theme. |
| Capabilities & tags | Per post: stances (positive/negative) on the fixed 8-dimension rubric (coding, writing, art_design, reasoning_depth, speed, accuracy, agents_tools, price_value), plus free-text good_at / bad_at tags ("openclaw", "svg art", "excel formulas") | Rides the same classification call: no extra LLM cost. Tags are normalized (lowercase, variant-collapse map maintained in code: "open claw" → "openclaw") and feed both the per-entity tag list and the global searchable index. The rubric is the always-on scorecard; tags are its long tail. |
| Consensus summary | 60-90 word "people say" paragraph per entity per window | The Amazon-review-style summary. Hard constraint enforced by a validator: every claim in the paragraph must cite a theme present in this window's rollup, with its count; phrases carry a citations array (theme + post IDs) so the UI can link them. A draft that mentions anything uncited is rejected and regenerated. Regenerates only when theme counts shift > 20%, so the prose stays stable between refreshes. |
| Author identity + affiliation | Two orthogonal fields, classified once per author: class (identity: official / leadership / employee / partner / builder / researcher / creator / power_user / media / investor / influencer / anon) and affiliation (a lab slug or null; registry official handles short-circuit, bios resolve employment/founding/partnership) | Cached 30 days. Relationship is never stored; it's derived per author-entity pair: affiliation = entity's lab → owned; partner tie → affiliated; affiliation = other tracked lab → rival; else community. One author record makes @sama owned on GPT-5.2 and rival on Fable 5. Powers voiced-by, community-first voices, deep-thread filtering, and the relationship split on every metric, from day one. |
| Builder panel | boolean: member of the curated builder list | ~300 hand-curated accounts (people who demonstrably ship) maintained in the registry. Powers the builder sentiment series. Zero marginal X cost: panel members' posts already arrive via the entity pulls; this is a filter, not a new collection stream. |
| Spam/bot score | 0-1 | Heuristics first (account age, follower ratio, duplicate text), LLM only for the gray zone. High-spam posts drop from all metrics; the rate is published. |

Determinism rule: prompts are versioned files in the repo; a prompt change bumps the cache key and triggers re-classification of the active window only. Rollups are pure functions of enriched rows, so any chart can be reproduced from raw + prompt version.
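The relationship derivation described in the author row is a four-branch function of one author record and one entity. A sketch, with assumptions: `partnerOf` is a hypothetical field for storing partner ties, and the `gpt-5.2` slug and lab names are illustrative.

```typescript
type Relationship = "owned" | "affiliated" | "rival" | "community";

interface AuthorRecord {
  handle: string;
  affiliation: string | null; // lab slug or null
  partnerOf: string[];        // hypothetical: lab slugs this author partners with
}

interface TrackedEntity {
  slug: string;
  lab: string;
}

// Derived per author-entity pair at rollup time; never stored.
function relationship(
  author: AuthorRecord,
  entity: TrackedEntity,
  trackedLabs: Set<string>,
): Relationship {
  if (author.affiliation === entity.lab) return "owned";
  if (author.partnerOf.includes(entity.lab)) return "affiliated";
  if (author.affiliation !== null && trackedLabs.has(author.affiliation)) return "rival";
  return "community";
}

const labs = new Set(["openai", "anthropic"]);
const sama: AuthorRecord = { handle: "sama", affiliation: "openai", partnerOf: [] };

console.log(relationship(sama, { slug: "gpt-5.2", lab: "openai" }, labs));
// → owned
console.log(relationship(sama, { slug: "fable-5", lab: "anthropic" }, labs));
// → rival
```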

Derived signals: the signature numbers

All four signature metrics (see Product) are computed in the rollup stage as pure, versioned functions of enriched rows. No new collection, no new infrastructure: they are math on data the pipeline already has.

| Signal | Computation | Notes |
|---|---|---|
| Vibe Score | 0-100 standing favorability: crowd net sentiment + builder net sentiment (double-weighted) − theme severity drag, recency-weighted over a rolling 7d window. Volume is deliberately excluded; mindshare measures popularity separately. Renders only above a sample floor (n ≥ 30 posts). | The Now clock's number: leaderboard Score column, entity hero, the Gap's y-axis. v1 ships crowd-only (vibe-score.v1.ts); v2 adds builder weighting when the panel lands. Versioned and published like the Launch Score formula. |
| Launch Score | 0-100 weighted blend: velocity percentile vs the launch archive, builder uptake (panel members posting / panel size), crowd net sentiment, durability (day-3 / day-1 volume). Provisional from T0, frozen at T+72h. | Formula is a versioned file (launch-score.v1.ts) and published verbatim on the methodology page, like an index methodology. Re-scoring history requires a version bump and shows both versions. |
| Change events | Diff of consecutive rollups against thresholds: theme count ratio > 2× over 6h, new top post, velocity peak, first official/builder post, RT-chain > 500 near-identical (amplification flag). Deduped by (entity, event type, day). | Emitted to events.json, newest-first, capped at 50 per entity. Each event carries the receipt (post ID or chart anchor). Powers the homepage ticker, entity timelines, and the Slack webhook for surprise launches. |
| Builder series | Sentiment rollup filtered to builder-panel authors, published alongside the crowd series in every window. | Minimum sample floor (n ≥ 8 posts) before the series renders, to avoid one tweet swinging the line. Panel list is public on the methodology page; suggestions via PR if open-sourced. |
| The Gap | Scatter join of our Vibe Score against a published capability index per model, refetched nightly. | Capability axis is third-party data used with attribution (artificialanalysis.ai index preferred; LMArena Elo as fallback). Sourcing/permission is a rollout open question; the chart ships only with clean attribution. |
| Crowd scorecard | Per rubric dimension: score = 100 × positive / (positive + negative) over opinionated posts in the window; n and a trend arrow (vs prior window) attached; renders only at n ≥ 10. | Share-based, so models with wildly different volumes compare honestly. Same rubric on every entity page; the compare view (v2) lays scorecards side by side for free. |
| Tag index | Global tags.json: tag → entities with good/bad counts, score = 100 × good / (good + bad), evidence rank = score × log1p(n), top receipt post IDs. Tags render per-entity at ≥ 8 posts. | Powers the typeahead search and /good-at/<tag> pages: client-side over one static file, no backend. The verdict snippet per row goes through the standard citation validator. Likely the property's biggest SEO surface ("best model for X"). |
| Threads & relationship split | Posts group by conversation_id. thread_score = log1p(replies) + 2·log1p(unique participants) + reply-chain depth; the default "deep threads" surface requires a community root; rival-rooted threads appear with a rival badge, owned-rooted ones go to the owned rail. Top ~20 candidate threads per entity per day get one conversation_id search pull each to complete the thread (cheap, inside caps). The four-way relationship split (owned / affiliated / rival / community mentions) is published per window. | Kills the "top posts are always lab announcements" failure mode: a 47-reply argument outranks a 600-RT announcement by construction. Owned content ships in its own labeled rail, not hidden. |
| Voice score | Per author per entity per window: log1p(earned engagement on classified posts) + 1.5·log1p(threads rooted with thread_score ≥ 50) + 0.5·active days, percentile-scaled 0-100 within the entity's community authors. "Rising" badge when an author's score jumps ≥ 25 points window-over-window. v2 adds engagement-source weighting (a reply from a builder counts more than one from an anon). | Ranks community voices by standing earned in this conversation, decoupled from follower count. Followers ship only as a bucket, for display. Voices panels are community-by-default with official/affiliated and rival tabs. |
| Theme intelligence | Per theme per window: rate_per_1k = 1000 × count / classified; field median across same-kind entities, same window; trend label from last-6h vs prior-18h hourly rate (accelerating ≥ 1.5×, fading ≤ 0.67×, else steady); voiced_by shares from author classes; facets from per-theme keyword lists + classifier output (render at ≥ 10% share); verdict one-liner through the same citation validator as the consensus summary; emerging topics surface at ≥ 15 posts with an "emerging" badge. | All pure rollup math except facet extraction and verdicts (one extra LLM call per major theme, only when its count moved > 20%). The theme object powers chips, drill-ins, and the dedicated /models/<slug>/themes/<topic> pages with per-theme OG cards. Cross-model theme views (v2) reuse rate_per_1k as already computed. |
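
Several of the formulas in the table are small enough to write out directly. The sketch below covers the share-based scores, the tag evidence rank, rate_per_1k, the trend label, and the raw thread score; function names are illustrative. Two assumptions are flagged in comments: how a zero prior rate is labeled, and the fact that the sample summary.json shows thread_score on what looks like a 0-100 scale, so an unspecified scaling step presumably follows the raw value computed here.

```typescript
// Crowd scorecard and tag score: share of positive among opinionated posts.
// Returns null when there is nothing to score.
function shareScore(positive: number, negative: number): number | null {
  const n = positive + negative;
  return n === 0 ? null : (100 * positive) / n;
}

// Tag evidence rank: score × log1p(n), so well-attested tags outrank lucky ones.
function evidenceRank(good: number, bad: number): number {
  const score = shareScore(good, bad);
  return score === null ? 0 : score * Math.log1p(good + bad);
}

function ratePer1k(count: number, classified: number): number {
  return classified === 0 ? 0 : (1000 * count) / classified;
}

type Trend = "accelerating" | "fading" | "steady";

// Hourly rate over the last 6h against the hourly rate over the prior 18h.
function trendLabel(last6hCount: number, prior18hCount: number): Trend {
  const recent = last6hCount / 6;
  const prior = prior18hCount / 18;
  // Assumption: with no prior activity, any recent activity counts as accelerating.
  if (prior === 0) return recent > 0 ? "accelerating" : "steady";
  const ratio = recent / prior;
  if (ratio >= 1.5) return "accelerating";
  if (ratio <= 0.67) return "fading";
  return "steady";
}

// Raw thread score exactly as written in the table; no 0-100 scaling applied.
function rawThreadScore(replies: number, participants: number, depth: number): number {
  return Math.log1p(replies) + 2 * Math.log1p(participants) + depth;
}

console.log(Math.round(ratePer1k(74, 1107)));
// → 67
console.log(trendLabel(30, 36));
// → accelerating
```

The first call uses the pricing theme from the sample summary.json (74 posts out of 1,107 classified) and lands on its published rate_per_1k of 67.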

Data contracts

Three public artifacts, all derived, all versioned with a schema_version. The existing fable-5 summary.json is the seed of the entity summary; v2 splits it into a private full variant and a public variant with no tweet text.

// public/index.json — the leaderboard (one file, ~10KB)
{
  "schema_version": 2, "updated_at": "2026-06-09T21:47:52Z",
  "windows": { "24h": { "total_mentions": 41200,
    "entities": [ { "slug": "fable-5", "rank": 2, "mindshare": 0.218, "delta_24h": 0.186, "vibe_score": 62,
      "sentiment": { "pos": 0.11, "neu": 0.87, "neg": 0.02 },
      "spark_7d": [38, 41, 35, 44, 39, 1107, 2890],
      "top_theme": "novelty", "launch_mode": true } ] } }
}

// public/models/fable-5/summary.json — entity page (evolved from today's blob)
{
  "schema_version": 2, "entity": "fable-5", "updated_at": "…",
  "volume_by_hour": [{ "hour": "2026-06-09T21:00Z", "mentions": 783 }],
  "windows": { "24h": {
    "mentions": 1107, "unique_authors": 976,
    "sentiment": { "pos": 119, "neu": 965, "neg": 23 },
    "say_summary": { "text": "People are taken with how different Fable 5 feels…",
      "citations": [{ "phrase": "token limits", "topic": "pricing", "post_ids": ["2064…"] }] },
    "likes_themes":    [{ "topic": "novelty", "count": 198, "post_ids": ["2064…", "…"], "…": "same shape as below" }],
    "dislikes_themes": [{ "topic": "pricing", "count": 74, "post_ids": ["2064…", "…"],
      "rate_per_1k": 67, "field_median_per_1k": 29, "trend": "accelerating",
      "hourly": [{ "hour": "2026-06-09T21:00Z", "count": 22 }],
      "facets": [{ "label": "token limits", "share": 0.7, "post_ids": ["…"] }],
      "voiced_by": { "builder": 0.31, "influencer": 0.12, "media": 0.08, "anon": 0.49 },
      "verdict": { "text": "The complaint is specific: caps hit mid-session…", "validated": true },
      "emerging": false }],
    "top_posts":  [{ "post_id": "2064453497…", "engagement_score": 935, "author_class": "official", "owned": true }],
    "voices": { "community": [{ "author_id": "…", "class": "builder", "voice_score": 94,
        "rising": false, "post_count": 3, "follower_bucket": "100k-1M" }],
      "owned_affiliated": ["…"], "rival": ["…"] },
    "capabilities": [{ "dimension": "coding", "score": 82, "n": 214, "trend": "up", "post_ids": ["…"] }],
    "tags": [{ "tag": "openclaw", "good": 41, "bad": 3, "post_ids": ["…"] }],
    "threads": [{ "conversation_id": "2064…", "root_post_id": "2064…", "replies": 31,
      "participants": 18, "depth": 4, "organic": true, "thread_score": 87 }],
    "relationship_split": { "owned": 212, "affiliated": 31, "rival": 9, "community": 855 }
  } }
}
// note: post_id / author_id only — text, handles, bios stay in the private variant.
// The page renders posts + author cards client-side via the X embed/oEmbed APIs.

Plus four smaller public artifacts: reads.json (editorial manifest: slug, title, dek, live/frozen status), events.json (the change feed, receipts as post IDs), launches.json (one fingerprint per launch: T0, hourly curve T0→T+72h, subscores, final Launch Score, score version), and tags.json (the global capability-tag index behind search and /good-at pages). Private artifacts (full text, handles, raw) live under derived/ and raw/ prefixes that are never CDN-exposed.
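
One way to keep private fields out of the public variant is an allow-list projection rather than a delete-list. A minimal sketch for a top-post record; the private field names (`text`, `author_handle`, `author_bio`) are hypothetical, since the private schema is not shown here.

```typescript
interface PrivateTopPost {
  post_id: string;
  engagement_score: number;
  author_class: string;
  owned: boolean;
  text: string;          // private (hypothetical field name)
  author_handle: string; // private (hypothetical field name)
  author_bio: string;    // private (hypothetical field name)
}

interface PublicTopPost {
  post_id: string;
  engagement_score: number;
  author_class: string;
  owned: boolean;
}

// A field reaches the public artifact only if it is named here, so a field
// added to the private variant later cannot leak by default.
function toPublicTopPost(p: PrivateTopPost): PublicTopPost {
  return {
    post_id: p.post_id,
    engagement_score: p.engagement_score,
    author_class: p.author_class,
    owned: p.owned,
  };
}
```

Spreading the private object and deleting known-private keys would be shorter, but it fails open: anything not on the delete list ships.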

Cadence and launch mode

A per-entity state machine, evaluated hourly by the scheduler against fresh counts. Manual override always wins: flip the flag in the registry before a known launch.

| State | Counts | Content pulls | Enter when | Exit when |
|---|---|---|---|---|
| baseline | hourly | every 4h | default | spike detected or manual flag |
| elevated | hourly | hourly | volume > 3× trailing 7-day hourly median for 2 consecutive hours | 72h below 2× median |
| launch | every 30 min | every 30-60 min, cap raised to 20k/day | manual flag (pre-launch) or elevated + official-handle launch post detected | flag expiry (default 72h), decays to elevated |

Why spike detection runs on counts: counts are cheap and not read-capped, so detection costs nothing. The test is a ratio against the trailing median (3× for two consecutive hours), not a true z-score. The trigger that matters most in practice is the manual flag: we know launches are coming, and the playbook (see Rollout) flips it the night before.
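The table translates into a small transition function evaluated once per hourly reading. This is a sketch, not the scheduler's code: `stepCadence`, `EntityCadence`, and `HourlyReading` are hypothetical names, launch expiry uses the 72h default rather than the registry's `until` timestamp, and the elevated state is assumed to keep the baseline read cap because the table does not give one.

```typescript
type CadenceState = "baseline" | "elevated" | "launch";

interface EntityCadence {
  state: CadenceState;
  spikeHours: number;         // consecutive hours above 3× median
  calmHours: number;          // consecutive hours below 2× median
  launchUntil: number | null; // epoch ms when launch mode expires
}

interface HourlyReading {
  now: number;      // epoch ms
  volume: number;   // this hour's count
  median7d: number; // trailing 7-day hourly median
  manualFlag: boolean;
  officialLaunchPost: boolean;
}

const HOUR_MS = 3600000;
const LAUNCH_HOURS = 72; // default flag lifetime

function stepCadence(prev: EntityCadence, r: HourlyReading): EntityCadence {
  const spikeHours = r.volume > 3 * r.median7d ? prev.spikeHours + 1 : 0;
  const calmHours = r.volume < 2 * r.median7d ? prev.calmHours + 1 : 0;
  const next: EntityCadence = { ...prev, spikeHours, calmHours };
  const enterLaunch = (): EntityCadence => ({
    ...next,
    state: "launch",
    launchUntil: r.now + LAUNCH_HOURS * HOUR_MS,
  });

  if (prev.state === "launch") {
    const expired = prev.launchUntil !== null && r.now >= prev.launchUntil;
    // Launch decays to elevated, never straight to baseline.
    return expired
      ? { ...next, state: "elevated", launchUntil: null, calmHours: 0 }
      : next;
  }
  if (r.manualFlag) return enterLaunch(); // manual override always wins
  if (prev.state === "elevated") {
    if (r.officialLaunchPost) return enterLaunch();
    return calmHours >= 72 ? { ...next, state: "baseline" } : next;
  }
  // baseline: two consecutive spike hours promote to elevated
  return spikeHours >= 2 ? { ...next, state: "elevated", calmHours: 0 } : next;
}

// Daily read cap per state (assumption: elevated keeps the baseline cap).
function dailyReadCap(state: CadenceState): number {
  return state === "launch" ? 20000 : 1000;
}
```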

Repo strategy: private machine, public methodology

Decided: the pipeline and site stay private, the entity registry becomes a small public repo, and the methodology page stays radically transparent. Two reasons drive it. First, the anti-gaming logic (spam scores, author classification, amplification detection) only works closed: publish the rules and astroturfers route around them. Second, the collector must be swappable in private: we want the freedom to move collection from the official Enterprise API to a cheaper collector without that swap being visible anywhere.

| Layer | Posture | Why |
|---|---|---|
| Pipeline + site code | Private | Anti-gaming rules stay closed; no turnkey clone for someone with a bigger megaphone; schema and formula iteration without deprecation debt |
| Collector implementations | Private, strictly behind the Collector interface | Transport freedom: official Enterprise today, cheaper collector tomorrow, invisible from outside |
| Entity registry | Public repo: entities, aliases, guard terms, official handles, builder panel | Community PRs fix recall (locals know "K3" means Kimi); doubles as published methodology; zero clone risk because it's data, not machine |
| Methodology | Public page | Query strings, theme vocabulary, score formulas, downloadable JSON, daily quality notes: the audit surface that earns trust without the repo |
| Deployment + data | Private always | Credentials, raw/derived buckets, domain, editorial reads |

The Collector contract: the swap point

Everything upstream of R2 raw sits behind one interface; everything downstream consumes normalized rows and never learns the transport.

// the only boundary the rest of the pipeline ever sees
interface Collector {
  counts(query, bucket): CountPoint[]      // hourly volume backbone
  posts(query, opts):    NormalizedPost[]  // relevancy + recency pulls
}
// implementations (all private): x-official (Enterprise), x-payg,
// x-3p (twitterapi.io client, already production-proven in records-ingestion-utils).
// Raw lands under raw/<source>/<entity>/… — the source tag never reaches public artifacts.
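
A typed sketch of the same boundary with a fixture-backed implementation, the kind a replay test might use. The field names on `CountPoint` and `NormalizedPost`, the `PostsOptions` shape, and `FixtureCollector` are assumptions. Signatures stay synchronous to mirror the interface above; a network-backed implementation would return Promises.

```typescript
type Bucket = "hour" | "day";

interface CountPoint { hourStart: string; count: number }
interface NormalizedPost { id: string; createdAt: string; engagement: number }
interface PostsOptions { sort: "relevancy" | "recency"; limit: number }

interface Collector {
  counts(query: string, bucket: Bucket): CountPoint[];
  posts(query: string, opts: PostsOptions): NormalizedPost[];
}

// Serves canned posts; ignores the query and always buckets by hour.
class FixtureCollector implements Collector {
  private fixturePosts: NormalizedPost[];

  constructor(fixturePosts: NormalizedPost[]) {
    this.fixturePosts = fixturePosts;
  }

  counts(_query: string, _bucket: Bucket): CountPoint[] {
    const byHour = new Map<string, number>();
    for (const p of this.fixturePosts) {
      const hour = p.createdAt.slice(0, 13) + ":00Z"; // e.g. 2026-06-09T21:00Z
      byHour.set(hour, (byHour.get(hour) ?? 0) + 1);
    }
    return Array.from(byHour, ([hourStart, count]) => ({ hourStart, count }));
  }

  posts(_query: string, opts: PostsOptions): NormalizedPost[] {
    const sorted = [...this.fixturePosts].sort((a, b) =>
      opts.sort === "recency"
        ? (a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0)
        : b.engagement - a.engagement,
    );
    return sorted.slice(0, opts.limit);
  }
}

// Downstream code depends on the interface only, never on the transport.
function totalMentions(c: Collector, query: string): number {
  return c.counts(query, "hour").reduce((sum, p) => sum + p.count, 0);
}
```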

Compliance: how we show tweets without storing people

What we do
  • Display via official embeds only. oEmbed (publish.x.com, free, no auth) at read-generation time; widgets.js on dashboard pages. X explicitly encourages this path.
  • Public JSON carries IDs and aggregates. No tweet text, no handles, no bios in any CDN-exposed artifact.
  • Deletions handle themselves on display (a deleted tweet renders as nothing), plus a daily sweep re-checks top-post IDs and drops dead ones from JSON.
  • Aggregate analytics are explicitly permitted by the developer agreement when no personal data is stored in the published artifact.
Edges we watch
  • Embed flakiness: widgets.js intermittently renders blank; every embed slot has a styled fallback card that links out. Reads pre-render oEmbed HTML so they degrade gracefully.
  • "Accounts to follow" lists: rendered as official follow-button embeds, not from our JSON, for the same no-stored-personal-data reason.
  • Internal storage is fine (raw text in private R2 for processing) but honors compliance events; the daily sweep covers our public surface.
  • X pricing/ToS volatility is the real platform risk; the counts-first design minimizes exposure, and news sources (v2) diversify it.

The site itself

TanStack Start (Vite + TanStack Router) deployed static to Cloudflare Pages: public routes are prerendered to real HTML for SEO and OG cards, then the client hydrates and fetches JSON at runtime. Charts are inline SVG generated from JSON (no chart lib, so the React runtime is the only meaningful bundle). Critically, data refreshes don't rebuild the site: pages fetch the latest JSON at runtime, so a 30-minute launch-mode cadence costs zero deploys.

Dashboard pages

Static shell + JSON hydration

index.json and per-entity summaries fetched client-side with a cache-busting version param. Sub-second loads, no backend.

Reads

MDX documents with live blocks

LLM drafts from the same JSON, human edits in a PR, data blocks re-hydrate while the read is live, then freeze. The draft prompt is part of the repo.

Design

Sift brand system

Cream/navy/lavender, Space Grotesk + Inter, paper grain: the fable-5 prototype already proved the look. "by Sift" mark links home.

Risks

| Risk | Severity | Mitigation |
|---|---|---|
| X API pricing or policy shifts again | high | Counts-first design keeps spend small; collector behind an interface (official / twitterapi.io); news sources in v2 diversify |
| Alias queries miss or over-match conversation (recall/precision) | med | Public query strings invite correction; guard terms per entity; weekly precision audit on a sample (LLM judges "is this about the entity?") |
| RT floods and engagement-farming skew volume | med | RTs segregated from themes/top posts; spam score; amplification surfaced as its own metric rather than hidden in volume |
| Sentiment credibility attacked by model fanbases | med | Methodology page with prompts + biases stated; per-chart deep links; downloadable JSON; never editorialize in the data layer |
| Gap chart depends on third-party capability scores | low | Attribution-first; artificialanalysis preferred, LMArena Elo fallback; chart degrades to sentiment-only ranking if neither is usable |
| Embed flakiness degrades pages | low | Fallback link cards everywhere; reads pre-render oEmbed HTML |
| Launch-mode misses a surprise launch | low | Count-spike auto-trigger (3× median for 2 consecutive hours) catches what the manual flag misses, within ~2 hours |