The whole system is a cron-driven pipeline that ends in JSON files on a CDN. No public API, no server in the read path, no filtered stream. Cheap to run, trivially cacheable, and simple enough that a third party could operate it with their own X credentials if we ever open it up.
Five stages, all on Cloudflare. The X keys live only in the collector Worker's secrets; everything public is a derived artifact. The site never talks to a backend: it fetches versioned JSON from the CDN and renders tweets via official X embeds client-side.
Hourly counts per entity (volume backbone). Every 2-4h: recent-search pulls of new posts per entity, relevancy-sorted, capped.
Append-only raw API responses: raw/x_official/<entity>/<date>/<run>/. Replayable forever.
LLM batch classification: sentiment, themes, author class. Dedup, RT/spam handling, language filter. Idempotent per post ID.
Window aggregates per entity (1h/6h/24h/7d/30d), cross-entity mindshare, deltas, top posts/voices.
Atomic JSON swap: public/index.json + per-entity summaries. Site + reads hydrate from these.
Do we want a relational database? Yes: a small one, in the middle, that we can lose without losing anything. The shape of this product is "append-only events in, windowed aggregates out, static files served": that wants a lake + a hot working set + a CDN, not a big always-on database. The rule that matters most: readers never touch a database. The site is JSON on a CDN; a database outage can degrade freshness, never availability.
| Tier | What lives there | Properties | Size & retention |
|---|---|---|---|
| R2 · the lake | Raw API responses, append-only, exactly as fetched (raw/<source>/<entity>/<date>/<run>/) | Source of truth. Immutable, replayable forever: any chart is reproducible from raw + prompt/score versions. Never queried at runtime. | ~1-3 GB/month at 20 entities; never deleted |
| D1 (SQLite) · the hot working set | enriched_posts (flat, queryable: entity, created_at, sentiment, themes, capabilities, tags, author, conversation, engagement), hourly_counts, authors, classification cache, budget ledger, entity state, events | The relational tier: every rollup is SQL over indexed windows instead of re-reading JSONL files. Disposable by design: it's a materialized view of raw; losing it costs a replay, not data. | ~500k hot rows ≈ 0.5 GB at 20 entities; rows pruned past 90 days (frozen launches and published rollups don't need them); 10 GB D1 ceiling leaves 10-20× headroom |
| Public JSON · the product | Immutable public/runs/<runId>/ artifacts + a tiny manifest pointer, on the CDN | The only thing readers touch. Atomic by construction (manifest swap), rollback is repointing, cache-perfect (immutable files + no-store manifest). | ~10 MB per run; runs older than 48h cleaned |
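The "atomic by construction" property in the last row can be illustrated with a small in-memory sketch. The `ObjectStore` interface, the `public/manifest.json` key, and the function names are assumptions made for illustration; only the `public/runs/<runId>/` layout, the immutable files, and the no-store manifest come from the table.

```typescript
// Hypothetical minimal object-store surface; a real deployment would wrap the CDN origin.
interface ObjectStore {
  put(key: string, body: string, cacheControl: string): void;
  get(key: string): string | undefined;
}

// In-memory stand-in so the sketch runs without any cloud dependency.
class MemoryStore implements ObjectStore {
  private objects = new Map<string, { body: string; cacheControl: string }>();
  put(key: string, body: string, cacheControl: string): void {
    this.objects.set(key, { body, cacheControl });
  }
  get(key: string): string | undefined {
    const entry = this.objects.get(key);
    return entry === undefined ? undefined : entry.body;
  }
  cacheControlOf(key: string): string | undefined {
    const entry = this.objects.get(key);
    return entry === undefined ? undefined : entry.cacheControl;
  }
}

const MANIFEST_KEY = "public/manifest.json"; // assumed name for the manifest pointer

// Write every artifact under an immutable run prefix first, then repoint the manifest.
// A reader resolves either the old run or the new one, never a half-written mix.
function publishRun(store: ObjectStore, runId: string, artifacts: Record<string, string>): void {
  for (const path of Object.keys(artifacts)) {
    store.put(`public/runs/${runId}/${path}`, artifacts[path], "public, max-age=31536000, immutable");
  }
  store.put(MANIFEST_KEY, JSON.stringify({ runId }), "no-store");
}

// Rollback is the same manifest write, pointed at an older run that still exists.
function rollbackTo(store: ObjectStore, runId: string): void {
  store.put(MANIFEST_KEY, JSON.stringify({ runId }), "no-store");
}

// Resolve a logical path through the manifest, as a reader would.
function readCurrent(store: ObjectStore, path: string): string | undefined {
  const manifest = store.get(MANIFEST_KEY);
  if (manifest === undefined) return undefined;
  const parsed = JSON.parse(manifest) as { runId: string };
  return store.get(`public/runs/${parsed.runId}/${path}`);
}
```

Because run directories are never overwritten, rollback only works while the target run is still inside the 48h retention window.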
The math doesn't ask for anything bigger: ~475k posts/month, ~500k hot rows, single-writer pipeline, zero concurrent readers (publishes are the only consumer). SQLite with two indexes does this without breaking a sweat, costs nothing, and adds no external service, no connection pooling, no second ops surface. Reaching for a "real" database here would be architecture cosplay.
Rollups are pure functions that take rows; only the data-access layer knows it's D1. The swap triggers are explicit: > 50 tracked entities, > 2-3 GB hot, cross-entity queries slowing publishes, or wanting ad-hoc analyst SQL over history. Then: Postgres (Neon via Hyperdrive) behind the same interface, backfilled by replaying raw. A contained change, not a migration project.
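A minimal sketch of that boundary, assuming a hypothetical `PostStore` interface and a three-column subset of `enriched_posts`; none of these names come from the actual codebase. The point is that the rollup takes rows and returns an aggregate, so swapping D1 for Postgres only means writing another `PostStore`.

```typescript
// Assumed row shape: a small subset of the enriched_posts columns.
interface EnrichedRow {
  entity: string;
  createdAt: number; // epoch milliseconds
  sentiment: "positive" | "neutral" | "negative";
}

// The data-access boundary. Only implementations know whether rows come from D1 or Postgres.
interface PostStore {
  rowsInWindow(entity: string, fromMs: number, toMs: number): Promise<EnrichedRow[]>;
}

interface SentimentShares {
  n: number;
  pos: number;
  neu: number;
  neg: number;
}

// Pure rollup: rows in, aggregate out. No I/O, so replaying raw reproduces it exactly.
function sentimentShares(rows: EnrichedRow[]): SentimentShares {
  const n = rows.length;
  if (n === 0) return { n: 0, pos: 0, neu: 0, neg: 0 };
  const count = (s: EnrichedRow["sentiment"]): number =>
    rows.filter((r) => r.sentiment === s).length;
  return { n, pos: count("positive") / n, neu: count("neutral") / n, neg: count("negative") / n };
}

// In-memory stand-in for the D1-backed store; a Postgres store would implement the same interface.
class ArrayPostStore implements PostStore {
  constructor(private readonly rows: EnrichedRow[]) {}
  async rowsInWindow(entity: string, fromMs: number, toMs: number): Promise<EnrichedRow[]> {
    return this.rows.filter(
      (r) => r.entity === entity && r.createdAt >= fromMs && r.createdAt < toMs,
    );
  }
}

// The only place where data access and rollup math meet.
async function rollupWindow(
  store: PostStore,
  entity: string,
  fromMs: number,
  toMs: number,
): Promise<SentimentShares> {
  return sentimentShares(await store.rowsInWindow(entity, fromMs, toMs));
}
```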
Verified against current docs (June 2026). The headline: in February 2026, X replaced the Basic/Pro tiers with pay-per-use for new developers (~$0.005 per post read, 2M reads/month ceiling; reads deduplicated within a 24h UTC window). Pro ($5k/mo, 1M reads) survives only for grandfathered subscribers. We run on our Enterprise contract; the design stays within pay-per-use limits so anyone could run it.
| Fact | Detail | Design consequence |
|---|---|---|
| Counts endpoints | /2/tweets/counts/recent exists on all paid access; 300 req/15min; does not consume the post-read cap (billed ~$0.005/req on pay-per-use) | Counts are the hourly backbone. 20 entities × hourly ≈ 14.4k req/mo: trivial on Enterprise, ~$72/mo pay-per-use |
| Recent search | 100 results/page, sort_order=relevancy\|recency, 450 req/15min, 512-char queries (4,096 on Enterprise) | Top-posts pulls fit easily; alias groups must fit 512 chars for third-party portability |
| No engagement operators | min_faves / min_retweets do not exist in API v2 at any tier (web UI only) | Pull relevancy-sorted + recency pages, rank locally on public_metrics. Re-fetch metrics for yesterday's top posts once daily (engagement matures for ~24h) |
| Full-archive search | Pay-per-use and Enterprise; 500 results/page | Used once per new entity for 30-day backfill, then never again |
| Embeds | publish.x.com/oembed free, no auth; widgets.js works but occasionally renders blank | Embeds-only display, styled link-out card as fallback |
| Derived data | Aggregate analysis that doesn't store personal data is explicitly permitted; raw content redistribution is not | Public JSON = IDs + aggregates. Tweet text never ships in public artifacts (see Compliance) |
| Stream | Cadence | Posts read / month | Pay-per-use cost |
|---|---|---|---|
| Counts backbone (volume, mindshare) | hourly, all entities | 0 (not capped) | ~$72 |
| Content pulls, baseline | every 4h · ~100 new posts/entity/pull avg, capped 1,000/entity/day | ~360k | ~$1,800 |
| Launch mode (1-2 entities/mo) | every 30-60 min for 72h · ~750/hr | ~54k | ~$270 |
| Daily metrics re-fetch (top 100/entity) | daily | ~60k | ~$300 (24h dedup shaves this) |
| Total | | ~475k | ~$2.4k/mo · free on our Enterprise |
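The totals in the budget table can be reproduced with a few lines of arithmetic. This is a sketch under the table's own assumptions (20 entities, a 30-day month, one 72-hour launch at ~750 posts/hour, $0.005 per read and per counts request); the function and field names are illustrative.

```typescript
interface MonthlyBudget {
  countsRequests: number;
  baselineReads: number;
  launchReads: number;
  refetchReads: number;
  totalReads: number;
  totalUsd: number;
}

function monthlyBudget(entities: number, days = 30, pricePerUnitUsd = 0.005): MonthlyBudget {
  const countsRequests = entities * 24 * days; // hourly counts, one request per entity
  const baselineReads = entities * (24 / 4) * 100 * days; // every 4h, ~100 new posts per pull
  const launchReads = 750 * 72; // one launch: ~750 posts/hour for 72 hours
  const refetchReads = entities * 100 * days; // daily metrics re-fetch, top 100 per entity
  const totalReads = baselineReads + launchReads + refetchReads;
  // Counts requests are billed per request and do not consume the post-read cap.
  const totalUsd = (countsRequests + totalReads) * pricePerUnitUsd;
  return { countsRequests, baselineReads, launchReads, refetchReads, totalReads, totalUsd };
}

const budget = monthlyBudget(20);
console.log(budget.totalReads, Math.round(budget.totalUsd));
```

At 20 entities this gives 474,000 reads and roughly $2,442, which the table rounds to ~475k and ~$2.4k. A second launch in the same month adds another 54,000 reads.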
Each entity's registry entry compiles to numbered queries (the raw keys in the existing blob already follow this pattern). Aliases are OR-grouped, retweets excluded from content pulls but included in counts, and ambiguous names get guard terms.
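The compile step described above might look like the following sketch. `RegistryEntity`, `compileQueries`, and the 512-character check are illustrative names, not the pipeline's actual code, and the sketch simply ORs every alias it is given; how the real compiler chooses which aliases reach the query is not specified here.

```typescript
interface RegistryEntity {
  slug: string;
  aliases: string[];
  guards: string[]; // negated guard terms, e.g. "-aesop -disney"
}

interface CompiledQueries {
  counts: string; // retweets included: volume is volume
  content: string; // retweets excluded: themes and top posts come from originals
}

// Query length that keeps the registry portable to non-Enterprise access.
const PORTABLE_QUERY_LIMIT = 512;

function compileQueries(entity: RegistryEntity, lang = "en"): CompiledQueries {
  // Aliases are quoted as exact phrases and OR-grouped.
  const aliasGroup = "(" + entity.aliases.map((a) => `"${a}"`).join(" OR ") + ")";
  const guardTerms = entity.guards.join(" ");
  const base = [aliasGroup, guardTerms].filter((part) => part.length > 0).join(" ");

  const counts = `${base} lang:${lang}`;
  const content = `${base} -is:retweet lang:${lang}`;

  for (const q of [counts, content]) {
    if (q.length > PORTABLE_QUERY_LIMIT) {
      throw new Error(`${entity.slug}: query is ${q.length} chars, over ${PORTABLE_QUERY_LIMIT}`);
    }
  }
  return { counts, content };
}

const compiled = compileQueries({
  slug: "fable-5",
  aliases: ["Fable 5", "Claude Fable", "fable-5"],
  guards: ["-aesop -disney"],
});
console.log(compiled.counts);
console.log(compiled.content);
```

Failing at compile time when an alias group outgrows 512 characters surfaces the portability constraint in a registry PR rather than at collection time.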
// registry/entities/fable-5.json: compiled to queries below
{
  "slug": "fable-5",
  "kind": "model",
  "name": "Claude Fable 5",
  "lab": "anthropic",
  "aliases": ["Fable 5", "Claude Fable", "fable-5", "Fable Anthropic"],
  "guards": ["-aesop -disney"], // disambiguation for generic words
  "official_handles": ["claudeai", "ClaudeDevs", "AnthropicAI"],
  "launch": { "flag": true, "until": "2026-06-12T18:00:00Z" }
}
// counts query (RTs included: volume is volume)
("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney lang:en
// content query (RTs excluded: themes/top posts come from originals)
("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney -is:retweet lang:en
// → 2 pulls per cycle: sort_order=relevancy (top) + sort_order=recency since last_seen_id
since_id guarantees we never miss the long tail. Both land in raw; dedup happens at enrich.
One batched LLM pass per content pull, ~50 posts per call, classification cached by (post ID, prompt version) in D1 so nothing is ever classified twice. Cheap-fast model class (Gemini Flash / Haiku tier); the per-post cost is fractions of a cent and the whole month of classification costs less than one day of X API.
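The batching and caching rule can be sketched as follows. A plain `Map` stands in for the D1 cache table, the classifier is a synchronous stub where the real one would be an async LLM call, and all names are hypothetical.

```typescript
interface RawPost {
  id: string;
  text: string;
}

interface Classification {
  sentiment: "positive" | "neutral" | "negative";
}

// Stand-in for the batched LLM call: one call classifies a whole batch, in order.
type BatchClassifier = (batch: RawPost[]) => Classification[];

// The cache key includes the prompt version, so a prompt change reclassifies cleanly.
const cacheKey = (postId: string, promptVersion: string): string => `${postId}:${promptVersion}`;

function classifyWithCache(
  posts: RawPost[],
  promptVersion: string,
  cache: Map<string, Classification>,
  classifyBatch: BatchClassifier,
  batchSize = 50,
): Map<string, Classification> {
  // Dedup by post ID: the relevancy and recency pulls can return the same post.
  const unique = new Map<string, RawPost>();
  for (const p of posts) unique.set(p.id, p);

  // Only posts without a cached result for this prompt version reach the model.
  const misses = Array.from(unique.values()).filter(
    (p) => !cache.has(cacheKey(p.id, promptVersion)),
  );
  for (let i = 0; i < misses.length; i += batchSize) {
    const batch = misses.slice(i, i + batchSize);
    const results = classifyBatch(batch);
    batch.forEach((p, j) => cache.set(cacheKey(p.id, promptVersion), results[j]));
  }

  // Assemble the answer for every unique post from the cache.
  const out = new Map<string, Classification>();
  unique.forEach((_post, id) => {
    const hit = cache.get(cacheKey(id, promptVersion));
    if (hit !== undefined) out.set(id, hit);
  });
  return out;
}
```

Re-running the same pull costs zero model calls, which is what makes the enrich stage idempotent per post ID.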
| Dimension | Output | Notes |
|---|---|---|
| Sentiment | positive / neutral / negative + confidence | Per-post, about the entity (not general mood). Published as shares + net score. Neutral-heavy news days are expected and annotated, not hidden. |
| Themes | like/dislike + topic from a controlled vocabulary, emergent topics flagged for review | Vocabulary v1: novelty, quality, speed, ux, demo, coding, agents, writing, pricing, limits, trust_safety, bugs, confusion, benchmarks. Each theme keeps its top N post IDs as receipts (drives the drill-in), plus an optional sub-facet ("70% of pricing posts cite token limits") when one sub-pattern crosses half the theme. |
| Capabilities & tags | Per post: stances (positive/negative) on the fixed 8-dimension rubric — coding, writing, art_design, reasoning_depth, speed, accuracy, agents_tools, price_value — plus free-text good_at / bad_at tags ("openclaw", "svg art", "excel formulas") | Rides the same classification call: no extra LLM cost. Tags are normalized (lowercase, variant-collapse map maintained in code: "open claw" → "openclaw") and feed both the per-entity tag list and the global searchable index. The rubric is the always-on scorecard; tags are its long tail. |
| Consensus summary | 60-90 word "people say" paragraph per entity per window | The Amazon-review-style summary. Hard constraint enforced by a validator: every claim in the paragraph must cite a theme present in this window's rollup, with its count; phrases carry a citations array (theme + post IDs) so the UI can link them. A draft that mentions anything uncited is rejected and regenerated. Regenerates only when theme counts shift > 20%, so the prose stays stable between refreshes. |
| Author identity + affiliation | Two orthogonal fields, classified once per author: class (identity: official / leadership / employee / partner / builder / researcher / creator / power_user / media / investor / influencer / anon) and affiliation (a lab slug or null; registry official handles short-circuit, bios resolve employment/founding/partnership) | Cached 30 days. Relationship is never stored: it's derived per author-entity pair — affiliation = entity's lab → owned; partner tie → affiliated; affiliation = other tracked lab → rival; else community. One author record makes @sama owned on GPT-5.2 and rival on Fable 5. Powers voiced-by, community-first voices, deep-thread filtering, and the relationship split on every metric, from day one. |
| Builder panel | boolean: member of the curated builder list | ~300 hand-curated accounts (people who demonstrably ship) maintained in the registry. Powers the builder sentiment series. Zero marginal X cost: panel members' posts already arrive via the entity pulls; this is a filter, not a new collection stream. |
| Spam/bot score | 0-1 | Heuristics first (account age, follower ratio, duplicate text), LLM only for the gray zone. High-spam posts drop from all metrics; the rate is published. |
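The relationship rule in the author identity row (derived per author-entity pair, never stored) is a four-branch decision. In this sketch the `partnerOf` field is an assumed way to represent a partner tie, and the entity slugs and lab names are illustrative.

```typescript
type Relationship = "owned" | "affiliated" | "rival" | "community";

interface AuthorRecord {
  affiliation: string | null; // a lab slug, or null
  partnerOf: string[]; // assumed representation of partner ties, as lab slugs
}

interface TrackedEntity {
  slug: string;
  lab: string;
}

// Derived at rollup time from one author record and one entity; nothing is persisted.
function deriveRelationship(
  author: AuthorRecord,
  entity: TrackedEntity,
  trackedLabs: string[],
): Relationship {
  if (author.affiliation === entity.lab) return "owned";
  if (author.partnerOf.indexOf(entity.lab) !== -1) return "affiliated";
  if (author.affiliation !== null && trackedLabs.indexOf(author.affiliation) !== -1) return "rival";
  return "community";
}

const labs = ["openai", "anthropic"];
const openaiLeader: AuthorRecord = { affiliation: "openai", partnerOf: [] };
console.log(deriveRelationship(openaiLeader, { slug: "gpt-5-2", lab: "openai" }, labs));
console.log(deriveRelationship(openaiLeader, { slug: "fable-5", lab: "anthropic" }, labs));
```

The same author record yields "owned" on its own lab's entity and "rival" on another tracked lab's entity, which is why the relationship never needs its own column.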
All four signature metrics (see Product) are computed in the rollup stage as pure, versioned functions of enriched rows. No new collection, no new infrastructure: they are math on data the pipeline already has.
| Signal | Computation | Notes |
|---|---|---|
| Vibe Score | 0-100 standing favorability: crowd net sentiment + builder net sentiment (double-weighted) − theme severity drag, recency-weighted over a rolling 7d window. Volume is deliberately excluded; mindshare measures popularity separately. Renders only above a sample floor (n ≥ 30 posts). | The Now clock's number: leaderboard Score column, entity hero, the Gap's y-axis. v1 ships crowd-only (vibe-score.v1.ts); v2 adds builder weighting when the panel lands. Versioned and published like the Launch Score formula. |
| Launch Score | 0-100 weighted blend: velocity percentile vs the launch archive, builder uptake (panel members posting / panel size), crowd net sentiment, durability (day-3 / day-1 volume). Provisional from T0, frozen at T+72h. | Formula is a versioned file (launch-score.v1.ts) and published verbatim on the methodology page, like an index methodology. Re-scoring history requires a version bump and shows both versions. |
| Change events | Diff of consecutive rollups against thresholds: theme count ratio > 2× over 6h, new top post, velocity peak, first official/builder post, RT-chain > 500 near-identical (amplification flag). Deduped by (entity, event type, day). | Emitted to events.json, newest-first, capped at 50 per entity. Each event carries the receipt (post ID or chart anchor). Powers the homepage ticker, entity timelines, and the Slack webhook for surprise launches. |
| Builder series | Sentiment rollup filtered to builder-panel authors, published alongside the crowd series in every window. | Minimum sample floor (n ≥ 8 posts) before the series renders, to avoid one tweet swinging the line. Panel list is public on the methodology page; suggestions via PR if open-sourced. |
| The Gap | Scatter join of our Vibe Score against a published capability index per model, refetched nightly. | Capability axis is third-party data used with attribution (artificialanalysis.ai index preferred; LMArena Elo as fallback). Sourcing/permission is a rollout open question; the chart ships only with clean attribution. |
| Crowd scorecard | Per rubric dimension: score = 100 × positive / (positive + negative) over opinionated posts in the window; n and a trend arrow (vs prior window) attached; renders only at n ≥ 10. | Share-based, so models with wildly different volumes compare honestly. Same rubric on every entity page; the compare view (v2) lays scorecards side by side for free. |
| Tag index | Global tags.json: tag → entities with good/bad counts, score = 100 × good / (good + bad), evidence rank = score × log1p(n), top receipt post IDs. Tags render per-entity at ≥ 8 posts. | Powers the typeahead search and /good-at/<tag> pages: client-side over one static file, no backend. The verdict snippet per row goes through the standard citation validator. Likely the property's biggest SEO surface ("best model for X"). |
| Threads & relationship split | Posts group by conversation_id. thread_score = log1p(replies) + 2·log1p(unique participants) + reply-chain depth; the default "deep threads" surface requires a community root; rival-rooted threads appear with a rival badge, owned-rooted ones go to the owned rail. Top ~20 candidate threads per entity per day get one conversation_id search pull each to complete the thread (cheap, inside caps). The four-way relationship split (owned / affiliated / rival / community mentions) is published per window. | Kills the "top posts are always lab announcements" failure mode: a 47-reply argument outranks a 600-RT announcement by construction. Owned content ships in its own labeled rail, not hidden. |
| Voice score | Per author per entity per window: log1p(earned engagement on classified posts) + 1.5·log1p(threads rooted with thread_score ≥ 50) + 0.5·active days, percentile-scaled 0-100 within the entity's community authors. "Rising" badge when an author's score jumps ≥ 25 points window-over-window. v2 adds engagement-source weighting (a reply from a builder counts more than one from an anon). | Ranks community voices by standing earned in this conversation, decoupled from follower count. Followers ship only as a bucket, for display. Voices panels are community-by-default with official/affiliated and rival tabs. |
| Theme intelligence | Per theme per window: rate_per_1k = 1000 × count / classified; field median across same-kind entities, same window; trend label from last-6h vs prior-18h hourly rate (accelerating ≥ 1.5×, fading ≤ 0.67×, else steady); voiced_by shares from author classes; facets from per-theme keyword lists + classifier output (render at ≥ 10% share); verdict one-liner through the same citation validator as the consensus summary; emerging topics surface at ≥ 15 posts with an "emerging" badge. | All pure rollup math except facet extraction and verdicts (one extra LLM call per major theme, only when its count moved > 20%). The theme object powers chips, drill-ins, and the dedicated /models/<slug>/themes/<topic> pages with per-theme OG cards. Cross-model theme views (v2) reuse rate_per_1k as already computed. |
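Several formulas in the table above are simple enough to state as pure functions. This sketch covers the share-based score (crowd scorecard and tag index), the tag evidence rank, `rate_per_1k`, and the trend label. Function names are illustrative, and the handling of a zero prior rate in `trendLabel` is an assumption the table does not cover.

```typescript
// Share of positive among opinionated posts, 0-100. Null when nobody has an opinion.
function shareScore(positive: number, negative: number): number | null {
  const n = positive + negative;
  return n === 0 ? null : (100 * positive) / n;
}

// Tag evidence rank: score × log1p(n), so a well-attested tag outranks a thin perfect one.
function evidenceRank(good: number, bad: number): number {
  const score = shareScore(good, bad);
  return score === null ? 0 : score * Math.log1p(good + bad);
}

// Theme rate per 1,000 classified posts in the window.
function ratePer1k(count: number, classified: number): number {
  return classified === 0 ? 0 : (1000 * count) / classified;
}

type Trend = "accelerating" | "fading" | "steady";

// Compares the hourly rate of the last 6h with the hourly rate of the prior 18h.
function trendLabel(last6hCount: number, prior18hCount: number): Trend {
  const recentRate = last6hCount / 6;
  const priorRate = prior18hCount / 18;
  // Assumption: with no prior activity, any recent activity counts as accelerating.
  if (priorRate === 0) return recentRate > 0 ? "accelerating" : "steady";
  const ratio = recentRate / priorRate;
  if (ratio >= 1.5) return "accelerating";
  if (ratio <= 0.67) return "fading";
  return "steady";
}

console.log(Math.round(ratePer1k(74, 1107)));
console.log(trendLabel(30, 36));
```

With 74 pricing posts out of 1,107 classified, `ratePer1k` rounds to 67, matching the `rate_per_1k` value in the summary example. Render floors (n ≥ 10 for the scorecard, ≥ 8 posts for tags) would be applied by the caller and are left out here.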
Three public artifacts, all derived, all versioned with a schema_version. The existing fable-5 summary.json is the seed of the entity summary; v2 splits it into a private full variant and a public variant with no tweet text.
// public/index.json: the leaderboard (one file, ~10KB)
{
  "schema_version": 2,
  "updated_at": "2026-06-09T21:47:52Z",
  "windows": {
    "24h": {
      "total_mentions": 41200,
      "entities": [
        {
          "slug": "fable-5", "rank": 2, "mindshare": 0.218, "delta_24h": 0.186, "vibe_score": 62,
          "sentiment": { "pos": 0.11, "neu": 0.87, "neg": 0.02 },
          "spark_7d": [38, 41, 35, 44, 39, 1107, 2890],
          "top_theme": "novelty", "launch_mode": true
        }
      ]
    }
  }
}
// public/models/fable-5/summary.json: entity page (evolved from today's blob)
{
  "schema_version": 2,
  "entity": "fable-5",
  "updated_at": "…",
  "volume_by_hour": [{ "hour": "2026-06-09T21:00Z", "mentions": 783 }],
  "windows": {
    "24h": {
      "mentions": 1107,
      "unique_authors": 976,
      "sentiment": { "pos": 119, "neu": 965, "neg": 23 },
      "say_summary": {
        "text": "People are taken with how different Fable 5 feels…",
        "citations": [{ "phrase": "token limits", "topic": "pricing", "post_ids": ["2064…"] }]
      },
      "likes_themes": [{ "topic": "novelty", "count": 198, "post_ids": ["2064…", "…"], "…": "same shape as below" }],
      "dislikes_themes": [{
        "topic": "pricing", "count": 74, "post_ids": ["2064…", "…"],
        "rate_per_1k": 67, "field_median_per_1k": 29, "trend": "accelerating",
        "hourly": [{ "hour": "2026-06-09T21:00Z", "count": 22 }],
        "facets": [{ "label": "token limits", "share": 0.7, "post_ids": ["…"] }],
        "voiced_by": { "builder": 0.31, "influencer": 0.12, "media": 0.08, "anon": 0.49 },
        "verdict": { "text": "The complaint is specific: caps hit mid-session…", "validated": true },
        "emerging": false
      }],
      "top_posts": [{ "post_id": "2064453497…", "engagement_score": 935, "author_class": "official", "owned": true }],
      "voices": {
        "community": [{ "author_id": "…", "class": "builder", "voice_score": 94, "rising": false, "post_count": 3, "follower_bucket": "100k-1M" }],
        "owned_affiliated": ["…"],
        "rival": ["…"]
      },
      "capabilities": [{ "dimension": "coding", "score": 82, "n": 214, "trend": "up", "post_ids": ["…"] }],
      "tags": [{ "tag": "openclaw", "good": 41, "bad": 3, "post_ids": ["…"] }],
      "threads": [{ "conversation_id": "2064…", "root_post_id": "2064…", "replies": 31, "participants": 18, "depth": 4, "organic": true, "thread_score": 87 }],
      "relationship_split": { "owned": 212, "affiliated": 31, "rival": 9, "community": 855 }
    }
  }
}
// note: post_id / author_id only. Text, handles, and bios stay in the private variant.
// The page renders posts + author cards client-side via the X embed/oEmbed APIs.
Plus four smaller public artifacts: reads.json (editorial manifest: slug, title, dek, live/frozen status), events.json (the change feed, receipts as post IDs), launches.json (one fingerprint per launch: T0, hourly curve T0→T+72h, subscores, final Launch Score, score version), and tags.json (the global capability-tag index behind search and /good-at pages). Private artifacts (full text, handles, raw) live under derived/ and raw/ prefixes that are never CDN-exposed.
A per-entity state machine, evaluated hourly by the scheduler against fresh counts. Manual override always wins: flip the flag in the registry before a known launch.
| State | Counts | Content pulls | Enter when | Exit when |
|---|---|---|---|---|
| baseline | hourly | every 4h | default | spike detected or manual flag |
| elevated | hourly | hourly | volume > 3× trailing 7-day hourly median for 2 consecutive hours | 72h below 2× median |
| launch | every 30 min | every 30-60 min, cap raised to 20k/day | manual flag (pre-launch) or elevated + official-handle launch post detected | flag expiry (default 72h), decays to elevated |
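The transitions in the table can be written as one pure step function evaluated once per hour. This is a sketch under stated assumptions: the state shape, the `launchHoursLeft` counter for auto-detected launches, and the rule that the below-2× streak restarts on entering elevated are illustrative choices, not the scheduler's actual implementation.

```typescript
type Mode = "baseline" | "elevated" | "launch";

interface EntityState {
  mode: Mode;
  hoursAbove3x: number; // consecutive hours with volume > 3× trailing median
  hoursBelow2x: number; // consecutive hours with volume < 2× trailing median
  launchHoursLeft: number; // countdown for auto-detected launches (default 72h)
}

interface HourlyInput {
  volume: number;
  trailingMedian: number; // trailing 7-day hourly median
  manualLaunchFlag: boolean; // registry flag, still unexpired
  officialLaunchPost: boolean; // launch post from an official handle seen this hour
}

function stepEntity(state: EntityState, input: HourlyInput): EntityState {
  const hoursAbove3x = input.volume > 3 * input.trailingMedian ? state.hoursAbove3x + 1 : 0;
  const hoursBelow2x = input.volume < 2 * input.trailingMedian ? state.hoursBelow2x + 1 : 0;

  // Manual override always wins, from any state.
  if (input.manualLaunchFlag) {
    return { mode: "launch", hoursAbove3x, hoursBelow2x, launchHoursLeft: 0 };
  }
  if (state.mode === "launch") {
    if (state.launchHoursLeft > 1) {
      return { mode: "launch", hoursAbove3x, hoursBelow2x, launchHoursLeft: state.launchHoursLeft - 1 };
    }
    // Flag or countdown expired: launch decays to elevated, never straight to baseline.
    return { mode: "elevated", hoursAbove3x, hoursBelow2x: 0, launchHoursLeft: 0 };
  }
  if (state.mode === "elevated") {
    if (input.officialLaunchPost) {
      return { mode: "launch", hoursAbove3x, hoursBelow2x, launchHoursLeft: 72 };
    }
    if (hoursBelow2x >= 72) {
      return { mode: "baseline", hoursAbove3x, hoursBelow2x, launchHoursLeft: 0 };
    }
    return { mode: "elevated", hoursAbove3x, hoursBelow2x, launchHoursLeft: 0 };
  }
  // baseline: two consecutive spike hours promote to elevated
  if (hoursAbove3x >= 2) {
    return { mode: "elevated", hoursAbove3x, hoursBelow2x: 0, launchHoursLeft: 0 };
  }
  return { mode: "baseline", hoursAbove3x, hoursBelow2x, launchHoursLeft: 0 };
}
```

Keeping the step pure (state and this hour's input in, next state out) means the scheduler can replay a week of counts to audit why an entity changed cadence.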
Decided: the pipeline and site stay private, the entity registry becomes a small public repo, and the methodology page stays radically transparent. Two reasons drive it. First, the anti-gaming logic (spam scores, author classification, amplification detection) only works closed: publish the rules and astroturfers route around them. Second, the collector must be swappable in private: we want the freedom to move collection from the official Enterprise API to a cheaper collector without that swap being visible anywhere.
| Layer | Posture | Why |
|---|---|---|
| Pipeline + site code | Private | Anti-gaming rules stay closed; no turnkey clone for someone with a bigger megaphone; schema and formula iteration without deprecation debt |
| Collector implementations | Private, strictly behind the Collector interface | Transport freedom: official Enterprise today, cheaper collector tomorrow, invisible from outside |
| Entity registry | Public repo: entities, aliases, guard terms, official handles, builder panel | Community PRs fix recall (locals know "K3" means Kimi); doubles as published methodology; zero clone risk because it's data, not machine |
| Methodology | Public page | Query strings, theme vocabulary, score formulas, downloadable JSON, daily quality notes: the audit surface that earns trust without the repo |
| Deployment + data | Private always | Credentials, raw/derived buckets, domain, editorial reads |
Everything upstream of R2 raw sits behind one interface; everything downstream consumes normalized rows and never learns the transport.
// the only boundary the rest of the pipeline ever sees
interface Collector {
  counts(query, bucket): CountPoint[]   // hourly volume backbone
  posts(query, opts): NormalizedPost[]  // relevancy + recency pulls
}
// implementations (all private): x-official (Enterprise), x-payg,
// x-3p (twitterapi.io client, already production-proven in records-ingestion-utils).
// Raw lands under raw/<source>/<entity>/…; the source tag never reaches public artifacts.
launch-score, vibe-score) for citability; they're published on the methodology page anyway, so code form costs nothing.
publish.x.com, free, no auth) at read-generation time; widgets.js on dashboard pages. X explicitly encourages this path.
TanStack Start (Vite + TanStack Router) deployed static to Cloudflare Pages: public routes are prerendered to real HTML for SEO and OG cards, then the client hydrates and fetches JSON at runtime. Charts are inline SVG generated from JSON (no chart lib, so the React runtime is the only meaningful bundle). Critically, data refreshes don't rebuild the site: pages fetch the latest JSON at runtime, so a 30-minute launch-mode cadence costs zero deploys.
index.json and per-entity summaries fetched client-side with a cache-busting version param. Sub-second loads, no backend.
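A minimal sketch of the client-side fetch, assuming the version is carried in a `v` query parameter and that the caller already knows the current version (for example from the manifest). The parameter name, the CDN host, and the function names are placeholders.

```typescript
// Build a cache-busted URL for a public artifact.
// A new version string changes the URL, so the CDN and browser treat it as a new object.
function versionedUrl(cdnBase: string, path: string, version: string): string {
  const url = new URL(path, cdnBase);
  url.searchParams.set("v", version);
  return url.toString();
}

// Fetch and parse the leaderboard. There is no backend: this is a plain GET against the CDN.
async function loadIndex(cdnBase: string, version: string): Promise<unknown> {
  const res = await fetch(versionedUrl(cdnBase, "public/index.json", version));
  if (!res.ok) throw new Error(`index.json fetch failed: ${res.status}`);
  return res.json();
}

console.log(versionedUrl("https://cdn.example.com/", "public/index.json", "run-123"));
```

Per-entity summaries would go through the same helper with a different path, for example `public/models/fable-5/summary.json`.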
LLM drafts from the same JSON, human edits in a PR, data blocks re-hydrate while the read is live, then freeze. The draft prompt is part of the repo.
Cream/navy/lavender, Space Grotesk + Inter, paper grain: the fable-5 prototype already proved the look. "by Sift" mark links home.
| Risk | Severity | Mitigation |
|---|---|---|
| X API pricing or policy shifts again | high | Counts-first design keeps spend small; collector behind an interface (official / twitterapi.io); news sources in v2 diversify |
| Alias queries miss or over-match conversation (recall/precision) | med | Public query strings invite correction; guard terms per entity; weekly precision audit on a sample (LLM judges "is this about the entity?") |
| RT floods and engagement-farming skew volume | med | RTs segregated from themes/top posts; spam score; amplification surfaced as its own metric rather than hidden in volume |
| Sentiment credibility attacked by model fanbases | med | Methodology page with prompts + biases stated; per-chart deep links; downloadable JSON; never editorialize in the data layer |
| Gap chart depends on third-party capability scores | low | Attribution-first; artificialanalysis preferred, LMArena Elo fallback; chart degrades to sentiment-only ranking if neither is usable |
| Embed flakiness degrades pages | low | Fallback link cards everywhere; reads pre-render oEmbed HTML |
| Launch-mode misses a surprise launch | low | Spike auto-trigger (volume > 3× the trailing 7-day hourly median for 2 consecutive hours) catches what the manual flag misses, within ~2 hours |