Technical · standalone repo · June 2026

Counts as the backbone, top posts as the flesh, static JSON as the product.

The whole system is a cron-driven pipeline that ends in JSON files on a CDN. No public API, no server in the read path, no filtered stream. Cheap to run, trivially cacheable, and simple enough that a third party could operate it with their own X credentials if we ever open it up.

Cloudflare Workers + R2 + Pages | X API v2 · counts + recent search | LLM enrichment | No public API | Own repo, Sift-independent | S3-compatible by construction

Architecture

Five stages, all on Cloudflare. The X keys live only in the collector Worker's secrets; everything public is a derived artifact. The site never talks to a backend: it fetches versioned JSON from the CDN and renders tweets via official X embeds client-side.

1 · Collect (Worker · cron): Hourly counts per entity (volume backbone). Every 2-4h: recent-search pulls of new posts per entity, relevancy-sorted, capped.
2 · Raw store: Append-only raw API responses: raw/x_official/<entity>/<date>/<run>/. Replayable forever.
3 · Enrich (Worker · queue): LLM batch classification: sentiment, themes, author class. Dedup, RT/spam handling, language filter. Idempotent per post ID.
4 · Rollup (Worker): Window aggregates per entity (1h/6h/24h/7d/30d), cross-entity mindshare, deltas, top posts/voices.
5 · Publish (R2 → CDN): Atomic JSON swap: public/index.json + per-entity summaries. Site + reads hydrate from these.

Design rule: every stage reads only the previous stage's artifact. Kill any Worker mid-run and the next cron heals; replay any day from raw. State lives in R2 objects plus a small D1 working set (the hot relational tier, next section), nothing else.

Storage: three tiers, and no database in the read path

Do we want a relational database? Yes: a small one, in the middle, that we can lose without losing anything. The shape of this product is "append-only events in, windowed aggregates out, static files served": that wants a lake + a hot working set + a CDN, not a big always-on database. The rule that matters most: readers never touch a database. The site is JSON on a CDN; a database outage can degrade freshness, never availability.

Tier	What lives there	Properties	Size & retention
R2 · the lake	Raw API responses, append-only, exactly as fetched (`raw/<source>/<entity>/<date>/<run>/`)	Source of truth. Immutable, replayable forever: any chart is reproducible from raw + prompt/score versions. Never queried at runtime.	~1-3 GB/month at 20 entities; never deleted
D1 (SQLite) · the hot working set	`enriched_posts` (flat, queryable: entity, created_at, sentiment, themes, capabilities, tags, author, conversation, engagement), `hourly_counts`, `authors`, classification cache, budget ledger, entity state, events	The relational tier: every rollup is SQL over indexed windows instead of re-reading JSONL files. Disposable by design: it's a materialized view of raw; losing it costs a replay, not data.	~500k hot rows ≈ 0.5 GB at 20 entities; rows pruned past 90 days (frozen launches and published rollups don't need them); 10 GB D1 ceiling leaves 10-20× headroom
Public JSON · the product	Immutable `public/runs/<runId>/` artifacts + a tiny manifest pointer, on the CDN	The only thing readers touch. Atomic by construction (manifest swap), rollback is repointing, cache-perfect (immutable files + no-store manifest).	~10 MB per run; runs older than 48h cleaned
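
The manifest swap in the last row can be sketched as a write plan: every immutable run file goes up first, and the pointer is written last so readers never see a half-published run. This is a minimal sketch, not the pipeline's actual code; `planPublish`, the `public/manifest.json` key, the `run_id` field, and the exact Cache-Control strings are assumptions made for illustration.

```typescript
// One object write, with the cache policy it should be served under.
interface PutOp {
  key: string;
  body: string;
  cacheControl: string;
}

// Hypothetical pointer location; the source only says "a tiny manifest pointer".
const MANIFEST_KEY = "public/manifest.json";

// Build the ordered list of writes for one publish.
// Run artifacts are immutable and cacheable forever; the manifest is never cached.
function planPublish(runId: string, artifacts: Record<string, string>): PutOp[] {
  const ops: PutOp[] = Object.entries(artifacts).map(([file, body]) => ({
    key: `public/runs/${runId}/${file}`,
    body,
    cacheControl: "public, max-age=31536000, immutable",
  }));
  // The pointer goes last: repointing it is both the publish and the rollback.
  ops.push({
    key: MANIFEST_KEY,
    body: JSON.stringify({ run_id: runId }),
    cacheControl: "no-store",
  });
  return ops;
}
```

Rollback under this scheme is a single write: emit the manifest again with an earlier `run_id` that has not yet been cleaned.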

Why not Postgres / ClickHouse now

The math doesn't ask for it: ~475k posts/month, ~500k hot rows, single-writer pipeline, zero concurrent readers (publishes are the only consumer). SQLite with two indexes does this without breaking a sweat, costs nothing, and adds no external service, no connection pooling, no second ops surface. Reaching for a "real" database here would be architecture cosplay.

The escape hatch, pre-built

Rollups are pure functions that take rows; only the data-access layer knows it's D1. The swap triggers are explicit: > 50 tracked entities, > 2-3 GB hot, cross-entity queries slowing publishes, or wanting ad-hoc analyst SQL over history. Then: Postgres (Neon via Hyperdrive) behind the same interface, backfilled by replaying raw. A contained change, not a migration project.
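
A minimal sketch of that seam, assuming a simplified row shape: the rollup takes plain rows and returns an aggregate, and only the store interface knows which database is underneath. `EnrichedRow`, `PostStore`, `rollupSentiment`, and `rollupFor` are hypothetical names, not the repo's.

```typescript
// Simplified, hypothetical row; the real enriched_posts table carries more columns.
interface EnrichedRow {
  entity: string;
  createdAt: number; // epoch ms
  sentiment: "positive" | "neutral" | "negative";
}

// The only place that knows the backend (D1 today, Postgres after a swap).
interface PostStore {
  rowsInWindow(entity: string, fromMs: number, toMs: number): Promise<EnrichedRow[]>;
}

interface SentimentRollup {
  pos: number;
  neu: number;
  neg: number;
  total: number;
}

// Pure function: rows in, aggregate out. No I/O, so it runs against any store.
function rollupSentiment(rows: EnrichedRow[]): SentimentRollup {
  const out: SentimentRollup = { pos: 0, neu: 0, neg: 0, total: rows.length };
  for (const row of rows) {
    if (row.sentiment === "positive") out.pos += 1;
    else if (row.sentiment === "negative") out.neg += 1;
    else out.neu += 1;
  }
  return out;
}

// Composition point: swapping the database means passing a different PostStore.
async function rollupFor(
  store: PostStore,
  entity: string,
  fromMs: number,
  toMs: number,
): Promise<SentimentRollup> {
  return rollupSentiment(await store.rowsInWindow(entity, fromMs, toMs));
}
```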

The X API in 2026: what changed and what it costs

Verified against current docs (June 2026). The headline: in February 2026, X replaced the Basic/Pro tiers with pay-per-use for new developers (~$0.005 per post read, 2M reads/month ceiling; reads deduplicated within a 24h UTC window). Pro ($5k/mo, 1M reads) survives only for grandfathered subscribers. We run on our Enterprise contract; the design stays within pay-per-use limits so anyone could run it.

Fact	Detail	Design consequence
Counts endpoints	`/2/tweets/counts/recent` exists on all paid access; 300 req/15min; does not consume the post-read cap (billed ~$0.005/req on pay-per-use)	Counts are the hourly backbone. 20 entities × hourly ≈ 14.4k req/mo: trivial on Enterprise, ~$72/mo pay-per-use
Recent search	100 results/page, `sort_order=relevancy\|recency`, 450 req/15min, 512-char queries (4,096 on Enterprise)	Top-posts pulls fit easily; alias groups must fit 512 chars for third-party portability
No engagement operators	`min_faves` / `min_retweets` do not exist in API v2 at any tier (web UI only)	Pull relevancy-sorted + recency pages, rank locally on `public_metrics`. Re-fetch metrics for yesterday's top posts once daily (engagement matures for ~24h)
Full-archive search	Pay-per-use and Enterprise; 500 results/page	Used once per new entity for 30-day backfill, then never again
Embeds	`publish.x.com/oembed` free, no auth; widgets.js works but occasionally renders blank	Embeds-only display, styled link-out card as fallback
Derived data	Aggregate analysis that doesn't store personal data is explicitly permitted; raw content redistribution is not	Public JSON = IDs + aggregates. Tweet text never ships in public artifacts (see Compliance)

Monthly budget, 20 entities

Stream	Cadence	Posts read / month	Pay-per-use cost
Counts backbone (volume, mindshare)	hourly, all entities	0 (not capped)	~$72
Content pulls, baseline	every 4h · ~100 new posts/entity/pull avg, capped 1,000/entity/day	~360k	~$1,800
Launch mode (1-2 entities/mo)	every 30-60 min for 72h · ~750/hr	~54k	~$270
Daily metrics re-fetch (top 100/entity)	daily	~60k	~$300 (24h dedup shaves this)
Total		~475k	~$2.4k/mo · free on our Enterprise

Hard rule: per-entity daily read caps enforced in the collector (default 1,000/day baseline, 20,000/day launch mode). Without caps, a busy news week at 20 entities can triple consumption. A third party at reduced cadence (counts + 1 pull/day) runs the whole thing for ~$300/mo.
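
The arithmetic behind the budget table, plus a sketch of the cap check, under the table's own assumptions (20 entities, a 30-day month, ~$0.005 per read or counts request, one 72h launch at ~750 posts/hr). `allowedReads` is a hypothetical helper, not the collector's real function.

```typescript
const USD_PER_READ = 0.005;       // pay-per-use post read
const USD_PER_COUNTS_REQ = 0.005; // pay-per-use counts request
const TRACKED_ENTITIES = 20;
const DAYS_PER_MONTH = 30;

const countsRequests = TRACKED_ENTITIES * 24 * DAYS_PER_MONTH;             // 14,400
const baselineReads = TRACKED_ENTITIES * (24 / 4) * 100 * DAYS_PER_MONTH;  // 360,000
const launchReads = 750 * 72;                                              // 54,000
const refetchReads = TRACKED_ENTITIES * 100 * DAYS_PER_MONTH;              // 60,000

const totalReads = baselineReads + launchReads + refetchReads;             // 474,000
const totalUsd = countsRequests * USD_PER_COUNTS_REQ + totalReads * USD_PER_READ; // about 2,442

// Per-entity daily cap: 1,000 reads at baseline, 20,000 in launch mode.
// Returns how many of the requested reads the collector may still spend today.
function allowedReads(requested: number, usedToday: number, launchMode: boolean): number {
  const cap = launchMode ? 20000 : 1000;
  return Math.max(0, Math.min(requested, cap - usedToday));
}
```

For example, an entity that has already spent 900 reads at baseline gets 100 of a requested 300; the same request in launch mode passes in full.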

Collection: query design

Each entity's registry entry compiles to numbered queries (the raw keys in the existing blob already follow this pattern). Aliases are OR-grouped, retweets excluded from content pulls but included in counts, and ambiguous names get guard terms.

// registry/entities/fable-5.json — compiled to queries below
{
  "slug": "fable-5", "kind": "model", "name": "Claude Fable 5", "lab": "anthropic",
  "aliases": ["Fable 5", "Claude Fable", "fable-5", "Fable Anthropic"],
  "guards": ["-aesop -disney"],            // disambiguation for generic words
  "official_handles": ["claudeai", "ClaudeDevs", "AnthropicAI"],
  "launch": { "flag": true, "until": "2026-06-12T18:00:00Z" }
}

// counts query (RTs included — volume is volume)
("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney lang:en

// content query (RTs excluded — themes/top posts come from originals)
("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney -is:retweet lang:en
// → 2 pulls per cycle: sort_order=relevancy (top) + sort_order=recency since last_seen_id

Relevancy + recency double-pull. Relevancy surfaces what's resonating; recency with since_id guarantees we never miss the long tail. Both land in raw; dedup happens at enrich.
RT floods are data, not noise. The June 9 capture is dominated by retweets of one @WatcherGuru post. They count toward volume and amplification metrics, are excluded from theme classification and top-post ranking, and the biggest RT chains surface as "amplification events."
Query strings are public. Published on the methodology page verbatim. Anyone can re-run them.
New entity bootstrap: one full-archive pull (30 days) seeds history, then the entity joins the normal cadence.
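
A minimal sketch of the registry-to-query compile step, reproducing the two query strings shown above. `compileQueries` and `EntityQuerySpec` are hypothetical names; the sketch compiles whatever aliases it is handed and does not model how the registry decides which aliases make it into a query. The 512-character check reflects the pay-per-use limit from the API table.

```typescript
interface EntityQuerySpec {
  aliases: string[]; // OR-grouped, each quoted as an exact phrase
  guards: string[];  // passed through verbatim, e.g. "-aesop -disney"
}

const MAX_QUERY_CHARS = 512; // portability limit for third-party (pay-per-use) runs

function compileQueries(spec: EntityQuerySpec): { counts: string; content: string } {
  if (spec.aliases.length === 0) throw new Error("entity needs at least one alias");
  const group = "(" + spec.aliases.map((alias) => `"${alias}"`).join(" OR ") + ")";
  const head = [group, ...spec.guards];
  // Counts keep retweets (volume is volume); content pulls exclude them.
  const counts = [...head, "lang:en"].join(" ");
  const content = [...head, "-is:retweet", "lang:en"].join(" ");
  for (const query of [counts, content]) {
    if (query.length > MAX_QUERY_CHARS) {
      throw new Error(`query is ${query.length} chars, over the ${MAX_QUERY_CHARS} limit`);
    }
  }
  return { counts, content };
}

const fableQueries = compileQueries({
  aliases: ["Fable 5", "Claude Fable", "fable-5"],
  guards: ["-aesop -disney"],
});
// fableQueries.counts  is ("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney lang:en
// fableQueries.content is ("Fable 5" OR "Claude Fable" OR "fable-5") -aesop -disney -is:retweet lang:en
```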

Enrichment: LLM classification

One batched LLM pass per content pull, ~50 posts per call, classification cached by (post ID, prompt version) in D1 so nothing is ever classified twice. Cheap-fast model class (Gemini Flash / Haiku tier); the per-post cost is fractions of a cent and the whole month of classification costs less than one day of X API.

Dimension	Output	Notes
Sentiment	positive / neutral / negative + confidence	Per-post, about the entity (not general mood). Published as shares + net score. Neutral-heavy news days are expected and annotated, not hidden.
Themes	like/dislike + topic from a controlled vocabulary, emergent topics flagged for review	Vocabulary v1: novelty, quality, speed, ux, demo, coding, agents, writing, pricing, limits, trust_safety, bugs, confusion, benchmarks. Each theme keeps its top N post IDs as receipts (drives the drill-in), plus an optional sub-facet ("70% of pricing posts cite token limits") when one sub-pattern crosses half the theme.
Capabilities & tags	Per post: stances (positive/negative) on the fixed 8-dimension rubric — coding, writing, art_design, reasoning_depth, speed, accuracy, agents_tools, price_value — plus free-text `good_at` / `bad_at` tags ("openclaw", "svg art", "excel formulas")	Rides the same classification call: no extra LLM cost. Tags are normalized (lowercase, variant-collapse map maintained in code: "open claw" → "openclaw") and feed both the per-entity tag list and the global searchable index. The rubric is the always-on scorecard; tags are its long tail.
Consensus summary	60-90 word "people say" paragraph per entity per window	The Amazon-review-style summary. Hard constraint enforced by a validator: every claim in the paragraph must cite a theme present in this window's rollup, with its count; phrases carry a citations array (theme + post IDs) so the UI can link them. A draft that mentions anything uncited is rejected and regenerated. Regenerates only when theme counts shift > 20%, so the prose stays stable between refreshes.
Author identity + affiliation	Two orthogonal fields, classified once per author: `class` (identity: official / leadership / employee / partner / builder / researcher / creator / power_user / media / investor / influencer / anon) and `affiliation` (a lab slug or null; registry official handles short-circuit, bios resolve employment/founding/partnership)	Cached 30 days. Relationship is never stored: it's derived per author-entity pair — affiliation = entity's lab → owned; partner tie → affiliated; affiliation = other tracked lab → rival; else community. One author record makes @sama owned on GPT-5.2 and rival on Fable 5. Powers voiced-by, community-first voices, deep-thread filtering, and the relationship split on every metric, from day one.
Builder panel	boolean: member of the curated builder list	~300 hand-curated accounts (people who demonstrably ship) maintained in the registry. Powers the builder sentiment series. Zero marginal X cost: panel members' posts already arrive via the entity pulls; this is a filter, not a new collection stream.
Spam/bot score	0-1	Heuristics first (account age, follower ratio, duplicate text), LLM only for the gray zone. High-spam posts drop from all metrics; the rate is published.

Determinism rule: prompts are versioned files in the repo; a prompt change bumps the cache key and triggers re-classification of the active window only. Rollups are pure functions of enriched rows, so any chart can be reproduced from raw + prompt version.
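
The caching rule can be sketched as follows: key each classification by prompt version plus post ID, send only uncached IDs to the model, and chunk them at ~50 per call. This assumes an in-memory `Map` standing in for the D1 cache table; `cacheKey`, `selectUnclassified`, and `toBatches` are hypothetical helpers.

```typescript
type Sentiment = "positive" | "neutral" | "negative";

interface Classification {
  sentiment: Sentiment;
  confidence: number;
}

// Prompt version is part of the key, so bumping the prompt misses the cache by design.
function cacheKey(postId: string, promptVersion: string): string {
  return `${promptVersion}:${postId}`;
}

// Unique post IDs that still need an LLM call under this prompt version.
function selectUnclassified(
  postIds: string[],
  promptVersion: string,
  cache: Map<string, Classification>,
): string[] {
  const seen = new Set<string>();
  const todo: string[] = [];
  for (const id of postIds) {
    if (seen.has(id)) continue; // the double-pull can deliver the same post twice
    seen.add(id);
    if (!cache.has(cacheKey(id, promptVersion))) todo.push(id);
  }
  return todo;
}

// Split the work into batches of roughly 50 posts per call.
function toBatches<T>(items: T[], size: number = 50): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```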

Derived signals: the signature numbers

All four signature metrics (see Product) are computed in the rollup stage as pure, versioned functions of enriched rows. No new collection, no new infrastructure: they are math on data the pipeline already has.

Signal	Computation	Notes
Vibe Score	0-100 standing favorability: crowd net sentiment + builder net sentiment (double-weighted) − theme severity drag, recency-weighted over a rolling 7d window. Volume is deliberately excluded; mindshare measures popularity separately. Renders only above a sample floor (n ≥ 30 posts).	The Now clock's number: leaderboard Score column, entity hero, the Gap's y-axis. v1 ships crowd-only (`vibe-score.v1.ts`); v2 adds builder weighting when the panel lands. Versioned and published like the Launch Score formula.
Launch Score	0-100 weighted blend: velocity percentile vs the launch archive, builder uptake (panel members posting / panel size), crowd net sentiment, durability (day-3 / day-1 volume). Provisional from T0, frozen at T+72h.	Formula is a versioned file (`launch-score.v1.ts`) and published verbatim on the methodology page, like an index methodology. Re-scoring history requires a version bump and shows both versions.
Change events	Diff of consecutive rollups against thresholds: theme count ratio > 2× over 6h, new top post, velocity peak, first official/builder post, RT-chain > 500 near-identical (amplification flag). Deduped by (entity, event type, day).	Emitted to `events.json`, newest-first, capped at 50 per entity. Each event carries the receipt (post ID or chart anchor). Powers the homepage ticker, entity timelines, and the Slack webhook for surprise launches.
Builder series	Sentiment rollup filtered to builder-panel authors, published alongside the crowd series in every window.	Minimum sample floor (n ≥ 8 posts) before the series renders, to avoid one tweet swinging the line. Panel list is public on the methodology page; suggestions via PR if open-sourced.
The Gap	Scatter join of our Vibe Score against a published capability index per model, refetched nightly.	Capability axis is third-party data used with attribution (artificialanalysis.ai index preferred; LMArena Elo as fallback). Sourcing/permission is a rollout open question; the chart ships only with clean attribution.
Crowd scorecard	Per rubric dimension: score = 100 × positive / (positive + negative) over opinionated posts in the window; n and a trend arrow (vs prior window) attached; renders only at n ≥ 10.	Share-based, so models with wildly different volumes compare honestly. Same rubric on every entity page; the compare view (v2) lays scorecards side by side for free.
Tag index	Global `tags.json`: tag → entities with good/bad counts, score = 100 × good / (good + bad), evidence rank = score × log1p(n), top receipt post IDs. Tags render per-entity at ≥ 8 posts.	Powers the typeahead search and `/good-at/<tag>` pages: client-side over one static file, no backend. The verdict snippet per row goes through the standard citation validator. Likely the property's biggest SEO surface ("best model for X").
Threads & relationship split	Posts group by `conversation_id`. thread_score = log1p(replies) + 2·log1p(unique participants) + reply-chain depth; the default "deep threads" surface requires a community root; rival-rooted threads appear with a rival badge, owned-rooted ones go to the owned rail. Top ~20 candidate threads per entity per day get one conversation_id search pull each to complete the thread (cheap, inside caps). The four-way relationship split (owned / affiliated / rival / community mentions) is published per window.	Kills the "top posts are always lab announcements" failure mode: a 47-reply argument outranks a 600-RT announcement by construction. Owned content ships in its own labeled rail, not hidden.
Voice score	Per author per entity per window: log1p(earned engagement on classified posts) + 1.5·log1p(threads rooted with thread_score ≥ 50) + 0.5·active days, percentile-scaled 0-100 within the entity's community authors. "Rising" badge when an author's score jumps ≥ 25 points window-over-window. v2 adds engagement-source weighting (a reply from a builder counts more than one from an anon).	Ranks community voices by standing earned in this conversation, decoupled from follower count. Followers ship only as a bucket, for display. Voices panels are community-by-default with official/affiliated and rival tabs.
Theme intelligence	Per theme per window: rate_per_1k = 1000 × count / classified; field median across same-kind entities, same window; trend label from last-6h vs prior-18h hourly rate (accelerating ≥ 1.5×, fading ≤ 0.67×, else steady); voiced_by shares from author classes; facets from per-theme keyword lists + classifier output (render at ≥ 10% share); verdict one-liner through the same citation validator as the consensus summary; emerging topics surface at ≥ 15 posts with an "emerging" badge.	All pure rollup math except facet extraction and verdicts (one extra LLM call per major theme, only when its count moved > 20%). The theme object powers chips, drill-ins, and the dedicated /models/<slug>/themes/<topic> pages with per-theme OG cards. Cross-model theme views (v2) reuse rate_per_1k as already computed.
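
Several of the formulas in the table are small enough to sketch directly. The functions below follow the stated definitions for the crowd scorecard, the tag evidence rank, and the theme rate and trend label; the function names are hypothetical, and the handling of a zero prior rate in `trendLabel` is an assumption the table does not cover.

```typescript
// Share-based score: 100 * positive / (positive + negative), hidden below the sample floor.
function shareScore(positive: number, negative: number, minN: number): number | null {
  const n = positive + negative;
  if (n === 0 || n < minN) return null;
  return (100 * positive) / n;
}

// Tag index ordering: score * log1p(n), so a well-evidenced tag outranks a thin one.
function evidenceRank(good: number, bad: number): number {
  const n = good + bad;
  if (n === 0) return 0;
  return ((100 * good) / n) * Math.log1p(n);
}

// Theme rate normalized per 1,000 classified posts.
function ratePer1k(count: number, classified: number): number {
  return classified === 0 ? 0 : (1000 * count) / classified;
}

type Trend = "accelerating" | "fading" | "steady";

// Compare the last-6h hourly rate against the prior-18h hourly rate.
function trendLabel(last6hCount: number, prior18hCount: number): Trend {
  const recentRate = last6hCount / 6;
  const priorRate = prior18hCount / 18;
  // Assumption: with no prior activity, any recent activity counts as accelerating.
  if (priorRate === 0) return recentRate > 0 ? "accelerating" : "steady";
  const ratio = recentRate / priorRate;
  if (ratio >= 1.5) return "accelerating";
  if (ratio <= 0.67) return "fading";
  return "steady";
}
```

Checked against the sample payload in the data contracts: 74 pricing posts out of 1,107 mentions gives `ratePer1k(74, 1107)` of about 66.8, which matches the published `rate_per_1k` of 67 if every mention in the window was classified.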

Data contracts

Three public artifacts, all derived, all versioned with a schema_version. The existing fable-5 summary.json is the seed of the entity summary; v2 splits it into a private full variant and a public variant with no tweet text.

// public/index.json — the leaderboard (one file, ~10KB)
{
  "schema_version": 2, "updated_at": "2026-06-09T21:47:52Z",
  "windows": { "24h": { "total_mentions": 41200,
    "entities": [ { "slug": "fable-5", "rank": 2, "mindshare": 0.218, "delta_24h": 0.186, "vibe_score": 62,
      "sentiment": { "pos": 0.11, "neu": 0.87, "neg": 0.02 },
      "spark_7d": [38, 41, 35, 44, 39, 1107, 2890],
      "top_theme": "novelty", "launch_mode": true } ] } }
}

// public/models/fable-5/summary.json — entity page (evolved from today's blob)
{
  "schema_version": 2, "entity": "fable-5", "updated_at": "…",
  "volume_by_hour": [{ "hour": "2026-06-09T21:00Z", "mentions": 783 }],
  "windows": { "24h": {
    "mentions": 1107, "unique_authors": 976,
    "sentiment": { "pos": 119, "neu": 965, "neg": 23 },
    "say_summary": { "text": "People are taken with how different Fable 5 feels…",
      "citations": [{ "phrase": "token limits", "topic": "pricing", "post_ids": ["2064…"] }] },
    "likes_themes":    [{ "topic": "novelty", "count": 198, "post_ids": ["2064…", "…"], "…": "same shape as below" }],
    "dislikes_themes": [{ "topic": "pricing", "count": 74, "post_ids": ["2064…", "…"],
      "rate_per_1k": 67, "field_median_per_1k": 29, "trend": "accelerating",
      "hourly": [{ "hour": "2026-06-09T21:00Z", "count": 22 }],
      "facets": [{ "label": "token limits", "share": 0.7, "post_ids": ["…"] }],
      "voiced_by": { "builder": 0.31, "influencer": 0.12, "media": 0.08, "anon": 0.49 },
      "verdict": { "text": "The complaint is specific: caps hit mid-session…", "validated": true },
      "emerging": false }],
    "top_posts":  [{ "post_id": "2064453497…", "engagement_score": 935, "author_class": "official", "owned": true }],
    "voices": { "community": [{ "author_id": "…", "class": "builder", "voice_score": 94,
        "rising": false, "post_count": 3, "follower_bucket": "100k-1M" }],
      "owned_affiliated": ["…"], "rival": ["…"] },
    "capabilities": [{ "dimension": "coding", "score": 82, "n": 214, "trend": "up", "post_ids": ["…"] }],
    "tags": [{ "tag": "openclaw", "good": 41, "bad": 3, "post_ids": ["…"] }],
    "threads": [{ "conversation_id": "2064…", "root_post_id": "2064…", "replies": 31,
      "participants": 18, "depth": 4, "organic": true, "thread_score": 87 }],
    "relationship_split": { "owned": 212, "affiliated": 31, "rival": 9, "community": 855 }
  } }
}
// note: post_id / author_id only — text, handles, bios stay in the private variant.
// The page renders posts + author cards client-side via the X embed/oEmbed APIs.

Plus four smaller public artifacts: reads.json (editorial manifest: slug, title, dek, live/frozen status), events.json (the change feed, receipts as post IDs), launches.json (one fingerprint per launch: T0, hourly curve T0→T+72h, subscores, final Launch Score, score version), and tags.json (the global capability-tag index behind search and /good-at pages). Private artifacts (full text, handles, raw) live under derived/ and raw/ prefixes that are never CDN-exposed.

Cadence and launch mode

A per-entity state machine, evaluated hourly by the scheduler against fresh counts. Manual override always wins: flip the flag in the registry before a known launch.

State	Counts	Content pulls	Enter when	Exit when
baseline	hourly	every 4h	default	spike detected or manual flag
elevated	hourly	hourly	volume > 3× trailing 7-day hourly median for 2 consecutive hours	72h below 2× median
launch	every 30 min	every 30-60 min, cap raised to 20k/day	manual flag (pre-launch) or elevated + official-handle launch post detected	flag expiry (default 72h), decays to elevated

Why spike detection runs on counts: counts are cheap and not read-capped, so detection costs nothing. The trigger that matters most in practice is the manual flag: we know launches are coming, and the playbook (see Rollout) flips it the night before.
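
The state table above can be sketched as a single transition function evaluated once per hour. This is an illustration under stated assumptions, not the scheduler's real code: `stepCadence`, `Cadence`, and `HourlyObservation` are hypothetical names, and the sketch assumes auto-entry into launch sets the same 72h expiry a manual flag would.

```typescript
type CadenceState = "baseline" | "elevated" | "launch";

interface Cadence {
  state: CadenceState;
  launchUntil: number | null; // epoch ms; set by the manual flag or by auto-entry
}

interface HourlyObservation {
  now: number;                    // epoch ms of this evaluation
  lastTwoHours: [number, number]; // mention counts, oldest first
  trailingMedian: number;         // trailing 7-day hourly median
  hoursBelowExit: number;         // consecutive hours below 2x median
  officialLaunchPost: boolean;    // official-handle launch post detected
}

const LAUNCH_MS = 72 * 60 * 60 * 1000; // default flag lifetime

function stepCadence(prev: Cadence, obs: HourlyObservation): Cadence {
  // An unexpired flag always wins, whether set manually or by auto-entry.
  if (prev.launchUntil !== null && obs.now < prev.launchUntil) {
    return { state: "launch", launchUntil: prev.launchUntil };
  }
  // Flag expired: launch decays to elevated, never straight to baseline.
  if (prev.state === "launch") return { state: "elevated", launchUntil: null };
  if (prev.state === "elevated") {
    if (obs.officialLaunchPost) {
      return { state: "launch", launchUntil: obs.now + LAUNCH_MS };
    }
    return { state: obs.hoursBelowExit >= 72 ? "baseline" : "elevated", launchUntil: null };
  }
  // Baseline: enter elevated after 2 consecutive hours above 3x the trailing median.
  const spiking = obs.lastTwoHours.every((v) => v > 3 * obs.trailingMedian);
  return { state: spiking ? "elevated" : "baseline", launchUntil: null };
}
```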

Repo strategy: private machine, public methodology

Decided: the pipeline and site stay private, the entity registry becomes a small public repo, and the methodology page stays radically transparent. Two reasons drive it. First, the anti-gaming logic (spam scores, author classification, amplification detection) only works closed: publish the rules and astroturfers route around them. Second, the collector must be swappable in private: we want the freedom to move collection from the official Enterprise API to a cheaper collector without that swap being visible anywhere.

Layer	Posture	Why
Pipeline + site code	Private	Anti-gaming rules stay closed; no turnkey clone for someone with a bigger megaphone; schema and formula iteration without deprecation debt
Collector implementations	Private, strictly behind the Collector interface	Transport freedom: official Enterprise today, cheaper collector tomorrow, invisible from outside
Entity registry	Public repo: entities, aliases, guard terms, official handles, builder panel	Community PRs fix recall (locals know "K3" means Kimi); doubles as published methodology; zero clone risk because it's data, not machine
Methodology	Public page	Query strings, theme vocabulary, score formulas, downloadable JSON, daily quality notes: the audit surface that earns trust without the repo
Deployment + data	Private always	Credentials, raw/derived buckets, domain, editorial reads

The Collector contract: the swap point

Everything upstream of R2 raw sits behind one interface; everything downstream consumes normalized rows and never learns the transport.

// the only boundary the rest of the pipeline ever sees
interface Collector {
  counts(query, bucket): CountPoint[]      // hourly volume backbone
  posts(query, opts):    NormalizedPost[]  // relevancy + recency pulls
}
// implementations (all private): x-official (Enterprise), x-payg,
// x-3p (twitterapi.io client, already production-proven in records-ingestion-utils).
// Raw lands under raw/<source>/<entity>/… — the source tag never reaches public artifacts.

Shadow-mode swaps. Run the candidate collector alongside the incumbent on 2-3 entities for a week; compare hourly counts, unique authors, and top-post overlap. Swap when parity clears ~95%. Counts can stay on the official API as the reference series even after content pulls move.
Methodology promises outputs, not transport. The public page commits to query strings, windows, dedup rules, and downloadable outputs, all independently checkable by re-running the queries. It never names a vendor, so a collector swap changes nothing we've promised.
Display is transport-independent. Tweets render via official X embeds regardless of how collection happens; the display-side compliance posture never moves.
The financial point: third-party collection runs orders of magnitude cheaper per post than pay-per-use (verify current pricing at swap time); at our volume that means double-digit dollars a month against the ~$2.4k/mo pay-per-use equivalent.
We say it out loud. A public "How we're open" page states the posture rather than leaving it implicit: open data, open methodology, open registry, closed machine, and the anti-gaming reason why. The score formulas additionally ship as tiny public packages (launch-score, vibe-score) for citability; they're published on the methodology page anyway, so code form costs nothing.
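
The shadow-mode comparison could be scored along these lines. The source names the three signals (hourly counts, unique authors, top-post overlap) and the ~95% bar but not the formulas, so the definitions here are assumptions: count parity is the mean of min/max per hour, and overlap is the share of the incumbent's top posts that the candidate also found.

```typescript
// Mean hourly agreement between two count series, in [0, 1].
function countParity(incumbent: number[], candidate: number[]): number {
  if (incumbent.length === 0 || incumbent.length !== candidate.length) {
    throw new Error("series must be non-empty and aligned hour by hour");
  }
  let sum = 0;
  for (let i = 0; i < incumbent.length; i++) {
    const hi = Math.max(incumbent[i], candidate[i]);
    // Two empty hours agree perfectly.
    sum += hi === 0 ? 1 : Math.min(incumbent[i], candidate[i]) / hi;
  }
  return sum / incumbent.length;
}

// Share of the incumbent's top posts that the candidate also surfaced.
function topPostOverlap(incumbentIds: string[], candidateIds: string[]): number {
  if (incumbentIds.length === 0) return 1;
  const found = new Set(candidateIds);
  return incumbentIds.filter((id) => found.has(id)).length / incumbentIds.length;
}

// Swap only when every tracked signal clears the bar.
function clearsParity(signals: number[], threshold: number = 0.95): boolean {
  return signals.length > 0 && signals.every((s) => s >= threshold);
}
```

Unique-author parity can reuse `countParity` on the hourly unique-author series from each collector.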

Compliance: how we show tweets without storing people

What we do

Display via official embeds only. oEmbed (publish.x.com, free, no auth) at read-generation time; widgets.js on dashboard pages. X explicitly encourages this path.
Public JSON carries IDs and aggregates. No tweet text, no handles, no bios in any CDN-exposed artifact.
Deletions handle themselves on display (a deleted tweet renders as nothing), plus a daily sweep re-checks top-post IDs and drops dead ones from JSON.
Aggregate analytics are explicitly permitted by the developer agreement when no personal data is stored in the published artifact.

Edges we watch

Embed flakiness: widgets.js intermittently renders blank; every embed slot has a styled fallback card that links out. Reads pre-render oEmbed HTML so they degrade gracefully.
"Accounts to follow" lists: rendered as official follow-button embeds, not from our JSON, for the same no-stored-personal-data reason.
Internal storage is fine (raw text in private R2 for processing) but honors compliance events; the daily sweep covers our public surface.
X pricing/ToS volatility is the real platform risk; the counts-first design minimizes exposure, and news sources (v2) diversify it.

The site itself

TanStack Start (Vite + TanStack Router) deployed static to Cloudflare Pages: public routes are prerendered to real HTML for SEO and OG cards, then the client hydrates and fetches JSON at runtime. Charts are inline SVG generated from JSON (no chart lib, so the React runtime is the only meaningful bundle). Critically, data refreshes don't rebuild the site: pages fetch the latest JSON at runtime, so a 30-minute launch-mode cadence costs zero deploys.
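
A minimal sketch of a chart generated straight from JSON, assuming a sparkline fed by a series such as `spark_7d`. `sparklinePath` and `sparklineSvg` are hypothetical helpers; the real components presumably render JSX rather than strings, but the geometry is the same.

```typescript
// Map a numeric series onto an SVG path that fills a width x height box.
function sparklinePath(values: number[], width: number, height: number): string {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min || 1; // flat series: avoid dividing by zero
  const stepX = values.length > 1 ? width / (values.length - 1) : 0;
  return values
    .map((v, i) => {
      const x = i * stepX;
      const y = height - ((v - min) / span) * height; // SVG y grows downward
      return `${i === 0 ? "M" : "L"}${x.toFixed(1)} ${y.toFixed(1)}`;
    })
    .join(" ");
}

// Wrap the path in a standalone inline SVG element.
function sparklineSvg(values: number[], width: number = 120, height: number = 24): string {
  const d = sparklinePath(values, width, height);
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}" ` +
    `width="${width}" height="${height}">` +
    `<path d="${d}" fill="none" stroke="currentColor"/></svg>`
  );
}

// sparklinePath([0, 10], 100, 20) returns "M0.0 20.0 L100.0 0.0"
```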

Dashboard pages

Static shell + JSON hydration

index.json and per-entity summaries fetched client-side with a cache-busting version param. Sub-second loads, no backend.

Reads

MDX documents with live blocks

LLM drafts from the same JSON, human edits in a PR, data blocks re-hydrate while the read is live, then freeze. The draft prompt is part of the repo.

Design

Sift brand system

Cream/navy/lavender, Space Grotesk + Inter, paper grain: the fable-5 prototype already proved the look. "by Sift" mark links home.

Risks

Risk	Severity	Mitigation
X API pricing or policy shifts again	high	Counts-first design keeps spend small; collector behind an interface (official / twitterapi.io); news sources in v2 diversify
Alias queries miss or over-match conversation (recall/precision)	med	Public query strings invite correction; guard terms per entity; weekly precision audit on a sample (LLM judges "is this about the entity?")
RT floods and engagement-farming skew volume	med	RTs segregated from themes/top posts; spam score; amplification surfaced as its own metric rather than hidden in volume
Sentiment credibility attacked by model fanbases	med	Methodology page with prompts + biases stated; per-chart deep links; downloadable JSON; never editorialize in the data layer
Gap chart depends on third-party capability scores	low	Attribution-first; artificialanalysis preferred, LMArena Elo fallback; chart degrades to sentiment-only ranking if neither is usable
Embed flakiness degrades pages	low	Fallback link cards everywhere; reads pre-render oEmbed HTML
Launch-mode misses a surprise launch	low	Spike auto-trigger (volume > 3× trailing median for 2 consecutive hours) catches what the manual flag misses, within ~2 hours