Product · standalone property · June 2026

Benchmarks measure what models can do. We measure what people think of them.

A public, always-fresh read on the AI model conversation: which models have mindshare, whether the crowd is loving or roasting them, what specifically people like and dislike, and which voices are driving it. artificialanalysis.ai for capability; this for perception. Refreshed every few hours, near-real-time during launches.

X data (official API) · Launch mode · Editorial launch reads · Standalone brand, by Sift · Own repo · No login, no paywall

Why this, why now

Every benchmark that matters is saturated or gamed, and everyone knows it. When a model ships, the real eval happens in public on X within 48 hours: builders post side-by-sides, influencers pick winners, and complaints (pricing, limits, regressions) crystallize into narrative. Nobody aggregates that signal. We already built this machine for brands; pointing it at AI models is a showcase with a built-in audience.

48h

The real eval window

Sentiment about a model is effectively decided in the first two days. A leaderboard that updates every few hours during a launch owns that moment; monthly reports miss it entirely.

0

Nobody owns this view

artificialanalysis owns capability. LMArena owns head-to-head preference (and is itself accused of being gamed). Nobody owns mindshare + sentiment + who-said-what. The slot is open.

Sift's best demo

This is Sift's core listening pipeline running on the most-watched topic in tech. Every chart is an implicit product demo, with "powered by Sift" on a property journalists will cite.

Positioning: the perception layer next to the capability layer

We are complementary to artificialanalysis, not competing with it. Their axis is what models score; ours is what people say. The two views disagree often, and the disagreement is the story.

 | artificialanalysis.ai | This property
Unit of truth | Benchmark scores, latency, price per token | Mentions, sentiment, themes, voices on X
Update cycle | When they re-run evals | Every 2-4 hours; 30-60 min during launches
Can it be gamed? | Increasingly yes (training on test sets, eval-tuning) | Astroturfing is possible but visible: author history, account age, and voice classification expose it
Answers | "Which model is most capable per dollar?" | "Which model is winning hearts? What do people actually complain about? Who should I follow?"
Editorial | Methodology pages | Auto-drafted, human-edited launch reads (the fable-5 page is the prototype)

Name candidates

Standalone brand with a visible "by Sift" mark. Mockups below use vibebench as the placeholder. The final name is an open question for rollout, and domain availability is unverified.

Recommended

VibeBench · vibebench.ai

"Vibe check" is already the community's word for post-launch model evaluation. Memorable, dev-native, slightly irreverent, and the name itself states the thesis: vibes are the benchmark that can't be trained on. Risk: reads playful to enterprise press.

Safe

ModelPulse · modelpulse.ai

Descriptive and credible in a citation ("according to ModelPulse..."). Pulse conveys the refresh cadence. Risk: forgettable, sounds like a monitoring SaaS.

Premium

Zeitgeist · zeitgeist-ai.com

Exactly the right word: the spirit of the moment. Feels editorial and serious. Risk: spelling, domain availability, and a German loanword in every URL.

Descriptive

AI Mindshare Index · aimindshare.com

Maximum SEO clarity, zero explanation needed, "Index" invites citation. Risk: generic; harder to build brand affection around.

Entity taxonomy: four vectors, one registry

v0 ships with one model (Fable 5, the data we already have). The registry is designed for all four vectors from day one, because mindshare attaches at different altitudes: the model gets the launch buzz, the lab gets the trust debate, the product gets the consumer complaints.

Kind | Examples | What mindshare means here | Relations
model | Fable 5, GPT-5.2, Gemini 3 Ultra, Grok 4.1 | Launch buzz, capability debate, like/dislike themes | belongs to a lab; succeeded-by chains for generations
open-weight | DeepSeek V4, Qwen3.5, Kimi K3, Llama 5 | Same as model + deployment/cost/fine-tune chatter; distinct audience | belongs to a lab; tagged open-weight for the filter tab
lab | Anthropic, OpenAI, Google DeepMind, xAI, Meta AI | Trust, drama, policy, hiring, brand-level sentiment | parent of models and products; owns official handles
product | ChatGPT, Claude Code, Cursor, Copilot, Gemini app | Consumer UX sentiment, pricing/limits complaints, switching talk | belongs to a lab (or company); often the loudest vector
Registry entry = slug, kind, display name, lab relation, alias list ("Fable 5", "Claude Fable", "fable-5"), X query templates, official handles (@claudeai, @ClaudeDevs, @AnthropicAI), launch-mode flag + window. The full schema lives in the Technical doc.
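A minimal sketch of what one registry entry could look like, assuming a flat record. The field names, the single query template, and the `build_queries` helper are illustrative assumptions; the authoritative schema is the one in the Technical doc.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RegistryEntry:
    # Field names are hypothetical; they mirror the list in the text above.
    slug: str
    kind: str                      # "model" | "open-weight" | "lab" | "product"
    display_name: str
    lab: Optional[str]             # lab relation; None when the entry is a lab
    aliases: tuple[str, ...]
    query_templates: tuple[str, ...]
    official_handles: tuple[str, ...]
    launch_mode: bool = False
    launch_t0: Optional[datetime] = None
    launch_window_hours: int = 72


def build_queries(entry: RegistryEntry) -> list[str]:
    """Expand every template against every alias."""
    return [
        template.format(alias=alias)
        for template in entry.query_templates
        for alias in entry.aliases
    ]


FABLE_5 = RegistryEntry(
    slug="fable-5",
    kind="model",
    display_name="Claude Fable 5",
    lab="anthropic",
    aliases=("Fable 5", "Claude Fable", "fable-5"),
    query_templates=('"{alias}" lang:en',),   # assumed template shape
    official_handles=("@claudeai", "@ClaudeDevs", "@AnthropicAI"),
    launch_mode=True,
    launch_t0=datetime(2026, 6, 9, 18, 2, tzinfo=timezone.utc),
)

queries = build_queries(FABLE_5)
```

Keeping aliases and templates separate means a new alias is a one-line registry change rather than a new hand-written query.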

Two clocks, gracefully

Visitors bring two different questions in two different tenses: "what do people like right now?" and "how is/was the launch?" The product treats them as two clocks running on the same data, so neither hijacks the other. Now is rolling and never freezes; Launch is T0-aligned, lives for 72 hours, then freezes into a permanent archive.

Clock 1 · Now

Rolling windows, always on

The leaderboard, like/dislike themes, voices, and the change feed all run on rolling windows (1h to 30d). This is the default lens on every page. Launch chrome never replaces it: at most it adds a badge and a second tab.

Clock 2 · Launch

T0-aligned, then frozen

When launch mode is on, the entity page grows a Launch tab: day-0-aligned curves, the Launch Score forming in real time, an overlay against past launches. At T+72h the fingerprint and final score freeze into /launches. The Now tab just keeps rolling.
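The Launch clock can be sketched as a small state function over T0. The state names and helper names below are assumptions for illustration; only the 72-hour window and the freeze at T+72h come from the text.

```python
from datetime import datetime, timedelta, timezone

LAUNCH_WINDOW = timedelta(hours=72)


def launch_state(t0: datetime, now: datetime) -> str:
    """Return the Launch clock's state; the Now clock is unaffected by it."""
    if now < t0:
        return "pre-launch"
    if now - t0 < LAUNCH_WINDOW:
        return "live"
    return "frozen"  # at T+72h the fingerprint and final score are archived


def hours_since_t0(t0: datetime, ts: datetime) -> float:
    """Align a timestamp to T0 so launches overlay on the same axis."""
    return (ts - t0).total_seconds() / 3600


t0 = datetime(2026, 6, 9, 18, 2, tzinfo=timezone.utc)
state_on_launch_night = launch_state(t0, datetime(2026, 6, 9, 21, 47, tzinfo=timezone.utc))
state_after_window = launch_state(t0, t0 + LAUNCH_WINDOW)
```

Because the alignment is just an offset from T0, comparing two launches needs no resampling beyond bucketing by hour.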

Question | Surface | Lifecycle
"What do people like right now?" | Leaderboard + entity Now tab: Vibe Score, themes, voices, change feed | Rolling, refreshed every 2-4h (30 min in launch mode)
"How is the launch going?" | Entity Launch tab: day-0 curves, provisional Launch Score, past-launch overlay | Live for 72h, decays with launch mode
"How do launches compare?" | /launches archive: every fingerprint, every final score | Permanent, append-only, uncopyable
"What's the story?" | Reads: drafted from the same JSON, shipped when the story warrants one | Occasional, frozen on publish

How a visitor moves through it

1 · Leaderboard

Who has mindshare right now. Rank, share of voice, sentiment split, momentum, change ticker.

2 · Entity page

One model in depth: Now tab (likes/dislikes with receipts, voices) + Launch tab during launches.

3 · Launches

The archive: every launch's day-0 fingerprint and Launch Score, side by side across history.

4 · Reads & methodology

Occasional thought pieces drafted from the data, and the credibility anchor explaining every number.

Surface 1 · The leaderboard

The homepage. One scroll answers "what's hot, what's loved, what's hated." Mindshare is share of tracked-set mentions (counts API, so it's cheap and hourly). Sentiment splits come from LLM-classified content pulls. The Fable 5 row carries real numbers from the June 9 capture; other rows are illustrative.
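As a rough sketch, tracked-set mindshare is each entity's mention count divided by the total across the tracked set. The counts below are made up for illustration, and the function name is an assumption.

```python
def mindshare(counts: dict[str, int]) -> dict[str, float]:
    """Share of tracked-set mentions per entity, in percent (one decimal)."""
    total = sum(counts.values())
    if total == 0:
        return {slug: 0.0 for slug in counts}
    return {slug: round(100 * n / total, 1) for slug, n in counts.items()}


# Hypothetical hourly counts for three tracked entities.
shares = mindshare({"model-a": 500, "model-b": 300, "model-c": 200})
```

The denominator is the tracked set, not all of X, so adding an entity to the registry shifts every other entity's share.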

vibebench.ai
vibebench.
by Sift
LIVE · updated 21:47 UTC
All
Frontier
Open-weight
Products
Labs
1h
6h
24h
7d
30d
Posts tracked · 24h
41.2K
▲ 64% · launch day
Most discussed
GPT-5.2
24.1% mindshare
Biggest mover
Fable 5
▲ 18.6 pts since launch
Top vibe score
DeepSeek V4
78 / 100
Most polarizing
Grok 4.1
29% pos · 24% neg
Hot themes: pricing ▲ 2.1× · token limits ▲ 1.8× · agents · coding · trust & safety
Changed:
21:04 · @ClaudeDevs Build Day post becomes Fable 5's top post (935 eng)
20:31 · Fable 5 pricing complaints 2.1× in 6h, 70% cite token limits
19:12 · first builder hands-on for Fable 5 (@giffmana)
Mindshare · share of tracked conversation · 24h
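Legend: positive · neutral · negative
# | Model | Lab | Mindshare | Δ 24h | Sentiment | Score | 7-day | Top theme
1 | GPT-5.2 | OpenAI | 24.1% | ▼ 4.2 | [chart] | 58 | [chart] | agents
2 | Fable 5 (launch) | Anthropic | 21.8% | ▲ 18.6 | [chart] | 62 | [chart] | novelty
3 | Gemini 3 Ultra | Google DeepMind | 16.4% | ▼ 2.8 | [chart] | 64 | [chart] | multimodal
4 | Grok 4.1 | xAI | 11.2% | ▼ 3.1 | [chart] | 41 | [chart] | trust & safety
5 | DeepSeek V4 (open) | DeepSeek | 9.7% | ▲ 1.4 | [chart] | 78 | [chart] | cost
6 | Llama 5 (open) | Meta AI | 5.8% |  | [chart] | 50 | [chart] | fine-tuning
7 | Qwen3.5-Max (open) | Alibaba | 4.6% | ▲ 0.6 | [chart] | 66 | [chart] | coding
8 | Kimi K3 (open) | Moonshot | 3.4% | ▼ 0.9 | [chart] | 60 | [chart] | agents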
Launch read · live
Fable 5: the first 24 hours

1,107 posts, a 9-point net-positive start, and a pricing complaint that won't quit. Updated every 30 minutes.

Read · Jun 4
DeepSeek V4, week one: the cost story ate the capability story

The crowd cares less about MMLU than about $0.12 per million tokens.

Read · May 28
Gemini 3 Ultra's quiet climb

No launch spike, but the steadiest positive drift of any frontier model this quarter.

Everything above the fold comes from two JSON files on the CDN: index.json (leaderboard rows, KPIs, themes) and reads.json (editorial strip). No backend, no API: the page is static and hydrates client-side. The Fable 5 row is real captured data.

Surface 2 · Entity page

One model in full depth. Mentions, sentiment, themes, and post metrics are real, from the June 9 Fable 5 capture (1,107 mentions, 976 unique authors); thread footprints, facet shares, and voiced-by splits are illustrative until the pipeline computes them. The what-people-say module reads like an Amazon review summary: an AI consensus paragraph that may only state what the theme counts support, color-coded aspect chips (mixed themes like quality show both sides), and a per-theme drill-in with the receipts as embeds. Top posts render as official X embeds; voices carry classification badges (the official-vs-3rd-party axis, formalized in v2).

vibebench.ai/models/fable-5
LAUNCH MODE · 30 min refresh
F
Claude Fable 5
launched Jun 9
Anthropic · frontier model · aliases: "Fable 5", "Claude Fable", "fable-5" · official: @claudeai @ClaudeDevs
Now
Launch · day 1 · 87
1h
6h
24h
7d
Mentions · 24h
1,107
▲ from ~40/day pre-launch
Unique authors
976
88% post once
Vibe score
62 / 100
crowd +9 · builders +40
Mindshare rank
#2
▲ from #7 yesterday
Peak velocity
783/hr
21:00 UTC
Mention volume · hourly (counts API)
[Chart: mentions/hr, y-axis 0 to 800; x-axis Jun 8, 12:00 to Jun 9, 21:00. Markers: ◆ launch 18:02 UTC; annotation "launch post ↑ 783/hr" at 21:00.]
What people say · classified from 1,107 posts · 24h
AI summary · every claim cites posts

People are taken with how different Fable 5 feels: novelty dominates the positive conversation (198 posts), with the hands-on demo wave (25) and ux praise (37) behind it. The friction is concentrated in pricing (74 posts), and 70% of those cite token limits specifically. quality is genuinely mixed: praised in long-form writing, dinged on consistency.

✓ novelty 198 ✓ ux 37 ✓ demos 25 ✓ speed 21 ◐ quality 23 · 10 ✗ pricing 74 ✗ trust & safety 35 ✗ bugs 17 ✗ confusion 3
pricing · 74 posts · 24h · 67/1k · 2.3× frontier median
token limits: 70%
subscription tiers: 19%
API cost: 11%
voiced by: builders 31% · influencers 12% · media 8% · anon 49% · accelerating ▲ 2.1× / 6h
Open theme page · all 74 posts →
B
Shibetoshi Nakamoto
@BillyM2k · most engaged in theme · Jun 9
𝕏

playing with the new claude fable aaaaaaaand my token limit was reached

♥ 94 · 💬 31 · 👁 7.6K · via X embed
Deep threads
Owned · 12
Most liked
G
Lucas Beyer
@giffmana · builder · Jun 9
𝕏

Actually it's fine guys! I figured out a way, see below. Claude Fable 5 is a great model.

♥ 269 · 👁 20.2K
↳ 31 replies · 18 people · depth 4 · organic
M
Miles Deutscher
@milesdeutscher · influencer · Jun 9
𝕏

This is a historic day for AI. Claude Fable (Mythos) was just released, and it's insane.

👁 5.9K
↳ 87 replies · 54 people · depth 3 · organic
L
LexnLin
@LexnLin · builder · Jun 9
𝕏

omg hahah look what Claude Fable made! level devil but fable devil (medium reasoning effort)

♥ 58 · 👁 5.9K
↳ 19 replies · 11 people · depth 5 · organic
Voices · 24h
Community
Official & affiliated
Rival takes · 4
Account | Who | Voice score | Posts
@giffmana | builder | 94 | 3
@karpathy | builder | 91 | 1
@TrungTPhan | creator | 73 | 2
@BillyM2k | power user | 68 | 2
@milesdeutscher | influencer | 61 | 2
@AshleyDCan (rising) | creator | 55 | 1

Ranked by voice score (earned engagement + threads rooted + consistency), not followers. @claudeai and @sama live in their own tabs: official content and rival takes are context, not "the community."

Accounts to follow on this topic
rendered as X follow embeds
@ClaudeDevs · official, highest-engagement launch content · Follow ↗
@giffmana · builder, hands-on evals within hours · Follow ↗
@milesdeutscher · influencer, fastest amplification · Follow ↗
@TrungTPhan · analyst, business framing · Follow ↗
v2: a toggle splits every panel on this page into Official vs Community series, so you can see whether a narrative started with the lab or with the crowd.

The page is a static template hydrated from one file: models/fable-5/summary.json. Posts render as official X embeds (oEmbed), so deleted tweets vanish on their own and we never republish text. The cards above are styled placeholders for the embed slots.

Themes are the content engine

A count next to a chip says that people complain about pricing. Compelling content answers the four questions a count can't: what exactly, said by whom, is it growing or fading, and is it normal for the category. Every theme carries six layers, all computed, all receipted.

① Facets

What, exactly

Inside every theme, the sub-complaints with shares: pricing splits into token limits (70%), subscription tiers (19%), API cost (11%). Each facet keeps its own receipts. "Pricing" is a category; "caps hit mid-session" is content.

② Trajectory

Noise or real problem

Hourly and daily series per theme, labeled accelerating / steady / fading. The complaint half-life is the tell: launch-day gripes that fade by day 3 were noise; ones still growing are product truth. Nobody else measures this.

③ Voiced-by

Who's saying it

Theme × author class. 74 pricing complaints where a third come from builders is a different fact than 74 from anon accounts. Complaints get weighed, not just counted.

④ Field benchmark

Is 74 a lot?

Complaint rate per 1k classified posts, against the median for the entity's kind. "67/1k, 2.3× the frontier median" turns a raw count into a judgment. Amazon can't compare across products; we compare across every model we track.

⑤ Verdict line

One validated sentence

An LLM one-liner per major theme ("The complaint is specific: caps hit mid-session, not subscription pricing"), passed through the same citation validator as the consensus summary. Quotable, and never beyond the data.

⑥ Emerging themes

What's new this window

Out-of-vocabulary topics proposed by the classifier surface with an "emerging" badge once they cross a floor (15 posts). The vocabulary grows from the conversation instead of ossifying.

Every theme is a page. Chips and drill-ins link to /models/fable-5/themes/pricing: a shareable URL with its own OG card. In v2 the homepage's hot-theme chips link to the cross-model view of the same theme ("who gets roasted on pricing"), which may be the single most clickable page on the site.
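The field benchmark layer is plain arithmetic, sketched here with the Fable 5 pricing numbers from this page. It assumes all 1,107 captured mentions count as classified posts and takes the frontier median of 29 per 1k as given; the function names are illustrative.

```python
def rate_per_1k(theme_posts: int, classified_posts: int) -> float:
    """Complaint rate per 1,000 classified posts."""
    return 1000 * theme_posts / classified_posts


def vs_median(rate: float, field_median: float) -> float:
    """How many times the field median this rate is."""
    return rate / field_median


pricing_rate = round(rate_per_1k(74, 1107))           # 74 pricing posts of 1,107
pricing_multiple = round(vs_median(pricing_rate, 29), 1)
```

Normalizing by classified posts is what lets a launch-day model and a quiet one sit in the same table.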

Surface 2b · The theme page

One theme, fully unpacked. Real counts from the June 9 capture (74 pricing posts of 1,107 mentions); facet shares, voiced-by splits, and field rates are illustrative until the pipeline computes them.

vibebench.ai/models/fable-5/themes/pricing
LAUNCH MODE · 30 min refresh
Claude Fable 5 / what people say / pricing
✗ pricing
74 posts · 24h · 67 per 1k classified · 2.3× frontier median
6h
24h
7d

"The complaint is specific: usage caps hit mid-session on the new model, not subscription pricing. Heavy users ran into limits within hours of launch."

validated · cites 74 posts
Trajectory · posts per hour in theme
accelerating · ▲ 2.1× over 6h
[Chart: posts per hour in theme, y-axis 0 to 24; x-axis Jun 9, 12:00 to 21:45. Annotation: "ignites 19:40 · first 90+ eng post".]
Facets · what "pricing" actually contains
each keeps its own receipts
token limits
52 · 70%
posts ↗
subscription tiers
14 · 19%
posts ↗
API cost
8 · 11%
posts ↗

Facets come from per-theme keyword lists plus classifier output; a facet renders once it covers ≥ 10% of the theme. Sub-10% chatter stays in the long tail count.

Voiced by · who carries this complaint
builders 31% influencers 12% media 8% anon 49%

Nearly a third of the complaint comes from builder-class accounts: this one carries weight. A theme that's 90% anon reads very differently, and the page says so.

Vs the field · pricing complaints per 1k classified posts
same window, same theme
Model | Rate /1k | vs median
Fable 5 | 67 | 2.3×
Grok 4.1 | 41 | 1.4×
GPT-5.2 | 31 | 1.1×
Gemini 3 Ultra | 24 | 0.8×
frontier median | 29 | 
The receipts · most engaged in theme
official X embeds · sorted by engagement
B
Shibetoshi Nakamoto
@BillyM2k · power user · Jun 9
𝕏

playing with the new claude fable aaaaaaaand my token limit was reached

♥ 94 · 💬 31 · 👁 7.6K
𝕏
X embed loading · post 2064453111…
𝕏
X embed loading · post 2064457329…

Theme pages render from the theme object inside summary.json: facets, trajectory, voiced-by, field rates, verdict, and receipt post IDs all ship in one place. Every theme page gets its own OG card, because "2.3× the frontier median" is built to be screenshotted.
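A small sketch of the facet rule from the mockup: a facet renders once it covers at least 10% of the theme, and anything smaller rolls into a long-tail count. The function name and return shape are assumptions; the counts are the pricing facets shown above.

```python
def facet_shares(
    facet_counts: dict[str, int], theme_total: int, floor: float = 0.10
) -> tuple[dict[str, int], int]:
    """Return (rendered facets as whole-percent shares, long-tail post count)."""
    rendered: dict[str, int] = {}
    long_tail = 0
    for name, n in facet_counts.items():
        share = n / theme_total
        if share >= floor:
            rendered[name] = round(share * 100)
        else:
            long_tail += n  # sub-floor chatter is counted but not rendered
    return rendered, long_tail


pricing_facets, pricing_tail = facet_shares(
    {"token limits": 52, "subscription tiers": 14, "API cost": 8}, theme_total=74
)
```

With the numbers on this page, all three facets clear the floor and the long tail is empty.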

The scorecard and the tag index

Themes follow the conversation; the scorecard answers the questions people always bring: how is it at coding, art, depth, speed, price? A fixed rubric, scored per model from classified posts, always rendered, comparable across the whole field. And for everything the rubric can't predict, emergent capability tags ("good at OpenClaw") that accumulate from the discourse and become searchable.

Crowd scorecard · Fable 5 · 7d
score = positive share among opinionated posts · every row receipted
coding
82
n=214 · ▲
reasoning depth
77
n=156 · —
agents & tools
74
n=118 · ▲
writing
71
n=64 · —
speed
68
n=89 · ▲
accuracy
63
n=41 · —
art & design
41
n=23 · ▼
price & value
31
n=92 · ▼

Share-based (positive / opinionated), so a model with 50 mentions and one with 5,000 compare honestly. Renders at n ≥ 10; thin rows show "not enough signal" instead of a fake number. Same rubric on every model page → instant cross-model comparison.
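The scorecard rule can be sketched in a few lines: positive share among opinionated posts, with a minimum sample before anything renders. The inputs below are hypothetical splits, not captured data, and returning `None` for thin rows is an assumed convention.

```python
from typing import Optional


def crowd_score(positive: int, negative: int, min_n: int = 10) -> Optional[int]:
    """Positive share among opinionated posts, 0-100; None when signal is thin."""
    n = positive + negative
    if n < min_n:
        return None  # the page shows "not enough signal" instead of a number
    return round(100 * positive / n)


small_sample = crowd_score(41, 9)        # 50 opinionated posts
large_sample = crowd_score(4100, 900)    # 5,000 opinionated posts
thin_row = crowd_score(5, 2)
```

The two samples score the same, which is the point of a share-based rubric: volume belongs to mindshare, not to the score.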

Emergent tags · what the crowd says it's good (and bad) at
free-text, normalized, ≥ 8 posts to render
✓ openclaw 41 ✓ svg art 18 ✓ long-context recall 16 ✓ one-shot games 14 ✗ excel formulas 12 ✓ d3 charts 9 ✗ ascii art 8

The classifier extracts "good at / bad at" objects as free text; normalization collapses variants ("OpenClaw", "open claw", "openclaw agent" → openclaw). Tags are the rubric's long tail: nobody plans a rubric row for OpenClaw, but 41 people just told us. Every tag links to its receipts, and every tag feeds the global searchable index →
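One way the normalization step could work, as a sketch: lowercase, collapse whitespace, then map known variants to a canonical tag. The alias table is hypothetical and hand-seeded here; the real pipeline may normalize differently. The 8-post render floor comes from the mockup.

```python
import re
from collections import Counter

# Hypothetical variant map; in practice this would grow with the discourse.
TAG_ALIASES = {
    "open claw": "openclaw",
    "openclaw agent": "openclaw",
}


def normalize_tag(raw: str) -> str:
    """Collapse case, spacing, and known variants into one canonical tag."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return TAG_ALIASES.get(key, key)


def renderable_tags(raw_tags: list[str], min_posts: int = 8) -> dict[str, int]:
    """Count normalized tags and keep those at or above the render floor."""
    counts = Counter(normalize_tag(tag) for tag in raw_tags)
    return {tag: n for tag, n in counts.items() if n >= min_posts}


sample = ["OpenClaw"] * 5 + ["open claw"] * 3 + ["ascii art"] * 2
visible = renderable_tags(sample)
```

Here the two spellings only clear the floor once they are merged, which is why normalization runs before thresholding.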

vibebench.ai/good-at/openclaw
LIVE · updated 21:47 UTC
What the crowd says, ranked by evidence · 87 posts across 4 models · 7d window
# | Model | Crowd verdict | ✓ good | ✗ bad | Score | Receipts
1 | Fable 5 | "one-shots OpenClaw configs that used to take an afternoon" | 41 | 3 | 93 | posts ↗
2 | GPT-5.2 | "solid but needs more steering on skills" | 29 | 8 | 78 | posts ↗
3 | Kimi K3 | "surprisingly capable for the price" | 12 | 4 | 71 | posts ↗
4 | Grok 4.1 | "keeps hallucinating tool names" | 5 | 9 | 36 | posts ↗

Verdict snippets are validated one-liners (same citation rules as everywhere else). Search is a typeahead over one static tags.json: no backend, instant, works offline.

Every "best model for X" Google query eventually lands somewhere. /good-at/<tag> pages are built to be that somewhere: crowd-evidence-ranked, receipted, refreshed every few hours. This may be the property's biggest organic-traffic surface.

Conversations, not announcements

Rank by raw engagement and the "top posts" of every launch are the labs' own announcements, forever. The real review is the 47-reply thread where builders argue about what broke. So the data model treats owned vs organic as a first-class axis from day one, and the default content surface is deep organic threads, not loud posts.

Who's who, in the model

Every author resolves

Authors carry an identity (12 classes, builder to anon), a lab affiliation, builder-panel membership, and a follower bucket. Relationship to each entity (owned / affiliated / rival / community) derives from one helper. Full model in the next section.

Depth beats volume

Thread score

Posts group by conversation. A thread scores on replies, unique participants, and reply-chain depth, with an organic-root requirement for the default view. A 600-RT announcement scores below a 47-reply, 23-person argument, by design.

Owned is labeled, not hidden

Both stories told

Owned content gets its own labeled rail (announcements matter; they're just not "the conversation"). The owned/organic mention split is itself a published stat: a launch that's 80% owned-amplification reads very differently from one that's 80% organic.

In the entity mockup above, the tab is "Deep threads" and every card carries its thread footprint (replies · participants · organic/owned). The official Build Day post still exists: in the owned rail, the launch read, and the official voices tab, where it belongs.
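A sketch of a thread score with the properties described above: it rewards replies, unique participants, and reply-chain depth, ignores retweets entirely, and the default view keeps only organic roots. The weights, log damping, and the announcement's reply figures are assumptions, not the published formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Thread:
    replies: int
    participants: int
    depth: int
    root_is_organic: bool
    retweets: int = 0  # carried for display; deliberately not scored


def thread_score(t: Thread) -> float:
    """Hypothetical blend: damped replies and participants, plus depth."""
    return math.log1p(t.replies) + 2 * math.log1p(t.participants) + t.depth


def default_view(threads: list[Thread]) -> list[Thread]:
    """Organic-rooted threads only, deepest conversations first."""
    organic = [t for t in threads if t.root_is_organic]
    return sorted(organic, key=thread_score, reverse=True)


argument = Thread(replies=47, participants=23, depth=4, root_is_organic=True)
announcement = Thread(replies=10, participants=10, depth=1,
                      root_is_organic=False, retweets=600)
ranked = default_view([announcement, argument])
```

Owned threads still get a score, so the labeled owned rail can rank them; they just never enter the default list.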

The voice model: identity × relationship

"Official vs community" is two different questions, so the data model keeps two orthogonal axes. Identity is who an account is (builder, creator, media…), classified once per author. Relationship is where they stand relative to a specific entity (owned / affiliated / rival / community), derived per author-entity pair from one affiliation field. @sama is one author record: owned on GPT-5.2 pages, rival on Fable 5 pages. Conflating these is how every social tool gets this wrong.

Identity (classified once) | Who that is · June 9 capture examples | Relationship to Fable 5 (derived)
official | Brand accounts from the registry: @claudeai, @ClaudeDevs | owned
leadership | Execs/founders of a tracked lab: @sama (OpenAI CEO) | rival
employee | Lab staff by bio: researchers, DevRel, engineers | owned or rival, by lab
partner | Commercial/ecosystem ties: integrators, launch partners, sponsored creators | affiliated
builder | People who demonstrably ship: @giffmana, @karpathy | community
researcher | Academics, eval authors, independent benchmarkers | community
creator | Educators, newsletter/video explainers: @TrungTPhan, @AshleyDCan | community
power_user | Heavy daily users with opinions and no product: @BillyM2k | community
media | Journalists and outlets: @verge, @WatcherGuru | community
investor | VCs, analysts, market commentators | community
influencer | Reach-first amplifiers: @milesdeutscher, @MarioNawfal | community
anon | The long tail (88% of Fable 5's authors posted exactly once) | community
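The derived relationship column can be sketched as one helper over one affiliation field. The `Affiliation` shape, the `tie` values, and the treatment of another lab's partner as community are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Affiliation:
    lab: str   # slug of the tracked lab the author is tied to
    tie: str   # "staff" (official, leadership, employee) or "partner"


def relationship(affiliation: Optional[Affiliation], entity_lab: str) -> str:
    """Derive owned / affiliated / rival / community for one author-entity pair."""
    if affiliation is None:
        return "community"
    if affiliation.tie == "partner":
        # Assumption: a partner of some other lab is treated as community.
        return "affiliated" if affiliation.lab == entity_lab else "community"
    return "owned" if affiliation.lab == entity_lab else "rival"


sama = Affiliation(lab="openai", tie="staff")
on_gpt_pages = relationship(sama, entity_lab="openai")
on_fable_pages = relationship(sama, entity_lab="anthropic")
```

The author record never changes; only the entity being viewed does, which is how one account is owned on one page and rival on another.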
Voice score

Standing, not followers

Per author per entity: earned engagement on classified posts + threads rooted that went deep + active days in window, percentile-scaled within the community. @giffmana outranks accounts with 10× his followers because the conversation engaged him. Follower count is a display detail, never a rank key.
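A rough sketch of the two steps described here: blend the three standing signals into a raw value, then percentile-scale within the community. The weights and log damping are assumptions; the notable design fact is that follower count is not an input at all.

```python
from bisect import bisect_right
from math import log1p


def raw_standing(earned_engagement: int, deep_threads_rooted: int, active_days: int) -> float:
    """Hypothetical blend of the three signals; followers are never passed in."""
    return log1p(earned_engagement) + 2.0 * deep_threads_rooted + 1.0 * active_days


def percentile_scale(raw: dict[str, float]) -> dict[str, int]:
    """Map each author's raw standing to its percentile (0-100) within the set."""
    ordered = sorted(raw.values())
    n = len(ordered)
    return {author: round(100 * bisect_right(ordered, value) / n)
            for author, value in raw.items()}


# Hypothetical raw standings for four community authors.
scores = percentile_scale({"a": 1.0, "b": 5.0, "c": 3.0, "d": 9.0})
```

Percentile scaling keeps the score comparable across entities with very different conversation sizes.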

Community-first surfaces

Three tabs, community default

Voices panels default to the Community tab. Official & affiliated get their own tab; rival takes get theirs (a rival researcher dunking on a launch is real signal, labeled honestly). "Rising" badges mark community voices whose score jumped this window: new experts surface instead of the same six accounts forever.

Splits everywhere

One helper, four buckets

Every metric publishes its relationship split: owned / affiliated / rival / community mentions per window. A launch that's 80% owned amplification versus 80% community chatter is the difference between a push and a moment, and the site says which one happened.

Compliance note: author records (handle, bio evidence, affiliation) stay private. Public artifacts carry author IDs, identity class, relationship, voice score, and follower bucket; the site renders actual names via official X embeds and follow buttons, same as posts.

Surface 3 · The launch read

The editorial layer. The guaranteed launch artifact is the Launch tab and its score (pure data, zero human dependency); a read ships on top when the story warrants one. An LLM drafts the narrative from the same JSON the dashboard uses, a human edits and signs off, and during launch week the data blocks keep refreshing live inside the prose. The fable-5 page on the landing site is the hand-built prototype of exactly this.

vibebench.ai/reads/fable-5-first-24-hours
LIVE · next refresh 22:17 UTC
Launch read · updating live
Fable 5: the first 24 hours

Anthropic shipped Fable 5 at 18:02 UTC on June 9. Within four hours it had jumped from #7 to #2 in mindshare. The crowd's verdict so far: genuinely novel, mildly magical, and everyone is hitting their token limit.

1,107 mentions · 24h
#2 mindshare rank
+9 net sentiment
783/hr peak velocity

The launch arc was textbook: a quiet Monday baseline of roughly 40 mentions a day, a vertical spike when @claudeai posted, and a second wave at 21:00 UTC when the demos started landing. What's unusual is the shape of the sentiment. Most frontier launches open with a polarized split; Fable 5 opened with an 87% neutral wall of news-sharing and a small but unusually durable positive core.

What's landing

Novelty dominates (198 classified posts). The "Mythos" architecture is the hook; the word "different" appears in a fifth of all positive posts. UX (37) and the demo wave (25) follow: the level-devil game clone from @LexnLin did real numbers for a single-shot demo.

G
Lucas Beyer
@giffmana · Jun 9
𝕏

Actually it's fine guys! I figured out a way, see below. Claude Fable 5 is a great model.

♥ 269 · 👁 20.2K · via X embed
What's grating

Pricing is the complaint (74 posts), and inside it, one specific grievance: token limits. The single most-engaged negative post of the day is a one-liner about hitting the cap. Trust-and-safety chatter (35) is mostly spillover from the broader alignment debate rather than anything specific to this launch; bugs (17) are scattered and minor so far.

Who's driving it

Official accounts earned the engagement crown honestly: @ClaudeDevs' Build Day post is the top post of the launch, full stop. The amplification layer was crypto-news media (@WatcherGuru's "JUST IN" post drew 608 retweets on its own), and the credibility layer came from builders: @karpathy and @giffmana posting hands-on within hours moved the sentiment needle more than any official content.

Methodology: mentions counted via the X counts API across 4 alias queries; sentiment and themes classified per-post by LLM; full pipeline on the methodology page. Source data: models/fable-5/summary.json

Reads are versioned documents, not dashboards: each refresh appends to the timeline, and once launch mode ends the read freezes into the permanent record of how the launch went. That archive becomes the moat.

The signature numbers

What separates "a dashboard" from "the thing people cite": coined metrics with named methodologies, and charts that exist nowhere else. Five of them, all computed from data the pipeline already produces. The first two are the two clocks' numbers: the daily standing and the opening weekend.

① The Vibe Score · the standing number
always live · Now clock
62 · Fable 5 · today
Crowd favorability (net +9, scaled): 55
Builder favorability (net +40, double-weighted): 78
Theme severity drag (pricing complaints): −6
Smoothing: 7d recency-weighted · floor n ≥ 30

Pure favorability: how the model is regarded right now, on purpose excluding volume (mindshare already measures popularity, the way box office sits next to the Tomatometer). The leaderboard's Score column, the entity hero number, and the Gap's y-axis.

② The Launch Score · one number per launch, forever
provisional until T+72h
87 · Fable 5 · day 1
Velocity (percentile vs all archived launches): 92
Builder uptake: 81
Net sentiment: 78
Durability (day-3 vs day-1 volume): locks at T+72h

The Vibe Score's sibling on the Launch clock: opening weekend, frozen into the archive at T+72h. Weighted blend, formula versioned and published like an index methodology. "Fable 5 debuted at 87" is the sentence the press writes; nobody has coined this for model launches.

③ The Gap · benchmarks vs vibes
capability axis: published indices, attributed
[Scatter chart: capability score (artificialanalysis.ai, attributed) against vibe score. Region labels: "underrated by benchmarks" and "benchmark darlings, crowd shrugs". Points: DeepSeek V4 (78), Qwen3.5, Kimi K3, Gemini 3 Ultra, Fable 5 (62), GPT-5.2, Grok 4.1, Llama 5.]

The thesis as one chart. The off-diagonal is the story: models people love that benchmarks underrate, and the reverse. Standing homepage chart, regenerated every refresh, built to be screenshotted.

④ Builder sentiment · the crowd vs the people who ship
curated panel, ~300 accounts
[Chart: net sentiment over time, y-axis −50 to +50. Series: builders +40, crowd +9.]

Aggregate sentiment is 87% neutral noise on news days. A curated builder panel (the karpathy/giffmana tier, classification already in the pipeline) gets its own series. Zero marginal X cost: panel members are already in the pulls.

⑤ The change feed · what moved since you last looked
diffed every refresh, receipts attached
21:04 · @ClaudeDevs Build Day post becomes the launch's top post · post ↗
21:00 · velocity peaks at 783/hr, 24× pre-launch baseline · chart ↗
20:31 · pricing complaints double in 6h; 70% cite token limits · theme ↗
19:40 · coordinated amplification flagged: 600+ near-identical RTs · event ↗
19:12 · first builder hands-on (@giffmana) · post ↗

Dashboards show state; return visits come from change. Events are pure diffs of consecutive rollups (thresholded, deduped), shown as a ticker on the homepage and a timeline on entity pages. Brigading gets flagged in public rather than silently absorbed: the credibility defense doubles as content.
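The diff-threshold-dedupe loop can be sketched as follows. The rollup shape, event tuples, thresholds, and the earlier pricing count of 35 are assumptions for illustration; only the idea of diffing consecutive rollups comes from the text.

```python
def diff_rollups(prev: dict, curr: dict,
                 ratio_threshold: float = 2.0, min_count: int = 15) -> list[tuple]:
    """Emit events for changes between two consecutive rollups."""
    events = []
    for theme, n in curr["themes"].items():
        before = prev["themes"].get(theme, 0)
        # Thresholded: only sizeable themes that at least doubled.
        if before > 0 and n >= min_count and n / before >= ratio_threshold:
            events.append(("theme_surge", theme, round(n / before, 1)))
    if curr["top_post_id"] != prev["top_post_id"]:
        events.append(("new_top_post", curr["top_post_id"], None))
    return events


def dedupe(events: list[tuple], seen: set) -> list[tuple]:
    """Drop events whose (type, subject) pair was already published."""
    fresh = [e for e in events if e[:2] not in seen]
    seen.update(e[:2] for e in fresh)
    return fresh


prev = {"themes": {"pricing": 35, "bugs": 15}, "top_post_id": "p1"}
curr = {"themes": {"pricing": 74, "bugs": 17}, "top_post_id": "p2"}
seen: set = set()
first_pass = dedupe(diff_rollups(prev, curr), seen)
second_pass = dedupe(diff_rollups(prev, curr), seen)
```

Because events are pure functions of two JSON files, the feed needs no state beyond the set of already-published keys.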

Methodology page · the credibility anchor

A leaderboard nobody trusts is worthless. The methodology page is a first-class surface, written for the skeptical reader, and every chart on the site deep-links to the section that explains it.

What it states plainly
  • What counts as a mention: the exact alias queries per entity, public for inspection.
  • How sentiment is classified: LLM per-post classification, the prompt version, and known biases (neutral-heavy on news days).
  • What mindshare is: share of mentions within the tracked set. Not "all of X," and we say so.
  • How we handle retweets and spam: RTs count toward volume, never toward themes or top posts; spam heuristics published.
  • What we refuse to infer: no engagement-rate guesses, no follower-demographic claims, no astroturf accusations without data.
  • What's open and what's closed: open registry, formulas, and data downloads; closed collection and anti-gaming detection, with the reason stated (publish the filter, invite the gaming).
Why it's a feature, not a footnote

artificialanalysis earned citations because their methodology survives scrutiny. Ours must survive a harder test: sentiment is easier to dismiss than latency. The defense is radical transparency, including publishing the per-entity query strings and a daily data-quality note ("Jun 9: RT flood from crypto-news accounts inflated volume; themes unaffected").

It also disarms the obvious attack: when a lab's fans claim the numbers are rigged, the answer is a public query string and a JSON file they can download.

Audiences

Builders & eng leaders

"What should I actually use?"

Benchmarks are table stakes; they want the field report. Like/dislike themes and builder-class voices answer the question benchmarks can't: what's it like to live with this model.

DevRel, comms & founders

"How is our launch going?"

Launch mode is a war room they don't have to build. The hour-by-hour arc, the complaint taxonomy, and who's amplifying: this audience converts to Sift pipeline.

Media & analysts

"Give me a citable number"

"Mindshare jumped from 3% to 22% in 24 hours (per vibebench)" is a sentence that writes itself into coverage. Citations are the growth loop.

Honest v1 boundaries, and what's next

v1 limits, stated out loud
  • X only. News/RSS is a v2 add; Reddit is excluded deliberately (licensing).
  • English-first. Queries are lang:en; the Yahoo-Japan-sized non-English conversation is invisible in v1.
  • Neutral-heavy sentiment on news days. RT floods read neutral; we show net score and say why.
  • Tracked-set mindshare. Share is relative to entities we track, not all of AI discourse.
  • Embeds need JS and occasionally flake; cards degrade to styled links.
Where it goes
  • Official vs 3rd-party split on every chart (the classification already exists per-author).
  • Cross-model theme pages: the hot-theme chips open "who gets roasted on pricing" across the field.
  • Compare view: two entities or two launches, day-0 aligned on the same axes.
  • News & legal web sources alongside X for a second opinion on every narrative.
  • An X account + OG chart cards: auto-drafted visuals, human-approved. The data markets itself.
  • Embeddable badges: live mindshare and Launch Score widgets for blogs and press.
  • Downloadable JSON with a documented schema, so researchers build on it and the site graduates to infrastructure.