AI Voice Benchmarks

	artificialanalysis.ai	This property
Unit of truth	Benchmark scores, latency, price per token	Mentions, sentiment, themes, voices on X
Update cycle	When they re-run evals	Every 2-4 hours; 30-60 min during launches
Can it be gamed?	Increasingly yes (training on test sets, eval-tuning)	Astroturfing is possible but visible: author history, account age, and voice classification expose it
Answers	"Which model is most capable per dollar?"	"Which model is winning hearts? What do people actually complain about? Who should I follow?"
Editorial	Methodology pages	Auto-drafted, human-edited launch reads (the fable-5 page is the prototype)

Kind	Examples	What mindshare means here	Relations
model	Fable 5, GPT-5.2, Gemini 3 Ultra, Grok 4.1	Launch buzz, capability debate, like/dislike themes	belongs to a lab; succeeded-by chains for generations
open-weight	DeepSeek V4, Qwen3.5, Kimi K3, Llama 5	Same as model + deployment/cost/fine-tune chatter; distinct audience	belongs to a lab; tagged open-weight for the filter tab
lab	Anthropic, OpenAI, Google DeepMind, xAI, Meta AI	Trust, drama, policy, hiring, brand-level sentiment	parent of models and products; owns official handles
product	ChatGPT, Claude Code, Cursor, Copilot, Gemini app	Consumer UX sentiment, pricing/limits complaints, switching talk	belongs to a lab (or company); often the loudest vector

Question	Surface	Lifecycle
"What do people like right now?"	Leaderboard + entity Now tab: Vibe Score, themes, voices, change feed	Rolling, refreshed every 2-4h (every 30-60 min in launch mode)
"How is the launch going?"	Entity Launch tab: day-0 curves, provisional Launch Score, past-launch overlay	Live for 72h, decays with launch mode
"How do launches compare?"	/launches archive: every fingerprint, every final score	Permanent, append-only, uncopyable
"What's the story?"	Reads: drafted from the same JSON, shipped when the story warrants one	Occasional, frozen on publish

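The two clocks in the table above can be sketched as a small scheduling helper. This is a minimal illustration, not the actual scheduler: the names `in_launch_mode` and `refresh_interval`, the use of the fast end of each cadence range, and the hard 72-hour cutoff are all assumptions (the source says launch mode "decays" without saying how), and the year in the example timestamp is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Cadences come from the tables above; using the fast end of each range is an assumption.
ROLLING_INTERVAL = timedelta(hours=2)    # "refreshed every 2-4h"
LAUNCH_INTERVAL = timedelta(minutes=30)  # "30-60 min during launches"
LAUNCH_WINDOW = timedelta(hours=72)      # Launch tab is "live for 72h"


def in_launch_mode(now: datetime, launch_t0: Optional[datetime]) -> bool:
    """Return True while `now` is inside the 72h window that starts at T0."""
    if launch_t0 is None:
        return False
    return timedelta(0) <= now - launch_t0 < LAUNCH_WINDOW


def refresh_interval(now: datetime, launch_t0: Optional[datetime]) -> timedelta:
    """Pick the refresh cadence for an entity (hypothetical helper)."""
    return LAUNCH_INTERVAL if in_launch_mode(now, launch_t0) else ROLLING_INTERVAL


# The source gives only "June 9, 18:02 UTC"; the year here is a placeholder.
t0 = datetime(2025, 6, 9, 18, 2, tzinfo=timezone.utc)
print(refresh_interval(t0 + timedelta(hours=4), t0))   # 0:30:00
print(refresh_interval(t0 + timedelta(hours=73), t0))  # 2:00:00
```

A real implementation would probably taper the interval as the launch window closes rather than switch abruptly; the step function is just the simplest reading of the table.
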
#	Model	Mindshare	Δ 24h	Score	Top theme
1	GPT-5.2 · OpenAI	24.1%	▼ 4.2	58	agents
2	Fable 5 · Anthropic · launch	21.8%	▲ 18.6	62	novelty
3	Gemini 3 Ultra · Google DeepMind	16.4%	▼ 2.8	64	multimodal
4	Grok 4.1 · xAI	11.2%	▼ 3.1	41	trust & safety
5	DeepSeek V4 · DeepSeek · open	9.7%	▲ 1.4	78	cost
6	Llama 5 · Meta AI · open	5.8%	—	50	fine-tuning
7	Qwen3.5-Max · Alibaba · open	4.6%	▲ 0.6	66	coding
8	Kimi K3 · Moonshot · open	3.4%	▼ 0.9	60	agents

Account	Who	Voice score	Posts
@giffmana	builder	94	3
@karpathy	builder	91	1
@TrungTPhan	creator	73	2
@BillyM2k	power user	68	2
@milesdeutscher	influencer	61	2
@AshleyDCan · rising	creator	55	1

Model	Rate /1k	vs median
Fable 5	67	2.3×
Grok 4.1	41	1.4×
GPT-5.2	31	1.1×
Gemini 3 Ultra	24	0.8×
frontier median	29
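
The arithmetic behind this table can be checked in a few lines. This is a sketch under one assumption: that the rate is theme posts per 1,000 classified posts, with Fable 5's 74 pricing posts and 1,107 classified posts (both figures from the launch read) as numerator and denominator. The table does not state its inputs, but that reading reproduces the 67 shown. The function names are hypothetical.

```python
FRONTIER_MEDIAN = 29  # "frontier median" row of the table


def rate_per_1k(theme_posts: int, classified_posts: int) -> float:
    """Theme posts per 1,000 classified posts."""
    return 1000 * theme_posts / classified_posts


def vs_median(rate: float, median: float = FRONTIER_MEDIAN) -> float:
    """Ratio of a model's complaint rate to the frontier median."""
    return rate / median


# Assumed inputs: 74 pricing posts out of 1,107 classified posts.
print(round(rate_per_1k(74, 1107)))  # 67

for model, rate in [("Fable 5", 67), ("Grok 4.1", 41),
                    ("GPT-5.2", 31), ("Gemini 3 Ultra", 24)]:
    print(f"{model}: {vs_median(rate):.1f}x")
# Fable 5: 2.3x
# Grok 4.1: 1.4x
# GPT-5.2: 1.1x
# Gemini 3 Ultra: 0.8x
```

Normalizing by classified posts is what makes "Is 74 a lot?" answerable: a raw count rises with launch volume, while the per-1k rate stays comparable across models.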

#	Model	Crowd verdict	✓ good	✗ bad	Score	Receipts
1	Fable 5	"one-shots OpenClaw configs that used to take an afternoon"	41	3	93	posts ↗
2	GPT-5.2	"solid but needs more steering on skills"	29	8	78	posts ↗
3	Kimi K3	"surprisingly capable for the price"	12	4	71	posts ↗
4	Grok 4.1	"keeps hallucinating tool names"	5	9	36	posts ↗

Identity (classified once)	Who that is · June 9 capture examples	Relationship to Fable 5 (derived)
official	Brand accounts from the registry: @claudeai, @ClaudeDevs	owned
leadership	Execs/founders of a tracked lab: @sama (OpenAI CEO)	rival
employee	Lab staff by bio: researchers, DevRel, engineers	owned or rival, by lab
partner	Commercial/ecosystem ties: integrators, launch partners, sponsored creators	affiliated
builder	People who demonstrably ship: @giffmana, @karpathy	community
researcher	Academics, eval authors, independent benchmarkers	community
creator	Educators, newsletter/video explainers: @TrungTPhan, @AshleyDCan	community
power_user	Heavy daily users with opinions and no product: @BillyM2k	community
media	Journalists and outlets: @verge, @WatcherGuru	community
investor	VCs, analysts, market commentators	community
influencer	Reach-first amplifiers: @milesdeutscher, @MarioNawfal	community
anon	The long tail (88% of Fable 5's authors posted exactly once)	community
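
The "derived" relationship column can be read as a single function of the author's identity, the author's lab, and the lab that owns the entity being viewed. The sketch below is one plausible reading of the table, not the actual helper: the name `relationship`, its signature, and the rule that official and leadership accounts resolve by lab the same way employees do are assumptions.

```python
from typing import Optional

# Identities whose relationship depends on which lab the author belongs to.
# Treating official and leadership like employee ("by lab") is an assumption.
LAB_BOUND = {"official", "leadership", "employee"}


def relationship(identity: str, author_lab: Optional[str], entity_lab: str) -> str:
    """Map an author's identity to one of four buckets, relative to an entity.

    Returns "owned", "rival", "affiliated", or "community".
    """
    if identity in LAB_BOUND:
        if author_lab is None:
            raise ValueError(f"{identity!r} authors must carry a lab")
        return "owned" if author_lab == entity_lab else "rival"
    if identity == "partner":
        return "affiliated"
    # builder, researcher, creator, power_user, media, investor, influencer, anon
    return "community"


# Examples from the table, viewed from Fable 5 (an Anthropic model):
print(relationship("official", "Anthropic", "Anthropic"))  # owned   (@claudeai)
print(relationship("leadership", "OpenAI", "Anthropic"))   # rival   (@sama)
print(relationship("builder", None, "Anthropic"))          # community (@karpathy)
```

Because identity is classified once and the relationship is computed per entity, the same account can be "owned" on one model's page and "rival" on another's without being reclassified.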

Launch read · updating live

Fable 5: the first 24 hours

Anthropic shipped Fable 5 at 18:02 UTC on June 9. Within four hours it had jumped from #7 to #2 in mindshare. The crowd's verdict so far: genuinely novel, mildly magical, and everyone is hitting their token limit.

Sift Intelligence · drafted from 1,107 posts by 976 authors · human-edited · updated Jun 9, 21:47 UTC

1,107 mentions · 24h

#2 mindshare rank

+9 net sentiment

783/hr peak velocity

The launch arc was textbook: a quiet Monday baseline of roughly 40 mentions a day, a vertical spike when @claudeai posted, and a second wave at 21:00 UTC when the demos started landing. What's unusual is the shape of the sentiment. Most frontier launches open with a polarized split; Fable 5 opened with an 87% neutral wall of news-sharing and a small but unusually durable positive core.

What's landing

Novelty dominates (198 classified posts). The "Mythos" architecture is the hook; the word "different" appears in a fifth of all positive posts. UX (37) and the demo wave (25) follow: the level-devil game clone from @LexnLin did real numbers for a single-shot demo.

Lucas Beyer (@giffmana) · Jun 9

"Actually it's fine guys! I figured out a way, see below. Claude Fable 5 is a great model."

♥ 269 · 👁 20.2K · via X embed

What's grating

Pricing is the complaint (74 posts), and inside it, one specific grievance: token limits. The single most-engaged negative post of the day is a one-liner about hitting the cap. Trust-and-safety chatter (35) is mostly spillover from the broader alignment debate rather than anything specific to this launch; bugs (17) are scattered and minor so far.

Who's driving it

Official accounts earned the engagement crown honestly: @ClaudeDevs' Build Day post is the top post of the launch, full stop. The amplification layer was crypto-news media (@WatcherGuru's "JUST IN" post drew 608 retweets on its own), and the credibility layer came from builders: @karpathy and @giffmana posting hands-on within hours moved the sentiment needle more than any official content.

Methodology: mentions counted via the X counts API across 4 alias queries; sentiment and themes classified per-post by LLM; full pipeline on the methodology page. Source data: models/fable-5/summary.json
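
Once each post carries an LLM-assigned sentiment and theme, the headline numbers are simple aggregates. The sketch below is illustrative only. It assumes net sentiment is the percentage-point gap between positive and negative posts, which the source does not define, and the `ClassifiedPost` type, label names, and sample data are hypothetical. The sample is made up, chosen only to be consistent with an 87% neutral share and a +9 net figure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClassifiedPost:
    sentiment: str  # "positive", "neutral", or "negative" (assumed label set)
    theme: str      # e.g. "novelty", "pricing" (theme names from the read)


def net_sentiment(posts: List[ClassifiedPost]) -> float:
    """Percent positive minus percent negative (assumed definition)."""
    if not posts:
        return 0.0
    counts = Counter(p.sentiment for p in posts)
    return 100 * (counts["positive"] - counts["negative"]) / len(posts)


def theme_counts(posts: List[ClassifiedPost]) -> Counter:
    """Number of classified posts per theme."""
    return Counter(p.theme for p in posts)


# Made-up sample of 100 posts: 11 positive, 87 neutral, 2 negative.
sample = (
    [ClassifiedPost("positive", "novelty")] * 11
    + [ClassifiedPost("neutral", "news")] * 87
    + [ClassifiedPost("negative", "pricing")] * 2
)
print(net_sentiment(sample))               # 9.0
print(theme_counts(sample).most_common(1)) # [('news', 87)]
```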

Benchmarks measure what models can do. We measure what people think of them.

Why this, why now

The real eval window

Nobody owns this view

Sift's best demo

Positioning: the perception layer next to the capability layer

Name candidates

VibeBench · vibebench.ai

ModelPulse · modelpulse.ai

Zeitgeist · zeitgeist-ai.com

AI Mindshare Index · aimindshare.com

Entity taxonomy: four vectors, one registry

Two clocks, gracefully

Rolling windows, always on

T0-aligned, then frozen

How a visitor moves through it

1 · Leaderboard

2 · Entity page

3 · Launches

4 · Reads & methodology

Surface 1 · The leaderboard

Mindshare · share of tracked conversation · 24h

Fable 5: the first 24 hours

DeepSeek V4, week one: the cost story ate the capability story

Gemini 3 Ultra's quiet climb

Surface 2 · Entity page

Claude Fable 5

Mention volume · hourly (counts API)

What people say · classified from 1,107 posts · 24h

Voices · 24h

Accounts to follow on this topic

Themes are the content engine

What, exactly

Noise or real problem

Who's saying it

Is 74 a lot?

One validated sentence

What's new this window

Surface 2b · The theme page

✗ pricing

Trajectory · posts per hour in theme

Facets · what "pricing" actually contains

Voiced by · who carries this complaint

Vs the field · pricing complaints per 1k classified posts

The receipts · most engaged in theme

The scorecard and the tag index

Crowd scorecard · Fable 5 · 7d

Emergent tags · what the crowd says it's good (and bad) at

Conversations, not announcements

Every author resolves

Thread score

Both stories told

The voice model: identity × relationship

Standing, not followers

Three tabs, community default

One helper, four buckets

Surface 3 · The launch read

Fable 5: the first 24 hours

What's landing

What's grating

Who's driving it

The signature numbers

① The Vibe Score · the standing number

② The Launch Score · one number per launch, forever

③ The Gap · benchmarks vs vibes

④ Builder sentiment · the crowd vs the people who ship

⑤ The change feed · what moved since you last looked

Methodology page · the credibility anchor

Audiences

"What should I actually use?"

"How is our launch going?"

"Give me a citable number"

Honest v1 boundaries, and what's next