# AI Voice Benchmarks · Agent Build Pack

You are building **v0 → v1** of a public website that tracks AI-model mindshare and sentiment on X. artificialanalysis.ai measures what models can do; this property measures what people think of them. The product decisions are final and documented; your job is execution quality.

## Read first, in this order

1. `../product.html` — what the site is, the surfaces, full-scale mockups (match them)
2. `../technical.html` — architecture, X API budget, compliance rules, repo posture
3. `../rollout.html` — v0/v1 scope, launch playbook
4. This folder — the exact specs you implement: `01`-`07` (subsystems + acceptance), `08` (runtime wiring), `09` (ops runbook), `10` (gotchas — read BEFORE writing collector/embed/queue code, not after something breaks)

Live copies: https://ai-voice-benchmarks.pages.dev

## When documents disagree

Precedence: `02-data-contracts` > `05-rollups-scores` > `04-enrichment` > `03-collector` > `08-workers` > `06-site`. The `../product.html` mockups govern UI questions, and `10-gotchas` overrides anything it contradicts (it encodes reality). If a genuine conflict survives that ordering, stop and ask, citing both passages. Do not improvise schema or formula changes: those are version bumps with golden updates (`09` §3).
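As a rough illustration of the ordering above, the rule can be written as a small pure function. This is only a sketch: `resolveConflict` and the `Resolution` type are hypothetical names, not part of any spec, and the mockup rule for UI questions is deliberately left out.

```typescript
// Spec names in descending precedence, copied from the ordering above.
const PRECEDENCE: readonly string[] = [
  "02-data-contracts",
  "05-rollups-scores",
  "04-enrichment",
  "03-collector",
  "08-workers",
  "06-site",
]

// The spec that overrides anything it contradicts.
const OVERRIDE = "10-gotchas"

type Winner = {
  kind: "winner"
  spec: string
}

type Ask = {
  kind: "ask"
  specs: readonly [string, string]
}

type Resolution = Winner | Ask

// Hypothetical helper: given two specs that disagree, which one wins?
const resolveConflict = (a: string, b: string): Resolution => {
  // A contradiction inside a single spec cannot be settled by ordering.
  if (a === b) return { kind: "ask", specs: [a, b] }
  if (a === OVERRIDE || b === OVERRIDE) return { kind: "winner", spec: OVERRIDE }
  const rankA = PRECEDENCE.indexOf(a)
  const rankB = PRECEDENCE.indexOf(b)
  // Specs outside the ordering survive it: stop and ask, citing both passages.
  if (rankA === -1 || rankB === -1) return { kind: "ask", specs: [a, b] }
  return { kind: "winner", spec: rankA < rankB ? a : b }
}
```

The function never guesses: anything the ordering does not cover comes back as `ask`, which mirrors the "stop and ask" instruction.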

## Non-negotiable constraints

1. **New private repo**, standalone. Zero imports from the Sift monorepo. Name placeholder: `ai-voice-bench`.
2. **Public JSON artifacts never contain tweet text, handles, bios, or display names.** Post IDs, author IDs, and aggregates only. Tweets render client-side via official X embeds (oEmbed / widgets.js) with a styled link-out fallback.
3. **The Collector interface is the privacy boundary** (`03-collector.md`). Nothing downstream of raw storage may know which transport produced a row. The `source` tag stays in private artifacts.
4. **Anti-gaming logic stays in this repo** (spam scoring, author classification, amplification detection). It is never published, never described in public artifacts beyond the methodology page's stated policy.
5. **Budget guards are hard guards** (`03-collector.md` §5). The pipeline must be incapable of exceeding per-entity daily caps and the global monthly cap, even if cron misfires or queues replay.
6. **Storage is the S3 API** (R2 via aws4fetch or the S3 SDK). No R2-only bindings in the storage layer, so the bucket is swappable.
7. **Determinism**: prompts and score formulas are versioned files. Rollups are pure functions of enriched rows. Any published chart must be reproducible from raw + versions.
8. **Schema discipline**: every public artifact carries `schema_version`. Additive changes only within a version.
9. **Style**: TypeScript strict, no `any`, arrow functions, biome (100-char width, no semicolons), vitest. Structured JSON logs via a tiny `log()` helper; no bare `console` calls.
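To make constraint 5 concrete, here is a minimal sketch of a hard budget guard that stays under both caps even when the same work is delivered twice. Everything in it is an assumption for illustration: the names `createBudgetGuard`, `reserve`, `requestId`, and `units` are hypothetical, the ledger is in-memory, and a real guard would need a durable store with atomic updates. The authoritative rules live in `03-collector.md` §5.

```typescript
type BudgetCaps = {
  perEntityDaily: number
  globalMonthly: number
}

type Reservation = {
  requestId: string // stable ID for one unit of work, so redeliveries are recognizable
  entityId: string
  units: number // cost of the call, in whatever unit the caps are expressed in
  at: Date
}

type Granted = {
  ok: true
  replay: boolean // true when this requestId was already counted
}

type Denied = {
  ok: false
  reason: "entity-daily-cap" | "global-monthly-cap"
}

type Decision = Granted | Denied

// UTC day and month buckets, e.g. "2025-01-31" and "2025-01".
const dayKey = (d: Date): string => d.toISOString().slice(0, 10)
const monthKey = (d: Date): string => d.toISOString().slice(0, 7)

const createBudgetGuard = (caps: BudgetCaps) => {
  const seen = new Set<string>()
  const daily = new Map<string, number>()
  const monthly = new Map<string, number>()

  // Check both caps first, then record the spend. Nothing is recorded on denial.
  const reserve = (r: Reservation): Decision => {
    if (seen.has(r.requestId)) return { ok: true, replay: true }
    const entityDay = `${r.entityId}:${dayKey(r.at)}`
    const month = monthKey(r.at)
    const dayUsed = daily.get(entityDay) ?? 0
    const monthUsed = monthly.get(month) ?? 0
    if (dayUsed + r.units > caps.perEntityDaily) return { ok: false, reason: "entity-daily-cap" }
    if (monthUsed + r.units > caps.globalMonthly) return { ok: false, reason: "global-monthly-cap" }
    seen.add(r.requestId)
    daily.set(entityDay, dayUsed + r.units)
    monthly.set(month, monthUsed + r.units)
    return { ok: true, replay: false }
  }

  return { reserve }
}
```

The key design choice is that the guard is consulted before any spend and is idempotent per `requestId`, so a misfiring cron or a replayed queue message cannot count twice. What the caller does when `replay` is true is left to the collector spec.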

## Build order (each step ends green)

| Step | Deliverable | Gate |
|---|---|---|
| 1 | Repo scaffold (`01`) | `pnpm install && pnpm test && pnpm typecheck` pass on empty packages |
| 2 | Contracts package (`02`) | zod schemas validate `fixtures/fable-summary-v1.json` |
| 3 | Scores package (`05`) | golden tests pass; formulas pure + versioned |
| 4 | Collector + raw storage (`03`) | FixtureCollector replays fixtures; budget-guard tests pass; x-official impl behind env flag |
| 5 | Enrichment worker (`04`) | classification of fixture posts produces stable goldens with mocked LLM; summary validator rejects the bad fixture |
| 6 | Rollup + publish (`05`) | end-to-end fixture run emits `index.json` + `summary.json` + `events.json` matching snapshots; atomic publish verified |
| 7 | Site (`06`) | leaderboard + entity + methodology pages render from fixture JSON; mockup parity; embeds degrade gracefully offline |
| 8 | Launch machinery (`05` §4-5) | state-machine unit tests; launch fingerprint + provisional score from fixture timeline |
| 9 | Workers wiring + ops (`08`, `09` §1) | local `pnpm dev:pipeline` full tick → publish against fixtures; provisioning scripted; idempotency-under-redelivery tests pass |
| 10 | Acceptance (`07`) | full checklist; then live Fable 5 soak |

Work on a branch per step; commit as you go. After every step, run the full gate before moving on.

## v0 scope (ship this)

One entity (`fable-5`) end to end on real infra: hourly counts, content pulls every 4h, enrichment, rollups, public JSON, entity page + methodology + "how we're open" page on a placeholder domain, and the launch fingerprint + provisional Launch Score from the June 9 capture. The leaderboard renders with one row.

## v1 scope (next)

~20 entities (registry seed provided), cross-entity mindshare + leaderboard, launch-mode state machine live, change feed + ticker, Vibe Score on the board, reads feed, OG cards, public registry repo split.

## What you do NOT build

- No accounts, no auth, no backend API, no database in the read path
- No Reddit, no news ingestion (v2)
- No builder panel weighting in Vibe Score (v1 is crowd-only; the formula file is already structured for v2)
- No filtered stream
- No custom embed renderer: official embeds or fallback card, nothing else

## Done means

`07-acceptance.md` checklist fully green, including: 7-day unattended soak on the real entity, numbers reconciled against `fixtures/fable-summary-v1.json` tolerances, Lighthouse ≥ 90 on all pages, and zero raw-text leaks in public artifacts (automated check).
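One possible shape for the automated leak check is a recursive scan of each public artifact for forbidden keys. This is a sketch under assumptions: the key names in `FORBIDDEN_KEYS`, the `findLeaks` helper, and the sample artifact fields are all hypothetical, and the real list should come from the contracts in `02`. A key scan alone would not catch tweet text stored under an innocuous key, so a fuller check might also compare string values against raw storage.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Hypothetical key names that would indicate tweet text, handles, bios, or display names.
const FORBIDDEN_KEYS: readonly string[] = ["text", "handle", "username", "bio", "display_name"]

// Walks an artifact and collects the JSON path of every forbidden key it finds.
const findLeaks = (value: Json, path = "$", leaks: string[] = []): string[] => {
  if (value === null || typeof value !== "object") return leaks
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      findLeaks(item, `${path}[${i}]`, leaks)
    })
    return leaks
  }
  for (const [key, child] of Object.entries(value)) {
    const childPath = `${path}.${key}`
    if (FORBIDDEN_KEYS.includes(key.toLowerCase())) leaks.push(childPath)
    findLeaks(child, childPath, leaks)
  }
  return leaks
}

// Sample artifact with one deliberate leak (field names are illustrative only).
const sample: Json = {
  schema_version: 1,
  entity: "fable-5",
  posts: [
    { post_id: "1", author_id: "2" },
    { post_id: "3", author_id: "4", text: "leaked" },
  ],
}

const sampleLeaks = findLeaks(sample) // ["$.posts[1].text"]
```

Run in CI over every published file, a non-empty result would fail the build and point at the exact offending path.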
