# 01 · Repo scaffold

Private repo `ai-voice-bench`. pnpm workspace, TypeScript strict everywhere, biome, vitest.

```
ai-voice-bench/
├── package.json              # workspace root: test/typecheck/lint/format scripts fan out
├── pnpm-workspace.yaml
├── biome.json                # 100-char, no semicolons, ES5 trailing commas, organized imports
├── tsconfig.base.json        # strict, noUncheckedIndexedAccess, exactOptionalPropertyTypes
├── .env.example              # X_BEARER_TOKEN, LLM_API_KEY, S3_* (endpoint/key/secret/bucket), SLACK_WEBHOOK
├── packages/
│   ├── contracts/            # zod schemas + inferred types for every artifact (02)
│   ├── scores/               # vibe-score.v1.ts, launch-score.v1.ts — pure, zero deps (05)
│   ├── collector/            # Collector interface + impls + budget guards (03)
│   ├── enrich/               # LLM classification, prompts/, validator (04)
│   ├── rollup/               # windows, mindshare, events diff, launch fsm (05)
│   └── storage/              # S3-API client wrapper, atomic publish, raw/derived/public layout
├── workers/
│   ├── scheduler/            # cron entry: counts hourly, state machine eval, enqueue content pulls
│   ├── ingest/               # queue consumer: content pulls → raw → enqueue enrich
│   ├── enricher/             # queue consumer: classify → derived
│   └── publisher/            # rollup → public JSON atomic swap
├── site/                     # Astro static site (06)
├── registry/                 # entity JSON files (seed provided) — future public repo, keep import-clean
├── fixtures/                 # fable-summary-v1.json + goldens
└── ops/
    ├── wrangler/             # one wrangler.toml per worker (Workers + Queues + R2 + D1 bindings)
    └── schema.sql            # D1 schema below
```

## Cloudflare resources

- **R2 bucket** `avb-data`: prefixes `raw/`, `derived/`, `public/`. Only `public/` is exposed (custom domain or Pages function proxy with cache headers).
- **Queues**: `avb-content-pulls`, `avb-enrich`. Max retries 3, DLQ `avb-dlq`.
- **D1** `avb-state`:

Storage tiers (rationale in `../technical.html` § Storage): R2 = append-only lake and source of truth; D1 = the hot relational working set, **disposable and rebuildable from raw**; public JSON = the product. Readers never touch D1.

```sql
CREATE TABLE posts_seen (post_id TEXT PRIMARY KEY, entity TEXT NOT NULL, first_seen TEXT NOT NULL);
-- the queryable analytical tier: one flat row per classified post; pruned past 90 days.
-- All rollups are SQL over this table + hourly_counts; raw JSONL is never re-read at publish time.
CREATE TABLE enriched_posts (
  post_id TEXT NOT NULL, entity TEXT NOT NULL, created_at TEXT NOT NULL,
  author_id TEXT NOT NULL, conversation_id TEXT, is_reply INTEGER NOT NULL,
  sentiment TEXT NOT NULL, confidence REAL NOT NULL, about_entity INTEGER NOT NULL,
  spam_score REAL NOT NULL, engagement_score INTEGER NOT NULL,
  likes_json TEXT NOT NULL, dislikes_json TEXT NOT NULL,        -- theme slugs
  capabilities_json TEXT NOT NULL, good_at_json TEXT NOT NULL, bad_at_json TEXT NOT NULL,
  prompt_version TEXT NOT NULL,
  PRIMARY KEY (post_id, entity)
);
CREATE INDEX idx_enriched_entity_time ON enriched_posts (entity, created_at);
CREATE INDEX idx_enriched_convo ON enriched_posts (conversation_id);
CREATE TABLE hourly_counts (entity TEXT NOT NULL, hour TEXT NOT NULL, mentions INTEGER NOT NULL,
  PRIMARY KEY (entity, hour));   -- pruned past 35 days
CREATE TABLE classifications (post_id TEXT NOT NULL, prompt_version TEXT NOT NULL,
  result_json TEXT NOT NULL, classified_at TEXT NOT NULL, PRIMARY KEY (post_id, prompt_version));
CREATE TABLE authors (author_id TEXT PRIMARY KEY, class TEXT NOT NULL, is_builder_panel INTEGER
  DEFAULT 0, classified_at TEXT NOT NULL);
CREATE TABLE budget_ledger (day TEXT NOT NULL, entity TEXT NOT NULL, posts_read INTEGER NOT NULL,
  PRIMARY KEY (day, entity));
CREATE TABLE entity_state (entity TEXT PRIMARY KEY, state TEXT NOT NULL DEFAULT 'baseline',
  state_since TEXT NOT NULL, launch_flag_until TEXT, baseline_hourly_median REAL);
CREATE TABLE events_emitted (event_key TEXT PRIMARY KEY, emitted_at TEXT NOT NULL);
```

- **Crons**: scheduler `0 * * * *` (counts + state eval + enqueue); publisher runs at the end of each enrich batch via queue, plus `*/30 * * * *` safety publish.
- **Pages**: `site/` deploys to Pages; data fetched at runtime from `public/` JSON (no rebuild on data refresh).

## Conventions

- Workers stay thin: parse env, wire deps, call package functions. All logic lives in `packages/` so vitest covers it without Miniflare except for integration tests (use `@cloudflare/vitest-pool-workers` for those).
- Every package exports `version` constants where behavior is versioned (prompt, score, schema).
- `log(level, msg, fields)` helper emits one-line JSON; no other console use.
- Secrets only via wrangler secrets / `.env` locally. CI: typecheck + lint + test on every push.
