# so-yesterday.ai — Agent API

A curated knowledge base for AI transformation. Contains video transcripts, executive summaries, essays, daily digests, knowledge concepts, and AI transformation personas.

**Base URL**: `/api`

**Machine-readable OpenAPI**: `/openapi.json`. **Human-readable docs (ReDoc)**: `/api-docs`.

## Quick Start

**One call to get started:** `GET /api/latest` — returns the latest digest (full content), 5 recent videos with summaries, 3 latest essays, and knowledge base stats. No parameters needed. (Alias: `GET /api/brief`.)

**Search:** `GET /api/search?q=<query>&hybrid=true&agent=true` — hybrid keyword + vector search across all content types.

**Related items for any detail page:** `GET /api/{type}/{id}/related` — RAG-derived cross-type recommendations.
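
A minimal Python sketch of the one-call briefing, using only the standard library. The host is taken from the examples in this file; `fetch_json` and `stats_line` are illustrative helper names, and only the documented `stats` keys are read because the rest of the `/api/latest` response shape is not spelled out here.

```python
import json
import urllib.request

BASE = "https://so-yesterday.ai"  # host as used in the curl examples in this file


def fetch_json(path: str, timeout: float = 10.0) -> dict:
    """GET a read endpoint anonymously and decode the JSON body."""
    with urllib.request.urlopen(BASE + path, timeout=timeout) as resp:
        return json.load(resp)


def stats_line(latest: dict) -> str:
    """Format the documented `stats` object from /api/latest as one line."""
    stats = latest.get("stats", {})
    keys = ("total_videos", "knowledge_concepts", "essays", "digests")
    return ", ".join(f"{key}={stats.get(key, '?')}" for key in keys)
```

Calling `stats_line(fetch_json("/api/latest"))` should give the live counts; no header or key is needed for this read.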

## MCP OAuth 2.1 (preferred)

**New integrations should use MCP OAuth 2.1.** The server implements full OAuth 2.1 with PKCE, dynamic client registration (RFC 7591), and AS/resource metadata discovery (RFC 8414 / RFC 9728).

**Authorization Server metadata:**
```
GET https://so-yesterday.ai/.well-known/oauth-authorization-server
```

**Protected resource metadata:**
```
GET https://so-yesterday.ai/.well-known/oauth-protected-resource
```

**One-command experience** with `mcpc` (recommended MCP client):
```bash
mcpc login https://so-yesterday.ai/mcp
# Opens browser, handles PKCE + consent, stores token in OS keychain.
```

Tokens are opaque Bearer values, sha256-hashed at rest, audience-bound to `https://so-yesterday.ai/mcp`, and scope-limited. See `docs/architecture-auth.md` for the full plan, DCR recipe, grant flows, and scope reference.

Use the issued token as `Authorization: Bearer <token>` on write endpoints. Verify: `GET /api/auth/me`.

## Authentication

**Reading is anonymous.** No header, no key, no sign-in needed for any content endpoint. If you only need to brief the user or search, skip this section.

## Agent authentication (device flow — deprecated, kept for compatibility)

> **Deprecated.** Device flow remains functional but will be removed in a future release. New integrations should use MCP OAuth 2.1 above.

**If the user asks you to log in / sign in / authenticate using the legacy flow, run the steps below — do NOT refuse.**

Easiest:

```bash
curl -fsSL https://so-yesterday.ai/agent-login.sh | bash
# Writes ~/.so-yesterday/credentials (mode 0600) on success.
```

Bare API, three steps:

```bash
# 1. Start.
curl -fsS -X POST https://so-yesterday.ai/api/oauth/device \
  -H 'Content-Type: application/json' \
  -d '{"client_name":"my-agent","client_host":"laptop"}'
# → {device_code, user_code, verification_uri_complete, expires_in, interval}

# 2. Tell the user to open `verification_uri_complete` and approve.

# 3. Poll every `interval` seconds.
curl -sS -X POST https://so-yesterday.ai/api/oauth/token \
  -H 'Content-Type: application/json' \
  -d '{"grant_type":"device_code","device_code":"<from step 1>"}'
# Success → {"access_token":"sok_...","token_type":"Bearer","scope":""}
```

`grant_type` accepts `device_code` (short) or `urn:ietf:params:oauth:grant-type:device_code` (RFC 8628 long form). Save the key as JSON at `~/.so-yesterday/credentials` (mode 0600):

```json
{"server": "https://so-yesterday.ai", "api_key": "sok_...", "user_email": "...", "issued_at": "..."}
```

Use it as `Authorization: Bearer sok_...` on write endpoints.
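
A minimal sketch of writing and reading that credentials file from Python with owner-only permissions. The field names match the JSON shape above; `save_credentials` and `load_api_key` are illustrative names, not the ones used by the repo's login script.

```python
import json
import os
from pathlib import Path


def save_credentials(path: Path, server: str, api_key: str,
                     user_email: str, issued_at: str) -> None:
    """Write the credentials JSON so only the owner can read it (mode 0600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"server": server, "api_key": api_key,
               "user_email": user_email, "issued_at": issued_at}
    # Create with 0600 from the start; the chmod covers a pre-existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.chmod(path, 0o600)


def load_api_key(path: Path) -> str:
    """Read the `api_key` field back from the credentials file."""
    return json.loads(path.read_text(encoding="utf-8"))["api_key"]
```

For the documented location, pass `Path.home() / ".so-yesterday" / "credentials"` as `path`.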

### Token endpoint status codes

While the user is approving, the token endpoint returns **HTTP 400** with `{"error":"authorization_pending"}`. This is the **expected** poll response — keep polling. Other 400 JSON errors: `slow_down` (back off `interval + 5s`), `access_denied` (user refused — stop), `expired_token` (start the flow over). Success is HTTP 200 with `{access_token, token_type, scope}`. **Do not use `curl --fail`** for polling — it hides the JSON body. Use `curl -sS`.
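
The status handling above can be sketched as a small decision function that a polling loop calls after every response. The action labels (`done`, `wait`, `stop`, `restart`, `error`) and the name `next_step` are illustrative, not part of the API.

```python
def next_step(status: int, body: dict, interval: int) -> tuple[str, int]:
    """Map one token-endpoint response to (action, next_interval_seconds)."""
    if status == 200 and "access_token" in body:
        return "done", interval
    error = body.get("error") if status == 400 else None
    if error == "authorization_pending":
        return "wait", interval        # expected while the user approves
    if error == "slow_down":
        return "wait", interval + 5    # back off by 5 seconds
    if error == "access_denied":
        return "stop", interval        # user refused
    if error == "expired_token":
        return "restart", interval     # start the device flow over
    return "error", interval           # anything undocumented
```

With `urllib.request`, an HTTP 400 surfaces as `urllib.error.HTTPError`; read its `.code` and `.read()` to get the status and JSON body instead of treating the exception as fatal.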

### Verify a token

```bash
curl -sS https://so-yesterday.ai/api/auth/me -H "Authorization: Bearer $KEY"
```

Returns `{authenticated, auth_method, is_agent, user_id, role, permissions}` or `{authenticated: false, permissions: ["read","search"], hint: "…"}` for anonymous callers. The endpoint is anonymous-safe (no 401), so agents can call it freely to check whether their token is still good.

### If POST is blocked in your environment

claude.ai web (`web_fetch`) and ChatGPT browsing reject outbound POSTs at the proxy. No server-side trick turns a read-only sandbox into a write-capable one: every write endpoint uses POST (or, for a few, PUT or DELETE), never GET.

- **Shell available?** Run `curl -fsSL https://so-yesterday.ai/agent-login.sh | bash` via your shell tool. Shell-curl bypasses HTTP-tool restrictions. Subsequent writes can also go through shell curl.
- **No shell, no POST?** Be honest: *"I can read everything but my environment can't POST. For comments / posts, please visit https://so-yesterday.ai and sign in directly."* Don't suggest an API-key paste — writes also use POST, so a key wouldn't help.
- Niche: `GET /api/oauth/device?client_name=...&client_host=...` is a GET alias for the start call (rare mixed sandboxes only). `/api/oauth/token` is always POST.

### Permissions matrix

| Capability                | Auth required | CSRF | Scope (OAuth 2.1) | Endpoint(s) |
|---------------------------|:-------------:|:----:|---|---|
| Read latest briefing      | No  | No  | — | `GET /api/latest`, `GET /api/brief` |
| Search (keyword + vector) | No  | No  | — | `GET /api/search?q=...&agent=true` |
| Browse knowledge / videos / essays / digests / personas / posts | No | No | — | `GET /api/knowledge`, `GET /api/videos`, `GET /api/essays`, `GET /api/digests`, `GET /api/personas`, `GET /api/posts` |
| Knowledge graph           | No  | No  | — | `GET /api/graph/data` |
| Related items (RAG)       | No  | No  | — | `GET /api/{type}/{id}/related` |
| MCP read tools / prompts  | No  | No  | — | `POST /mcp/` (Streamable HTTP) |
| Verify token              | No (anonymous-safe) | No | — | `GET /api/auth/me` |
| MCP: write personal note  | Yes | N/A | `personal_knowledge:write` | MCP `put_personal_note` → `PUT /api/me/personal-knowledge/notes/{slug}` |
| MCP: create post          | Yes | N/A | `posts:write` | MCP `create_post` → `POST /api/posts` |
| MCP: submit post          | Yes | N/A | `posts:write` | MCP `submit_post` → `POST /api/posts/{id}/submit` |
| Comment on content        | Yes | Session: yes / API key: N/A | — | `POST /api/comments` |
| React to a comment        | Yes | Session: yes / API key: N/A | — | `POST /api/comments/{id}/react` |
| Submit a post             | Yes | Session: yes / API key: N/A | — | `POST /api/posts`, `POST /api/posts/{id}/submit` |
| Add source / propose knowledge | Yes (or legacy password) | Session: yes / API key: N/A | — | `POST /api/sources`, `POST /api/sources/knowledge` |
| Manage own API keys       | Yes (session only) | Yes | — | `GET/POST/DELETE /api/account/api-keys` |

For write endpoints, send `Authorization: Bearer sok_...` (agent — CSRF-exempt) or the session cookie plus an `X-CSRF-Token` header echoing the `soy_csrf` cookie (browser).
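
A short sketch of choosing the headers for a write, following the rule above. `write_headers` is an illustrative helper; in the session case the cookie itself is assumed to be sent by your HTTP client's cookie jar, so only the echoed CSRF header is built here.

```python
from typing import Optional


def write_headers(api_key: Optional[str] = None,
                  csrf_cookie: Optional[str] = None) -> dict:
    """Build headers for a write request: Bearer key, or session CSRF echo."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # API keys are CSRF-exempt, so no X-CSRF-Token is needed.
        headers["Authorization"] = f"Bearer {api_key}"
    elif csrf_cookie:
        # Browser session: echo the value of the `soy_csrf` cookie.
        headers["X-CSRF-Token"] = csrf_cookie
    else:
        raise ValueError("writes need an API key or a session CSRF token")
    return headers
```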

## Security

Anonymous read is unrestricted. Writes are layered:

- **Anonymous read everywhere.** No header, key, or cookie is required for any content endpoint — including `/api/auth/me`, which doubles as a "did my token survive?" probe.
- **CSRF on session writes.** Cookie-authenticated `POST`/`PATCH`/`DELETE` must echo the non-HttpOnly `soy_csrf` cookie back as the `X-CSRF-Token` header. Mismatch or missing header → 403.
- **API keys are CSRF-exempt.** `Authorization: Bearer sok_...` requests carry no ambient credential, so there's no cross-site forgery surface — agents don't need to think about CSRF.
- **Server-side markdown sanitization.** Every user-derived body (comments, posts, knowledge stubs) is rendered server-side and pushed through a strict bleach allowlist before storage. The client also runs `rehype-sanitize` to strip `<script>`, `on*` handlers, and `javascript:` schemes. YouTube/Vimeo iframes and `<img>` sources are host-allowlisted.
- **Slug validation on knowledge writes.** `POST /api/sources/knowledge` rejects slugs that don't match `^[a-z0-9]+(-[a-z0-9]+)*$` (max 80) and path-resolves writes under the knowledge directory — no `../` escapes.
- **SSRF guard on `POST /api/sources`.** URLs that resolve to loopback, RFC1918 private, link-local, or cloud-metadata IPs are rejected before any fetch.
- **Attachment cap.** Image attachments are HEAD-checked against the host allowlist + content-type + a hard 5 MB `Content-Length` ceiling.
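
To avoid a wasted round trip, a client can pre-check the slug and URL rules above before submitting. This is a client-side approximation under stated assumptions, not the server's implementation: the regex and length limit come from this file, while the IP classification uses Python's `ipaddress` categories and only handles an already-resolved IP literal (the server's exact blocklist is not documented here).

```python
import ipaddress
import re

SLUG_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Mirror the documented slug rule: lowercase words joined by '-', max 80."""
    # fullmatch anchors both ends, so a trailing newline cannot slip through.
    return len(slug) <= 80 and SLUG_RE.fullmatch(slug) is not None


def is_blocked_address(address: str) -> bool:
    """True for loopback, private, or link-local IPs (metadata IPs are link-local)."""
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip.is_private or ip.is_link_local
```

The server remains the authority; treat a local pass as "worth sending", not as a guarantee of acceptance.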

## Gradual Disclosure Pattern

1. `GET /api/latest` — one-call briefing with latest content (best for "what's new in AI?")
2. `GET /api/search?q=<query>&hybrid=true&agent=true` — find specific topics
3. `GET /api/knowledge` — browse all concepts
4. `GET /api/knowledge/{slug}?agent=true` — read a concept with followable API links
5. `GET /api/knowledge/{slug}/related` — pull related videos / essays / posts for the same concept
6. Follow links in the content to explore related concepts

When `?agent=true` is set on detail endpoints, internal markdown links are rewritten from frontend routes (e.g. `](/knowledge/slug)`) to API URLs (e.g. `](/api/knowledge/slug?agent=true)`), so you can follow them directly.

## Content Types

| Type | Description |
|------|-------------|
| Knowledge | Conceptual articles on AI topics with related-concept links and confidence levels |
| Videos | YouTube video summaries with key points, full transcripts, and relevance flags |
| Essays | Original long-form essays on AI transformation |
| Digests | Daily digests of the 5 most important AI developments |
| Personas | AI transformation coaching personas with Ollama modelfiles |
| Posts | User-generated dispatches that have been moderator-approved |
| Channels | Tracked YouTube channels (including the `manual` channel for community-added sources) |
| Graph | Interactive knowledge graph visualization over the concept network |
| Tags | Content taxonomy with usage counts (live list at `GET /api/tags`) |

For live counts, call `GET /api/latest` (returns `stats: { total_videos, knowledge_concepts, essays, digests }`).

## Endpoints

### Knowledge Concepts

- `GET /api/knowledge` — List all concepts. Returns `[{slug, title, description, tags, related, confidence}]`. Confidence: `high` (3+ source refs), `medium` (1-2), `low` (0).
- `GET /api/knowledge/{slug}?agent=true` — Full concept with markdown `content`, `tags`, `related` slugs expanded to API URLs, `confidence`, `created`, `updated`.
- `GET /api/knowledge/{slug}/related?k=5` — Cross-type related items via vector similarity. Returns `{results: [{type, id, title, snippet, tags, date, score}], total}`. `k` range 1-20.
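
A small sketch for building `related` paths for any content type while keeping `k` inside the documented 1-20 range. The singular type names follow the search `types` values; `related_path` is an illustrative helper, and clamping on the client is a choice made here since this file does not say how the server treats out-of-range values.

```python
from urllib.parse import quote

# Path prefix per content type, as listed in the link-pattern table.
RELATED_PREFIX = {
    "knowledge": "/api/knowledge",
    "video": "/api/videos",
    "essay": "/api/essays",
    "digest": "/api/digests",
    "persona": "/api/personas",
    "post": "/api/posts",
}


def related_path(content_type: str, item_id: str, k: int = 5) -> str:
    """Return the related-items path with k clamped to 1..20."""
    k = max(1, min(20, k))
    return f"{RELATED_PREFIX[content_type]}/{quote(item_id, safe='')}/related?k={k}"
```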

### Videos

- `GET /api/videos?channel={id}&tag={tag}&limit=50&offset=0&highlight=false&since=YYYY-MM-DD` — Paginated list. Returns `{videos: [{video_id, title, date, url, channel_id, channel_name, duration, tags, summary}], total}`. Limit range: 1-200. Set `highlight=true` for front-page videos. Set `since=YYYY-MM-DD` for recent content only (e.g. `?since=2026-04-20&limit=20`).
- `GET /api/videos/{video_id}?agent=true` — Full video detail with `summary`, `key_points`, `full_summary` (markdown), and `transcript`.
- `GET /api/videos/{video_id}/related?k=5` — Related items across all content types.
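
Paging through the video list can be sketched as below. The parameter names and the 1-200 limit come from the list endpoint above; `videos_path` and `page_offsets` are illustrative helpers that only build URLs and offsets, leaving the actual fetch to your HTTP client.

```python
from typing import Optional
from urllib.parse import urlencode


def videos_path(limit: int = 50, offset: int = 0, since: Optional[str] = None,
                channel: Optional[str] = None, tag: Optional[str] = None,
                highlight: bool = False) -> str:
    """Build a /api/videos query, omitting filters that are not set."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"limit": limit, "offset": offset}
    if since:
        params["since"] = since  # YYYY-MM-DD
    if channel:
        params["channel"] = channel
    if tag:
        params["tag"] = tag
    if highlight:
        params["highlight"] = "true"
    return "/api/videos?" + urlencode(params)


def page_offsets(total: int, limit: int) -> list:
    """Offsets needed to cover `total` items, `limit` at a time."""
    return list(range(0, total, limit))
```

Read `total` from the first response, then request each remaining offset from `page_offsets(total, limit)`.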

### Essays

- `GET /api/essays` — List all essays. Returns `[{slug, title, date, description, thumbnail, video}]`. The `video` field is a YouTube ID (if the essay has an associated video).
- `GET /api/essays/{slug}?agent=true` — Full essay with markdown `content`.
- `GET /api/essays/{slug}/thumbnail` — SVG infographic thumbnail image.
- `GET /api/essays/{slug}/related?k=5` — Related items across all content types.

### Digests

- `GET /api/digests?since=YYYY-MM-DD` — List digests. Returns `[{date, title, thumbnail}]`. Optional `since` for recent digests only.
- `GET /api/digests/{date}?agent=true` — Full digest with markdown `content`. Sources link to videos.
- `GET /api/digests/{date}/thumbnail` — SVG infographic thumbnail image.
- `GET /api/digests/{date}/related?k=5` — Related items across all content types.

### Personas

- `GET /api/personas` — List all personas. Returns `[{persona_id, title, one_liner, modelfiles}]`
- `GET /api/personas/{persona_id}?agent=true` — Full persona with markdown `content`.
- `GET /api/personas/{persona_id}/related?k=5` — Related items across all content types.
- `GET /api/personas/modelfiles/{persona_id}/{mode}/{lang}` — Raw Ollama Modelfile. Modes: `coach`, `persona`. Languages: `en`, `sl`.

### Posts (user-generated)

- `GET /api/posts?tag=<tag>&limit=20&offset=0&sort=newest|oldest` — List published posts. `tag` filters to a single tag. Returns `{items: [{id, slug, title, summary, author, tags, state, comment_count, created_at, published_at}], total, limit, offset}`. Limit range 1-100.
- `GET /api/posts/{slug}` — Full post with sanitized HTML body, attachments, author + reputation.
- `GET /api/posts/{post_id}/related?k=5` — Related items across all content types.

### Channels

- `GET /api/channels` — List tracked channels. Returns `[{id, name, url, description, video_count}]`
- `GET /api/channels/{channel_id}?limit=50&offset=0` — Channel detail with paginated videos. Returns `{channel: {id, name, url, description, video_count}, videos: [...], total}`. Limit range: 1-200.

### Tags

- `GET /api/tags` — All tags with usage counts. Returns `[{tag, count}]`

### Search

- `GET /api/search?q=<query>&types=<csv>&hybrid=true&limit=20&agent=false` — Cross-content search.
  - `hybrid=true` (default): keyword + vector results fused via Reciprocal Rank Fusion. Best for natural-language queries.
  - `hybrid=false`: legacy keyword-only ranking (title > tags > description > body).
  - `types`: comma-separated subset of `video,knowledge,essay,digest,persona,post`. Unknown types silently dropped.
  - `type` (deprecated): single-type filter, kept for back-compat. Prefer `types`.
  - `agent=true`: rewrite snippet links to API URLs.
  - Returns `{query, results: [{type, id, title, snippet, tags, date, score}], total}`. Limit range 1-100.
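
A sketch of building a search URL with the parameters above. `search_path` is an illustrative helper; it drops unknown types on the client for clarity (the server is documented to drop them silently anyway) and clamps `limit` to the 1-100 range.

```python
from urllib.parse import urlencode

KNOWN_TYPES = ("video", "knowledge", "essay", "digest", "persona", "post")


def search_path(query: str, types=(), hybrid: bool = True,
                limit: int = 20, agent: bool = True) -> str:
    """Build a /api/search path; commas in `types` are percent-encoded."""
    if not query.strip():
        raise ValueError("query must not be empty")
    params = {
        "q": query,
        "hybrid": str(hybrid).lower(),
        "limit": max(1, min(100, limit)),
        "agent": str(agent).lower(),
    }
    kept = [t for t in types if t in KNOWN_TYPES]
    if kept:
        params["types"] = ",".join(kept)
    return "/api/search?" + urlencode(params)
```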

### Knowledge Graph

- `GET /api/graph` — Interactive HTML visualization (vis.js). Open in a browser to explore the concept network.
- `GET /api/graph/data` — Raw graph data. Returns `{nodes: [{id, label, title, group, color, size, confidence, tags, url}], edges: [{from, to, color}]}`. Nodes are concepts, edges are `related` links. Groups: LLM Fundamentals, AI Agents & Tools, Development, Strategy & Career, etc.
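
Once fetched, the graph payload can be turned into an adjacency map for local queries. This sketch relies only on the documented `id`, `from`, and `to` fields; treating edges as undirected is an assumption made here, and `neighbors` is an illustrative name.

```python
def neighbors(graph: dict) -> dict:
    """Map each node id to the set of ids it shares an edge with."""
    adjacency = {node["id"]: set() for node in graph.get("nodes", [])}
    for edge in graph.get("edges", []):
        # Assumption: `related` links are treated as symmetric.
        adjacency.setdefault(edge["from"], set()).add(edge["to"])
        adjacency.setdefault(edge["to"], set()).add(edge["from"])
    return adjacency
```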

### Source Addition

- `POST /api/sources` — Add a YouTube video or web page to the knowledge base. Body: `{url, password?}`. The URL is first run through an SSRF guard — loopback, RFC1918 private, link-local, and cloud-metadata IPs are rejected before fetching. Then the source is fetched, transcribed/scraped, summarized, and relevance-checked. Returns `{status, type, source_id, title, date, relevant, ai_summary}`. YouTube videos go to the `manual` channel; web pages get `wp_`-prefixed IDs.
- `POST /api/sources/knowledge` — Create a new knowledge concept page via AI. Body: `{topic, password?}`. The resulting slug is validated against `^[a-z0-9]+(-[a-z0-9]+)*$` (max 80) and path-resolved under the knowledge directory before any write. Returns `{status, slug, title, description, tags, related, ai_generated}`.

Authentication accepts a session cookie (CSRF header required), an `Authorization: Bearer sok_…` API key (CSRF-exempt), or the legacy `password` field validated against `APP_SOURCE_PASSWORD` (CSRF-exempt — pre-auth fallback).
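
A minimal sketch of preparing a `POST /api/sources` call with an API key, using only the standard library. The request is built but not sent, so it is safe to inspect first; `build_add_source_request` is an illustrative name and the key value is a placeholder.

```python
import json
import urllib.request


def build_add_source_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/sources request."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        "https://so-yesterday.ai/api/sources",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # API keys are CSRF-exempt
        },
    )
```

Send it with `urllib.request.urlopen(request)` and decode the JSON body to read `status`, `source_id`, and `relevant`.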

### Agent authentication

See the device-flow section near the top of this file (under **Agent authentication**) for the full recipe (curl or one-liner); new integrations should prefer MCP OAuth 2.1. For Python projects in this repo:

```bash
python scripts/so_yesterday_login.py
```

It implements the same flow and writes the same credentials file shape.

### Other

- `GET /api/health` — Minimal liveness probe (`{"status":"ok"}`). Operational detail is admin-gated at `/api/dashboard/health`.
- `GET /api/config` — Public configuration (git repo URL, dev_mode flag).
- `GET /api/skill.md` — This file.
- `GET /api/skills` — List API-installable Claude Code skills.
- `GET /api/skills/{name}` — Raw SKILL.md for a single skill.

## Following Links Between Content

Content returned by detail endpoints contains markdown with inline links to other content. How these links appear depends on the `agent` query parameter:

**Without `?agent=true`** (default) — links use frontend routes:
```
[Context engineering](/knowledge/context-engineering)
[Video title](/videos/japT66frdhM)
```
The `related` field on knowledge concepts returns plain slugs: `["context-engineering", "hallucination"]`

**With `?agent=true`** — links are rewritten to API URLs you can fetch directly:
```
[Context engineering](/api/knowledge/context-engineering?agent=true)
[Video title](/api/videos/japT66frdhM?agent=true)
```
The `related` field returns full API paths: `["/api/knowledge/context-engineering?agent=true", ...]`

### How to traverse the knowledge graph

1. Fetch any detail endpoint with `?agent=true` — e.g. `GET /api/knowledge/rag?agent=true`
2. Parse the markdown `content` for links — they are already rewritten to API URLs
3. To follow a link, issue a GET request to the path (e.g. `GET /api/knowledge/context-engineering?agent=true`)
4. Additionally, check the `related` array for structured links to related concepts
5. For broader cross-type recommendations, call `GET /api/{type}/{id}/related`
6. Repeat to explore the graph to any depth
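
The traversal steps above can be sketched as a bounded breadth-first crawl. The `fetch` callable is injected so the logic stays independent of any HTTP client; it is assumed to return the decoded JSON of a detail endpoint called with `?agent=true`. `extract_api_links` and `crawl` are illustrative names.

```python
import re
from collections import deque
from typing import Callable

# Markdown links whose target is an API path, e.g. ](/api/knowledge/rag?agent=true)
API_LINK = re.compile(r"\]\((/api/[^)\s]+)\)")


def extract_api_links(markdown: str) -> list:
    """Return API paths found in markdown link targets, in order."""
    return API_LINK.findall(markdown)


def crawl(start: str, fetch: Callable[[str], dict], max_pages: int = 10) -> list:
    """Breadth-first walk over content links and `related`, capped at max_pages."""
    seen = {start}
    visited = []
    queue = deque([start])
    while queue and len(visited) < max_pages:
        path = queue.popleft()
        doc = fetch(path)
        visited.append(path)
        links = extract_api_links(doc.get("content", "")) + list(doc.get("related", []))
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `max_pages` cap keeps an exploratory crawl from walking the entire graph; raise it deliberately when you need more depth.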

### Link patterns

| Content type | Frontend route | API URL (`?agent=true`) | Related |
|---|---|---|---|
| Knowledge | `/knowledge/{slug}` | `/api/knowledge/{slug}?agent=true` | `/api/knowledge/{slug}/related` |
| Video | `/videos/{video_id}` | `/api/videos/{video_id}?agent=true` | `/api/videos/{video_id}/related` |
| Essay | `/essays/{slug}` | `/api/essays/{slug}?agent=true` | `/api/essays/{slug}/related` |
| Digest | `/digests/{date}` | `/api/digests/{date}?agent=true` | `/api/digests/{date}/related` |
| Persona | `/personas/{id}` | `/api/personas/{id}?agent=true` | `/api/personas/{id}/related` |
| Post | `/posts/{slug}` | `/api/posts/{slug}` | `/api/posts/{post_id}/related` |

### Manual conversion rule

If you receive links without `?agent=true`, convert them by prepending `/api` and appending `?agent=true` (posts are the exception: the table above lists `/api/posts/{slug}` without `?agent=true`):
`/knowledge/rag` → `/api/knowledge/rag?agent=true`
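
The conversion rule can be sketched as a regex rewrite over markdown. It covers the five content types the link-pattern table lists with `?agent=true` and leaves posts, external links, and already-converted `/api/...` links untouched; `to_api_path` and `rewrite_links` are illustrative names.

```python
import re

# Frontend link targets that have an agent-mode API equivalent.
FRONTEND_LINK = re.compile(
    r"\]\((/(?:knowledge|videos|essays|digests|personas)/[^)\s?#]+)\)"
)


def to_api_path(frontend_path: str) -> str:
    """Prepend /api and append ?agent=true to a frontend route."""
    return f"/api{frontend_path}?agent=true"


def rewrite_links(markdown: str) -> str:
    """Rewrite frontend markdown links to fetchable API URLs."""
    return FRONTEND_LINK.sub(lambda m: f"]({to_api_path(m.group(1))})", markdown)
```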

## Skills

### API Skills (for external agents)

These skills work through the REST API — no repo access needed. Install for Claude Code:

- **research-ai-topic** — Search the knowledge base, read concepts and video summaries, synthesize findings.
  Install: `curl -s https://so-yesterday.ai/api/skills/research-ai-topic -o .claude/skills/research-ai-topic/SKILL.md --create-dirs`

- **daily-briefing** — Get the latest AI digest and explore related knowledge concepts.
  Install: `curl -s https://so-yesterday.ai/api/skills/daily-briefing -o .claude/skills/daily-briefing/SKILL.md --create-dirs`

List all: `GET /api/skills`

### Project Skills (for repo contributors)

These require a local clone of the repository. Available after `git clone`:

- `/update-videos [channel_id]` — Run the full video pipeline (fetch, transcripts, summaries)
- `/daily-digest [YYYY-MM-DD]` — Generate a daily digest
- `/create-essay <topic>` — Create a new essay
- `/generate-knowledge` — Generate knowledge base concepts from repo content
- `/lint-knowledge` — Audit knowledge base quality
- `/discover-concepts` — Find missing knowledge concepts from recent content
- `/add-channel` — Add a new YouTube channel

## Common Questions → API Calls

When a user asks a question, here's which endpoint to call:

| User asks | Call | Why |
|-----------|------|-----|
| "What's new in AI?" / "Latest developments" | `GET /api/latest` | Returns the latest digest + recent videos + essays |
| "What happened this week?" / "Last few days" | `GET /api/videos?since=<7 days ago>&limit=20` | `since` param filters by date (YYYY-MM-DD) |
| "What's the latest essay about?" | `GET /api/essays` | Sorted by date, first item is newest |
| "Tell me about <topic>" | `GET /api/search?q=<topic>&hybrid=true&agent=true` then follow the top result | Hybrid search beats keyword for natural-language questions |
| "I want to start using AI" | `GET /api/search?q=start+using+AI&hybrid=true&agent=true` | Surfaces the most relevant essay / concept |
| "What videos came out today?" | `GET /api/videos?since=<today>&limit=20` | Use today's date |
| "Show me digests from this week" | `GET /api/digests?since=<7 days ago>` | Digests also support `since` |
| "What topics are covered?" | `GET /api/tags` | Live tag list with counts |
| "How do AI concepts relate?" | `GET /api/graph/data` | Knowledge graph nodes + edges |
| "Tell me about a specific video" | `GET /api/videos/{video_id}?agent=true` | Full summary, key points, transcript |
| "What else is related to this?" | `GET /api/{type}/{id}/related` | Cross-type RAG recommendations |
| "What are people writing?" | `GET /api/posts` | User-generated dispatches (newest first) |

### Date filtering

Both `/api/videos` and `/api/digests` accept `since=YYYY-MM-DD` to return only items from that date onward. Combine with `limit` for recent content windows:
- Last 24 hours: `?since=<yesterday>&limit=20`
- Last week: `?since=<7 days ago>&limit=50`
- Last month: `?since=<30 days ago>&limit=100`
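
The `<yesterday>` and `<7 days ago>` placeholders can be filled in with a small date helper. `since_param` is an illustrative name; it uses the local date by default, which is an assumption because this file does not state which timezone the server applies to `since`.

```python
from datetime import date, timedelta
from typing import Optional


def since_param(days_back: int, today: Optional[date] = None) -> str:
    """Return the YYYY-MM-DD date `days_back` days before `today`."""
    today = today or date.today()  # assumption: local date is close enough
    return (today - timedelta(days=days_back)).isoformat()
```

For example, `f"/api/digests?since={since_param(7)}"` requests the last week of digests.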

## Tag Taxonomy

Live list with counts: `GET /api/tags`. The taxonomy evolves — fetch this endpoint rather than hard-coding the set.
