Brain Dead — How I Built an AI-Operated Knowledge Vault
Architecture, data model, and decisions behind an Obsidian vault where AI agents do most of the writing.
This is a write-up of my personal knowledge management system. It’s an Obsidian vault where AI agents do most of the writing — processing meeting transcripts, extracting entities, enriching concept files, and generating deliverables. I designed the structure so agents can operate autonomously without breaking things, while I retain full visibility and editorial control through Obsidian’s UI.
If you want to build something similar, this document covers the architecture, the data model, and the decisions behind them.
Much of the architecture is inspired by Joi Ito’s Switchboard/jibrain system — a production knowledge architecture for AI agents running ~4,300 files across multiple machines. The three-tier pipeline (intake/atlas/domains), frontmatter-as-contract, filter-before-read, the reweave pass, and the observation system all trace back to patterns Joi documented. I adapted them for my own workflow and toolchain, but the core ideas are his. Thanks also to Axel Quack for helping me think through many of these decisions.
The Core Idea
Most knowledge management systems are designed for humans to write in. This one is designed for AI agents to write in, with humans as editors and dispatchers.
The practical difference: when I take a meeting, an agent processes the transcript, writes a summary, extracts every person/org/concept mentioned, creates or enriches entity files in a knowledge graph, and tags everything with project affinity. I review the output in Obsidian. The ratio is roughly 90% agent-written, 10% human-edited.
This only works if the vault has strong structural contracts. An agent can’t “just figure out” where to put a file or what metadata it needs. Every decision point needs an explicit rule. That’s what most of this document describes.
Design Principles
Six principles govern the architecture:
- Agent-first. The vault is an operating system for agents. Frontmatter and folder structure are the API. If a human can’t explain a routing decision in one sentence, the agent won’t get it right either.
- Sources are sticky. External sources (meeting transcripts, articles, research) land in one place and never move. Routing corrections are frontmatter edits, not file moves. This eliminates an entire class of broken-link and lost-file problems.
- Entities are global. People, organizations, and concepts live in a flat registry (the “atlas”), not inside any project. They accumulate substance from being referenced across multiple engagements. A concept file about “digital twins” gets richer every time it comes up in a different project context.
- Filter-before-read. Every file has a description field (~150 chars). Agents scan descriptions before opening files. This keeps token usage sane when the vault has hundreds of files.
- Read-before-write. Before creating an entity, agents always check if it already exists. Enrichment over duplication is a hard rule.
- Defer on doubt. When identity resolution is ambiguous (is “Sarah” from this meeting the same Sarah from that project?), the agent writes to a pending queue rather than guessing. False merges are worse than review queues.
The Six-Folder Architecture
vault/
├── intake/ # All external sources land here. Sticky.
├── atlas/ # Global entity registry
│ ├── people/
│ │ └── _pending/
│ ├── organizations/
│ │ └── _pending/
│ ├── concepts/
│ │ └── _pending/
│ ├── products/
│ └── MOCs/ # Area-level indexes
├── projects/[name]/
│ ├── deliverables/ # What ships to clients
│ ├── atlas/ # Project-specific synthesis
│ ├── MOC.md # Dataview dashboard
│ └── project-index.md # Narrative context
├── domains/[name]/
│ ├── _domain.md # Manifest: subfolders + promotion rules
│ ├── [custom subfolders]/ # Declared per-domain
│ └── MOC.md
├── archive/ # Completed projects
└── _system/ # Architecture docs, templates, schemas
intake/
Every external source enters the vault here: meeting transcripts, web articles, research, client materials. Files use a timestamp-slug naming convention (2026-04-08_1430_call-with-rafe.md) to prevent collisions.
The key design choice: intake is sticky. Files never move out. Project affinity is a YAML frontmatter field (project: ["[[project-name]]"]), not folder location. This means routing errors are one-line edits, not file moves that break wikilinks across the vault.
Value migrates out of intake files via extraction — concepts, people, and organizations get pulled into the atlas as part of the ingestion process. The intake file remains as the citable receipt.
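A minimal sketch of the timestamp-slug naming convention, assuming the slug is just the lowercased title with non-alphanumeric runs collapsed into hyphens. The function names are hypothetical, not part of any skill described here.

```python
import re
from datetime import datetime


def slugify(title: str) -> str:
    """Lowercase the title and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def intake_filename(when: datetime, title: str) -> str:
    """Build an intake name of the form YYYY-MM-DD_HHMM_slug.md."""
    return f"{when:%Y-%m-%d_%H%M}_{slugify(title)}.md"


print(intake_filename(datetime(2026, 4, 8, 14, 30), "Call with Rafe"))
# prints 2026-04-08_1430_call-with-rafe.md
```

The minute-level timestamp is what keeps two sources with the same title from colliding.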
atlas/
A flat, global entity registry with four entity subdirectories:
- people/ — One file per real person, identity-resolved. Contains role, company, how-we-met, last-contact, and an appearances log.
- organizations/ — Companies, institutions, agencies. Same enrichment pattern.
- concepts/ — The most valuable part. Each concept file grows over time as different sources mention it. A concept like “Third Spaces” starts with a definition and accumulates evidence from every meeting, article, and research source that touches it.
- products/ — Tools, platforms, software. Created manually (not auto-extracted) when something is significant enough to track.
People, organizations, and concepts also have _pending/ subdirectories. When an agent can’t confidently resolve identity (e.g., a first-name-only mention), the entity goes to pending instead of being silently merged with the wrong file.
MOCs/ contains area-level indexes (Map of Content files) powered by Dataview queries. These are broad practice areas — “AI & Agents,” “Strategy & Business,” “Culture & Trends,” etc. They’re the human browsing interface.
projects/
Time-bound client work. Each project has:
- deliverables/ — What actually ships.
- atlas/ — Project-specific synthesis that doesn’t belong in the global atlas (session designs, working frameworks specific to this engagement).
- MOC.md — A Dataview dashboard that queries global intake and atlas by project tag. This is how you see everything related to a project in one view, even though the files are physically distributed.
- project-index.md — Narrative context: who’s the client, what’s the scope, what phase are we in.
domains/
Ongoing areas of responsibility that aren’t time-bound like projects. I have three: writing, new-business, and vault-ops.
Each domain declares its own structure via a _domain.md manifest file. This is the key design decision: domains are self-describing. Rather than a single domain template, each domain declares what subfolders it needs and what templates apply to them.
Here’s what a domain manifest looks like:
---
type: domain
description: "Published essays, drafts, ideas, and long-form writing projects"
status: active
created: 2026-04-07
folders:
- name: published
template: _system/templates/domain-writing-published-template.md
- name: projects
template: _system/templates/domain-writing-project-template.md
---
The folders: array declares custom subfolders and their associated Templater templates. When an agent creates a file in domains/writing/published/, Templater auto-applies the published template.
Each domain also has promotion rules — explicit criteria for when content moves between subfolders. For example, in the writing domain:
- Intake to published: Essay has shipped externally. Requires platform, published_date, and url populated in frontmatter.
- Intake to projects: Multi-file essay project with at least an outline and some draft material.
Another domain (new-business) has a completely different structure:
folders:
- name: cases # portfolio cards of past engagements
- name: positioning # versioned positioning statements
- name: outputs # rendered deliverables (proposals, decks)
With its own promotion rules: cases get written after an engagement wraps, with specific required fields. Positioning is versioned (new file per rewrite, old versions get status: archived). Outputs are compiled from selected cases for specific audiences.
This flexibility is intentional — domains are too varied to force into a single template. The manifest + promotion rules pattern lets each domain define its own lifecycle while the vault-level contracts (frontmatter, wikilinks, extraction) stay consistent.
Agents always read _domain.md before writing in a domain. It’s the contract for that workspace.
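As a rough illustration of how an agent might use the manifest, the sketch below looks up the template for a file from the folders: array. It assumes the frontmatter has already been parsed into a dict by some YAML loader; template_for is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from typing import Optional

# The writing domain's manifest, as it might look after YAML parsing.
WRITING_MANIFEST = {
    "type": "domain",
    "folders": [
        {"name": "published",
         "template": "_system/templates/domain-writing-published-template.md"},
        {"name": "projects",
         "template": "_system/templates/domain-writing-project-template.md"},
    ],
}


def template_for(manifest: dict, path_in_domain: str) -> Optional[str]:
    """Return the template declared for the file's top-level subfolder, if any."""
    parts = PurePosixPath(path_in_domain).parts
    if len(parts) < 2:  # file sits at the domain root, not in a subfolder
        return None
    for folder in manifest.get("folders", []):
        if folder["name"] == parts[0]:
            return folder.get("template")  # None when no template is declared
    return None


print(template_for(WRITING_MANIFEST, "published/my-essay.md"))
```

An undeclared subfolder returns None, which is a reasonable signal for the agent to stop and re-read the manifest before writing.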
_system/
The vault’s own documentation. Agents consult these before writing anything. The key files:
- architecture.md — Full V2 spec: design principles, folder structure, entry paths, extraction contract, deferred decisions
- schemas.md — Frontmatter contracts for every type (intake, people, org, concept, product, deliverable, domain, etc.)
- routing-tree.md — Decision tree for where content goes (Drop Pattern vs Work Pattern, atlas routing, domain routing)
- agent-rules.md — Writing conventions, extraction contract checklist, identity resolution sequence, wikilink rules, tagging rules, hub assignment rules, project/domain onboarding sequences
- agents.md — Registry of skills: what each one does, where it writes, and when to use it
- templates/ — Templater templates for every file type
- observations/ — Agent-logged friction points and structural concerns
The observation system deserves a mention: agents can log observations to _system/observations/ when they notice gaps, contradictions, or friction. Each observation has a category (gap, contradiction, connection, friction) and a status (pending, resolved). The idea is proactive maintenance — if an agent notices “Templater rules are getting complex” or “three intake files reference an entity that doesn’t exist yet,” it logs the observation rather than trying to fix it in the moment. These get reviewed periodically.
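A small sketch of what logging an observation could look like. The category and status values come from the description above; the category: key name, the helper name, and the exact field order are assumptions.

```python
from datetime import date

CATEGORIES = {"gap", "contradiction", "connection", "friction"}


def observation_note(category: str, description: str, created: date) -> str:
    """Render frontmatter for a new observation; new entries start as pending."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown observation category: {category}")
    return "\n".join([
        "---",
        "type: observation",
        f'description: "{description}"',  # assumes no double quotes inside
        "status: pending",
        f"category: {category}",          # assumed key name
        f"created: {created.isoformat()}",
        "tags: []",
        "---",
        "",
    ])


print(observation_note("friction", "Templater rules are getting complex",
                       date(2026, 4, 8)))
```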
The Data Model
Frontmatter is the contract layer between human and agent. Every file in the vault has YAML frontmatter. The universal minimum:
---
type: concept # concept | person | organization | product | reference |
# meeting | deliverable | hub | domain | observation | system
description: "~150 chars — what is this and why would you search for it?"
status: active # draft | active | archived (or delivered for deliverables)
created: 2026-04-08
tags: [bci, fashion] # topics only, never artifact types
---
Type System
The type field is the primary classifier. It separates content types (concept, person, organization, product, reference, meeting, deliverable) from operational types (hub, domain, observation, system). Each type has its own schema with additional required fields.
Atlas entities additionally require:
canonical_id: "email@example.com" # or slugified name
aliases: ["Nat", "Natalia K"] # all known variants
These power identity resolution — when an agent processes a transcript mentioning “Nat,” it checks canonical_id and aliases across all people files to find the right match.
The description Field
This deserves special mention because it’s load-bearing for agent efficiency. Every file has a description that explains what it is and why you’d care, in ~150 characters. Agents scan descriptions before reading file contents, which keeps token costs manageable at scale.
Good: “Four-criteria model predicting where BCI gains traction first: gaming before work”
Bad: “A concept about BCI” (restates the title, tells you nothing)
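To make filter-before-read concrete, here is a minimal sketch that reads only the description field and shortlists files by keyword. It assumes single-line, optionally quoted descriptions and uses a regex instead of a full YAML parser; read_description and shortlist are hypothetical names.

```python
import re

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)
DESCRIPTION = re.compile(r'^description:[ \t]*"?(.*?)"?[ \t]*$', re.MULTILINE)


def read_description(text: str) -> str:
    """Pull the description field out of a file's YAML frontmatter."""
    block = FRONTMATTER.match(text)
    if not block:
        return ""
    found = DESCRIPTION.search(block.group(1))
    return found.group(1) if found else ""


def shortlist(files: dict[str, str], keywords: list[str]) -> list[str]:
    """Names of files whose description mentions any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [
        name for name, text in files.items()
        if any(k in read_description(text).lower() for k in wanted)
    ]
```

Only the shortlisted files get opened in full, so a vague description costs tokens twice: once when the file is skipped wrongly, once when it is opened for nothing.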
Full Entity Schemas
Beyond the universal minimum, each entity type has its own schema. Here are all four atlas types:
People (atlas/people/):
---
type: person
description: "~150 chars: who this person is and their relevance"
status: active
created: 2026-04-08
modified: 2026-04-08
canonical_id: "email@example.com" # email preferred, or firstname-lastname-org slug
aliases: ["Nat", "Natalia K", "Dr. Klein"]
email: "email@example.com"
company: "Organization Name"
role: "Title"
projects: ["[[project-name]]"]
how-we-met: "Short context"
last-contact: 2026-04-08
tags: []
---
Body structure: ## Bio (2-4 sentences), ## Appearances (append-only log of meetings/sources where this person came up, with dates and what was discussed).
Organizations (atlas/organizations/):
---
type: organization
description: "~150 chars: what this org is and how it relates to the work"
status: active
created: 2026-04-08
modified: 2026-04-08
canonical_id: "org-name-slug"
aliases: ["Old Name", "Acronym", "Brand"]
projects: ["[[project-name]]"]
tags: []
---
Body structure: ## What is it?, ## Relationship to my work, ## Sources (append-only, same as concepts).
Concepts (atlas/concepts/):
---
type: concept
description: "~150 chars: what this concept is and why it matters"
status: active
created: 2026-04-08
modified: 2026-04-08
canonical_id: "concept-name-slug"
aliases: ["Alternative Name", "Related Phrasing"]
target_project: "primary-project-name"
projects: ["[[project-name]]"]
hub: "[[Area MOC]]"
tags: []
---
Body structure: ## What is it? (definition), ## Why it matters (strategic implication), ## Connections (wikilinked related concepts with relationship descriptions), ## Sources (append-only, each entry a specific paragraph about what that source contributed). The hub field links the concept to one of the area MOCs for browsing. target_project indicates the primary project context.
Products (atlas/products/):
---
type: product
description: "~150 chars: what this product is and how it relates to the work"
status: active
created: 2026-04-08
modified: 2026-04-08
canonical_id: "product-name-slug"
aliases: ["Short Name", "Former Name"]
maker: "[[Organization Name]]"
projects: ["[[project-name]]"]
tags: []
---
Body structure: ## What is it?, ## How I use it, ## Connections, ## Sources. Products are manually created — not part of the extraction contract. They’re tracked when significant enough to warrant an entity.
Intake Schema
Sources in global intake/ have additional fields beyond the universal minimum:
---
type: meeting # or reference, concept, research, etc.
description: "~150 chars"
status: active
source: granola # granola | client | research | manual | clipping
source_url: https://... # for URL sources; omit for paste/file
created: 2026-04-08
agent: summarize-meetings # which skill wrote this
project: ["[[project-name]]"] # array — empty if not project-affiliated
domain: ["[[domain-name]]"] # array — independent of project
extraction_status: processed # always processed at write time
tags: []
---
The project: and domain: fields are arrays of wikilinks. A source can belong to multiple projects or domains. This is how a single intake file appears in multiple project MOC dashboards without being duplicated.
Tags
Tags describe what content is about, never what kind of artifact it is. [bci, fashion, gen-z] yes. [meeting, draft, analysis] no — that’s what the type and status fields are for. Lowercase, hyphenated, no vocabulary control. If clusters emerge, Dataview finds them.
Three rules: (1) topic only, never artifact kind — type: and status: handle that; (2) format normalized — lowercase, hyphens, singular preferred, max ~25 chars; (3) each file gets 3-8 topical tags. There’s no canonical vocabulary file. The agent just follows the format and tags what it sees. Singletons are fine.
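The tag rules are mechanical enough to check in code. This is a sketch under assumptions: the artifact-word list is illustrative rather than exhaustive, the 25-character cap is treated as a hard limit, and singular-vs-plural is left to the agent.

```python
import re

# Artifact kinds belong in type:/status:, not tags (illustrative list).
ARTIFACT_WORDS = {"meeting", "draft", "analysis", "deliverable", "reference"}
MAX_TAG_LENGTH = 25


def normalize_tag(raw: str) -> str:
    """Lowercase the tag and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")


def check_tags(raw_tags: list[str]) -> tuple[list[str], list[str]]:
    """Return (normalized topical tags, list of rule violations)."""
    tags, problems = [], []
    for raw in raw_tags:
        tag = normalize_tag(raw)
        if not tag or tag in tags:
            continue  # skip empties and duplicates
        if tag in ARTIFACT_WORDS:
            problems.append(f"'{tag}' is an artifact kind, not a topic")
            continue
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(f"'{tag}' is longer than {MAX_TAG_LENGTH} chars")
        tags.append(tag)
    if not 3 <= len(tags) <= 8:
        problems.append(f"expected 3-8 tags, got {len(tags)}")
    return tags, problems


print(check_tags(["BCI", "Gen Z", "fashion", "meeting"]))
```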
Wikilinks
Every mention of a person, organization, concept, or product gets wikilinked: [[Name]]. Always bare-filename format, never path-qualified ([[Name]] not [[atlas/people/Name]]). This means links survive file reorganization.
Wikilinks serve double duty:
- Navigation — Click through related entities in Obsidian’s graph view or backlinks panel.
- Signal — When an unresolved wikilink appears in 3+ files, that’s a trigger to create the entity. For products specifically, red (unresolved) links accumulate intentionally until something is significant enough to warrant its own file.
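The 3+ files trigger can be sketched as a count of unresolved link targets. The sketch assumes file contents are already loaded into a dict and that the set of existing atlas filenames is known; entity_candidates is a hypothetical name.

```python
import re
from collections import Counter

# Captures the target of [[Name]], [[Name|alias]], or [[Name#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def entity_candidates(files: dict[str, str], existing: set[str],
                      threshold: int = 3) -> list[str]:
    """Names linked from `threshold` or more files that have no file yet."""
    counts = Counter()
    for text in files.values():
        # A set counts each target once per file, however often it repeats.
        targets = {m.strip() for m in WIKILINK.findall(text)}
        counts.update(t for t in targets if t not in existing)
    return sorted(name for name, n in counts.items() if n >= threshold)
```

Counting files rather than raw mentions keeps one link-heavy transcript from triggering entity creation on its own.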
Content Routing
There are exactly two paths for content to enter the vault:
Drop Pattern (external sources)
Something from outside the vault (meeting transcript, web article, research PDF, client material) needs to enter.
- Source gets written to intake/ with full frontmatter
- The extraction contract runs immediately (not deferred):
  - Identify all people, organizations, and concepts mentioned
  - For each: check if it already exists in atlas, then enrich the existing file, create a new one, or defer to pending
- extraction_status: processed gets set on the intake file
- Any ambiguous entities deferred to _pending/ are surfaced to me in chat
This is always atomic. The intake skill won’t let you write a source and skip extraction.
Work Pattern (project-native content)
An agent is working inside a project and generates something new — a deliverable, a draft, an analysis. This goes directly to projects/[name]/ or projects/[name]/deliverables/. No intake stage, no extraction. It’s already in context.
Why Two Paths?
The temptation is to route everything through intake. But project-native work (a strategy deck, a session design) isn’t a “source” — it’s output. Running extraction on it would be circular. Keeping the paths separate means each has clear, simple rules.
Identity Resolution
This is the most technically interesting part. When an agent processes a meeting transcript and encounters a person’s name, it needs to decide: is this an existing person in the atlas, or someone new?
The resolution sequence, in priority order:
1. Email exact match: strongest signal. If the transcript has an email, match against email: or canonical_id:.
2. canonical_id match: slugified name matches existing ID.
3. Filename match: Firstname Lastname.md exists.
4. Alias match: name appears in any file’s aliases: array.
5. Single unambiguous partial match: one candidate, context (company, project) makes it clear. Auto-enrich and add the new alias.
6. Multiple candidates or ambiguous: defer to _pending/ with status: needs_review.
7. First-name-only, no context: defer to _pending/.
Steps 6 and 7 are the safety net. The system strongly prefers false negatives (an extra pending entry to review) over false positives (silently merging two different people into one file). In practice, the pending queue gets reviewed weekly and most entries are easy to resolve with a bit of human context.
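The seven steps translate fairly directly into code. The sketch below is one possible reading, not the skill itself: it treats "context" as a matching company, treats a partial match as the mentioned name appearing as a word in the filename, and assumes that a name with no candidates at all becomes a new entity unless it is a bare first name. The Person class and resolve function are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    filename: str            # note name without ".md", e.g. "Sarah Chen"
    canonical_id: str        # email or slug
    email: str = ""
    company: str = ""
    aliases: list[str] = field(default_factory=list)


def slugify(name: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def resolve(name: str, people: list[Person], email: str = "",
            company: str = "") -> tuple[str, Optional[Person]]:
    """Return ("match", person), ("new", None), or ("pending", None)."""
    key, mail = name.strip().lower(), email.strip().lower()
    exact_checks = [
        lambda p: bool(mail) and mail in (p.email.lower(), p.canonical_id.lower()),
        lambda p: p.canonical_id == slugify(name),
        lambda p: p.filename.lower() == key,
        lambda p: key in [a.lower() for a in p.aliases],
    ]
    for check in exact_checks:                  # steps 1-4, in priority order
        hits = [p for p in people if check(p)]
        if len(hits) == 1:
            return "match", hits[0]
        if len(hits) > 1:
            return "pending", None              # step 6: several candidates
    partial = [p for p in people if key in p.filename.lower().split()]
    if (len(partial) == 1 and company
            and partial[0].company.lower() == company.lower()):
        return "match", partial[0]              # step 5: caller adds the alias
    if partial:
        return "pending", None                  # step 6: ambiguous
    if " " not in key and not (mail or company):
        return "pending", None                  # step 7: first name, no context
    return "new", None                          # assumption: otherwise create
```

Every branch that is not a clean single hit falls through to pending, which is the false-negative bias described above.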
Concept Enrichment: Where Knowledge Compounds
The concept files in atlas/concepts/ (schema shown above) are where the vault’s real value accumulates. They follow a “Karpathy-style” enrichment model — one file per concept, growing over time as different sources contribute to it.
The Sources section is append-only. Each entry is a specific paragraph about what that particular source contributed — not just a link. This means a concept like “Third Spaces” starts as a stub from one meeting and gradually becomes a rich, multi-sourced knowledge artifact as it comes up across different project contexts over months.
When a concept file accumulates 30+ sources, it’s a candidate for re-synthesis — condensing the evidence into a richer “What is it?” section. This is a deferred design decision (no dedicated skill yet), but the trigger is defined.
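The trigger itself is easy to check mechanically. This sketch assumes each source entry is a top-level bullet under the ## Sources heading, which is a guess at the layout; the function names are hypothetical.

```python
import re

RESYNTHESIS_THRESHOLD = 30

# Everything between "## Sources" and the next level-2 heading (or end of file).
SOURCES_SECTION = re.compile(r"^## Sources[ \t]*\n(.*?)(?=^## |\Z)",
                             re.MULTILINE | re.DOTALL)


def count_sources(markdown: str) -> int:
    """Count top-level bullets under the '## Sources' heading."""
    section = SOURCES_SECTION.search(markdown)
    if not section:
        return 0
    return sum(1 for line in section.group(1).splitlines()
               if line.startswith("- "))


def needs_resynthesis(markdown: str) -> bool:
    """True once the concept has accumulated enough sources to re-synthesize."""
    return count_sources(markdown) >= RESYNTHESIS_THRESHOLD
```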
MOCs: The Human Browsing Layer
MOCs (Maps of Content) are Dataview-powered dashboards. They’re how I navigate the vault in Obsidian — agents don’t use them directly (they query files with Glob/Grep instead), but MOCs are the primary human interface.
Project MOCs
Every project has a MOC.md that queries global intake and atlas by project tag. Here’s the template:
## Recent Meetings
TABLE description, file.mtime as "Modified"
FROM "intake"
WHERE contains(project, [[Project Name]])
AND type = "meeting"
SORT file.mtime DESC
LIMIT 10
## Concepts
TABLE description
FROM "atlas/concepts"
WHERE contains(projects, [[Project Name]])
SORT file.mtime DESC
## People
TABLE role, company, last-contact
FROM "atlas/people"
WHERE contains(projects, [[Project Name]])
SORT last-contact DESC
## Organizations
TABLE description
FROM "atlas/organizations"
WHERE contains(projects, [[Project Name]])
## Deliverables
TABLE description, status
FROM "projects/project-name/deliverables"
SORT file.mtime DESC
The key insight: project content is found via frontmatter tags, not folder location. The MOC queries intake/ and atlas/ globally, filtered by the project wikilink in each file’s project: or projects: array. Files are physically distributed across the vault but appear together in the MOC.
Area MOCs
atlas/MOCs/ contains 8 broad-area MOCs for concept browsing: AI & Agents, Strategy & Business, Culture & Trends, Methods & Tools, Personal, Meetings, People, Projects. These are practice areas — things I do professionally, not project topics.
Each concept file has a hub: field linking it to one area MOC. The area MOC queries concepts by hub. This gives you a browsable concept library organized by practice area.
There’s also a master concepts MOC (atlas/concepts/MOC.md) with multiple views: recently enriched, deepest (by file size), recently created, by hub area, orphans (no hub assigned), and alphabetical. This is for cross-cutting concept discovery.
New area MOCs are never created automatically. They require all four criteria: 15+ concepts sharing the theme, cross-project provenance (3+ different projects), persistence (months of accumulation), and identity (the user describes themselves professionally as working on this). Tags and master concept files handle everything else.
Domain MOCs
Each domain has its own MOC that queries its subfolders. These are simpler — just Dataview tables over the domain’s declared directories.
Templates and Templater
Templater is the Obsidian plugin that makes file creation zero-friction. When you create a new file in any directory, Templater checks regex rules and auto-applies the matching template with pre-filled frontmatter.
How It Works
Templater uses regex-based folder matching. Each rule maps a directory pattern to a template file:
- atlas/people/ → person template (pre-fills type, canonical_id, aliases, email, company, role, etc.)
- atlas/organizations/ → organization template
- atlas/concepts/ → concept template (pre-fills hub, projects, connections, sources sections)
- atlas/products/ → product template
- intake/ → intake template (pre-fills source, extraction_status, project/domain arrays)
- domains/writing/published/ → published essay template (pre-fills platform, published_date, url)
- domains/new-business/cases/ → case template
- Each domain subfolder gets its own rule
Templates use Templater expressions for dynamic content:
created: <% tp.date.now("YYYY-MM-DD") %>
The h1 heading uses <% tp.file.title %>, which expands to the filename.
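The folder-matching idea can be illustrated outside Obsidian. The Python below is not Templater's configuration format; it only mimics rule lookup, assuming the first matching rule wins. Template paths other than the published-essay one are hypothetical names.

```python
import re
from typing import Optional

# Illustrative rule table: (folder regex, template path).
RULES = [
    (r"^atlas/people/", "_system/templates/person-template.md"),
    (r"^atlas/organizations/", "_system/templates/organization-template.md"),
    (r"^atlas/concepts/", "_system/templates/concept-template.md"),
    (r"^intake/", "_system/templates/intake-template.md"),
    (r"^domains/writing/published/",
     "_system/templates/domain-writing-published-template.md"),
]


def template_for_path(path: str) -> Optional[str]:
    """Return the template of the first rule whose regex matches the path."""
    for pattern, template in RULES:
        if re.search(pattern, path):
            return template
    return None


print(template_for_path("atlas/people/Sarah Chen.md"))
```

Because lookup is order-dependent, more specific patterns need to sit above broader ones as the table grows.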
Template Design
Templates are minimal scaffolds, not content generators. They set up the right frontmatter and body structure, then get out of the way. A person template gives you:
---
type: person
description: ""
status: active
created: 2026-04-10
modified: 2026-04-10
canonical_id: ""
aliases: []
email: ""
company: ""
role: ""
projects: []
how-we-met: ""
last-contact: 2026-04-10
tags: []
---
# Filename
## Bio
[2-4 sentences about who this person is]
## Appearances
- [[meeting-or-source]] (2026-04-10) — what came up in this context
The agent fills in the fields. Templater just ensures the right structure exists from the start.
Scaling Concern
Each domain subfolder needs its own Templater regex rule. With 3 domains averaging 2-3 custom subfolders each, that’s ~10 rules. The architecture has a deferred design decision: if rule count exceeds 25, replace inline regex rules with generated config from _domain.md manifests. Not a problem yet but being tracked.
Agent Integration
Claude Code + Skills
The agents are Claude Code sessions (Anthropic’s CLI tool) with custom skills. Skills are structured prompts that enforce the vault’s contracts:
- /summarize-meetings — Processes Granola meeting transcripts. Reads from a transcript directory, writes summaries to intake, runs the full extraction contract.
- /intake — Ingests any non-meeting external source (URL, paste, file). Same extraction contract.
Skills are the enforcement layer. Rather than giving agents the rules and hoping they follow them, skills encode the rules into executable workflows. The agent can’t write to intake without running extraction, because the skill won’t let it.
MCP Servers
The agent connects to several MCP (Model Context Protocol) servers that extend its capabilities:
- qmd — Local hybrid search engine (BM25 + vector + reranking) over the vault’s markdown files. This is how agents search the vault semantically rather than just by filename.
- GitHub — For code-related work and PR management.
- Slack, Gmail, Calendar — For context about meetings, communications, scheduling.
- Things — Task management integration.
Search Strategy
Agents use different search tools for different queries:
- Glob/Grep — Exact-match lookups. “Find the atlas file for Sarah Chen.” “Which intake files are tagged with this project?”
- qmd — Semantic search. “What do we know about participatory design in retail contexts?” This is the default for broad content discovery.
- ccvault — Searching previous Claude Code conversation history for past decisions and discussions.
Session Continuity
This is the mechanism that makes multi-session work coherent. Each project and domain workspace has a _session.md file — narrative state written by the agent at the end of each work session. It captures what was done, what’s next, open questions, and key decisions.
A session-start hook automatically detects which project or domain folder the agent was launched in and injects the contents of _session.md into the conversation context. The agent picks up exactly where the last session left off.
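The detection half of such a hook might look like the sketch below. It covers only the path logic (which workspace contains the launch directory); how the file contents get injected into the conversation is not shown, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

WORKSPACE_ROOTS = ("projects", "domains")


def workspace_for(cwd: str, vault_root: str) -> Optional[PurePosixPath]:
    """Return the project or domain folder that contains cwd, if any."""
    try:
        rel = PurePosixPath(cwd).relative_to(vault_root)
    except ValueError:
        return None  # launched outside the vault
    parts = rel.parts
    if len(parts) >= 2 and parts[0] in WORKSPACE_ROOTS:
        return PurePosixPath(vault_root, parts[0], parts[1])
    return None


def session_file(cwd: str, vault_root: str) -> Optional[PurePosixPath]:
    """Path of the _session.md to inject, or None outside a workspace."""
    workspace = workspace_for(cwd, vault_root)
    return workspace / "_session.md" if workspace else None


print(session_file("/vault/projects/acme/deliverables", "/vault"))
```

Resolving to the workspace root means a session launched from a subfolder still picks up the same narrative state.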
The onboarding sequence for projects:
1. _session.md: auto-injected, narrative from last session
2. MOC.md: Dataview dashboard (human view of current state)
3. project-index.md: client, scope, phase, status
4. atlas/: project-specific synthesis
5. deliverables/: what shipped or is in progress
6. Recent intake scan: Glob+Grep on intake/*.md filtered by project tag
Step 6 is necessary because MOCs are Dataview queries that only render in Obsidian — agents reading raw files see query syntax, not results. The Glob+Grep scan gives the agent awareness of meeting summaries processed since the last _session.md write.
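A rough Python equivalent of that intake scan, for illustration. It assumes the project: field is a single-line inline array as in the intake schema, relies on the date prefix of the filename, and takes the cutoff date as a parameter; the function names are hypothetical.

```python
import re

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)


def tagged_with_project(text: str, project: str) -> bool:
    """True if the frontmatter's project: line contains [[project]]."""
    block = FRONTMATTER.match(text)
    if not block:
        return False
    for line in block.group(1).splitlines():
        if line.startswith("project:"):
            return f"[[{project}]]" in line
    return False


def recent_intake(files: dict[str, str], project: str, since: str) -> list[str]:
    """Intake filenames for `project` dated on or after `since` (YYYY-MM-DD)."""
    return sorted(
        name for name, text in files.items()
        if name[:10] >= since and tagged_with_project(text, project)
    )
```

To run it against a real vault, the files dict could be built with something like {p.name: p.read_text() for p in Path("intake").glob("*.md")}.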
For domains, the same pattern with domain-specific steps:
1. _session.md (if it exists)
2. _domain.md: structure and promotion rules
3. MOC.md: domain index
4. Promoted subfolders in declared order
Important subtlety: _session.md is written by the agent at session end, not auto-updated. If three meetings get processed between sessions via /summarize-meetings, those meeting summaries exist in intake/ but aren’t reflected in _session.md. The intake scan in step 6 catches this gap.
For cross-session context beyond _session.md, agents can search previous Claude Code conversation history via a dedicated search tool. This surfaces past decisions and discussions that informed the current state.
What’s Working
- Meeting processing is the killer workflow. A 60-minute meeting goes from raw transcript to fully extracted summary + entity enrichment in about 2 minutes of agent time. The concept files genuinely compound: after 6 projects, some concepts have 10+ sources and are legitimately useful reference material.
- Identity resolution works better than expected. The alias system handles most real-world name variations. The pending queue catches genuine ambiguities. I review maybe 5-10 pending entries per week.
- Sticky intake was the right call. Before this, files would get moved between folders and links would break. Now every file has exactly one home and project affinity is just metadata.
- The atlas as a shared layer across projects is where the real value accumulates. When I start a new engagement, the atlas already has context on people, orgs, and concepts from previous work.
What’s Still Evolving
- Domains are the newest tier (writing, new-business, vault-ops). The architecture supports them but the patterns are still settling, especially around how domain intake interacts with global intake.
- Concept re-synthesis doesn’t have a dedicated skill yet. When a concept file gets 30+ sources, it needs a human or agent to distill the sources into a richer definition. The trigger exists as a deferred design decision.
- Legacy migration is ongoing. The vault was restructured from V1 (project-scoped everything) to V2 (global intake + flat atlas) in April 2026. Most content has been migrated but some legacy organization and research files are still pending.
- Observation system (_system/observations/) exists but isn’t heavily used yet. The idea is that agents log structural friction (like “Templater rules are getting complex”) so it can be addressed proactively.
- Scaling: with ~750 files, search and agent operations are fast. The main scaling concern is Templater regex rules (one per domain subfolder) and concept file sizes as sources accumulate.
If You’re Building Something Similar
A few things I’d emphasize:
Start with the data model, not the folder structure. The frontmatter schemas are what make everything else work. If your agents know what fields every file needs, routing and search become straightforward.
Make intake sticky from day one. Every system I’ve seen that moves files between folders eventually breaks links, loses files, or creates ambiguous routing. Metadata-based organization is more robust than folder-based.
Identity resolution needs explicit rules, not vibes. “The agent will figure out if it’s the same person” is a recipe for silently merged entities. The 7-step sequence with a pending queue is more engineering than most knowledge systems do, but it pays off immediately.
Skills > instructions. Don’t give agents a page of rules and hope they follow them. Encode the rules into executable workflows (skills/tools/functions) that make the wrong thing hard to do.
The atlas is the moat. Concept files that accumulate evidence from multiple sources over months become genuinely unique knowledge artifacts. This is the thing a flat note-taking app can’t do.