$ man content-wiki/voice-in-content-pipelines
Content Workflows · intermediate
building voice into your content system
the pipeline from written draft to published audio without doing it manually each time
by Shawn Tenam
the core pipeline
The basic flow: written content -> ElevenLabs API -> MP3 file -> hosted somewhere, then embedded on your site or distributed as a podcast episode.
For a blog-to-audio pipeline, you need three things: a script that reads your post content, an API call to ElevenLabs, and a place to store the resulting MP3. S3 or Cloudflare R2 for storage. Your CMS or site builder for embedding.
The script looks roughly like this in Python:
import os
import requests

VOICE_ID = os.environ["VOICE_ID"]  # the ElevenLabs voice to read with

text = open("post.txt").read()
res = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["XI_KEY"]},
    json={"text": text, "model_id": "eleven_monolingual_v1"},
)
res.raise_for_status()  # surface auth/quota errors instead of writing a bad file
open("output.mp3", "wb").write(res.content)
That's the whole thing. You wrap that in whatever automation you already have, whether it's a GitHub Action, a cron job, or a build step.
batch processing
One-off generation is fine for occasional posts. For a content operation generating audio regularly, you want batch processing.
Batch approach: queue all posts that don't have audio yet, generate them in sequence (not parallel, to avoid rate limit issues), store results, update your CMS to mark them as having audio available.
ElevenLabs rate-limits concurrent requests per second, not daily volume. So sequential calls with a short sleep between them (0.5-1 second) avoid 429 errors without meaningfully slowing down a batch run.
For a 30-post backlog, a batch script runs in under 10 minutes and generates audio for everything at once. After that, new posts get audio generated as part of the publish workflow, not as a separate manual step.
Character tracking matters in batch mode. Build in a check that logs characters used per run so you can see how you're tracking against your monthly quota.
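A minimal sketch of that batch loop, with the character logging built in. The `generate_audio` helper mirrors the script above; how you fetch posts without audio and mark them done depends on your CMS, so those parts are left to the caller:

```python
import os
import time
import requests

API = "https://api.elevenlabs.io/v1/text-to-speech"

def generate_audio(text: str, voice_id: str, api_key: str) -> bytes:
    """One sequential TTS call; the caller sleeps between calls."""
    res = requests.post(
        f"{API}/{voice_id}",
        headers={"xi-api-key": api_key},
        json={"text": text, "model_id": "eleven_monolingual_v1"},
    )
    res.raise_for_status()
    return res.content

def run_batch(posts, voice_id, api_key, out_dir="audio"):
    """posts: iterable of (slug, text) pairs that don't have audio yet."""
    os.makedirs(out_dir, exist_ok=True)
    chars_used = 0
    for slug, text in posts:
        audio = generate_audio(text, voice_id, api_key)
        with open(os.path.join(out_dir, f"{slug}.mp3"), "wb") as f:
            f.write(audio)
        chars_used += len(text)  # track against the monthly quota
        time.sleep(0.75)         # sequential + sleep, to stay under the concurrency limit
    print(f"batch done: {chars_used} characters used this run")
    return chars_used
```

Sequential-with-sleep is deliberately boring: for a 30-post backlog the sleeps add under half a minute total, and you never have to reason about 429 retries.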
quality control before publishing
AI voice needs a human listen before it goes live. This is not optional.
Common issues to listen for:
- Technical term mispronunciation (API, SaaS, specific product names like "Figma" or "Supabase")
- Incorrect emphasis on compound words or acronyms
- Awkward pauses mid-sentence from punctuation the model interprets differently than you intended
- Energy drop at the end of long paragraphs where the model seems to run out of steam
The fix for most of these is editing the source text rather than the audio. Add commas to control pacing. Spell out acronyms phonetically for the model ("S-A-A-S" instead of "SaaS"). Break up sentences that are too long.
A full listen on every post takes 3-5 minutes per piece. Spot-checking (first 30 seconds, a middle section, the end) cuts that to 60-90 seconds and catches most issues. Pick your threshold based on how prominent the audio feature is on your site.
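The source-text fixes above can be scripted as a pre-processing pass that runs before the API call. The replacement table here is illustrative; build yours from the terms your posts actually use and the mispronunciations you actually hear:

```python
import re

# terms the voice model tends to mispronounce -> how you want them read
# (example entries; the phonetic spellings are guesses to tune by ear)
PRONUNCIATIONS = {
    "SaaS": "S-A-A-S",
    "API": "A-P-I",
    "Supabase": "Soo-pa-base",
}

def prep_for_tts(text: str) -> str:
    """Rewrite hard-to-pronounce terms in a copy of the post text.
    Only the audio pipeline sees this; the published text keeps normal spellings."""
    for term, spoken in PRONUNCIATIONS.items():
        text = re.sub(rf"\b{re.escape(term)}\b", spoken, text)
    return text
```

Keeping this as a table means each QC listen feeds back into the pipeline: hear a bad pronunciation once, add an entry, and every future post gets it right.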
combining voice with video
Using AI-generated voice as the narration track over screen recordings or animations removes one of the hardest constraints in video production: needing to record good audio at the same time as capturing the screen content.
The workflow: capture your screen silently while doing the thing you want to show. Write the narration separately as a script. Generate audio from the script. Drop the audio over the video in your editor and sync.
This is faster than trying to narrate live because you can iterate on the script without re-recording the screen capture. The script can be shorter or longer than the raw recording ... you adjust pacing in the edit.
One gotcha: AI voice pacing is consistent and slightly mechanical compared to live narration. When the audio says "and here you can see..." but there's a 2-second gap before that thing appears on screen, the sync feels off. Script your narration to match the actual timing of what happens on screen, not just what you want to explain.
Super Whisper integration
Super Whisper is speech-to-text. You speak messy, it transcribes. ElevenLabs is text-to-audio. You pass clean text, it reads it back.
The combination: speak a rough draft into Super Whisper while you're walking, cooking, or commuting. Get back a messy transcript. Clean it up in your editor. Pass the cleaned version to ElevenLabs. Publish both the text post and the audio version.
This is a real workflow for people who think better out loud than at a keyboard. The speaking-to-draft step captures ideas in flow state that keyboard drafting sometimes kills. The AI voice step means you don't have to also record a clean audio read ... which would require setting up a microphone, a quiet environment, and doing multiple takes.
The friction you're removing: you speak when inspiration hits -> clean text appears -> polished audio gets generated automatically -> both formats published. Three steps that used to require maybe five different sessions collapsed into one continuous flow.
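The transcript-cleanup step can be partially automated before the human pass. A minimal sketch of a filler-word filter; the pattern and helper are hypothetical, not part of Super Whisper's output format or any API:

```python
import re

# dictation filler to strip before human editing (extend to taste)
FILLERS = re.compile(r"\b(um|uh|you know)\b,?\s*", re.IGNORECASE)

def clean_transcript(raw: str) -> str:
    """First automated pass over a dictated transcript: drop filler words
    and collapse leftover whitespace. Human editing still follows this."""
    text = FILLERS.sub("", raw)
    return re.sub(r"\s{2,}", " ", text).strip()
```

This only handles the mechanical noise; restructuring rambling sentences into publishable prose is still the editor-session step in the flow above.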
real voice vs AI voice
The distinction that matters: is this content building a personal relationship or distributing information at scale?
Use your real voice for:
- LinkedIn video posts where you want people to feel they know you
- Podcast appearances and interviews
- Sales calls, demos where you're present
- Content where authenticity and real-time reaction are the whole point
Use AI voice for:
- Documentation and knowledge base audio
- Tutorial narration over screen recordings
- Content that needs to exist in audio form but isn't a personal brand moment
- Any content you're generating faster than you could record
The wrong framing: "AI voice = lazy." The right framing: AI voice at scale gets content to people who prefer listening, in a format they can consume on a commute, without requiring you to block out recording time for every piece of content you publish.
frequently asked questions
Does AI voice hurt SEO? No. The text content is what Google indexes. Audio is supplementary.
Will listeners know it's AI? Some will. Most won't on a casual listen. If you're cloning your own voice, the clone is close enough that the gap keeps shrinking. Disclosure norms are still forming but lean toward transparency as standard practice.
How long should audio versions be? Same length as the content. Don't truncate for audio. If the post is long, the audio is long. Listeners who click play expect the full version.
related entries