Building brand-aligned Claude Skills so Claude stops defaulting to generic marketing templates

Xander Sebastian

Most marketing-focused Claude Skills ship as dressed-up prompt templates and drift back to generic by paragraph five. The architectural fix: a Skill skeleton that reads brand references at runtime, validates output against your taxonomy, and enforces hard rules before returning. Lessons from running 139 custom Skills across 30 plugins for three brands.

9 min read

Claude Skills for marketing tend to ship as dressed-up prompt templates. "Write a blog post" with a personality line bolted on. The output still reads like every other SaaS blog. The fix is not a better personality prompt. It is a Skill skeleton that reads brand references at runtime, validates output against your taxonomy, and enforces hard rules before returning anything.

Key takeaways

  • Marketing-focused Claude Skills usually ship as prompt templates with a personality line. They drift back to generic by paragraph five.
  • Brand alignment is architectural, not lexical: runtime file reads, taxonomy validation, and hard-rule enforcement.
  • We run 139 custom Skills across 30 plugins for three brands. Every Skill follows the same five-step skeleton.
  • The economics flip in your favour around the tenth Skill, not the first.

The situation

We invoked one of the public marketing Skills on a piece for our P3 Attribution pillar. The output read like the template you would find on the product page of any AI marketing tool: tri-bullets, "Understanding attribution" H2s, the word "delve" in the third paragraph. None of the things that mark a Robotic Pixels post were in there. No specific numbers. No named tools. No place where human judgment still matters.

This is the default Claude output failure mode for marketers. According to Anthropic's "Skills explained" announcement, prompt templates are "variable placeholders within individual prompts" used during a single conversation. Skills are something different: persistent folders that load dynamically when relevant. Most public marketing Skills have not actually used the second capability. They are prompt templates wearing the Skill format.

We run 139 custom Skills across 30 plugins, supporting three brands (Robotic Pixels, Campakt, and a personal brand) on a single autonomous business OS. The patterns that keep voice consistent across that many Skills were not obvious. We learned them by shipping Skills that did not work and watching what broke.

Why most people get this wrong

The standard pattern for "marketing-focused" Skills available today is to take a marketing task (write a LinkedIn post, draft an email) and add personality keywords to the prompt. "Write in a confident, casual tone." "Sound like a practitioner."

This is treating Skills as prompt templates instead of brand-aligned artefacts. The personality keywords are vibes, not constraints. By paragraph five the output drifts back to corporate-textbook because the Skill never had a way to validate the output against your actual brand standards.

One independent practitioner who has documented this voice-drift problem put it bluntly: "Even when you have set up Claude Projects, uploaded your brand guide, and written detailed instructions, the first paragraph sounds great. By paragraph five, Claude is writing like a corporate textbook again. Claude will not proactively tell you when its output has drifted from your guidelines." (Use AI to Write, 2026)

The fix is not more personality prompts. A Claude Skill can do things a prompt template cannot. It can read files at runtime, run code, validate output against schemas. Brand alignment is what happens when you actually use those capabilities.

The core concept

Anthropic launched Agent Skills for Claude in October 2025. The launch announcement defined Skills as folders of instructions, scripts, and resources that Claude loads dynamically when relevant to a task, with full filesystem access in Claude's VM environment, executable code, and progressive disclosure of context (Anthropic, "Introducing Agent Skills", 2025). These capabilities are what make Skills different from prompt templates.

A brand-aligned Skill uses these capabilities like this:

flowchart TD
    A[Skill invocation] --> B{Pre-flight}
    B -->|System paused| Z[Exit cleanly + log]
    B -->|Brand paused| Z
    B -->|Dry-run| Z
    B -->|OK| C[Load brand references<br/>voice + taxonomy + hard-rules<br/>+ audience + SEO pillars]
    C --> D[Load craft references<br/>EEAT rubric + post-type structures<br/>+ statistics tiers + schema markup]
    D --> E[Generate output]
    E --> F{Validate against<br/>loaded constraints}
    F -->|Word count out of range| E
    F -->|Required sections missing| E
    F -->|Hard rule violation| E
    F -->|All clear| G[Persist to Postgres]
    G --> H[Log to optimisation_log]
    H --> I([Return])

The skeleton has five steps. Every Robotic Pixels (RP) Skill follows it.

  1. Pre-flight checks. System pause? Brand pause? Dry-run? If any are active, exit cleanly. Log the exit so the next Skill that needs to know what happened can find it.
  2. Load brand references. Read the brand's voice profile, taxonomy, hard rules, audience definition, and SEO pillars from the knowledge base at runtime. Not bundled into the Skill itself. Read fresh on every invocation. The references can change without redeploying the Skill.
  3. Load craft references. EEAT rubric, post-type structures, statistics tiers, schema markup. Universal craft, brand-agnostic.
  4. Generate with validation. Produce the output, then check it against the loaded constraints. Word count in range? Required sections present? Hard rules clear? If any constraint fails, revise before returning.
  5. Persist and log. Write to Postgres. Log the run to the optimisation log so the silent-failure detector can see the Skill ran.
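
The five steps above can be sketched as a single control-flow function. This is a minimal illustration, not our production code: every name here (run_skill, RunResult, the injected callables) is hypothetical, and the bounded retry count is an assumption we add so the revise loop cannot run forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunResult:
    status: str                      # "exited" or "ok"
    output: Optional[str] = None
    log: list = field(default_factory=list)


def run_skill(
    preflight: Callable[[], Optional[str]],
    load_brand_refs: Callable[[], dict],
    load_craft_refs: Callable[[], dict],
    generate: Callable[[dict, list], str],
    validate: Callable[[str, dict], list],
    persist: Callable[[str], None],
    max_attempts: int = 3,           # assumption: cap on revise cycles
) -> RunResult:
    log = []

    # Step 1: pre-flight. A non-None reason means "exit cleanly and log".
    reason = preflight()
    if reason is not None:
        log.append(f"exit: {reason}")
        return RunResult("exited", None, log)

    # Steps 2 and 3: read brand and craft references fresh on every run.
    refs = {**load_brand_refs(), **load_craft_refs()}

    # Step 4: generate, validate, and revise until the constraints pass.
    problems = []
    for attempt in range(1, max_attempts + 1):
        draft = generate(refs, problems)
        problems = validate(draft, refs)
        if not problems:
            # Step 5: persist and log the successful run.
            persist(draft)
            log.append(f"ok after {attempt} attempt(s)")
            return RunResult("ok", draft, log)
        log.append(f"attempt {attempt} failed: {problems}")

    # Fail loudly rather than return a draft that never validated.
    raise RuntimeError(f"validation still failing: {problems}")
```

Passing the previous attempt's problems back into generate is one way to model "revise before returning": the next draft is produced with the failed constraints in view.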

This skeleton is the difference between a Skill that ships brand-aligned output and a Skill that prompts Claude to "sound like a practitioner."

How we actually do it

Here is what the skeleton looks like in a real Skill. This is the pre-flight section of content-brief, the Skill that, recursively, generated the brief this article was written from:

-- Step 1: Pre-flight checks
SELECT system_pause, dry_run_default, calm_mode
FROM ops.runtime_state WHERE id = 1;
-- If system_pause = true, log exit and return.

SELECT COUNT(*) FROM ops.pause_rules
WHERE brand = '{brand}' AND scope IN ('all', 'content')
  AND status = 'Active' AND deleted_at IS NULL;
-- If > 0, the brand is paused; exit.

Then the brand reference load. Five files for an RP brief, queried from ops.brand_references:

SELECT slug, body FROM ops.brand_references
WHERE brand = 'rp'
  AND slug IN ('voice-rp', 'taxonomy-rp', 'seo-pillars-rp',
               'hard-rules-rp', 'audience-rp')
  AND deleted_at IS NULL;

The voice file alone is 21 kilobytes of patio11 / Lenny Rachitsky territory. Anti-patterns to cut, voice reference quotes ("tuning forks"), the five-principle voice profile. The Skill has far less room to drift toward generic because the file it reads on every invocation says, in plain text, do not write like that.

The validation step is where the architectural difference is most visible. The Skill checks the output for hard-rule violations before returning. If the draft contains "delve" anywhere (forbidden by our Rule 14), the Skill flags it. If a category is not in the registered taxonomy, the Skill flags it. The flag is not a personality prompt. It is a programmatic check against a list.
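
A programmatic check of this kind can be very small. The sketch below assumes the forbidden words and the registered taxonomy have already been parsed out of the brand reference files into sets; the function name, the example category names, and the whole-word matching rule are our illustration, not the actual Rule 14 implementation.

```python
import re


def check_hard_rules(draft, forbidden_words, categories, taxonomy):
    """Return a list of flags; an empty list means the draft is clear."""
    flags = []

    # Whole-word, case-insensitive match. Inflected forms ("delves")
    # would need their own entries in the forbidden list.
    words = set(re.findall(r"[a-z']+", draft.lower()))
    for word in sorted(forbidden_words):
        if word.lower() in words:
            flags.append(f"forbidden word: {word}")

    # Every category on the draft must exist in the registered taxonomy.
    for category in categories:
        if category not in taxonomy:
            flags.append(f"unregistered category: {category}")

    return flags


flags = check_hard_rules(
    draft="Let us Delve into attribution.",
    forbidden_words={"delve"},
    categories=["attribution", "growth-hacks"],      # hypothetical names
    taxonomy={"attribution", "claude-for-marketing"},  # hypothetical names
)
print(flags)
```

The point of returning a list rather than a boolean is that each flag can be fed back into the revise step or written to the log verbatim.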

This is also where Skills diverge from orchestration frameworks like LangChain. LangChain composes "chains" and "agents" in code. You wire prompts and tools together programmatically. Skills go the other way: filesystem-based, progressive disclosure, the model decides what to load when. For brand alignment that is the right primitive, because the rules of your brand can be expressed as files Claude reads when it needs them rather than chains the developer pre-composes.

Worked example

Take the content-brief Skill end-to-end for one specific input. The input is the title of an article on the RP content calendar, plus the calendar item's row ID.

The Skill runs the pre-flight, then reads the brand reference files. It also fetches the calendar row from ops.content_calendar_rp and checks for SERP intelligence and a research dossier. Both optional but valued.

Then the generation step. The Skill produces the brief: three headlines (SEO, angle-led, social), the angle in two or three sentences, target keywords from the SEO pillars file, the pillar assignment, required sections from the post-type-structures file, and the claim inventory.

The claim inventory is the part most marketing Skills do not do at all. Every factual assertion the article will need to make is logged as a claim, typed (STAT, QUOTE, COMPETITIVE, INTERNAL, DEFINITIONAL), and prioritised. The downstream Skill, claim-evidence-binding, routes each claim to the appropriate evidence source before drafting starts. Statistical claims queue for the statistics-researcher Skill. Quote claims queue for the quote-finder Skill. Internal-experience claims auto-bind. Competitive claims are checked against the linked research dossier.
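
To make the routing concrete, here is a minimal sketch of a typed claim inventory. The claim types and Skill names come from the description above; the Claim fields, the priority scale, and the "unrouted" bucket are assumptions. Nothing above says where DEFINITIONAL claims go, so the sketch deliberately leaves them unrouted rather than guessing.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimType(Enum):
    STAT = "STAT"
    QUOTE = "QUOTE"
    COMPETITIVE = "COMPETITIVE"
    INTERNAL = "INTERNAL"
    DEFINITIONAL = "DEFINITIONAL"


@dataclass(frozen=True)
class Claim:
    text: str
    type: ClaimType
    priority: int  # assumption: 1 is the highest priority


# Destinations named in the article; DEFINITIONAL is intentionally absent.
ROUTES = {
    ClaimType.STAT: "statistics-researcher",
    ClaimType.QUOTE: "quote-finder",
    ClaimType.INTERNAL: "auto-bind",
    ClaimType.COMPETITIVE: "research-dossier",
}


def route_claims(claims):
    """Group claims by destination, highest priority first in each queue."""
    queues = {}
    for claim in sorted(claims, key=lambda c: c.priority):
        destination = ROUTES.get(claim.type, "unrouted")
        queues.setdefault(destination, []).append(claim)
    return queues


claims = [
    Claim("Skills load dynamically", ClaimType.DEFINITIONAL, 3),
    Claim("We run 139 custom Skills", ClaimType.INTERNAL, 1),
    Claim("Practitioner quote on drift", ClaimType.QUOTE, 2),
]
queues = route_claims(claims)
```

Anything that lands in the unrouted queue is a prompt for a human decision, which is safer than silently binding a claim to the wrong evidence source.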

The output is structured JSON written to Postgres in a single transaction, not a paragraph of prose telling Claude how to write. The drafting Skill then loads it and writes against the verified evidence. There is no place in the workflow where Claude can drift, because the constraints are read fresh, validated, and persisted at every step.

This article is a worked example of itself. The brief that produced it was generated by content-brief. The evidence binding was produced by claim-evidence-binding. Both ran through the skeleton above. The recursion is the point: the same Skills that we are explaining are the Skills that produced the explanation.

Common failure modes

Three patterns we have watched ourselves trip over.

The first is Skills that leak generic voice on edge cases. A Skill works fine on common topics; the brand voice is preserved because the test cases were common topics. Then someone runs it on a topic the voice file does not cover well. A niche product launch. A technical post outside the SEO pillars. A piece that fits no category cleanly. The Skill silently falls back to generic Claude. The mitigation: the Skill should detect when the brand reference does not cover the topic and refuse to draft, flagging it for human input rather than producing a clean-looking but off-brand draft.

The second is Skills that skip brand references under time pressure. When a Skill runs on a tight deadline and the brand reference query times out, the Skill has to choose. Fail loudly or proceed with what it has. Skipping the reference load with a default fallback is the most insidious failure mode. The Skill produces output, the output looks acceptable, and nobody notices the brand drift until weeks later. Our Skills fail loudly. The output is no output, with a clear log entry, rather than a quietly-broken artefact.
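
Failing loudly is mostly a matter of refusing to continue with a partial reference set. The sketch below assumes the reference query returns (slug, body) rows, as the ops.brand_references query shown earlier does; the exception class and function name are hypothetical.

```python
REQUIRED_SLUGS = frozenset({
    "voice-rp", "taxonomy-rp", "seo-pillars-rp",
    "hard-rules-rp", "audience-rp",
})


class BrandReferenceError(RuntimeError):
    """Raised when any required brand reference is missing or empty."""


def require_brand_references(rows, required=REQUIRED_SLUGS):
    # Treat empty or whitespace-only bodies as missing, not as valid files.
    refs = {slug: body for slug, body in rows if body and body.strip()}
    missing = sorted(set(required) - refs.keys())
    if missing:
        # No default fallback: the caller logs this and produces no output.
        raise BrandReferenceError(
            "missing brand references: " + ", ".join(missing)
        )
    return refs
```

A timeout in the query layer would surface the same way: the caller catches the error, writes the log entry, and returns nothing.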

The third is Skills over-fit to one post type. A Skill optimised for flagship-length posts produces strange output on definitional posts. A Skill optimised for tutorials over-explains in opinion pieces. Mitigation: every Skill takes a post_type argument and loads the post-type-structures file. The structural template is read at runtime, not baked in.
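
A rough sketch of the post_type mitigation follows. It assumes the post-type-structures file has been parsed into a mapping from post type to a word-count range and a list of required section titles; that shape, the PostStructure name, and the one-title-per-line matching are all our illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostStructure:
    min_words: int
    max_words: int
    required_sections: tuple


def check_structure(draft, post_type, structures):
    """Validate a draft against the structure loaded for its post type."""
    if post_type not in structures:
        # An unknown post type is a hard stop, not a silent default.
        raise KeyError(f"unknown post_type: {post_type!r}")
    spec = structures[post_type]
    problems = []

    word_count = len(draft.split())
    if not spec.min_words <= word_count <= spec.max_words:
        problems.append(
            f"word count {word_count} outside {spec.min_words}-{spec.max_words}"
        )

    # A section counts as present when its title appears alone on a line.
    lines = {line.strip().lower() for line in draft.splitlines()}
    for section in spec.required_sections:
        if section.lower() not in lines:
            problems.append(f"missing section: {section}")

    return problems


# Hypothetical structure entry, standing in for the file read at runtime.
structures = {"definitional": PostStructure(5, 50, ("Key takeaways", "Takeaway"))}
draft = "Key takeaways\nSkills read brand files at runtime.\nThey validate output."
problems = check_structure(draft, "definitional", structures)
```

Because the structures mapping is passed in rather than hard-coded, swapping the template for a different post type changes nothing in the Skill itself.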

We do not measure voice compliance with a single percentage. We measure it qualitatively against our Rule 14 anti-pattern list (no "delve", no tri-bullets, no "Understanding X" H2s, no perfectly balanced lists, no rhetorical-question section openers). Default Claude output for marketing tasks fails Rule 14 reliably. Brand-aligned Skill output fails it rarely, and when it does the validation layer catches it before the draft returns.

The economics

Cost per 1,000 Skill invocations. We run on Claude Max, which is £79 per month per finance.manual_costs. At our current cadence of approximately 4,500 Skill invocations per month across scheduled tasks and Cowork sessions, that works out to roughly £18 per 1,000 invocations. The equivalent API price for the same workload (mostly Sonnet 4.6 at approximately 5K input tokens and 2K output tokens per invocation) sits at roughly £36 per 1,000. At our volume, Claude Max costs approximately half as much as the API.

The break-even between Max and API. Below approximately 2,200 invocations per month, the API is cheaper because you avoid the Max base fee. Above that, Max wins by a widening margin. If you are running fewer than two scheduled Skills per day, stay on the API.
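
The arithmetic behind these figures is easy to reproduce. The sketch below uses only the numbers quoted in this section (£79 per month, roughly £36 per 1,000 API invocations, 4,500 invocations per month) and assumes the Max fee is flat regardless of volume; the function names are ours.

```python
MAX_MONTHLY_FEE_GBP = 79.0      # Claude Max subscription, per the article
API_COST_PER_1000_GBP = 36.0    # article's estimate for the same workload
MONTHLY_INVOCATIONS = 4500      # article's current cadence


def max_cost_per_1000(invocations_per_month):
    """Effective Max cost per 1,000 invocations at a given volume."""
    return MAX_MONTHLY_FEE_GBP / invocations_per_month * 1000


def break_even_invocations():
    """Monthly volume at which the flat Max fee equals the API bill."""
    return MAX_MONTHLY_FEE_GBP / (API_COST_PER_1000_GBP / 1000)


def cheaper_option(invocations_per_month):
    api_cost = invocations_per_month * API_COST_PER_1000_GBP / 1000
    return "max" if MAX_MONTHLY_FEE_GBP < api_cost else "api"


print(round(max_cost_per_1000(MONTHLY_INVOCATIONS), 2))  # 17.56
print(round(break_even_invocations()))                   # 2194
print(cheaper_option(MONTHLY_INVOCATIONS))               # max
```

The outputs line up with the rounded figures in the text: about £18 per 1,000 on Max at our volume, and a break-even near 2,200 invocations per month.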

When the ecosystem economics flip in your favour. The marginal cost of each new Skill in an existing brand ecosystem drops fast. The first Skill carries the cost of building the brand-reference layer (voice file, taxonomy, hard rules, audience profile, SEO pillars). Skill 2 in the same brand reuses all of that and only needs its own SKILL.md and validation. By Skill 10 the brand-reference layer is paying back across every Skill in the ecosystem. The architectural investment is one-time; the architectural payoff compounds.

The stack: Claude Code as the development surface, Claude Desktop and Cowork as the runtime, Postgres (Northflank) as the system of record, GitHub as the deployment artefact. No bespoke infrastructure. Everything composable, everything inspectable.

Takeaway

Claude Skills are not better prompt templates. They are a different category. The capability difference (runtime file reads, executable code, programmatic validation) is what enables brand-aligned output instead of personality-prompted drift.

If you are an SMB founder writing your own marketing content with Claude, the value of building Skills is real but slow. The infrastructure pays back at roughly the tenth Skill, not the first. Start with one Skill that loads your brand voice file at runtime and refuses to draft if it cannot read it. Watch what changes.

If you are a mid-market marketing team running marketing through Claude already, the failure mode you are probably hitting is voice drift on edge cases and silent Skill fallbacks. Audit your Skills for both patterns. The Skill that "works most of the time" is the one that is leaking generic Claude output into pieces that should sound like you.

The thing about Skills nobody warns you about: you cannot Skill your way to a good Skill. Skill-writing is the human-judgment-required step. Every Skill we ship was written, edited, and reviewed by a human after the AI drafted it. The architectural pattern is the contribution. The words have to be ours.

Built with Claude

This post was produced using Claude as a research, drafting, and editing partner.

  • Models: claude-opus-4-6 for drafting, claude-sonnet-4-6 for editing and fact-checking
  • Workflow: content-brief Skill, claim-evidence-binding Skill, content-drafter Skill, humanise-content Skill, human structural edit, fact-check, final review
  • Word count: approximately 2,140 words
  • Human review: [editor name] / Alexander (final)

For more on how RP produces content with Claude at production scale, see the Claude for Marketing pillar hub -- the central hub for everything we publish on Claude in marketing operations.

Frequently asked

What is a Claude Skill?
A Skill is a folder of instructions, scripts, and resources that Claude discovers and loads dynamically when relevant to a task. Skills run in Claude's VM environment with filesystem access and executable code, which is what separates them from prompt templates. Prompt templates are variable placeholders within a single conversation. Skills persist and can include multiple files (Anthropic, "Skills explained", 2025).
When should marketers build a custom Claude Skill instead of using a public one?
When the public Skill ships brand drift you cannot fix with personality prompts. Most public marketing Skills are prompt-template style, which means they read like any other SaaS blog by paragraph five. If your brand has a strong voice that needs to survive into the output, you need a Skill that reads your voice profile at runtime and validates against it. That is custom.
How long does it take to build one brand-aligned Skill?
It varies a lot by Skill complexity. A simple read-only status reader takes substantially less effort than a multi-step Ask-mode Skill with human review loops. The pattern that holds across all Skills: the first Skill in a brand carries the cost of building the brand-reference layer; every Skill after that is much faster because the references already exist. Time to build is rarely the bottleneck. Skill-writing judgment is.
Do Skills work for non-English brand voices?
We have not tested this at production scale. The reference-load architecture should work in any language; Claude's underlying language handling is the open question. If you try it, the pattern to watch is whether Claude maintains the voice across longer outputs in the target language as reliably as it does in English.
What happens if a Skill cannot read its brand reference file?
Our Skills fail loudly. The output is no output, with a clear log entry. Silent fallback to generic Claude is the failure mode that will quietly degrade your brand over months. Worth the occasional "the system did not run" message in exchange for never publishing off-brand output.
