
LLM optimization: a practitioner’s guide

Xander Sebastian

LLM optimization is pre-category but already measurable. Here are 7 specific structural patterns correlated with LLM citation, tested against real LLM responses.

11 min read

LLM optimization is the practice of structuring content, schema, and page architecture to maximise the likelihood of citation by large language models. Not to rank in traditional search. Not to game an algorithm. To make your content the thing a language model reaches for when a user asks a question your page can answer.

The term barely exists yet. Search volume for "llm optimization" has grown 125% year-on-year, but most of the content ranking for it is either speculation from SEO consultants who haven't tested anything, or generic "write better content" advice that could have been published in 2019. The category hasn't settled on a definition, which means whoever defines it credibly gets to own the frame.

This is our attempt. We've spent the past several months studying which content structures correlate with LLM citation across ChatGPT, Perplexity, Gemini, and Claude — running the same queries repeatedly, tracking which pages get cited, and reverse-engineering the structural patterns those pages share. This piece documents the seven patterns that show up consistently and the four that don't work despite looking like they should.

If you've read our companion piece, "LLM visibility: what it is and how to measure it," you already know how to measure whether your content is being cited. This piece is the tactical follow-up: what to actually do about it.

Why LLM optimization is not the same thing as SEO

SEO and LLM optimization share surface-level similarities. Both care about content quality, topical authority, and structured data. But they optimise for different retrieval mechanisms, and confusing the two leads to wasted effort.

Traditional SEO optimises for a crawler-indexer-ranker pipeline. Googlebot crawls your page, indexes it, and ranks it against competing pages for a query based on hundreds of signals — backlinks, page speed, keyword relevance, user engagement. The output is a ranked list of blue links.

LLM citation works differently. When a model like ChatGPT or Perplexity answers a question, it either draws from its training data or retrieves content in real time through a search-augmented generation pipeline. In the retrieval case, the model fetches candidate pages, extracts relevant passages, and synthesises an answer — often citing the source. The "ranking" isn't a sorted list. It's a selection decision: does this passage answer the user's question clearly enough that the model can extract and attribute it?

That selection decision rewards different structural properties than traditional ranking does.

| Property | SEO ranking | LLM citation |
| --- | --- | --- |
| Backlinks | Major signal | Not directly relevant to passage extraction |
| Page speed | Ranking factor | Irrelevant once page is fetched |
| Keyword density | Moderate signal | Less relevant than definitional clarity |
| Standalone definitions | Nice to have | Critical: models extract complete statements |
| Structured data (schema) | Rich snippets, knowledge graph | Helps models identify entity types and relationships |
| Numerical specificity | Good for E-E-A-T | Essential: models prefer citable numbers with context |
| Content freshness | Time-decay signal | Recency of data matters; page age less so |

Both matter. A page that ranks well in traditional search is more likely to be retrieved by an LLM's search step. But ranking alone only gets you retrieved; a page that ranks well and is also structurally optimised for extraction is the one more likely to be cited. The first gets you into the candidate set. The second gets you into the answer.

The seven structural patterns that correlate with LLM citation

These are the patterns we see consistently in pages that get cited by LLMs across multiple models and query types. We identified them by running hundreds of queries through ChatGPT, Perplexity, Gemini, and Claude, recording which source URLs appeared in citations, then analysing what those cited pages had in common structurally.

None of this is guaranteed causation — correlation from systematic observation is the honest framing. But the patterns are consistent enough across models and query types that they're worth implementing and measuring.

1. Standalone definitions in the opening sentences

Pages that get cited for definitional queries almost always contain a complete, self-contained definition within the first two sentences of the relevant section. Not a definition buried in a paragraph of context. A sentence that works as a standalone answer.

The reason is mechanical. When an LLM extracts a passage to cite, it pulls a contiguous chunk of text. If your definition is tangled with qualifiers, history, or throat-clearing, the model either skips it or extracts a passage that doesn't cleanly answer the query.

Weak: "Over the past few years, as artificial intelligence has become more prevalent in marketing, the concept of return on ad spend has evolved significantly. ROAS, which stands for return on ad spend, is generally considered to be one of the most important metrics in digital advertising."

Strong: "ROAS — return on ad spend — is the ratio of revenue generated to money spent on advertising. If you spend £1,000 on ads and generate £3,500 in attributable revenue, your ROAS is 3.5x."

The strong version is extractable. The weak version requires the model to do surgery.

2. FAQPage schema with three or more Q&A pairs

Pages carrying FAQPage structured data with at least three question-answer pairs show up in LLM citations more frequently than equivalent pages without it. This pattern is strongest for informational queries where the user is asking a direct question.

The likely mechanism: LLMs with retrieval capabilities parse structured data as part of their extraction pipeline. FAQPage schema pre-segments your content into question-answer units — exactly the format a model needs when answering a user's question. You're doing the extraction work for the model.

The schema needs to reflect real content. The questions need to be questions people actually ask, and the answers need to be substantive (40–80 words, not one-liners). Google's own structured data guidelines require that FAQ content be visible on the page, and LLM retrieval systems benefit from the same constraint.
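
As a minimal sketch of those constraints, the Python below builds FAQPage JSON-LD from a list of question-answer pairs and refuses to emit markup when there are fewer than three pairs or an answer falls outside the 40 to 80 word range. The function name `build_faq_jsonld` and the placeholder questions are our own illustrative assumptions; the `FAQPage`, `Question`, `Answer`, `mainEntity`, and `acceptedAnswer` names come from schema.org.

```python
import json

MIN_PAIRS = 3                    # threshold suggested in this article
MIN_WORDS, MAX_WORDS = 40, 80    # answer length range suggested in this article


def build_faq_jsonld(pairs):
    """Return FAQPage JSON-LD for (question, answer) pairs, or raise ValueError."""
    if len(pairs) < MIN_PAIRS:
        raise ValueError(f"need at least {MIN_PAIRS} Q&A pairs, got {len(pairs)}")
    entities = []
    for question, answer in pairs:
        n_words = len(answer.split())
        if not MIN_WORDS <= n_words <= MAX_WORDS:
            raise ValueError(f"answer to {question!r} has {n_words} words")
        entities.append({
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": entities,
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Placeholder text standing in for real, visible on-page answers.
    placeholder = " ".join(["word"] * 50)
    demo = [(f"Hypothetical question {i}?", placeholder) for i in range(1, 4)]
    print(build_faq_jsonld(demo))
```

Failing loudly on thin input is a deliberate choice here: it is easier to notice a missing schema block than a technically valid one wrapped around one-line answers.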

3. DefinedTerm schema for glossary-style content

For content that defines a concept, term, or methodology, adding DefinedTerm schema from schema.org provides an explicit signal that this page is an authoritative definition. Models parsing structured data can identify the page's purpose without relying solely on content analysis.

This pattern is particularly effective when combined with Pattern 1 (standalone definitions). The schema tells the model "this page defines X," and the opening sentences deliver a clean, extractable definition.

Here's what the markup looks like:

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "LLM Optimization",
  "description": "The practice of structuring content, schema, and page architecture to maximise the likelihood of citation by large language models.",
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "name": "AI Marketing Glossary"
  }
}
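
One common way to ship this markup is inside a `<script type="application/ld+json">` element in the page's HTML. The sketch below renders the same object from Python. The helper name `jsonld_script_tag` is hypothetical, and escaping `</` is a precaution we have added so that text inside the JSON cannot close the script element early.

```python
import json


def jsonld_script_tag(data):
    """Serialise a dict as JSON-LD wrapped in a script element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Precaution: stop "</script>" inside a string value from ending the element.
    # "\/" is a valid JSON escape for "/", so the data is unchanged when parsed.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


defined_term = {
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    "name": "LLM Optimization",
    "description": (
        "The practice of structuring content, schema, and page architecture "
        "to maximise the likelihood of citation by large language models."
    ),
    "inDefinedTermSet": {
        "@type": "DefinedTermSet",
        "name": "AI Marketing Glossary",
    },
}

print(jsonld_script_tag(defined_term))
```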

4. Section headings that answer specific questions

LLM retrieval systems use headings to segment content into topical chunks. A heading that mirrors the structure of a user query creates a direct mapping between what the user asked and where the answer lives.

Weak headings: "Understanding Attribution," "The Importance of Data," "Key Considerations"

Strong headings: "What multi-touch attribution gets wrong," "How to calculate ROAS when your data is messy," "Why last-click attribution understates the value of content marketing"

The strong headings do two things. First, they help the model's chunking step identify the relevant section. Second, they often match or closely mirror the natural language queries users type into LLM interfaces — which skew conversational compared to traditional search queries.

5. Numerical data with units, context, and dates

LLMs cite pages with specific numbers more readily than pages with vague quantitative claims. But the number alone isn't enough — the model needs the units, the context, and ideally a date to assess relevance.

Weak: "Companies that adopt AI marketing see significant improvements in output."

Strong: "HubSpot's 2024 State of Marketing report found that 84% of marketers using AI create content more efficiently, though 60% expressed concern that AI-generated content could harm their brand through bias, plagiarism, or misalignment with brand values."

The strong version names the source, gives specific percentages, and includes the trade-off (efficiency gains vs. brand risk) that makes the claim honest. A model can extract this passage and present it with attribution. The key: specific number, named source, date, and a nuance that signals the author isn't cherry-picking.

Numbers without sources are worse than no numbers. LLMs are increasingly trained to prefer attributed data, and retrieval systems can cross-reference claims against known sources. Unattributed statistics look like fabrication — because often they are.

6. Worked examples with concrete figures

When content explains a methodology or formula, a worked example with specific numbers makes citation far more likely. The reason: worked examples are self-contained, verifiable, and directly answer "how does this work in practice?" queries.

Take attribution modelling. A page that explains linear attribution abstractly ("the credit is distributed equally across all touchpoints") gets outperformed by a page that shows the arithmetic:

A customer journey has four touchpoints: a blog post, a paid ad, an email, and a direct visit. The deal closes at £10,000. Under linear attribution, each touchpoint receives £2,500 in credit. Under time-decay with a 7-day half-life, where each touchpoint's weight is 0.5 raised to the power of (days before close ÷ 7) and the weights are normalised to sum to 1, the direct visit (day 0) receives about £4,465, the email (day 3) about £3,318, the paid ad (day 10) about £1,659, and the blog post (day 21) about £558.

The worked example gives the model a passage it can cite verbatim. Abstract explanations require the model to synthesise — and when it synthesises, it often doesn't cite.
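
The arithmetic in that journey can be sketched in a few lines of Python. This assumes the common exponential form of time-decay, where a touchpoint's weight is `0.5 ** (days / half_life)` and the weights are normalised so the credits sum to the deal value. Other decay curves exist and give different splits, so treat the function names and the formula choice as our assumptions, not a standard.

```python
def linear_attribution(deal_value, touchpoints):
    """Split deal_value equally across touchpoints."""
    share = deal_value / len(touchpoints)
    return {name: share for name in touchpoints}


def time_decay_attribution(deal_value, days_before_close, half_life_days=7.0):
    """Split deal_value with weights that halve every half_life_days.

    Assumes weight = 0.5 ** (days / half_life); other decay curves exist.
    """
    weights = {
        name: 0.5 ** (days / half_life_days)
        for name, days in days_before_close.items()
    }
    total = sum(weights.values())
    return {name: deal_value * w / total for name, w in weights.items()}


# Touchpoint -> days before the deal closed (figures from the journey above).
journey = {"direct visit": 0, "email": 3, "paid ad": 10, "blog post": 21}

for name, credit in time_decay_attribution(10_000, journey).items():
    print(f"{name}: £{credit:,.0f}")
```

With a 7-day half-life, the blog post at day 21 sits exactly three half-lives back, so its weight is one eighth of the direct visit's. That kind of sanity check is what makes a worked example verifiable.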

7. Internal links within a topical cluster

Pages that link to a cluster of related definitions, explanations, and implementations within the same domain get cited more frequently than isolated pages on the same topic. This pattern maps to how LLMs assess topical authority during retrieval.

If a page about "LLM optimization" links to related pages on "AI Overview citations," "DefinedTerm schema," "content structure for AI search," and "LLM visibility measurement" — all within the same domain — the retrieval system can map a topical cluster. The page isn't a one-off. It's part of a body of work.

This is the LLM equivalent of topical authority in traditional SEO, and it's built the same way: by publishing comprehensive, interlinked content within a defined subject area over time.
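
To see how well a page is wired into its cluster, a rough audit can list the same-host links it contains. The sketch below uses only the Python standard library. The class and function names, the sample HTML, and the `example.com` URLs are all hypothetical, and "internal" is assumed to mean "same host as the page".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(html, page_url):
    """Return the sorted set of same-host URLs linked from the page."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(parsed._replace(fragment="").geturl())
    # A link back to the page itself (e.g. "#top") does not count.
    found.discard(urlparse(page_url)._replace(fragment="").geturl())
    return sorted(found)


sample_html = """
<p>See <a href="/glossary/definedterm-schema">DefinedTerm schema</a>,
<a href="/blog/llm-visibility">LLM visibility</a> and
<a href="https://schema.org/FAQPage">schema.org</a>.</p>
"""
print(internal_links(sample_html, "https://example.com/blog/llm-optimization"))
```

A count of zero or one on a page you consider a pillar is a reasonable prompt to go and add the missing links.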

A structural template for LLM-optimised content

If you're writing a definitional or explanatory post from scratch and want to maximise the chance of LLM citation, this template combines the seven patterns into a single structure:

Opening paragraph (Pattern 1): Lead with a standalone, one-to-two-sentence definition. No preamble, no context-setting, no history. Definition first, context second.

Section 2 — Context and significance (Pattern 5): Why this matters, grounded in specific data with sources and dates.

Section 3 — How it works (Pattern 6): The methodology or process explained with a worked example. Concrete numbers. Show the arithmetic.

Section 4 — Common mistakes or misconceptions (Pattern 4): H2s framed as specific questions or claims. Each subsection is a self-contained answer.

Section 5 — Practical implementation (Patterns 2, 3): Add FAQPage and DefinedTerm schema. Include three to five real questions with substantive answers.

Internal links throughout (Pattern 7): Link to related content within your domain. Build the cluster.

Schema markup: Article + BreadcrumbList as baseline. Add FAQPage for any post with a Q&A section. Add DefinedTerm for any post that defines a concept.

This isn't a rigid formula. Some posts won't have a worked example. Some won't need DefinedTerm schema. The point is that each pattern is a tool — use the ones that fit the content, and use them deliberately.

What doesn't work: four anti-patterns

Not everything that looks like it should help LLM citation actually does. These four approaches either have no observable effect or actively hurt.

Keyword density manipulation. Traditional keyword stuffing doesn't help with LLM citation any more than it helps with modern SEO. LLMs process semantic meaning, not keyword frequency. A page that naturally covers a topic will contain the relevant terms. Artificially inflating density adds noise without adding signal.

Generic "answer box optimisation." Formatting every paragraph as a potential featured snippet (short paragraph, bolded question, 40-word answer) creates content that's structurally monotonous and often shallow. LLMs don't extract featured-snippet-style fragments the way Google's answer box does. They extract passages — which means depth and context matter more than format.

Schema markup overload. Adding every possible schema type to a page doesn't help and can hurt. If a page carries Article, FAQPage, HowTo, DefinedTerm, and Review schema but the content doesn't actually contain all of those things, the structured data becomes misleading. Use the schema types that accurately describe your content. Nothing more.

Excessive subheadings without substance. Breaking a 2,000-word post into 20 subsections of 100 words each creates content that's too fragmented for meaningful passage extraction. Each section needs enough depth (150–300 words minimum) for the model to extract a useful passage. Short sections get skipped.
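
The last anti-pattern is easy to check mechanically. The sketch below assumes your draft is stored as Markdown with `#`-style headings (an assumption about your workflow, not a requirement) and flags sections whose body falls under the 150-word floor suggested above. The function names are our own.

```python
import re

MIN_WORDS = 150  # lower bound suggested in this article


def section_word_counts(markdown_text):
    """Return [(heading, word_count)] for each heading-delimited section."""
    sections = []
    heading, words = None, 0
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            if heading is not None:
                sections.append((heading, words))
            heading, words = match.group(1).strip(), 0
        elif heading is not None:
            words += len(line.split())
    if heading is not None:
        sections.append((heading, words))
    return sections


def thin_sections(markdown_text, min_words=MIN_WORDS):
    """Return headings whose body text is shorter than min_words."""
    return [h for h, n in section_word_counts(markdown_text) if n < min_words]


# Synthetic draft: one 200-word section and one two-word section.
sample = "## What is ROAS?\n" + "word " * 200 + "\n## Key considerations\nToo short.\n"
print(thin_sections(sample))  # prints ['Key considerations']
```

One caveat: each heading is counted on its own, so a parent heading that contains only subheadings will be flagged even when its children are substantial.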

How to test whether your content is being cited

There's no equivalent of Google Search Console for AI answers yet, so testing LLM citation is manual and imperfect. But a basic workflow gets you most of the way.

Pick five to ten queries that your content should be able to answer. Run them through ChatGPT (with web browsing enabled), Perplexity, and Gemini. Record whether your page appears in the citations, what passage was extracted, and how accurately it was represented.

Do this before implementing the seven patterns (baseline), then again two to four weeks after. The comparison won't be scientifically rigorous — too many variables, too small a sample — but it will tell you whether citation rates moved in the right direction.
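
A spreadsheet is enough for this, but if you prefer code, the comparison can be sketched as below. Each record notes a query, the model it was run through, and whether your page was cited. The `Observation` type, the helper names, and every record in the example are placeholders for illustration, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    query: str
    model: str      # e.g. "ChatGPT", "Perplexity", "Gemini"
    cited: bool     # did our page appear in the citations?


def citation_rate(observations):
    """Fraction of observations in which the page was cited (0.0 if empty)."""
    if not observations:
        return 0.0
    return sum(o.cited for o in observations) / len(observations)


def rate_by_model(observations):
    """Citation rate per model."""
    by_model = {}
    for o in observations:
        by_model.setdefault(o.model, []).append(o)
    return {m: citation_rate(obs) for m, obs in by_model.items()}


# Placeholder records for illustration only, not measured results.
baseline = [
    Observation("what is llm optimization", "ChatGPT", False),
    Observation("what is llm optimization", "Perplexity", False),
    Observation("what is llm optimization", "Gemini", False),
    Observation("llm optimization vs seo", "Perplexity", True),
]
follow_up = [
    Observation("what is llm optimization", "ChatGPT", False),
    Observation("what is llm optimization", "Perplexity", True),
    Observation("what is llm optimization", "Gemini", False),
    Observation("llm optimization vs seo", "Perplexity", True),
]

change = citation_rate(follow_up) - citation_rate(baseline)
print(f"baseline {citation_rate(baseline):.0%}, "
      f"follow-up {citation_rate(follow_up):.0%}, change {change:+.0%}")
```

Keeping the per-model breakdown matters because a gain in one model can hide a loss in another when you only look at the overall rate.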

For a more systematic approach, tools like DataForSEO's AI Optimization API can track LLM mentions of your domain across models over time. We covered the measurement framework in detail in our companion piece on LLM visibility measurement.

A note on data: if you're using API-based tools to track LLM citations at scale, the same privacy and data-handling standards apply as with any analytics tooling. Review your provider's data processing terms, especially if you're tracking queries that could contain personal data. GDPR and similar regulations don't carve out exceptions for AI monitoring.

Most people publishing content for LLM citation have no idea whether it's working. Even rough measurement puts you ahead of that.

What to do this week

If you want to start implementing this without a multi-month project:

Pick your ten highest-traffic pages. For each one, check whether the opening paragraph contains a standalone definition of the page's core topic. If it doesn't, rewrite the first two sentences so it does. This is Pattern 1 — the single highest-leverage change — and it takes about fifteen minutes per page.

Then add FAQPage schema to any page that contains or could contain a natural Q&A section. Three questions minimum, 40–80 words per answer. Most content management systems have plugins or fields for this.

Measure your baseline before you start (run your target queries through ChatGPT, Perplexity, and Gemini, record the citations), then check again in two to four weeks.

The seven patterns aren't a checklist you implement once and forget. They're a set of structural principles that should inform how you write and structure content going forward — the same way keyword research informs what you write about. Build them into your editorial process, measure the results, and iterate.

Built with Claude

This post was produced using Claude as a research, drafting, and editing partner.

  • Models: Claude Opus 4.6 for drafting and structural editing, Claude Sonnet 4.6 for fact-checking and schema examples
  • Workflow: brief review → research synthesis → outline → draft → humanise pass → hard-rules validation → final review
  • Production time: [TBD]
  • Word count: [TBD]
  • Human review: Alexander (final)

For more on how we produce content with Claude at production scale, see our Claude for Marketing hub.

Frequently asked

What is LLM optimization?
LLM optimization is the practice of structuring content, schema, and page architecture to maximise the likelihood of citation by large language models. It differs from traditional SEO in that it optimises for passage extraction and attribution rather than search engine ranking positions.

Is LLM optimization the same as SEO?
No. SEO optimises for crawler-indexer-ranker pipelines: backlinks, page speed, keyword relevance. LLM optimization optimises for retrieval-augmented generation pipelines: definitional clarity, structured data, and passage extractability. The two overlap (a page that ranks well is more likely to be retrieved) but reward different structural properties.

How do I know if my content is being cited by LLMs?
Run target queries through ChatGPT (with web browsing), Perplexity, and Gemini. Record whether your pages appear in citations. For systematic tracking, tools like DataForSEO's AI Optimization API can monitor LLM mentions of your domain across models over time.

Which structured data types help with LLM citation?
FAQPage schema (with three or more substantive Q&A pairs) and DefinedTerm schema (for content that defines a concept) show the strongest correlation with citation rates. Article and BreadcrumbList are baseline requirements. Avoid adding schema types that don't accurately describe your content.

How long does it take to see results from LLM optimization?
Changes to content structure and schema can affect LLM citation within two to four weeks for models with real-time retrieval (like Perplexity and ChatGPT with browsing). For models relying primarily on training data, the effect depends on the next training data refresh cycle, which varies by provider.

