Schema markup and platform tradeoffs

Schema markup is a recognized best practice for helping LLMs and search engines understand and cite web content. Cited’s docs site does not currently implement custom JSON-LD schema beyond what our hosting platform provides by default. This page gives an honest account of why, what we do instead, and when this may change.

Why not schema markup (yet)

Cited’s docs are built on Mintlify, a documentation platform we chose for its built-in llms.txt generation, clean design, and operational simplicity. Mintlify renders pages with React Server Components (RSC). Production testing showed that custom JSON-LD injected through MDX components does not survive the RSC serialization pipeline. The schema content ends up inside the JavaScript RSC payload instead of appearing as a parseable <script type="application/ld+json"> element in the HTML. We evaluated several workarounds:

Frontmatter fields — Mintlify does not currently support a structuredData or jsonLd frontmatter field
Config-based injection — Mintlify’s docs.json does not support a customHead or head option for arbitrary HTML
Build-time scripts — Mintlify does not expose prebuild hooks that would allow injecting schema into the generated HTML
Google Tag Manager — GTM-injected schema executes client-side via JavaScript, which means most AI crawlers (GPTBot, ClaudeBot, PerplexityBot) would not see it. GTM would have partially solved the problem for traditional search engines at the cost of added complexity, without reaching our primary use case.

We chose not to ship a partial solution that adds complexity without reaching the crawlers we care about. Instead, we lean harder on what already works well.
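
A check in the spirit of the production test described above asks one question: does the HTML a crawler downloads contain a parseable <script type="application/ld+json"> element? The sketch below answers that question with Python's standard library. It assumes you have already fetched the raw HTML, for example with curl. JsonLdExtractor and find_jsonld are hypothetical names, and this is not the tooling we actually used. JSON-LD that exists only inside a serialized RSC payload, or that Google Tag Manager adds at runtime, comes back as missing. That matches what a crawler that does not execute JavaScript would see.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_jsonld(html: str) -> list:
    """Return the JSON-LD objects a non-JavaScript crawler could parse from raw HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for block in parser.blocks:
        try:
            found.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed JSON is skipped, as a strict consumer would skip it
    return found
```

Suppose you run this against a saved copy of a page and get an empty list. That means a non-rendering crawler sees no structured data at all, whatever a browser's DOM inspector shows after JavaScript has run.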

What we do instead

Four mechanisms carry the discoverability load without requiring custom schema.

Auto-generated llms.txt and llms-full.txt

Mintlify generates /llms.txt, a comprehensive listing of pages, and /llms-full.txt, the full Markdown content of every page. Both files are served at the domain root and referenced via HTTP Link headers on every page response. This is a strong mechanism for LLM discovery and on-demand content retrieval. It is arguably more directly useful for AI search than JSON-LD.
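
To illustrate the Link header mechanism, the sketch below parses an HTTP Link header value (the RFC 8288 format) and maps each relation type to an absolute URL. The relation names used in the test values (llms-txt, llms-full-txt) are assumptions made for illustration. Inspect a real response, for instance with curl -I, to see the exact values Mintlify sends. The parser is deliberately simple. It handles the common shape but is not a complete RFC 8288 implementation.

```python
import re
from urllib.parse import urljoin


def parse_link_header(value: str, base_url: str) -> dict[str, str]:
    """Map each rel value in an HTTP Link header to an absolute URL.

    Handles the common '<target>; rel="x", <target>; rel="y"' shape only.
    """
    links: dict[str, str] = {}
    # Split between link-values: a comma followed by the next "<target>".
    for part in re.split(r",\s*(?=<)", value.strip()):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if match is None:
            continue
        target, params = match.groups()
        for param in params.split(";"):
            key, _, raw = param.strip().partition("=")
            if key.strip().lower() == "rel":
                # rel may carry several space-separated relation types.
                for rel in raw.strip().strip('"').split():
                    links[rel.lower()] = urljoin(base_url, target)
    return links
```

Relative targets such as /llms.txt are resolved against the page URL. This resolution step is why the files can be found from any page on the site.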

Textual citability patterns

The content on every page follows a deliberate citability structure:

Definition-first ledes — first paragraph is a Wikipedia-style definition of the page subject
Entity-first sentences — brand names and concepts appear before numbers and data
Table pre-summaries — natural-language sentences summarize table content before the table itself
Self-contained FAQ answers — each answer reads standalone without requiring the question for context
Explicit comparisons — both values appear in the same sentence when comparing

These patterns are enforced by the docs-auditor agent on every page before publication.
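
This page does not describe the docs-auditor agent's internals. Still, a single rule such as table pre-summaries can be approximated with a simple lint pass. The sketch below is hypothetical and is not the auditor's code. It scans Markdown source and flags any table whose nearest preceding non-blank line is not a prose sentence. It treats headings, list items, block quotes, and other table rows as non-prose, and it relies on crude heuristics.

```python
def tables_missing_summary(markdown: str) -> list[int]:
    """Return 1-based line numbers of Markdown tables not preceded by a prose sentence."""
    lines = markdown.splitlines()
    flagged: list[int] = []
    for i, line in enumerate(lines):
        is_table_row = line.lstrip().startswith("|")
        prev_is_table = i > 0 and lines[i - 1].lstrip().startswith("|")
        if not is_table_row or prev_is_table:
            continue  # only inspect the first row of each table
        # Walk back to the nearest non-blank line before the table.
        j = i - 1
        while j >= 0 and not lines[j].strip():
            j -= 1
        prev = lines[j].strip() if j >= 0 else ""
        # Heuristic: prose is a non-heading, non-list line ending in "." or ":".
        is_prose = (
            prev != ""
            and not prev.startswith(("#", "-", "*", "|", ">"))
            and prev.endswith((".", ":"))
        )
        if not is_prose:
            flagged.append(i + 1)
    return flagged
```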

Clean HTML structure

Mintlify renders semantic HTML with proper heading hierarchy, table markup, code blocks, and internal link structure. AI crawlers receive the same full content as human browsers, verified via user-agent-specific crawl tests.
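
A user-agent-specific crawl test can be sketched as follows: fetch the same URL under several User-Agent headers, then compare the text each response contains. The sketch below is illustrative and is not our actual test harness. The user-agent strings are shortened placeholders, since real crawler strings are longer and change over time. The HTML-to-text step is a crude regex normalization. The fetcher is injectable, so the comparison can be exercised without network access.

```python
import re
import urllib.request
from typing import Callable

# Shortened placeholder User-Agent values (assumptions, not the crawlers' full strings).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "gptbot": "GPTBot",
    "claudebot": "ClaudeBot",
}


def fetch(url: str, user_agent: str) -> str:
    """Fetch a page's raw HTML with a given User-Agent header (no JavaScript runs)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def visible_text(html: str) -> str:
    """Crude normalization: drop script/style bodies and tags, collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


def crawler_parity(url: str, fetcher: Callable[[str, str], str] = fetch) -> dict[str, bool]:
    """Report, per crawler user agent, whether it gets the same text as a browser."""
    baseline = visible_text(fetcher(url, USER_AGENTS["browser"]))
    return {
        name: visible_text(fetcher(url, ua)) == baseline
        for name, ua in USER_AGENTS.items()
        if name != "browser"
    }
```

Spoofing a User-Agent header only tests behavior that keys on that header. Services that identify crawlers by IP address can still treat the real GPTBot or ClaudeBot differently, so server logs from the actual crawlers remain the stronger evidence.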

Comprehensive crawler allowlist

The robots.txt file explicitly allows all major AI crawlers (GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, and xAI's crawler) alongside traditional search engine crawlers.
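
Python's built-in urllib.robotparser can check whether a robots.txt file actually admits each crawler. The robots.txt text below is a hypothetical example, not a copy of our file. xAI's crawler is left out because this sketch does not assume its exact user-agent token. Note that urllib.robotparser follows the classic robots.txt rules: it uses the first matching group and evaluates rules in file order. It does not use the longest-match precedence that Google applies.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens from the allowlist above; xAI's token is deliberately omitted.
AI_CRAWLERS = [
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
]

# Hypothetical robots.txt for illustration; not a copy of our actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the crawler tokens that this robots.txt would stop from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]
```

An empty result for a representative docs URL confirms the allowlist. A non-empty result names exactly which crawler would be turned away.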

What we lose by not having custom schema

To be transparent about the tradeoffs:

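Google’s rich results for FAQ pages (the expandable Q&A displayed in search results) are not available for our docs, although since 2023 Google has shown them only for well-known, authoritative government and health sites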
The DefinedTerm entity-graph signal that helps Google’s Knowledge Graph is not present
A machine-readable signal that declares each page’s content type (for example, FAQ or glossary term) and reinforces its authority is missing

These are real losses. We accept them because the alternative — a GTM-based workaround that adds dual-maintenance complexity without reaching AI crawlers — was not worth the cost.
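
For reference, the markup behind the first two losses uses schema.org's FAQPage, Question, Answer, and DefinedTerm types. The sketch below builds both objects as Python dictionaries and serializes them to JSON-LD. The helper names, the sample question, and the glossary URL are hypothetical. Generating this JSON is the easy part. The obstacle described earlier on this page is getting it into the served HTML as a script element.

```python
import json


def faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def defined_term(name: str, description: str, term_set_url: str) -> dict:
    """Build a schema.org DefinedTerm object for a glossary-style entry."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "inDefinedTermSet": term_set_url,  # URL of the hypothetical glossary
    }


if __name__ == "__main__":
    # Hypothetical content; a real page would draw these from its own text.
    faq = faq_page([("Does Cited's docs site use JSON-LD?", "Not yet; see the schema markup tradeoffs page.")])
    print(json.dumps(faq, indent=2))
```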

When this may change

We will revisit this decision when any of the following happens:

Mintlify adds native JSON-LD support. This is the most likely path: the platform evolves regularly, and structured data is a common request from technical docs sites.
A new mechanism emerges. Server-side injection patterns, edge workers, or other approaches may become viable over time.
The tradeoff calculation changes. If schema markup becomes demonstrably critical for AI citation beyond its current role as one signal among several, the GTM workaround or a platform migration may become worth the cost.
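
The server-side or edge-worker route mentioned above would rewrite each HTML response before it leaves the server. That way, the JSON-LD is present in the markup that non-JavaScript crawlers download. The function below is a platform-neutral sketch of that rewrite as a plain string transform. It is not Mintlify's API or any specific edge provider's, and the name inject_jsonld is hypothetical. In practice, a step like this would presumably run in a proxy or edge function placed in front of the docs site.

```python
import json
import re


def inject_jsonld(html: str, data: dict) -> str:
    """Insert a JSON-LD <script> element just before </head> in an HTML document."""
    # Escape "<" so a value containing "</script>" cannot close the tag early;
    # "\u003c" is a valid JSON escape, so parsers still read the original text.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    tag = f'<script type="application/ld+json">{payload}</script>'
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return html  # no </head> to extend; leave the response untouched
    return html[: match.start()] + tag + html[match.start():]
```

Because the transform runs before the response is sent, the result is ordinary server-rendered markup. It does not depend on the RSC payload or on client-side JavaScript.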

We track this publicly so readers know the state of play. If we implement schema markup, this page will be updated to reflect it.

Related concepts

What sources LLMs cite — the broader framework of how LLMs select content to cite
Benchmarks methodology — how the Cited Index data is constructed
Citations vs mentions — the basic vocabulary distinction
