Methodology
Cited checks for /llms.txt at your domain root and, if found, grades the file across five content sections. The signal scores out of 8: 4 points for the file existing with content, plus up to 4 bonus points for substantive structure.
We fetch from your domain root first, then try the www or non-www variant if the primary fails. An HTTP 200 with non-empty body counts as found. Any other response (404, 500, timeout) means the file is missing — the signal returns 0/8 with a recommendation to create one.
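The lookup order above can be sketched roughly as follows. This is an illustration under stated assumptions, not Cited's actual code: the names candidate_urls and find_llms_txt are hypothetical, and fetch is an assumed callable that returns (status, body) or None on a timeout or connection error.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical fetcher type: returns (status_code, body_text),
# or None on a timeout or connection error.
Fetch = Callable[[str], Optional[Tuple[int, str]]]


def candidate_urls(site: str) -> List[str]:
    """Return the primary /llms.txt URL followed by its www/non-www variant."""
    host = urlsplit(site).netloc or site.strip("/")
    # Swap between the apex host and the www host.
    other = host[4:] if host.startswith("www.") else "www." + host
    return [f"https://{host}/llms.txt", f"https://{other}/llms.txt"]


def find_llms_txt(site: str, fetch: Fetch) -> Optional[str]:
    """Return the body of the first candidate that counts as found, else None."""
    for url in candidate_urls(site):
        result = fetch(url)
        if result is None:
            continue  # timeout or connection error: try the next variant
        status, body = result
        # HTTP 200 with a non-empty body counts as found.
        # Assumption: a whitespace-only body is treated as empty.
        if status == 200 and body.strip():
            return body
    return None  # missing: the signal would report 0/8
```

Passing the fetcher in as an argument keeps the decision logic separate from the network call, so the fallback order can be exercised without touching a real site.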
When the file exists, we grade five sections used by AI models to understand your brand:
Company description. At least 2 lines of prose (over 40 characters each) describing what your company does. This is what AI models cite when asked “what is BrandName?”
Product or service list. At least 2 enumerated items (markdown lists or numbered lists). This is what AI models cite when asked “what does BrandName sell?”
Key content URLs. At least 2 https:// links to important pages — blog, docs, pricing, case studies. These tell AI which pages to fetch for deeper context.
Contact information. An email address, a contact/support/help URL, or a social media link (LinkedIn, Twitter/X, Facebook, Instagram). Used when AI is asked “how do I reach BrandName?”
Preferred citation format. Text matching patterns like “cite as”, “preferred reference”, “please cite”, “brand name:”, or “official name:”. Tells AI how you want to be named in answers — full name, abbreviation, or with a tagline.
Each section found adds 2 points to a separate quality grade (out of 10) reported alongside the score. The main signal score also picks up bonus points for structure: any markdown headings (+1), any links (+1), any descriptive lines over 50 characters (+1), and a file longer than 10 lines (+1). The score is capped at 8.
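The arithmetic above can be made concrete with a short sketch. It assumes the five section checks have already been reduced to booleans. The function names are hypothetical, and reading "descriptive line" as any line that is not a heading or bullet is an assumption rather than the scanner's confirmed rule.

```python
from typing import Dict


def quality_grade(sections: Dict[str, bool]) -> int:
    """2 points per section found; five sections give a grade out of 10."""
    return 2 * sum(1 for found in sections.values() if found)


def signal_score(text: str) -> int:
    """4 points for a non-empty file plus up to 4 structure bonuses, capped at 8."""
    if not text.strip():
        return 0  # an empty body is treated as missing
    lines = text.splitlines()
    score = 4
    # +1: any markdown heading
    score += int(any(line.lstrip().startswith("#") for line in lines))
    # +1: any link
    score += int("http://" in text or "https://" in text)
    # +1: any descriptive line over 50 characters
    # (assumption: "descriptive" means not a heading or bullet)
    score += int(any(
        len(line.strip()) > 50 and not line.lstrip().startswith(("#", "-", "*"))
        for line in lines
    ))
    # +1: file longer than 10 lines
    score += int(len(lines) > 10)
    return min(score, 8)
```

Under these assumptions, a file with headings and links but only eight short lines would score 4 + 1 + 1 = 6, while its quality grade depends separately on how many of the five sections are detected.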
Verification
You can verify our finding yourself in any browser.
Step 1: Check the file exists. Visit https://yoursite.com/llms.txt directly. If you see plain text or markdown content, the file exists. If you get a 404, you don’t have one. Also try the www or non-www variant. Cited tests both, so the file at https://www.yoursite.com/llms.txt counts even if your canonical is the apex domain.
Step 2: Check for the five sections. The llms.txt spec (llmstxt.org) is intentionally loose, but the scanner grades against five practical sections. Open your file and check:
- Two or more lines of prose describing your company (each over 40 characters, not lists or headings)
- A list of products or services (markdown list with - or *, or numbered)
- Two or more https:// links to key pages
- A contact method: email, social link, or contact/support URL
- A citation hint — text saying how to refer to your brand
Some sites publish both /llms.txt (curation file) and /llms-full.txt (full content dump for AI training). Cited only checks the first. If you have both, the curation file is what matters for citation routing.
If your verification disagrees with Cited’s finding, that’s a bug — let us know.
Technical detail
llms.txt is an emerging convention proposed by Jeremy Howard in September 2024 and codified at llmstxt.org. It is not yet an RFC or IETF standard; adoption is voluntary and the spec is evolving. AI platforms including Anthropic, OpenAI, Perplexity, and Mintlify have published their own llms.txt files as reference implementations, signaling community acceptance.
Parsing logic. The scanner does no XML or strict schema parsing; llms.txt is unstructured markdown by design. Detection is heuristic:
- File existence: HTTP 200 with non-empty body at /llms.txt
- Prose detection: lines longer than 40 characters that don’t start with #, -, or http
- List detection: lines matching ^\s*[-*]\s+\S (markdown bullets) or ^\s*\d+[.)]\s+\S (numbered)
- Link detection: regex match against https?://\S+
- Contact detection: email regex ([\w.-]+@[\w.-]+\.\w{2,}), contact URL pattern (/contact, /support, /help), or social platform URLs (LinkedIn, Twitter/X, Facebook, Instagram)
- Citation hint detection: case-insensitive string matching against six phrases including “cite as”, “please cite”, “brand name:”, “official name:”, “preferred reference”
details field.
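The detection heuristics can be sketched in Python using the regular expressions as quoted. The thresholds (two prose lines, two list items, two links) come from the Methodology section. Everything else is an assumption for illustration: the function name detect_sections, the exact spelling of the contact and social URL patterns, and the phrase list, which contains only the five phrases named in the text because the sixth is not given.

```python
import re
from typing import Dict

BULLET = re.compile(r"^\s*[-*]\s+\S")
NUMBERED = re.compile(r"^\s*\d+[.)]\s+\S")
LINK = re.compile(r"https?://\S+")
EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.\w{2,}")
# Assumed spellings of the contact-path and social-host patterns.
CONTACT_PATH = re.compile(r"https?://\S*/(?:contact|support|help)\b", re.IGNORECASE)
SOCIAL = re.compile(
    r"https?://(?:www\.)?"
    r"(?:linkedin\.com|twitter\.com|x\.com|facebook\.com|instagram\.com)\b",
    re.IGNORECASE,
)
# Only the five phrases named in the text; the sixth is not listed.
CITATION_PHRASES = ("cite as", "please cite", "brand name:",
                    "official name:", "preferred reference")


def detect_sections(text: str) -> Dict[str, bool]:
    """Return a found/not-found flag for each of the five graded sections."""
    lines = [line.strip() for line in text.splitlines()]
    # Prose: over 40 characters and not a heading, bullet, or bare URL.
    prose = [l for l in lines
             if len(l) > 40 and not l.startswith(("#", "-", "http"))]
    items = [l for l in lines if BULLET.match(l) or NUMBERED.match(l)]
    lowered = text.lower()
    return {
        "description": len(prose) >= 2,
        "products": len(items) >= 2,
        "key_urls": len(LINK.findall(text)) >= 2,
        "contact": bool(EMAIL.search(text) or CONTACT_PATH.search(text)
                        or SOCIAL.search(text)),
        "citation": any(phrase in lowered for phrase in CITATION_PHRASES),
    }
```

Because every check is a line or substring match, the same text can satisfy more than one section; for example, a contact URL also counts toward the link total.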
Edge cases the scanner handles:
- www/non-www mismatch: If https://yoursite.com/llms.txt fails but https://www.yoursite.com/llms.txt succeeds, the scanner finds it. Either variant counts.
- Redirects: HTTP 301/302 are followed. The final URL after redirect is what gets graded.
- HTML returned instead of plain text: Some misconfigured servers return the homepage at /llms.txt. The scanner still grades the response content; HTML-only pages typically miss list and citation patterns and score low on quality.
- Empty files: A 200 response with an empty body counts as missing. We require at least some content.
- Connection timeout: 8-second timeout. Files that take longer to serve are treated as missing.
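A fetch that follows these edge-case rules might look like the sketch below. It relies on the standard library's urllib.request.urlopen, which follows 301/302 redirects by default and accepts a timeout argument. The function name fetch_llms_txt is hypothetical, and decoding the body as UTF-8 with replacement characters is an assumption.

```python
import urllib.request
from typing import Optional

TIMEOUT_SECONDS = 8  # mirrors the 8-second timeout described above


def fetch_llms_txt(url: str) -> Optional[str]:
    """Return the response body, or None if the file counts as missing."""
    try:
        # urlopen follows HTTP 301/302 redirects by default, so the body
        # read here comes from the final URL after any redirects.
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
            if resp.status != 200:
                return None
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError (404, 500), timeouts, and refused connections
        # are all OSError subclasses; each is treated as missing.
        return None
    # A 200 response with an empty body counts as missing.
    return body if body.strip() else None
```

Note that urlopen's timeout applies to individual blocking socket operations rather than to the whole download, so this approximates an 8-second limit rather than enforcing it strictly. HTML served at the URL is returned like any other body, which matches the behavior of grading whatever content comes back.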
What the scanner does not check:
- /llms-full.txt: the long-form content dump variant some sites publish alongside the curation file. AI models may use it for training; Cited doesn’t currently grade it.
- llms.txt format compliance against the llmstxt.org spec. The spec proposes a specific H1 + blockquote + sectioned-list structure; the scanner uses looser heuristics so non-spec-conforming files that still cover the five sections get credit.
- Whether AI models actually read your llms.txt. Adoption is uneven across platforms — Anthropic and Perplexity reportedly consult it; OpenAI’s posture is unclear; Google has not committed. The signal measures availability, not utilization.
- Frequency of update. A stale llms.txt with old product names or removed links still scores well if the structural sections are present.