Crawl Budget - Cited Docs

Crawl budget is the number of pages a search engine or AI crawler will fetch from a website in a given time period. Crawlers allocate their resources across millions of sites; each site gets a finite share of crawl activity. If a site has many low-value pages (empty stubs, thin content, duplicate pages), crawlers may spend their budget on those instead of reaching the high-value pages.
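The dilution effect can be made concrete with a minimal Python sketch. It rests on a deliberately simple, hypothetical assumption: the crawler spends its budget on distinct URLs picked uniformly at random from every crawlable URL on the site. Real crawlers prioritize by links, freshness, and server capacity, so this only illustrates the arithmetic and does not model any specific crawler. The function name `expected_high_value_fetches` is illustrative.

```python
"""Toy model of how low-value pages dilute a fixed crawl budget.

Hypothetical assumption: the crawler fetches `budget` distinct URLs chosen
uniformly at random from all crawlable URLs. Real crawlers do not behave
this way; the model only shows the proportions involved.
"""


def expected_high_value_fetches(budget: int, total_pages: int, high_value_pages: int) -> float:
    """Expected number of high-value pages fetched (sampling without replacement)."""
    if budget < 0 or not 0 <= high_value_pages <= total_pages:
        raise ValueError("need budget >= 0 and 0 <= high_value_pages <= total_pages")
    if total_pages == 0:
        return 0.0
    fetched = min(budget, total_pages)  # a crawler cannot fetch more URLs than exist
    # Hypergeometric mean: each fetched URL is high-value with probability k / N.
    return fetched * high_value_pages / total_pages


if __name__ == "__main__":
    # Same 200-fetch budget, with and without 900 low-value stubs in the crawlable set.
    print(expected_high_value_fetches(200, 1_000, 100))  # 20.0
    print(expected_high_value_fetches(200, 100, 100))    # 100.0
```

Under this model, adding 900 stubs to a site with 100 substantive pages cuts the expected number of substantive pages reached by a 200-fetch budget from 100 to 20.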

Why it matters

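For AI visibility, crawl budget matters because AI crawlers (GPTBot, ClaudeBot, PerplexityBot) must be able to fetch your content before it can be included in AI-generated answers. Google-Extended is often listed alongside them, but it is not a separate crawler: it is a robots.txt token that controls whether content fetched by Google's existing crawlers may be used by Google's AI models. On a site with 10,000 pages where only 50 have substantive content, crawlers can spend most of their visits on the other 9,950. Concentrating crawl activity on the pages that matter, through noindex directives, sitemap curation, and llms.txt, is a basic AEO/GEO (answer engine optimization / generative engine optimization) practice. Note that a crawler still has to fetch a page to see its noindex directive: noindex keeps the page out of the index, while a robots.txt Disallow rule is what stops the fetch itself.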
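Sitemap curation can be sketched as a simple filter: pages flagged noindex are left out of the generated sitemap, so the sitemap advertises only substantive URLs. This minimal Python example uses the standard library's `xml.etree.ElementTree` and the sitemaps.org 0.9 namespace. The `Page` class, its `noindex` field, and the example URLs are hypothetical stand-ins for however a docs platform actually stores page metadata.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    url: str
    noindex: bool = False  # hypothetical flag mirroring a page's noindex setting


def build_sitemap(pages: list[Page]) -> str:
    """Return sitemap XML that lists only pages not marked noindex."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page.noindex:
            continue  # stub pages stay out of the sitemap until they have content
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        Page("https://docs.example.com/crawl-budget"),
        Page("https://docs.example.com/coming-soon", noindex=True),
    ]
    print(build_sitemap(pages))
```

Only the crawl-budget URL appears in the output; the stub is omitted until its `noindex` flag is cleared.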

How Cited uses it

Cited’s own docs site addresses crawl budget by adding noindex: true to stub pages that have no real content yet, keeping them out of the sitemap until they are populated. This focuses crawler attention on the published pages with substantive content. The site’s robots.txt explicitly allows all major AI crawlers to access all published content without restriction.
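One way to confirm that a robots.txt file really allows the major AI crawlers is to run it through Python's standard `urllib.robotparser`. The robots.txt content below is a hypothetical example, not a copy of Cited's file, and `blocked_agents` is an illustrative helper name. Google-Extended is included because it is matched as a user-agent token in robots.txt, even though no separate crawler sends it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (NOT Cited's actual file).
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /drafts/
"""

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, url: str, agents: list[str]) -> list[str]:
    """Return the user agents that robots_txt would block from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(ROBOTS_TXT, "https://docs.example.com/crawl-budget", AI_AGENTS))
```

An empty list means none of the listed agents is blocked for that URL. The check covers robots.txt rules only; a page's noindex directive lives in the page itself and is not visible to `can_fetch`.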

Related concepts

llms.txt: a proposed file that signals to AI crawlers which pages matter
Content freshness: crawlers revisit fresh content more frequently
What sources LLMs cite: crawlability is a prerequisite for citation
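For the llms.txt entry above, a minimal generator can follow the format proposed at llmstxt.org: an H1 with the site name, a blockquote summary, and H2 sections containing Markdown link lists. This Python sketch applies the same idea as sitemap curation by skipping pages flagged noindex. The `DocPage` class, its fields, the single "Docs" section, and the example content are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocPage:
    title: str
    url: str
    summary: str = ""
    noindex: bool = False  # hypothetical flag: stub pages are skipped


def build_llms_txt(site_name: str, description: str, pages: list[DocPage]) -> str:
    """Render a minimal llms.txt: H1 title, blockquote summary, one H2 link list."""
    lines = [f"# {site_name}", "", f"> {description}", "", "## Docs", ""]
    for page in pages:
        if page.noindex:
            continue  # leave unpopulated stubs out, as with the sitemap
        entry = f"- [{page.title}]({page.url})"
        if page.summary:
            entry += f": {page.summary}"  # optional link notes after a colon
        lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pages = [
        DocPage("Crawl budget", "https://docs.example.com/crawl-budget",
                "How crawlers spend a finite number of fetches"),
        DocPage("Coming soon", "https://docs.example.com/coming-soon", noindex=True),
    ]
    print(build_llms_txt("Example Docs", "Hypothetical documentation site.", pages))
```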
