AI models prefer to cite pages that demonstrate depth of treatment — not just word count, but enough substance to answer a question thoroughly. Thin pages get skipped even when they’re factually correct, because AI can’t be confident it’s getting the complete answer from one source.

Methodology

Cited samples up to 5 pages from your site and measures the visible word count on each. The average across the sample determines the score. We don’t penalize occasional short pages — a contact page or thin landing page won’t fail the signal on its own — but consistently thin content across the sample drags the average down. The scoring tiers are calibrated against AI citation patterns observed in production. Pages averaging:
  • Over 800 words earn the full 6/6 — this is the band where AI models confidently treat content as a single authoritative source.
  • 500-800 words earn 4/6 — substantive but typically requires AI to combine your page with another source for full coverage.
  • 300-499 words earn 2/6 — usually too thin for primary citation; AI may cite you as a supporting source after citing a more detailed page.
  • Under 300 words earn 1/6 — flagged as critically thin. AI models routinely skip these in favor of competitors with deeper content.
The signal evaluates the average across sampled pages, not the minimum or maximum. This rewards consistent depth across your site rather than one-off cornerstone pages surrounded by thin content. A 1,200-word blog post sampled alongside four 200-word product descriptions averages 400 words (2,000 words across 5 pages), landing in the 2/6 tier despite having one strong page.
The signal scores out of 6. Status thresholds: pass at 5/6 or above, partial at 3/6 or above, fail below. Because the tiers award only 6, 4, 2, or 1, in practice a pass requires the over-800-word tier, partial corresponds to the 500-800 tier, and the 300-499 and under-300 tiers both fail.
What counts as “content”? The scanner extracts visible text from the entire page body after JavaScript hydration: headings, paragraphs, lists, captions, navigation, footer. This includes everything a user could read by scrolling the page, not just the main article. Pages with heavy chrome (long footer link lists, large “related posts” widgets) score higher than they should; pages whose main content is surrounded by minimal chrome score closer to their main-content reality.
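The averaging, tier lookup, and status thresholds can be sketched in a few lines of JavaScript. This is an illustrative sketch, not Cited’s actual scanner code: the function names are hypothetical, the rounding step follows the aggregation description under Technical detail, and returning 0 for an empty sample is an assumption.

```javascript
// Hypothetical helper names; the thresholds are the ones stated in this article.

// Average visible word count across sampled pages, rounded.
// Assumption: an empty sample averages to 0.
function averageWordCount(counts) {
  if (counts.length === 0) return 0;
  const total = counts.reduce((sum, n) => sum + n, 0);
  return Math.round(total / counts.length);
}

// Flat staircase lookup: no interpolation between tiers.
function scoreForAverage(avg) {
  if (avg > 800) return 6;
  if (avg >= 500) return 4;
  if (avg >= 300) return 2;
  return 1;
}

// Status thresholds: pass at 5/6, partial at 3/6, fail below.
function statusForScore(score) {
  if (score >= 5) return "pass";
  if (score >= 3) return "partial";
  return "fail";
}

// One 1,200-word blog post plus four 200-word product descriptions.
const sample = [1200, 200, 200, 200, 200];
const avg = averageWordCount(sample);
const score = scoreForAverage(avg);
console.log(avg, score, statusForScore(score)); // 400 2 fail
```

Because the lookup is a staircase, an average of exactly 800 still lands in the 4/6 tier; only averages above 800 reach 6/6.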

Verification

You can verify our finding yourself with a one-line browser check.
Step 1: Open the pages we sampled. Cited reports the URLs we tested. Open each in a new browser tab.
Step 2: Count words via the console. Open DevTools (Cmd+Option+I / Ctrl+Shift+I), Console tab, and run:
document.body.innerText.trim().split(/\s+/).length
This returns the same word count the scanner sees: every visible word on the page, including navigation, sidebars, and footer. Average the counts across the sampled pages and compare the result against the score tiers (over 800 for 6/6, 500+ for 4/6, 300+ for 2/6).
Step 3: Estimate main-content words separately. Run this variation to count words inside the main content only:
const main = document.querySelector('main, article, [role="main"]') || document.body;
main.innerText.trim().split(/\s+/).length
If the main-content count is much lower than the full-page count, your page has substantial chrome (nav, footer, sidebars) inflating the score. The main-content number is closer to what AI models actually cite.
Step 4: Spot-check thin pages. If your score is partial or failing, identify which sampled pages were thin. Pages under 500 words are usually candidates for either expansion (add depth) or consolidation (merge two thin pages into one). Product pages and landing pages often live in this zone; consider whether the page’s purpose justifies its length.
If your verification disagrees with Cited’s finding, that’s a bug. Let us know.
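To make the Step 3 comparison concrete, the sketch below computes what share of each page’s words sits outside the main content and flags pages under the 500-word mark from Step 4. The function name, the input shape, and the example.com URLs are hypothetical; the counts would come from running the two console snippets above on each sampled page.

```javascript
// Hypothetical helper: summarize console measurements for sampled pages.
// `full` is the whole-body word count, `main` the main-content word count.
function reviewPages(pages, thinBelow = 500) {
  return pages.map(({ url, full, main }) => ({
    url,
    // Share of visible words that are chrome (nav, footer, sidebars).
    chromeShare: full > 0 ? (full - main) / full : 0,
    // Step 4 treats pages under 500 words as expansion or merge candidates.
    thin: full < thinBelow,
  }));
}

// Made-up measurements for two illustrative pages.
const report = reviewPages([
  { url: "https://example.com/guide", full: 900, main: 450 },
  { url: "https://example.com/product", full: 240, main: 180 },
]);
console.log(report);
```

With these made-up numbers, half of the first page’s words are chrome, and the second page is flagged as thin with a quarter of its words in chrome.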

Technical detail

Word count as a depth proxy traces to early SEO research (Backlinko, Ahrefs studies from 2016 onward) showing strong correlation between content length and search ranking. Modern AI citation studies show a similar pattern: AI models routinely prefer 1,000+ word sources over 300-word sources for substantive queries. The thresholds Cited uses are conservative (800 words is well above the 600-word minimum many studies cite) to favor pages that clearly clear the bar over edge cases.
Extraction logic. The scanner counts words on the rendered page:
  • document.body.innerText returns the page’s visible text content after JavaScript hydration
  • The text is trimmed and split on whitespace (/\s+/)
  • The resulting array length is the word count
  • Empty pages (no visible text) report 0 words
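One detail in the steps above deserves a guard: in JavaScript, splitting an empty string on whitespace yields an array containing one empty string, so the bare one-liner reports 1 for a page with no visible text rather than 0. A minimal sketch of a counter that reports 0 for empty pages follows; the function name is hypothetical, and how the scanner itself handles this case is not stated here.

```javascript
// Hypothetical word counter mirroring the extraction steps:
// trim, split on runs of whitespace, take the array length.
function countWords(text) {
  const trimmed = text.trim();
  // "".split(/\s+/) returns [""], so empty text needs an explicit guard.
  if (trimmed === "") return 0;
  return trimmed.split(/\s+/).length;
}

console.log(countWords("  Pricing   plans\nfor teams ")); // 4
console.log(countWords("   "));                           // 0
console.log("".trim().split(/\s+/).length);               // 1 (unguarded one-liner)
```

In a browser, the input would be `document.body.innerText`; the function itself has no DOM dependency, so it can also be run outside the browser.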
Counting the whole body is a deliberate simplicity choice: the scanner doesn’t try to identify “main” content vs chrome. Pages with thin main content surrounded by lots of navigation, footer, or sidebar content will score higher than their actual editorial depth warrants. The trade-off favors consistency across diverse page templates: any heuristic for identifying “main” content fails on a meaningful fraction of real sites (Shopify product pages, sidebar-heavy blogs, single-page-app routes).
Aggregation. Total word count is summed across all sampled pages, divided by the page count, and rounded. The score lookup is a flat staircase: avg > 800 → 6, avg >= 500 → 4, avg >= 300 → 2, else → 1. There is no interpolation between tiers: a 799-word average and a 600-word average both score 4.
Edge cases the scanner handles:
  • Pages with no visible text: single-page apps that haven’t hydrated, blank error pages, paywalls. These contribute 0 words to the average but still count as sampled pages, dragging the average down.
  • Pages with massive chrome: sites with very long footers (link sitemaps, comprehensive nav) inflate the word count without contributing editorial depth. The scanner can’t distinguish the two.
  • Pages with images-as-content: infographics, product detail images, photo galleries. Visible captions count, but alt text and the visual content itself don’t, because innerText returns only rendered text. Image-heavy pages tend to score lower than their content deserves.
  • Hidden content: text inside display: none or visibility: hidden containers doesn’t count because innerText excludes hidden content by design. This includes collapsed accordion content that is hidden this way until a click.
  • Iframes: content inside iframes (embedded videos, third-party widgets such as YouTube embeds) doesn’t count because each iframe is a separate document context.
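The first edge case has the largest effect on the score, so a small worked example may help. The sketch below uses made-up counts and a hypothetical function name to show how one page that failed to hydrate changes the sample average.

```javascript
// Hypothetical helper: rounded mean of per-page word counts.
// Pages with no visible text stay in the sample as 0.
function sampleAverage(counts) {
  const total = counts.reduce((sum, n) => sum + n, 0);
  return Math.round(total / counts.length);
}

const healthy = [900, 900, 900, 900];
const withBlank = [...healthy, 0]; // a fifth page rendered no visible text

console.log(sampleAverage(healthy));   // 900
console.log(sampleAverage(withBlank)); // 720
```

With these numbers the average falls from the over-800 band to the 500-800 band, so a single blank page moves the site from 6/6 to 4/6.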
What this signal does not measure:
  • Content quality or accuracy. A page can be 2,000 words of off-topic ramble and score 6/6. The scanner counts words, not insight.
  • Main-content depth specifically. Pages with thin articles but rich navigation score higher than they should. This is a known limitation.
  • Unique vs duplicate content. A page with 1,200 words copied from another page on your site scores the same as 1,200 words of original content. Duplicate content is a separate concern that we don’t currently model.
  • Reading level or readability. Word count is independent of how readable the content is. A 1,000-word page written at a PhD reading level scores the same as a 1,000-word page written for general audiences.
For brands scoring in the 1-2/6 tier, the highest-leverage fix is identifying the 3-5 most important pages (homepage, cornerstone content, top product pages) and expanding them to 800+ words each. The signal evaluates the sample average, so lifting your most-sampled pages lifts the score.
See also: Heading Hierarchy, Answer-Block Formatting, Internal Linking Quality.