AI models parse page structure through headings. Pages with a broken hierarchy (multiple H1s, skipped levels, or a missing top-level heading) give AI no reliable way to determine which content answers a question. Clean hierarchy gets cited; broken hierarchy gets skipped.

Methodology

Cited samples up to 5 pages from your site (the same hierarchical sample used by Page Crawlability) and tests each page’s heading structure against two requirements that AI parsers consistently honor.

Exactly one H1 per page. The H1 tells AI models what the page is about. Pages with zero H1s have no anchor for the page’s primary topic; pages with multiple H1s force the parser to guess which one matters. Sites built with template systems (especially older WordPress themes and some Shopify sections) often render the logo as an H1 in addition to the article title, producing two H1s on every page. Each sampled page either passes this check (1 point) or fails it.

Logical nesting, with no skipped levels. A heading sequence like H1 → H2 → H3 is logical. A sequence like H1 → H3 → H2 → H4 skips levels (H1 to H3 and H2 to H4 are both 2-level jumps) and confuses parsers. We walk each page’s heading list in DOM order and check that no heading is more than one level deeper than the heading before it. The first such jump fails the page. Each sampled page either passes nesting (1 point) or fails it.

The scoring is a ratio: total points earned (out of 2 checks × N pages) scaled to the signal’s 7-point max. Sites where every sampled page has one H1 and clean nesting score 7/7. Sites where 3 of 5 pages have multiple H1s typically score 4-5/7; the broken pages drag the ratio down but don’t fail the signal outright. Sites where every page fails both checks score 0/7 and are flagged as critical. Status thresholds: pass at 5/7 (about 70%), partial at 3/7 (about 40%), fail below.
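The scoring arithmetic can be sketched in a few lines of JavaScript. This is an illustration of the ratio and thresholds described above, not Cited’s actual scanner code: the function name `scoreHeadingSignal`, the `{ h1Ok, nestingOk }` input shape, and the handling of an empty sample are all assumptions made for the example.

```javascript
// Sketch of the scoring arithmetic (names and input shape are hypothetical).
// `pages` holds one entry per sampled page: { h1Ok: boolean, nestingOk: boolean }
function scoreHeadingSignal(pages) {
  const MAX_SCORE = 7;
  if (pages.length === 0) {
    // Assumption: an empty sample scores 0. The source does not cover this case.
    return { score: 0, status: "fail" };
  }
  // Each page can earn up to 2 points, one per check.
  const earned = pages.reduce(
    (sum, p) => sum + (p.h1Ok ? 1 : 0) + (p.nestingOk ? 1 : 0),
    0
  );
  const ratio = earned / (pages.length * 2);
  const score = Math.round(ratio * MAX_SCORE);
  // Thresholds from the text: pass at 5/7, partial at 3/7, fail below.
  const status = score >= 5 ? "pass" : score >= 3 ? "partial" : "fail";
  return { score, status };
}
```

For example, five pages where three fail only the H1 check earn 7 of 10 points, which rounds to 5/7 and still passes.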

Verification

You can verify our finding yourself in a browser.

Step 1: Open the pages we sampled. Cited reports the specific URLs we tested. Open each one in a new browser tab. The sample usually includes your homepage, a blog or article page, a product page, an about page, and one other.

Step 2: Inspect headings via DevTools. Open the browser’s developer tools (Cmd+Option+I on Mac, Ctrl+Shift+I on Windows). In the Console tab, run this one-liner to list every heading on the page:
[...document.querySelectorAll('h1,h2,h3,h4,h5,h6')].map(h => `${h.tagName} ${h.textContent.trim()}`)
The output is the same DOM-ordered list Cited’s scanner sees, except that the scanner drops headings with empty text. Count the H1s: exactly one means the page passes the first check. Look for any sequence where one heading’s level is more than one greater than its predecessor’s. That is a skipped-level jump, and it fails the second check.

Step 3: Search for hidden H1s. If your DevTools console shows two H1s but you only see one rendered, the second is probably hidden by CSS or sitting in a navigation/header partial. Inspect it directly to find where it lives. Common locations: site logo wrapped in <h1>, “Home” link in nav, sticky header brand mark.

Step 4: Validate with a Lighthouse Accessibility audit. Chrome DevTools includes Lighthouse. Run an Accessibility audit, and the “Heading elements are not in a sequentially-descending order” check surfaces the same logical-nesting issue Cited measures. Lighthouse presents the result slightly differently but applies the same underlying rule.

If your verification disagrees with Cited’s finding, that’s a bug. Let us know.
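For Step 3, a small console helper can show where each H1 lives. The sketch below is a convenience for manual checking, not part of Cited’s scanner. The function name `locateH1s` and the "other" label are made up for the example, and it only looks for the nearest <header>, <nav>, or <footer> ancestor, so an H1 hidden purely by CSS in the main content will be labeled "other".

```javascript
// Lists every H1 with the nearest header/nav/footer ancestor, if any.
// `root` is normally `document`; it is a parameter so the helper can be tested.
function locateH1s(root) {
  return [...root.querySelectorAll("h1")].map((h1) => {
    // Element.closest() returns the nearest matching ancestor, or null.
    const landmark = h1.closest("header, nav, footer");
    return {
      text: h1.textContent.trim(),
      container: landmark ? landmark.tagName.toLowerCase() : "other",
    };
  });
}

// In the DevTools console:
//   console.table(locateH1s(document));
```

An H1 reported inside `header` or `nav` alongside a second H1 elsewhere is the usual logo-plus-title pattern described above.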

Technical detail

Heading hierarchy is governed by the HTML Living Standard (WHATWG, continuously updated since 2004), which specifies semantic meaning for <h1> through <h6> elements. Accessibility guidelines from the W3C (WCAG 2.1, Success Criteria 1.3.1 and 2.4.6) and AI training pipelines both rely on clean heading hierarchy to determine document structure.

Extraction logic. Cited’s scanner uses DOM parsing (not regex) to extract headings:
  • document.querySelectorAll("h1, h2, h3, h4, h5, h6") collects every heading element on the rendered page
  • Each heading is recorded as {level, text} — level is the numeric H-tag (1-6), text is the trimmed textContent
  • Headings with empty text content are filtered out before evaluation
  • The page is parsed AFTER JavaScript hydration (3-second Puppeteer delay), so client-rendered headings count
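The extraction steps above can be sketched as a single function. This is an approximation of the described behavior, not the scanner’s source; the name `extractHeadings` and the `root` parameter are assumptions, and the hydration delay is left out because it happens before extraction runs.

```javascript
// Collects headings in DOM order as { level, text }, dropping empty ones.
// `root` is normally `document` on the rendered page.
function extractHeadings(root) {
  return [...root.querySelectorAll("h1, h2, h3, h4, h5, h6")]
    .map((el) => ({
      level: Number(el.tagName.slice(1)), // "H2" -> 2
      text: el.textContent.trim(),
    }))
    // Headings whose trimmed text is empty are filtered out before evaluation.
    .filter((h) => h.text.length > 0);
}

// In the DevTools console:
//   extractHeadings(document);
```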
Per-page evaluation. Each page receives two pass/fail checks:
  • H1 uniqueness: count of H1s must equal exactly 1. Zero H1s fails; two or more H1s fails.
  • Logical nesting: walking the heading list in DOM order, the next heading’s level must be at most one greater than the current heading’s level. The first violation fails the page; subsequent violations aren’t checked once one is found.
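A minimal sketch of the two per-page checks, assuming the input is the extracted list of {level, text} objects. The function name `checkPage` and the shape of the return value are illustrative. The sketch only compares consecutive pairs, as the text describes, so it makes no judgment about which level the first heading on the page uses.

```javascript
// Runs both pass/fail checks on one page's heading list (DOM order).
function checkPage(headings) {
  // H1 uniqueness: exactly one H1, so zero fails and two or more fails.
  const h1Count = headings.filter((h) => h.level === 1).length;

  // Logical nesting: a heading may be at most one level deeper than the
  // heading before it. Moving back up to a shallower level is always allowed.
  let firstViolation = null;
  for (let i = 1; i < headings.length; i++) {
    if (headings[i].level - headings[i - 1].level > 1) {
      firstViolation = { from: headings[i - 1], to: headings[i] };
      break; // later violations are not checked once one is found
    }
  }

  return {
    h1Ok: h1Count === 1,
    nestingOk: firstViolation === null,
    firstViolation,
  };
}
```

For the sequence H1 → H3 → H2 → H4, this reports the H1 → H3 pair as the first violation and never reaches H2 → H4.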
Score calculation. Total points = number of checks passed across all sampled pages (two checks per page). Ratio = total points ÷ (page count × 2). Final score = round(ratio × 7).

Edge cases the scanner handles:
  • Empty heading tags. <h2></h2> is silently excluded from the extracted list because the text-content filter drops it. The page isn’t penalized for the empty heading, but it also doesn’t count toward structural completeness.
  • Headings inside <header> or <footer>. These count the same as headings in the main content. Sites with multiple H1s often have one in <header> (the logo) and one in <main> (the article title). The scanner sees both.
  • Heading levels moving back up. A sequence like H3 → H2 doesn’t fail nesting; only jumps of more than one level deeper fail. Going from a deep heading back to a shallower one is normal document structure.
  • JavaScript-injected headings. Headings that JavaScript adds within the 3-second hydration delay after domcontentloaded are detected. Headings injected much later (e.g., from analytics or third-party widgets) may be missed.
  • Headings inside iframes. The scanner only evaluates the parent document. Headings inside embedded iframes aren’t crawled because each iframe is a separate document context.
What this signal does not measure:
  • Descriptive heading text. A page with <h1>Welcome</h1> and a page with <h1>Best Coffee Beans for French Press</h1> both pass the H1-count check. The text quality is a topic for content depth and answer-block formatting, not hierarchy.
  • Visual heading styling. A <div class="heading-style-h1"> styled to look like an H1 isn’t a semantic H1 to AI parsers. The scanner only counts actual <hN> tags, not visual mimics.
  • Empty headings (penalty). As above, empty <h2></h2> tags don’t fail any check — they’re just invisible. Removing them is good hygiene but not measured here.
  • Heading order semantics. A page can have a logical hierarchy that nonetheless reads poorly (e.g., conclusion before introduction). Hierarchy validation is mechanical, not editorial.
For most brands using modern CMSes (Webflow, current WordPress themes, Shopify with custom templates), the most common failure is multiple H1s from logo or nav elements. Wrapping the logo in a <div> or <span> instead of <h1> is usually a one-line template fix, with any H1-specific styling moved to a class.

See also: Answer-Block Formatting, Page Crawlability.