Methodology
Cited samples up to 5 pages from your site and extracts every date it can find on each. The signal evaluates the most recent date across the entire sample (not per page, not an average) because AI models look at the freshest available timestamp when deciding whether to cite a page or treat its content as stale. Dates come from three structured sources:
- <time> elements with a datetime attribute (<time datetime="2026-05-15">) or visible text content
- Open Graph article meta tags: <meta property="article:published_time"> and <meta property="article:modified_time">
- JSON-LD datePublished and dateModified fields inside any structured data block
The age of that most recent date sets the score:
- Less than 3 months old → 5/5. Fresh content — AI models treat the page as current.
- 3 to 12 months old → 3/5. Moderately fresh — AI may cite but with implicit recency caveats.
- More than 12 months old → 1/5. Stale — AI models systematically downweight or skip for time-sensitive queries.
- No dates found anywhere → 1/5. Same tier as stale, because AI models can’t distinguish “fresh content with no datestamp” from “old content the author chose not to date.”
Verification
You can verify our finding yourself in a browser.
Step 1: Open the pages we sampled. Cited reports the URLs we tested. Open each in a new tab.
Step 2: Extract dates via the console. Open DevTools (Cmd+Option+I / Ctrl+Shift+I), switch to the Console tab, and run:
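A minimal sketch of what such a console snippet might look like. It assumes the three sources listed under Methodology; the function name collectDates and the shape of the rows it returns are hypothetical, and it is not necessarily the exact snippet Cited runs.

```javascript
// Hypothetical helper: gathers raw date strings from the three sources.
// `doc` is any object with querySelectorAll, normally the page's `document`.
function collectDates(doc) {
  const found = [];

  // 1. <time> elements: prefer the datetime attribute, else the visible text.
  for (const el of doc.querySelectorAll('time')) {
    const value = el.getAttribute('datetime') || (el.textContent || '').trim();
    if (value) found.push({ source: 'time', value });
  }

  // 2. Open Graph article meta tags.
  const metaSelector =
    'meta[property="article:published_time"], meta[property="article:modified_time"]';
  for (const el of doc.querySelectorAll(metaSelector)) {
    const value = el.getAttribute('content');
    if (value) found.push({ source: el.getAttribute('property'), value });
  }

  // 3. JSON-LD blocks: top-level datePublished / dateModified only.
  for (const el of doc.querySelectorAll('script[type="application/ld+json"]')) {
    let data;
    try {
      data = JSON.parse(el.textContent);
    } catch {
      continue; // skip blocks that are not valid JSON
    }
    for (const item of Array.isArray(data) ? data : [data]) {
      if (!item || typeof item !== 'object') continue;
      for (const key of ['datePublished', 'dateModified']) {
        if (typeof item[key] === 'string') found.push({ source: key, value: item[key] });
      }
    }
  }
  return found;
}

// In the DevTools console this prints one row per date found on the page.
if (typeof document !== 'undefined') console.table(collectDates(document));
```

An empty table means the page exposes none of the three sources.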
Step 3: Check that each extracted string parses. Test one with const parsed = new Date('your-string'); if isNaN(parsed.getTime()) is true, the string is unparseable.
Step 4: Check for missing dates on important pages. If your homepage, top blog posts, or cornerstone product pages don’t have any of the three date sources, AI models can’t tell when the content was last refreshed. Adding <time datetime="…"> to bylines or dateModified to JSON-LD on those pages is the fastest fix.
If your verification disagrees with Cited’s finding, that’s a bug — let us know.
Technical detail
Content freshness as a citation signal traces to information retrieval research from the early 2000s, formalized in Google’s “QDF” (Query Deserves Freshness) algorithm description around 2007. AI models inherited the same recency weighting: for queries about evolving topics (product comparisons, pricing, regulatory guidance, anything tied to a year) the freshest credible source ranks first.
Extraction logic. The crawler runs three independent date extraction passes per page and concatenates the results into one list:
- <time> elements: every <time> tag in the rendered DOM contributes either its datetime attribute value (preferred) or its trimmed textContent. If both exist, datetime wins because it’s the machine-readable form.
- Open Graph meta tags: the article:published_time and article:modified_time meta properties are read directly from <head>. Both contribute if both exist.
- JSON-LD dates: every JSON-LD block is parsed (with errors silently swallowed); each parsed object’s datePublished and dateModified fields are extracted. Arrays at the top level are walked. The scanner does NOT recurse into nested objects like mainEntity; only top-level dates count.
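The JSON-LD pass is the one with the most rules, so here is a rough sketch of it. The function name extractJsonLdDates and its input (an array holding the raw text of each JSON-LD script block) are assumptions for illustration, not the crawler's actual code.

```javascript
// Hypothetical sketch of the JSON-LD pass. `blocks` holds the raw text of
// each <script type="application/ld+json"> element found on a page.
function extractJsonLdDates(blocks) {
  const dates = [];
  for (const raw of blocks) {
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // malformed block: swallow the error and move on
    }
    // Walk a top-level array, but do not descend into nested objects.
    const items = Array.isArray(parsed) ? parsed : [parsed];
    for (const item of items) {
      if (item === null || typeof item !== 'object') continue;
      for (const field of ['datePublished', 'dateModified']) {
        if (typeof item[field] === 'string') dates.push(item[field]);
      }
    }
  }
  return dates;
}

const sample = [
  '{"@type":"Article","datePublished":"2026-05-01","mainEntity":{"dateModified":"2026-05-15"}}',
  '[{"@type":"BlogPosting","dateModified":"2026-04-20"}]',
  '{not valid json',
];
console.log(extractJsonLdDates(sample)); // → [ '2026-05-01', '2026-04-20' ]
```

The dateModified inside mainEntity is skipped, which matches the top-level-only rule described above.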
Every extracted string then passes through parseDate(), which calls new Date(dateStr), rejects results whose getTime() is NaN (invalid), and rejects values more than 86,400,000 ms (one day) in the future. This catches the most common parsing errors: malformed timestamps, accidental epoch values, and content-management systems that emit far-future dates as drafts.
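The validation step can be sketched as follows. Only the name parseDate, the new Date(dateStr) call, and the one-day limit come from the text; the null return value and the injectable now parameter are assumptions added so the behavior can be checked deterministically.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000; // 86,400,000 ms of allowed future drift

// Hypothetical reconstruction: returns a Date, or null when the value is rejected.
function parseDate(dateStr, now = Date.now()) {
  const parsed = new Date(dateStr);
  if (Number.isNaN(parsed.getTime())) return null;       // unparseable string
  if (parsed.getTime() - now > ONE_DAY_MS) return null;  // too far in the future
  return parsed;
}

const now = Date.UTC(2026, 4, 15, 12); // fixed "now": 2026-05-15T12:00:00Z
console.log(parseDate('2026-05-16T11:00:00Z', now) !== null); // → true (23 h ahead)
console.log(parseDate('2026-05-17T00:00:00Z', now));          // → null (36 h ahead)
console.log(parseDate('not a date', now));                    // → null
```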
Score calculation. Valid dates are sorted descending. The most recent date’s calendar-month delta from Date.now() determines the tier: < 3 → 5, <= 12 → 3, else 1. Sites with zero valid dates score 1, identical to the stale tier.
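As a rough illustration of the tiering, the sketch below assumes that "calendar-month delta" means the difference in year and month numbers, ignoring the day of the month and measured in UTC. That reading, and the names monthDelta and freshnessScore, are assumptions rather than confirmed scanner behavior.

```javascript
// Calendar months between two instants, using UTC year and month only.
function monthDelta(date, now) {
  return (now.getUTCFullYear() - date.getUTCFullYear()) * 12 +
         (now.getUTCMonth() - date.getUTCMonth());
}

// `validDates` is the list of Date objects that survived validation,
// pooled across every sampled page.
function freshnessScore(validDates, now = new Date()) {
  if (validDates.length === 0) return 1; // no dates: same tier as stale
  const sorted = [...validDates].sort((a, b) => b.getTime() - a.getTime());
  const delta = monthDelta(sorted[0], now); // only the most recent date counts
  if (delta < 3) return 5;
  if (delta <= 12) return 3;
  return 1;
}

const now = new Date('2026-05-15T00:00:00Z');
console.log(freshnessScore([new Date('2019-03-01'), new Date('2026-05-14')], now)); // → 5
console.log(freshnessScore([new Date('2025-05-01')], now));                         // → 3
console.log(freshnessScore([new Date('2025-04-30')], now));                         // → 1
console.log(freshnessScore([], now));                                               // → 1
```

The first call shows the aggregation at work: one date from yesterday outweighs a 2019 date in the same sample.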
Edge cases the scanner handles:
- Multiple date formats: 2026-05-15, May 15, 2026, 2026-05-15T14:30:00Z, and Fri, 15 May 2026 14:30:00 GMT all parse via new Date() and resolve to May 15, 2026 (the two date-only forms resolve to midnight rather than 14:30). The scanner doesn’t require a specific format because new Date() accepts most common forms.
- Future-dated drafts: some CMSes emit dateModified slightly in the future when content is being staged. The scanner accepts up to 24 hours of future drift but rejects anything beyond. Rejecting every future date would miss legitimate scheduled-publish workflows.
- Pages with no date in the DOM but dates in JSON-LD: modern sites often strip visible bylines from the design but still emit datePublished in Article schema. These pages count as dated because the scanner sees the JSON-LD. A clean design isn’t penalized as long as the structured data carries the timestamp.
- <time> elements without a datetime attribute: visible text like <time>May 15, 2026</time> falls back to the text content. The string is then parsed; failures are silently dropped. Most human-readable date phrasings (“May 15, 2026”, “15 May 2026”) parse; ones with relative terms (“yesterday”, “3 days ago”) don’t.
- Time zones: new Date() interprets date-only ISO strings such as 2026-05-15 as UTC, but date-time strings without an offset and most non-ISO forms as local time on the machine running the scanner. A site in another time zone whose datePublished lacks an explicit offset might appear off by one day near the day boundary, which matters to the month-bucket score only in the rare case that the shift crosses a month boundary right at a tier threshold.
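The format and time-zone behavior of new Date() can be checked directly. This is a small demonstration rather than scanner code, and the variable names are illustrative; non-ISO strings are parsed by engine-specific rules, so the results shown are what V8 (Chrome, Node.js) produces.

```javascript
const isoDateOnly = new Date('2026-05-15');                  // date-only ISO form: UTC midnight
const isoWithZ = new Date('2026-05-15T14:30:00Z');           // explicit UTC
const rfcStyle = new Date('Fri, 15 May 2026 14:30:00 GMT');  // explicit GMT
const isoNoOffset = new Date('2026-05-15T14:30:00');         // no offset: local time

console.log(isoDateOnly.toISOString());                 // → 2026-05-15T00:00:00.000Z
console.log(isoWithZ.getTime() === rfcStyle.getTime()); // → true

// Gap in minutes between the local and UTC readings of 14:30. It equals the
// machine's UTC offset for that moment, so it is 0 only on a UTC machine.
const gapMinutes = (isoNoOffset.getTime() - isoWithZ.getTime()) / 60000;
console.log(gapMinutes === isoNoOffset.getTimezoneOffset()); // → true
```

Emitting an explicit offset (Z or +hh:mm) in structured dates removes the dependence on where the parser runs.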
What the signal doesn’t measure:
- Whether the content was actually updated. A site can update its dateModified field nightly via cron without changing any content. The scanner trusts the timestamp; AI models often cross-reference content changes against timestamps, but Cited doesn’t.
- Per-page freshness. A site with one updated-yesterday cornerstone page and four 5-year-old pages scores 5/5 because the most recent date is recent. The aggregation rewards having any fresh anchor; it doesn’t penalize per-page staleness.
- The right page being fresh. Updating a footer disclaimer’s timestamp doesn’t help if the article AI wants to cite is still dated 2019. The scanner sees the freshest date anywhere across the sample, not the freshest date on the page AI would actually quote.
- Content type appropriateness. An evergreen explainer doesn’t need a 2026 timestamp the way a “Best products of 2026” listicle does. The signal applies the same threshold to all pages because the scanner has no content-type awareness.
Fixing a low score means auditing your dateModified values, refreshing the underlying content, and updating the timestamp. The order matters: bumping timestamps without refreshing content is a signal AI models eventually detect and penalize. Most CMSes (WordPress with Yoast, Webflow CMS, Ghost) emit dateModified automatically when a post is republished.
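For sites that render their own templates instead of relying on a CMS plugin, a sketch along these lines could emit the timestamp. The helper name buildArticleJsonLd and its parameters are hypothetical; the property names come from schema.org's Article type, and the dateModified value is assumed to be the time the content was really revised.

```javascript
// Hypothetical helper: builds an Article JSON-LD <script> tag for a page template.
// Throws a RangeError if either date cannot be parsed.
function buildArticleJsonLd({ headline, datePublished, dateModified }) {
  const data = {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline,
    datePublished: new Date(datePublished).toISOString(),
    dateModified: new Date(dateModified).toISOString(), // always carries an explicit Z
  };
  // Escape "<" so the JSON cannot close the script element early.
  const json = JSON.stringify(data, null, 2).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

console.log(buildArticleJsonLd({
  headline: 'Example article',
  datePublished: '2025-11-02T08:00:00Z',
  dateModified: '2026-05-15T09:00:00Z',
}));
```

Because the dates sit at the top level of the object, a top-level-only scanner like the one described in this document would pick them up.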
See also: JSON-LD Structured Data, Author Attribution, Sitemap Accessibility.