The quality of AI visibility measurement depends on the quality of the prompts. Cited uses an automated query generation pipeline — Query Gen V2 — that crawls brand and competitor websites, synthesizes consumer-authentic questions, and classifies each by intent type. The goal is to produce prompts that sound like real customer questions, not marketing jargon.

Query Gen V2 pipeline

The pipeline runs in three stages when a brand is onboarded or when the query cache expires (30-day TTL).

Stage 1: Brand intelligence crawl. Cited’s brand intelligence module crawls the brand’s website and competitor sites to understand product positioning, features, pricing, category context, and use cases. The crawled data is stored as a JSONB profile with a 30-day cache.

Stage 2: Query synthesis. Claude Sonnet (claude-sonnet-4-6) processes the crawled intelligence and generates candidate queries across six consumer intent types. The model is prompted to produce questions that sound like real customer conversations, not keyword-style searches.

Stage 3: Quality filtering. Each candidate query passes through a three-gate filter:
  1. Consumer authenticity scoring — Claude Haiku (claude-haiku-4-5-20251001) rates each query on specificity, purchase intent, and consumer authenticity. Queries below a consumer_authenticity score of 4 are discarded.
  2. Banned jargon filter — a hard-coded list of marketing jargon terms. Queries containing phrases like “innovative solution” or “cutting-edge technology” are rejected regardless of other scores.
  3. Semantic deduplication — OpenAI’s text-embedding-3-small model generates embeddings for all surviving queries. Pairs with cosine similarity above 0.85 are flagged as duplicates and one is removed.
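The three gates can be sketched as one filtering function. This is a minimal sketch, not Cited’s implementation. The `Candidate` type, the two-phrase `BANNED_PHRASES` list, and the rule that the first query of a duplicate pair is kept are assumptions. The embeddings are toy vectors standing in for text-embedding-3-small output. Only the thresholds, a score of 4 and a cosine similarity of 0.85, come from the description above.

```python
import math
from dataclasses import dataclass

# Thresholds from the description; the phrase list is an illustrative
# subset (assumption), not the real hard-coded banned list.
MIN_AUTHENTICITY = 4
DUPLICATE_THRESHOLD = 0.85
BANNED_PHRASES = ("innovative solution", "cutting-edge technology")


@dataclass
class Candidate:
    text: str
    consumer_authenticity: float  # score from the gate 1 scoring model
    embedding: list[float]        # toy stand-in for an embedding vector


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def passes_jargon_gate(text: str) -> bool:
    # Hard gate: any banned phrase rejects the query, whatever its scores.
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)


def filter_candidates(candidates: list[Candidate]) -> list[Candidate]:
    # Gate 1: discard queries scored below the authenticity threshold.
    scored = [c for c in candidates if c.consumer_authenticity >= MIN_AUTHENTICITY]
    # Gate 2: discard queries containing banned jargon.
    clean = [c for c in scored if passes_jargon_gate(c.text)]
    # Gate 3: greedy semantic dedup; a query is dropped if it is too
    # similar to any query already kept (keep-first is an assumption).
    kept: list[Candidate] = []
    for c in clean:
        if all(cosine_similarity(c.embedding, k.embedding) <= DUPLICATE_THRESHOLD for k in kept):
            kept.append(c)
    return kept


queries = [
    Candidate("which CRM is easiest to set up", 8, [1.0, 0.0]),
    Candidate("what CRM is simplest to set up", 7, [0.99, 0.05]),
    Candidate("which CRM is an innovative solution for sales", 9, [0.0, 1.0]),
    Candidate("crm software", 2, [0.6, 0.8]),
]
print([c.text for c in filter_candidates(queries)])
# ['which CRM is easiest to set up']
```

The description generates embeddings only for queries that survive the first two gates, which keeps the embedding step cheap. In this sketch the vectors are precomputed for brevity.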
The pipeline repeats stages 2-3 up to three times if the initial pass does not produce enough queries to meet the target count for the brand’s plan tier.
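The retry behavior can be sketched as a loop around stages 2 and 3. In this sketch, `synthesize` and `quality_filter` are hypothetical stand-ins for the Sonnet generation step and the three-gate filter. Each pass re-filters the accumulated library together with the new candidates, so that deduplication also sees queries from earlier passes. "Up to three times" is read here as at most three passes in total. Both of these choices are assumptions.

```python
from typing import Callable

# Assumption: "up to three times" read as three passes in total.
MAX_PASSES = 3


def build_library(
    target_count: int,
    synthesize: Callable[[], list[str]],
    quality_filter: Callable[[list[str]], list[str]],
) -> list[str]:
    library: list[str] = []
    for _ in range(MAX_PASSES):
        candidates = synthesize()                       # stage 2
        library = quality_filter(library + candidates)  # stage 3
        if len(library) >= target_count:               # target met, stop early
            break
    return library[:target_count]


def exact_dedup(queries: list[str]) -> list[str]:
    # Toy stand-in for the three-gate filter: drops exact repeats only.
    return list(dict.fromkeys(queries))


batches = iter([["a", "b"], ["b", "c"], ["d", "e"]])
print(build_library(4, lambda: next(batches), exact_dedup))
# ['a', 'b', 'c', 'd']
```

If three passes still fall short of the target, the sketch returns the smaller library rather than failing. How the real pipeline handles that case is not described.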

Six consumer intent types

Each generated query is classified into one of six consumer intent types, designed to reflect how real customers ask questions across their decision journey.
| Intent type | Target share | Description | Example |
|---|---|---|---|
| problem_first | 25% | Customer describes a real problem | “my skin gets oily by afternoon” |
| context_specific | 20% | Customer has a specific use case | “best mattress for back pain under 20000” |
| budget_anchored | 15% | Price-conscious comparison | “affordable noise cancelling earphones” |
| comparison | 15% | Head-to-head evaluation | “boat vs noise earbuds” |
| recommendation_seeking | 15% | Open to suggestions | “which HR software do startups use in India” |
| feature_curious | 10% | Specific feature question | “does wakefit mattress have cooling gel” |
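The target shares become per-type query counts once a library size is fixed. For a 75-query library the exact shares produce fractional counts, such as 18.75 problem_first queries, so some rounding rule is needed. This sketch assumes largest-remainder rounding, because the documentation does not say how fractional counts are rounded. Shares are stored as integer percentages to avoid floating-point error.

```python
# Target shares from the table, as integer percentages (sum to 100).
TARGET_SHARES_PCT = {
    "problem_first": 25,
    "context_specific": 20,
    "budget_anchored": 15,
    "comparison": 15,
    "recommendation_seeking": 15,
    "feature_curious": 10,
}


def allocate(total: int, shares: dict[str, int] = TARGET_SHARES_PCT) -> dict[str, int]:
    """Split `total` queries across intent types by largest remainder."""
    counts = {k: total * pct // 100 for k, pct in shares.items()}
    leftover = total - sum(counts.values())
    # Give the leftover queries, one each, to the types whose exact share had
    # the largest fractional part; ties keep table order (sorted is stable).
    by_remainder = sorted(shares, key=lambda k: total * shares[k] % 100, reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


print(allocate(75))
# {'problem_first': 19, 'context_specific': 15, 'budget_anchored': 11,
#  'comparison': 11, 'recommendation_seeking': 11, 'feature_curious': 8}
```

Under this rule a 75-query (Pro) library splits into 19, 15, 11, 11, 11, and 8 queries in table order. A 25-query (Starter) library splits into 6, 5, 4, 4, 4, and 2.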
These six types are the production classification for query generation. They are related to but distinct from the four-type intent taxonomy used in the foundations (informational, commercial, navigational, transactional). The six types here are more granular because they drive query generation targeting; the four-type taxonomy is a conceptual framework for reporting and analysis.

Anti-jargon filtering

The query generation pipeline includes a banned vocabulary list and a consumer_authenticity score gate. Queries that sound like marketing copy are rejected in favor of queries that sound like real customer language. The jargon_free flag is a hard gate — queries that fail it are discarded regardless of other quality scores. This filter exists because LLMs respond differently to marketing-style queries than to consumer-style queries. “Which CRM is easiest to set up” surfaces different brand mentions than “which CRM leverages innovative AI capabilities.” The consumer-authentic framing produces measurements that reflect real customer discovery patterns.

Query banks and brand coverage

The Cited Index uses 185 active queries across 8 categories (20–25 per category) to benchmark all 253 brands in the index. Queries are scoped by category: each query is run for every brand in its category, so the same 25 travel queries are run for all 21 travel brands. For individual brand dashboards, custom query libraries are generated per brand using the full Query Gen V2 pipeline. Plan-tier query counts determine the library size:
  • Starter: 25 queries
  • Pro: 75 queries
  • Scale: 125+ queries
Custom libraries include a broader mix than the Cited Index: branded queries, comparison queries, and use-case queries alongside category queries. Branded and comparison queries often name the brand outright, so responses to them mention it more often. This is why dashboard mention rates are typically higher than Cited Index benchmarks.
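The category scoping of the Cited Index amounts to a per-category cross product of queries and brands. In this sketch the query and brand names are placeholders. Each (query, brand) pair is treated as one measurement. The documentation does not specify whether a pair corresponds to a separate AI platform call or to one shared response scored for several brands.

```python
from itertools import product


def index_pairs(
    queries_by_category: dict[str, list[str]],
    brands_by_category: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Pair every query with every brand in the same category, and only that category."""
    pairs: list[tuple[str, str]] = []
    for category, queries in queries_by_category.items():
        pairs.extend(product(queries, brands_by_category.get(category, [])))
    return pairs


# Placeholder names; only the counts (25 travel queries, 21 travel brands) come from the text.
travel_queries = [f"travel_query_{i}" for i in range(1, 26)]
travel_brands = [f"travel_brand_{i}" for i in range(1, 22)]
pairs = index_pairs({"travel": travel_queries}, {"travel": travel_brands})
print(len(pairs))  # 525
```

Scoping by category keeps the benchmark comparable within a category, since every travel brand is measured against the same 25 questions. It also avoids spending queries on brands that could never plausibly answer them.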

Frequently asked questions

The Cited dashboard shows the full prompt library for each tracked brand. Custom queries can be requested for deep-dive audits; the standard monitoring pipeline uses the auto-generated query set.
Query banks have a 30-day cache TTL. When the cache expires, the brand intelligence crawl re-runs and generates fresh queries. Manual refresh is available for audits or when a brand’s positioning changes significantly.
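The refresh rule reduces to a timestamp comparison. This sketch assumes a bank becomes stale once 30 full days have passed since generation; the exact boundary is not specified. Manual refresh is modeled as a boolean override. `needs_refresh` is a hypothetical helper, not part of a Cited API.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=30)


def needs_refresh(generated_at: datetime, now: datetime, manual: bool = False) -> bool:
    """True if the query bank has reached its 30-day TTL or a manual refresh was requested."""
    # Assumption: a bank exactly 30 days old counts as expired.
    return manual or now - generated_at >= CACHE_TTL


generated = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(needs_refresh(generated, datetime(2025, 1, 30, tzinfo=timezone.utc)))  # False (29 days)
print(needs_refresh(generated, datetime(2025, 1, 31, tzinfo=timezone.utc)))  # True (30 days)
```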
Google search queries are keyword-optimized (“best CRM India 2026”). AI conversations are more natural and conversational (“which CRM do startups in India actually use”). Using Google-style keywords would measure something different from what customers actually ask AI platforms.