The quality of AI visibility measurement depends on the quality of the prompts. Cited uses an automated query generation pipeline, Query Gen V2, that crawls brand and competitor websites, synthesizes consumer-authentic questions, and classifies each by intent type. The goal is to produce prompts that sound like real customer questions, not marketing jargon.
Query Gen V2 pipeline
The pipeline runs in three stages when a brand is onboarded or when the query cache expires (30-day TTL).
Stage 1: Brand intelligence crawl. Cited’s brand intelligence module crawls the brand’s website and competitor sites to understand product positioning, features, pricing, category context, and use cases. The crawled data is stored as a JSONB profile with a 30-day cache.
Stage 2: Query synthesis. Claude Sonnet (claude-sonnet-4-6) processes the crawled intelligence and generates candidate queries across six consumer intent types. The model is prompted to produce questions that sound like real customer conversations, not keyword-style searches.
Stage 3: Quality filtering. Each candidate query passes through a three-gate filter:
- Consumer authenticity scoring: Claude Haiku (claude-haiku-4-5-20251001) rates each query on specificity, purchase intent, and consumer authenticity. Queries below a consumer_authenticity score of 4 are discarded.
- Banned jargon filter: a hard-coded list of marketing jargon terms. Queries containing phrases like “innovative solution” or “cutting-edge technology” are rejected regardless of other scores.
- Semantic deduplication: OpenAI’s text-embedding-3-small model generates embeddings for all surviving queries. Pairs with cosine similarity above 0.85 are flagged as duplicates and one is removed.
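The three gates can be sketched in a few lines of Python. This is a minimal illustration, not Cited's implementation: the `Candidate` class, the function names, and the two-phrase banned list are hypothetical, the scores and embeddings are assumed to arrive precomputed from the rating and embedding models, and keeping the first query of a duplicate pair is an assumption (the text only says one is removed).

```python
import math
from dataclasses import dataclass

# Hypothetical stand-ins: only these two phrases are named in the text above.
BANNED_PHRASES = ("innovative solution", "cutting-edge technology")
MIN_AUTHENTICITY = 4        # queries scoring below this are discarded
DUPLICATE_THRESHOLD = 0.85  # cosine similarity above this marks a duplicate


@dataclass
class Candidate:
    text: str
    consumer_authenticity: int  # assumed to come from the rating model
    embedding: list             # assumed to come from the embedding model


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_jargon_free(text):
    """Hard gate: False if the query contains any banned phrase."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)


def filter_candidates(candidates):
    """Apply the score gate, the jargon gate, then greedy deduplication."""
    kept = []
    for cand in candidates:
        if cand.consumer_authenticity < MIN_AUTHENTICITY:
            continue
        if not is_jargon_free(cand.text):
            continue
        # Assumption: the first query seen wins; later near-duplicates are dropped.
        if any(
            cosine_similarity(cand.embedding, k.embedding) > DUPLICATE_THRESHOLD
            for k in kept
        ):
            continue
        kept.append(cand)
    return kept


# Toy 3-dimensional embeddings, for illustration only.
candidates = [
    Candidate("which CRM is easiest to set up", 5, [1.0, 0.0, 0.0]),
    Candidate("which CRM is the easiest to set up", 5, [0.99, 0.1, 0.0]),
    Candidate("which CRM offers an innovative solution", 5, [0.0, 1.0, 0.0]),
    Candidate("CRM software", 2, [0.0, 0.0, 1.0]),
    Candidate("does this CRM work offline on mobile", 4, [0.0, 0.5, 0.9]),
]
survivors = [c.text for c in filter_candidates(candidates)]
```

In this toy run the second query is dropped as a near-duplicate of the first, the third fails the jargon gate, and the fourth falls below the score threshold, so two queries survive.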
Six consumer intent types
Each generated query is classified into one of six consumer intent types, designed to reflect how real customers ask questions across their decision journey.
| Intent type | Target share | Description | Example |
|---|---|---|---|
| problem_first | 25% | Customer describes a real problem | “my skin gets oily by afternoon” |
| context_specific | 20% | Customer has a specific use case | “best mattress for back pain under 20000” |
| budget_anchored | 15% | Price-conscious comparison | “affordable noise cancelling earphones” |
| comparison | 15% | Head-to-head evaluation | “boat vs noise earbuds” |
| recommendation_seeking | 15% | Open to suggestions | “which HR software do startups use in India” |
| feature_curious | 10% | Specific feature question | “does wakefit mattress have cooling gel” |
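To make the target shares concrete, the sketch below turns them into per-intent query counts for a library of a given size. The page does not say how Cited rounds shares into whole queries, so the largest-remainder rounding used here, along with the name `allocate_queries`, is an assumption for illustration.

```python
# Target shares in whole percent, copied from the table above.
TARGET_SHARE_PCT = {
    "problem_first": 25,
    "context_specific": 20,
    "budget_anchored": 15,
    "comparison": 15,
    "recommendation_seeking": 15,
    "feature_curious": 10,
}


def allocate_queries(total):
    """Split `total` queries across intent types by target share.

    Uses integer arithmetic and largest-remainder rounding (an assumed
    rounding rule) so the counts always sum to `total`.
    """
    counts = {k: total * pct // 100 for k, pct in TARGET_SHARE_PCT.items()}
    remainders = {k: total * pct % 100 for k, pct in TARGET_SHARE_PCT.items()}
    leftover = total - sum(counts.values())
    # Hand the leftover queries to the intents with the largest remainders.
    # sorted() is stable, so ties fall back to table order.
    for intent in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[intent] += 1
    return counts


starter = allocate_queries(25)
pro = allocate_queries(75)
```

For a 25-query library this yields 6 problem_first, 5 context_specific, 4 each for budget_anchored, comparison, and recommendation_seeking, and 2 feature_curious queries.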
Anti-jargon filtering
The query generation pipeline includes a banned vocabulary list and a consumer_authenticity score gate. Queries that sound like marketing copy are rejected in favor of queries that sound like real customer language. The jargon_free flag is a hard gate: queries that fail it are discarded regardless of other quality scores.
This filter exists because LLMs respond differently to marketing-style queries than to consumer-style queries. “Which CRM is easiest to set up” surfaces different brand mentions than “which CRM leverages innovative AI capabilities.” The consumer-authentic framing produces measurements that reflect real customer discovery patterns.
Query banks and brand coverage
The Cited Index uses 185 active queries across 8 categories (20-25 per category), applied to all 253 brands in the index. Each query is used for every brand in its category: the same 25 travel queries are run for all 21 travel brands.
For individual brand dashboards, custom query libraries are generated per brand using the full Query Gen V2 pipeline. Plan-tier query counts determine the library size:
- Starter: 25 queries
- Pro: 75 queries
- Scale: 125+ queries
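For the Cited Index case described above, where every query in a category is run against every brand in that category, the fan-out is a simple cross product. The sketch below uses placeholder query and brand names and a hypothetical `build_runs` helper. Only the counts (25 travel queries, 21 travel brands) come from this page.

```python
from itertools import product


def build_runs(queries, brands):
    """Pair every query in a category with every brand in that category."""
    return list(product(queries, brands))


# Placeholder names; the real query bank and brand list are not shown here.
travel_queries = [f"travel query {i}" for i in range(1, 26)]  # 25 queries
travel_brands = [f"travel brand {i}" for i in range(1, 22)]   # 21 brands

runs = build_runs(travel_queries, travel_brands)
run_count = len(runs)  # 25 * 21 = 525 (query, brand) pairs
```

Each pair is one measurement to run, so the travel category alone accounts for 525 query-brand combinations per pass.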
Related concepts
- Query intent taxonomy — the four-type conceptual framework
- Cited Index benchmarks — the 185-query, 253-brand dataset
- How we extract mentions — what happens after queries run
- Non-determinism — why the same query produces different results
Frequently asked questions
Can I see or modify the queries used for my brand?
Yes. The Cited dashboard shows the full prompt library for each tracked brand. Custom queries can be requested for deep-dive audits. The standard monitoring pipeline uses the auto-generated query set.
How often are queries refreshed?
Query banks have a 30-day cache TTL. When the cache expires, the brand intelligence crawl re-runs and generates fresh queries. Manual refresh is available for audits or when a brand’s positioning changes significantly.
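The refresh rule can be expressed as a small check. This is an illustrative sketch only: `needs_refresh`, its parameters, and the choice to treat exactly 30 days as expired are assumptions, not Cited's actual code.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=30)  # TTL stated on this page


def needs_refresh(generated_at, now=None, force=False):
    """Return True if the query bank should be regenerated.

    `force` models a manual refresh; otherwise the bank is stale once
    30 days have passed since `generated_at` (boundary handling assumed).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return force or (now - generated_at) >= CACHE_TTL


generated = datetime(2026, 1, 1, tzinfo=timezone.utc)
fresh = needs_refresh(generated, now=datetime(2026, 1, 15, tzinfo=timezone.utc))
stale = needs_refresh(generated, now=datetime(2026, 1, 31, tzinfo=timezone.utc))
manual = needs_refresh(
    generated, now=datetime(2026, 1, 2, tzinfo=timezone.utc), force=True
)
```

Here `fresh` is False (14 days old), while `stale` and `manual` are both True.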
Why not just use the most common Google searches as prompts?
Google search queries are keyword-optimized (“best CRM India 2026”), while questions put to AI assistants are phrased conversationally (“which CRM do startups in India actually use”). Measuring with Google-style keywords would capture something other than what customers actually ask AI platforms.