Why the same prompt produces different answers across runs, and how Cited accounts for that variability in measurement.