Draft ten questions from support tickets, sales calls, or front-desk notes — not hypotheticals.
Run each question three times. Record contradictions.
Include typos, missing context, and adversarial tone. Behavior shifts under stress.
Have someone who did not build the feature classify answers: acceptable, confusing, or harmful.
If two of ten are confusing or harmful, delay release — regardless of demo quality.