Good user research asks the right survey questions of the right people. If you fail on either count, you may make million-dollar decisions based on bad data.
Leading questions are an easy way to poison your data. A leading question is “a question asked in a way that is intended to produce a desired answer.”
If you’ve worked in marketing or sales, you know leading questions well: They’re wonderfully effective at guiding consumers toward a “yes” for a product or service. (“Would you like to lose 10 lbs. without leaving the couch?!”)
That power is the same reason they’re so dangerous to user research, especially since a “leading question” can result from factors beyond the words in the question:
- question topic;
- order in which it’s asked;
- answer options;
- survey-wide aspects.
All have the potential to lead respondents to “a desired answer”—and ruin your data.
How a single phrase can shape responses
Early in Norman M. Bradburn’s classic, Asking Questions: The Definitive Guide to Questionnaire Design—for Marketing Research, Political Polls, and Social and Health Questionnaires, a core source for this post, the author illustrates how a subtle shift in language affects responses:
Two priests, a Dominican and a Jesuit, are discussing whether it is a sin to smoke and pray at the same time. After failing to reach a conclusion, each goes off to consult his respective superior. The next week they meet again.
The Dominican says, “Well, what did your superior say?”
The Jesuit responds, “He said it was all right.”
“That’s funny,” the Dominican replies. “My superior said it was a sin.”
The Jesuit says, “What did you ask him?”
The Dominican replies, “I asked him if it was all right to smoke while praying.”
“Oh,” says the Jesuit. “I asked my superior if it was all right to pray while smoking.”
The potential for a single phrase to change responses is real and common, as U.S. survey results suggest:
- in cases of incurable disease, doctors should be allowed to “assist the patient to commit suicide”: 51% agree
- in cases of incurable disease, doctors should be allowed to “end the patient’s life by some painless means”: 70% agree
- “having a baby outside of marriage” is morally wrong: 36% agree
- “an unmarried woman having a baby” is morally wrong: 26% agree
- “Do you think the United States should allow public speeches against democracy?” 21% agree
- “Do you think the United States should forbid public speeches against democracy?” 39% agree
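One way to judge whether a wording gap like those above is more than sampling noise is a two-proportion z-test on a split-ballot sample. The sketch below is a minimal illustration, not a method taken from the surveys quoted. The sample size of 1,000 per wording is an assumption (the figures above come without sample sizes), and `two_proportion_z` is a hypothetical helper name.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Return the z statistic for the difference between two sample proportions."""
    # Pool both samples to estimate the shared proportion under "no wording effect".
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Agreement rates quoted above; n = 1000 per wording is assumed for illustration only.
ASSUMED_N = 1000
z = two_proportion_z(0.70, ASSUMED_N, 0.51, ASSUMED_N)
gap_points = round((0.70 - 0.51) * 100)

print(f"gap: {gap_points} percentage points, z = {z:.2f}")
```

Under these assumed sample sizes, the z statistic lands well above the conventional 1.96 threshold, which would suggest the gap reflects the wording rather than chance. With much smaller samples, the same gap could be inconclusive.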
There may not be a correct wording for some questions, but every choice has an impact. And, as Bradburn notes, the complexities can compound:
- Weakly held attitudes are more vulnerable to changes in phrasing.
- Any change to phrasing from year-to-year—even a change that improves the accuracy of the responses—can invalidate comparisons from prior years.
To improve the accuracy of responses, you need to know how variables in your questions, answers, and surveys can—intentionally or not—lead respondents to a given answer.
When you think of leading questions, you likely think first of the language—the words and phrasing—of those questions. But the question type, topic, and order can be equally influential.
Which of the following is a leading question?
- “Does your employer or his representative resort to trickery in order to defraud you of your part of your earnings?”
- “With regard to earnings, does your employer treat you fairly or unfairly?”
The former, a blatantly leading question, was how Karl Marx framed it in early surveys of workers. The latter, defanged of words like “trickery” and “defraud,” offers a more neutral question.
Intentionally leading questions such as Marx’s are unlikely to plague your survey. But subtle choices can be influential. For example:
- Is the new design easier to use than the old one? The use of “new” and “old” cues respondent expectations, which are also primed to consider whether the changes make the website “easier” to use.
- Was one design easier or harder to use than another? This phrasing eliminates the bias introduced by old vs. new and gives equal weight to a positive or negative experience.
Additionally, some words, though seemingly interchangeable, have connotations that skew results. For example, asking respondents about “welfare”—a politically charged topic—yields far different levels of support compared to “assistance for the poor”:
In the context of web design, it’s easy to think of similar examples: calling content an “ad” instead of “sponsored” or identifying an element as a “pop-up” rather than a “lightbox” may shape responses.
Similar to leading questions, two other types of questions can also bias response data:
Conjunctions pose risks to questions. Both “and” and “or” often result in double-barreled questions, which force respondents to respond to two things simultaneously:
- “Are you satisfied with the pay and benefits at your office?”
- “How would you describe your experience trying to find blog or webinar content?”
While double-barreled questions are technically distinct from leading questions, the end result is similar: Poor phrasing leads respondents to provide inaccurate information.
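A draft survey can be screened for possible double-barreled questions with a simple text check. The sketch below is a crude heuristic under stated assumptions: it flags any question containing a standalone "and" or "or", the function name `flag_possible_double_barrel` is hypothetical, and the third draft question is an invented single-topic rewrite added for contrast.

```python
import re

# Crude heuristic: a standalone "and" or "or" hints at a double-barreled question.
CONJUNCTION = re.compile(r"\b(and|or)\b", re.IGNORECASE)


def flag_possible_double_barrel(questions: list[str]) -> list[str]:
    """Return the questions that contain a conjunction and deserve a manual look."""
    return [q for q in questions if CONJUNCTION.search(q)]


draft = [
    "Are you satisfied with the pay and benefits at your office?",
    "How would you describe your experience trying to find blog or webinar content?",
    "Are you satisfied with the pay at your office?",  # hypothetical single-topic rewrite
]

for question in flag_possible_double_barrel(draft):
    print("Review:", question)
```

A flag means "review this," not "this is wrong." The same check would also catch a balanced question such as one that asks about being treated "fairly or unfairly," where the conjunction is doing useful work.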
Unlike leading questions, which suggest the desired answer, loaded questions assume one:
- “Was it easier to navigate the new design?” (leading)
- “Which of the design improvements was your favorite?” (loaded)
A loaded question is, in effect, a “hard” lead: Respondents have no choice but to agree, and their answer merely justifies the agreement.
The choice of language becomes especially critical when questions tackle sensitive topics.
“Although respondents are motivated to be ‘good respondents’ and to provide the information that is asked for,” writes Bradburn, “they are also motivated to be ‘good people.’”
In surveys, social desirability bias—respondents’ desire to be perceived as moral, smart, healthy, or any other valued characteristic—can transform seemingly innocuous questions into leading ones and, as a result, skew results. (That bias is stronger, according to Bradburn, when respondents answer questions in person or over the phone.)
Take a simple question: “On average, how much time do you spend on social media each day?” Social desirability bias may lead to underreporting—devoting whole evenings to scrolling through Facebook’s News Feed isn’t something to brag about.
To get more accurate responses, the language of the question needs to offer “outs,” or ways to mitigate the impact of social desirability bias. The best way to do so depends on whether the concern is underreported behavior, overreported behavior, or knowledge-based questions.
Leading questions can, at times, improve the accuracy of responses. Adding an opening clause to normalize behavior can make respondents feel more comfortable:
- “The average person spends more than two hours per day on social media. How much time, on average, do you spend?”
You can also use a loaded question (as long as you retain a “None” option) to suggest that “everyone is doing it”:
- “How many cigarettes do you smoke each day?”
A third alternative is to add an authority’s perspective:
- “The Mayo Clinic reports that red wine may be heart healthy. How often do you drink red wine?”
“Do you jog?” “How many books did you read this year?” Neither question is of great consequence, but both risk overreporting due to social desirability bias.
Countering that bias can be as simple as adding a short phrase to normalize a negative response:
- “Do you happen to jog, or not?”
- “How many books, if any, did you read this year?”
Similar phrases are useful for other question types. Consider the difference between these two:
- “What do you like about…?”
- “What, if anything, do you like about…?”
The latter, Estée Lauder learned, led respondents to choose “nothing” more often.
For similar questions, Bradburn offers another solution—provide reasons why someone may not perform the behavior:
The risk of overreporting increases when the behavior is uncommon. As a classic study demonstrated, a question as basic as “Do you own a library card?” vastly inflated reporting.
Owning a library card was socially desirable but, in the location of the study, only a minority of the population had one. The relative rarity of card-holding inflated results more than a modern survey of seat belt usage would, since most people wear seat belts.
The need to provide an “out” applies to knowledge-based questions as well.
No one wants to come off as an idiot. That desire can lead to overreporting for knowledge-based questions, such as brand-awareness surveys. Respondents believe that they should know an answer, so they’re more likely to check yes.
Neutralize knowledge-based questions with phrases that suggest the knowledge is not expected:
- “Do you happen to know…”
- “As far as you know…”
- “Can you recall offhand…”
Another strategy is to turn knowledge questions into opinion questions. The symptoms and likely age of onset for breast cancer are known, but reframing the knowledge-based questions as opinions frees respondents to give a candid account of their beliefs:
Unipolar questions—those that consider only one side of the response—can operate as leading questions due to acquiescence bias. Respondents, especially those with limited knowledge of a topic, are more likely to agree with a statement.
Unipolar questions or prompts, such as those in an agree-disagree format, don’t offer an alternative. For example, on the “strongly agree to strongly disagree” spectrum, how would you respond to the following prompt?
- “Ads are the best way for news websites to earn money.”
How would you respond if that same prompt were changed to a forced-choice format?
- “Ads are the best way for news websites to earn money.”
- “Paywalls are the best way for news websites to earn money.”
Acquiescence bias aside, the initial statement likely triggers negative feelings. (Who likes ads?) The second option that includes the primary alternative—one that requires you to open your wallet—may cause respondents to reconsider.
A Pew Research Center poll highlights the potential divide:
The advantage of bipolar questions is that they make respondents aware of options at both ends of the spectrum—rather than leading respondents to focus on one option in isolation. (Be cautious: A poorly written bipolar question may also introduce a false choice.)
The “foot-in-the-door” technique suggests that yeses beget yeses. If you start with a modest request, then follow up later with a larger request, you increase your chances of succeeding with the larger request.
In the context of leading questions, it means that the answer to a previous question influences the answer to a subsequent one. Leading questions, in other words, result not just from the content within a question but also from the content preceding it.
Here are two ways it can happen.
General vs. specific questions
“When a general question and a more specific-related question are asked together,” explains Bradburn, “the general question is affected by its position, whereas the more specific question is not.” The specific question, if it comes first, may lead respondents to their answer for the general one.
For example, take two questions on marketing knowledge, ordered from general to specific:
- “How would you rate your marketing team’s overall knowledge?”
- “How would you rate your marketing team’s knowledge of multivariate testing?”
The above order is more likely to generate accurate responses. If the order were reversed, the initial question on multivariate testing may affect the more general one in two ways:
- A lack of knowledge of multivariate testing may make respondents more pessimistic about their teams’ overall knowledge.
- The general question may be misinterpreted as referring to everything except knowledge of multivariate testing.
A secondary benefit of asking the more general question first is that it makes responses comparable to other surveys (assuming the other surveys asked the more general question first as well).
Question order can also help gather more accurate data by using the foot-in-the-door technique—leading the respondent to admit a socially undesirable behavior.
Bradburn shares an example from a survey seeking to learn about shoplifting. The order of questions starts with more serious criminal behavior to make the real target of the survey, shoplifting, appear less deviant:
In some instances, the only way to eliminate a leading question is to “make the biased choice that is implicit in the question wording explicit in a list of choices that include alternatives.”
Answers’ role in leading questions
To some extent, the distinction between question formulation and techniques for recording answers is an artificial one, because the form of the question often dictates the most appropriate technique for recording the answer—that is, some questions take on their meaning by their response categories. – Norman M. Bradburn
Even with a perfectly framed question, the available answer choices—and the order in which they appear—can bias responses and reverse engineer a leading question. The biggest difference in answer formats is between open-ended and closed-ended responses.
Open-ended responses are often preferred for user research—rather than boxing users into a set of predetermined answers, they provide an opportunity for people to communicate, in their own words, what’s most important to them.
That often makes it easier to avoid leading questions: You can ask broader questions that don’t suggest norms or limit responses to four or five choices. On the other hand, questions with open-ended responses assume that the respondent considers all options.
For example, when Pew asked respondents about the most important issues for choosing a president, the economy became the dominant issue only in the closed-ended version:
Whether closed-ended responses are helpful reminders—perhaps the economy was the most influential factor but simply hard to recall—or distorting elements is unclear.
One solution is to run an open-ended survey with a pilot group (or for the first year of an annual survey), then use those responses to create a closed-ended version that accurately reflects the range of responses.
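The pilot-then-closed approach can be sketched as a simple tally. The example below assumes each open-ended pilot answer has already been coded to a single theme. The theme names, the `build_closed_options` helper, and the cutoff of three options are all hypothetical choices made for illustration.

```python
from collections import Counter


def build_closed_options(coded_responses: list[str], top_n: int = 4) -> list[str]:
    """Turn coded pilot answers into a closed-ended option list, most common first."""
    counts = Counter(coded_responses)
    options = [theme for theme, _ in counts.most_common(top_n)]
    # Keep an escape hatch so respondents are not forced into the pilot's categories.
    options.append("Other (please specify)")
    return options


# Hypothetical pilot data: each open-ended answer already coded to one theme.
pilot_codes = ["price", "speed", "price", "support", "design", "price", "speed", "privacy"]
print(build_closed_options(pilot_codes, top_n=3))
```

Appending an "Other" option is a design choice rather than a requirement. It lets the closed-ended version keep surfacing answers the pilot group never mentioned.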
Open-ended questions are also useful when you want to avoid biasing respondents by normalizing a range. Respondents, Bradburn notes, tend to avoid extreme answers, so they’re less likely to choose the top-end of a range for undesirable behaviors or the bottom end for desirable behaviors.
The open-ended strategy may work well when respondents aren’t sure of the “normal” range, like the frequency with which people eat beef for dinner:
Open-ended responses do, however, require more time to code and, if coding is done improperly, can introduce errors. A biased coding effort can undermine even the most neutral set of questions and answers.
There’s another reason that, in some instances, closed-ended responses have benefits: They can intentionally normalize behavior or attitudes that respondents may be loath to report.
For better and worse, closed-ended responses set boundaries and suggest norms. Those delineations, in turn, influence responses to questions:
- Yes-or-no response options may force a false degree of certainty. A Likert scale can more accurately reflect likelihood (e.g., “How likely are you to buy this product?” instead of “Would you buy this product?”).
- Numerical ranges suggest averages and extremes. When an attitude or behavior is socially desirable or undesirable, responses may shift toward the end of the scale that is more favorable to the respondent.
- A scale with an even number of options forces an opinion. An odd number of options—with a neutral response in the middle—allows respondents to sit on the fence.
While the establishment of norms represents a risk, it’s also an opportunity. If you’re concerned about underreporting, a range with an artificially high top end can normalize behavior. (The inverse is also true.)
Consider the implicit judgment in these two potential response ranges:
An “average” watcher in the left-hand version qualifies as an “extreme” viewer on the right. If you’re looking to get honest feedback about a socially undesirable behavior (even a benign one like binge-watching Netflix), the version on the left likely puts more respondents at ease.
For knowledge questions with closed-ended responses, “sleeper answers” and an “I don’t know” option can catch and prevent inaccurate answers that respondents may feel compelled to select.
Sleeper answers. Sleeper answers can help manage social desirability bias. If you’re worried that respondents may claim to recognize concepts or brands that, in fact, they do not, one solution is to add a false choice:
If 20% of respondents choose Duck Peninsula, awareness of other brands, contends Bradburn, may be inflated to a similar degree.
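One rough way to act on a sleeper answer is to treat its recognition rate as an estimate of inflation and subtract it from each real brand's claimed awareness. The sketch below is an assumption about how to apply the idea, not a formula from Bradburn. The brand names and rates other than the 20% sleeper figure are invented, and `adjust_for_sleeper` is a hypothetical helper.

```python
def adjust_for_sleeper(claimed: dict[str, float], sleeper: str) -> dict[str, float]:
    """Subtract the sleeper's recognition rate from every real brand, floored at zero."""
    inflation = claimed[sleeper]
    return {
        brand: round(max(rate - inflation, 0.0), 4)
        for brand, rate in claimed.items()
        if brand != sleeper
    }


# Hypothetical recognition rates; "Duck Peninsula" is the fictitious sleeper brand.
claimed_awareness = {
    "Brand A": 0.65,
    "Brand B": 0.30,
    "Brand C": 0.15,
    "Duck Peninsula": 0.20,
}
print(adjust_for_sleeper(claimed_awareness, "Duck Peninsula"))
```

A flat subtraction is the simplest possible correction. A brand whose claimed awareness falls below the sleeper rate, like the invented Brand C here, is a signal that its reported awareness may be mostly noise.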
“I don’t know.” Keeping an “I don’t know” response available for knowledge questions helps normalize a null response and reduces wild guesses.
The order in which those answers appear matters as well.
For surveys in which respondents can’t see the answers—those over the phone or in face-to-face interviews—the serial position effect plays a role. The limitations of human attention and memory make early response options more likely to be selected.
One solution to combat the serial position effect—as well as any potential effect from response order—is to vary the order of the responses. (The solution doesn’t work for ranges or degrees that have a natural order.)
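Varying response order per respondent can be sketched in a few lines. The example below shuffles options that have no natural order and leaves ordered scales untouched. The `order_options` helper and its parameters are hypothetical, and pinning "I don't know" to the end is an added assumption rather than something the sources here prescribe.

```python
import random
from typing import Optional, Sequence


def order_options(
    options: Sequence[str],
    ordinal: bool,
    pinned_last: Sequence[str] = (),
    seed: Optional[int] = None,
) -> list:
    """Shuffle unordered options per respondent; leave ordinal scales untouched."""
    if ordinal:
        return list(options)
    rng = random.Random(seed)
    movable = [o for o in options if o not in pinned_last]
    rng.shuffle(movable)
    # Assumption: opt-out answers stay at the end, in their original order.
    pinned = [o for o in options if o in pinned_last]
    return movable + pinned


brands = ["Brand A", "Brand B", "Brand C", "I don't know"]
scale = ["Never", "Sometimes", "Often", "Always"]

# Seeding with a respondent ID keeps each respondent's order reproducible.
print(order_options(brands, ordinal=False, pinned_last=["I don't know"], seed=42))
print(order_options(scale, ordinal=True))
```

Recording the order each respondent saw makes it possible to check afterward whether position affected which option was chosen.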
For responses about socially undesirable behaviors, Bradburn recommends starting with the least desirable option. That ensures that respondents read through more of the potential responses rather than immediately seeing and selecting the most socially desirable response.
The larger context of the survey plays a role, too.
Surveys’ role in leading questions
Do respondents know why you’re running the survey? A survey on “the obesity epidemic” may underreport fast-food consumption compared to one on “consumer preferences.”
Additionally, respondents’ desire to be good people and good respondents may bias them toward telling you what you want to hear. If the UX designer is also conducting interviews about the design, and interviewees know it, they may feel compelled to offer positive feedback.
In face-to-face surveys, the responses of the interviewer, whether positive or negative, can lead respondents to elaborate or reconsider their answers. A change of tone or simple head nod can shape responses. (If you can’t keep a poker face, get someone else to conduct the interviews.)
If, as an interviewer or survey maker, you do have a perspective, Irving Seidman, author of another foundational work on qualitative research, recommends:
- Making your perspective known upfront.
- Asking the respondent what they think about that assertion.
After you complete a draft survey, many sources recommend the same thing: Get someone outside your organization to review the survey.
For those with the resources, pretesting can help identify which questions may be confusing or suggestive for respondents.
Leading questions result not merely from the language they use but also from the answers they offer and the context of the survey in which they appear.
The subtleties of language may mean that there’s no perfectly neutral way to phrase a question or arrange an answer set. Every choice leads respondents in some direction.
The goal, then, is not to feign total neutrality but to construct a survey—questions and answers—in a way that:
- Makes questions and potential answers clear to respondents.
- Uses language—in questions and answers—that helps respondents feel comfortable with any response.
- Allows respondents to offer a wide range of viewpoints without forcing them into false choices or artificially rigid positions.