How you design a survey or a form will affect the answers you get. This includes the language you use, the order of the questions, and, of course, the survey scale: the default values and ranges you use.
Survey response scales can be embedded in the survey (1–5, 1–10, etc.), chosen via a drop-down menu, or included as part of the survey language.
No matter how you choose to display the scale, the default ranges affect the precision of your data. For example, if a survey asks your age, a default range of 20–25 instead of 20–30 has an impact.
Surveys are a great source of insight into your visitors’ attitudes. Certain surveys also allow you to compare yourself with the competition. But as Jared Spool notes, the nuances of survey scale design add challenges.
So what can you do to get accurate data? It starts with understanding some of the differences and shortcomings of survey scales.
3 types of survey response scales
When designing surveys, there tend to be three different models for survey response scales:
- Dichotomous scales;
- Rating scales;
- Semantic differential scales.
1. Dichotomous scales
Dichotomous scales have two choices that are diametrically opposed to each other. Some examples:
- “Yes” or “No”;
- “True” or “False”;
- “Fair” or “Unfair”;
- “Agree” or “Disagree.”
There’s no nuance, and there’s no way for a respondent to be neutral. But there’s actually a lot of value in the lack of a neutral option.
Sometimes, especially in long surveys, you’re subject to what’s known as the error of central tendency: Answers gradually regress to the middle of the scale—the neutral options.
A dichotomous scale gives you a clearer, binary answer, but can also fall prey to fatigue. When that happens, respondents lean toward positive answers.
2. Rating scales
You’re probably most familiar with rating scales (e.g., “On a scale of 1–10, how satisfied were you with our service today?”).
The most common rating scales include:
- 1–5 (or Likert scale).
Is there a difference in the outcome based on which scale you choose? Totally. Larger scales produce more variance, which has helped make the 1–5 Likert scale the most common survey scale.
Dr. Rob Balon advises to “always use the 1–5 scale, with 5 being the positive end and 1 being the negative end. NEVER use 1 as the positive end.”
A point on the Likert scale
Another great point from Spool’s talk touches on Likert scales. He rails against the labels we put on scales (“satisfied” and “dissatisfied”) rather than against the scale itself.
So even if satisfied and dissatisfied are “common practices,” they may not be “best practices”—especially in user experience research. You’re trying to delight customers, not just “satisfy” them.
3. Semantic differential scales
Semantic differential scales gather data and “interpret based on the connotative meaning of the respondent’s answer.” These scales usually have dichotomous words at either end of the spectrum.
They measure more specific attitudinal responses:
According to Balon, “Ironically, when you factor analyze SD scales, they basically break out into two factors: positive and negative. There is really no need for seven steps.”
Which survey scale should you use?
It depends on the type of data you want.
Dichotomous scales (“yes” vs. “no”) are great for precise data, but they don’t allow for nuance in respondents’ answers. For instance, asking whether a customer was happy with an experience (yes or no) gives you almost no insight into how to improve the experience for the average customer.
A Likert Scale or Net Promoter Score (NPS) is better for that task because of its increased range. Although—and this is a big point—says Spool, “Anytime you’re enlarging the scale to see higher-resolution data, it’s probably a flag that the data means nothing.”
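To make the NPS concrete: it’s computed from a 0–10 likelihood-to-recommend question by subtracting the percentage of detractors (scores 0–6) from the percentage of promoters (scores 9–10). A rough sketch, with made-up scores:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical responses from 8 customers.
print(nps([10, 9, 8, 8, 7, 6, 5, 2]))  # -12.5
```

Note that the 7s and 8s (“passives”) count toward the total but neither help nor hurt the score, which is part of why a single NPS number can hide a lot of nuance.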
The more quantifiable the information is (behavior questions, for instance), the smaller the range should be. When you want to measure attitudes or feelings, a 5- or 7-point semantic differential scale is a good strategy. Likert scales (satisfied vs. dissatisfied) are a little generic for attitudes, and, as SurveyGizmo said, “semantic differential questions are posed within the context of evaluating attitudes.”
There’s also an older scale, the Guttman scale, that puts a twist on dichotomous and Likert scales. You ask a series of questions that build on each other and escalate in intensity. Here’s a great example from changingminds.org:
Spool talked about the Guttman scale in relation to customer surveys, saying:
If you’re not happy enough to recommend the product, you’re not going to be confident, and you’re not going to feel it has good integrity if you’re not confident, and you’re not going to have pride in it unless they have good integrity, and you’re definitely not going to be passionate about them unless they do everything else.
This can be a useful tool for measuring satisfaction.
Ordinal and interval scales
In a 1946 paper, psychologist S.S. Stevens defined four types of measurement scales: nominal, ordinal, interval, and ratio.
There’s perpetual debate about ordinal and interval scales.
Ordinal scales are numbers that have an order, like “a runner’s finishing place in a race, the rank of a sports team, and the values you get from rating scales used in surveys or questionnaires like the Single Ease Question.”
With ordinal scales, if you’re asking a customer how satisfied they were on a scale of 1–5, a 4 doesn’t necessarily mean they were twice as satisfied as a 2. The difference between a 1 and a 2 isn’t necessarily the same as the difference between a 4 and a 5.
Interval scales establish equal distances between ordinal numbers—for example, when we measure temperature in Fahrenheit. The difference between 19 and 20 degrees is the same as between 80 and 81.
What’s the practical difference?
There are two arguments.
The classic stance, from S.S. Stevens, is that you can’t compute means on anything other than interval (or ratio) data. As Sauro explained it, “he said that you can’t add, subtract, much less compute a mean or standard deviations on anything less than interval data.”
However, the other argument, set forth by psychometrician Frederick Lord, says you can. According to him, it doesn’t matter where the numbers come from; you can work with them the same way. Jeff Sauro gave a great example:
Here are 6 task times (ratio data):
Here are 6 high temperatures in Celsius from a Northeastern US city (interval data):
Here are 6 responses to the Likelihood to Recommend Question (ordinal data):
Now here are 6 numbers that came from the back of football jerseys (nominal data):
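Since the data from Sauro’s example isn’t reproduced here, a made-up set of 1–10 likelihood-to-recommend responses can illustrate the practical question: the mean treats every step of the scale as equal-sized (an interval assumption), while the median relies only on the ordering of the values.

```python
import statistics

# Hypothetical 1-10 likelihood-to-recommend responses (ordinal data).
responses = [9, 7, 10, 3, 8, 9]

# Mean assumes the distance between, say, 3 and 4 equals that between 9 and 10.
print(round(statistics.mean(responses), 2))  # 7.67

# Median only uses rank order, so it's safe under Stevens's stricter view.
print(statistics.median(responses))  # 8.5
```

Under Lord’s view, both summaries are fair game; under Stevens’s, only the median is defensible for ordinal data.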
Does it matter whether your data is interval or ordinal?
Outside of academia, there’s not a lot of debate. The magnitude of the difference matters, too, but what’s most important is evidence of improvement, a point Jeff Sauro has also made about the practical implications.
And, according to Balon, “Outside of academia there is virtually no argument. Most online surveys utilize descriptive statistics and simple banners or cross tabs that can be analyzed using Chi-square, which is a nonparametric analytical tool.”
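As a sketch of what Balon describes (the counts below are invented, not from the article), a chi-square statistic for a simple 2×2 cross tab can even be computed by hand, no parametric assumptions about the scale required:

```python
# Chi-square test of independence on a 2x2 cross tab, e.g. satisfied vs.
# not satisfied, split across two site versions (hypothetical counts).
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

table = [[30, 70],   # version A: satisfied / not satisfied
         [50, 50]]   # version B: satisfied / not satisfied
print(round(chi_square(table), 2))  # 8.33
```

For a 2×2 table (one degree of freedom), a statistic above roughly 3.84 is significant at the 5% level, so these hypothetical counts would suggest a real difference between versions.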
Besides, it’s impossible to fully validate ratings of human perception anyway. So, in general, you can feel comfortable working with ordinal data.
The limitations of survey scales
Even if you design the perfect survey with the appropriate scales, there are still limitations. This is especially true if you run a limited range of surveys or conduct surveys sporadically (and without other forms of conversion research).
The meaning behind the numbers
When you run a scale like, say, the Net Promoter Score, you get a number. You can compare that with your competitors and your past scores, but there are limitations to how much it can tell you about your user experience.
I haven’t heard a better explanation than the one in Spool’s talk on design and metrics.
All of this is to say that ratings scales can tell you a lot, but they can’t tell you everything. Be skeptical when people tell you there’s one question that will tell you how your company is doing.
Little tweaks, big differences
Almost any factor can influence the outcome of a survey (which is why Spool highlights the difficulty of accurate benchmarking data).
GreatBrook, a research consulting firm, ran an experiment with a client in which they created a bunch of surveys with the same attributes, just different scale designs. They gave the questionnaires to 10,000 people and found some interesting things:
- Providing a numeric scale with anchors only for the endpoints (e.g., a 1–5 scale with verbal descriptions only for 1 and 5) led more people to choose the endpoints.
- Presenting a scale as a series of verbal descriptions (e.g., “Are you extremely satisfied, very satisfied, somewhat satisfied, somewhat dissatisfied, very dissatisfied, or extremely dissatisfied?”) led to more dispersion and less clustering of responses.
- A “school grade” scale led to even more dispersion. A school grade scale asks the respondent to grade performance on an A, B, C, D, and F scale.
Using appropriate language and scales
For certain information (e.g., age) there are many ways you can ask for it. Each produces a different level of precision.
According to MyMarketResearchMethods.com, if you want to report an average age, use a ratio scale instead of a nominal scale.
Since the ratio scale is more accurate, why do survey makers use ranges for this question (and questions like income)? Because these are personal questions. Some people are sensitive about disclosing their exact age or income. A range makes people feel more comfortable about sharing information.
According to Balon, “You almost have to ask age, income, ethnicity, etc., using a nominal clustering approach. Otherwise you run a great risk of a non-response error.”
Grouping survey responses based on known characteristics
For certain information, like age or salary, you want to group survey responses based on known characteristics. In other words, the way you group incomes depends on the population you’re studying.
If it’s college students, the ranges will be lower. If it’s the general population, $20k or less is a good first rung; $21–39k is next; then $40–69k, $70–99k, $100–150k, and $150k+.
As Balon advises, “For whatever population you’re studying, make sure those income breaks line up with the known characteristics of the population. Not doing this can create additional bias.”
Similarly, writing surveys in your customers’ own language is important. Use the phrases, jargon, and emotions that your customers are familiar with.
Best practices for demographic insights
With sensitive information like demographic info, how do you establish which defaults to use, which words to use, which scale to use, etc.?
Other than focus groups and interviews, there are some general guidelines and best practices. If you follow these, your respondents will likely have taken similar surveys before and, therefore, will know how to answer the questions from experience.
For example, the guideline for age ranges is the following:
- Under 18 years;
- 18 to 24 years;
- 25 to 34 years;
- 35 to 44 years;
- 45 to 54 years;
- 55 to 64 years;
- 65 or older.
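As a minimal sketch, mapping an exact age onto those standard ranges is just an ordered lookup (the function name and inclusive upper bounds are assumptions, derived from the list above):

```python
# Upper bound of each bracket, paired with its label; checked in order.
AGE_BRACKETS = [(17, "Under 18 years"),
                (24, "18 to 24 years"),
                (34, "25 to 34 years"),
                (44, "35 to 44 years"),
                (54, "45 to 54 years"),
                (64, "55 to 64 years")]

def age_range(age):
    """Return the standard survey age bracket for an exact age."""
    for upper, label in AGE_BRACKETS:
        if age <= upper:
            return label
    return "65 or older"

print(age_range(29))  # 25 to 34 years
print(age_range(70))  # 65 or older
```

The same pattern works for income brackets; you’d only swap in the breaks appropriate to the population you’re studying, per Balon’s advice above.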
You can also bring in an experienced market research consultant to tell you whether you’re running things well. But, of course, that’s another expense.
Ultimately, you have to balance the level of specificity you want with the comfort level of your audience.
Even though default ranges for survey questions seem arbitrary, there’s a lot of thought and design behind them. Whether you use a Likert scale, a dichotomous scale, or a semantic differential scale depends on what you’re trying to learn.
In addition, when trying to obtain sensitive information like age or income, asking for exact numbers just won’t work (non-response bias), so use nominal clusters (e.g., 18–24 etc.).
Before designing a customer survey, learn:
- Survey scale best practices;
- The words, phrases, and ranges your audience will respond to best.