How you design a survey or a form will affect the answers you get. This includes the language you use, the order of the questions, and of course the default values and ranges you use.
This article will focus on response scales. This could be in a survey (1-5 vs 1-10 scale, etc) or on a dropdown menu, and it also includes the language you use.
Think back to the last survey that you filled out that asked you your age or salary. How do the default ranges (age 20-25 vs age 20-30) affect the precision of your data?
Surveys, of course, provide a great source of insight into your visitors’ attitudes. Certain surveys also allow you to compare yourself with the competition. But as Jared Spool of UIE put it in a recent talk, because of nuances in survey design, cross-company comparison may not be that easy:
So what can you do to get accurate data? It starts with understanding some of the differences and shortcomings in different types of survey responses.
First, here are some different types of response scales…
3 Types of Survey Response Scales
When designing surveys, there tend to be three different models for survey response scales:
- Dichotomous Scales
- Rating Scales
- Semantic Differential Scales
1. Dichotomous Scales
Dichotomous scales have two choices that are diametrically opposed to each other. Some examples would be:
- “Yes” or “No”
- “True” or “False”
- “Fair” or “Unfair”
- “Agree” or “Disagree”
There’s no chance for nuance in a response, and there’s no way for a respondent to be neutral. But there’s actually a lot of value in the lack of a neutral option.
Sometimes, especially in long surveys, you’re subject to what’s known as the error of central tendency: when answers gradually regress to the middle of the scale, or the neutral options. A dichotomous scale will give you a clearer, binary answer, but can also fall prey to fatigue – respondents then tend to lean toward positive answers.
2. Rating Scales
Rating scales are probably what you’re most familiar with. “On a scale of 1-10, how satisfied were you with our service today?”
The three most common rating scales are:
- 1-10 scale
- 1-7 scale
- Likert scale (1-5)
Is there a difference in the outcome based on which scale you choose? Totally. There’s more variance in the larger scales, so the norm is to use the 1-5 Likert scale.
Dr. Rob Balon advises: “always use the 1-5 scale with 5 being the positive end and 1 being the negative end. NEVER use 1 as the positive end.”
A Point On The Likert Scale
Another great point from Jared Spool’s talk is on Likert Scales, one of the most commonly used in survey design. He actually rails against the labels we use on the scales (satisfied and dissatisfied) instead of the scale itself:
So even if satisfied and dissatisfied are “common practices,” they may not be “best practices.” Especially in user experience research, where you’re really trying to delight customers, not just satisfy them.
3. Semantic Differential Scales
Semantic differential scales are used to gather data and “interpret based on the connotative meaning of the respondent’s answer.” These, too, usually have dichotomous words at either end of the spectrum. They generally measure more specific attitudinal responses, such as the following:
According to Dr Rob Balon, CEO of The Benchmark Company, “ironically, when you factor analyze SD scales, they basically break out into two factors: positive and negative. There is really no need for seven steps.”
Which Should You Use?
It depends what type of data you want.
Dichotomous scales (“yes” vs “no”) are great for precision in your data, but they don’t allow for any sort of nuance in respondents’ answers. For instance, asking if a customer was happy with the experience (yes or no) gives you almost no insight into whether you’re improving experiences for the average customer.
Something like a Likert Scale or an NPS could be better for that because of the increased range of the scale. Although, and this is a big point, Jared Spool said, “Anytime you’re enlarging the scale to see higher-resolution data it’s probably a flag that the data means nothing.”
I think, then, that the more quantifiable the information is (behavior questions, for instance), the smaller the range should be. When you want to measure attitudes or feelings, using a 5- or 7-point semantic differential scale is a good strategy. Likert scales (satisfied vs dissatisfied) are a little generic for attitudes, and as SurveyGizmo said, “semantic differential questions are posed within the context of evaluating attitudes.”
There’s also an old technique known as a Guttman scale that puts a twist on either dichotomous or Likert scales. What you do is ask a series of questions that build on each other and escalate in intensity. Here’s a great example from changingminds.org:
Jared Spool talked about the Guttman scale in its relation to customer surveys, saying, “If you’re not happy enough to recommend the product, you’re not going to be confident and you’re not going to feel it has good integrity if you’re not confident and you’re not going to have pride in it unless they have good integrity, and you’re definitely not going to be passionate about them unless they do everything else.”
This can be a useful tool for measuring satisfaction.
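As a rough illustration of how a Guttman-style series could be scored, here is a minimal Python sketch. The item wording and order are hypothetical, loosely paraphrasing the escalation in Spool’s quote, and the scoring rule (count the “yes” answers and check that they all come before the first “no”) is one simple way to do it, not the only one.

```python
# Hypothetical items, ordered from mildest to most intense.
ITEMS = [
    "would recommend",
    "feels confident",
    "sees integrity",
    "feels pride",
    "is passionate",
]


def guttman_score(answers):
    """Score yes/no answers given in the same order as ITEMS.

    Returns (score, is_consistent). The score is the number of "yes"
    answers. The pattern is consistent when every "yes" comes before
    every "no", which is what a cumulative scale expects.
    """
    answers = [bool(a) for a in answers]
    score = sum(answers)
    expected = [True] * score + [False] * (len(answers) - score)
    return score, answers == expected


print(guttman_score([True, True, True, False, False]))   # consistent pattern
print(guttman_score([True, False, True, False, False]))  # skips a step
```

A respondent whose pattern skips a step may be signalling that the items don’t really escalate the way you assumed, which is worth checking before trusting the score.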
Ordinal and Interval Scales
S.S. Stevens, in a 1946 paper, described 4 types of measurement scales: nominal, ordinal, interval, and ratio.
When it comes to response scales, there’s a long-running debate over ordinal and interval scales.
Ordinal scales are numbers that have an order, like “a runner’s finishing place in a race, the rank of a sports team and the values you get from rating scales used in surveys or questionnaires like the Single Ease Question.” (source)
With ordinal scales, if you’re asking on a scale of 1-5 how satisfied a customer was, a 4 doesn’t necessarily mean they’re twice as satisfied as a 2. The difference between a 1 and a 2 isn’t necessarily the same as the difference between a 4 and a 5.
Interval scales are when we can establish equal distances between ordinal numbers – for example, when we measure temperature in degrees Fahrenheit. The difference between 19 and 20 degrees is the same as the difference between 80 and 81.
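To make the distinction concrete, here is a small Python sketch using made-up 1-5 ratings. The median and the frequency counts rely only on the order of the values, while the mean implicitly treats the gap between every pair of adjacent points as equal. The function name and the sample data are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, median


def summarize_ratings(ratings):
    """Summarize 1-5 ratings with order-based and interval-based statistics."""
    counts = Counter(ratings)
    return {
        "median": median(ratings),        # needs only the ordering
        "mean": round(mean(ratings), 2),  # assumes equal spacing between points
        "counts": {point: counts.get(point, 0) for point in range(1, 6)},
    }


ratings = [5, 4, 4, 2, 5, 1, 4, 3]  # hypothetical responses
summary = summarize_ratings(ratings)
print(summary)
```

Reporting the counts alongside either average keeps the shape of the responses visible, whichever side of the debate you land on.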
What’s The Practical Difference?
There are two arguments here.
The classic stance, from S.S. Stevens, is that you can’t compute means on anything other than interval data. As Jeff Sauro explained it, “he said that you can’t add, subtract much less compute a mean or standard deviations on anything less than interval data.” Sauro continues:
However, the other argument, set forth by Frederic Lord (a longtime psychometrician at ETS, the organization that administers the SAT), says you can. According to him, it doesn’t matter where the numbers come from; you can work with them the same way. Jeff Sauro gave a great example of this…
Here are 6 task times (ratio data):
Here are 6 high temperatures in Celsius from a Northeastern US city (interval data):
Here are 6 responses to the Likelihood to Recommend Question (ordinal data):
Now here are 6 numbers that came from the back of football jerseys (nominal data):
Does it Matter Whether Your Data is Interval or Ordinal?
Outside of academia, there’s not a lot of debate. The magnitude of the difference matters, but what matters most is the evidence of improvement. Jeff Sauro explains in a practical light what this means:
And according to Dr Rob Balon, “outside of academia there is virtually no argument. Most online surveys utilize descriptive statistics and simple banners or cross tabs that can be analyzed using Chi-square which is a nonparametric analytical tool.”
In any case, it’s impossible to evaluate the validity of the ratings of human perception. So feel good working with ordinal data in general.
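For readers who haven’t run a chi-square test on a cross tab, here is a minimal Python sketch of the Pearson chi-square statistic, written with the standard library only. The cross tab is invented for illustration (rows are new vs returning customers, columns are counts of “satisfied” vs “dissatisfied”), and in practice you would likely reach for a statistics package rather than hand-rolled code.

```python
def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts (a cross tab).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical cross tab: [satisfied, dissatisfied] per customer group.
crosstab = [
    [30, 20],  # new customers
    [45, 5],   # returning customers
]
stat, dof = chi_square(crosstab)
print(stat, dof)
```

With one degree of freedom, the usual 5% critical value is about 3.84, so a statistic well above that suggests the two groups really do answer differently.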
The Limitations of Survey Scales
Even if you design the perfect survey with the appropriate scales, there are still limitations in the insight you can conceivably uncover, especially if you’re only running a limited range of surveys or conducting them sporadically (and without other forms of conversion research).
The Meaning Behind The Numbers
When you run a scale like, say, the Net Promoter Score, you get a number and you can compare that with your competitors and your past scores, but there are certainly limitations in how much it can tell you about your user experience.
I haven’t heard a better explanation of this than in Jared Spool’s talk on design and metrics:
All of this is to say that rating scales can tell you a lot, but they can’t tell you everything. Be skeptical when people tell you there’s one question that will tell you how your company is doing.
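A quick Python sketch shows how much a single score can hide. It uses the standard Net Promoter Score calculation on a 0-10 scale (percentage of promoters answering 9 or 10, minus percentage of detractors answering 0 through 6). All of the response lists are invented for illustration.

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6) on 0-10 answers."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


responses = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]  # hypothetical survey answers
nps = net_promoter_score(responses)

# Two very different audiences that produce the same score:
polarized = [10] * 5 + [0] * 5  # half love it, half hate it
lukewarm = [7] * 10             # everyone is merely fine with it

print(nps, net_promoter_score(polarized), net_promoter_score(lukewarm))
```

The polarized and lukewarm groups both come out at zero, even though they describe completely different customer experiences.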
Little Tweaks, Big Differences
Almost any factor can influence the outcome of a survey (which is why Spool’s quote above talked about the difficulty of accurate benchmarking data).
GreatBrook, a research consulting firm, did an experiment with a client where they created a bunch of different surveys with the same attributes, just different scale designs. They posed the questionnaires to 10,000 people, and found some interesting things:
- Providing a numeric scale with anchors only for the endpoints (e.g. a 1 to 5 scale was presented with verbal descriptions only for the 1 and 5 endpoints) led to more people choosing the endpoints.
- Presenting a scale as a series of verbal descriptions (e.g. “Are you extremely satisfied, very satisfied, somewhat satisfied, somewhat dissatisfied, very dissatisfied, or extremely dissatisfied?”) led to more dispersion and less clustering of responses.
- A “school grade” scale led to more dispersion. A school grade scale is where you ask the respondent to grade performance on an A, B, C, D, and F scale.
Using Appropriate Language and Scales
For certain information (say age) there are many ways you can ask for it. Each one produces a different level of precision.
According to MyMarketResearchMethods.com, if you want to report an average age, you would want to use a ratio scale instead of a nominal scale.
Since the ratio scale is more accurate, why do you see ranges for this question (and questions like income)? Because they’re personal. And some people can be sensitive about disclosing their exact age or income, so people are more comfortable giving a range, such as that seen in the nominal scale example above.
According to Dr Rob Balon, “you almost have to ask age, income, ethnicity, etc. using a nominal clustering approach, otherwise you run a great risk of non-response error.”
Group Based on Known Characteristics
For certain information, like age or salary, you want to group based on known characteristics. In other words, the way you group incomes depends on the population you’re studying.
If it’s college students, the ranges would be much lower. If it’s a study of the general population, $20k or less is a good first rung. $21k to $39k is next, then $40k to $69k, $70k to $99k, $100k to $150k, and $150k plus.
As Dr Rob Balon advises, “for whatever population you’re studying, make sure those income breaks line up with the known characteristics of the population. Not doing this can create additional bias.”
Similarly, writing surveys in your customers’ own language is important. How do you do that? You get on the phone and talk to your customers. Or run focus groups. Or run some on-site surveys.
No matter what, you want to use the words – the phrases, jargon, emotions – that your customers are used to communicating with.
Best Practices for Demographic Insights
So with sensitive information like demographic info, how do you establish which defaults to use, which words to use, which scale to use?
Well, other than focus groups and interviews to get to know your customers better, there are some general guidelines and best practices (listed here). If you follow these, at the very least your respondents will likely have taken surveys like it before, and therefore will know how to answer things based on that context.
For example, the guideline for age ranges is the following:
- Under 18 years
- 18 to 24 years
- 25 to 34 years
- 35 to 44 years
- 45 to 54 years
- 55 to 64 years
- Age 65 or older
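If you already hold exact ages from another source (account data, for instance) and want to report them in these same brackets, the mapping can be sketched in a few lines of Python. The function and constant names are hypothetical. The break points simply mirror the list above.

```python
from bisect import bisect_right

# Lower bound of every bracket after the first, in ascending order.
AGE_BREAKS = [18, 25, 35, 45, 55, 65]
AGE_LABELS = [
    "Under 18 years",
    "18 to 24 years",
    "25 to 34 years",
    "35 to 44 years",
    "45 to 54 years",
    "55 to 64 years",
    "Age 65 or older",
]


def age_bracket(age):
    """Map an exact age in whole years to its reporting bracket."""
    if age < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many break points are <= age,
    # which is exactly the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


for age in (17, 18, 24, 25, 64, 65):
    print(age, "->", age_bracket(age))
```

Keeping the breaks in one list makes it easy to swap in different ranges for a different population.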
You can also have an experienced market research consultant come in and tell you if you’re running things well, but of course, that’s another expense entirely.
Again though, you’ll have to balance the level of specificity you want to obtain with the comfort your audience feels in answering the questions.
Even though it seems like default ranges for survey questions are arbitrary, much thought and design is behind them. Whether you use a Likert scale, a dichotomous scale, or a semantic differential scale depends on what you’re trying to learn.
In addition, when trying to obtain sensitive information like age or income, asking for exact numbers just won’t work (non-response bias), so we use nominal clusters (18-24 etc).
The best thing you can do is, before designing a customer survey, learn survey scale best practices (there’s a link above) and learn what words, phrases, and ranges your audience will best respond to.