You want to speed up your testing efforts, and run more tests. So now the question is – can you run more than one A/B test at the same time on your site?
Will this increase the velocity of your testing program (and thus help you grow faster), or will it pollute the data since multiple separate tests could potentially affect each other’s outcomes? The answer is ‘yes’ to both, but what you should do about it depends.
Let’s look into the “why you shouldn’t” and “why you should” run multiple tests at once.
What you should consider when running multiple simultaneous tests
User comes to your home page, gets to be part of test A. Moves on to the category page, gets to be part of test B. Goes to product page – test C. Adds product to cart – is entered into test D. Completing checkout – test E in effect.
User ends up buying something, and “conversion” is registered.
- Did any of the variations in those tests influence each other, and thus skew the data? (Interactions)
- Which variation of which test gets the credit? Which of the tests *really* nudged the user into buying something? (Attribution)
Andrew explains why running multiple separate tests at the same time might be a bad idea:
A testing tool vendor – Maxymiser – argues that running multiple tests at the same time results in low accuracy:
It’s possible that the “interactions” between variants in the two tests are not equal to each other and uniformly spread out.
The argument is that interaction effects between tests sometimes matter and often go unnoticed, and when they do, they can have a major impact on your test conclusions. According to them, instead of running simultaneous tests, it would be better to combine the tests and run them as a multivariate test (MVT).
Not everyone fully agrees:
Matt Gershoff recommends you figure two things out before determining whether to run multiple separate tests at once:
If the split between variations is always equal, doesn’t it balance itself out?
This is the standard answer Optimizely provides:
Even if one test’s variation is having an impact on another test’s variations, the effect is proportional on all the variations in the latter test and therefore, the results should not be materially affected.
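To make the "balances out" argument concrete, here is a minimal Python sketch with made-up conversion rates (every number and name below is an assumption for illustration, not data from any tool or vendor). It compares a case where test A shifts both variations of test B proportionally with a case where the two tests interact.

```python
def marginal_lift(cells):
    """Relative lift of B1 over B0 when test A splits traffic 50/50.

    `cells` maps (a_variant, b_variant) -> conversion rate.
    All rates here are hypothetical illustration values.
    """
    b0 = (cells[("A0", "B0")] + cells[("A1", "B0")]) / 2
    b1 = (cells[("A0", "B1")] + cells[("A1", "B1")]) / 2
    return b1 / b0 - 1


# Proportional case: A1 multiplies every B variation's rate by 1.2,
# so B1 beats B0 by the same 10% regardless of which A variant a user saw.
proportional = {
    ("A0", "B0"): 0.050, ("A0", "B1"): 0.055,
    ("A1", "B0"): 0.060, ("A1", "B1"): 0.066,
}

# Interaction case: B1 is 10% better alongside A0 but 10% worse alongside A1.
interacting = {
    ("A0", "B0"): 0.050, ("A0", "B1"): 0.055,
    ("A1", "B0"): 0.060, ("A1", "B1"): 0.054,
}

print(f"proportional: {marginal_lift(proportional):+.1%}")  # about +10%
print(f"interacting:  {marginal_lift(interacting):+.1%}")   # about -1%
```

In the proportional case the result of test B is unaffected by test A, which is the scenario the argument above describes. In the interaction case the blended result looks roughly flat and hides the fact that B1 wins with one variant of A and loses with the other.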
Some think this model is oversimplified, and the argument also implies that attribution is not important.
Indeed, we should ask the question – do we really care about attribution? We might. Or not. If we want to know what really impacted user behavior, and which of the tests (or a combination of tests – something you can explore with a testing tool like Conductrics) was responsible, then attribution does matter. This is why you have hypotheses and stuff, right?
Are you in the business of science, or in the business of making money?
The success of your testing program depends on three things: the number of tests run (e.g. per year), the percentage of winning tests, and the average impact per successful experiment. If you severely limit the number of tests you run for the sake of avoiding data pollution, you are also significantly reducing the velocity of your testing.
If your primary goal is to figure out the validity of a single test, to be confident in the attribution and the impact of the test, then you might want to avoid tests with overlapping traffic. But while you do that, you are not running all those tests that might give you a lift – and thus you’re potentially losing money.
In essence, do you care more about the precision of the outcome, or making money?
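The three factors (number of tests, win rate, average impact per win) can be put into a rough back-of-the-envelope calculation. The sketch below assumes that winning lifts compound and that every declared winner is real. Both are simplifying assumptions, and the function name and input numbers are hypothetical.

```python
def expected_annual_lift(tests_per_year, win_rate, avg_lift_per_win):
    """Rough compounded lift from a year of testing.

    Assumes each winner multiplies the conversion rate by (1 + avg_lift_per_win)
    and ignores false positives and interactions between tests.
    """
    winners = tests_per_year * win_rate
    return (1 + avg_lift_per_win) ** winners - 1


# Hypothetical program: 25% of tests win, each winner is worth a 5% lift.
one_at_a_time = expected_annual_lift(12, 0.25, 0.05)   # 3 winners per year
simultaneous = expected_annual_lift(36, 0.25, 0.05)    # 9 winners per year

print(f"12 tests/year: {one_at_a_time:.1%}")
print(f"36 tests/year: {simultaneous:.1%}")
```

Under these assumptions, tripling the number of tests more than triples the yearly lift. In practice, noisier data from overlapping tests would lower the real win rate somewhat, which is exactly the trade-off in question.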
Here’s what Lukas Vermeer thinks about running multiple tests at once on the same site:
Lukas also confirmed that he is running simultaneous tests himself.
Choose the right strategy
We want to run more tests, but we also want the results to be accurate. So what options do we have? Matt Gershoff has done a great job explaining them, and there is a related article on the Maxymiser blog. I’m summarizing the 3 main strategies you can choose from:
1. Run multiple separate tests
Unless you suspect extreme interactions and huge overlap between tests, this is going to be OK. You’re probably fine to do it, especially if what you test is not paradigm-changing stuff and there’s little overlap.
2. Mutually exclusive tests
Most testing tools give you the option to run mutually exclusive tests, so people won’t be part of more than one test. The reason you’d want to do this is to eliminate noise or bias from your results. The possible downside is that these kinds of tests can be more complex to set up, and they will slow down your testing, since each test only gets a share of your traffic and still needs an adequate sample size.
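As an illustration of the difference between overlapping and mutually exclusive tests, here is a minimal sketch of hash-based assignment. It is not how any particular testing tool implements this; the function names, salts, and test names are all hypothetical.

```python
import hashlib
from collections import Counter


def bucket(user_id, salt, buckets):
    """Deterministically map a user to one of `buckets` slots."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def pick_variant(user_id, test):
    # The test name is used as the salt, so splits are independent per test.
    return "control" if bucket(user_id, test, 2) == 0 else "variation"


def assign_overlapping(user_id, tests):
    """Strategy 1: every user is entered into every test."""
    return {test: pick_variant(user_id, test) for test in tests}


def assign_exclusive(user_id, tests):
    """Strategy 2: every user is entered into exactly one test."""
    test = tests[bucket(user_id, "exclusive-layer", len(tests))]
    return {test: pick_variant(user_id, test)}


tests = ["homepage", "category", "checkout"]  # hypothetical test names
users = [f"user-{i}" for i in range(3000)]

traffic = Counter(t for u in users for t in assign_exclusive(u, tests))
print(traffic)  # each test sees roughly a third of the users
```

With three mutually exclusive tests, each one receives about a third of the traffic, so reaching the same sample size takes roughly three times as long as it would with overlapping tests.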
3. Combine multiple tests into one, run as MVT
If you suspect strong interaction between tests, it might be better to combine those tests and run them as an MVT. This option makes sense if the tests you were going to run measure the same goal (e.g. purchase), they’re in the same flow (e.g. running tests on each of the multi-step checkout steps), and you planned to run them for the same duration.
MVT doesn’t make sense if Test A was going to be about an offer and Test B experimenting with the main navigation – low interaction.
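To see what combining tests into an MVT means for traffic, the sketch below enumerates every combination for a hypothetical three-step checkout where each step has a control and one variation. The step names and the visitor count are assumptions for illustration.

```python
from itertools import product


def mvt_cells(factors):
    """Return every combination of variants (a full-factorial design)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# Hypothetical checkout flow: three steps, two versions of each.
factors = {
    "shipping_step": ["control", "variation"],
    "payment_step": ["control", "variation"],
    "review_step": ["control", "variation"],
}

cells = mvt_cells(factors)
monthly_visitors = 40_000  # assumed traffic figure
visitors_per_cell = monthly_visitors // len(cells)

print(len(cells), "combinations,", visitors_per_cell, "visitors per combination")
```

Three separate A/B tests become eight combinations, and each combination gets only an eighth of the traffic. That is the price of being able to measure interactions directly, and it is why MVT is worth it mainly when you expect those interactions to be strong.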
How to balance testing speed and accuracy of results?
Testing speed and accuracy of test results are a trade-off. These three experts recommend similar approaches:
Like with most things in life, there’s no easy, single answer here.
In most cases you will be fine running multiple simultaneous tests, and extreme interactions are unlikely. Unless you’re testing really important stuff (e.g. something that impacts your business model, future of the company), the benefits of testing volume will most likely outweigh the noise in your data and occasional false positives.
If, based on your assessment, there’s a high risk of interaction between multiple tests, reduce the number of simultaneous tests and/or let the tests run longer for improved accuracy.