Learn how to run conversion optimization experiments the right way. In this video, I sit down with Chad Sanderson, Program Manager on the Microsoft Experimentation Platform team, to discuss statistical testing, calculating sample size, and selecting the right tools to help you run statistically significant conversion optimization tests.
Hey guys, I'm sitting here with Chad Sanderson from the Microsoft Experimentation Platform, and we were just chatting about statistics and how people get even simple things wrong, like calculating sample sizes. Can you explain? Yeah. So one of the most common errors that I see is people care about a metric like revenue per visitor or average order value, but they base their experiment sample size off of conversion rate, and that's normally because they find an online calculator that doesn't compute these continuous types of metrics.
The problem that most people don't realize is that sample size depends on the metric variance. So if the variance is really high, if there are really big swings between the lowest point of your data and the highest point, the sample size is going to be way higher. So if you build it on a conversion rate metric, where the sample size is lower, you might be underpowering your experiment pretty drastically. So you run a test and, let's say, you reach a sample size, you declare B as a winner, and you were measuring both conversion rate improvement and revenue per visitor improvement. But actually the RPV, you can't look at it, there's not enough sample size there?
Yeah, that's right. You weren't even close to being able to see an impact either way. So if you want to measure RPV, then how would you go about it? You calculate sample size differently? Well, there are actually some pretty simple calculations to do in order to get those continuous metrics, and you can find them all online. You can just search for continuous metric sample size calculators, or there are also pretty basic algorithms that'll do it as well. So it may take a little bit of legwork because some things haven't been developed for the marketer yet, but you should still go after it and try to find these calculators or methods anyway because it's such a big deal.
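To make the variance point concrete, here is a minimal sketch of one common approximation for the sample size of a two-sided, two-sample z-test. It is not a calculator mentioned in the interview; the function name and all baseline numbers (5% conversion rate, $4.00 RPV with a $30 standard deviation, 10% relative lift) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(std_dev, min_detectable_diff, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-sided, two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    n = 2 * ((z_alpha + z_beta) * std_dev / min_detectable_diff) ** 2
    return ceil(n)


# Hypothetical baseline numbers, for illustration only.
conversion_rate = 0.05           # 5% of visitors convert
rpv_mean, rpv_std = 4.00, 30.0   # revenue per visitor: mean and standard deviation
relative_lift = 0.10             # smallest relative improvement worth detecting

# A conversion is a 0/1 outcome, so its standard deviation is sqrt(p * (1 - p)).
cr_std = (conversion_rate * (1 - conversion_rate)) ** 0.5

n_conversion = sample_size_per_group(cr_std, conversion_rate * relative_lift)
n_rpv = sample_size_per_group(rpv_std, rpv_mean * relative_lift)

print(f"conversion rate: {n_conversion} visitors per variant")
print(f"revenue per visitor: {n_rpv} visitors per variant")
```

With these assumed inputs, the RPV test needs roughly three times as many visitors per variant as the conversion rate test, so a test stopped at the conversion rate sample size would be underpowered for RPV. Real revenue data is often heavily skewed, so the standard deviation should come from your own historical data.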
But what about calculating sample sizes for other types of tests, like testing my Facebook ads or doing email split testing?
Yeah, so I think that's a pretty big problem too, or at least there are a lot of issues with it. So one thing is that some email providers say they provide AB testing capabilities, but the reality is you can't have a true AB test unless you're performing some statistical test, and the majority of these email providers are actually not performing a statistical test. They are simply randomising visitors into one group or another and then telling you the average, and that's not really anything, that's just doing a comparison. And the other issue with email testing, I think, is that there are so many variables that it may not give you a perfect answer.
For example, most emails just go out all at one time over a single day. Are we able to extrapolate from that that this was a winner? And even if we do extrapolate from that, what value does that have for the next email? So what would the value then be? Because usually when they do split testing it's like, I send out emails to, say, 10 percent of my email list and find that subject line B is better, 10 percent better. Are you saying that it's actually probably not better, or that I don't know that it's better? So...
I think there are a lot of variables in that equation that are unknown. So, for example, let's say that I sent out subject line B and it was 10 percent better, or at least that's what I saw on paper. Well, what if I had sent it out on a different day? Would it have still been 10 percent better? What if I was actually tracking a different metric, would that somehow roll up and make it more accountable? What if I actually performed statistics on this and saw that, well, maybe I don't have the sample size to see a true 10 percent difference one way or the other? There are a lot of things that could be more robust around email testing, and email providers, I think, could do a lot better job of fixing those things.
But sometimes it's not quite enough just to say, well, we're doing an AB test and I'm going to take it at the word of a system that's not even performing any statistics that this thing is truly a winner. Have you seen a tool out there that you could use to calculate stats for an email AB test? Yeah, I mean, the stats are basically the same regardless, so if you are calculating conversion rate you can still do it with your traditional online calculators. If you're doing a continuous metric, then you maybe have to use another method like I was describing earlier.
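As an illustration of what "performing a statistical test" on an email split could look like, here is a minimal sketch of a two-sided two-proportion z-test, one standard choice for rate metrics such as open rate. The function name and the counts (1,000 recipients per subject line, 200 opens for A, 220 for B, so B looks 10 percent better on paper) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for a difference between two rates; returns (z, p_value)."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical split: B's open rate is 10% higher than A's in relative terms.
z, p_value = two_proportion_z_test(200, 1000, 220, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these assumed counts the p-value comes out well above the conventional 0.05 threshold, so a 10 percent lift "on paper" at this list size is not enough evidence to call B a winner.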
I think the biggest thing around email testing is people should stop thinking about individual tests, because I'm kind of iffy on the value that that adds, and instead start thinking about bigger factors over periods of time. Like, for example, we ran 50 email experiments, and in the vast majority of those it's the emails with the longer subject lines that won. That's an actionable learning that you can then apply to your business. Gotcha. When you do split tests for ads, Google Ads, Facebook Ads, et cetera, does the same thing apply? You know, like impressions versus clicks and calculating sample sizes? Yep, exactly the same.
Anytime you're doing any type of true AB testing, there always has to be some type of statistical test going on. I personally don't trust many providers, besides the actual AB testing solutions, to deliver that, because it's pretty explicit. So if they're not, you need to do the legwork to actually figure out how to run this data yourself and maybe question: OK, number one, am I calculating the right metrics? Am I performing the right stats? Am I looking at this for a long enough time? There are a lot of things that can go wrong.
I think it's very easy to just look at two base numbers and say, well, yeah, we have a winner or a loser. Because so many people run some sort of test on ads, which makes sense, but actually I haven't heard that people are doing, let's say, upfront sample size calculations to figure out when the test is done. You know, they're like, let's test it. And there we go... Yeah, exactly. You know, one of the issues is a lot of people, I think, still haven't embraced some of the more scientific learnings that marketers or CROs have from AB testing, which is a very rigorous science, so that doesn't exist for most people yet.
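An upfront calculation for an ad test could be sketched as follows: decide the impressions needed per ad before launching, then convert that into a planned run time. This uses a standard two-proportion approximation for click-through rate; the function names, the CTR values (2.0% baseline, 2.4% target), and the 5,000 daily impressions are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def impressions_per_ad(baseline_ctr, target_ctr, alpha=0.05, power=0.8):
    """Approximate impressions each ad needs to detect baseline_ctr vs target_ctr."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the per-impression click variances of the two ads.
    variance_sum = baseline_ctr * (1 - baseline_ctr) + target_ctr * (1 - target_ctr)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (target_ctr - baseline_ctr) ** 2)


def days_to_run(n_per_ad, daily_impressions, num_ads=2):
    """Days needed when daily_impressions are split evenly across num_ads."""
    return ceil(n_per_ad * num_ads / daily_impressions)


# Hypothetical plan: detect a lift from 2.0% to 2.4% CTR with 5,000 impressions a day.
n = impressions_per_ad(0.020, 0.024)
days = days_to_run(n, daily_impressions=5000)
print(f"{n} impressions per ad, about {days} days")
```

Fixing the stopping point this way, before any data comes in, avoids ending the test the moment one ad happens to be ahead.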
It's a slow learning curve and getting into statistics is pretty hard. But you know people will get there. If you want more interviews like this... Subscribe to my channel.