Optimizing a Low Traffic Site for a 13.5% Uplift [Case Study]

What do you do when you have a low-traffic site? Can you still run A/B tests? Well, it depends on what qualifies as low traffic. You need some volume, of course, and without a surplus of data there's more risk involved.

But sometimes you can still test.

While it's easier to test when you have a million visitors a month and a thousand sales a day, that's a luxury most small businesses and startups don't have. But of course, we all want to optimize our sites and make data-driven decisions.

So how is it possible?

Optimizing a Site with Low Traffic

First off, if your traffic is too low to test, you can still optimize your site without running tests. You can do many other things, such as:

  • Heuristic analysis
  • Remote user testing
  • Customer interviews and focus groups
  • Five Second Tests

You can also invest in traffic acquisition (SEO and CRO work well together). If you do all these things, you can certainly make valid gains.

But if you have a certain level of traffic (the threshold depends on a few factors), you can run A/B tests too – though that sits at the top of the low-traffic optimization pyramid, meaning you have to cover the other bases first.

You just need to get really big wins, which means you probably won’t be testing button colors (or shouldn’t be at least).

Even if you’re not Amazon, if you do some conversion research and test big things, you can still take part in optimization.

Case Study: How Bob & Lush Got a 13.5% Lift

Meet Bob & Lush:

[Screenshot: Bob & Lush homepage]

Bob & Lush is a premium dog food supplier in the UK, with an e-commerce site active in several European countries. They pride themselves on using 100% fresh meat and real vegetables, selling the highest quality dog food – no fillers, sweeteners, or artificial preservatives – for dog owners who truly care about their dogs.

After conducting conversion research (using the ResearchXL framework) and running several treatments on their site, we achieved a stable conversion rate uplift. Below we’ll introduce a treatment run on their shopping cart page that resulted in a conversion rate uplift of 13.5%.

First, here’s the initial shopping cart page:

[Screenshot: the original shopping cart page]

Conversion Research and Finding The Problems

Upfront, I’ll let you know we tied three things into our treatment:

  1. Social proof
  2. Benefits-driven copy
  3. Payment information trust and clarity

We arrived at these solutions through multiple forms of conversion research as well as heuristic analysis.

1. Adding Social Proof to the Checkout

If you head over to the Bob & Lush homepage, you’ll notice some prominent social proof in the form of customer reviews:

[Screenshot: customer reviews on the Bob & Lush homepage]

This was a test we’d run previously on their site, and it won big. We used Reevoo, which is not only a big name and well trusted in the UK, but allows for live reviews (instead of static ones). Generally, these reviews are perceived as more trustworthy since anyone can simply submit a review (as opposed to having the company self-select flowery reviews).

Since this worked so well on the homepage, we decided to test it further down the funnel and add social proof to the checkout as well. Reevoo also opens the reviews in a separate tab, so it doesn't interrupt the checkout flow:

[Screenshot: Reevoo reviews added to the checkout]

2. Clearly Listing Product Benefits

Because Bob & Lush is a premium brand, they aren’t really competing on price. Therefore, we needed to make the benefits of purchasing with them very clear.

Through user testing and our customer surveys, we were able to uncover some of the most common reasons people love purchasing from Bob & Lush as well as the common doubts and objections they have.

First, we extracted some of the exact language people used. Customers said they loved the company because it was “100% natural, no rubbish,” so we injected that voice-of-customer language right into the copy.

Second, 4 out of 4 user testers were frustrated by the lack of clear shipping and delivery information. They didn't know how much shipping cost or when their order would arrive. So we added prominent information featuring one of the site's biggest value propositions: “free next day delivery.”

All user testers were impressed with the pouch of free treats. It was a positive surprise for most of them.

3. Clarity and Trust in Payment Icons

Previously, the site's CTA was a simple “Proceed to checkout.” Through heuristic analysis, though, we applied a few best practices that boosted trust and clarity in the checkout process.

For one, we added icons for all the payment types the site accepts. This removes the doubt of, “Do they accept AMEX?”

Next, we added a Sage Pay icon. Sage Pay is a highly recognizable brand in the UK – in fact, it's the largest independent payment service provider in Europe – so its logo conveys trust.

Our Hypothesis

So we tied together the three solutions we uncovered into one treatment with the following hypothesis:

Adding benefits, social proof, and payment methods to the shopping cart page will improve the perceived value of the products and lead to more purchases. Social proof assures people that the dog food is actually good, the benefits remind people of everything extra they'll be getting, and Sage Pay plus credit card icons improve trust and clarity while drawing more attention to the “Proceed to checkout” CTA.

The Treatment and Results

Well, the article headline gave away the results, but check out the variation:

[Screenshot: the shopping cart page variation]

And here’s the test result:

[Screenshot: Optimizely test results]

After multiple business cycles and more than 1,300 visitors tested, we concluded that Variation #1 converts 13.5% better than the original page. Other goals trended in the same positive direction. The revenue improvement is 79% statistically significant and shows an AOV uplift of 15.3%, from £25.83 to £29.79.

Since the revenue improvement is only at 79% significance, I must mention that there's a chance there is no real difference in revenue, but it's very unlikely that revenue will decrease. Because we had a statistically valid result on conversion rate, and revenue is trending in the same positive direction, with a lower traffic volume it's worth taking the risk. At best, we get a significant revenue improvement; at worst, there's no real difference in revenue.
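
To make the significance check concrete, here's a minimal sketch of the kind of calculation involved, using a two-proportion z-test in Python with statsmodels. The per-variation visitor and conversion counts below are purely hypothetical (we can't disclose the client's actual numbers); only the structure of the check is the point.

    # Hypothetical check: is the conversion-rate difference between control and
    # variation statistically significant? Counts are illustrative only.
    from statsmodels.stats.proportion import proportions_ztest

    conversions = [55, 70]   # hypothetical conversions: [control, variation]
    visitors = [660, 660]    # hypothetical visitors per variation (~1,300 total)

    z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
    print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
    # Compare the p-value against your pre-chosen alpha (e.g. 0.05); the verdict
    # here depends entirely on the made-up counts above.

Note that Optimizely's Stats Engine uses sequential testing rather than a fixed-horizon z-test, so this is an approximation of the underlying idea, not a reproduction of the tool's math.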

Looking at the conversion rate throughout the testing period, we can see a stable uplift that reassures us of the treatment's validity:

[Screenshot: conversion rate over the testing period]

Things To Keep in Mind With Low Traffic Sites

While we're confident in our results here, when you're testing on low-traffic sites there are some specific concerns to keep in mind.

One of them, something I covered briefly, is achieving valid results with a small sample size. This is mostly solved by testing things that are likely to produce more drastic effect sizes. Here's an illustration with Optimizely's sample size calculator:

[Screenshot: Optimizely sample size calculator – 2% baseline conversion rate, 5% minimum detectable effect]

You can see that for a site with a 2% baseline conversion rate, you'd need 800,000 unique visitors to detect a 5% minimum detectable effect. However, when you bump the minimum detectable effect up to 50%, you need significantly fewer visitors:

[Screenshot: Optimizely sample size calculator – 2% baseline conversion rate, 50% minimum detectable effect]
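
If you want to play with these numbers yourself, here's a rough sketch of the power calculation behind such calculators, using statsmodels. The alpha and power values are assumptions, and the output won't match Optimizely's Stats Engine figures exactly (its sequential model is more conservative), but the relationship holds: shrinking the minimum detectable effect blows up the required sample size.

    # Approximate unique visitors needed per variation for a two-proportion test.
    # alpha/power defaults are assumptions, not Optimizely's exact settings.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    def visitors_per_variation(baseline, relative_mde, alpha=0.05, power=0.8):
        effect = proportion_effectsize(baseline * (1 + relative_mde), baseline)
        return NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                            power=power, ratio=1.0)

    print(round(visitors_per_variation(0.02, 0.05)))  # ~5% lift: hundreds of thousands
    print(round(visitors_per_variation(0.02, 0.50)))  # ~50% lift: roughly two thousand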

Another thing to worry about is sample pollution. If you run a test for too long, people will delete their cookies, and when they revisit your site they'll be randomly re-assigned to a test bucket. This happens even in shorter tests, but it becomes a real problem when a test runs too long (read our article on sample pollution to learn more).

Make Sure Your Data is Accurate

The first tier in our optimization pyramid is making sure you're accurately tracking everything on your site. When it comes time to pick things to test, you'll want an accurate source of data to feed you insights.

So start out by implementing an analytics package (and doing an analytics health check). Begin a mouse-tracking campaign. Look into getting some user testing data. Organize your data in a way that's actionable when it comes time to prioritize test ideas.

Remove Outliers

With small samples, outliers have a much greater ability to skew your results.


It's important to track and remove outliers from your data, especially around the holidays or during other external events that may influence purchasing behavior. Kevin Hillstrom from Mine That Data suggests taking the top 5% or top 1% of orders (depending on the business) and adjusting their values (e.g. changing $29,000 to $800). As he says, “you are allowed to adjust outliers.”
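
As a concrete illustration of that adjustment, here's a short Python sketch that caps the top slice of order values at a chosen percentile before computing AOV. The synthetic order data and the 95th-percentile cutoff are assumptions for demonstration, not Bob & Lush's numbers.

    # Cap extreme order values so a handful of huge orders don't dominate AOV.
    import numpy as np

    rng = np.random.default_rng(0)
    order_values = rng.gamma(shape=2.0, scale=15.0, size=500)  # typical orders around £30
    order_values[::100] = 29_000                               # inject a few extreme outliers

    cap = np.percentile(order_values, 95)         # value at the 95th percentile
    adjusted = np.clip(order_values, None, cap)   # pull the top ~5% down to the cap

    print(f"raw AOV: £{order_values.mean():.2f}, adjusted AOV: £{adjusted.mean():.2f}")

With this synthetic data, a handful of extreme orders inflate the raw AOV roughly tenfold; the capped figure removes exactly that distortion.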

Shoot For The Moon

Finally, as I showed in our case study, you can't test minuscule changes. We had to wrap three different changes into one treatment, and we saw results.

There's palpable disillusionment when a startup founder reads a case study about a fellow startup getting a conversion lift of 200%+ (most likely a false positive), and then fails to even reach a significant result themselves. Well, it turns out many of them are testing tiny changes.

Long story short, if you have low traffic, you have to test big stuff.

Shouldn’t We Test Just One Thing at a Time?

It's an A/B testing best practice to change only one element per test. This way, you can learn which elements actually matter and attribute uplifts to specific treatments. If you bundle multiple changes together, you can't accurately pinpoint which element caused how much improvement.

However, testing one element at a time is a luxury for high-traffic websites. The smaller the change, the smaller the impact typically is. For a website making millions of dollars per month, a 1% relative improvement might be worth a lot, but for a small site it's not. So small sites need to swing for bigger improvements.

The question you should ask is “does this change fundamentally change user behavior?” Small changes do not.

The other factor here is that a bigger impact, e.g. +50%, requires a smaller sample size. Low-traffic websites probably won't have a large enough sample to detect small improvements, like, say, 5%.

So the question is: if we can increase sales by bundling tests, is it worth losing the learning factor? Yes. After all, the goal of optimization is to make more money, not to do science.

Conclusion

Optimizing a low traffic site isn’t easy, but it’s possible.

In the case of Bob & Lush, we had been working on their site for a while beforehand, so we had accumulated a lot of knowledge about what works and what doesn't with their customer base. One thing that had brought strong results was social proof, so we thought implementing it throughout the checkout process would be beneficial.

Conversion research then suggested that clear product benefits and clear/trustworthy payment icons would be a positive influence.

All of this combined led us to a statistically valid uplift.

Comments

  1. Great article with some nice takeaways. Thanks, Alex!

    1. Alex Birkett

      Thanks for the comment, glad you liked it!

  2. You didn’t show the number of conversions per variation in this case study. Was it more than 250 per variation? (With ~1,000 visitors, that would imply roughly a 50% conversion rate.)

    Peep has mentioned a couple of times that you need at least 250 conversions per variation to start testing – it’s far from significant if it’s less. Check these comments, articles & video below:

    1. http://prntscr.com/adnz8x
    2. Peep’s presentation on YouTube (linked at the relevant timestamp): https://youtu.be/Lzg_ImKL7LU?t=9m3s

    So, was that test significant or not? :)

    Of course, conversion optimization is about making more money, but in my opinion this case study needs a bit more clarification. :)

    1. Peep Laja

      You bring up valid points here, but have to keep in mind a few things:

      1) There are no magic numbers – this is science, not magic; 250 conversions per variation is only a handy ballpark. Here’s how you figure it out http://conversionxl.com/stopping-ab-tests-how-many-conversions-do-i-need/
      2) The test had enough sample size as per Optimizely Stats Engine sample size calculator https://www.optimizely.com/resources/sample-size-calculator/
      3) The current test ran for over a month, and the discrepancy between variations was consistent as you can see in the graph provided (no flips in the middle of the test)

      And while indeed the absolute amount of conversions was below my go-to minimum threshold, you do with what you can when you have a low-traffic site on your hands.

      Note: Alas we don’t have the client’s permission to disclose full numbers.

  3. Sorry, but given all the CRO truths you’ve communicated over the years, this case study seems like “rubbish,” in your own words, and here is why:

    – How can you say that 79% significance proves a 13.5% conversion lift? That is far from proving there is a lift at all, let alone such a small lift. And it’s not just “there’s a chance there is no real difference in revenue improvement” – there is a chance it is worse. Isn’t this “confirmation bias” / “cognitive dissonance” / “torturing data till it confesses”?
    – The “13.5% lift” doesn’t go along with the concept of “You just need to get really big wins” at all
    – On the “we can see a stable uplift”: isn’t this graph an accumulated average lift? In that case you won’t spot spikes here – it’s the wrong graph. Show stable weekly conversion rates of A vs. B over time and I’ll believe it. Don’t “averages lie”?

    I really think this case was dragged to the conclusions you wanted it to reach.
    Please be consistent with what you communicate, as this one seems to fall into the “most CRO content is rubbish” bucket.

    1. Hi Edgar. Thanks for taking the time to discuss. Let me see if I can provide some answers here.

      • The “13.5%” and “79%” figures were actually talking about two different things, not directly related to each other. “13.5% uplift” refers to the Purchase conversion rate metric, while “79% significance” refers to the Revenue metric. You can see from the posted Optimizely chart snapshot http://conversionxl.com/wp-content/uploads/2016/03/results1-1-568×142.jpg that the Purchase metric is highlighted green, which means it was statistically significant at the time we drew conclusions. Optimizely is usually pre-configured for either 90% or 95% significance detection, although as discussed in numerous articles here, we never plan conclusion timing based on whether any number turns green.

      • “Need to get really big wins” – I’m not sure where you’re taking this concept from, at least worded as such, can you perhaps provide a link? We certainly advocate making big and bold changes for low traffic sites, but that’s not the same as to say “you *must* get really big wins”. Nothing or nobody can guarantee a “big” win with anything, anywhere, regardless of traffic being low or high. It’s simply that by creating a bigger change, you will create a higher chance (note: just “higher”, not 100%!) of a more visible uplift effect in either positive or negative direction and thereby have a better idea later which direction to iterate into. No difference result is the worst – because what do you do next? At the end of the day, regardless of the win size, if the business is consistently seeing more money flow into their bank account after the “small” 13.5% win variation is implemented, are you proposing to not implement it because this single win isn’t big enough by some criteria? I’d argue instead that the business should grab any money available lying on the table, regardless of the win size. Obviously the implementation costs should not outright trump the win size, to eliminate the argument against 0.0x% microlifts etc.

      • “Accumulated average lift” – the chart shown at http://conversionxl.com/wp-content/uploads/2016/03/Screen-Shot-2016-03-07-at-5.04.54-PM-1.jpg seems to show exactly what you’re asking for – a stable variation-specific conversion rate difference over an 8+ week period. There doesn’t seem to be anything accumulated or averaged about it. Let me know if there’s something I’m missing about your question here. I do think the chart is posted at too low a resolution, making it difficult to inspect, and I will see if I can get a higher-resolution version posted which shows the timeline more clearly.

    2. Peep Laja

      When is a test “done”? When 3 conditions are met

      • Enough sample size
      • Long enough test duration (multiple business cycles)
      • Statistical significance 95% or higher

      All 3 conditions are met in this case. So hardly bullshit.

      When it comes to “swinging big” – this is what happened here. The test was not about one tiny change, but multiple changes. And for most sites, a 13% uplift is a nice win. Huge wins like 50% or more only happen on sites that are very unoptimized. This one is not one of them.

  4. Love this post and love your website. I’m so glad that I found this information. My mind will begin soaking up all the strategies that I intend to learn here.

  5. Seems like a hot topic is going on the comment section. But, I enjoy the content anyway.
