[Metrics] A/B Testing, Feature Flipping and going too far

A/B testing is probably not worth your time.  When you start hooking metrics up to your product, the feedback is addictive.  All of a sudden you’ve got lots of actionable data and you’re tacking validation goals onto feature stories.  This is great, but I implore you to not take it too far.

You’ve probably read stories proclaiming how effective A/B testing is for Twitter, 37 Signals and even the Obama campaign.  There’s no shortage of third-party services that boast one-click setup via a JavaScript snippet and claim to deliver a double-digit boost to your bottom line.

I talked in my last post about the concept of opportunity cost and it’s with this lens that I view excessive testing and experimentation.  If you are still in growth mode, you’re still figuring out what A is.  There’s too much at stake (and too few developer cycles) to distract yourself with subtle experiments that are ripe for invalidation by small sample sizes, statistical insignificance, and indecision.

“One consequence of this data-driven revolution is that the whole attitude toward writing software, or even imagining it, becomes subtly constrained. A number of developers told me that A/B has probably reduced the number of big, dramatic changes to their products. They now think of wholesale revisions as simply too risky—instead, they want to break every idea up into smaller pieces, with each piece tested and then gradually, tentatively phased into the traffic.” — The A/B Test (Wired)

Sounds like the agile software development process, right?  The difference here is that you gain efficiency and transparency by splitting feature work into atomic units of customer value.  You risk building broken software when you split features into chores that aren’t customer-focused; similarly, you risk building a broken product when you try to subcompose the UX into lots of trivial tests.

I say all this because I’ve employed A/B testing in a couple of startups and we never got our bang for the buck.  At one, we used Optimizely but found the integration points to be lacking[1] when we wanted to focus on anything embedded into our app experience.  Landing pages were easy enough to test but acquisition is only one of your challenges.

We then moved to A/Bingo, a framework written by the amazing Patrick McKenzie [2].  This felt like a framework we could grow into, but we were also moving from server- to client-side functionality and we had to shoehorn the testing payloads into a homegrown API.  The result was way too much time invested in infrastructure and not enough time delivering customer value.  It still kills me to think about the time we devoted to just getting a great new feature to the starting line.

I then joined a startup that had rolled their own A/B testing for life-cycle and transactional emails.  I didn’t even realize this was going on until we started adding KISSmetrics tracking to the emails.  What was the result of all of this wonderful testing?  It turned out that we weren’t storing any of the results, and had been sending only one variant for the last year.  Whoops!

We did have some success with a feature flipper powered by Rollout.  A feature flipper lets you enable functionality for specific customers or a controlled subset of your audience.  We weren’t using it ambitiously, but it was helpful to have the plumbing in place to deliver new features.  I was eager to give it a try, but any changes we wanted to deploy were largely tested and validated before we started building.  Perhaps we should’ve tried turning off features that we suspected weren’t valuable, but we never got around to it (limited cycles and all).
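The flipper pattern itself is simple enough to sketch in a few lines of plain Ruby.  This is a minimal illustration of the idea, not the Rollout API (all class and method names here are hypothetical): a feature can be switched on for hand-picked users or for a percentage of the audience, bucketed deterministically so each customer sees a stable experience.

```ruby
require "zlib"

# Minimal sketch of a feature flipper (hypothetical names, not Rollout's API).
class FeatureFlipper
  def initialize
    @percentages = Hash.new(0)                    # feature => 0..100
    @users       = Hash.new { |h, k| h[k] = [] }  # feature => explicit user ids
  end

  # Turn a feature on for a percentage of the audience.
  def activate_percentage(feature, percent)
    @percentages[feature] = percent
  end

  # Turn a feature on for one specific user.
  def activate_user(feature, user_id)
    @users[feature] << user_id
  end

  def active?(feature, user_id)
    return true if @users[feature].include?(user_id)
    # Hash the user id so the same users stay in the bucket as the
    # percentage ramps up from, say, 10% to 20%.
    bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
    bucket < @percentages[feature]
  end
end
```

The deterministic bucketing is the important design choice: a random coin flip per request would flicker the feature on and off for the same customer.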

I look forward to the day I can enthusiastically get behind A/B testing, but until that day I will encourage anyone who asks to consider what else they could do with their time.

Is A/B testing worth it for you?  Do you have any horror stories to share?

[1] Caveat emptor: I haven’t used Optimizely in a few years, so I’m not an expert on their current functionality.

[2] A/B testing is definitely worth the time to Patrick (because he has found his product/market fit).  I encourage you to read through everything he’s written if you’re building a SaaS app.

  • John Barker

    I also wonder if, as a startup, you risk watering down your value proposition by letting users dictate too early how your product works. Users might prefer something because of familiarity rather than effectiveness, and if you follow that feedback to too extreme a conclusion, you end up with a product just like everyone else’s.

  • In practice we had to roll our own feature flipper (it’s not a lot of work), the use case being:

    – define an experimental feature
    – enable / disable it globally
    – enable an experimental feature for hand-picked users who have been personally contacted for feedback

    The libraries I’ve found are focused more on a “Google” use case – rolling out features in batches and percentages.
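    The use case above can be sketched in a few lines of plain Ruby (a hypothetical illustration; all names are made up, and this is not the commenter’s actual code): define a feature, flip it globally, or enable it only for the hand-picked users giving feedback.

```ruby
# Hypothetical sketch of the commenter's use case: define a feature,
# enable/disable it globally, or enable it for specific users.
class ExperimentalFeatures
  Feature = Struct.new(:name, :global, :user_ids)

  def initialize
    @features = {}
  end

  def define(name)
    @features[name] = Feature.new(name, false, [])
  end

  def enable_globally(name)
    @features.fetch(name).global = true
  end

  def disable_globally(name)
    @features.fetch(name).global = false
  end

  # Hand-picked users (e.g. those personally contacted for feedback).
  def enable_for_user(name, user_id)
    @features.fetch(name).user_ids << user_id
  end

  def enabled?(name, user_id = nil)
    f = @features.fetch(name)
    f.global || f.user_ids.include?(user_id)
  end
end
```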
