AB Testing at ezCater Part 2: Tracking Experiments & The Exposure Event

What events should I fire to track an experiment?

A fabulous question, and one with a number of different answers. In my travels I've found that there are four basic ways to tackle the problem:

  1. One user property per experiment
  2. One "experiments" user property with many values
  3. A "super property" on regular events
  4. An exposure event

So how do we choose? I think the right way to look at this is to work backwards. Rather than start with the events, let's start with the results we want to analyze. What do we need in order to measure whether these experiments were successful? I'd say that the platonic ideal of experiment results is something like:

Experiment               Arm        Exposures    Conversions
Ez74_new_header          Control    10011        1031
Ez74_new_header          ez74-on    10090        1100
Ez75_big_food_picture    Control    8000         873
Ez75_big_food_picture    ez75-on    8005         888

Let's say we accept that as the goal, what's the SQL query that you would want to write in order to get those results? I'd say it would be something like:
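
Here is a minimal sketch of such a query, wrapped in Python's built-in sqlite3 so it can be run as-is. The `events` table, the `user_id` and `created_at` columns, and the 'conversion' category are assumptions made for illustration; only event_category, event_test and event_result come from the schema discussed in this post. Your warehouse (Redshift or otherwise) will have its own names.

```python
import sqlite3

# Assumed schema: a single `events` table holding exposure and conversion rows.
RESULTS_SQL = """
SELECT e.event_test              AS experiment,
       e.event_result            AS arm,
       COUNT(DISTINCT e.user_id) AS exposures,
       COUNT(DISTINCT c.user_id) AS conversions
FROM events e
LEFT JOIN events c
       ON c.user_id = e.user_id
      AND c.event_category = 'conversion'  -- assumed name for the conversion event
      AND c.created_at >= e.created_at     -- only count conversions after exposure
WHERE e.event_category = 'exposure'
GROUP BY e.event_test, e.event_result
ORDER BY e.event_test, e.event_result
"""


def build_demo_db():
    """Create an in-memory database with a handful of made-up events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, event_category TEXT, "
        "event_test TEXT, event_result TEXT, created_at TEXT)"
    )
    rows = [
        (1, "exposure", "ez74_new_header", "control", "2017-04-01"),
        (2, "exposure", "ez74_new_header", "ez74-on", "2017-04-01"),
        (3, "exposure", "ez74_new_header", "ez74-on", "2017-04-02"),
        (2, "conversion", None, None, "2017-04-03"),
        # This conversion happened before user 1 was exposed, so it is ignored.
        (1, "conversion", None, None, "2017-03-15"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def experiment_results(conn):
    """Return (experiment, arm, exposures, conversions) rows."""
    return conn.execute(RESULTS_SQL).fetchall()


if __name__ == "__main__":
    for row in experiment_results(build_demo_db()):
        print(row)
```

Counting distinct users on both sides keeps a user who was exposed several times from being counted more than once within an arm.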
 

What are we doing? Well, we get all the exposure events, then see if there's any conversions for them, and group em up.

Problems with setting it on the user

Setting it on the user sounds nice, and the tools generally seem to suggest that's what you ought to do, but things get weird quickly. If you go with method 1, a property per experiment, you'll quickly explode the number of columns on your user table (AFAIK the solution here is generally 'call your account rep to delete old user properties'). If you go with method 2, well, now you've got structured data within a property value, and YMMV about how you can query that. Best case, you'll end up having to do some weird "contains" logic each time you want to query (or worse, having to regex your way to answers in SQL).

But it gets worse! Say you gradually turn up the percentage of people you're exposing. Then it gets very confusing very quickly, because a user property can't tell you when a given user was brought into the experiment.

The final pain with putting it on the user is that now you always need to join/query/update schemas for two tables in Redshift. Not the worst thing in the world, but it does mean 2 critical data warehousing syncs to monitor.

Problems with setting it on the event

The problems with putting it on an event are similar to those of putting it on the user, but in my experience it's even tougher to get a grasp of what experiments have been run. Each event is running around with a ton of properties on it, but in order to see whether a user is currently in an experiment you now have to look at their most recent events.

Solution: Firing an "exposure" event

So what's the answer? Let's just fire a new event when we expose a user. This solves all the issues mentioned above. We can clearly search for all experiment events, and we only need three columns (see part 1 for thoughts on experiment schema design) for as many experiments as we want to run:

  • event_category: 'exposure' (this just lets us filter exposure events out from everything else)
  • event_test: ez74_new_header
  • event_result: Control or ez74-on (or, say, Big, Bigger, Biggest)

And voila! If we simply want to track conversion rates for the two arms of an experiment we can stop now, but if we want to look at our conversion rates over time or look at retention, there's one more subtle problem to conquer.

Analyzing experiments over time

Experiments are the most volatile part of your codebase. I know we're supposed to 'not look' at the results of the experiment until they've reached significance, but let's be honest: we all peek. But it's for a good reason I swear! Experiments break all the time. If we simply look at the big picture conversion change it's easy to miss what's really happening.

So what do we do? We add the exposure date to our SQL or choose conversion funnel over time in Amplitude. However, if we simply break down the exposure conversion funnel by when we exposed people, we're almost certainly going to have 'exposed' people many times. If the experiment is 'ez74_new_header', then we've likely exposed them every single time we've rendered the header.

The big problem with this is that if people enter our experiments multiple times, then they'll appear on multiple days. Amplitude and MixPanel differ in how they deal with these situations but there are major gotchas with both of them that in my experience make analysis quite error prone.


MixPanel and Amplitude each treat this situation subtly differently depending on which report you're running and what time window you're looking at. But if you want "each person to be in one and only one bucket and that bucket corresponds with the first time they were exposed" you're going to have a hard time doing that. You can stretch your SQL skills, of course, to find only the first exposure, but if you do that then your Amplitude and Redshift are always going to give you different numbers, and that's nobody's idea of a good time.
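
For a sense of what that "first exposure only" SQL involves, here is a rough sketch, again run through Python's built-in sqlite3 so it is self-contained. The `events` table and its `user_id` and `created_at` columns are assumptions for illustration, not our actual schema.

```python
import sqlite3

# Keep each user's earliest exposure per experiment arm, then bucket users
# by the day of that first exposure. Table and column names are assumed.
FIRST_EXPOSURE_SQL = """
WITH first_exposures AS (
    SELECT user_id,
           event_test,
           event_result,
           MIN(created_at) AS first_exposed_at
    FROM events
    WHERE event_category = 'exposure'
    GROUP BY user_id, event_test, event_result
)
SELECT event_test,
       event_result,
       DATE(first_exposed_at) AS exposure_day,
       COUNT(*)               AS users
FROM first_exposures
GROUP BY event_test, event_result, DATE(first_exposed_at)
ORDER BY event_test, event_result, DATE(first_exposed_at)
"""


def build_demo_db():
    """In-memory database where user 1 sees the new header on three days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, event_category TEXT, "
        "event_test TEXT, event_result TEXT, created_at TEXT)"
    )
    rows = [
        (1, "exposure", "ez74_new_header", "ez74-on", "2017-04-01 09:00:00"),
        (1, "exposure", "ez74_new_header", "ez74-on", "2017-04-02 10:00:00"),
        (1, "exposure", "ez74_new_header", "ez74-on", "2017-04-03 11:00:00"),
        (2, "exposure", "ez74_new_header", "ez74-on", "2017-04-02 12:00:00"),
        (3, "exposure", "ez74_new_header", "control", "2017-04-02 08:00:00"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def first_exposure_counts(conn):
    """Return (experiment, arm, first exposure day, user count) rows."""
    return conn.execute(FIRST_EXPOSURE_SQL).fetchall()


if __name__ == "__main__":
    for row in first_exposure_counts(build_demo_db()):
        print(row)
```

It works, but every analysis now has to start from that extra subquery, and the analytics tool's built-in reports know nothing about it.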

Once and only once

The easiest, most sane fix in my opinion is to only fire the first exposure event for each combination of user, experiment, and experiment result. By only firing the first one, we no longer need to do some advanced SQL hackery to get the "first event" and we can simply group by. How do we go about only firing one event? Well, we need a simple service that will tell us when we've already done something. You can outsource this to something like Ratelim.it or build a simple table that tracks when you've fired an event and returns false after the first time.

 

Your resulting tracking code should look something like this.
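
As a hedged illustration, here is a minimal Python sketch of the "simple table" approach. `FirstExposureLog`, `track_exposure`, the `fired_exposures` table, and the `send_event` callback are all hypothetical names made up for this example; `send_event` stands in for whatever analytics client you actually use, and this is not our production code.

```python
import sqlite3


class FirstExposureLog:
    """Remembers which (user, experiment, result) exposures have been fired."""

    def __init__(self, conn):
        self.conn = conn
        # The primary key is what makes a second insert for the same triple a no-op.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS fired_exposures ("
            "user_id TEXT NOT NULL, event_test TEXT NOT NULL, "
            "event_result TEXT NOT NULL, "
            "PRIMARY KEY (user_id, event_test, event_result))"
        )

    def first_time(self, user_id, event_test, event_result):
        """Return True the first time a triple is seen, False afterwards."""
        cursor = self.conn.execute(
            "INSERT OR IGNORE INTO fired_exposures VALUES (?, ?, ?)",
            (user_id, event_test, event_result),
        )
        self.conn.commit()
        return cursor.rowcount == 1  # 0 rows changed means we had it already


def track_exposure(log, send_event, user_id, event_test, event_result):
    """Fire the exposure event only on the first exposure; report whether it fired."""
    if not log.first_time(user_id, event_test, event_result):
        return False
    send_event(user_id, {
        "event_category": "exposure",
        "event_test": event_test,
        "event_result": event_result,
    })
    return True


if __name__ == "__main__":
    sent = []
    log = FirstExposureLog(sqlite3.connect(":memory:"))
    for _ in range(3):  # e.g. the header rendering three times for the same user
        track_exposure(log, lambda uid, props: sent.append((uid, props)),
                       "user-1", "ez74_new_header", "ez74-on")
    print(len(sent))
```

Doing the check as a single insert against a unique key, rather than a lookup followed by an insert, avoids two concurrent requests both deciding they are first.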

With some basic deduplication in place our analysis is now easy as Boston Cream Pie. Did I mention we're hiring and we have a cupcake policy?


Tags: experiments

Work With Us

We’re always looking for highly skilled full stack engineers to help execute our technology goals while riding this rocket ship of growth. Our people are terrifically talented straight-shooters who love what they do. And everyone is generous and kind, too — we have a strict no jerk policy.

View ezCater Opportunities

About ezCater

We're the #1 online – and the only nationwide – marketplace for business catering in the United States. We make it easy to order food online for your office. From routine office lunches to offsite client meetings, from 5 to 2,000 people, we have a solution for you. ezCater connects business people with over 50,000 reliable local caterers and restaurants across the U.S.

