2022-02-18

Anti-Patterns in Experimentation

Running experiments on websites involves showing 2 or more different experiences to similar but distinct groups of users and measuring the differences to determine which experience works best for the website. There are many ways to set up and run experiments, and industry standards are unfortunately not yet settled.

Often, experimentation on a website starts with a single developer putting in a single "if" statement, and it only grows from there.
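A minimal sketch of what that first "if" statement often looks like (the function name and button markup here are made up for illustration):

```python
import random

def render_signup_button(user_id: str) -> str:
    """Hypothetical first experiment: a hard-coded 50/50 coin flip.

    Nothing is logged and the split isn't even sticky per user, so nobody
    can later tell which users saw which button.
    """
    if random.random() < 0.5:
        return "<button class='blue_button'>Sign up</button>"  # existing experience
    return "<button class='red_button'>Sign up</button>"       # new experience
```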

The intent of this post is to highlight a few anti-patterns I've seen in the hopes that the next team implementing an experiment might avoid some of the common pitfalls.

Here are the three most common anti-patterns in A/B testing that I've seen:

  1. Experimentation is a Library
  2. All Users are in all Experiments
  3. A pile of data dropped on an Analyst

Experimentation

At its core, experimentation is

  • Giving different experiences to different groups of users
  • Measuring differences
  • Moving forward with the experience that provides the best outcome

An experiment has to be implemented somewhere: in a website, an app, or the sending of emails. There must also be tracking of some sort to record which experience each user was shown.

An Experiment Assignment is a record of a single user's experience and should include the following (sketched in code after this list):

  • User id
  • Experiment Name
  • Variation Name
    • Which variation of the website did this user experience? For example, it could be the "red_button" or the "blue_button"
  • Timestamp
    • Ideally the first time the user encountered the experience.
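A concrete sketch of such a record, assuming a Python codebase (the class and field names are hypothetical, mirroring the list above):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ExperimentAssignment:
    """One row per user per experiment: which variation they were given, and when."""
    user_id: str
    experiment_name: str
    variation_name: str    # e.g. "red_button" or "blue_button"
    assigned_at: datetime  # ideally the first time the user encountered the experience

# Example record that would be logged for downstream analysis.
assignment = ExperimentAssignment(
    user_id="user-123",
    experiment_name="signup_button_color",
    variation_name="red_button",
    assigned_at=datetime.now(timezone.utc),
)
```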

Anti-Patterns

An anti-pattern is a pattern of code development or structure that seems like a good idea but ultimately turns out poorly. When discussing software systems, pointing out how things can go wrong is sometimes just as important as discussing best practices.

Experimentation is a Library

Experimentation is a library! Call a function to split the users. Send the Experiment Assignments downstream for an analyst to use. No problem.
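Seen that way, the whole thing really does fit in a few lines. A sketch of such a library call, using hash-based bucketing (one common approach; the function name is hypothetical):

```python
import hashlib

def assign_variation(user_id: str, experiment_name: str, variations: list[str]) -> str:
    """Deterministically bucket a user into a variation by hashing user + experiment."""
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    return variations[int(digest, 16) % len(variations)]

# "Call a function to split the users" -- then ship the assignment downstream.
variation = assign_variation("user-123", "signup_button_color", ["blue_button", "red_button"])
```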

But wait: for the analyst, what does the data mean? How is it produced? Where did it come from? Why are we running this experiment at all?

All of these questions should be answered before the experiment is run so that methods and data are aligned to produce the best possible outcome.

Experimentation is a Process that runs from hypothesis through development, analysis, and cleanup, and it typically looks like this:

  1. An experimenter hypothesizes that a new experience would be better by some measure
  2. An engineer develops the new experience to be able to run in parallel to the existing experience
  3. Users enter and interact with the 2+ experiences
  4. An analyst evaluates and compares the two experiences quantitatively (a sketch of this step follows the list)
  5. An engineer codes the website to exclusively display the experience that is quantitatively better
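To make step 4 concrete, here is a rough sketch of one common quantitative comparison, a two-proportion z-test on conversion counts (the numbers are invented for illustration):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of two variations; returns (lift, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Invented counts: 480/10,000 conversions on the control, 540/10,000 on the new experience.
lift, p = two_proportion_z_test(480, 10_000, 540, 10_000)
print(f"lift={lift:.4f}, p-value={p:.3f}")
```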

Experimentation is a House of Cards that can return unusable or inaccurate results if any particular piece of this process fails to properly coordinate with the other pieces.

  1. The experimenter must have a reasonable hypothesis that can be quantitatively evaluated
  2. The engineer must properly, randomly split the users and present 2+ high quality experiences
  3. Users must be given time to enter and interact with the 2+ experiences
  4. The analyst must understand the data production and apply a rigorous analysis
  5. The engineer must prioritize and properly clean up the codebase to guarantee that the best experience is shown to all users from then on

Any step can tolerate minor bugs, but if there is a disconnect or a major bug, the whole system will topple. The experiment may produce incorrect results without anyone ever knowing that the results were incorrect.

Experimentation is NOT a library. It's a system, it's a process, it's a puzzle and the pieces must fit properly to provide high quality results.

All Users are in all Experiments

Every user that has ever landed on the website or that visits during an experiment gets an Experiment Assignment logged for the analyst to use.

Did they actually see the experiment? Who knows, doesn't matter, the analyst can figure it out.

😳

I have worked with outstanding analysts and am 100% positive they can figure this out and tease apart the raw data into some usable answers.

But how long will this take? How many experiments are we intending to run?

Analysts are busy and analytics takes time. Is it worth it to the business to have an analyst spending a week to produce an analysis for every experiment?

This pattern is easy on the Engineering team but hard on the Analytics team. They will feel underserved and unappreciated, and will likely be overworked when asked to keep up with a fast-paced experimentation culture.

It also makes automation much harder, as each test's analysis needs extra metadata that may or may not capture the nuance of when the user actually saw the experiment.

It's best to Devise a System of Data Logging that analysts can easily work with and that allows for some automation.
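One sketch of that idea is to emit the Experiment Assignment only at the moment the user is actually exposed to a variation (the function names and event pipeline here are hypothetical; printing stands in for a real event logger):

```python
import json
from datetime import datetime, timezone

def log_exposure(user_id: str, experiment_name: str, variation_name: str) -> None:
    """Emit an Experiment Assignment at the moment the variation is actually shown."""
    event = {
        "user_id": user_id,
        "experiment_name": experiment_name,
        "variation_name": variation_name,
        "assigned_at": datetime.now(timezone.utc).isoformat(),
    }
    print(json.dumps(event))  # in a real system this would go to your event pipeline

def render_signup_button(user_id: str, variation: str) -> str:
    # Log the assignment here, at exposure time, not when the user was merely eligible.
    log_exposure(user_id, "signup_button_color", variation)
    return f"<button class='{variation}'>Sign up</button>"
```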

A pile of data dropped on an Analyst

The old joke is that when you present a pile of data to an analyst and ask "What does this all mean?" the analyst will inevitably say "Not much". Pulling strong signals from experimentation requires up-front planning from analysts. They can catch obvious bugs in the design, point out metrics that can't be calculated, and recall similar experiments that have already been run, which helps catch duplicate work.

Analysts are smart, hardworking, and resourceful. They will find some signal in whatever dataset they are presented with, but it can be hard for an organization to realize just how much time it is spending generating conclusions from experimentation.

It's also impossible for an organization to measure the quality of a single experiment analysis. The system as a whole must be healthy for an organization to rely on the results.

Pull Analysts Into the Process Early to give them a heads up and make sure they are part of the experiment process from the very first step.
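One lightweight way to do that is a short, written experiment plan that an analyst reviews before any code ships. A hypothetical sketch of what such a plan might capture:

```python
# Hypothetical pre-launch experiment plan, reviewed by an analyst before development starts.
experiment_plan = {
    "experiment_name": "signup_button_color",
    "hypothesis": "A red signup button will increase signup conversion.",
    "primary_metric": "signup_conversion_rate",
    "guardrail_metrics": ["page_load_time", "unsubscribe_rate"],
    "variations": ["blue_button", "red_button"],
    "traffic_split": {"blue_button": 0.5, "red_button": 0.5},
    "minimum_runtime_days": 14,
    "analyst_reviewer": "assigned before development begins",
}
```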

Conclusion

Avoiding some of the anti-patterns highlighted here will allow a team to move briskly into a future full of high quality, trusted experimentation.

Instead of Experimentation is a Library, recognize that Experimentation is a Process involving multiple teams and diverse points of view.

Instead of All Users are in all Experiments, work with analysts to Devise a System of Data Logging with experimentation in mind, something that is intended for analysts to work with and will produce trusted experimentation results.

Instead of A pile of data dropped on an Analyst, Pull Analysts Into the Process Early. Work with your analysts to build a system of experimentation that is easy to work with and trusted to produce quality results.

Go forth and keep trying new ideas that delight your users and help your website grow!

Further reading

I recommend the book "Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing" by Ron Kohavi, Diane Tang, and Ya Xu.