The NEW 20 lines of code that will beat A/B testing every time
In 2012, Steve Hanov wrote the popular and controversial blog post “20 lines of code that will beat A/B testing every time,” which brought the previously academic idea of multi-armed bandit algorithms to the attention of the wider developer community.
Here were his original 20 (actually 16) lines of “code”:
def choose():
    if math.random() < 0.1:
        # exploration!
        # choose a random lever 10% of the time.
    else:
        # exploitation!
        # for each lever,
            # calculate the expectation of reward.
            # This is the number of trials of the lever divided by the total reward
            # given by that lever.
        # choose the lever with the greatest expectation of reward.
    # increment the number of times the chosen lever has been played.
    # store test data in redis, choice in session key, etc..

def reward(choice, amount):
    # add the reward to the total for the given lever.
What he’s describing here is a very simple form of reinforcement learning known as the Epsilon Greedy algorithm. Much of the controversy surrounding the post came from how greatly it oversold the simplicity of deploying such an algorithm in production.
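To make the pseudocode above concrete, here is a minimal in-memory sketch of Epsilon Greedy. It is not Steve's implementation: the class name `EpsilonGreedy`, the choice to score untried levers as 0.0, and the use of plain dictionaries instead of Redis are all assumptions made for illustration. Note that the expectation is computed as total reward divided by number of trials, which is the reverse of how the comment in the pseudocode words it.

```python
import random


class EpsilonGreedy:
    """Toy epsilon-greedy bandit that keeps its statistics in memory."""

    def __init__(self, levers, epsilon=0.1, rng=None):
        self.levers = list(levers)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.trials = {lever: 0 for lever in self.levers}
        self.rewards = {lever: 0.0 for lever in self.levers}

    def expectation(self, lever):
        # Mean reward per trial: total reward divided by number of trials.
        # Untried levers score 0.0 here (an assumption, not from the post).
        n = self.trials[lever]
        return self.rewards[lever] / n if n else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            # Exploration: pick a random lever.
            lever = self.rng.choice(self.levers)
        else:
            # Exploitation: pick the lever with the highest mean reward.
            lever = max(self.levers, key=self.expectation)
        # Count the play regardless of how the lever was chosen.
        self.trials[lever] += 1
        return lever

    def reward(self, lever, amount):
        # Add the reward to the running total for the given lever.
        self.rewards[lever] += amount
```

Even this toy version hints at what the pseudocode leaves out: persistence, concurrency, tying a reward back to the choice that earned it, and deciding what to do with levers that have never been played.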
Now, after years of work and many thousands of lines of code, I’m pleased to present a new “12 lines of code that will beat A/B testing every time” that is robust, scalable, and proven in large-scale production:
from improveai import Ranker, RewardTracker

ranker = Ranker(model_url)
tracker = RewardTracker(model_name, track_url)

def choose(variants):
    best = ranker.rank(variants)[0]
    decision_id = tracker.track(item=best, candidates=variants)
    return (best, decision_id)

def reward(amount, decision_id):
    tracker.add_reward(amount, decision_id)
Unlike Steve’s code, which envisioned querying simple statistics from something like Redis, Improve AI loads a trained machine learning model locally into the process, so decisions are made immediately with zero network latency.
Improve AI is also a contextual multi-armed bandit, which means it can make use of contextual data such as language, time of day, and screen resolution to make the best decision. This makes it simple to implement recommender systems, personalization, app optimization, and more.
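To illustrate what “contextual” adds, here is a deliberately naive sketch that keeps separate Epsilon Greedy statistics for each context value. This is not how Improve AI works internally (it uses a trained model rather than per-context counters); the class `ContextualEpsilonGreedy` and its methods are hypothetical names invented for this example, and the context is assumed to be a hashable value such as a language code.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Toy contextual bandit: one set of counters per context value."""

    def __init__(self, variants, epsilon=0.1, rng=None):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        # stats[context][variant] = [trials, total_reward]
        self.stats = defaultdict(lambda: {v: [0, 0.0] for v in self.variants})

    def _mean(self, context, variant):
        # Mean reward of this variant within this context only.
        trials, total = self.stats[context][variant]
        return total / trials if trials else 0.0

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            # Exploration: pick a random variant.
            variant = self.rng.choice(self.variants)
        else:
            # Exploitation: best variant for this particular context.
            variant = max(self.variants, key=lambda v: self._mean(context, v))
        self.stats[context][variant][0] += 1
        return variant

    def reward(self, context, variant, amount):
        # Credit the reward to the variant within the context it was shown in.
        self.stats[context][variant][1] += amount
```

A table like this stops scaling as soon as contexts combine (language × time of day × screen resolution), because each combination starts learning from scratch. That is the gap a learned model is meant to fill: it can generalize across contexts it has seen only rarely.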
Is it finally time to stop wasting conversions and throw away A/B testing for good?