In 2012, Steve Hanov wrote the popular and controversial blog post “20 lines of code that will beat A/B testing every time”, which brought the previously academic idea of multi-armed bandit algorithms to the attention of the wider developer community.
Here were his original 20 (actually 16) lines of “code”:

    def choose():
        if math.random() < 0.1:
            # exploration!
            # choose a random lever 10% of the time.
        else:
            # exploitation!
            # for each lever,
                # calculate the expectation of reward.
                # This is the number of trials of the lever divided by the total reward
                # given by that lever.
            # choose the lever with the greatest expectation of reward.
        # increment the number of times the chosen lever has been played.
        # store test data in redis, choice in session key, etc..

    def reward(choice, amount):
        # add the reward to the total for the given lever.

What he’s describing here is a very simple form of reinforcement learning known as the epsilon-greedy algorithm. Much of the controversy surrounding the post arose because it greatly oversold the simplicity of deploying such an algorithm in production.
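To make the idea concrete, here is a minimal runnable sketch of epsilon-greedy in Python. It is my own illustration, not Steve’s code: the `EpsilonGreedy` class, the in-memory dictionaries standing in for Redis, and the choice to try every unplayed lever first are all assumptions. Note that the expected reward is computed as total reward divided by number of trials, the reverse of how the comment in the pseudocode words it.

```python
import random


class EpsilonGreedy:
    """Hypothetical in-memory epsilon-greedy bandit (illustration only)."""

    def __init__(self, levers, epsilon=0.1, rng=None):
        self.levers = list(levers)
        self.epsilon = epsilon  # fraction of choices spent exploring
        self.rng = rng if rng is not None else random.Random()
        # Plain dicts stand in for the Redis counters in the pseudocode.
        self.trials = {lever: 0 for lever in self.levers}
        self.rewards = {lever: 0.0 for lever in self.levers}

    def expectation(self, lever):
        # Mean reward per trial: total reward divided by number of trials.
        # Assumption: unplayed levers score +inf so each gets tried once.
        n = self.trials[lever]
        return self.rewards[lever] / n if n else float("inf")

    def choose(self):
        if self.rng.random() < self.epsilon:
            # Exploration: pick a lever uniformly at random.
            lever = self.rng.choice(self.levers)
        else:
            # Exploitation: pick the lever with the highest mean reward.
            lever = max(self.levers, key=self.expectation)
        # Record that the chosen lever has been played once more.
        self.trials[lever] += 1
        return lever

    def reward(self, lever, amount):
        # Add the reward to the running total for the given lever.
        self.rewards[lever] += amount


# Example usage with three hypothetical button colours.
bandit = EpsilonGreedy(["orange", "green", "white"])
lever = bandit.choose()
bandit.reward(lever, 1.0)  # e.g. the visitor clicked the button
```

Even this toy version glosses over most of what makes production hard: shared state across processes, delayed rewards, and variants that change over time.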
Now, after years of work and many thousands of lines of code, I’m pleased to present a new “10 lines of code that will beat A/B testing every time” that is robust, scalable, and proven in large-scale production:

    from improveai import DecisionModel

    model = DecisionModel(track_url=track_url)
    model.load(model_url)

    def choose(variants):
        return model.which(variants)

    def reward(amount, decision_id):
        model.add_reward(amount, decision_id)

Unlike Steve’s code, which envisioned querying simple statistics from something like Redis, Improve AI loads a trained ML decision model locally into the process, so decisions are made immediately with zero network latency. This is just one of the many refinements made over the years to deliver a production-ready reinforcement learning system.