Improve AI 8.0 - Contextual Multi-Armed Bandit Platform for Scoring, Ranking & Decisions

We’re thrilled to introduce Improve AI 8.0, a modern, free, production-ready contextual multi-armed bandit platform that quickly scores and ranks items using intuitive reward-based training.

Multi-armed bandits and contextual bandits are cornerstone machine learning algorithms that power a myriad of applications, including recommendation systems, personalization, query re-ranking, automated decisions, and multi-variate optimization.

With version 8, we’ve fully delivered on our original vision: a high-performance, simple-to-use, low-cost contextual multi-armed bandit platform.

Key features of v8.0 include:

  • Simplified APIs
  • 90% more memory-efficient XGBoost models
  • The reward tracker & trainer is now free for most uses
  • On-device scoring, ranking, and decisions for iOS and Android apps
  • Native Swift SDK that can rank or score any Encodable
  • Ranked Value Encoding for accurate scoring of String properties
  • Compact hash tables for reduced model sizes when encoding large numbers of string values
  • Balanced exploration vs exploitation using Thompson Sampling

Simple APIs

With Swift, Python, or Java, create a list of JSON-encodable items and simply call Ranker.rank(items).

For instance, in an iOS bedtime story app, you may have a list of Story objects:

struct Story: Codable {
    var title: String
    var author: String
    var pageCount: Int
}

To obtain a ranked list of stories, use just one line of code:

let rankedStories = try Ranker(modelUrl).rank(stories)

The expected best story will be the first element in the ranked list:

let bestStory = rankedStories.first

Simple Training

Easily train your rankers using reinforcement learning.

First, track when an item is used:

let tracker = RewardTracker("stories", trackUrl)
let rewardId = tracker.track(story, from: rankedStories)

Later, if a positive outcome occurs, provide a reward:

if purchased {
    tracker.addReward(profit, rewardId)
}

Reinforcement learning uses positive rewards for favorable outcomes (a “carrot”) and negative rewards for undesirable outcomes (a “stick”). By assigning rewards based on business metrics, such as revenue or conversions, the system optimizes these metrics over time.

Contextual Ranking & Scoring

Improve AI turns XGBoost into a contextual multi-armed bandit, meaning that context is considered when making ranking or scoring decisions.

Often, the choice of the best variant depends on the context in which the decision is made. Let’s take the example of greetings for different times of the day:

greetings = ["Good Morning", 
             "Good Afternoon", 
             "Good Evening",
             "Buenos Días",
             "Buenas Tardes",
             "Buenas Noches"]

rank() also considers the context of each decision. The context can be any JSON-encodable data structure.

ranked = ranker.rank(items=greetings, 
                     context={ "day_time": 12.0,
                               "language": "en" })
greeting = ranked[0]

Trained with appropriate rewards, Improve AI would learn from scratch which greeting is best for each time of day and language.
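
In Python, the training loop looks much the same. Here is a minimal sketch, assuming the Python SDK mirrors the Swift tracker with snake_case names (track_url, user_smiled, and user_complained are illustrative placeholders):

from improveai import RewardTracker

track_url = "https://your-track-endpoint.example"  # illustrative placeholder

tracker = RewardTracker("greetings", track_url)

# Record that the chosen greeting was actually used.
reward_id = tracker.track(greeting, ranked)

# Later: a "carrot" for a favorable outcome, a "stick" for an unfavorable one.
if user_smiled:
    tracker.add_reward(1.0, reward_id)
elif user_complained:
    tracker.add_reward(-1.0, reward_id)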

XGBoost Model Improvements

Improve AI v8.0 is 90%+ more memory efficient for most use cases. Feature hashing has been replaced with a feature encoding approach that uses only a single feature per item property, substantially improving both training performance and ranking/scoring performance.
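
For intuition, one feature per property amounts to flattening each item into a single numeric column per property, with string values mapped through the Ranked Value Encoding described below. A rough sketch of the idea (encode_item and string_ranks are illustrative names, not the actual encoder):

# Illustrative only: one numeric feature per item property.
def encode_item(item, string_ranks):
    features = {}
    for prop, value in item.items():
        if isinstance(value, (int, float)):
            features[prop] = float(value)  # numbers pass through
        else:
            # strings are replaced by their rank (see Ranked Value Encoding below)
            features[prop] = float(string_ranks.get(value, -1))
    return features

story = {"title": "Goodnight Moon", "author": "Margaret Wise Brown", "pageCount": 32}
print(encode_item(story, {"Goodnight Moon": 3, "Margaret Wise Brown": 7}))
# {'title': 3.0, 'author': 7.0, 'pageCount': 32.0}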

Ranked Value Encoding

Ranked Value Encoding is our novel approach to encoding string values in a manner that is extremely space efficient, accurate, and helps approximate Thompson Sampling for balanced exploration vs exploitation. The concept of Ranked Value Encoding is similar to commonly used Target Value Encoding for encoding string or categorical features.

With Target Value Encoding, each string or categorical feature is replaced with the mean of the target values for that string or category. Target Value Encoding tends to provide good results for regression.
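
As a concrete illustration of Target Value Encoding (the strings and target values below are made up for the example):

from collections import defaultdict

# Target Value Encoding sketch: replace each string with the mean of the
# target values (here, rewards) observed alongside it.
observations = [
    ("Good Morning", 1.0), ("Good Morning", 0.0),
    ("Buenos Días", 1.0), ("Buenos Días", 1.0),
]

totals, counts = defaultdict(float), defaultdict(int)
for value, target in observations:
    totals[value] += target
    counts[value] += 1

target_encoding = {v: totals[v] / counts[v] for v in totals}
# {'Good Morning': 0.5, 'Buenos Días': 1.0}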

However, multi-armed bandits are less concerned with the absolute accuracy of the scores and more concerned with the relative scores between items. Since we don’t need the exact target value, we can simply store the relative ranking of the string values, which saves space in the resulting model, increasing performance and lowering distribution costs.
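
Continuing the sketch above, Ranked Value Encoding would keep only the ordering of those means, not the means themselves (again illustrative, not the exact model format):

# Ranked Value Encoding sketch: store each string's rank among the
# target means rather than the mean itself.
by_mean = sorted(target_encoding, key=target_encoding.get)
rank_encoding = {value: rank for rank, value in enumerate(by_mean)}
# {'Good Morning': 0, 'Buenos Días': 1}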

Compact String Encoding

In conjunction with Ranked Value Encoding, rather than storing entire strings, which could be arbitrarily long, Improve AI v8 models store only compact string hashes, resulting in just ~4 bytes per string for typical models.
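
For intuition, a 4-byte hash can be derived by truncating a stable digest. This is an illustrative sketch, not necessarily the hash function Improve AI uses:

import hashlib

def compact_hash(s):
    # First 4 bytes of a stable digest yield one 32-bit integer per string,
    # so the model stores ~4 bytes instead of the full string.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")

print(compact_hash("Good Morning"))  # a 32-bit integer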

Proven Performance

Improve AI is a production ready implementation of a contextual multi-armed bandit algorithm, honed through years of iterative development. By merging Thompson Sampling with XGBoost, it provides a learning system that is both fast and flexible. Thompson Sampling maintains equilibrium between exploring novel possibilities and capitalizing on established options, while XGBoost ensures cost-effective, high-performance training for updated models.
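
To make the exploration vs exploitation trade-off concrete, here is the textbook Beta-Bernoulli form of Thompson Sampling for binary rewards. It illustrates the balancing behavior but is not Improve AI's internal implementation:

import random

# Per-arm outcome counts, starting from a uniform Beta(1, 1) prior.
successes = [1, 1, 1]
failures = [1, 1, 1]

def choose_arm():
    # Sample a plausible success rate for each arm from its posterior and
    # pick the best sample. Uncertain arms sometimes draw high samples,
    # which is what drives exploration.
    samples = [random.betavariate(s, f) for s, f in zip(successes, failures)]
    return samples.index(max(samples))

def record_outcome(arm, reward):
    # Positive rewards shift the arm's posterior upward, others downward.
    if reward > 0:
        successes[arm] += 1
    else:
        failures[arm] += 1

As noted above, Ranked Value Encoding is what helps approximate this sampling behavior in Improve AI's models.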

Get Started Today

Improve AI is available now for Python, Swift, and Java. Check out the Quick-Start Guide for more information.

Thank you for your efforts to improve the world a little bit today.
