Improve AI 8.0 - Contextual Multi-Armed Bandit Platform for Scoring, Ranking & Decisions
We’re thrilled to introduce Improve AI 8.0, a modern, free, production-ready contextual multi-armed bandit platform that quickly scores and ranks items using intuitive reward-based training.
Multi-armed bandits and contextual bandits are cornerstone machine learning algorithms that power a myriad of applications including recommendation systems, personalization, query re-ranking, automated decisions, and multi-variate optimization.
With version 8, we’ve fully delivered on our original vision: a high-performance, simple-to-use, low-cost contextual multi-armed bandit platform.
Key features of v8.0 include:
- Simplified APIs
- 90% more memory-efficient XGBoost models
- The reward tracker & trainer is now free for most uses
- On-device scoring, ranking, and decisions for iOS and Android apps
- Native Swift SDK that can rank or score any Encodable
- Ranked Value Encoding for accurate scoring of String properties
- Compact hash tables for reduced model sizes when encoding large numbers of string values
- Balanced exploration vs exploitation using Thompson Sampling
Simple APIs
With Swift, Python, or Java, create a list of JSON-encodable items and simply call Ranker.rank(items).
For instance, in an iOS bedtime story app, you may have a list of Story objects:
struct Story: Codable {
    var title: String
    var author: String
    var pageCount: Int
}
To obtain a ranked list of stories, use just one line of code:
let rankedStories = try Ranker(modelUrl).rank(stories)
The expected best story will be the first element in the ranked list:
let bestStory = rankedStories.first
Simple Training
Easily train your rankers using reinforcement learning.
First, track when an item is used:
let tracker = RewardTracker("stories", trackUrl)
let rewardId = tracker.track(story, from: rankedStories)
Later, if a positive outcome occurs, provide a reward:
if purchased {
    tracker.addReward(profit, rewardId)
}
Reinforcement learning uses positive rewards for favorable outcomes (a “carrot”) and negative rewards for undesirable outcomes (a “stick”). By assigning rewards based on business metrics, such as revenue or conversions, the system optimizes these metrics over time.
Contextual Ranking & Scoring
Improve AI turns XGBoost into a contextual multi-armed bandit, meaning that context is considered when making ranking or scoring decisions.
Often, the choice of the best variant depends on the context that the decision is made within. Let’s take the example of greetings for different times of the day:
greetings = ["Good Morning",
             "Good Afternoon",
             "Good Evening",
             "Buenos Días",
             "Buenas Tardes",
             "Buenas Noches"]
rank() also considers the context of each decision. The context can be any JSON-encodable data structure.
ranked = ranker.rank(items=greetings,
                     context={"day_time": 12.0,
                              "language": "en"})
greeting = ranked[0]
Trained with appropriate rewards, Improve AI would learn from scratch which greeting is best for each time of day and language.
XGBoost Model Improvements
Improve AI v8.0 is 90%+ more memory efficient for most use cases. Feature hashing has been replaced with a feature encoding approach that uses only a single feature per item property, substantially improving both training performance and ranking/scoring performance.
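To illustrate the single-feature-per-property idea, here is a minimal sketch of flattening an item and its context into one feature per property. This is not Improve AI's actual encoder (which also applies Ranked Value Encoding to strings); the function name and key scheme are illustrative assumptions:

```python
def encode_item(item: dict, context: dict) -> dict:
    """Flatten an item and its context into one feature per property.

    A sketch of the one-feature-per-property idea; the real encoder
    additionally transforms string values via Ranked Value Encoding.
    """
    features = {}
    for prefix, obj in (("item", item), ("context", context)):
        for key, value in obj.items():
            features[f"{prefix}.{key}"] = value
    return features

encode_item({"title": "Goodnight Moon", "pageCount": 32},
            {"day_time": 20.5, "language": "en"})
# {'item.title': 'Goodnight Moon', 'item.pageCount': 32,
#  'context.day_time': 20.5, 'context.language': 'en'}
```

Each property maps to exactly one feature column, so the model's feature count stays proportional to the number of distinct properties rather than to a large hash space.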
Ranked Value Encoding
Ranked Value Encoding is our novel approach to encoding string values in a manner that is extremely space efficient, accurate, and helps approximate Thompson Sampling for balanced exploration vs exploitation. The concept of Ranked Value Encoding is similar to commonly used Target Value Encoding for encoding string or categorical features.
With Target Value Encoding, each string or categorical feature is replaced with the mean of the target values for that string or category. Target Value Encoding tends to provide good results for regression.
However, multi-armed bandits are less concerned with the absolute accuracy of the scores and more concerned with the relative scores between items. Since we don’t need the exact target value, we can simply store the relative ranking of the string values, which saves space in the resulting model, increasing performance and lowering distribution costs.
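The contrast can be sketched in a few lines. This is an illustrative toy, not the library's implementation: target value encoding stores each string's mean reward, while ranked value encoding keeps only the ordering of those means:

```python
from collections import defaultdict

def target_value_encode(observations):
    """Map each string to the mean reward observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, reward in observations:
        sums[value] += reward
        counts[value] += 1
    return {v: sums[v] / counts[v] for v in sums}

def ranked_value_encode(observations):
    """Keep only each string's rank by mean reward (0 = lowest)."""
    means = target_value_encode(observations)
    ordered = sorted(means, key=means.get)
    return {v: rank for rank, v in enumerate(ordered)}

observations = [("Good Morning", 1.0), ("Good Morning", 0.8),
                ("Good Evening", 0.2), ("Buenos Días", 0.5)]
print(ranked_value_encode(observations))
# {'Good Evening': 0, 'Buenos Días': 1, 'Good Morning': 2}
```

Small integer ranks compress far better than floating-point means, yet preserve exactly the relative ordering a bandit needs.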
Compact String Encoding
In conjunction with Ranked Value Encoding, rather than store entire strings, which could be arbitrarily long, Improve AI v8 models only store compact string hashes, resulting in only ~4 bytes per string for typical models.
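As a rough sketch of the space saving (using a truncated MD5 digest as a stand-in, not the hash function Improve AI actually uses), any string can be reduced to a fixed 4-byte key:

```python
import hashlib

def compact_hash(s: str) -> bytes:
    """Reduce an arbitrarily long string to a fixed 4-byte digest.

    Illustrative only: a truncated MD5 stands in for whatever
    hash the real models use internally.
    """
    return hashlib.md5(s.encode("utf-8")).digest()[:4]

print(compact_hash("Good Morning").hex())  # 8 hex chars = 4 bytes
```

Storing 4-byte hashes instead of full strings keeps model size bounded no matter how long the original property values are.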
Proven Performance
Improve AI is a production ready implementation of a contextual multi-armed bandit algorithm, honed through years of iterative development. By merging Thompson Sampling with XGBoost, it provides a learning system that is both fast and flexible. Thompson Sampling maintains equilibrium between exploring novel possibilities and capitalizing on established options, while XGBoost ensures cost-effective, high-performance training for updated models.
Get Started Today
Improve AI is available now for Python, Swift, and Java. Check out the Quick-Start Guide for more information.
Thank you for your efforts to improve the world a little bit today.