Have you ever come home from the supermarket to discover that one of the apples you bought is rotten? It's likely your trust in that grocer was diminished, or you might stop buying that particular brand of apples altogether.

In this episode, we discuss how quality control in a production line needs smart sampling methods to avoid sending bad products to the customer, which could ruin the reputation of both the brand and the seller.

To do this, we describe a thought experiment called Apple Tasting. It lets us demonstrate the concepts of regret and reward in a sampling process, giving rise to the use of Contextual Bandit algorithms. Contextual Bandits come from the field of Reinforcement Learning, a form of Machine Learning in which an agent performs actions and tries to maximise the cumulative reward from its environment over time. Standard bandit algorithms simply choose between a number of actions and measure the reward in order to estimate the average reward of each action. A Contextual Bandit also uses information from its environment to inform the likely reward, and regret, of subsequent actions. This is particularly useful in personalised product recommendation engines, where the bandit algorithm is given some contextual information about the user.
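As a rough illustration (ours, not from the episode), the sketch below shows one simple way a contextual bandit can work: an epsilon-greedy agent that keeps a linear estimate of reward per action, uses the context vector to predict which action looks best, and occasionally explores at random. The class name, the linear-reward assumption, and the epsilon value are all hypothetical choices for the example.

```python
# Minimal sketch of an epsilon-greedy contextual bandit (illustrative only).
# Each action keeps running ridge-regression statistics so reward can be
# predicted as a linear function of the context vector.
import numpy as np

class EpsilonGreedyContextualBandit:
    def __init__(self, n_actions, n_features, epsilon=0.1):
        self.epsilon = epsilon
        self.n_actions = n_actions
        # Per-action statistics: A = X^T X + I, b = X^T r
        self.A = [np.eye(n_features) for _ in range(n_actions)]
        self.b = [np.zeros(n_features) for _ in range(n_actions)]

    def choose(self, context):
        # Explore occasionally; otherwise pick the action whose estimated
        # reward is highest for this particular context.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        estimates = [np.linalg.solve(self.A[a], self.b[a]) @ context
                     for a in range(self.n_actions)]
        return int(np.argmax(estimates))

    def update(self, action, context, reward):
        # Fold the observed (context, reward) pair into that action's model.
        self.A[action] += np.outer(context, context)
        self.b[action] += reward * context
```

A standard (non-contextual) bandit would simply track one average reward per action; the context vector is what lets the agent's choice change from one situation to the next.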

Back to Apple Tasting and product quality control. The contextual bandit in this scenario consumes a signal from a benign test that is indicative, but not conclusive, of there being a fault, and then decides whether or not to perform a more in-depth test. So the answer to when you should discard or test your product depends on the relative costs of making the right decision (reward) or the wrong decision (regret), and on how your experience of the environment has shaped these in the past.
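A hedged sketch of that ship-or-test trade-off, under illustrative assumptions: given an estimated probability that the item is faulty (informed by the benign signal), compare the expected cost of shipping untested against the fixed cost of the in-depth test. The cost figures and function name here are hypothetical, not from the episode.

```python
# Illustrative ship-or-test decision; costs are made-up numbers.
def should_test(p_fault, cost_of_test=1.0, cost_of_shipping_fault=20.0):
    """Decide whether to run the in-depth test for one item.

    p_fault: estimated probability (from the benign signal / context)
             that the item really is faulty.
    Returns True when the expected cost of shipping untested exceeds
    the cost of running the test.
    """
    expected_cost_if_shipped = p_fault * cost_of_shipping_fault
    return expected_cost_if_shipped > cost_of_test

# Example: when shipping a fault costs 20x the test, even a 10% fault
# probability justifies testing, but a 2% probability does not.
print(should_test(0.10))  # True
print(should_test(0.02))  # False
```

In practice a contextual bandit learns these expected costs from experience rather than taking them as given, which is exactly the point of the Apple Tasting setup.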

We speak with Prof. David Leslie about how this logic can be applied to any manufacturing pipeline where there is a downside risk to not quality-checking the product, but also a cost when a good product is falsely flagged as bad (a false positive).

Other areas of application include:

- Anomalous behaviour in a jet engine, e.g. low fuel efficiency, which could be nothing or could be serious, so it might be worth taking the plane in for repair.
- Changepoints in network data time series: does it mean there's a fault on the line, or does it mean the next series of The Queen's Gambit has just been released? Should we send an engineer out?

With interview guest David Leslie, Professor of Statistical Learning in the Department of Mathematics and Statistics at Lancaster University.

Further Reading

- Publication list for Prof. David Leslie (http://bitly.ws/bQ4a via Lancaster University)
- Paper on "Selecting Multiple Web Adverts - a Contextual Multi-armed Bandit with State Uncertainty" in the Journal of the ORS (http://bitly.ws/bQ3X via Lancaster University)
- Paper on "Apple tasting" (http://bitly.ws/bQeW via ScienceDirect)
- Paper by Google Inc. on "AutoML for Contextual Bandits" (https://arxiv.org/abs/1909.03212 via arXiv)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them; they are provided as-is. Free versions of papers are often available, and we would encourage you to investigate.

Recording date: 5 Feb 2021
Interview date: 4 Dec 2020