The multi-fidelity multi-armed bandit

Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, Barnabás Póczos

Research output: Contribution to journal › Conference article

7 Citations (Scopus)

Abstract

We study a variant of the classical stochastic K-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of M fidelities. The highest fidelity (desired outcome) expends cost λ(M). The mth fidelity (an approximation) expends λ(m) < λ(M) and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting, and prove that it naturally adapts to the sequence of available approximations and costs, thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger, more expensive experiments for a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.
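The idea in the abstract can be sketched in code: play cheap, biased fidelities until their confidence intervals shrink below the bias bound, then escalate to costlier fidelities for the arms that still look promising. The sketch below is a minimal illustration of that idea only, not the paper's exact MF-UCB (the paper's switching thresholds and regret analysis differ); the names `mf_ucb_sketch`, `zetas`, and `rho` are assumptions for this example.

```python
import math

def mf_ucb_sketch(arms, costs, zetas, budget, rho=2.0):
    """Illustrative multi-fidelity UCB sketch (not the paper's exact MF-UCB).

    arms[k][m]() returns a noisy reward for arm k at fidelity m,
    costs[m] is the cost lambda(m) of one play at fidelity m, and
    zetas[m] bounds the bias of fidelity m relative to the top fidelity
    (zetas[-1] should be 0).
    """
    K, M = len(arms), len(costs)
    counts = [[0] * M for _ in range(K)]
    sums = [[0.0] * M for _ in range(K)]
    spent, t = 0.0, 0

    def width(k, m):
        n = counts[k][m]
        return float("inf") if n == 0 else math.sqrt(rho * math.log(t + 1) / n)

    def ucb(k, m):
        if counts[k][m] == 0:
            return float("inf")
        # empirical mean + confidence width + fidelity bias zeta(m)
        return sums[k][m] / counts[k][m] + width(k, m) + zetas[m]

    while spent < budget:
        t += 1
        # choose the arm whose tightest upper bound across fidelities is largest
        k = max(range(K), key=lambda a: min(ucb(a, m) for m in range(M)))
        # play the lowest fidelity whose confidence width still exceeds its
        # bias; once a cheap estimate is resolved, escalate to a costlier one
        m = M - 1
        for j in range(M - 1):
            if width(k, j) > zetas[j]:
                m = j
                break
        reward = arms[k][m]()
        counts[k][m] += 1
        sums[k][m] += reward
        spent += costs[m]
    # recommend the arm played most often at the highest fidelity
    return max(range(K), key=lambda a: counts[a][M - 1])
```

With well-separated arms, the sketch spends most of its low-fidelity budget ruling out weak arms and concentrates the expensive, unbiased plays on the leader, mirroring the advertising example in the abstract.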

Original language: English (US)
Pages (from-to): 1785-1793
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
State: Published - Jan 1 2016
Externally published: Yes
Event: 30th Annual Conference on Neural Information Processing Systems, NIPS 2016 - Barcelona, Spain
Duration: Dec 5 2016 - Dec 10 2016

Fingerprint

  • Marketing
  • Costs
  • Experiments

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

The multi-fidelity multi-armed bandit. / Kandasamy, Kirthevasan; Dasarathy, Gautam; Schneider, Jeff; Póczos, Barnabás.

In: Advances in Neural Information Processing Systems, 01.01.2016, pp. 1785-1793.


Kandasamy, Kirthevasan ; Dasarathy, Gautam ; Schneider, Jeff ; Póczos, Barnabás. / The multi-fidelity multi-armed bandit. In: Advances in Neural Information Processing Systems. 2016 ; pp. 1785-1793.
@article{3f3c761a16e242a4a287c2e3df2f0e70,
title = "The multi-fidelity multi-armed bandit",
author = "Kirthevasan Kandasamy and Gautam Dasarathy and Jeff Schneider and Barnab{\'a}s P{\'o}czos",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
pages = "1785--1793",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}

TY - JOUR

T1 - The multi-fidelity multi-armed bandit

AU - Kandasamy, Kirthevasan

AU - Dasarathy, Gautam

AU - Schneider, Jeff

AU - Póczos, Barnabás

PY - 2016/1/1

Y1 - 2016/1/1

AB - We study a variant of the classical stochastic K-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of M fidelities. The highest fidelity (desired outcome) expends cost λ(M). The mth fidelity (an approximation) expends λ(m) < λ(M) and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting and prove that it naturally adapts to the sequence of available approximations and costs thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger expensive experiments on a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.

UR - http://www.scopus.com/inward/record.url?scp=85018901939&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018901939&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85018901939

SP - 1785

EP - 1793

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -