Adaptive submodular maximization in bandit setting

Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to journal › Conference article

    Abstract

    Maximization of submodular functions has wide applications in machine learning and artificial intelligence. Adaptive submodular maximization has been traditionally studied under the assumption that the model of the world, the expected gain of choosing an item given previously selected items and their states, is known. In this paper, we study the setting where the expected gain is initially unknown, and it is learned by interacting repeatedly with the optimized function. We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. Our regret bound captures an inherent property of submodular maximization: earlier mistakes are more costly than later ones. We refer to our approach as Optimistic Adaptive Submodular Maximization (OASM) because it trades off exploration and exploitation based on the optimism in the face of uncertainty principle. We evaluate our method on a preference elicitation problem and show that non-trivial K-step policies can be learned from just a few hundred interactions with the problem.
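    The optimism-in-the-face-of-uncertainty mechanism the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's OASM algorithm: it drops the adaptive conditioning on previously selected items and their observed states, and treats each item's marginal gain as an independent Bernoulli reward scored with a UCB1-style index. All names (`oasm_sketch`, `ucb_index`) and the gain values are illustrative assumptions.

    ```python
    import math
    import random

    def ucb_index(mean, count, t):
        """Optimistic estimate of an item's expected gain:
        empirical mean plus a UCB1-style confidence radius."""
        if count == 0:
            return float("inf")  # force every item to be tried once
        return mean + math.sqrt(2.0 * math.log(t) / count)

    def oasm_sketch(true_gains, horizon, k, seed=0):
        """Simplified optimistic greedy loop.

        true_gains: dict item -> true expected gain (Bernoulli here)
        horizon:    number of episodes (interactions with the function)
        k:          items chosen per episode (the K-step policy length)
        """
        rng = random.Random(seed)
        means = {i: 0.0 for i in true_gains}
        counts = {i: 0 for i in true_gains}
        for t in range(1, horizon + 1):
            chosen = set()
            for _ in range(k):
                # greedily pick the unchosen item with the highest
                # optimistic index (exploration vs. exploitation)
                best = max((i for i in true_gains if i not in chosen),
                           key=lambda i: ucb_index(means[i], counts[i], t))
                chosen.add(best)
                # observe a stochastic gain and update the empirical mean
                reward = 1.0 if rng.random() < true_gains[best] else 0.0
                counts[best] += 1
                means[best] += (reward - means[best]) / counts[best]
        return means, counts

    means, counts = oasm_sketch({"a": 0.9, "b": 0.5, "c": 0.1},
                                horizon=2000, k=1)
    ```

    After a few hundred episodes the optimistic index concentrates the selections on the high-gain item, mirroring the logarithmic-regret behavior claimed for the full algorithm; the real method additionally learns gains conditioned on the partial selection and its states.
    
    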

    Original language: English (US)
    Journal: Advances in Neural Information Processing Systems
    State: Published - Jan 1, 2013
    Event: 27th Annual Conference on Neural Information Processing Systems, NIPS 2013 - Lake Tahoe, NV, United States
    Duration: Dec 5, 2013 - Dec 10, 2013

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Signal Processing

    Cite this

    Gabillon, V., Kveton, B., Wen, Z., Eriksson, B., & Muthukrishnan, S. (2013). Adaptive submodular maximization in bandit setting. Advances in Neural Information Processing Systems.
