776 research outputs found

    An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

    Full text link
    We study the problem of learning to rank from pairwise preferences, and solve a long-standing open problem that has led to development of many heuristics but no provable results for our particular problem. Given a set VV of nn elements, we wish to linearly order them given pairwise preference labels. A pairwise preference label is obtained as a response, typically from a human, to the question "which if preferred, u or v?fortwoelements for two elements u,v\in V.Weassumepossiblenontransitivityparadoxeswhichmayarisenaturallyduetohumanmistakesorirrationality.Thegoalistolinearlyordertheelementsfromthemostpreferredtotheleastpreferred,whiledisagreeingwithasfewpairwisepreferencelabelsaspossible.Ourperformanceismeasuredbytwoparameters:Thelossandthequerycomplexity(numberofpairwisepreferencelabelsweobtain).Thisisatypicallearningproblem,withtheexceptionthatthespacefromwhichthepairwisepreferencesisdrawnisfinite,consistingof. We assume possible non-transitivity paradoxes which may arise naturally due to human mistakes or irrationality. The goal is to linearly order the elements from the most preferred to the least preferred, while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The loss and the query complexity (number of pairwise preference labels we obtain). This is a typical learning problem, with the exception that the space from which the pairwise preferences is drawn is finite, consisting of {n\choose 2}$ possibilities only. We present an active learning algorithm for this problem, with query bounds significantly beating general (non active) bounds for the same error guarantee, while almost achieving the information theoretical lower bound. Our main construct is a decomposition of the input s.t. (i) each block incurs high loss at optimum, and (ii) the optimal solution respecting the decomposition is not much worse than the true opt. The decomposition is done by adapting a recent result by Kenyon and Schudy for a related combinatorial optimization problem to the query efficient setting. We thus settle an open problem posed by learning-to-rank theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we show how to use it in concert with an SVM relaxation.Comment: Fixed a tiny error in theorem 3.1 statemen

    Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback

    Full text link
    Given a set VV of nn objects, an online ranking system outputs at each time step a full ranking of the set, observes a feedback of some form and suffers a loss. We study the setting in which the (adversarial) feedback is an element in VV, and the loss is the position (0th, 1st, 2nd...) of the item in the outputted ranking. More generally, we study a setting in which the feedback is a subset UU of at most kk elements in VV, and the loss is the sum of the positions of those elements. We present an algorithm of expected regret O(n3/2Tk)O(n^{3/2}\sqrt{Tk}) over a time horizon of TT steps with respect to the best single ranking in hindsight. This improves previous algorithms and analyses either by a factor of either Ω(k)\Omega(\sqrt{k}), a factor of Ω(logn)\Omega(\sqrt{\log n}) or by improving running time from quadratic to O(nlogn)O(n\log n) per round. We also prove a matching lower bound. Our techniques also imply an improved regret bound for online rank aggregation over the Spearman correlation measure, and to other more complex ranking loss functions

    Reducing Dueling Bandits to Cardinal Bandits

    Full text link
    We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions -- named \Doubler, \MultiSbm and \DoubleSbm -- provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For \Doubler and \MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of \DoubleSbm which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms
    corecore