Kernel-based information theoretical measures: accelerations and limits
Kernel methods form the basis of some of the most powerful and well-founded machine learning algorithms. The properties that make kernel methods ubiquitous are the number of domains for which they have been developed, the Hilbert structure of the function class associated with kernels, which permits their statistical analysis, and the ability to embed probability measures as elements of a reproducing kernel Hilbert space without loss of information and under very mild assumptions. All of these properties have led to the development of numerous kernel-based information-theoretic measures, such as the maximum mean discrepancy (MMD; also referred to as "energy distance" in the statistics literature), which quantifies the difference between two probability measures; the Hilbert-Schmidt independence criterion (HSIC; also referred to as "distance covariance" in the statistics literature), which quantifies the (in)dependence of a distribution; and the kernel Stein discrepancy (KSD), which quantifies the discrepancy of a distribution from a given target. These measures have found numerous applications, above all in the design of two-sample, independence, and goodness-of-fit tests. The existing U- and V-statistic-based estimators are powerful, but their runtime complexity grows quadratically with the sample size, which severely limits their application to large samples. To address this serious limitation, this dissertation makes the following contributions.
We propose the first accelerated Nyström-based HSIC estimator that can handle more than two random variables, prove its $\mathcal{O}(n^{-1/2})$-consistency, and evaluate its performance on synthetic data, on dependence testing of media annotations, and on causal discovery. Furthermore, we show that the minimax optimal rate of HSIC estimation for continuous, bounded, translation-invariant kernels on $\mathbb{R}^d$, for Borel measures containing the Gaussian distributions, is $\mathcal{O}(n^{-1/2})$. This answers a question that had remained open since the introduction of HSIC close to two decades ago. Our result also implies the minimax optimality of the proposed HSIC acceleration. Regarding KSD, we likewise propose a Nyström-based acceleration, prove its consistency under a classical sub-Gaussian assumption, and show on a suite of goodness-of-fit testing benchmarks that it outperforms the previous state of the art. Finally, we design an efficient online approximation of MMD that enables its computation on data streams and provides the basis for a powerful change detection algorithm. Extensive experiments show that the proposed algorithm achieves excellent performance on both synthetic and real-world data.
Overall, this dissertation makes a scientific contribution by presenting accelerated estimators for kernel-based information-theoretic measures and by introducing tools for their analysis. Our theoretical and experimental results demonstrate the excellent properties of these estimators. All code for replicating the experiments is freely available.
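To make the quadratic bottleneck concrete, here is a minimal sketch of the classical biased (V-statistic) HSIC estimator with Gaussian kernels. It is illustrative only (the bandwidth choice and names are assumptions, not the dissertation's accelerated estimator); its n x n Gram matrices are exactly what make runtime and memory grow quadratically and what Nyström-type approximations avoid.

import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    # Pairwise squared distances turned into a Gaussian (RBF) Gram matrix: n x n entries.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_v_statistic(x, y, bandwidth=1.0):
    # Classical biased estimator: HSIC_b = trace(K H L H) / n^2, with H the centering
    # matrix; both runtime and memory scale quadratically in the sample size n.
    n = x.shape[0]
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y = x[:, :1] + 0.1 * rng.normal(size=(500, 1))  # y depends on x, so HSIC is clearly positive
print(hsic_v_statistic(x, y))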
Robust partial-label learning by leveraging class activation values
Real-world training data is often noisy; for example, human annotators assign conflicting class labels to the same instances. Partial-label learning (PLL) is a weakly supervised learning paradigm that allows training classifiers in this context without manual data cleaning. While state-of-the-art methods have good predictive performance, their predictions are sensitive to high noise levels, out-of-distribution data, and adversarial perturbations. We propose a novel PLL method based on subjective logic, which explicitly represents uncertainty by leveraging the magnitudes of the underlying neural network's class activation values. Thereby, we effectively incorporate prior knowledge about the class labels by using a novel label weight re-distribution strategy that we prove to be optimal. We empirically show that our method yields more robust predictions in terms of predictive performance under high PLL noise levels, handling out-of-distribution examples, and handling adversarial perturbations on the test instances.
Partial-Label Learning with Conformal Candidate Cleaning
Real-world data is often ambiguous; for example, human annotation produces instances with multiple conflicting class labels. Partial-label learning (PLL) aims at training a classifier in this challenging setting, where each instance is associated with a set of candidate labels and one correct, but unknown, class label. A multitude of algorithms targeting this setting exists and, to enhance their prediction quality, several extensions that are applicable across a wide range of PLL methods have been introduced. While many of these extensions rely on heuristics, this article proposes a novel enhancing method that incrementally prunes candidate sets using conformal prediction. To work around the missing labeled validation set, which is typically required for conformal prediction, we propose a strategy that alternates between training a PLL classifier to label the validation set, leveraging these predicted class labels for calibration, and pruning candidate labels that are not part of the resulting conformal sets. In this sense, our method alternates between empirical risk minimization and candidate set pruning. We establish that our pruning method preserves the conformal validity with respect to the unknown ground truth. Our extensive experiments on artificial and real-world data show that the proposed approach significantly improves the test set accuracies of several state-of-the-art PLL classifiers.
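The alternation can be sketched as follows. This is a simplified illustration, not the article's exact procedure: a plain logistic regression trained on provisional labels stands in for the PLL classifier, the nonconformity score is 1 minus the predicted probability, and the threshold is the standard split-conformal quantile; all names are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def conformal_threshold(scores, alpha=0.1):
    # Split-conformal quantile of nonconformity scores on the pseudo-labeled calibration split.
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q)

def clean_candidates(x_train, cand, x_calib, n_rounds=3, alpha=0.1):
    # cand: (n_train, n_classes) boolean candidate-label matrix (True = candidate).
    # Alternate between: fit a classifier on provisional labels drawn from the candidate
    # sets, pseudo-label the calibration split, calibrate a conformal threshold on it,
    # and prune training candidates that fall outside the resulting conformal sets.
    n_classes = cand.shape[1]
    proba = np.ones_like(cand, dtype=float)                 # uniform belief in round 1
    for _ in range(n_rounds):
        y_prov = np.where(cand, proba, -np.inf).argmax(axis=1)
        model = LogisticRegression(max_iter=1000).fit(x_train, y_prov)
        pseudo = model.predict(x_calib)                     # pseudo-labels for calibration
        calib_proba = model.predict_proba(x_calib)
        idx = np.searchsorted(model.classes_, pseudo)
        scores = 1.0 - calib_proba[np.arange(len(pseudo)), idx]
        tau = conformal_threshold(scores, alpha)
        proba = np.zeros((len(x_train), n_classes))
        proba[:, model.classes_] = model.predict_proba(x_train)
        cand = cand & ((1.0 - proba) <= tau)                # prune, never re-add labels
    return cand

Because pruning only ever removes labels, the candidate sets shrink monotonically across rounds; the property the article establishes is that this pruning preserves conformal validity with respect to the unknown true label.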
The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels
Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M \ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb{R}^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal{O}\bigl(n^{-1/2}\bigr)$. Specifically, our result implies the optimality in the minimax sense of many of the most-frequently used estimators (including the U-statistic, the V-statistic, and the Nyström-based one) on $\mathbb{R}^d$.
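One common way to phrase such a minimax statement is the following schematic display (the paper's precise formulation, loss, and constants may differ); here $\mathcal{F}$ denotes the stated class of Borel probability measures on $\mathbb{R}^d$ containing the Gaussians, and the infimum runs over all estimators built from $n$ i.i.d. samples of $\mathbb{P}$:

\[
  \inf_{\widehat{\mathrm{HSIC}}_n}\; \sup_{\mathbb{P}\in\mathcal{F}}\; \mathbb{E}_{\mathbb{P}}\bigl|\widehat{\mathrm{HSIC}}_n - \mathrm{HSIC}(\mathbb{P})\bigr| \;\asymp\; n^{-1/2}.
\]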
Optimal Online Change Detection via Random Fourier Features
This article studies the problem of online non-parametric change point detection in multivariate data streams. We approach the problem through the lens of kernel-based two-sample testing and introduce a sequential testing procedure based on random Fourier features, running with logarithmic time complexity per observation and with overall logarithmic space complexity. The algorithm has two advantages compared to the state of the art. First, our approach is genuinely online, and no access to training data known to be from the pre-change distribution is necessary. Second, the algorithm does not require the user to specify a window parameter over which local tests are to be calculated. We prove strong theoretical guarantees on the algorithm's performance, including information-theoretic bounds demonstrating that the detection delay is optimal in the minimax sense. Numerical studies on real and synthetic data show that our algorithm is competitive with respect to the state of the art.
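To illustrate the key ingredient, here is a minimal random Fourier features sketch for the Gaussian kernel together with an MMD-style statistic computed from mean feature embeddings. This is only the building block, not the sequential detection procedure or its thresholds; the bandwidth and feature count are illustrative assumptions.

import numpy as np

def rff_map(x, omega, b):
    # Random Fourier features approximating a Gaussian kernel:
    # k(x, y) ~ z(x) . z(y) with z(x) = sqrt(2/D) * cos(omega @ x + b).
    return np.sqrt(2.0 / omega.shape[0]) * np.cos(x @ omega.T + b)

def sample_rff(dim, n_features, bandwidth, rng):
    # Frequencies drawn from the Gaussian kernel's spectral density, phases uniform.
    omega = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return omega, b

def mmd_rff(x, y, omega, b):
    # Squared-MMD estimate as the distance between mean feature embeddings; the means
    # can be maintained as running averages, so folding in a new observation costs O(D).
    diff = rff_map(x, omega, b).mean(axis=0) - rff_map(y, omega, b).mean(axis=0)
    return np.sum(diff ** 2)

rng = np.random.default_rng(0)
omega, b = sample_rff(dim=3, n_features=256, bandwidth=1.0, rng=rng)
pre = rng.normal(size=(1000, 3))              # pre-change sample
post = rng.normal(loc=0.5, size=(1000, 3))    # post-change sample (shifted mean)
print(mmd_rff(pre[:500], pre[500:], omega, b), mmd_rff(pre, post, omega, b))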
Partial-Label Learning with a Reject Option
In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they impact actions or decisions. We propose a novel risk-consistent partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions instead. When evaluated without the reject option, our nearest neighbor-based approach also achieves competitive prediction performance.
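The trade-off mentioned above is commonly evaluated with selective prediction metrics. The sketch below implements the generic confidence-threshold baseline (not the proposed risk-consistent algorithm) and reports coverage, the fraction of non-rejected predictions, together with the accuracy on those predictions as the rejection threshold varies; the toy probabilities are random placeholders.

import numpy as np

def selective_metrics(proba, y_true, threshold):
    # Reject an instance when the top predicted probability falls below the threshold.
    confidence = proba.max(axis=1)
    accepted = confidence >= threshold
    coverage = accepted.mean()                              # fraction of non-rejected predictions
    if accepted.sum() == 0:
        return coverage, float("nan")
    selective_acc = (proba[accepted].argmax(axis=1) == y_true[accepted]).mean()
    return coverage, selective_acc

rng = np.random.default_rng(0)
proba = rng.dirichlet(alpha=np.ones(4), size=200)           # toy predicted probabilities
y_true = rng.integers(0, 4, size=200)
for t in (0.3, 0.5, 0.7):
    cov, acc = selective_metrics(proba, y_true, t)
    print(f"threshold={t:.1f} coverage={cov:.2f} selective_accuracy={acc:.2f}")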
