Kernel-based information theoretical measures: accelerations and limits
Kernel methods form the basis of some of the most powerful and well-founded machine learning algorithms. The properties that make kernel methods ubiquitous are the number of domains for which they have been developed, the Hilbert structure of the function class associated with kernels, which permits their statistical analysis, and the ability to embed probability measures as elements of a reproducing kernel Hilbert space without loss of information and under very mild assumptions. All of these properties have led to the development of numerous kernel-based information-theoretic measures, such as the maximum mean discrepancy (MMD; also referred to as "energy distance" in the statistics literature), which quantifies the difference between two probability measures; the Hilbert-Schmidt independence criterion (HSIC; also referred to as "distance covariance" in the statistics literature), which quantifies the (in)dependence of a distribution; and the kernel Stein discrepancy (KSD), which quantifies the discrepancy of a distribution from a given target. These measures have found numerous applications, above all in the design of two-sample, independence, and goodness-of-fit tests. The existing U- and V-statistic-based estimators are powerful, but their runtime complexity grows quadratically with the sample size, which severely limits their application to large samples. To address this serious limitation, this dissertation makes the following contributions.
We propose the first accelerated Nyström-based HSIC estimator that can handle more than two random variables, prove its $\mathcal{O}(n^{-1/2})$-consistency, and evaluate its performance on synthetic data, on dependence testing of media annotations, and on causal discovery. Furthermore, we show that the minimax optimal rate of HSIC estimation for continuous, bounded, translation-invariant kernels on $\mathbb{R}^d$, for Borel measures containing the Gaussian distributions, is $\mathcal{O}(n^{-1/2})$. This answers a question that had remained open since the introduction of HSIC close to two decades ago. Our result also implies the minimax optimality of the proposed HSIC acceleration. Regarding KSD, we likewise propose a Nyström-based acceleration, prove its consistency under a classical sub-Gaussian assumption, and show on a suite of goodness-of-fit testing benchmarks that it outperforms the previous state of the art. Finally, we design an efficient online approximation of MMD that enables its computation on data streams and provides the basis for a powerful change detection algorithm. Extensive experiments show that the proposed algorithm achieves excellent performance on both synthetic and real-world data.
Overall, this dissertation makes a scientific contribution by presenting accelerated estimators for kernel-based information-theoretic measures and by introducing tools for their analysis. Our theoretical and experimental results demonstrate the excellent properties of these estimators. All code for replicating the experiments is freely available.
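To make the quadratic bottleneck concrete, here is a minimal sketch of the classical biased (V-statistic) HSIC estimator with Gaussian kernels. It is illustrative only (the bandwidth choice and names are assumptions, not the dissertation's accelerated estimator); its n x n Gram matrices are exactly what make runtime and memory grow quadratically and what Nyström-type approximations avoid.

import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    # Pairwise squared distances turned into a Gaussian (RBF) Gram matrix: n x n entries.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_v_statistic(x, y, bandwidth=1.0):
    # Classical biased estimator: HSIC_b = trace(K H L H) / n^2, with H the centering
    # matrix; both runtime and memory scale quadratically in the sample size n.
    n = x.shape[0]
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y = x[:, :1] + 0.1 * rng.normal(size=(500, 1))  # y depends on x, so HSIC is clearly positive
print(hsic_v_statistic(x, y))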
Robust partial-label learning by leveraging class activation values
Real-world training data is often noisy; for example, human annotators assign conflicting class labels to the same instances. Partial-label learning (PLL) is a weakly supervised learning paradigm that allows training classifiers in this context without manual data cleaning. While state-of-the-art methods have good predictive performance, their predictions are sensitive to high noise levels, out-of-distribution data, and adversarial perturbations. We propose a novel PLL method based on subjective logic, which explicitly represents uncertainty by leveraging the magnitudes of the underlying neural network's class activation values. Thereby, we effectively incorporate prior knowledge about the class labels by using a novel label weight re-distribution strategy that we prove to be optimal. We empirically show that our method yields more robust predictions in terms of predictive performance under high PLL noise levels, handling out-of-distribution examples, and handling adversarial perturbations on the test instances.
Partial-Label Learning with Conformal Candidate Cleaning
Real-world data is often ambiguous; for example, human annotation produces instances with multiple conflicting class labels. Partial-label learning (PLL) aims at training a classifier in this challenging setting, where each instance is associated with a set of candidate labels and one correct, but unknown, class label. A multitude of algorithms targeting this setting exists and, to enhance their prediction quality, several extensions that are applicable across a wide range of PLL methods have been introduced. While many of these extensions rely on heuristics, this article proposes a novel enhancing method that incrementally prunes candidate sets using conformal prediction. To work around the missing labeled validation set, which is typically required for conformal prediction, we propose a strategy that alternates between training a PLL classifier to label the validation set, leveraging these predicted class labels for calibration, and pruning candidate labels that are not part of the resulting conformal sets. In this sense, our method alternates between empirical risk minimization and candidate set pruning. We establish that our pruning method preserves the conformal validity with respect to the unknown ground truth. Our extensive experiments on artificial and real-world data show that the proposed approach significantly improves the test set accuracies of several state-of-the-art PLL classifiers.
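The alternation can be sketched as follows. This is a simplified illustration, not the article's exact procedure: a plain logistic regression trained on provisional labels stands in for the PLL classifier, the nonconformity score is 1 minus the predicted probability, and the threshold is the standard split-conformal quantile; all names are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def conformal_threshold(scores, alpha=0.1):
    # Split-conformal quantile of nonconformity scores on the pseudo-labeled calibration split.
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q)

def clean_candidates(x_train, cand, x_calib, n_rounds=3, alpha=0.1):
    # cand: (n_train, n_classes) boolean candidate-label matrix (True = candidate).
    # Alternate between: fit a classifier on provisional labels drawn from the candidate
    # sets, pseudo-label the calibration split, calibrate a conformal threshold on it,
    # and prune training candidates that fall outside the resulting conformal sets.
    n_classes = cand.shape[1]
    proba = np.ones_like(cand, dtype=float)                 # uniform belief in round 1
    for _ in range(n_rounds):
        y_prov = np.where(cand, proba, -np.inf).argmax(axis=1)
        model = LogisticRegression(max_iter=1000).fit(x_train, y_prov)
        pseudo = model.predict(x_calib)                     # pseudo-labels for calibration
        calib_proba = model.predict_proba(x_calib)
        idx = np.searchsorted(model.classes_, pseudo)
        scores = 1.0 - calib_proba[np.arange(len(pseudo)), idx]
        tau = conformal_threshold(scores, alpha)
        proba = np.zeros((len(x_train), n_classes))
        proba[:, model.classes_] = model.predict_proba(x_train)
        cand = cand & ((1.0 - proba) <= tau)                # prune, never re-add labels
    return cand

Because pruning only ever removes labels, the candidate sets shrink monotonically across rounds; the property the article establishes is that this pruning preserves conformal validity with respect to the unknown true label.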
The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels
Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M \ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb{R}^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal{O}\bigl(n^{-1/2}\bigr)$. Specifically, our result implies the optimality in the minimax sense of many of the most-frequently used estimators (including the U-statistic, the V-statistic, and the Nyström-based one) on $\mathbb{R}^d$.
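One common way to phrase such a minimax statement is the following schematic display (the paper's precise formulation, loss, and constants may differ); here $\mathcal{F}$ denotes the stated class of Borel probability measures on $\mathbb{R}^d$ containing the Gaussians, and the infimum runs over all estimators built from $n$ i.i.d. samples of $\mathbb{P}$:

\[
  \inf_{\widehat{\mathrm{HSIC}}_n}\; \sup_{\mathbb{P}\in\mathcal{F}}\; \mathbb{E}_{\mathbb{P}}\bigl|\widehat{\mathrm{HSIC}}_n - \mathrm{HSIC}(\mathbb{P})\bigr| \;\asymp\; n^{-1/2}.
\]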
Optimal Online Change Detection via Random Fourier Features
This article studies the problem of online non-parametric change point detection in multivariate data streams. We approach the problem through the lens of kernel-based two-sample testing and introduce a sequential testing procedure based on random Fourier features, running with logarithmic time complexity per observation and with overall logarithmic space complexity. The algorithm has two advantages compared to the state of the art. First, our approach is genuinely online, and no access to training data known to be from the pre-change distribution is necessary. Second, the algorithm does not require the user to specify a window parameter over which local tests are to be calculated. We prove strong theoretical guarantees on the algorithm's performance, including information-theoretic bounds demonstrating that the detection delay is optimal in the minimax sense. Numerical studies on real and synthetic data show that our algorithm is competitive with respect to the state of the art.
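To illustrate the key ingredient, here is a minimal random Fourier features sketch for the Gaussian kernel together with an MMD-style statistic computed from mean feature embeddings. This is only the building block, not the sequential detection procedure or its thresholds; the bandwidth and feature count are illustrative assumptions.

import numpy as np

def rff_map(x, omega, b):
    # Random Fourier features approximating a Gaussian kernel:
    # k(x, y) ~ z(x) . z(y) with z(x) = sqrt(2/D) * cos(omega @ x + b).
    return np.sqrt(2.0 / omega.shape[0]) * np.cos(x @ omega.T + b)

def sample_rff(dim, n_features, bandwidth, rng):
    # Frequencies drawn from the Gaussian kernel's spectral density, phases uniform.
    omega = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return omega, b

def mmd_rff(x, y, omega, b):
    # Squared-MMD estimate as the distance between mean feature embeddings; the means
    # can be maintained as running averages, so folding in a new observation costs O(D).
    diff = rff_map(x, omega, b).mean(axis=0) - rff_map(y, omega, b).mean(axis=0)
    return np.sum(diff ** 2)

rng = np.random.default_rng(0)
omega, b = sample_rff(dim=3, n_features=256, bandwidth=1.0, rng=rng)
pre = rng.normal(size=(1000, 3))              # pre-change sample
post = rng.normal(loc=0.5, size=(1000, 3))    # post-change sample (shifted mean)
print(mmd_rff(pre[:500], pre[500:], omega, b), mmd_rff(pre, post, omega, b))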
Partial-Label Learning with a Reject Option
In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they impact actions or decisions. We propose a novel risk-consistent partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions instead. When evaluated without the reject option, our nearest neighbor-based approach also achieves competitive prediction performance.
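The trade-off mentioned above is commonly evaluated with selective prediction metrics. The sketch below implements the generic confidence-threshold baseline (not the proposed risk-consistent algorithm) and reports coverage, the fraction of non-rejected predictions, together with the accuracy on those predictions as the rejection threshold varies; the toy probabilities are random placeholders.

import numpy as np

def selective_metrics(proba, y_true, threshold):
    # Reject an instance when the top predicted probability falls below the threshold.
    confidence = proba.max(axis=1)
    accepted = confidence >= threshold
    coverage = accepted.mean()                              # fraction of non-rejected predictions
    if accepted.sum() == 0:
        return coverage, float("nan")
    selective_acc = (proba[accepted].argmax(axis=1) == y_true[accepted]).mean()
    return coverage, selective_acc

rng = np.random.default_rng(0)
proba = rng.dirichlet(alpha=np.ones(4), size=200)           # toy predicted probabilities
y_true = rng.integers(0, 4, size=200)
for t in (0.3, 0.5, 0.7):
    cov, acc = selective_metrics(proba, y_true, t)
    print(f"threshold={t:.1f} coverage={cov:.2f} selective_accuracy={acc:.2f}")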
