287 research outputs found

Interpretable multiclass classification by MDL-based rule lists

Author: Proença Hugo M.
van Leeuwen Matthijs
Publication venue: 'Elsevier BV'
Publication date: 31/10/2019

Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally trades off model complexity against goodness of fit, which effectively avoids overfitting and the need for hyperparameter tuning. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion.

arXiv.org e-Print Archive

Leiden University Scholarly Publications
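
The trade-off between model complexity and goodness of fit mentioned in the abstract above can be illustrated with a generic two-part MDL score, L(M) + L(D | M). The sketch below is a simplified illustration and not the encoding used by Classy: the fixed cost `bits_per_condition`, the Laplace-smoothed class probabilities, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter


def data_bits(labels, n_classes):
    """Bits needed to encode the labels with Laplace-smoothed class frequencies."""
    counts = Counter(labels)
    total = len(labels)
    bits = 0.0
    for label in labels:
        p = (counts[label] + 1) / (total + n_classes)
        bits -= math.log2(p)
    return bits


def rule_list_bits(rules, default_labels, n_classes, bits_per_condition=4.0):
    """Two-part score: model cost plus data cost given the model.

    rules: list of (number_of_conditions, labels_covered_by_rule).
    default_labels: labels of the instances that no rule covers.
    bits_per_condition is a hypothetical flat cost per rule condition.
    """
    model_cost = sum(n_cond * bits_per_condition for n_cond, _ in rules)
    data_cost = sum(data_bits(covered, n_classes) for _, covered in rules)
    data_cost += data_bits(default_labels, n_classes)
    return model_cost + data_cost


# Toy data: 8 instances of class "a" and 8 of class "b".
no_rules = rule_list_bits([], ["a"] * 8 + ["b"] * 8, n_classes=2)
# A rule that isolates class "a" pays 4 bits but makes the labels cheap to encode.
good_rule = rule_list_bits([(1, ["a"] * 8)], ["b"] * 8, n_classes=2)
# A rule that splits the data without separating the classes only adds cost.
useless_rule = rule_list_bits(
    [(1, ["a"] * 4 + ["b"] * 4)], ["a"] * 4 + ["b"] * 4, n_classes=2
)
```

Under these assumptions a rule is worth adding only when the bits it saves on the labels exceed the bits needed to describe it, which is why no separate regularization hyperparameter appears in the score.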

Local Subspace-Based Outlier Detection using Global Neighbourhoods

Author: Bäck Thomas
van Leeuwen Matthijs
van Stein Bas
Publication date: 01/11/2016

Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often a mixture distribution of multiple components. We therefore introduce GLOSS, an algorithm that performs local subspace outlier detection using global neighbourhoods. Experiments on synthetic data demonstrate that GLOSS more accurately detects local outliers in mixed data than its competitors. Moreover, experiments on real-world data show that our approach identifies relevant outliers overlooked by existing methods, confirming that one should keep an eye on the global perspective even when doing local outlier detection. Comment: Short version accepted at IEEE BigData 201

arXiv.org e-Print Archive

Crossref

Twente Optical Perfusion Camera: system overview and performance for video rate laser Doppler perfusion imaging

Author: Draijer Matthijs
Hondebrink Erwin
Leeuwen Ton van
Steenbergen Wiendelt
Publication venue: Optical Society of America
Publication date: 01/01/2009

We present the Twente Optical Perfusion Camera (TOPCam), a novel laser Doppler Perfusion Imager based on CMOS technology. The tissue under investigation is illuminated and the resulting dynamic speckle pattern is recorded with a high speed CMOS camera. Based on an overall analysis of the signal-to-noise ratio of CMOS cameras, we have selected the camera which best fits our requirements. We applied a pixel-by-pixel noise correction to minimize the influence of noise in the perfusion images. We can achieve a frame rate of 0.2 fps for a perfusion image of 128×128 pixels (imaged tissue area of 7×7 cm²) if the data is analyzed online. If the analysis of the data is performed offline, we can achieve a frame rate of 26 fps for a duration of 3.9 seconds. By reducing the imaging size to 128×16 pixels, this frame rate can be achieved for up to half a minute. We show the fast imaging capabilities of the system in order of increasing perfusion frame rate. First, the increase of skin perfusion after application of capsicum cream and the perfusion during an occlusion-reperfusion procedure are shown at the fastest frame rate allowed with online analysis. At the highest frame rate allowed with offline analysis, the skin perfusion revealing the heartbeat and the perfusion during an occlusion-reperfusion procedure are presented. Hence we have achieved video rate laser Doppler perfusion imaging.

University of Twente Research Information

UvA-DARE

International Migration, Integration and Social Cohesion online publications
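
The frame-rate figures quoted in the TOPCam abstract above can be checked with a few lines of arithmetic. The sketch assumes that the offline recording duration is limited by a fixed budget of recorded pixels; the abstract does not state this, but the assumption reproduces the "half a minute" figure. All variable names are illustrative.

```python
# Figures taken from the abstract.
ONLINE_FPS = 0.2          # perfusion images per second with online analysis
OFFLINE_FPS = 26          # perfusion images per second with offline analysis
FULL_PIXELS = 128 * 128   # full perfusion image
REDUCED_PIXELS = 128 * 16 # reduced imaging size
DURATION_FULL_S = 3.9     # offline recording time at full size

# Online analysis: time needed for one perfusion image (5 seconds).
seconds_per_online_image = 1 / ONLINE_FPS

# Offline analysis at full size: roughly 101 perfusion images in total.
frames_full = OFFLINE_FPS * DURATION_FULL_S

# Assumption: the limit is a fixed number of recorded pixels.
pixel_budget = frames_full * FULL_PIXELS

# The reduced image has 1/8 of the pixels, so the same budget lasts 8 times
# longer at the same frame rate: about 31 seconds, i.e. "half a minute".
duration_reduced_s = pixel_budget / (OFFLINE_FPS * REDUCED_PIXELS)
```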

Mining local staircase patterns in noisy data

Author: De Raedt Luc
Fierro Ana Carolina
Guns Tias
International workshop on Co-Clustering and Applications
Le Van Thanh
Marchal Kathleen
Nijssen Siegfried
van Leeuwen Matthijs
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Most traditional biclustering algorithms identify biclusters with little or no overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative of causal relationships between columns and cannot easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising.

Lirias

Crossref

Ghent University Academic Bibliography

DIAL UCLouvain

Helping Made Easy: Ease of Argument Generation Enhances Intentions to Help

Author: Gloudemans Renate T. M.
Greifeneder Rainer
Müller Barbara C. N.
Van Leeuwen Matthijs L.
van Someren Daniël H.
Publication venue: 'Hogrefe Publishing Group'
Publication date: 01/01/2017

Previous work has shown that self-generating arguments is more persuasive than reading arguments provided by others, particularly if self-generation feels easy. The present study replicates and extends these findings by providing evidence for fluency effects on behavioral intention in the realm of helping. In two studies, participants were instructed to either self-generate or read two versus ten arguments about why it is good to help. Subsequently, a confederate asked them for help. Results show that self-generating few arguments is more effective than generating many arguments. While this pattern reverses for reading arguments, easy self-generation is the most effective strategy compared to all other conditions. These results have important implications for fostering behavioral change in all areas of life.

Crossref

edoc

Radboud Repository (Radboud Univ.)

Explainable Contextual Anomaly Detection using Quantile Regression Forests

Author: Li Zhong
van Leeuwen Matthijs
Publication date: 28/04/2023

Traditional anomaly detection methods aim to identify objects that deviate from most other objects by treating all features equally. In contrast, contextual anomaly detection methods aim to detect objects that deviate from other objects within a context of similar objects by dividing the features into contextual features and behavioral features. In this paper, we develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods. Based on resulting insights, we propose a novel approach to inherently interpretable contextual anomaly detection that uses Quantile Regression Forests to model dependencies between features. Extensive experiments on various synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art anomaly detection methods in identifying contextual anomalies in terms of accuracy and interpretability. Comment: Manuscript submitted to Data Mining and Knowledge Discovery in October 2022 for possible publication. This is the revised version submitted in April 202

arXiv.org e-Print Archive

Leiden University Scholarly Publications
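
The distinction between contextual and traditional anomaly detection drawn in the abstract above can be shown on toy data. The sketch below does not use Quantile Regression Forests; it swaps in a much simpler technique, Tukey's interquartile-range fences computed per context group, purely to show how a value can be normal globally yet anomalous within its context. The function name and the data are made up for illustration.

```python
from collections import defaultdict
from statistics import quantiles


def contextual_anomalies(records, k=1.5):
    """Flag (context, value) pairs whose value lies outside the
    [Q1 - k*IQR, Q3 + k*IQR] band of their own context group."""
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    bands = {}
    for context, values in groups.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        bands[context] = (q1 - k * iqr, q3 + k * iqr)

    return [(c, v) for c, v in records if not bands[c][0] <= v <= bands[c][1]]


# Contextual feature: season. Behavioral feature: some measured quantity.
records = (
    [("winter", v) for v in [30, 31, 29, 32, 30, 31, 12]]
    + [("summer", v) for v in [10, 11, 12, 9, 10, 11, 12]]
)

# Using the season as context, the winter reading of 12 stands out.
contextual = contextual_anomalies(records)

# Treating all readings as one group, 12 is an ordinary value.
pooled = contextual_anomalies([("all", v) for _, v in records])
```

A regression-based model of the behavioral feature given the contextual features plays the role of the per-group band here when the context is continuous or high-dimensional.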

Truly Unordered Probabilistic Rule Sets for Multi-class Classification

Author: van Leeuwen Matthijs
Yang Lincen
Publication date: 18/07/2022

Rule set learning has long been studied and has recently been frequently revisited due to the need for interpretable models. Still, existing methods have several shortcomings: 1) most recent methods require a binary feature matrix as input, while learning rules directly from numeric variables is understudied; 2) existing methods impose orders among rules, either explicitly or implicitly, which harms interpretability; and 3) currently no method exists for learning probabilistic rule sets for multi-class target variables (there is only one for probabilistic rule lists). We propose TURS, for Truly Unordered Rule Sets, which addresses these shortcomings. We first formalize the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. We next develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule. Finally, we empirically demonstrate that, compared to non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only have better interpretability but also better predictive performance. Comment: Camera ready version for ECMLPKDD 2022, with Supplementary Material

arXiv.org e-Print Archive

Probabilistic Truly Unordered Rule Sets

Author: van Leeuwen Matthijs
Yang Lincen
Publication date: 18/01/2024

Rule set learning has recently been frequently revisited because of its interpretability. Existing methods have several shortcomings, though. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class targets is understudied, as most existing methods focus on binary classification or multi-class classification via the "one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the MDL principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we show that rules in our model are empirically "independent" and hence truly unordered. Comment: Submitted to JML

arXiv.org e-Print Archive