287 research outputs found

    Interpretable multiclass classification by MDL-based rule lists

    Get PDF
    Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally allows to trade-off model complexity with goodness of fit, by which overfitting and the need for hyperparameter tuning are effectively avoided. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion

    Local Subspace-Based Outlier Detection using Global Neighbourhoods

    Full text link
    Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often a mixture distribution of multiple components. We therefore introduce GLOSS, an algorithm that performs local subspace outlier detection using global neighbourhoods. Experiments on synthetic data demonstrate that GLOSS more accurately detects local outliers in mixed data than its competitors. Moreover, experiments on real-world data show that our approach identifies relevant outliers overlooked by existing methods, confirming that one should keep an eye on the global perspective even when doing local outlier detection.Comment: Short version accepted at IEEE BigData 201

    Twente Optical Perfusion Camera: system overview and performance for video rate laser Doppler perfusion imaging

    Get PDF
    We present the Twente Optical Perfusion Camera (TOPCam), a novel laser Doppler Perfusion Imager based on CMOS technology. The tissue under investigation is illuminated and the resulting dynamic speckle pattern is recorded with a high speed CMOS camera. Based on an overall analysis of the signal-to-noise ratio of CMOS cameras, we have selected the camera which best fits our requirements. We applied a pixel-by-pixel noise correction to minimize the influence of noise in the perfusion images. We can achieve a frame rate of 0.2 fps for a perfusion image of 128×128 pixels (imaged tissue area of 7×7 cm2) if the data is analyzed online. If the analysis of the data is performed offline, we can achieve a frame rate of 26 fps for a duration of 3.9 seconds. By reducing the imaging size to 128×16 pixels, this frame rate can be achieved for up to half a minute. We show the fast imaging capabilities of the system in order of increasing perfusion frame rate. First the increase of skin perfusion after application of capsicum cream, and the perfusion during an occlusion-reperfusion procedure at the fastest frame rate allowed with online analysis is shown. With the highest frame rate allowed with offline analysis, the skin perfusion revealing the heart beat and the perfusion during an occlusion-reperfusion procedure is presented. Hence we have achieved video rate laser Doppler perfusion imaging

    Mining local staircase patterns in noisy data

    Get PDF
    Most traditional biclustering algorithms identify biclusters with no or little overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative for causal relationships between columns and can not easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising

    Helping Made Easy: Ease of Argument Generation Enhances Intentions to Help

    Get PDF
    Previous work has shown that self-generating arguments is more persuasive than reading arguments provided by others, particularly if self-generation feels easy. The present study replicates and extends these findings by providing evidence for fluency effects on behavioral intention in the realm of helping. In two studies, participants were instructed to either self-generate or read two versus ten arguments about why it is good to help. Subsequently, a confederate asked them for help. Results show that self-generating few arguments is more effective than generating many arguments. While this pattern reverses for reading arguments, easy self-generation is the most effective strategy compared to all other conditions. These results have important implications for fostering behavioral change in all areas of life

    Explainable Contextual Anomaly Detection using Quantile Regression Forests

    Get PDF
    Traditional anomaly detection methods aim to identify objects that deviate from most other objects by treating all features equally. In contrast, contextual anomaly detection methods aim to detect objects that deviate from other objects within a context of similar objects by dividing the features into contextual features and behavioral features. In this paper, we develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods. Based on resulting insights, we propose a novel approach to inherently interpretable contextual anomaly detection that uses Quantile Regression Forests to model dependencies between features. Extensive experiments on various synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art anomaly detection methods in identifying contextual anomalies in terms of accuracy and interpretability.Comment: Manuscript submitted to Data Mining and Knowledge Discovery in October 2022 for possible publication. This is the revised version submitted in April 202

    Truly Unordered Probabilistic Rule Sets for Multi-class Classification

    Full text link
    Rule set learning has long been studied and has recently been frequently revisited due to the need for interpretable models. Still, existing methods have several shortcomings: 1) most recent methods require a binary feature matrix as input, while learning rules directly from numeric variables is understudied; 2) existing methods impose orders among rules, either explicitly or implicitly, which harms interpretability; and 3) currently no method exists for learning probabilistic rule sets for multi-class target variables (there is only one for probabilistic rule lists). We propose TURS, for Truly Unordered Rule Sets, which addresses these shortcomings. We first formalize the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. We next develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule. Finally, we empirically demonstrate that, compared to non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only have better interpretability but also better predictive performance.Comment: Camera ready version for ECMLPKDD 2022, with Supplementary Material

    Probabilistic Truly Unordered Rule Sets

    Full text link
    Rule set learning has recently been frequently revisited because of its interpretability. Existing methods have several shortcomings though. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class target is understudied, as most existing methods focus on binary classification or multi-class classification via the ``one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the MDL principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we empirically show that rules in our model are empirically ``independent" and hence truly unordered.Comment: Submitted to JML
    corecore