
    XAI-TRIS: Non-linear benchmarks to quantify ML explanation performance

    The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear as to what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different; thus, prone to misinterpretation even under controlled conditions. Comment: Under review

    Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables

    In recent years, the community of 'explainable artificial intelligence' (XAI) has created a vast body of methods to bridge a perceived gap between model 'complexity' and 'interpretability'. However, a concrete problem to be solved by XAI methods has not yet been formally stated. As a result, XAI methods are lacking theoretical and empirical evidence for the 'correctness' of their explanations, limiting their potential use for quality-control and transparency purposes. At the same time, Haufe et al. (2014) showed, using simple toy examples, that even standard interpretations of linear models can be highly misleading. Specifically, high importance may be attributed to so-called suppressor variables lacking any statistical relation to the prediction target. This behavior has been confirmed empirically for a large array of XAI methods in Wilming et al. (2022). Here, we go one step further by deriving analytical expressions for the behavior of a variety of popular XAI methods on a simple two-dimensional binary classification problem involving Gaussian class-conditional distributions. We show that the majority of the studied approaches will attribute non-zero importance to a non-class-related suppressor feature in the presence of correlated noise. This poses important limitations on the interpretations and conclusions that the outputs of these XAI methods can afford. Comment: Accepted at ICML 202
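The suppressor effect described in the abstract above can be illustrated with a small sketch. This is not the authors' derivation: it assumes two Gaussian classes with means at (+a, 0) and (-a, 0) and a shared covariance whose off-diagonal term models correlated noise, and it uses the linear discriminant weights as a stand-in for an 'importance' score. The function names and parameter values are hypothetical.

```python
import numpy as np


def lda_weights(mean_pos, mean_neg, cov):
    """Linear discriminant weights for two Gaussians sharing covariance `cov`.

    Solves cov @ w = (mean_pos - mean_neg), i.e. w = cov^{-1} (mean difference).
    """
    diff = np.asarray(mean_pos, dtype=float) - np.asarray(mean_neg, dtype=float)
    return np.linalg.solve(np.asarray(cov, dtype=float), diff)


def suppressor_demo(a=1.0, s1=1.0, s2=1.0, rho=0.8):
    """Weights for a 2-D problem where only feature 1 carries class information.

    Assumed setup (illustrative only): class means are (+a, 0) and (-a, 0),
    so feature 2 has the same mean in both classes; `rho` is the noise
    correlation between the two features.
    """
    mean_pos = np.array([a, 0.0])
    mean_neg = np.array([-a, 0.0])
    cov = np.array([
        [s1 ** 2, rho * s1 * s2],
        [rho * s1 * s2, s2 ** 2],
    ])
    return lda_weights(mean_pos, mean_neg, cov)


w_correlated = suppressor_demo(rho=0.8)    # noise shared between features
w_uncorrelated = suppressor_demo(rho=0.0)  # independent noise

print("correlated noise:  ", w_correlated)
print("uncorrelated noise:", w_uncorrelated)
```

Under these assumptions, feature 2 receives a non-zero weight only when the noise is correlated: the model uses it to cancel noise in feature 1, although its mean does not differ between the classes.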

    The early evolution of the star cluster mass function

    Several recent studies have shown that the star cluster initial mass function (CIMF) can be well approximated by a power law, with indications for a steepening or truncation at high masses. This contribution considers the evolution of such a mass function due to cluster disruption, with emphasis on the part of the mass function that is observable in the first ~Gyr. A Schechter type function is used for the CIMF, with a power law index of -2 at low masses and an exponential truncation at M*. Cluster disruption due to the tidal field of the host galaxy and encounters with giant molecular clouds flattens the low-mass end of the mass function, but there is always a part of the 'evolved Schechter function' that can be approximated by a power law with index -2. The mass range for which this holds depends on age, t, and shifts to higher masses roughly as t^0.6. Mean cluster masses derived from luminosity limited samples increase with age very similarly due to the evolutionary fading of clusters. Empirical mass functions are, therefore, approximately power laws with index -2, or slightly steeper, at all ages. The results are illustrated by an application to the star cluster population of the interacting galaxy M51, which can be well described by a model with M*=(1.9+/-0.5)x10^5 M_sun and a short (mass-dependent) disruption time destroying M* clusters in roughly a Gyr. Comment: 15 pages, 6 figures, accepted for MNRA
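For reference, the Schechter-type CIMF described in the abstract above can be written out as follows. The normalisation constant $A$ and the symbol $M_{\rm pl}$ for the mass around which the evolved function still follows the power law are notational assumptions, not taken from the source.

```latex
% Schechter-type cluster initial mass function:
% power-law index -2 at low masses, exponential truncation at M_*
\frac{\mathrm{d}N}{\mathrm{d}M} = A \, M^{-2} \exp\!\left(-\frac{M}{M_*}\right)

% Mass range over which the evolved function remains close to a
% power law with index -2 shifts to higher masses with age t as
M_{\rm pl}(t) \propto t^{0.6}
```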

    Measurement of the inclusive and dijet cross-sections of b-jets in pp collisions at sqrt(s) = 7 TeV with the ATLAS detector

    The inclusive and dijet production cross-sections have been measured for jets containing b-hadrons (b-jets) in proton-proton collisions at a centre-of-mass energy of sqrt(s) = 7 TeV, using the ATLAS detector at the LHC. The measurements use data corresponding to an integrated luminosity of 34 pb^-1. The b-jets are identified using either a lifetime-based method, where secondary decay vertices of b-hadrons in jets are reconstructed using information from the tracking detectors, or a muon-based method where the presence of a muon is used to identify semileptonic decays of b-hadrons inside jets. The inclusive b-jet cross-section is measured as a function of transverse momentum in the range 20 < pT < 400 GeV and rapidity in the range |y| < 2.1. The bbbar-dijet cross-section is measured as a function of the dijet invariant mass in the range 110 < m_jj < 760 GeV, the azimuthal angle difference between the two jets and the angular variable chi in two dijet mass regions. The results are compared with next-to-leading-order QCD predictions. Good agreement is observed between the measured cross-sections and the predictions obtained using POWHEG + Pythia. MC@NLO + Herwig shows good agreement with the measured bbbar-dijet cross-section. However, it does not reproduce the measured inclusive cross-section well, particularly for central b-jets with large transverse momenta. Comment: 10 pages plus author list (21 pages total), 8 figures, 1 table, final version published in European Physical Journal

    Measurement of the top quark-pair production cross section with ATLAS in pp collisions at $\sqrt{s} = 7$ TeV

    A measurement of the production cross-section for top quark pairs ($t\bar{t}$) in $pp$ collisions at $\sqrt{s} = 7$ TeV is presented using data recorded with the ATLAS detector at the Large Hadron Collider. Events are selected in two different topologies: single lepton (electron $e$ or muon $\mu$) with large missing transverse energy and at least four jets, and dilepton ($ee$, $\mu\mu$ or $e\mu$) with large missing transverse energy and at least two jets. In a data sample of 2.9 pb$^{-1}$, 37 candidate events are observed in the single-lepton topology and 9 events in the dilepton topology. The corresponding expected backgrounds from non-$t\bar{t}$ Standard Model processes are estimated using data-driven methods and determined to be $12.2 \pm 3.9$ events and $2.5 \pm 0.6$ events, respectively. The kinematic properties of the selected events are consistent with SM $t\bar{t}$ production. The inclusive top quark pair production cross-section is measured to be $\sigma_{t\bar{t}} = 145 \pm 31\,^{+42}_{-27}$ pb, where the first uncertainty is statistical and the second systematic. The measurement agrees with perturbative QCD calculations. Comment: 30 pages plus author list (50 pages total), 9 figures, 11 tables, CERN-PH number and final journal added

    Inclusive search for same-sign dilepton signatures in pp collisions at sqrt(s) = 7 TeV with the ATLAS detector

    An inclusive search is presented for new physics in events with two isolated leptons (e or mu) having the same electric charge. The data are selected from events collected from pp collisions at sqrt(s) = 7 TeV by the ATLAS detector and correspond to an integrated luminosity of 34 pb^-1. The spectra in dilepton invariant mass, missing transverse momentum and jet multiplicity are presented and compared to Standard Model predictions. In this event sample, no evidence is found for contributions beyond those of the Standard Model. Limits are set on the cross-section in a fiducial region for new sources of same-sign high-mass dilepton events in the ee, e mu and mu mu channels. Four models predicting same-sign dilepton signals are constrained: two descriptions of Majorana neutrinos, a cascade topology similar to supersymmetry or universal extra dimensions, and fourth generation d-type quarks. Assuming a new physics scale of 1 TeV, Majorana neutrinos produced by an effective operator V with masses below 460 GeV are excluded at 95% confidence level. A lower limit of 290 GeV is set at 95% confidence level on the mass of fourth generation d-type quarks.

    Standalone vertex finding in the ATLAS muon spectrometer

    A dedicated reconstruction algorithm to find decay vertices in the ATLAS muon spectrometer is presented. The algorithm searches the region just upstream of or inside the muon spectrometer volume for multi-particle vertices that originate from the decay of particles with long decay paths. The performance of the algorithm is evaluated using both a sample of simulated Higgs boson events, in which the Higgs boson decays to long-lived neutral particles that in turn decay to bbbar final states, and pp collision data at √s = 7 TeV collected with the ATLAS detector at the LHC during 2011.

    Measurement of D*+/- meson production in jets from pp collisions at sqrt(s) = 7 TeV with the ATLAS detector

    This paper reports a measurement of D*+/- meson production in jets from proton-proton collisions at a center-of-mass energy of sqrt(s) = 7 TeV at the CERN Large Hadron Collider. The measurement is based on a data sample recorded with the ATLAS detector with an integrated luminosity of 0.30 pb^-1 for jets with transverse momentum between 25 and 70 GeV in the pseudorapidity range |eta| < 2.5. D*+/- mesons found in jets are fully reconstructed in the decay chain: D*+ -> D0pi+, D0 -> K-pi+, and its charge conjugate. The production rate is found to be N(D*+/-)/N(jet) = 0.025 +/- 0.001(stat.) +/- 0.004(syst.) for D*+/- mesons that carry a fraction z of the jet momentum in the range 0.3 < z < 1. Monte Carlo predictions fail to describe the data at small values of z, and this is most marked at low jet transverse momentum. Comment: 10 pages plus author list (22 pages total), 5 figures, 1 table, matches published version in Physical Review

    Measurement of inclusive two-particle angular correlations in pp collisions with the ATLAS detector at the LHC

    We present a measurement of two-particle angular correlations in proton-proton collisions at √s = 900 GeV and 7 TeV. The collision events were collected during 2009 and 2010 with the ATLAS detector at the Large Hadron Collider using a single-arm minimum bias trigger. Correlations are measured for charged particles produced in the kinematic range of transverse momentum pT > 100 MeV and pseudorapidity |η| < 2.5. A complex structure in pseudorapidity and azimuth is observed at both collision energies. Results are compared to Pythia 8 and Herwig++ as well as to the AMBT2B, DW and Perugia 2011 tunes of Pythia 6. The data are not satisfactorily described by any of these models.

    GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations

    Large pre-trained language models have become popular for many applications and form an important backbone of many downstream tasks in natural language processing (NLP). Applying 'explainable artificial intelligence' (XAI) techniques to enrich such models' outputs is considered crucial for assuring their quality and shedding light on their inner workings. However, large language models are trained on a plethora of data containing a variety of biases, such as gender biases, affecting model weights and, potentially, behavior. Currently, it is unclear to what extent such biases also impact model explanations in possibly unfavorable ways. We create a gender-controlled text dataset, GECO, in which otherwise identical sentences appear in male and female forms. This gives rise to ground-truth 'world explanations' for gender classification tasks, enabling the objective evaluation of the correctness of XAI methods. We also provide GECOBench, a rigorous quantitative evaluation framework benchmarking popular XAI methods, applying them to pre-trained language models fine-tuned to different degrees. This allows us to investigate how pre-training induces undesirable bias in model explanations and to what extent fine-tuning can mitigate such explanation bias. We show a clear dependency between explanation performance and the number of fine-tuned layers, where XAI methods are observed to particularly benefit from fine-tuning or complete retraining of embedding layers. Remarkably, this relationship holds for models achieving similar classification performance on the same task. With that, we highlight the utility of the proposed gender-controlled dataset and novel benchmarking approach for research and development of novel XAI methods. All code including dataset generation, model training, evaluation and visualization is available at: https://github.com/braindatalab/gecobench Comment: Under review