
    Gravity as a Gauge Theory on Three-Dimensional Noncommutative spaces

    We plan to translate the successful description of three-dimensional gravity as a gauge theory into the noncommutative framework, making use of covariant coordinates. We consider two specific three-dimensional fuzzy spaces, based on SU(2) and SU(1,1), which carry appropriate symmetry groups. These are the groups we gauge in order to obtain the transformations of the gauge fields (dreibein, spin connection, and two extra Maxwell fields due to noncommutativity) and their corresponding curvatures, and eventually to determine the action and the equations of motion. Finally, we verify their connection to three-dimensional gravity.
    Comment: arXiv admin note: text overlap with arXiv:1802.0755

    Solitons and giants in matrix models

    We present a method for solving BPS equations obtained in the collective-field approach to matrix models. The method enables us to find BPS solutions, and quantum excitations around them, in the one-matrix model and, more generally, in the Calogero model. These semiclassical solutions correspond to giant gravitons described by matrix models obtained in the framework of the AdS/CFT correspondence. The two-field model, associated with two types of giant gravitons, is investigated. In this duality-based matrix model we find the finite form of the n-soliton solution. The singular limit of this solution is examined and a realization of open-closed string duality is proposed.
    Comment: 17 pages, JHEP cls; v2: final version to appear in JHEP, 2 references added, physical motivation and interpretation clarified

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling a biological phenotype in terms of a classification or regression model. Due to resampling protocols, or simply within a meta-analysis comparison, it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained instead of a single list. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
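The core computation can be sketched as a distance between the rankings induced by two partial lists over the full feature set. A minimal sketch in Python, assuming (as one common convention, not necessarily the exact completion scheme of the method above) that features absent from a partial list share the mean of the leftover rank positions, combined with a Canberra-type distance on the completed rankings:

```python
def canberra_rank_distance(list_a, list_b, all_features):
    """Canberra-type distance between the rankings induced by two partial
    lists. Unranked features are assigned the mean of the remaining rank
    positions (an illustrative tie-handling convention)."""
    def ranks(partial):
        r = {f: i + 1 for i, f in enumerate(partial)}
        # mean of the ranks left over for the unlisted features
        tail = (len(partial) + 1 + len(all_features)) / 2.0
        return {f: r.get(f, tail) for f in all_features}

    ra, rb = ranks(list_a), ranks(list_b)
    return sum(abs(ra[f] - rb[f]) / (ra[f] + rb[f]) for f in all_features)

# Identical partial lists are at distance zero; disjoint lists are not.
d_same = canberra_rank_distance(["g1", "g2"], ["g1", "g2"], ["g1", "g2", "g3"])
d_diff = canberra_rank_distance(["g1"], ["g2"], ["g1", "g2"])
```

Averaging such pairwise distances over all list pairs produced by a resampling protocol gives one number summarizing list stability.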

    Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms

    Motivation: Reconstructing the topology of a gene regulatory network is one of the key tasks in systems biology. Despite the wide variety of proposed methods, very little work has been dedicated to assessing their stability properties. Here we present a methodical comparison of the performance of a novel method (RegnANN) for gene network inference based on multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER), focussing our analysis on the prediction variability induced by both the intrinsic structure of the network and the available data. Results: The extensive evaluation on both synthetic data and a selection of gene modules of "Escherichia coli" indicates that all the algorithms suffer from instability and variability issues with regard to the reconstruction of the network topology. This instability makes it objectively very hard to establish which method performs best. Nevertheless, RegnANN shows MCC scores that compare very favorably with all the other inference methods tested. Availability: The software for the RegnANN inference algorithm is distributed under GPL3 and is available at the corresponding author's home page (http://mpba.fbk.eu/grimaldi/regnann-supmat).

    The Venus score for the assessment of the quality and trustworthiness of biomedical datasets

    Biomedical datasets are the mainstays of computational biology and health informatics projects, and can be found on multiple data platforms online or obtained from wet-lab biologists and physicians. The quality and the trustworthiness of these datasets, however, can sometimes be poor, producing bad results in turn, which can harm patients and data subjects. To address this problem, policy-makers, researchers, and consortia have proposed diverse regulations, guidelines, and scores to assess the quality and increase the reliability of datasets. Although generally useful, these are often incomplete and impractical. The guidelines of Datasheets for Datasets, in particular, are too numerous; the requirements of the Kaggle Dataset Usability Score focus on non-scientific requisites (for example, including a cover image); and the European Union Artificial Intelligence Act (EU AI Act) sets forth sparse and general data governance requirements, which we tailored to datasets for biomedical AI. Against this backdrop, we introduce our new Venus score to assess the data quality and trustworthiness of biomedical datasets. Our score ranges from 0 to 10 and consists of ten questions that anyone developing a bioinformatics, medical informatics, or cheminformatics dataset should answer before release. In this study, we first describe the EU AI Act, Datasheets for Datasets, and the Kaggle Dataset Usability Score, presenting their requirements and their drawbacks. To do so, we reverse-engineer the weights of the influential Kaggle Score for the first time and report them in this study. We distill the most important data governance requirements into ten questions tailored to the biomedical domain, comprising the Venus score. We apply the Venus score to twelve datasets from multiple subdomains, including electronic health records, medical imaging, microarray and bulk RNA-seq gene expression, cheminformatics, physiologic electrogram signals, and medical text.
    Analyzing the results, we surface fine-grained strengths and weaknesses of popular datasets, as well as aggregate trends. Most notably, we find a widespread tendency to gloss over sources of data inaccuracy and noise, which may hinder the reliable exploitation of data and, consequently, research results. Overall, our results confirm the applicability and utility of the Venus score to assess the trustworthiness of biomedical data.
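Mechanically, a ten-question checklist producing an integer score from 0 to 10 can be sketched as follows. The question texts below are illustrative placeholders, not the actual Venus questions, which are defined in the study:

```python
# Hypothetical checklist: one point per affirmative answer, giving a 0-10 score.
# The real Venus questions come from the paper; these are placeholders.
CHECKLIST = [
    "Is the data provenance documented?",
    "Are the licensing and usage terms stated?",
    "Is the collection protocol described?",
    "Are known sources of inaccuracy and noise reported?",
    "Is the dataset versioned?",
    "Are the variables and units defined?",
    "Is missing-data handling documented?",
    "Are privacy and consent requirements addressed?",
    "Is a contact or maintainer listed?",
    "Are recommended evaluation splits provided?",
]

def checklist_score(answers):
    """answers: dict mapping question text -> bool; returns an int in 0..10."""
    return sum(1 for q in CHECKLIST if answers.get(q, False))

perfect = checklist_score({q: True for q in CHECKLIST})   # 10
empty = checklist_score({})                               # 0
```

The unweighted sum is an assumption here; it contrasts with the weighted Kaggle Usability Score discussed above.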

    The Benefits of the Matthews Correlation Coefficient (MCC) Over the Diagnostic Odds Ratio (DOR) in Binary Classification Assessment

    To assess the quality of a binary classification, researchers often take advantage of a four-entry contingency table called the confusion matrix, containing true positives, true negatives, false positives, and false negatives. To recap the four values of a confusion matrix in a single score, researchers and statisticians have developed several rates and metrics. In the past, several scientific studies have already shown why the Matthews correlation coefficient (MCC) is more informative and trustworthy than confusion-entropy error, accuracy, F1 score, bookmaker informedness, markedness, and balanced accuracy. In this study, we compare the MCC with the diagnostic odds ratio (DOR), a statistical rate sometimes employed in the biomedical sciences. After examining the properties of the MCC and of the DOR, we describe the relationships between them, also taking advantage of an innovative geometrical plot called the confusion tetrahedron, presented here for the first time. We then report some use cases where the MCC and the DOR produce discordant outcomes, and explain why the Matthews correlation coefficient is the more informative and reliable of the two. Our results can have a strong impact on computer science and statistics, because they clearly explain why the trustworthiness of the information provided by the Matthews correlation coefficient is higher than that generated by the diagnostic odds ratio.
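Both scores are simple functions of the four confusion-matrix entries, so their standard definitions can be computed side by side. A minimal sketch (the example counts are illustrative, not taken from the study):

```python
import math

def confusion_scores(tp, tn, fp, fn):
    """Compute the MCC and the DOR from the four confusion-matrix entries.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), in [-1, 1]
    DOR = (TP*TN) / (FP*FN), in [0, inf)
    """
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    dor = (tp * tn) / (fp * fn) if fp * fn else float("inf")
    return mcc, dor

# Illustrative imbalanced case: many negatives, few positives.
mcc, dor = confusion_scores(tp=5, tn=90, fp=1, fn=4)
```

Note that the DOR diverges whenever FP or FN is zero, whereas the MCC stays bounded in [-1, 1]; this boundedness is one reason the MCC is easier to compare across datasets.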

    What Goes Around Comes Around: Learning Sentiments in Online Medical Forums

    Currently 19%-28% of Internet users participate in online health discussions. A 2011 survey of the US population estimated that 59% of all adults have looked online for information about health topics such as a specific disease or treatment. Although empirical evidence strongly supports the importance of emotions in health-related messages, there are few studies of the relationship between subjective language and online discussions of personal health. In this work, we study sentiments expressed on online medical forums. As well as considering the predominant sentiments expressed in individual posts, we analyze sequences of sentiments in online discussions. Individual posts are classified into one of five categories. We identified three categories as sentimental (encouragement, gratitude, confusion) and two categories as neutral (facts, endorsement). 1438 messages from 130 threads were annotated manually by two annotators with a strong inter-annotator agreement (Fleiss kappa = 0.737 and 0.763 for posts in sequence and separate posts, respectively). The annotated posts were used to analyse sentiments in consecutive posts. In four multi-class classification problems, we assessed HealthAffect, a domain-specific affective lexicon, as well as general sentiment lexicons, in their ability to represent messages in sentiment recognition.
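The agreement figures above are Fleiss' kappa values. A minimal self-contained sketch of the standard Fleiss kappa computation (the rating table below is a toy example, not the study's annotation data):

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    assigning item i to category j (every item rated by the same number
    of raters)."""
    n_items = len(table)
    n_raters = sum(table[0])
    # marginal proportion of assignments per category
    p_j = [sum(row[j] for row in table) / (n_items * n_raters)
           for j in range(len(table[0]))]
    # per-item observed agreement
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    P_bar = sum(P_i) / n_items            # mean observed agreement
    P_e = sum(p * p for p in p_j)         # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: two raters, five categories, three items,
# agreement on the first two items and disagreement on the third.
k = fleiss_kappa([[2, 0, 0, 0, 0],
                  [0, 2, 0, 0, 0],
                  [1, 1, 0, 0, 0]])
```

With two raters, kappa values above roughly 0.7, as reported above, are conventionally read as substantial agreement.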

    Homolumo Gap and Matrix Model

    We discuss a dynamical matrix model whose probability distribution is associated with the Gaussian ensembles of random matrix theory. We interpret the matrix M as a Hamiltonian representing the interaction of a bosonic system with a single fermion. We show that a system of second-quantized fermions influences the ground state of the whole system by producing a gap between the highest occupied eigenvalue and the lowest unoccupied eigenvalue.
    Comment: 8 pages, 2 figures

    Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

    MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; a subset of them then underwent alterations in regulatory mechanisms so as to mimic the disease state. RESULTS: The simulated data allowed us to outline advantages and drawbacks of the different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by the different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.