    Optimism in Active Learning with Gaussian Processes

    In the context of Active Learning for classification, the classification error depends on the joint distribution of samples and their labels, which is initially unknown. Minimizing this error requires estimating this distribution. Online estimation of this distribution involves a trade-off between exploration and exploitation. This is a common problem in machine learning, for which multi-armed bandit theory, building upon Optimism in the Face of Uncertainty, has proven very effective in recent years. We introduce two novel algorithms that use Optimism in the Face of Uncertainty along with Gaussian Processes for the Active Learning problem. An evaluation on real-world datasets shows that these new algorithms compare favorably to state-of-the-art methods.

    Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

    A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring the diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments. Comment: Proceedings of the 39th European Conference on Information Retrieval (ECIR 2017)

    T$^2$K$^2$: The Twitter Top-K Keywords Benchmark

    Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are commonly used for various purposes and are often computed on the fly, so they must be computed efficiently. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T$^2$K$^2$, which features a real tweet dataset and queries with various complexities and selectivities. T$^2$K$^2$ helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T$^2$K$^2$'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
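As a rough illustration of what a top-k keywords computation under a TF-IDF weighting scheme can look like, here is a minimal Python sketch. It is not the benchmark's own queries or schema: the function name `top_k_keywords`, the whitespace tokenizer, the length-normalized term frequency, the natural-log IDF, and the sample tweets are all assumptions made for the example.

```python
import math
from collections import Counter


def top_k_keywords(docs, k):
    """Return the k (term, score) pairs with the highest summed TF-IDF weight.

    Hypothetical helper: one common TF-IDF variant, chosen for illustration.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)            # length-normalized term frequency
            idf = math.log(n_docs / df[term])   # zero for terms found in every document
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up sample data, not taken from the benchmark's tweet dataset.
tweets = [
    "benchmark the database",
    "benchmark the query",
    "the database index",
]
top_two = top_k_keywords(tweets, 2)
```

In this sketch a term that occurs in every document (here "the") gets weight zero, which is the usual motivation for the IDF factor. Okapi BM25 would replace the `tf * idf` product with a saturating, length-normalized term score.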

    Prevalence of Disorders Recorded in Dogs Attending Primary-Care Veterinary Practices in England

    Purebred dog health is thought to be compromised by an increasing occurrence of inherited diseases but inadequate prevalence data on common disorders have hampered efforts to prioritise health reforms. Analysis of primary veterinary practice clinical data has been proposed for reliable estimation of disorder prevalence in dogs. Electronic patient record (EPR) data were collected on 148,741 dogs attending 93 clinics across central and south-eastern England. Analysis in detail of a random sample of EPRs relating to 3,884 dogs from 89 clinics identified the most frequently recorded disorders as otitis externa (prevalence 10.2%, 95% CI: 9.1-11.3), periodontal disease (9.3%, 95% CI: 8.3-10.3) and anal sac impaction (7.1%, 95% CI: 6.1-8.1). Using syndromic classification, the most prevalent body location affected was the head-and-neck (32.8%, 95% CI: 30.7-34.9), the most prevalent organ system affected was the integument (36.3%, 95% CI: 33.9-38.6) and the most prevalent pathophysiologic process diagnosed was inflammation (32.1%, 95% CI: 29.8-34.3). Among the twenty most-frequently recorded disorders, purebred dogs had a significantly higher prevalence compared with crossbreds for three: otitis externa (P = 0.001), obesity (P = 0.006) and skin mass lesion (P = 0.033), and popular breeds differed significantly from each other in their prevalence for five: periodontal disease (P = 0.002), overgrown nails (P = 0.004), degenerative joint disease (P = 0.005), obesity (P = 0.001) and lipoma (P = 0.003). These results fill a crucial data gap in disorder prevalence information and assist with disorder prioritisation. The results suggest that, for maximal impact, breeding reforms should target commonly-diagnosed complex disorders that are amenable to genetic improvement and should place special focus on at-risk breeds. Future studies evaluating disorder severity and duration will augment the usefulness of the disorder prevalence information reported herein.

    A Compromise between Neutrino Masses and Collider Signatures in the Type-II Seesaw Model

    A natural extension of the standard $SU(2)_{\rm L} \times U(1)_{\rm Y}$ gauge model to accommodate massive neutrinos is to introduce one Higgs triplet and three right-handed Majorana neutrinos, leading to a $6\times 6$ neutrino mass matrix which contains three $3\times 3$ sub-matrices $M_{\rm L}$, $M_{\rm D}$ and $M_{\rm R}$. We show that three light Majorana neutrinos (i.e., the mass eigenstates of $\nu_e$, $\nu_\mu$ and $\nu_\tau$) are exactly massless in this model, if and only if $M_{\rm L} = M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T$ exactly holds. This no-go theorem implies that small but non-vanishing neutrino masses may result from a significant but incomplete cancellation between $M_{\rm L}$ and $M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T$ terms in the Type-II seesaw formula, provided three right-handed Majorana neutrinos are of ${\cal O}(1)$ TeV and experimentally detectable at the LHC. We propose three simple Type-II seesaw scenarios with the $A_4 \times U(1)_{\rm X}$ flavor symmetry to interpret the observed neutrino mass spectrum and neutrino mixing pattern. Such a TeV-scale neutrino model can be tested in two complementary ways: (1) searching for possible collider signatures of lepton number violation induced by the right-handed Majorana neutrinos and doubly-charged Higgs particles; and (2) searching for possible consequences of unitarity violation of the $3\times 3$ neutrino mixing matrix in the future long-baseline neutrino oscillation experiments. Comment: RevTeX 19 pages, no figure
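For orientation, the block structure and cancellation described in this abstract can be written out as follows. This is a sketch in the commonly used leading-order form, not taken from the paper itself: the symbol $M_\nu$ for the effective light-neutrino mass matrix and the placement of $M_{\rm D}$ versus $M_{\rm D}^T$ in the off-diagonal blocks are assumed conventions.

```latex
% 6x6 neutrino mass matrix built from the three 3x3 blocks (assumed convention)
\mathcal{M} =
\begin{pmatrix}
  M_{\rm L}   & M_{\rm D} \\
  M_{\rm D}^T & M_{\rm R}
\end{pmatrix},
\qquad
% leading-order Type-II seesaw formula for the light neutrinos
M_\nu \approx M_{\rm L} - M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T .
```

In this form, $M_\nu$ vanishes when $M_{\rm L} = M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T$, and a near but incomplete cancellation between the two terms leaves small non-zero masses, as the abstract describes.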

    One-carbon metabolism in cancer

    Cells require one-carbon units for nucleotide synthesis, methylation and reductive metabolism, and these pathways support the high proliferative rate of cancer cells. As such, anti-folates, drugs that target one-carbon metabolism, have long been used in the treatment of cancer. Amino acids, such as serine, are a major one-carbon source, and cancer cells are particularly susceptible to deprivation of one-carbon units by serine restriction or inhibition of de novo serine synthesis. Recent work has also begun to decipher the specific pathways and sub-cellular compartments that are important for one-carbon metabolism in cancer cells. In this review we summarise the historical understanding of one-carbon metabolism in cancer, describe the recent findings regarding the generation and usage of one-carbon units and explore possible future therapeutics that could exploit the dependency of cancer cells on one-carbon metabolism.

    The Search for Invariance: Repeated Positive Testing Serves the Goals of Causal Learning

    Positive testing is characteristic of exploratory behavior, yet it seems to be at odds with the aim of information seeking. After all, repeated demonstrations of one's current hypothesis often produce the same evidence and fail to distinguish it from potential alternatives. Research on the development of scientific reasoning and research on adult rule learning have both documented and attempted to explain this behavior. The current chapter reviews this prior work and introduces a novel theoretical account, the Search for Invariance (SI) hypothesis, which suggests that producing multiple positive examples serves the goals of causal learning. This hypothesis draws on the interventionist framework of causal reasoning, which suggests that causal learners are concerned with the invariance of candidate hypotheses. In a probabilistic and interdependent causal world, our primary goal is to determine whether, and in what contexts, our causal hypotheses provide accurate foundations for inference and intervention, not to disconfirm their alternatives. By recognizing the central role of invariance in causal learning, the phenomenon of positive testing may be reinterpreted as a rational information-seeking strategy.

    The epidemiology of osteonecrosis: findings from the GPRD and THIN databases in the UK

    Summary: We conducted a case–control study to examine osteonecrosis (ON) incidence, patient characteristics, and selected potential risk factors using two health record databases in the UK. Statistically significant risk factors for ON included systemic corticosteroid use, hospitalization, referral or specialist visit, bone fracture, any cancer, osteoporosis, connective tissue disease, and osteoarthritis. Introduction: The purpose of this case–control study was to examine the incidence of osteonecrosis (ON), patient characteristics, and selected potential risk factors for ON using two health record databases in the UK: the General Practice Research Database and The Health Improvement Network. Methods: ON cases (n = 792) were identified from 1989 to 2003 and individually matched (on age, sex, and medical practice) to up to six controls (n = 4,660) with no record of ON. Possible risk factors were considered for inclusion based on a review of published literature. Annual incidence rates were computed, and a multivariable logistic regression model was derived to evaluate selected risk factors. Results: ON of the hip represented the majority of cases (75.9%). Statistically significant risk factors for ON were systemic corticosteroid use in the previous 2 years, hospitalization, referral or specialist visit, bone fracture, any cancer, osteoporosis, connective tissue disease, and osteoarthritis within the past 5 years. Only 4.4% of ON cases were exposed to bisphosphonates within the previous 2 years. Conclusions: This study provides further perspective on the descriptive epidemiology of ON. Studies utilizing more recent data may further elucidate the key predictors of ON.