Optimism in Active Learning with Gaussian Processes
In the context of Active Learning for classification, the classification error depends on the joint distribution of samples and their labels, which is initially unknown. Minimizing this error requires estimating this distribution. Online estimation of this distribution involves a trade-off between exploration and exploitation. This is a common problem in machine learning, for which multi-armed bandit theory, building upon Optimism in the Face of Uncertainty, has proven very effective in recent years. We introduce two novel algorithms that use Optimism in the Face of Uncertainty along with Gaussian Processes for the Active Learning problem. An evaluation conducted on real-world datasets shows that these new algorithms compare favorably with state-of-the-art methods.
Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity
A high degree of topical diversity is often considered to be an important
characteristic of interesting text documents. A recent proposal for measuring
topical diversity identifies three elements for assessing diversity: words,
topics, and documents as collections of words. Topic models play a central role
in this approach. Using standard topic models for measuring diversity of
documents is suboptimal due to generality and impurity. General topics only
include common information from a background corpus and are assigned to most of
the documents in the collection. Impure topics contain words that are not
related to the topic; impurity lowers the interpretability of topic models and
impure topics are likely to get assigned to documents erroneously. We propose a
hierarchical re-estimation approach for topic models to combat generality and
impurity; the proposed approach operates at three levels: words, topics, and
documents. Our re-estimation approach for measuring documents' topical
diversity outperforms the state of the art on the PubMed dataset, which is
commonly used for diversity experiments.
Comment: Proceedings of the 39th European Conference on Information Retrieval
(ECIR2017)
TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are routinely used for
various purposes, are often computed on the fly, and thus must be efficiently
computed. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other.
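As a rough illustration of the kind of query such a benchmark exercises, the sketch below ranks terms by summed TF-IDF weight over a tiny in-memory corpus. It is not TK's implementation: the function name `top_k_keywords`, the whitespace tokenizer, the sample tweets, and the particular TF-IDF variant (raw count times log(N/df)) are all assumptions made for the example.

```python
import math
from collections import Counter


def top_k_keywords(docs, k):
    """Return the k terms with the highest summed TF-IDF weight over docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        tf = Counter(tokens)
        for term, count in tf.items():
            # Assumed variant: raw term count times log(N / df).
            scores[term] += count * math.log(n_docs / df[term])
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical sample data, not taken from the TK dataset.
tweets = [
    "benchmark the database",
    "the database is fast",
    "the benchmark uses tweets",
]
print(top_k_keywords(tweets, 3))
```

A term that occurs in every document (here "the") receives weight zero under this variant, which is why weighting schemes matter for keyword ranking.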
Prevalence of Disorders Recorded in Dogs Attending Primary-Care Veterinary Practices in England
Purebred dog health is thought to be compromised by an increasing occurrence of inherited diseases, but inadequate prevalence data on common disorders have hampered efforts to prioritise health reforms. Analysis of primary veterinary practice clinical data has been proposed for reliable estimation of disorder prevalence in dogs. Electronic patient record (EPR) data were collected on 148,741 dogs attending 93 clinics across central and south-eastern England. Detailed analysis of a random sample of EPRs relating to 3,884 dogs from 89 clinics identified the most frequently recorded disorders as otitis externa (prevalence 10.2%, 95% CI: 9.1-11.3), periodontal disease (9.3%, 95% CI: 8.3-10.3) and anal sac impaction (7.1%, 95% CI: 6.1-8.1). Using syndromic classification, the most prevalent body location affected was the head-and-neck (32.8%, 95% CI: 30.7-34.9), the most prevalent organ system affected was the integument (36.3%, 95% CI: 33.9-38.6) and the most prevalent pathophysiologic process diagnosed was inflammation (32.1%, 95% CI: 29.8-34.3). Among the twenty most-frequently recorded disorders, purebred dogs had a significantly higher prevalence compared with crossbreds for three: otitis externa (P = 0.001), obesity (P = 0.006) and skin mass lesion (P = 0.033), and popular breeds differed significantly from each other in their prevalence for five: periodontal disease (P = 0.002), overgrown nails (P = 0.004), degenerative joint disease (P = 0.005), obesity (P = 0.001) and lipoma (P = 0.003). These results fill a crucial data gap in disorder prevalence information and assist with disorder prioritisation. The results suggest that, for maximal impact, breeding reforms should target commonly-diagnosed complex disorders that are amenable to genetic improvement and should place special focus on at-risk breeds. Future studies evaluating disorder severity and duration will augment the usefulness of the disorder prevalence information reported herein.
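To make the reported intervals concrete, the sketch below computes a plain normal-approximation (Wald) 95% confidence interval for the otitis externa prevalence, using the sample size of 3,884 dogs. The abstract does not say how the study derived its intervals, so the Wald formula and the function name `wald_interval` are assumptions made for illustration only.

```python
import math


def wald_interval(prevalence, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    # Standard error of a sample proportion under simple random sampling.
    se = math.sqrt(prevalence * (1.0 - prevalence) / n)
    return prevalence - z * se, prevalence + z * se


# Otitis externa: prevalence 10.2% in a sample of 3,884 dogs.
low, high = wald_interval(0.102, 3884)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

This simple interval is slightly narrower than the reported 9.1-11.3, which suggests the study used a different or adjusted method; the abstract does not specify which.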
A Compromise between Neutrino Masses and Collider Signatures in the Type-II Seesaw Model
A natural extension of the standard $SU(2)_{\rm L} \times U(1)_{\rm Y}$ gauge
model to accommodate massive neutrinos is to introduce one Higgs triplet and
three right-handed Majorana neutrinos, leading to a $6\times 6$ neutrino mass
matrix which contains three $3\times 3$ sub-matrices $M_{\rm L}$, $M_{\rm D}$
and $M_{\rm R}$. We show that three light Majorana neutrinos (i.e., the mass
eigenstates of $\nu_e$, $\nu_\mu$ and $\nu_\tau$) are exactly massless in this
model, if and only if $M_{\rm L} = M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^{T}$
exactly holds. This no-go theorem implies that small but non-vanishing neutrino
masses may result from a significant but incomplete cancellation between the
$M_{\rm L}$ and $M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^{T}$ terms in the Type-II
seesaw formula, provided three right-handed Majorana neutrinos are at the TeV
scale and experimentally detectable at the LHC. We propose three simple
Type-II seesaw scenarios with a flavor symmetry to
interpret the observed neutrino mass spectrum and neutrino mixing pattern. Such
a TeV-scale neutrino model can be tested in two complementary ways: (1)
searching for possible collider signatures of lepton number violation induced
by the right-handed Majorana neutrinos and doubly-charged Higgs particles; and
(2) searching for possible consequences of unitarity violation of the neutrino
mixing matrix in the future long-baseline neutrino oscillation experiments.
Comment: RevTeX 19 pages, no figure
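For orientation, the mass-matrix structure this abstract describes is conventionally written as below. The block form and the leading-order light-neutrino mass matrix are the standard Type-II seesaw expressions; the symbols $\mathcal{M}$, $M_{\rm L}$, $M_{\rm D}$, $M_{\rm R}$ and $M_\nu$ are assumed notation, not taken verbatim from the listing.

```latex
\mathcal{M} =
\begin{pmatrix}
  M_{\rm L}     & M_{\rm D} \\
  M_{\rm D}^{T} & M_{\rm R}
\end{pmatrix},
\qquad
M_\nu \approx M_{\rm L} - M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^{T}.
```

In this form, the cancellation mentioned in the abstract is between the two terms on the right-hand side of the second expression.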
One-carbon metabolism in cancer
Cells require one-carbon units for nucleotide synthesis, methylation and reductive metabolism, and these pathways support the high proliferative rate of cancer cells. As such, anti-folates, drugs that target one-carbon metabolism, have long been used in the treatment of cancer. Amino acids such as serine are a major one-carbon source, and cancer cells are particularly susceptible to deprivation of one-carbon units by serine restriction or inhibition of de novo serine synthesis. Recent work has also begun to decipher the specific pathways and sub-cellular compartments that are important for one-carbon metabolism in cancer cells. In this review we summarise the historical understanding of one-carbon metabolism in cancer, describe the recent findings regarding the generation and usage of one-carbon units and explore possible future therapeutics that could exploit the dependency of cancer cells on one-carbon metabolism.
The Search for Invariance: Repeated Positive Testing Serves the Goals of Causal Learning
Positive testing is characteristic of exploratory behavior, yet it seems to be at odds with the aim of information seeking. After all, repeated demonstrations of one’s current hypothesis often produce the same evidence and fail to distinguish it from potential alternatives. Research on the development of scientific reasoning and on adult rule learning has both documented and attempted to explain this behavior. The current chapter reviews this prior work and introduces a novel theoretical account, the Search for Invariance (SI) hypothesis, which suggests that producing multiple positive examples serves the goals of causal learning. This hypothesis draws on the interventionist framework of causal reasoning, which suggests that causal learners are concerned with the invariance of candidate hypotheses. In a probabilistic and interdependent causal world, our primary goal is to determine whether, and in what contexts, our causal hypotheses provide accurate foundations for inference and intervention, not to disconfirm their alternatives. By recognizing the central role of invariance in causal learning, the phenomenon of positive testing may be reinterpreted as a rational information-seeking strategy.
The epidemiology of osteonecrosis: findings from the GPRD and THIN databases in the UK
Summary: We conducted a case–control study to examine osteonecrosis (ON) incidence, patient characteristics, and selected potential risk factors using two health record databases in the UK. Statistically significant risk factors for ON included systemic corticosteroid use, hospitalization, referral or specialist visit, bone fracture, any cancer, osteoporosis, connective tissue disease, and osteoarthritis. Introduction: The purpose of this case–control study was to examine the incidence of osteonecrosis (ON), patient characteristics, and selected potential risk factors for ON using two health record databases in the UK: the General Practice Research Database and The Health Improvement Network. Methods: ON cases (n = 792) were identified from 1989 to 2003 and individually matched (by age, sex, and medical practice) to up to six controls (n = 4,660) with no record of ON. Possible risk factors were considered for inclusion based on a review of published literature. Annual incidence rates were computed, and a multivariable logistic regression model was derived to evaluate selected risk factors. Results: ON of the hip represented the majority of cases (75.9%). Statistically significant risk factors for ON were systemic corticosteroid use in the previous 2 years, hospitalization, referral or specialist visit, bone fracture, any cancer, osteoporosis, connective tissue disease, and osteoarthritis within the past 5 years. Only 4.4% of ON cases were exposed to bisphosphonates within the previous 2 years. Conclusions: This study provides further perspective on the descriptive epidemiology of ON. Studies utilizing more recent data may further elucidate the understanding of ON key predictors.
