1,532 research outputs found
Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability
Post-hoc model-agnostic interpretation methods such as partial dependence
plots can be employed to interpret complex machine learning models. While these
interpretation methods can be applied regardless of model complexity, they can
produce misleading and verbose results if the model is too complex, especially
w.r.t. feature interactions. To quantify the complexity of arbitrary machine
learning models, we propose model-agnostic complexity measures based on
functional decomposition: number of features used, interaction strength and
main effect complexity. We show that post-hoc interpretation of models that
minimize the three measures is more reliable and compact. Furthermore, we
demonstrate the application of these measures in a multi-objective optimization
approach which simultaneously minimizes loss and complexity
Serosurvey of selected avian pathogens in Brazilian commercial Rheas (Rhea americana) and Ostriches (Struthio camelus)
Ratite farming has expanded worldwide. Due to the intensive farming methods used by ratite producers, preventive medicine practices should be established. In this context, the surveillance and control of some avian pathogens are essential for the success of the ratite industry; however, little is known about the health status of ratites in Brazil. Therefore, the prevalence of antibodies against Newcastle Disease virus, Chlamydophila psittaci, Mycoplasma gallisepticum, Mycoplasma synoviae, and Salmonella Pullorum was evaluated in 100 serum samples collected from commercial ostriches and in 80 serum samples from commercial rheas reared in Brazil. All sampled animals were clinically healthy. The results showed that all ostriches and rheas were serologically negative to Newcastle disease virus, Chlamydophila psittaci, Mycoplasma gallisepticum, and Mycoplasma synoviae. Positive antibody responses against Salmonella Pullorum antigen were not detected in ostrich sera, but were detected in two rhea serum samples. These results can be considered a warning as to the presence of Salmonella spp. in ratite farms. Therefore, the implementation of good health management and surveillance programs in ratite farms may help improve not only animal production, but also public health conditions.
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
How does predicate invention affect human comprehensibility?
During the 1980s Michie defined Machine Learning in terms of two orthogonal axes of performance: predictive accuracy and comprehensibility of generated hypotheses. Since predictive accuracy was readily measurable and comprehensibility not so, later definitions in the 1990s, such as that of Mitchell, tended to use a one-dimensional approach to Machine Learning based solely on predictive accuracy, ultimately favouring statistical over symbolic Machine Learning approaches. In this paper we provide a definition of comprehensibility of hypotheses which can be estimated using human participant trials. We present the results of experiments testing human comprehensibility of logic programs learned with and without predicate invention. Results indicate that comprehensibility is affected not only by the complexity of the presented program but also by the existence of anonymous predicate symbols
Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients
Background: Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). The pros and cons of several different types of decision tree-based regression methods are discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically-based pharmacokinetics. This work therefore assesses whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Results: Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Conclusions: Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. © 2015 Freitas et al.; licensee Springer
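The abstract above reports "mean fold error" values without defining the metric. The sketch below assumes one common definition, the geometric mean fold error, 10 raised to the mean of |log10(predicted / observed)|. The paper itself may use a different variant, and the function name is hypothetical.

```python
import math


def mean_fold_error(predicted, observed):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|).

    Assumed definition; the abstract does not state which variant was used.
    A value of 1.0 means perfect agreement, 2.0 means predictions are off
    by a factor of two on average (in either direction).
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute log-ratios treat over- and under-prediction symmetrically.
    log_errors = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_errors) / len(log_errors))
```

Under this definition, a value near 2.3 would mean that predicted Vss values differ from observed values by a little more than a factor of two on average.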
Learning Interpretable Rules for Multi-label Classification
Multi-label classification (MLC) is a supervised learning problem in which,
contrary to standard multiclass classification, an instance can be associated
with several class labels simultaneously. In this chapter, we advocate a
rule-based approach to multi-label classification. Rule learning algorithms are
often employed when one is not only interested in accurate predictions, but
also requires an interpretable theory that can be understood, analyzed, and
qualitatively evaluated by domain experts. Ideally, by revealing patterns and
regularities contained in the data, a rule-based theory yields new insights in
the application domain. Recently, several authors have started to investigate
how rule-based models can be used for modeling multi-label data. Discussing
this task in detail, we highlight some of the problems that make rule learning
considerably more challenging for MLC than for conventional classification.
While mainly focusing on our own previous work, we also provide a short
overview of related work in this area.
Comment: Preprint version. To appear in: Explainable and Interpretable Models
in Computer Vision and Machine Learning. The Springer Series on Challenges in
Machine Learning. Springer (2018). See
http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further
information
Historical analysis of the Brazilian cervical cancer screening program from 2006 to 2013: a time for reflection
BACKGROUND: The Cervical Cancer Database of the Brazilian National Health Service (SISCOLO) contains information regarding all cervical cytological tests and, if properly explored, can be used as a tool for monitoring and managing the cervical cancer screening program. The aim of this study was to perform a historical analysis of the cervical cancer screening program in Brazil from 2006 to 2013.
MATERIAL AND METHODS: The data necessary to calculate quality indicators were obtained from the SISCOLO, a Brazilian health system tool. Joinpoint analysis was used to calculate the annual percentage change.
RESULTS: We observed important trends showing decreased rates of low-grade squamous intraepithelial lesions (LSIL) and high-grade squamous intraepithelial lesions (HSIL) and an increased rate of rejected exams from 2009 to 2013. The index of positivity was maintained at levels below those indicated by international standards; very low frequencies of unsatisfactory cases were observed over the study period, which partially contradicts the low rate of positive cases. The number of positive cytological diagnoses was below that expected, considering that developed countries with low frequencies of cervical cancer detect more lesions annually.
CONCLUSIONS: The evolution of indicators from 2006 to 2013 suggests that actions must be taken to improve the effectiveness of cervical cancer control in Brazil
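The annual percentage change mentioned in the methods above is conventionally derived from a log-linear fit of rate against year: if ln(rate) = a + b * year, then APC = (e^b - 1) * 100. The sketch below fits a single segment by ordinary least squares. It does not search for joinpoints, and the function name is hypothetical.

```python
import math


def annual_percentage_change(years, rates):
    """APC for one segment: fit ln(rate) = a + b * year, return (e^b - 1) * 100.

    Single-segment sketch only; real joinpoint analysis also estimates
    where the trend changes, which is not attempted here.
    """
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two (year, rate) pairs")
    logs = [math.log(r) for r in rates]  # rates must be strictly positive
    mean_x = sum(years) / len(years)
    mean_y = sum(logs) / len(logs)
    # Ordinary least-squares slope of ln(rate) on year.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0
```

A series that grows by exactly 5% every year yields an APC of 5; a declining indicator yields a negative APC.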
Astrobiological Complexity with Probabilistic Cellular Automata
The search for extraterrestrial life and intelligence constitutes one of the
major endeavors in science, but has so far been modeled quantitatively only rarely
and in a cursory and superficial fashion. We argue that probabilistic cellular
automata (PCA) represent the best quantitative framework for modeling
astrobiological history of the Milky Way and its Galactic Habitable Zone. The
relevant astrobiological parameters are to be modeled as the elements of the
input probability matrix for the PCA kernel. With the underlying simplicity of
the cellular automata constructs, this approach enables a quick analysis of
large and ambiguous input parameters' space. We perform a simple clustering
analysis of typical astrobiological histories and discuss the relevant boundary
conditions of practical importance for planning and guiding actual empirical
astrobiological and SETI projects. In addition to showing how the present
framework is adaptable to more complex situations and updated observational
databases from current and near-future space missions, we demonstrate how
numerical results could offer a cautious rationale for continuation of
practical SETI searches.
Comment: 37 pages, 11 figures, 2 tables; added journal reference below
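To make the idea of an input probability matrix driving a cellular automaton concrete, here is a minimal sketch. Everything in it is assumed for illustration: the four site states, the transition probabilities, and the lack of any coupling between neighbouring cells. It is not the model from the paper.

```python
import random

# Hypothetical site states, from lifeless to technological civilization.
STATES = ("none", "simple", "complex", "technological")

# Hypothetical per-step transition probabilities.
# Row i gives P(next state | current state i); each row sums to 1.
# The first column of rows 1-3 stands in for a sterilizing "reset" event.
TRANSITIONS = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.85, 0.10, 0.00],
    [0.05, 0.00, 0.85, 0.10],
    [0.05, 0.00, 0.00, 0.95],
]


def step(grid, rng):
    """Advance every cell one step by sampling from its transition row."""
    indices = range(len(STATES))
    return [
        [rng.choices(indices, weights=TRANSITIONS[cell])[0] for cell in row]
        for row in grid
    ]


def simulate(size, steps, seed=0):
    """Run a size x size grid, starting lifeless, for the given number of steps."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    grid = [[0] * size for _ in range(size)]
    for _ in range(steps):
        grid = step(grid, rng)
    return grid
```

A fuller model would make each cell's transition row depend on its neighbours (for example, through colonization) and would replace the placeholder numbers with the relevant astrobiological parameters.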
Strong interface-induced spin-orbit coupling in graphene on WS2
Interfacial interactions allow the electronic properties of graphene to be
modified, as recently demonstrated by the appearance of satellite Dirac cones
in the band structure of graphene on hexagonal boron nitride (hBN) substrates.
Ongoing research strives to explore interfacial interactions in a broader class
of materials in order to engineer targeted electronic properties. Here we show
that at an interface with a tungsten disulfide (WS2) substrate, the strength of
the spin-orbit interaction (SOI) in graphene is very strongly enhanced. The
induced SOI leads to a pronounced low-temperature weak anti-localization (WAL)
effect, from which we determine the spin-relaxation time. We find that
spin-relaxation time in graphene is two-to-three orders of magnitude smaller on
WS2 than on SiO2 or hBN, and that it is comparable to the intervalley
scattering time. To interpret our findings we have performed first-principle
electronic structure calculations, which both confirm that carriers in
graphene-on-WS2 experience a strong SOI and allow us to extract a
spin-dependent low-energy effective Hamiltonian. Our analysis further shows
that the use of WS2 substrates opens a possible new route to access topological
states of matter in graphene-based systems.
Comment: Originally submitted version in compliance with editorial guidelines.
Final version with expanded discussion of the relation between theory and
experiments to be published in Nature Communications
An ant colony-based semi-supervised approach for learning classification rules
Semi-supervised learning methods create models from a few labeled instances and a great number of unlabeled instances. They are a good option in scenarios where there is a lot of unlabeled data and the process of labeling instances is expensive, as is the case for most Web applications. This paper proposes a semi-supervised self-training algorithm called Ant-Labeler. Self-training algorithms take advantage of supervised learning algorithms to iteratively learn a model from the labeled instances and then use this model to classify unlabeled instances. The instances that receive labels with high confidence are moved from the unlabeled to the labeled set, and this process is repeated until a stopping criterion is met, such as labeling all unlabeled instances. Ant-Labeler uses an ACO algorithm as the supervised learning method in the self-training procedure to generate interpretable rule-based models, which are used as an ensemble to ensure accurate predictions. The pheromone matrix is reused across different executions of the ACO algorithm to avoid rebuilding the models from scratch every time the labeled set is updated. Results showed that the proposed algorithm obtains better predictive accuracy than three state-of-the-art algorithms in roughly half of the datasets on which it was tested, and the smaller the number of labeled instances, the better the Ant-Labeler performance
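The generic self-training loop described above can be sketched as follows. This is not Ant-Labeler: a nearest-centroid classifier stands in for the ACO rule learner, and the confidence score, threshold, and all function names are assumptions made for illustration.

```python
import math


def fit_centroids(X, y):
    """Stand-in supervised learner: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (nearest class, ad hoc confidence in [0, 1])."""
    dists = {
        label: math.sqrt(sum((a - b) ** 2 for a, b in zip(c, x)))
        for label, c in centroids.items()
    }
    best = min(dists, key=dists.get)
    total = sum(dists.values())
    # Hypothetical score: share of total distance NOT taken by the winner.
    confidence = 1.0 - dists[best] / total if total > 0 else 1.0
    return best, confidence


def self_train(labeled_X, labeled_y, unlabeled_X, threshold=0.8, max_rounds=10):
    """Move confidently labeled instances into the labeled set, then refit."""
    X, y = list(labeled_X), list(labeled_y)
    pool = list(unlabeled_X)
    for _ in range(max_rounds):
        if not pool:
            break  # stopping criterion: every instance has been labeled
        model = fit_centroids(X, y)
        remaining = []
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                X.append(x)
                y.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # stopping criterion: no instance cleared the threshold
        pool = remaining
    return fit_centroids(X, y), pool
```

The function returns the final model together with any instances that never received a confident label. The loop refits from scratch each round, which is the cost the abstract says Ant-Labeler reduces by reusing its pheromone matrix.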
Effect of synbiotic supplementation in children and adolescents with cystic fibrosis: a randomized controlled clinical trial
BACKGROUND/OBJECTIVES: Cystic fibrosis (CF) is characterized by excessive activation of immune processes. The aim of this study was to evaluate the effect of synbiotic supplementation on the inflammatory response in children/adolescents with CF.
SUBJECTS/METHODS: A randomized, placebo-controlled, double-blind clinical trial was conducted with a control group (CG, n = 17), placebo-CF-group (PCFG, n = 19), synbiotic CF-group (SCFG, n = 22), PCFG negative (n = 8) and positive (n = 11) bacteriology, and SCFG negative (n = 12) and positive (n = 10) bacteriology. Markers of lung function (FEV1), nutritional status [body mass index-for-age (BMI/A), height-for-age (H/A), weight-for-age (W/A), upper-arm fat area (UFA), upper-arm muscle area (UMA), body fat (%BF)], and inflammation [interleukin (IL)-12, tumor necrosis factor-alpha (TNF-α), IL-10, IL-6, IL-1β, IL-8, myeloperoxidase (MPO), nitric oxide metabolites (NOx)] were evaluated before and after 90 days of supplementation with a synbiotic.
RESULTS: No significant difference was found between the baseline and end evaluations of FEV1 and nutritional status markers. A significant interaction (time vs. group) was found for IL-12 (p = 0.010) and myeloperoxidase (p = 0.036) between PCFG and SCFG; however, the difference was not maintained after assessing the groups individually. NOx diminished significantly after supplementation in the SCFG (p = 0.030). In the SCFG with positive bacteriology, reductions were found in IL-6 (p = 0.033) and IL-8 (p = 0.009) after supplementation.
CONCLUSIONS: Synbiotic supplementation showed promise in diminishing the pro-inflammatory markers IL-6 and IL-8 in the SCFG with positive bacteriology and NOx in the SCFG in children/adolescents with CF