Smoothing a rugged protein folding landscape by sequence-based redesign
The rugged folding landscapes of functional proteins put them at risk of misfolding and aggregation. Serine protease inhibitors, or serpins, are paradigms for this delicate balance between function and misfolding. Serpins exist in a metastable state that undergoes a major conformational change in order to inhibit proteases. However, the conformational lability of the native serpin fold renders them susceptible to misfolding, which underlies misfolding diseases such as α1-antitrypsin deficiency. To investigate how serpins balance function and folding, we used consensus design to create conserpin, a synthetic serpin that folds reversibly, is functional, thermostable, and polymerization resistant. Characterization of its structure, folding and dynamics suggests that consensus design has remodeled the folding landscape to reconcile competing requirements for stability and function. This approach may offer general benefits for engineering functional proteins that have risky folding landscapes, including the removal of aggregation-prone intermediates, and modifying scaffolds for use as protein therapeutics.

BTP is a Medical Research Council Career Development Fellow. AAN and JJH are supported by the Wellcome Trust (grant number WT 095195). SM acknowledges fellowship support from the Australian Research Council (FT100100960). NAB is an Australian Research Council Future Fellow (110100223). GIW is an Australian Research Council Discovery Outstanding Researcher Award Fellow (DP140100087). AMB is a National Health and Medical Research Council Senior Research Fellow (1022688). JCW is an NHMRC Senior Principal Research Fellow and also acknowledges the support of an ARC Federation Fellowship. We thank the Australian Synchrotron for beam-time and technical assistance. This work was supported by the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) (www.massive.org.au). We acknowledge the Monash Protein Production Unit and the Monash Macromolecular Crystallization Facility.
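The core idea of consensus design mentioned above can be sketched in a few lines: given a multiple sequence alignment of family members, take the most common residue at each column. The toy alignment below is invented for illustration; designing a real consensus serpin involves a large curated alignment and substantial additional filtering.

```python
# Minimal sketch of consensus design: for each alignment column,
# take the most common residue, ignoring gaps. The sequences here
# are invented toy data, not real serpin sequences.
from collections import Counter

def consensus_sequence(alignment):
    """Return the majority residue at each column of an aligned set."""
    length = len(alignment[0])
    assert all(len(seq) == length for seq in alignment)
    consensus = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)

aligned = ["MKVL-A", "MKIL-A", "MRVLQA", "MKVLQA"]
print(consensus_sequence(aligned))  # MKVLQA
```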
Analysis of Models for Decentralized and Collaborative AI on Blockchain
Machine learning has recently enabled large advances in artificial
intelligence, but these results can be highly centralized. The large datasets
required are generally proprietary; predictions are often sold on a per-query
basis; and published models can quickly become out of date without effort to
acquire more data and maintain them. Published proposals to provide models and
data for free for certain tasks include Microsoft Research's Decentralized and
Collaborative AI on Blockchain. The framework allows participants to
collaboratively build a dataset and use smart contracts to share a continuously
updated model on a public blockchain. The initial proposal gave an overview of
the framework, omitting many details of the models used and of the incentive
mechanisms in real-world scenarios. In this work, we evaluate the use of
several models and configurations in order to propose best practices when using
the Self-Assessment incentive mechanism so that models can remain accurate and
well-intentioned participants who submit correct data have the chance to profit.
We have analyzed simulations for each of three models: Perceptron, Naïve
Bayes, and a Nearest Centroid Classifier, with three different datasets:
predicting a sport with user activity from Endomondo, sentiment analysis on
movie reviews from IMDB, and determining if a news article is fake. We compare
several factors for each dataset when models are hosted in smart contracts on a
public blockchain: their accuracy over time, balances of a good and bad user,
and transaction costs (or gas) for deploying, updating, collecting refunds, and
collecting rewards. A free and open source implementation for the Ethereum
blockchain and simulations written in Python is provided at
https://github.com/microsoft/0xDeCA10B. This version has updated gas costs
using newer optimizations written after the original publication.

Comment: Accepted to ICBC 202
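As a rough illustration of how a model update and a Self-Assessment-style refund cycle could fit together, the sketch below pairs a classic Perceptron update with a toy deposit/refund ledger: contributors stake a deposit with each data point, and stakes are later refunded when the point agrees with the current model. The class, method names, deposit amounts, and refund rule are all invented for this sketch and are not the framework's actual API.

```python
# Toy sketch, not the 0xDeCA10B contract interface.
import numpy as np

class StakedPerceptron:
    def __init__(self, n_features, deposit=1.0):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.deposit = deposit
        self.stakes = []  # (contributor, x, y) tuples awaiting refund

    def predict(self, x):
        return 1 if np.dot(self.w, x) + self.b >= 0 else 0

    def add_data(self, contributor, x, y):
        """Contributor stakes a deposit; the model updates online."""
        x = np.asarray(x, dtype=float)
        self.stakes.append((contributor, x, y))
        if self.predict(x) != y:  # classic perceptron update on a mistake
            delta = 1 if y == 1 else -1
            self.w += delta * x
            self.b += delta

    def claim_refund(self, contributor):
        """Refund stakes whose label now matches the model's prediction."""
        refund, kept = 0.0, []
        for c, x, y in self.stakes:
            if c == contributor and self.predict(x) == y:
                refund += self.deposit
            else:
                kept.append((c, x, y))
        self.stakes = kept
        return refund
```

A contributor whose data point the model later agrees with gets the deposit back; bad data whose label keeps disagreeing with the model forfeits its stake, which is the rough intuition behind the incentive analysis in the paper.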
Mitochondrial phylogeography and demographic history of the Vicuña: implications for conservation
The vicuña (Vicugna vicugna; Miller, 1924) is a conservation success story, having recovered from near extinction in the 1960s to current population levels estimated at 275 000. However, lack of information about its demographic history and genetic diversity has limited both our understanding of its recovery and the development of science-based conservation measures. To examine the evolution and recent demographic history of the vicuña across its current range and to assess its genetic variation and population structure, we sequenced mitochondrial DNA from the control region (CR) for 261 individuals from 29 populations across Peru, Chile and Argentina. Our results suggest that populations currently designated as Vicugna vicugna vicugna and Vicugna vicugna mensalis comprise separate mitochondrial lineages. The current population distribution appears to be the result of a recent demographic expansion associated with the last major glacial event of the Pleistocene in the northern (18 to 22°S) dry Andes 14–12 000 years ago and the establishment of an extremely arid belt known as the 'Dry Diagonal' extending to 29°S. Within the Dry Diagonal, small populations of V. v. vicugna appear to have survived, showing the genetic signature of demographic isolation, whereas to the north V. v. mensalis populations underwent a rapid demographic expansion before recent anthropogenic impacts.
Influence of topography on tide propagation and amplification in semi-enclosed basins
An idealized model for tide propagation and amplification in semi-enclosed rectangular basins is presented, accounting for depth differences by a combination of longitudinal and lateral topographic steps. The basin geometry is formed by several adjacent compartments of identical width, each having either a uniform depth or two depths separated by a transverse topographic step. The problem is forced by an incoming Kelvin wave at the open end, while allowing waves to radiate outward. The solution in each compartment is written as the superposition of (semi-)analytical wave solutions in an infinite channel, individually satisfying the depth-averaged linear shallow water equations on the f plane, including bottom friction. A collocation technique is employed to satisfy continuity of elevation and flux across the longitudinal topographic steps between the compartments. The model results show that the tidal wave in shallow parts displays slower propagation, enhanced dissipation and amplified amplitudes. This reveals a resonance mechanism, occurring when the length of the shallow end is roughly an odd multiple of the quarter Kelvin wavelength. Alternatively, for sufficiently wide basins, Poincaré waves may also become resonant. A transverse step implies different wavelengths of the incoming and reflected Kelvin waves, leading to increased amplitudes in shallow regions and a shift of amphidromic points in the direction of the deeper part. Including the shallow parts near the basin's closed end (thus capturing the Kelvin resonance mechanism) is essential to reproduce semi-diurnal and diurnal tide observations in the Gulf of California, the Adriatic Sea and the Persian Gulf.
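The quarter-wavelength resonance condition described above admits a quick back-of-the-envelope check: a frictionless Kelvin wave travels at c = √(gh), so its wavelength is λ = cT, and resonance is expected when the shallow end spans an odd multiple of λ/4. The depth and tidal period below are illustrative values, not parameters fitted to any of the basins named in the abstract.

```python
# Back-of-the-envelope quarter-Kelvin-wavelength resonance lengths.
# Illustrative numbers only; real basins include friction, rotation
# and geometry effects that shift these estimates.
import math

def kelvin_resonant_lengths(depth_m, period_s, n_modes=3):
    c = math.sqrt(9.81 * depth_m)   # shallow-water phase speed (m/s)
    wavelength = c * period_s       # Kelvin wavelength (m)
    return [(2 * n + 1) * wavelength / 4 for n in range(n_modes)]

# e.g. a 50 m deep shallow end forced by the M2 tide (T ~ 12.42 h)
lengths_km = [L / 1000 for L in kelvin_resonant_lengths(50, 12.42 * 3600)]
print([round(L) for L in lengths_km])  # resonant shallow-end lengths in km
```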
Ensembles of jittered association rule classifiers
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where one single model is perturbed in order to obtain variants that can be used as an ensemble. We use sets of classification association rules as base classifiers. The two methods of jittering ensembling we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single model association rule classifier. A bias–variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up w.r.t. multiple model learning ensembling. We also compare Model Jittering with various state of the art classifiers in terms of predictive accuracy and computational efficiency.

This work was partially supported by FCT project Rank! (PTDC/EIA/81178/2006) and by AdI project Palco3.0 financed by QREN and Fundo Europeu de Desenvolvimento Regional (FEDER), and also supported by Fundação para a Ciência e a Tecnologia, FEDER and Programa de Financiamento Plurianual de Unidades de I&D. Thanks are due to William Cohen for kindly providing the executable code for the SLIPPER implementation. Our gratitude also goes to our anonymous reviewers who have helped to significantly improve this paper by sharing their knowledge and their informed criticism with the authors.
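The Post Bagging idea described above can be sketched compactly: rules are learned once, and each ensemble member merely re-scores those same rules on a bootstrap resample of the training data, so no relearning is needed. The rule representation and confidence-based voting below are simplified illustrations, not the paper's exact procedure.

```python
# Simplified Post Bagging sketch: fixed rule set, bootstrap re-scoring.
import random

def rule_confidence(rule, data):
    """rule = (antecedent_items, predicted_class); data = (items, cls) rows."""
    antecedent, label = rule
    covered = [cls for items, cls in data if antecedent <= items]
    return (sum(c == label for c in covered) / len(covered)) if covered else 0.0

def post_bagging_predict(rules, data, instance, n_members=25, seed=0):
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_members):
        sample = [rng.choice(data) for _ in data]          # bootstrap resample
        matching = [r for r in rules if r[0] <= instance]  # rules firing on x
        if not matching:
            continue
        best = max(matching, key=lambda r: rule_confidence(r, sample))
        votes[best[1]] = votes.get(best[1], 0) + 1
    return max(votes, key=votes.get)
```

Because only the rule statistics are recomputed per member, the cost of an N-member ensemble is close to the cost of a single learning run plus N cheap re-scoring passes, which is the speed-up the abstract refers to.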
Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition
Tensor decompositions are used in various data mining applications, from
social networks to medical applications, and are extremely useful in discovering
latent structures or concepts in the data. Many real-world applications are
dynamic in nature and so are their data. To deal with this dynamic nature of
data, there exist a variety of online tensor decomposition algorithms. A
central assumption in all those algorithms is that the number of latent
concepts remains fixed throughout the entire stream. However, this need not be
the case. Every incoming batch in the stream may have a different number of
latent concepts, and the difference in latent concepts from one tensor batch to
another can provide insights into how our findings in a particular application
behave and deviate over time. In this paper, we define "concept" and "concept
drift" in the context of streaming tensor decomposition, as the manifestation
of the variability of latent concepts throughout the stream. Furthermore, we
introduce SeekAndDestroy, an algorithm that detects concept drift in streaming
tensor decomposition and is able to produce results robust to that drift. To
the best of our knowledge, this is the first work that investigates concept
drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy
on synthetic datasets, which exhibit a wide variety of realistic drift. Our
experiments demonstrate the effectiveness of SeekAndDestroy, both in the
detection of concept drift and in the alleviation of its effects, producing
results with similar quality to decomposing the entire tensor in one shot.
Additionally, in real datasets, SeekAndDestroy outperforms other streaming
baselines, while discovering novel useful components.

Comment: 16 Pages, Accepted at ECML-PKDD 201
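The drift signal described above can be illustrated with a crude stand-in: estimate the number of latent concepts in each incoming batch (here via the numerical rank of the mode-1 unfolding) and flag concept drift whenever that number changes between consecutive batches. SeekAndDestroy itself is considerably more involved; the function names and energy threshold here are invented for this sketch.

```python
# Crude concept-drift illustration for streaming tensor batches.
import numpy as np

def n_concepts(batch, energy=0.95):
    """Count singular values covering `energy` of the unfolding's spectrum."""
    unfolded = batch.reshape(batch.shape[0], -1)  # mode-1 unfolding
    s = np.linalg.svd(unfolded, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, energy) + 1)

def detect_drift(batches):
    counts = [n_concepts(b) for b in batches]
    return [i for i in range(1, len(counts)) if counts[i] != counts[i - 1]]

flat = np.ones((10, 12))                                         # one concept
two = np.zeros((10, 12)); two[:5, :6] = 1.0; two[5:, 6:] = 1.0   # two concepts
batches = [flat.reshape(10, 4, 3), flat.reshape(10, 4, 3), two.reshape(10, 4, 3)]
print(detect_drift(batches))  # drift flagged entering the third batch: [2]
```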
Anxiety and Depression in Adults with Autism Spectrum Disorder: A Systematic Review and Meta-analysis
Adults with autism spectrum disorder (ASD) are thought to be at disproportionate risk of developing mental health comorbidities, with anxiety and depression being considered most prominent amongst these. Yet, no systematic review has been carried out to date to examine rates of both anxiety and depression focusing specifically on adults with ASD. This systematic review and meta-analysis examined the rates of anxiety and depression in adults with ASD and the impact of factors such as assessment methods and the presence of a comorbid intellectual disability (ID) diagnosis on estimated prevalence rates. Electronic database searches for studies published between January 2000 and September 2017 identified a total of 35 studies, including 30 studies measuring anxiety (n = 26 070; mean age = 30.9, s.d. = 6.2 years) and 29 studies measuring depression (n = 26 117; mean age = 31.1, s.d. = 6.8 years). The pooled estimates of current and lifetime prevalence for adults with ASD were 27% and 42% for any anxiety disorder, and 23% and 37% for depressive disorder. Further analyses revealed that the use of questionnaire measures and the presence of ID may significantly influence estimates of prevalence. The current literature suffers from a high degree of heterogeneity in study method and an overreliance on clinical samples. These results highlight the importance of community-based studies and the identification and inclusion of well-characterized samples to reduce heterogeneity and bias in estimates of prevalence for comorbidity in adults with ASD and other populations with complex psychiatric presentations.
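Pooled prevalence estimates like those reported above combine per-study proportions weighted by their precision. The toy sketch below shows the simplest inverse-variance (fixed-effect) form; real meta-analyses of prevalence typically use random-effects models and variance-stabilizing transforms, and the study numbers here are invented, not taken from the review.

```python
# Toy fixed-effect pooling of proportions; illustrative study data only.
def pooled_prevalence(studies):
    """studies: list of (cases, sample_size) pairs."""
    sum_weights, weighted_sum = 0.0, 0.0
    for cases, n in studies:
        p = cases / n
        var = p * (1 - p) / n       # binomial variance of a proportion
        weight = 1.0 / var          # more precise studies count for more
        sum_weights += weight
        weighted_sum += weight * p
    return weighted_sum / sum_weights

# three hypothetical anxiety-prevalence studies
print(round(pooled_prevalence([(54, 200), (30, 100), (140, 500)]), 3))
```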
Fast Generation of Best Interval Patterns for Nonmonotonic Constraints
In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are not frequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying them. In this paper we introduce the notion of "generalized monotonicity" and the Sofia algorithm, which allows generating the best patterns in polynomial time for some nonmonotonic constraints, modulo constraint computation and pattern extension operations. In particular, this algorithm is polynomial for data on itemsets and interval tuples. In this paper we consider stability and the delta-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches.
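The interval-tuple patterns used above have a simple algebra worth making concrete: the meet of two interval tuples is the componentwise convex hull, and a pattern's support counts the objects whose values all fall inside it. The tiny dataset and the helper names below are illustrative, not from the paper.

```python
# Minimal interval-tuple pattern operations; toy data for illustration.
def meet(p, q):
    """Componentwise smallest interval tuple covering both patterns."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(p, q))

def support(pattern, dataset):
    """Number of objects whose every value falls inside the pattern."""
    return sum(
        all(lo <= v <= hi for v, (lo, hi) in zip(row, pattern))
        for row in dataset
    )

data = [(1.0, 5.0), (2.0, 6.0), (4.0, 9.0)]
p = tuple((v, v) for v in data[0])  # most specific pattern for object 0
q = tuple((v, v) for v in data[1])
print(meet(p, q))                 # ((1.0, 2.0), (5.0, 6.0))
print(support(meet(p, q), data))  # 2
```

Widening intervals via such meets is the pattern-extension step; constraints like the delta-measure are then evaluated on the resulting patterns, and, being nonmonotonic, they cannot be pruned the way frequency can.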
