568 research outputs found
Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics
Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
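The idea of folding word similarity into a kernel can be illustrated with a generic semantic kernel, k(x, y) = x^T S y, where S holds word-to-word similarities. This is a minimal sketch and not the authors' method: the vocabulary, the similarity values in `S`, and the function names are all hypothetical, chosen only to show the re-weighting effect.

```python
# Toy semantic kernel: k(x, y) = x^T S y over bag-of-words vectors.
# All names and numbers here are illustrative assumptions.

VOCAB = ["binds", "interacts", "protein", "cell"]

# Hypothetical word-similarity matrix (symmetric, diagonally dominant,
# hence positive definite, so the kernel below is valid).
S = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
]

# Identity matrix recovers the plain linear kernel.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(len(VOCAB))]
            for i in range(len(VOCAB))]


def bow(sentence, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the sentence."""
    tokens = sentence.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def semantic_kernel(x, y, sim):
    """Compute x^T * sim * y for two equal-length vectors."""
    return sum(x[i] * sim[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


a = bow("ProtA binds ProtB")
b = bow("ProtC interacts with ProtD")

linear_value = semantic_kernel(a, b, IDENTITY)
semantic_value = semantic_kernel(a, b, S)
print(linear_value, semantic_value)
```

With the identity matrix the two sentences share no words and look unrelated; with `S` the assumed similarity between "binds" and "interacts" makes them comparable, which is the kind of feature re-weighting the abstract describes.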
A latent variable ranking model for content-based retrieval
34th European Conference on IR Research, ECIR 2012, Barcelona, Spain, April 1-5, 2012. Proceedings.
Since their introduction, ranking SVM models [11] have become a powerful tool for training content-based retrieval systems. All we need for training a model are retrieval examples in the form of triplet constraints, i.e. examples specifying that, relative to some query, a database item a should be ranked higher than database item b. These types of constraints could be obtained from feedback by users of the retrieval system. Most previous ranking models learn either a global combination of elementary similarity functions or a combination defined with respect to a single database item. Instead, we propose a “coarse to fine” ranking model where, given a query, we first compute a distribution over “coarse” classes and then use the linear combination that has been optimized for queries of that class. These coarse classes are hidden and need to be induced by the training algorithm. We propose a latent variable ranking model that induces both the latent classes and the weights of the linear combination for each class from ranking triplets. Our experiments over two large image datasets and a text retrieval dataset show the advantages of our model over learning a global combination as well as a combination for each test point (i.e. the transductive setting). Furthermore, compared to the transductive approach, our model has a clear computational advantage since it does not need to be retrained for each test query.
Spanish Ministry of Science and Innovation (JCI-2009-04240); EU PASCAL2 Network of Excellence (FP7-ICT-216886)
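The "coarse to fine" scoring and the triplet constraint can be sketched as follows. This is a hedged illustration, not the paper's implementation: the softmax form of the class distribution, the hinge loss, and every name and number (`CLASS_PARAMS`, `CLASS_WEIGHTS`, the feature vectors) are assumptions made for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_posterior(query_feats, class_params):
    """Distribution over latent coarse classes given the query (assumed softmax)."""
    return softmax([dot(theta, query_feats) for theta in class_params])


def score(query_feats, sims, class_params, class_weights):
    """Mix the per-class linear combinations of elementary similarities."""
    posterior = class_posterior(query_feats, class_params)
    return sum(p * dot(w, sims) for p, w in zip(posterior, class_weights))


def triplet_hinge(query_feats, sims_a, sims_b, class_params, class_weights,
                  margin=1.0):
    """Loss for the constraint 'item a should rank above item b' for this query."""
    s_a = score(query_feats, sims_a, class_params, class_weights)
    s_b = score(query_feats, sims_b, class_params, class_weights)
    return max(0.0, margin - s_a + s_b)


# Hypothetical model with two latent classes and two elementary similarities.
CLASS_PARAMS = [[5.0, 0.0], [0.0, 5.0]]   # map query features to class logits
CLASS_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]  # one similarity weighting per class

query = [1.0, 0.0]     # query features; favours class 0
sims_a = [0.9, 0.1]    # elementary similarities between query and item a
sims_b = [0.2, 0.8]    # elementary similarities between query and item b

score_a = score(query, sims_a, CLASS_PARAMS, CLASS_WEIGHTS)
score_b = score(query, sims_b, CLASS_PARAMS, CLASS_WEIGHTS)
loss = triplet_hinge(query, sims_a, sims_b, CLASS_PARAMS, CLASS_WEIGHTS)
print(score_a, score_b, loss)
```

Because the class distribution depends only on the query, a trained model of this shape scores new queries without retraining, which is the computational advantage the abstract claims over the transductive setting.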
Determination of step--edge barriers to interlayer transport from surface morphology during the initial stages of homoepitaxial growth
We use analytic formulae obtained from a simple model of crystal growth by
molecular--beam epitaxy to determine step--edge barriers to interlayer
transport. The method is based on information about the surface morphology at
the onset of nucleation on top of first--layer islands in the submonolayer
coverage regime of homoepitaxial growth. The formulae are tested using kinetic
Monte Carlo simulations of a solid--on--solid model and applied to estimate
step--edge barriers from scanning--tunneling microscopy data on initial stages
of Fe(001), Pt(111), and Ag(111) homoepitaxy.
Comment: 4 pages, a Postscript file, uuencoded and compressed. Physical Review
B, Rapid Communications, in press
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication on ACM Computing Surveys
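As an illustration of the classifier evaluation step, the sketch below computes per-category precision, recall and F1, measures commonly used for text categorization. The function name, the category labels and the toy predictions are assumptions for the example and are not taken from the survey.

```python
def precision_recall_f1(gold, predicted, category):
    """Per-category precision, recall and F1 from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)

    # Guard against empty denominators (no predictions or no gold items).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical gold labels and classifier decisions for five documents.
gold = ["sport", "politics", "sport", "sport", "politics"]
predicted = ["sport", "sport", "sport", "politics", "politics"]

p, r, f = precision_recall_f1(gold, predicted, "sport")
print(p, r, f)
```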
Pseudo Goldstone Bosons Phenomenology in Minimal Walking Technicolor
We construct the non-linearly realized Lagrangian for the Goldstone Bosons
associated to the breaking pattern of SU(4) to SO(4). This pattern is expected
to occur in any Technicolor extension of the standard model featuring two Dirac
fermions transforming according to real representations of the underlying gauge
group. We concentrate on the Minimal Walking Technicolor quantum number
assignments with respect to the standard model symmetries. We demonstrate that,
for any choice of the quantum numbers consistent with gauge and Witten
anomalies, the spectrum of the pseudo Goldstone Bosons contains electrically
doubly charged states which can be discovered at the Large Hadron Collider.
Comment: 25 pages, 5 figures
Classification of protein interaction sentences via gaussian processes
The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, the Gaussian process (GP). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework make GPs an appealing alternative worthy of further adoption.
From dynamical scaling to local scale-invariance: a tutorial
Dynamical scaling arises naturally in various many-body systems far from
equilibrium. After a short historical overview, the elements of possible
extensions of dynamical scaling to a local scale-invariance will be introduced.
Schrödinger-invariance, the simplest example of local scale-invariance,
will be introduced as a dynamical symmetry in the Edwards-Wilkinson
universality class of interface growth. The Lie algebra construction, its
representations and the Bargmann superselection rules will be combined with
non-equilibrium Janssen-de Dominicis field-theory to produce explicit
predictions for responses and correlators, which can be compared to the results
of explicit model studies.
At the next level, the study of non-stationary states requires going over
from Schrödinger-invariance to ageing-invariance. The ageing algebra admits
new representations, which act as dynamical symmetries on more general
equations, and imply that each non-equilibrium scaling operator is
characterised by two distinct, independent scaling dimensions. Tests of
ageing-invariance are described, in the Glauber-Ising and spherical models of a
phase-ordering ferromagnet and the Arcetri model of interface growth.
Comment: 1+ 23 pages, 2 figures, final for
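For orientation, the Edwards-Wilkinson universality class named in this abstract is conventionally defined by a linear Langevin equation for the interface height h(t, r). The notation below (ν for the surface-tension coefficient, η for a zero-mean white noise) follows common textbook usage and is an assumption, not taken from the abstract.

```latex
\partial_t h(t,\mathbf{r}) = \nu \nabla^2 h(t,\mathbf{r}) + \eta(t,\mathbf{r}),
\qquad \langle \eta(t,\mathbf{r}) \rangle = 0
```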
The Dynamics of a Rigid Body in Potential Flow with Circulation
We consider the motion of a two-dimensional body of arbitrary shape in a
planar irrotational, incompressible fluid with a given amount of circulation
around the body. We derive the equations of motion for this system by
performing symplectic reduction with respect to the group of volume-preserving
diffeomorphisms and obtain the relevant Poisson structures after a further
Poisson reduction with respect to the group of translations and rotations. In
this way, we recover the equations of motion given for this system by Chaplygin
and Lamb, and we give a geometric interpretation for the Kutta-Zhukowski force
as a curvature-related effect. In addition, we show that the motion of a rigid
body with circulation can be understood as a geodesic flow on a central
extension of the special Euclidean group SE(2), and we relate the cocycle in
the description of this central extension to a certain curvature tensor.
Comment: 28 pages, 2 figures; v2: typos corrected
Functional diversity of chemokines and chemokine receptors in response to viral infection of the central nervous system.
Encounters with neurotropic viruses result in varied outcomes ranging from encephalitis, paralytic poliomyelitis or other serious consequences to relatively benign infection. One of the principal factors that control the outcome of infection is the localized tissue response and subsequent immune response directed against the invading toxic agent. It is the role of the immune system to contain and control the spread of virus infection in the central nervous system (CNS), and paradoxically, this response may also be pathologic. Chemokines are potent proinflammatory molecules whose expression within virally infected tissues is often associated with protection and/or pathology which correlates with migration and accumulation of immune cells. Indeed, studies with a neurotropic murine coronavirus, mouse hepatitis virus (MHV), have provided important insight into the functional roles of chemokines and chemokine receptors in participating in various aspects of host defense as well as disease development within the CNS. This chapter will highlight recent discoveries that have provided insight into the diverse biologic roles of chemokines and their receptors in coordinating immune responses following viral infection of the CNS.