Objective Classification of Galaxy Spectra using the Information Bottleneck Method
A new method for classification of galaxy spectra is presented, based on a
recently introduced information theoretical principle, the `Information
Bottleneck'. For any desired number of classes, galaxies are classified such
that the information content about the spectra is maximally preserved. The
result is classes of galaxies with similar spectra, where the similarity is
determined via a measure of information. We apply our method to approximately
6000 galaxy spectra from the ongoing 2dF redshift survey, and a mock-2dF
catalogue produced by a Cold Dark Matter-based semi-analytic model of galaxy
formation. We find a good match between the mean spectra of the classes found
in the data and in the models. For the mock catalogue, we find that the classes
produced by our algorithm form an intuitively sensible sequence in terms of
physical properties such as colour, star formation activity, morphology, and
internal velocity dispersion. We also show the correlation of the classes with
the projections resulting from a Principal Component Analysis.
Comment: submitted to MNRAS, 17 pages, LaTeX, with 14 figures embedded
Propagation of charged particle waves in a uniform magnetic field
This paper considers the probability density and current distributions
generated by a point-like, isotropic source of monoenergetic charges embedded
into a uniform magnetic field environment. Electron sources of this kind have
been realized in recent photodetachment microscopy experiments. Unlike the
total photocurrent cross section, which is largely understood, the spatial
profiles of charge and current emitted by the source display an unexpected
hierarchy of complex patterns, even though the distributions, apart from
scaling, depend only on a single physical parameter. We examine the electron
dynamics both by solving the quantum problem, i.e., finding the energy Green
function, and from a semiclassical perspective based on the simple cyclotron
orbits followed by the electron. Simulations suggest that the semiclassical
method, which involves here interference between an infinite set of paths,
faithfully reproduces the features observed in the quantum solution, even in
extreme circumstances, and lends itself to an interpretation of some (though
not all) of the rich structure exhibited in this simple problem.
Comment: 39 pages, 16 figures
Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery
Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
National Institute for Health Research
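The aggregate-then-binarize idea described in this abstract can be illustrated with a minimal sketch. It assumes that cluster labels are already aligned across runs (cluster k means the same thing in every run) and uses a plain membership threshold as the tunable binarization; the function names and the threshold rule are hypothetical illustrations, not the paper's actual binarization techniques.

```python
def consensus_matrix(partitions, n_clusters):
    """Average several hard partitions into one fuzzy consensus matrix.

    partitions: list of label lists, one per clustering run. Labels are
    ASSUMED to be aligned across runs (an assumption of this sketch).
    Returns a matrix with one row per cluster and one column per gene;
    each entry is the fraction of runs that put the gene in the cluster.
    """
    n_genes = len(partitions[0])
    copam = [[0.0] * n_genes for _ in range(n_clusters)]
    for labels in partitions:
        for gene, k in enumerate(labels):
            copam[k][gene] += 1.0 / len(partitions)
    return copam


def binarize(copam, threshold):
    """Hypothetical threshold rule: assign a gene to every cluster whose
    fuzzy membership is at least `threshold`."""
    return [[1 if m >= threshold else 0 for m in row] for row in copam]


# Four clustering runs over four genes, two clusters (toy data).
runs = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
copam = consensus_matrix(runs, n_clusters=2)
tight = binarize(copam, 0.75)  # strict: gene 1 ends up in no cluster
wide = binarize(copam, 0.25)   # loose: genes 1 and 3 sit in both clusters
```

In this toy example a high threshold yields tight clusters and leaves the ambiguous gene unassigned, while a low threshold yields wide, overlapping clusters, which mirrors the three eventualities the abstract lists.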
Identifying discrete behavioural types: A re-analysis of public goods game contributions by hierarchical clustering
We propose a framework for identifying discrete behavioural types in experimental data. We re-analyse data from six previous studies of public goods voluntary contributions games. Using hierarchical clustering analysis, we construct a typology of behaviour based on a similarity measure between strategies. We identify four types with distinct stereotypical behaviours, which together account for about 90% of participants. Compared to previous approaches, our method produces a classification in which different types are more clearly distinguished in terms of strategic behaviour and the resulting economic implications.
The structure of the PapD-PapGII pilin complex reveals an open and flexible P5 pocket
P pili are hairlike polymeric structures that mediate binding of uropathogenic Escherichia coli to the surface of the kidney via the PapG adhesin at their tips. PapG is composed of two domains: a lectin domain at the tip of the pilus followed by a pilin domain that comprises the initial polymerizing subunit of the 1,000-plus-subunit heteropolymeric pilus fiber. Prior to assembly, periplasmic pilin domains bind to a chaperone, PapD. PapD mediates donor strand complementation, in which a beta strand of PapD temporarily completes the pilin domain's fold, preventing premature, nonproductive interactions with other pilin subunits and facilitating subunit folding. Chaperone-subunit complexes are delivered to the outer membrane usher where donor strand exchange (DSE) replaces PapD's donated beta strand with an amino-terminal extension on the next incoming pilin subunit. This occurs via a zip-in-zip-out mechanism that initiates at a relatively accessible hydrophobic space termed the P5 pocket on the terminally incorporated pilus subunit. Here, we solve the structure of PapD in complex with the pilin domain of isoform II of PapG (PapGIIp). Our data revealed that PapGIIp adopts an immunoglobulin fold with a missing seventh strand, complemented in parallel by the G1 PapD strand, typical of pilin subunits. Comparisons with other chaperone-pilin complexes indicated that the interactive surfaces are highly conserved. Interestingly, the PapGIIp P5 pocket was in an open conformation, which, as molecular dynamics simulations revealed, switches between an open and a closed conformation due to the flexibility of the surrounding loops. Our study reveals the structural details of the DSE mechanism
On reminder effects, drop-outs and dominance: evidence from an online experiment on charitable giving
We present the results of an experiment that (a) shows the usefulness of screening out drop-outs and (b) tests whether different methods of payment and reminder intervals affect charitable giving. Following a lab session, participants could make online donations to charity for a total duration of three months. Our procedure for justifying the exclusion of drop-outs requires participants to collect their payments in person, at flexible times, under a requirement that was known to them in advance and highlighted to them again later. Our interpretation is that participants who failed to collect their positive payments under these circumstances are likely not to satisfy dominance. If we restrict the sample to subjects who did not drop out, but not otherwise, reminders significantly increase the overall amount of charitable giving. We also find that weekly reminders are no more effective than monthly reminders in increasing charitable giving, and that, in our three-month experiment, standing orders do not increase giving relative to one-off donations.
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
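The inductive process described in this abstract (learning a classifier from preclassified documents) can be sketched with a small multinomial naive Bayes learner over bag-of-words counts. Naive Bayes is chosen here only as one familiar learner; the toy documents, category names, and function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Document representation: a lowercase bag of whitespace-split words.
    return text.lower().split()


def train(docs):
    """Classifier construction from (text, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in docs:
        class_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for category in class_counts:
        score = math.log(class_counts[category] / n_docs)
        total = sum(word_counts[category].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                count = word_counts[category][word] + 1
                score += math.log(count / (total + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical preclassified training set.
training = [
    ("galaxy spectra redshift survey", "astro"),
    ("galaxy formation dark matter", "astro"),
    ("gene expression yeast clustering", "bio"),
    ("pilin protein structure gene", "bio"),
]
model = train(training)

# Classifier evaluation: accuracy on a tiny held-out set.
held_out = [("dark matter galaxy", "astro"), ("yeast gene", "bio")]
accuracy = sum(classify(model, t) == c for t, c in held_out) / len(held_out)
```

The three stages the survey names map onto `tokenize` (representation), `train` (construction), and the held-out accuracy computation (evaluation).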
Mortality in very long-stay pediatric intensive care unit patients and incidence of withdrawal of treatment
Background: The mortality for children with prolonged stay in pediatric intensive care units (PICU) is much higher than overall mortality. The incidence of withdrawal or limitation of therapy in this group is unknown. Purpose: To assess mortality and characteristics of children admitted for ≥28 days to our ICU, and to describe the extent to which limitations of care were involved in the terminal phase preceding death. Methods: For the period 2003 to 2005 clinical data were collected retrospectively for children with prolonged stay (defined as ≥28 days) in a medical/surgical PICU of a university children's hospital. Results: In the PICU, 4.4% of the children (116/2,607, equal gender, mean age 29 days) had a prolonged stay. Median (range) stay was 56 (28-546) days. These children accounted for 3% of total admissions and occupied 63% of total admission days. Mortality during admission for this group was fiv
A unified data representation theory for network visualization, ordering and coarse-graining
Representation of large data sets became a key question of many scientific
disciplines in the last decade. Several approaches for network visualization,
data ordering and coarse-graining accomplished this goal. However, there was no
underlying theoretical framework linking these problems. Here we show an
elegant, information theoretic data representation approach as a unified
solution of network visualization, data ordering and coarse-graining. The
optimal representation is the hardest to distinguish from the original data
matrix, measured by the relative entropy. The representation of network nodes
as probability distributions provides an efficient visualization method and, in
one dimension, an ordering of network nodes and edges. Coarse-grained
representations of the input network enable both efficient data compression and
hierarchical visualization to achieve high quality representations of larger
data sets. Our unified data representation theory will help the analysis of
extensive data sets, by revealing the large-scale structure of complex networks
in a comprehensible form.
Comment: 13 pages, 5 figures
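The selection criterion in this abstract, choosing the representation that is hardest to distinguish from the data matrix under the relative entropy, can be sketched as follows. The matrices and the two candidate representations are hypothetical toy values; the sketch only shows how the relative entropy (Kullback-Leibler divergence) ranks candidates, not how the paper constructs them.

```python
import math


def normalize(matrix):
    """Scale a non-negative matrix so that its entries sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[x / total for x in row] for row in matrix]


def relative_entropy(p, q):
    """D(P || Q) in nats. Assumes q > 0 wherever p > 0."""
    d = 0.0
    for p_row, q_row in zip(p, q):
        for pi, qi in zip(p_row, q_row):
            if pi > 0:
                d += pi * math.log(pi / qi)
    return d


# Toy weighted adjacency matrix, treated as a probability distribution.
data = normalize([[0, 3, 1], [3, 0, 1], [1, 1, 0]])

# Two hypothetical candidate representations of the same network.
uniform = normalize([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
close = normalize([[0.1, 3, 1], [3, 0.1, 1], [1, 1, 0.1]])

d_uniform = relative_entropy(data, uniform)
d_close = relative_entropy(data, close)
best = "close" if d_close < d_uniform else "uniform"
```

A smaller relative entropy means the candidate is harder to distinguish from the original data, so under this criterion the `close` candidate is preferred over the uniform one.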
Differences in social decision-making between proposers and responders during the ultimatum game: an EEG study
The Ultimatum Game (UG) is a typical paradigm to investigate social decision-making. Although the behavior of humans in this task is already well established, the underlying brain processes remain poorly understood. Previous investigations using event-related potentials (ERPs) revealed three major components related to cognitive processes in participants engaged in the responder condition, the early ERP component P2, the feedback-related negativity (FRN) and a late positive wave (late positive component, LPC). However, the comparison of the ERP waveforms between the responder and proposer conditions has never been studied. Therefore, to investigate condition-related electrophysiological changes, we applied the UG paradigm and compared parameters of the P2, LPC and FRN components in twenty healthy participants. For the responder condition, we found a significantly decreased amplitude and delayed latency for the P2 component, whereas the mean amplitudes of the LPC and FRN increased compared to the proposer condition. Additionally, the proposer condition elicited an early component consisting of a negative deflection around 190 ms, in the upward slope of the P2, probably as a result of early conflict-related processing. Using independent component analysis (ICA), we extracted one functional component time-locked to this deflection, and with source reconstruction (LAURA) we found the anterior cingulate cortex (ACC) as one of the underlying sources. Overall, our findings indicate that intensity and time-course of neuronal systems engaged in the decision-making processes diverge between both UG conditions, suggesting differential cognitive processes. Understanding the electrophysiological bases of decision-making and social interactions in controls could be useful to further detect which steps are impaired in psychiatric patients in their ability to attribute mental states (such as beliefs, intents, or desires) to oneself and others. 
This ability is called mentalizing (also known as theory of mind).
