A high-reproducibility and high-accuracy method for automated topic classification
Much of human knowledge sits in large databases of unstructured text.
Leveraging this knowledge requires algorithms that extract and record metadata
on unstructured text documents. Assigning topics to documents will enable
intelligent search, statistical characterization, and meaningful
classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in
topic classification. Here, we perform a systematic theoretical and numerical
analysis that demonstrates that current optimization techniques for LDA often
yield results which are not accurate in inferring the most suitable model
parameters. Adapting approaches for community detection in networks, we propose
a new algorithm which displays high-reproducibility and high-accuracy, and also
has high computational efficiency. We apply it to a large set of documents in
the English Wikipedia and reveal its hierarchical structure. Our algorithm
promises to make "big data" text analysis systems more reliable.
Comment: 23 pages, 24 figures
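For context on the baseline this abstract critiques, a minimal collapsed Gibbs sampler for vanilla LDA can be sketched in pure NumPy. This illustrates the standard inference technique only, not the authors' community-detection-based algorithm; the function name, hyperparameters, and toy corpus are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=100):
    """Collapsed Gibbs sampling for vanilla LDA.

    docs : list of documents, each a list of integer word ids
    Returns (theta, phi): per-document topic mixtures and
    per-topic word distributions.
    """
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    ndk = np.zeros((len(docs), n_topics))  # doc-topic counts
    nkw = np.zeros((n_topics, n_words))    # topic-word counts
    nk = np.zeros(n_topics)                # topic totals
    for d, (doc, zd) in enumerate(zip(docs, z)):
        for w, t in zip(doc, zd):
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    for _ in range(n_iter):
        for d, (doc, zd) in enumerate(zip(docs, z)):
            for i, w in enumerate(doc):
                t = zd[i]
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                # Conditional topic distribution for this token.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                zd[i] = t
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    theta = ndk + alpha
    theta /= theta.sum(axis=1, keepdims=True)
    phi = nkw + beta
    phi /= phi.sum(axis=1, keepdims=True)
    return theta, phi
```

On a toy corpus whose documents draw from disjoint vocabularies, the inferred theta rows typically concentrate on different topics; the abstract's point is that on realistic corpora such samplers (and variational optimizers) can land on very different, poorly reproducible solutions across runs.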
Transitions and Probes in Turbulent Helium
Previous analysis of a Paris turbulence experiment [zoc94, tab95] shows a
transition at the Taylor Reynolds number R_λ ≈ 700. Here, correlation
function data are analyzed that give further evidence for this transition. It
is seen in both the power spectrum and in structure function measurements. Two
possible explanations may be offered for this observed transition: that it is
intrinsic to the turbulence flow in this closed box experiment or that it is an
effect of a change in the flow around the anemometer. We particularly examine a
pair of "probe effects". The first is a thermal boundary layer which does
exist about the probe and does limit the probe response, particularly at high
frequencies. Arguments based on simulations of the response and on
observations of dissipation suggest that this effect is crucial only beyond
R_λ ≈ 2000. The second effect is produced by vortex shedding behind the
probe. This has been seen to produce a large modification in some of the power
spectra for large R_λ. It might also complicate the interpretation of the
experimental results. However, there seems to be a remaining range of data for
R_λ < 1300 uncomplicated by these effects, which is thus suggestive of
an intrinsic transition.
Comment: uuencoded .ps files. Submitted to PRE. 12 figures are sent upon request to Jane Wang ([email protected])
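The structure-function measurements referenced above can be estimated from a single-probe velocity record, with sample spacing standing in for spatial separation via Taylor's frozen-flow hypothesis. A minimal sketch (the signal and separations here are synthetic, not the Paris data):

```python
import numpy as np

def structure_function(u, seps, order=2):
    """Estimate S_p(r) = <|u(x + r) - u(x)|^p> from a 1-D velocity record.

    u    : equally spaced velocity samples
    seps : integer separations, in samples (r = sep * dx)
    """
    u = np.asarray(u, dtype=float)
    # For each separation, average the p-th power of velocity increments.
    return np.array([np.mean(np.abs(u[s:] - u[:-s]) ** order) for s in seps])
```

On a linear ramp u = c·x the estimator returns (c·r)^2 exactly; on turbulence data, it is the scaling of S_2 with r (and its change across R_λ) that carries the evidence for a transition.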
Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms
Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. Although 58 genomic regions have been associated with CAD thus far, most of the heritability is unexplained, indicating that additional susceptibility loci await identification. An efficient discovery strategy may be larger-scale evaluation of promising associations suggested by genome-wide association studies (GWAS). Hence, we genotyped 56,309 participants using a targeted gene array derived from earlier GWAS results and performed meta-analysis of results with 194,427 participants previously genotyped, totaling 88,192 CAD cases and 162,544 controls. We identified 25 new SNP-CAD associations (P < 5 × 10⁻⁸, in fixed-effects meta-analysis) from 15 genomic regions, including SNPs in or near genes involved in cellular adhesion, leukocyte migration and atherosclerosis (PECAM1, rs1867624), coagulation and inflammation (PROCR, rs867186 (p.Ser219Gly)) and vascular smooth muscle cell differentiation (LMOD1, rs2820315). Correlation of these regions with cell-type-specific gene expression and plasma protein levels sheds light on potential disease mechanisms.
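The fixed-effects meta-analysis behind the quoted P < 5 × 10⁻⁸ threshold combines per-study effect estimates by inverse-variance weighting and tests the pooled Z statistic. A minimal sketch with made-up effect sizes, not the study's data:

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas : per-study effect estimates (e.g. log odds ratios)
    ses   : per-study standard errors
    Returns (pooled beta, pooled SE, Z, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]           # w_i = 1 / SE_i^2
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return beta, se, z, p
```

With two equally precise studies the pooled estimate is simply their mean, while the pooled standard error shrinks by a factor of √2; this is how combining a targeted-array cohort with prior GWAS cohorts pushes borderline associations past the genome-wide threshold.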
CogBench: a large language model walks into a psychology lab
Large language models (LLMs) have significantly advanced the field of
artificial intelligence. Yet, evaluating them comprehensively remains
challenging. We argue that this is partly due to the predominant focus on
performance metrics in most benchmarks. This paper introduces CogBench, a
benchmark that includes ten behavioral metrics derived from seven cognitive
psychology experiments. This novel approach offers a toolkit for phenotyping
LLMs' behavior. We apply CogBench to 35 LLMs, yielding a rich and diverse
dataset. We analyze this data using statistical multilevel modeling techniques,
accounting for the nested dependencies among fine-tuned versions of specific
LLMs. Our study highlights the crucial role of model size and reinforcement
learning from human feedback (RLHF) in improving performance and aligning with
human behavior. Interestingly, we find that open-source models are less
risk-prone than proprietary models and that fine-tuning on code does not
necessarily enhance LLMs' behavior. Finally, we explore the effects of
prompt-engineering techniques. We discover that chain-of-thought prompting
improves probabilistic reasoning, while take-a-step-back prompting fosters
model-based behaviors.
UBVRI Light Curves of 44 Type Ia Supernovae
We present UBVRI photometry of 44 type-Ia supernovae (SN Ia) observed from
1997 to 2001 as part of a continuing monitoring campaign at the Fred Lawrence
Whipple Observatory of the Harvard-Smithsonian Center for Astrophysics. The
data set comprises 2190 observations and is the largest homogeneously observed
and reduced sample of SN Ia to date, nearly doubling the number of
well-observed, nearby SN Ia with published multicolor CCD light curves. The
large sample of U-band photometry is a unique addition, with important
connections to SN Ia observed at high redshift. The decline rate of SN Ia
U-band light curves correlates well with the decline rate in other bands, as
does the U-B color at maximum light. However, the U-band peak magnitudes show
an increased dispersion relative to other bands even after accounting for
extinction and decline rate, amounting to an additional ~40% intrinsic scatter
compared to B-band.
Comment: 84 authors, 71 pages, 51 tables, 10 figures. Accepted for publication in the Astronomical Journal. Version with high-resolution figures and electronic data at http://astron.berkeley.edu/~saurabh/cfa2snIa
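The "decline rate" used to standardize SN Ia light curves is conventionally Δm15: the magnitude drop in a band 15 days after maximum light. A hedged sketch of estimating it from sampled photometry by linear interpolation (the epochs and magnitudes below are synthetic, and real analyses fit smooth light-curve templates rather than interpolating raw points):

```python
import numpy as np

def delta_m15(days, mags):
    """Magnitude decline 15 days after maximum light (Delta m_15).

    days : observation epochs in days, sorted ascending
    mags : apparent magnitudes (smaller = brighter)
    """
    days = np.asarray(days, dtype=float)
    mags = np.asarray(mags, dtype=float)
    t_max = days[np.argmin(mags)]            # epoch of peak brightness
    m_peak = np.interp(t_max, days, mags)
    m_15 = np.interp(t_max + 15.0, days, mags)
    return m_15 - m_peak
```

For a quadratic light curve m(t) = 15 + 0.005 t² sampled through peak, the estimator recovers Δm15 = 0.005 × 15² = 1.125 mag; the abstract's finding is that U-band peak magnitudes scatter more than this decline-rate relation alone predicts.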
Can One Trust Quantum Simulators?
Various fundamental phenomena of strongly-correlated quantum systems such as
high-Tc superconductivity, the fractional quantum Hall effect, and quark
confinement are still awaiting a universally accepted explanation. The main
obstacle is the computational complexity of solving even the most simplified
theoretical models that are designed to capture the relevant quantum
correlations of the many-body system of interest. In his seminal 1982 paper
[Int. J. Theor. Phys. 21, 467], Richard Feynman suggested that such models
might be solved by "simulation" with a new type of computer whose constituent
parts are effectively governed by a desired quantum many-body dynamics.
Measurements on this engineered machine, now known as a "quantum simulator,"
would reveal some unknown or difficult-to-compute properties of a model of
interest. We argue that a useful quantum simulator must satisfy four
conditions: relevance, controllability, reliability, and efficiency. We review
the current state of the art of digital and analog quantum simulators. Whereas
so far the majority of the focus, both theoretically and experimentally, has
been on controllability of relevant models, we emphasize here the need for a
careful analysis of reliability and efficiency in the presence of
imperfections. We discuss how disorder and noise can impact these conditions,
and illustrate our concerns with novel numerical simulations of a paradigmatic
example: a disordered quantum spin chain governed by the Ising model in a
transverse magnetic field. We find that disorder can decrease the reliability
of an analog quantum simulator of this model, although large errors in local
observables are introduced only for strong levels of disorder. We conclude that
the answer to the question "Can we trust quantum simulators?" is... to some
extent.
Comment: 20 pages. Minor changes with respect to version 2 (some additional explanations, added references...)
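The paradigmatic model named above, a transverse-field Ising chain H = -Σᵢ Jᵢ σᶻᵢ σᶻᵢ₊₁ - Σᵢ hᵢ σˣᵢ, can be exactly diagonalized for a handful of spins, and such small-scale exact benchmarks are how an analog simulator's reliability is checked numerically. A minimal dense-matrix sketch (the couplings and fields are illustrative, not the paper's disorder realizations):

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])   # Pauli x
sz = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli z
id2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-spin chain."""
    mats = [id2] * n
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def tfim_hamiltonian(J, h):
    """H = -sum_i J[i] sz_i sz_{i+1} - sum_i h[i] sx_i (open chain).

    J : nearest-neighbour couplings, length n - 1 (disorder goes here)
    h : transverse fields, length n
    """
    n = len(h)
    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n - 1):
        H -= J[i] * site_op(sz, i, n) @ site_op(sz, i + 1, n)
    for i in range(n):
        H -= h[i] * site_op(sx, i, n)
    return H

def ground_energy(J, h):
    return np.linalg.eigvalsh(tfim_hamiltonian(J, h))[0]
```

Sanity checks are exact: with all fields zero the open ferromagnetic chain has ground energy -Σᵢ Jᵢ, and the two-spin chain with J = h = 1 has ground energy -√5. Comparing such exact results against the output of a noisy or disordered device is precisely the reliability test the abstract calls for.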
Propagating compaction bands in confined compression of snow
Some materials are strong in response to a slowly applied deformation, yet weak when subject to rapid deformations—a materials property known as strain-rate softening [1]. Snow exhibits such behaviour: it is comparatively strong at low deformation rates, where it is quasi-plastic, but weak at high rates, where it deforms in a quasi-brittle manner [2]. During deformation, strain-rate-softening materials ranging from metals [3,4] to micellar systems [5] exhibit complex spatio-temporal deformation patterns, including regular or chaotic deformation-rate oscillations and travelling deformation waves [6]. Here we report a systematic investigation of such phenomena in snow and show that snow can deform with the formation and propagation of localized deformation bands accompanied by oscillations of the driving force. We propose a model that accounts for these observations. Our findings demonstrate that in snow, strain localization can occur even in initially homogeneous samples deforming under homogeneous loads.
