149 research outputs found
Impact of lexical and sentiment factors on the popularity of scientific papers
We investigate how textual properties of scientific papers relate to the
number of citations they receive. Our main finding is that the correlations are
non-linear and affect most-cited and typical papers differently. For instance,
we find that in most journals short titles correlate positively with citations
only for the most-cited papers; for typical papers the correlation is in most
cases negative. Our analysis of 6 different factors, calculated at both the
title and abstract level of 4.3 million papers in over 1500 journals, reveals
the number of authors and the length and complexity of the abstract as having
the strongest (positive) influence on the number of citations. Comment: 9 pages, 3 figures, 3 tables
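The quantile-dependent correlations described above can be illustrated on synthetic data (all numbers and variable names below are hypothetical, not the paper's data): when a small "viral" subpopulation follows a different rule than typical papers, the correlation between a textual feature and citations changes sign between the two groups.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
title_len = rng.uniform(5, 30, n)            # hypothetical title lengths

# Typical papers: citations rise mildly with title length (toy assumption).
citations = 5.0 + 0.3 * title_len + rng.normal(0, 2, n)
# A small "viral" minority: very high citations, favoring SHORT titles.
viral = rng.random(n) < 0.05
citations[viral] = 300.0 - 5.0 * title_len[viral] + rng.normal(0, 10, viral.sum())

cut = 50.0                                   # cleanly separates the two regimes
top, typical = citations > cut, citations <= cut

corr_top = np.corrcoef(title_len[top], citations[top])[0, 1]
corr_typical = np.corrcoef(title_len[typical], citations[typical])[0, 1]
print(corr_top, corr_typical)                # opposite signs by construction
```

A correlation computed over the whole sample would hide this structure, which is why the split into typical and most-cited papers matters.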
Recurrence time analysis, long-term correlations, and extreme events
The recurrence times between extreme events have been the central point of
statistical analyses in many different areas of science. Simultaneously, the
Poincaré recurrence time has been extensively used to characterize nonlinear
dynamical systems. We compare the main properties of these statistical methods,
pointing out their consequences for the recurrence analysis performed in time
series. In particular, we analyze the dependence of the mean recurrence time
and of the recurrence-time statistics on the probability density function, on
the interval in which the recurrences are observed, and on the temporal
correlations of the time series. In the case of long-term correlations, we verify
the validity of the stretched-exponential distribution, which is uniquely
defined by the exponent γ, at the same time showing that it is
restricted to the class of linear long-term correlated processes. Simple
transformations are able to modify the correlations of time series, leading to
stretched-exponential recurrence-time statistics with different exponents γ,
which shows a lack of invariance under the change of observables. Comment: 9 pages, 7 figures
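As a baseline for the recurrence-time statistics discussed above: for an uncorrelated (white-noise) series, the waiting times between threshold exceedances are geometrically distributed with mean 1/q, where q is the exceedance probability. A minimal sketch of this baseline (not the paper's long-term-correlated setting):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2_000_000)        # uncorrelated (white-noise) time series
threshold = np.quantile(x, 0.99)      # "extreme events": the top 1% of values

events = np.flatnonzero(x > threshold)
recurrence_times = np.diff(events)    # waiting times between extreme events

mean_rt = recurrence_times.mean()
print(mean_rt)  # close to 1/0.01 = 100 for uncorrelated data
```

Long-term correlations deform the geometric distribution of these waiting times into the stretched-exponential form discussed in the abstract, while leaving the mean fixed at 1/q.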
Stochastic model for the vocabulary growth in natural languages
We propose a stochastic model for the number of different words in a given
database which incorporates the dependence on the database size and historical
changes. The main feature of our model is the existence of two different
classes of words: (i) a finite number of core words, which have higher frequency
and do not affect the probability of a new word being used; and (ii) the
remaining, virtually infinite number of noncore words, which have lower frequency
and, once used, reduce the probability of a new word being used in the future.
Our model relies on a careful analysis of the Google-ngram database of books
published in the last centuries, and its main consequence is the generalization
of Zipf's and Heaps' laws to two scaling regimes. We confirm that these
generalizations yield the best simple description of the data among generic
descriptive models and that the two free parameters depend only on the language
but not on the database. From the point of view of our model, the main change on
historical time scales is in the composition of the specific words included in the
finite list of core words, which we observe to decay exponentially in time at
a rate of approximately 30 words per year for English. Comment: corrected typos and errors in reference list; 10 pages text, 15 pages supplemental material; to appear in Physical Review
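A loose sketch of the two-class mechanism described above, with purely hypothetical parameters (CORE, P_CORE, and ALPHA are illustrative, not the paper's fitted values): core words never change the chance that the next token is new, while each new noncore word lowers it.

```python
import numpy as np

rng = np.random.default_rng(2)

CORE = 1_000    # size of the finite core vocabulary (hypothetical)
P_CORE = 0.6    # chance a token is a core word (hypothetical)
ALPHA = 10.0    # controls how quickly new noncore words appear (hypothetical)

noncore_seen = 0
vocab_growth = []
for _ in range(200_000):
    if rng.random() < P_CORE:
        pass  # core word: never changes the chance of future new words
    elif rng.random() < ALPHA / (ALPHA + noncore_seen):
        noncore_seen += 1  # a brand-new noncore word enters the text

    vocab_growth.append(CORE + noncore_seen)

v_half, v_full = vocab_growth[100_000 - 1], vocab_growth[-1]
print(v_half, v_full)  # doubling the text far less than doubles the vocabulary
```

The decreasing new-word probability is what produces the sublinear (Heaps-like) vocabulary growth; the fixed core contributes only a constant offset.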
Predictability of extreme events in social media
It is part of our daily social-media experience that seemingly ordinary items
(videos, news, publications, etc.) unexpectedly gain an enormous amount of
attention. Here we investigate how unexpected these events are. We propose a
method that, given some information on the items, quantifies the predictability
of events, i.e., the potential of identifying in advance the most successful
items, defined as the upper bound for the quality of any prediction based on the
same information. Applying this method to different data, ranging from views of
YouTube videos to posts in Usenet discussion groups, we invariably find that
the predictability increases for the most extreme events. This indicates that,
despite the inherently stochastic collective dynamics of users, efficient
prediction is possible for the most extreme events. Comment: 13 pages, 3 figures
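The notion of predictability as a retrieval problem can be sketched on synthetic data: rank items by the information available in advance (here a hypothetical "early views" signal, not the paper's data) and measure the precision with which the eventually most successful items are identified, against a random-guess baseline.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

early_views = rng.lognormal(0, 1, n)                # information known in advance
final_views = early_views * rng.lognormal(0, 1, n)  # eventual popularity

k = n // 100                                  # try to spot the top 1% of items
extremes = np.argsort(final_views)[-k:]       # the actually-extreme items
predicted = np.argsort(early_views)[-k:]      # best guess from early information

precision = len(np.intersect1d(extremes, predicted)) / k
baseline = k / n                              # precision of a random guess
print(precision, baseline)
```

The ratio of the achieved precision to the baseline is one simple way to express how much the early information helps; the paper's predictability is the upper bound over all predictors using the same information.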
Temporal-varying failures of nodes in networks
We consider networks in which random walkers are removed because of the
failure of specific nodes. We interpret the rate of loss as a measure of the
importance of nodes, a notion we denote failure-centrality. We show that the
degree of a node is not sufficient to determine this measure and that, to a
first approximation, the shortest loops through the node have to be taken into
account. We propose approximations of the failure-centrality which are valid
for temporally varying failures, and we dwell on the possibility of externally
changing the relative importance of nodes in a given network by exploiting the
interference between the loops of a node and the cycles of the temporal pattern
of failures. In the limit of long failure cycles we show analytically that the
escape rate in a node is larger than the one estimated from a stochastic failure
with the same failure probability. We test our general formalism in two
real-world networks (air transportation and e-mail users) and show how
communities lead to deviations from predictions for failures in hubs. Comment: 7 pages, 3 figures
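A minimal sketch of the loss-rate idea, for a static always-on failure rather than the paper's temporal patterns: remove the failed node from the random-walk transition matrix and read the decay rate of surviving walkers off the spectral radius of the resulting substochastic matrix. The toy graph and the function name below are hypothetical illustrations.

```python
import numpy as np

# Toy undirected network: a star around node 0 plus one extra edge (1, 2).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
P = A / A.sum(axis=1, keepdims=True)       # random-walk transition matrix

def failure_centrality(P, failed):
    """Decay rate of walkers when node `failed` absorbs all it receives."""
    keep = [i for i in range(len(P)) if i != failed]
    Q = P[np.ix_(keep, keep)]              # substochastic: mass leaks at `failed`
    lam = max(abs(np.linalg.eigvals(Q)))   # asymptotic survival factor per step
    return -np.log(lam) if lam > 0 else np.inf

rate_hub = failure_centrality(P, 0)        # failing the hub drains walkers fast
rate_leaf = failure_centrality(P, 3)       # failing a leaf drains them slowly
print(rate_hub, rate_leaf)
```

Comparing nodes of equal degree in richer graphs shows that loops through the node, not degree alone, set this rate, which is the abstract's point.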
Scaling laws and fluctuations in the statistics of word frequencies
In this paper we combine statistical analysis of large text databases and
simple stochastic models to explain the appearance of scaling laws in the
statistics of word frequencies. Besides the sublinear scaling of the vocabulary
size with database size (Heaps' law), here we report a new scaling of the
fluctuations around this average (fluctuation scaling analysis). We explain
both scaling laws by modeling the usage of words by simple stochastic processes
in which the overall distribution of word frequencies is fat-tailed (Zipf's
law) and the frequency of a single word is subject to fluctuations across
documents (as in topic models). In this framework, the mean and the variance of
the vocabulary size can be expressed as quenched averages, implying that: (i)
the inhomogeneous dissemination of words causes a reduction of the average
vocabulary size in comparison to the homogeneous case, and (ii) correlations in
the co-occurrence of words lead to an increase in the variance, and the
vocabulary size becomes a non-self-averaging quantity. We address the
implications of these observations for the measurement of lexical richness. We
test our results in three large text databases (Google-ngram, English
Wikipedia, and a collection of scientific articles). Comment: 19 pages, 4 figures
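Heaps' sublinear scaling can be reproduced in a few lines by drawing tokens from a fat-tailed (Zipf-like) frequency distribution and counting distinct words; the exponent 1.8 below is an illustrative choice, not fitted to the paper's databases.

```python
import numpy as np

rng = np.random.default_rng(4)

def vocab_size(n_tokens):
    """Distinct word count when tokens follow a fat-tailed (Zipf) distribution."""
    tokens = rng.zipf(1.8, size=n_tokens)   # 1.8 is an illustrative exponent
    return len(np.unique(tokens))

sizes = [10_000, 20_000, 40_000]
means = [np.mean([vocab_size(n) for _ in range(20)]) for n in sizes]
print(means)  # grows with database size, but sublinearly (Heaps' law)
```

Repeating the runs also gives the fluctuations around the average vocabulary size, the quantity whose scaling the abstract analyzes; the inner list comprehension already collects the realizations needed for that.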
Stochastic perturbations in open chaotic systems: random versus noisy maps
We investigate the effects of random perturbations on fully chaotic open
systems. Perturbations can be applied to each trajectory independently (white
noise) or simultaneously to all trajectories (random map). We compare these two
scenarios by generalizing the theory of open chaotic systems and introducing a
time-dependent conditionally-map-invariant measure. For the same perturbation
strength we show that the escape rate of the random map is always larger than
that of the noisy map. In random maps we show that the escape rate and the
dimensions of the relevant fractal sets often depend nonmonotonically on
the intensity of the random perturbation. We discuss the accuracy (bias) and
precision (variance) of finite-size estimators of these quantities, and show
that the improvement of the precision of the estimations with the number of
trajectories is extremely slow. We also argue that the
finite-size estimators are typically biased. General theoretical results
are combined with analytical calculations and numerical simulations in
area-preserving baker maps. Comment: 12 pages, 3 figures, 1 table, manuscript submitted to Physical Review
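The noisy-map vs random-map distinction can be sketched with an even simpler open system than the baker map, here a doubling map with a hole: the only difference between the two scenarios is whether the perturbation at each time step is drawn once for all trajectories or independently per trajectory. All parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, EPS = 100_000, 12, 0.05
A, B = 0.45, 0.55                  # the "hole": trajectories landing here escape

def escape_rate(shared_noise):
    x = rng.random(N)
    alive = np.ones(N, dtype=bool)
    for _ in range(T):
        if shared_noise:           # random map: ONE kick applied to everybody
            xi = rng.uniform(-EPS, EPS)
        else:                      # noisy map: an independent kick per trajectory
            xi = rng.uniform(-EPS, EPS, N)
        x = (2.0 * x + xi) % 1.0   # perturbed doubling map on the unit interval
        alive &= ~((A < x) & (x < B))
    return -np.log(alive.sum() / N) / T

rate_noisy = escape_rate(shared_noise=False)
rate_random = escape_rate(shared_noise=True)
print(rate_noisy, rate_random)     # both roughly -ln(1 - 0.1), the hole's measure
```

In this toy setting the two rates are close; the paper's result is that, for the same perturbation strength, the random-map rate is always the larger of the two.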
Quantum signatures of classical multifractal measures
A clear signature of classical chaoticity in the quantum regime is the
fractal Weyl law, which connects the density of eigenstates to the dimension D_0
of the classical invariant set of open systems. Quantum systems of
interest are often partially open (e.g., cavities in which trajectories
are partially reflected/absorbed). In the corresponding classical systems D_0
is trivial (equal to the phase-space dimension), and the fractality is
manifested in the (multifractal) spectrum of Rényi dimensions D_q. In this
paper we investigate the effect of such multifractality on the Weyl law. Our
numerical simulations in area-preserving maps show, for a wide range of
configurations and system sizes, that (i) the Weyl law is governed by a
dimension different from D_0 and (ii) the observed dimension oscillates as
a function of the system size and other relevant parameters. We propose a classical model
which considers an undersampled measure of the chaotic invariant set, explains
our two observations, and predicts that the Weyl law is governed by a
non-trivial dimension in the semi-classical limit.
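The Rényi dimension spectrum D_q mentioned above can be computed from partition sums. A sketch on a textbook multifractal (the binomial measure on [0, 1], not the paper's chaotic invariant set), for which D_q has the closed form -log2(p^q + (1-p)^q)/(q-1):

```python
import numpy as np

P_SPLIT, LEVELS = 0.7, 14    # binomial cascade: a classic multifractal measure

# Build the measure on the 2**LEVELS dyadic intervals of [0, 1].
mu = np.array([1.0])
for _ in range(LEVELS):
    mu = np.concatenate([P_SPLIT * mu, (1 - P_SPLIT) * mu])

def renyi_dimension(q):
    """Estimate D_q (q != 1) from the partition sum of mu_i^q at scale 2**-LEVELS."""
    z = np.sum(mu ** q)
    return np.log(z) / ((q - 1) * LEVELS * np.log(0.5))

print(renyi_dimension(0), renyi_dimension(2), renyi_dimension(5))
# D_0 = 1 (the support fills [0, 1]); D_q decreases with q (multifractality).
```

A non-trivial D_0 with a flat D_q spectrum marks an ordinary fractal; a trivial D_0 with a q-dependent spectrum, as in the partially open systems above, is what the abstract calls multifractality.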
Stickiness in mushroom billiards
We investigate dynamical properties of chaotic trajectories in mushroom
billiards. These billiards present a well-defined simple border between a
single regular region and a single chaotic component. We find that the
stickiness of chaotic trajectories near the border of the regular region occurs
through an infinite number of marginally unstable periodic orbits. These orbits
have zero measure and thus do not affect the ergodicity of the chaotic region.
Nevertheless, they govern the main dynamical properties of the system. In
particular, we show that the marginally unstable periodic orbits explain the
periodicity and the power-law behavior observed in the
distribution of recurrence times. Comment: 7 pages, 6 figures (corrected version with a new figure)
