On the role of words in the network structure of texts: application to authorship attribution
Well-established automatic analyses of texts mainly consider frequencies of
linguistic units, e.g. letters, words and bigrams, while methods based on
co-occurrence networks consider the structure of texts regardless of the node
labels (i.e. the semantics of the words). In this paper, we reconcile these distinct
viewpoints by introducing a generalized similarity measure to compare texts
which accounts for both the network structure of texts and the role of
individual words in the networks. We use the similarity measure for authorship
attribution of three collections of books, each composed of 8 authors and 10
books per author. High accuracy rates were obtained, with typical values from
90% to 98.75%, much higher than with the traditional TF-IDF approach for
the same collections. These accuracies are also higher than those obtained by
taking only the topology of networks into account. We conclude that the different properties of
specific words on the macroscopic scale structure of a whole text are as
relevant as their frequency of appearance; conversely, considering the identity
of nodes brings further knowledge about a piece of text represented as a
network.
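
For orientation, the TF-IDF baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' setup: the weighting variant (relative term frequency times `log(N / df)`), the cosine comparison, and the function names `tfidf_vectors` and `cosine` are all assumptions made here, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {word: tf-idf weight} dict.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A frequency-only comparison of this kind ignores where words sit in the co-occurrence network, which is the gap the abstract's generalized similarity measure is said to address.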
Models based on Mittag-Leffler functions for anomalous relaxation in dielectrics
We revisit the Mittag-Leffler functions of a real variable $t$, with one, two
and three order-parameters $\{\alpha, \beta, \gamma\}$, as far as their Laplace
transform pairs and complete monotonicity properties are concerned. These
functions, subjected to the requirement to be completely monotone for $t > 0$,
are shown to be suitable models for non-Debye relaxation phenomena in
dielectrics including as particular cases the classical models referred to as
Cole-Cole, Davidson-Cole and Havriliak-Negami. We show 3D plots of the response
functions and of the corresponding spectral distributions, keeping fixed one of
the three order-parameters.
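
As a rough numerical illustration, the three-parameter (Prabhakar) Mittag-Leffler function can be evaluated from its standard power series, sum over k of (gamma)_k z^k / (k! Gamma(alpha k + beta)), which reduces to the two-parameter function for gamma = 1 and to the one-parameter function for beta = gamma = 1. The sketch below is not taken from the paper: the function name `mittag_leffler`, the truncation rule, and the default number of terms are assumptions, and a plain truncated series is only reasonable for moderate |z|.

```python
import math

def mittag_leffler(z, alpha, beta=1.0, gamma=1.0, terms=100):
    """Truncated power series of the Prabhakar function E^gamma_{alpha,beta}(z).

    Assumes alpha > 0 and beta > 0; intended for moderate |z| only.
    """
    total = 0.0
    coeff = 1.0  # running value of (gamma)_k / k!, starting at k = 0
    for k in range(terms):
        arg = alpha * k + beta
        if arg > 170.0:
            # math.gamma overflows a float slightly above this argument
            break
        total += coeff * z**k / math.gamma(arg)
        coeff *= (gamma + k) / (k + 1)
    return total

# With alpha = beta = gamma = 1 the series is the exponential, so
# E_1(-t) = exp(-t), the classical Debye-type decay.
debye = [mittag_leffler(-t, 1.0) for t in (0.0, 0.5, 1.0)]

# A slower-than-exponential decay of the form E_alpha(-t**alpha), alpha < 1.
anomalous = [mittag_leffler(-(t**0.7), 0.7) for t in (0.0, 0.5, 1.0)]
```

For large arguments the alternating series loses accuracy through cancellation, so dedicated algorithms are normally preferred there.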
Text authorship identified using the dynamics of word co-occurrence networks
The identification of authorship in disputed documents still requires human
expertise, which is now unfeasible for many tasks owing to the large volumes of
text and authors in practical applications. In this study, we introduce a
methodology based on the dynamics of word co-occurrence networks representing
written texts to classify a corpus of 80 texts by 8 authors. The texts were
divided into sections with an equal number of linguistic tokens, from which time
series were created for 12 topological metrics. The series were proven to be
stationary (p-value > 0.05), which permits the use of distribution moments as
learning attributes. With an optimized supervised learning procedure using a
Radial Basis Function Network, 68 out of 80 texts were correctly classified,
i.e. a remarkable 85% author matching success rate. Therefore, fluctuations in
purely dynamic network metrics were found to characterize authorship, thus
opening the way for the description of texts in terms of small evolving
networks. Moreover, the approach introduced allows for comparison of texts with
diverse characteristics in a simple, fast fashion.
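
The pipeline described in this abstract (equal-size sections, one network per section, a time series per metric, distribution moments as features) can be sketched roughly as below. This is an assumption-laden illustration, not the authors' code: the abstract does not name its 12 metrics, so average degree stands in for them, edges link adjacent tokens only, and all function names are hypothetical.

```python
from statistics import mean, pvariance

def cooccurrence_edges(tokens):
    """Undirected edges between distinct adjacent tokens."""
    edges = set()
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            edges.add((a, b) if a < b else (b, a))
    return edges

def average_degree(tokens):
    """Average node degree of the co-occurrence network of one section."""
    nodes = set(tokens)
    edges = cooccurrence_edges(tokens)
    return 2 * len(edges) / len(nodes) if nodes else 0.0

def metric_series(tokens, section_size, metric=average_degree):
    """Evaluate a metric on consecutive sections with equal token counts.

    Trailing tokens that do not fill a whole section are dropped.
    """
    n_sections = len(tokens) // section_size
    return [
        metric(tokens[i * section_size:(i + 1) * section_size])
        for i in range(n_sections)
    ]

def moment_features(series):
    """First two distribution moments of a metric time series."""
    return {"mean": mean(series), "variance": pvariance(series)}

tokens = "a b a b c d e f".split()
series = metric_series(tokens, section_size=4)
features = moment_features(series)
```

In a fuller version, one such feature dictionary per metric would be concatenated into the attribute vector passed to the classifier.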