Observations on the Dynamic Evolution of Peer-to-Peer Networks
A fundamental theoretical challenge in peer-to-peer systems is proving statements about the evolution of the system while nodes are continuously joining and leaving. Because the system will operate for an infinite time, performance measures based on runtime are uninformative; instead, we must study the rate at which nodes consume resources to maintain the system state.
Do Diffusion Protocols Govern Cascade Growth?
Large cascades can develop in online social networks as people share
information with one another. Though simple reshare cascades have been studied
extensively, the full range of cascading behaviors on social media is much more
diverse. Here we study how diffusion protocols, or the social exchanges that
enable information transmission, affect cascade growth, analogous to the way
communication protocols define how information is transmitted from one point to
another. Studying 98 of the largest information cascades on Facebook, we find a
wide range of diffusion protocols - from cascading reshares of images, which
use a simple protocol of tapping a single button for propagation, to the ALS
Ice Bucket Challenge, whose diffusion protocol involved individuals creating
and posting a video, and then nominating specific others to do the same. We
find recurring classes of diffusion protocols, and identify two key
counterbalancing factors in the construction of these protocols, with
implications for a cascade's growth: the effort required to participate in the
cascade, and the social cost of staying on the sidelines. Protocols requiring
greater individual effort slow down a cascade's propagation, while those
imposing a greater social cost of not participating increase the cascade's
adoption likelihood. The predictability of transmission also varies with
protocol. But regardless of mechanism, the cascades in our analysis all have a
similar reproduction number (approximately 1.8), meaning that lower rates of
exposure can be offset with higher per-exposure rates of adoption. Last, we
show how a cascade's structure can not only differentiate these protocols, but
also be modeled through branching processes. Together, these findings provide a
framework for understanding how a wide variety of information cascades can
achieve substantial adoption across a network. Comment: ICWSM 201
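The trade-off between exposure rate and per-exposure adoption rate can be illustrated with a generic branching-process sketch. This is not the paper's model: it assumes a simple Galton-Watson process in which every adopter exposes a fixed number of people and each exposure leads to adoption independently with a fixed probability. The function names and all parameter values are hypothetical.

```python
import random


def reproduction_number(exposures: int, adoption_prob: float) -> float:
    """Expected number of new adopters caused by one adopter (assumed model)."""
    return exposures * adoption_prob


def simulate_cascade(exposures, adoption_prob, max_generations=10, rng=None):
    """Simulate one cascade and return its total size, including the seed.

    Each adopter exposes `exposures` people; each exposed person adopts
    independently with probability `adoption_prob`.
    """
    rng = rng or random.Random()
    active = 1  # adopters in the current generation
    total = 1   # all adopters so far
    for _ in range(max_generations):
        new = sum(
            1 for _ in range(active * exposures) if rng.random() < adoption_prob
        )
        if new == 0:
            break  # the cascade died out
        total += new
        active = new
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical protocols with the same expected reproduction number:
    # many low-probability exposures versus few high-probability exposures.
    for exposures, prob in [(18, 0.1), (3, 0.6)]:
        sizes = [simulate_cascade(exposures, prob, rng=rng) for _ in range(200)]
        print(exposures, prob, reproduction_number(exposures, prob),
              sum(sizes) / len(sizes))
```

Under these assumptions both settings have the same expected growth per generation, although the variance of cascade sizes differs between them.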
Fast matrix computations for pair-wise and column-wise commute times and Katz scores
We first explore methods for approximating the commute time and Katz score
between a pair of nodes. These methods are based on the approach of matrices,
moments, and quadrature developed in the numerical linear algebra community.
They rely on the Lanczos process and provide upper and lower bounds on an
estimate of the pair-wise scores. We also explore methods to approximate the
commute times and Katz scores from a node to all other nodes in the graph.
Here, our approach for the commute times is based on a variation of the
conjugate gradient algorithm, and it provides an estimate of all the diagonals
of the inverse of a matrix. Our technique for the Katz scores is based on
exploiting an empirical localization property of the Katz matrix. We adopt
algorithms used for computing personalized PageRank to these Katz scores and
theoretically show that this approach is convergent. We evaluate these methods
on 17 real-world graphs ranging in size from 1000 to 1,000,000 nodes. Our
results show that our pair-wise commute time method and column-wise Katz
algorithm both have attractive theoretical properties and empirical
performance. Comment: 35 pages, journal version of
http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for
publication. Please see
http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for
supplemental code
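For readers unfamiliar with the quantity being approximated, the following minimal sketch computes the Katz scores from one node to all others. It assumes the standard definition, in which the Katz matrix is the sum over l >= 1 of alpha^l A^l for an adjacency matrix A, and it uses a plain fixed-point iteration rather than the Lanczos-based or PageRank-style methods described in the abstract. The function name `katz_column` and the example graph are hypothetical.

```python
def katz_column(adj, source, alpha, tol=1e-12, max_iter=10_000):
    """Katz scores between `source` and every node of a small dense graph.

    Solves x = e_source + alpha * A x by fixed-point iteration, which
    converges when alpha is below 1 / (spectral radius of A), and then
    subtracts e_source to drop the zero-length walk.
    """
    n = len(adj)
    x = [0.0] * n
    x[source] = 1.0
    for _ in range(max_iter):
        nxt = [alpha * sum(adj[u][v] * x[v] for v in range(n)) for u in range(n)]
        nxt[source] += 1.0
        converged = max(abs(a - b) for a, b in zip(nxt, x)) < tol
        x = nxt
        if converged:
            break
    x[source] -= 1.0
    return x


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 (illustrative only).
    path = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
    print(katz_column(path, source=0, alpha=0.1))
```

This dense iteration touches every node on every pass; the localization property mentioned in the abstract is what allows the authors' approach to avoid that cost on large graphs.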
Navigability is a Robust Property
The Small World phenomenon has inspired researchers across a number of
fields. A breakthrough in its understanding was made by Kleinberg who
introduced Rank Based Augmentation (RBA): add to each vertex independently an
arc to a random destination selected from a carefully crafted probability
distribution. Kleinberg proved that RBA makes many networks navigable, i.e., it
allows greedy routing to successfully deliver messages between any two vertices
in a polylogarithmic number of steps. We prove that navigability is an inherent
property of many random networks, arising without coordination, or even
independence assumptions
An algorithmic approach to social networks
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 109-120). Social networks consist of a set of individuals and some form of social relationship that ties the individuals together. In this thesis, we use algorithmic techniques to study three aspects of social networks: (1) we analyze the "small-world" phenomenon by examining the geographic patterns of friendships in a large-scale social network, showing how this linkage pattern can itself explain the small-world results; (2) using existing patterns of friendship in a social network and a variety of graph-theoretic techniques, we show how to predict new relationships that will form in the network in the near future; and (3) we show how to infer social connections over which information flows in a network, by examining the times at which individuals in the network exhibit certain pieces of information, or interest in certain topics. Our approach is simultaneously theoretical and data-driven, and our results are based upon real experiments on real social-network data in addition to theoretical investigations of mathematical models of social networks. by David Liben-Nowell. Ph.D
The evolution of interdisciplinarity in physics research
Science, being a social enterprise, is subject to fragmentation into groups
that focus on specialized areas or topics. Often new advances occur through
cross-fertilization of ideas between sub-fields that otherwise have little
overlap as they study dissimilar phenomena using different techniques. Thus to
explore the nature and dynamics of scientific progress one needs to consider
the large-scale organization and interactions between different subject areas.
Here, we study the relationships between the sub-fields of Physics using the
Physics and Astronomy Classification Scheme (PACS) codes employed for
self-categorization of articles published over the past 25 years (1985-2009).
We observe a clear trend towards increasing interactions between the different
sub-fields. The network of sub-fields also exhibits core-periphery
organization, the nucleus being dominated by Condensed Matter and General
Physics. However, over time Interdisciplinary Physics is steadily increasing
its share in the network core, reflecting a shift in the overall trend of
Physics research. Comment: Published version, 10 pages, 8 figures + Supplementary Informatio
Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions
Algorithms to find optimal alignments among strings, or to find a
parsimonious summary of a collection of strings, are well studied in a variety
of contexts, addressing a wide range of interesting applications. In this
paper, we consider chain letters, which contain a growing sequence of
signatories added as the letter propagates. The unusual constellation of
features exhibited by chain letters (one-ended growth, divergence, and
mutation) makes their propagation, and thus the corresponding reconstruction
problem, both distinctive and rich. Here, inspired by these chain letters, we
formally define the problem of computing an optimal summary of a set of
diverging string sequences. From a collection of these sequences of names, with
each sequence noisily corresponding to a branch of the unknown tree
representing the letter's true dissemination, can we efficiently and accurately
reconstruct that tree? In this paper, we give efficient exact
algorithms for this summarization problem when the number of sequences is
small; for larger sets of sequences, we prove hardness and provide an efficient
heuristic algorithm. We evaluate this heuristic on synthetic data sets chosen
to emulate real chain letters, showing that our algorithm is competitive with
or better than previous approaches, and that it also comes close to finding the
true trees in these synthetic datasets. Comment: 18 pages, 6 figures. Accepted to Combinatorial Pattern Matching (CPM)
202
From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles
The inference of network topologies from relational data is an important
problem in data analysis. Exemplary applications include the reconstruction of
social ties from data on human interactions, the inference of gene
co-expression networks from DNA microarray data, or the learning of semantic
relationships based on co-occurrences of words in documents. Solving these
problems requires techniques to infer significant links in noisy relational
data. In this short paper, we propose a new statistical modeling framework to
address this challenge. It builds on generalized hypergeometric ensembles, a
class of generative stochastic models that give rise to analytically tractable
probability spaces of directed, multi-edge graphs. We show how this framework
can be used to assess the significance of links in noisy relational data. We
illustrate our method on two data sets capturing spatio-temporal proximity
relations between actors in a social system. The results show that our
analytical framework provides a new approach to infer significant links from
relational data, with interesting perspectives for the mining of data on social
systems. Comment: 10 pages, 8 figures, accepted at SocInfo201
Collaborative filtering with diffusion-based similarity on tripartite graphs
Collaborative tags play an increasingly important role in the
organization of information systems. In this paper, we study a personalized
recommendation model making use of the ternary relations among users, objects
and tags. We propose a measure of user similarity based on users' preferences and
tagging information. Two kinds of similarities between users are calculated by
using a diffusion-based process, which are then integrated for recommendation.
We test the proposed method in a standard collaborative filtering framework
with three metrics: ranking score, Recall and Precision, and demonstrate that
it performs better than the commonly used cosine similarity. Comment: 8 pages, 4 figures, 1 tabl
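To make the comparison with cosine similarity concrete, here is a small sketch on a user-object bipartite graph only. The abstract does not give the paper's formula, so the diffusion measure below is an assumption: user j spreads one unit of resource evenly over the objects they collected, each object passes its share evenly to its collectors, and the amount reaching user i is taken as the similarity. The tag side of the tripartite graph is omitted, and all names and data are hypothetical.

```python
from math import sqrt


def cosine_similarity(items_i, items_j):
    """Cosine similarity between two users' sets of collected objects."""
    if not items_i or not items_j:
        return 0.0
    return len(items_i & items_j) / sqrt(len(items_i) * len(items_j))


def diffusion_similarity(user_items, i, j):
    """Fraction of user j's unit resource that reaches user i (assumed form).

    j splits one unit evenly over their objects; every object splits what it
    receives evenly over all users who collected it.
    """
    # Degree of each object: how many users collected it.
    degree = {}
    for items in user_items.values():
        for obj in items:
            degree[obj] = degree.get(obj, 0) + 1
    if not user_items[j]:
        return 0.0
    shared = user_items[i] & user_items[j]
    return sum(1.0 / degree[obj] for obj in shared) / len(user_items[j])


if __name__ == "__main__":
    data = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"y"}}
    print(cosine_similarity(data["a"], data["b"]))
    print(diffusion_similarity(data, "a", "b"))
    print(diffusion_similarity(data, "a", "c"))
```

Unlike cosine similarity, this diffusion measure is asymmetric and discounts objects that many users share, which is one plausible reason such measures can behave differently in recommendation.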
Risk-Averse Matchings over Uncertain Graph Databases
A large number of applications, such as querying sensor networks and
analyzing protein-protein interaction (PPI) networks, rely on mining uncertain
graph and hypergraph databases. In this work we study the following problem:
given an uncertain, weighted (hyper)graph, how can we efficiently find a
(hyper)matching with high expected reward and low risk?
This problem naturally arises in the context of several important
applications, such as online dating, kidney exchanges, and team formation. We
introduce a novel formulation for finding matchings with maximum expected
reward and bounded risk under a general model of uncertain weighted
(hyper)graphs that we introduce in this work. Our model generalizes
probabilistic models used in prior work, and captures both continuous and
discrete probability distributions, thus allowing it to handle privacy-related
applications that inject appropriately distributed noise to (hyper)edge
weights. Given that our optimization problem is NP-hard, we turn our attention
to designing efficient approximation algorithms. For the case of uncertain
weighted graphs, we provide a -approximation algorithm, and a
-approximation algorithm with near optimal run time. For the case
of uncertain weighted hypergraphs, we provide a
-approximation algorithm, where k is the rank of the
hypergraph (i.e., any hyperedge includes at most k nodes), that runs in
almost (modulo log factors) linear time.
We complement our theoretical results by testing our approximation algorithms
on a wide variety of synthetic experiments, where we observe in a controlled
setting interesting findings on the trade-off between reward and risk. We also
provide an application of our formulation for providing recommendations of
teams that are likely to collaborate and have high impact. Comment: 25 page
