Fast Gaussian Pairwise Constrained Spectral Clustering
We consider the problem of spectral clustering with partial supervision in the form of must-link and cannot-link constraints. Such pairwise constraints are common in problems like coreference resolution in natural language processing. The approach developed in this paper is to learn a new representation space for the data together with a distance in this new space. The representation space is obtained through a constraint-driven linear transformation of a spectral embedding of the data. Constraints are expressed with a Gaussian function that locally reweights the similarities in the projected space. A global, non-convex optimization objective is then derived, and the model is learned via gradient descent techniques. Our algorithm is evaluated on standard datasets and compared with state-of-the-art algorithms, like [14,18,31]. Results on these datasets, as well as on the CoNLL-2012 coreference resolution shared task dataset, show that our algorithm significantly outperforms related approaches and is also much more scalable.
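To make the idea of Gaussian reweighting in a projected space concrete, here is a minimal sketch under stated assumptions; it is not the authors' objective. It assumes a spectral embedding Y (one row per point) has already been computed, learns a linear map P by plain gradient descent, and uses a hypothetical loss that rewards a high Gaussian similarity exp(-||P(y_i - y_j)||² / (2σ²)) for must-link pairs and a low one for cannot-link pairs. The function names, σ, the learning rate and the step count are illustrative choices.

```python
import math

# Hypothetical sketch: learn a linear map P that reweights pairwise Gaussian
# similarities of a precomputed spectral embedding Y according to constraints.

def gaussian_sim(P, yi, yj, sigma):
    """Return s = exp(-||P(yi - yj)||^2 / (2 sigma^2)), plus u = P(yi - yj) and delta."""
    delta = [a - b for a, b in zip(yi, yj)]
    u = [sum(p * d for p, d in zip(row, delta)) for row in P]
    s = math.exp(-sum(c * c for c in u) / (2 * sigma ** 2))
    return s, u, delta

def constraint_loss(P, Y, must, cannot, sigma):
    """Toy objective: must-link pairs should be similar, cannot-link pairs dissimilar."""
    loss = sum(1 - gaussian_sim(P, Y[i], Y[j], sigma)[0] for i, j in must)
    loss += sum(gaussian_sim(P, Y[i], Y[j], sigma)[0] for i, j in cannot)
    return loss

def train(P, Y, must, cannot, sigma=1.0, lr=0.5, steps=100):
    """Plain gradient descent on constraint_loss with respect to P."""
    P = [row[:] for row in P]
    for _ in range(steps):
        grad = [[0.0] * len(P[0]) for _ in P]
        # d(1 - s)/dP = +(s / sigma^2) u delta^T ; d(s)/dP = -(s / sigma^2) u delta^T
        for pairs, sign in ((must, 1.0), (cannot, -1.0)):
            for i, j in pairs:
                s, u, delta = gaussian_sim(P, Y[i], Y[j], sigma)
                coef = sign * s / sigma ** 2
                for r in range(len(P)):
                    for c in range(len(P[0])):
                        grad[r][c] += coef * u[r] * delta[c]
        P = [[p - lr * g for p, g in zip(prow, grow)] for prow, grow in zip(P, grad)]
    return P

# Toy usage: three points in a 2-D embedding, one must-link and one cannot-link pair.
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
must, cannot = [(0, 1)], [(0, 2)]
P0 = [[1.0, 0.0], [0.0, 1.0]]
P = train(P0, Y, must, cannot)
print(constraint_loss(P0, Y, must, cannot, 1.0), constraint_loss(P, Y, must, cannot, 1.0))
```

On its own, this toy loss can be lowered simply by shrinking P along must-link directions and stretching it along cannot-link directions; an objective of the kind described in the abstract would also need terms tied to the data similarities, which this sketch leaves out.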
Ensemble approach for generalized network dismantling
Finding a set of nodes in a network whose removal fragments the network below some target size at minimal cost is called the network dismantling problem, and it belongs to the NP-hard computational class. In this paper, we explore the (generalized) network dismantling problem through a spectral approximation computed with a variant of the power-iteration method. In particular, we explore the landscape of network dismantling solutions by creating an ensemble of possible solutions from different initial conditions and different numbers of iterations of the spectral approximation.
Comment: 11 pages, 4 figures, 4 tables
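The ensemble idea can be illustrated with a simplified sketch; it is not the authors' algorithm. Each ensemble member runs a different number of power iterations on cI - L from a different random start, with c = 2·(max degree), a standard upper bound on the largest Laplacian eigenvalue, so that the dominant directions correspond to the smallest Laplacian eigenvalues. The constant vector is projected out so the iterate approximates the Fiedler vector. Its sign pattern defines a two-way split, the cut edges are covered greedily by nodes, and the cheapest removal set (unit node cost, an assumption) that leaves no component larger than `target` is kept. A full method would apply such splits recursively; all function names here are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical sketch of an ensemble of spectral splits for network dismantling.

def approx_fiedler(adj, iters, seed):
    """Power iteration on c*I - L with the all-ones direction projected out."""
    rng = random.Random(seed)
    n = len(adj)
    c = 2 * max(len(nbrs) for nbrs in adj)  # c >= largest Laplacian eigenvalue
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # (c*I - L) x, using (L x)_i = deg(i) * x_i - sum of x_j over neighbours j
        x = [c * x[i] - (len(adj[i]) * x[i] - sum(x[j] for j in adj[i])) for i in range(n)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the trivial (constant) eigenvector
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
    return x

def greedy_cover(edges):
    """Greedily pick nodes until every given edge has a removed endpoint."""
    remaining, removed = set(edges), set()
    while remaining:
        counts = defaultdict(int)
        for u, v in remaining:
            counts[u] += 1
            counts[v] += 1
        best = max(counts, key=counts.get)
        removed.add(best)
        remaining = {e for e in remaining if best not in e}
    return removed

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen, largest = set(removed), 0
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        largest = max(largest, size)
    return largest

def ensemble_dismantle(adj, target, seeds=range(5), iter_counts=(5, 20, 100)):
    """Keep the smallest node set (unit cost) whose removal meets the size target."""
    best = None
    for seed in seeds:
        for iters in iter_counts:
            x = approx_fiedler(adj, iters, seed)
            cut = [(u, v) for u in range(len(adj)) for v in adj[u]
                   if u < v and (x[u] >= 0) != (x[v] >= 0)]
            removed = greedy_cover(cut)
            if largest_component(adj, removed) <= target and (best is None or len(removed) < len(best)):
                best = removed
    return best

# Toy usage: two 4-cliques joined by the single edge (3, 4).
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)] + [(3, 4)]
adj = [[] for _ in range(8)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
print(ensemble_dismantle(adj, target=4))
```

On this toy graph, the converged members split the two cliques, and removing one endpoint of the bridge is enough to meet the target.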
Proteinortho: Detection of (Co-)orthologs in large-scale analysis
Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics, such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. Furthermore, the rapid pace at which new data become available makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases.
Results: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes.
Conclusions: Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
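Here is a minimal sketch of the basic reciprocal best hit idea, plus one simple way to relax it: a hypothetical `tolerance` that also accepts hits scoring at least that fraction of a query's best score. This illustrates the general heuristic under assumptions and is not Proteinortho's implementation or its parameters. The score dictionaries stand in for precomputed alignment scores (for example, bit scores) in each search direction.

```python
from collections import defaultdict

def near_best_hits(scores, tolerance):
    """Map each query to the targets scoring >= tolerance * its best score."""
    best = defaultdict(float)
    for (query, _), score in scores.items():
        best[query] = max(best[query], score)
    hits = defaultdict(set)
    for (query, target), score in scores.items():
        if score >= tolerance * best[query]:
            hits[query].add(target)
    return hits

def reciprocal_pairs(a_vs_b, b_vs_a, tolerance=1.0):
    """Pairs (a, b) where b is a (near-)best hit of a and a is a (near-)best hit of b."""
    forward = near_best_hits(a_vs_b, tolerance)
    backward = near_best_hits(b_vs_a, tolerance)
    return {(a, b) for a, targets in forward.items() for b in targets
            if a in backward.get(b, set())}

# Toy usage with made-up alignment scores in both directions.
a_vs_b = {("a1", "b1"): 100.0, ("a1", "b2"): 50.0, ("a2", "b2"): 80.0}
b_vs_a = {("b1", "a1"): 100.0, ("b2", "a2"): 80.0, ("b2", "a1"): 50.0}
print(sorted(reciprocal_pairs(a_vs_b, b_vs_a)))
print(sorted(reciprocal_pairs(a_vs_b, b_vs_a, tolerance=0.5)))
```

With `tolerance=1.0` only the strict reciprocal best pairs (a1, b1) and (a2, b2) survive; lowering it to 0.5 also admits (a1, b2), which is how a relaxed criterion can pick up co-orthologs.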
The Path Resistance Method for Bounding the Smallest Nontrivial Eigenvalue of a Laplacian
We introduce the path resistance method for lower bounds on the smallest nontrivial eigenvalue of the Laplacian matrix of a graph. The method is based on viewing the graph in terms of electrical circuits: it uses clique embeddings to produce lower bounds on λ₂, and star embeddings to produce lower bounds on the smallest Rayleigh quotient when there is a zero Dirichlet boundary condition. The method assigns priorities to the paths in the embedding; we show that, for an unweighted tree T, using uniform priorities for a clique embedding produces a lower bound on λ₂ that is off by at most an O(log diameter(T)) factor. We show that the best bounds this method can produce for clique embeddings are the same as for a related method that uses clique embeddings and edge lengths to produce bounds.
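A simpler, classical relative of such embedding bounds (not the path resistance method itself, which adds path priorities) is easy to state for trees. Route every edge (u, v) of the complete graph K_n along the unique tree path P_uv, and let β be the maximum over tree edges e of the sum of |P_uv| over all paths that use e. By Cauchy-Schwarz, L_{K_n} ⪯ β·L_T, so λ₂(T) ≥ λ₂(K_n)/β = n/β. The sketch below computes this bound; the function name is illustrative.

```python
from collections import defaultdict, deque
from math import cos, pi

def tree_embedding_bound(n, edges):
    """Lower bound n / beta on lambda_2 of a tree with vertices 0..n-1 (n >= 2).

    Every pair u < v (an edge of K_n) is routed along its unique tree path;
    beta is the largest total path length carried by any single tree edge.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    load = defaultdict(int)
    for u in range(n):
        parent, depth = {u: None}, {u: 0}
        queue = deque([u])
        while queue:  # BFS from u gives the unique path to every other vertex
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent:
                    parent[y], depth[y] = x, depth[x] + 1
                    queue.append(y)
        for v in range(u + 1, n):
            x = v
            while parent[x] is not None:  # charge |P_uv| to every edge on the path
                load[frozenset((x, parent[x]))] += depth[v]
                x = parent[x]
    return n / max(load.values())

# Toy usage: compare with the exact value 2 - 2cos(pi/n) for a path on n vertices.
for n in range(2, 7):
    path = [(i, i + 1) for i in range(n - 1)]
    print(n, tree_embedding_bound(n, path), 2 - 2 * cos(pi / n))
```

For the path on n vertices the exact value is λ₂ = 2 - 2cos(π/n); the bound matches it for n = 2 and n = 3 and gives 0.5 against about 0.586 for n = 4.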
The Path Resistance Method for Bounding the Smallest Nontrivial Eigenvalue of a Laplacian
In this paper we consider methods based on graph embeddings for estimating the smallest nontrivial eigenvalue of the Laplacian matrix representation of a graph. The Laplacian is one of many ways to view a graph as a matrix; it is defined as follows. Let $G = (V, E)$ be an undirected graph with vertices $v_1, \dots, v_n$. Then the Laplacian of $G$ is the $n \times n$ matrix $L$ with entries
$$
l_{ij} = \begin{cases} \operatorname{degree}(v_i) & \text{if } i = j, \\ -1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}
$$
A version of this paper originally appeared in the Proceedings of the Eighth Annual ACM/SIAM Symposium on Discrete Algorithms.
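The entry-wise definition of the Laplacian translates directly into code. Here is a small sketch, assuming vertices are labelled 0..n-1 and edges are given as an unordered list of pairs; the function name is illustrative.

```python
def laplacian(n, edges):
    """n x n Laplacian of a simple undirected graph on vertices 0..n-1:
    l_ii = degree(v_i), l_ij = -1 if (i, j) is an edge, 0 otherwise."""
    L = [[0] * n for _ in range(n)]
    for i, j in {tuple(sorted(e)) for e in edges}:  # count each undirected edge once
        if i == j:
            raise ValueError("self-loops are not allowed in a simple graph")
        L[i][j] = L[j][i] = -1
        L[i][i] += 1
        L[j][j] += 1
    return L

# Toy usage: the path v_0 - v_1 - v_2 (the pair (1, 0) repeats the edge (0, 1)).
print(laplacian(3, [(0, 1), (1, 0), (1, 2)]))
```

Every row sums to zero, so the all-ones vector is always an eigenvector with eigenvalue 0. This is why the quantity of interest is the smallest *nontrivial* eigenvalue.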
PageRank and random walks on graphs
Dedicated to Lovász on the occasion of his sixtieth birthday.
We examine the relationship between PageRank and several invariants occurring in the study of random walks and electrical networks. We consider a generalized version of hitting time and effective resistance with an additional parameter that controls the ‘speed’ of diffusion, and we establish their connection with PageRank. Through these connections, a combinatorial interpretation of PageRank is given in terms of rooted spanning forests, using a generalized version of the matrix-tree theorem. Using PageRank, we illustrate that the generalized hitting time leads to finding sparse cuts, and that efficient approximation algorithms for PageRank can be used to approximate hitting time and effective resistance.
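As a reference point for the PageRank vectors discussed above, here is a sketch of personalized PageRank computed by fixed-point iteration of pr = α·s + (1 - α)·pr·W, where s is the indicator vector of a seed vertex. The walk matrix is an assumption: the sketch uses the lazy random walk W = (I + D⁻¹A)/2, and the paper's exact convention may differ. Every vertex is assumed to have at least one neighbour.

```python
def personalized_pagerank(adj, alpha, seed, tol=1e-12, max_iter=10_000):
    """Iterate pr <- alpha * s + (1 - alpha) * pr W, with s the indicator of `seed`
    and W = (I + D^{-1} A) / 2 the lazy random walk (an assumed convention)."""
    n = len(adj)
    s = [0.0] * n
    s[seed] = 1.0
    pr = s[:]
    for _ in range(max_iter):
        walked = [0.0] * n  # the row vector pr W
        for u in range(n):
            walked[u] += pr[u] / 2  # lazy step: stay with probability 1/2
            share = pr[u] / (2 * len(adj[u]))
            for v in adj[u]:  # otherwise move to a uniformly random neighbour
                walked[v] += share
        new = [alpha * s[i] + (1 - alpha) * walked[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr

# Toy usage: a triangle, seeded at vertex 0.
adj = [[1, 2], [0, 2], [0, 1]]
print(personalized_pagerank(adj, alpha=0.15, seed=0))
```

The map is a contraction with factor (1 - α) in the ℓ1 norm, so the iteration converges. On the triangle, solving the fixed-point equation by hand gives the seed (1 + 3α)/(3 + α) ≈ 0.460 for α = 0.15, and the other two vertices split the remaining mass equally.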
Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow
We present initial results from the first empirical evaluation of a graph partitioning algorithm inspired by the Arora-Rao-Vazirani algorithm of [5], which combines spectral and flow methods in a novel way. We have studied the parameter space of this new algorithm, for example the extent to which different parameter settings interpolate between a more spectral and a more flow-based approach, and we have compared its results with those of previously known and optimized algorithms such as Metis.
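The spectral side of partitioning methods typically turns a one-dimensional embedding into a cut with a sweep: sort the vertices by their coordinate and keep the prefix with the lowest conductance. Here is a minimal sketch, assuming the embedding (for example, an approximate Fiedler vector) is supplied by the caller and every vertex has at least one edge. It does not reproduce the flow component or the Arora-Rao-Vazirani-inspired algorithm evaluated in the paper.

```python
def sweep_cut(adj, embedding):
    """Return (S, phi): the prefix S of vertices sorted by `embedding` with the
    smallest conductance phi = cut(S) / min(vol(S), vol(V) - vol(S))."""
    order = sorted(range(len(adj)), key=lambda v: embedding[v])
    total_vol = sum(len(nbrs) for nbrs in adj)
    in_s, vol, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order[:-1]:  # proper, non-empty prefixes only
        in_s.add(v)
        vol += len(adj[v])
        for w in adj[v]:
            # an edge into S stops being cut; an edge leaving S becomes cut
            cut += -1 if w in in_s else 1
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_set, best_phi = set(in_s), phi
    return best_set, best_phi

# Toy usage: two triangles joined by the edge (2, 3), with a hand-picked embedding.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(sweep_cut(adj, [-1.0, -1.0, -0.5, 0.5, 1.0, 1.0]))
```

Here the sweep returns the left triangle {0, 1, 2}: one cut edge over a volume of 7 on each side, so the conductance is 1/7.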
