4,469 research outputs found

A bagging SVM to learn from positive and unlabeled examples

Authors: Mordelet Fantine; Vert Jean-Philippe
Publication date: 19/07/2010

We consider the problem of learning a binary classifier from a training set of positive and unlabeled examples, both in the inductive and in the transductive setting. This problem, often referred to as PU learning, differs from the standard supervised classification problem by the lack of negative examples in the training set. It corresponds to a ubiquitous situation in many applications such as information retrieval or gene ranking, when we have identified a set of data of interest sharing a particular property, and we wish to automatically retrieve additional data sharing the same property among a large and easily available pool of unlabeled data. We propose a conceptually simple method, akin to bagging, to approach both inductive and transductive PU learning problems, by converting them into a series of supervised binary classification problems discriminating the known positive examples from random subsamples of the unlabeled set. We empirically demonstrate the relevance of the method on simulated and real data, where it performs at least as well as existing methods while being faster.

arXiv.org e-Print Archive

HAL-MINES ParisTech
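
The bagging scheme described in the abstract above (Mordelet and Vert) can be illustrated with a minimal sketch for the transductive setting. Everything named here is an assumption made for illustration and is not taken from the paper: the function names `train_linear_svm` and `bagging_pu_scores`, the parameters `K` (subsample size) and `T` (number of subsamples), sampling without replacement, out-of-bag score averaging, and the small linear SVM trained by full-batch subgradient descent, which stands in for whatever SVM solver the authors used.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=100):
    """Linear SVM via full-batch subgradient descent on the
    L2-regularized hinge loss. X: list of feature lists,
    y: labels in {-1, +1}. Returns (w, b). Illustrative only."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # the example violates the margin
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def bagging_pu_scores(P, U, K, T, seed=0):
    """Score each unlabeled example in U (higher = more likely positive).

    For each of T rounds, a random subsample of K unlabeled examples is
    treated as negative and a classifier is trained against the known
    positives P. Each unlabeled example is scored only by classifiers
    whose subsample did not contain it (assumed aggregation rule)."""
    rng = random.Random(seed)
    total = [0.0] * len(U)
    count = [0] * len(U)
    for _ in range(T):
        bag = set(rng.sample(range(len(U)), K))
        X = P + [U[i] for i in sorted(bag)]
        y = [1] * len(P) + [-1] * K
        w, b = train_linear_svm(X, y)
        for i, x in enumerate(U):
            if i not in bag:
                total[i] += sum(wj * xj for wj, xj in zip(w, x)) + b
                count[i] += 1
    # Examples that were never out of bag get a neutral score of 0.0.
    return [t / c if c else 0.0 for t, c in zip(total, count)]
```

Calling `bagging_pu_scores(P, U, K=len(P), T=20)` returns one score per unlabeled example; sorting `U` by decreasing score gives a ranking of candidate positives. Setting `K` equal to the number of positives is only one plausible choice, made here to keep each training problem balanced.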

Kernel matrix regression

Authors: Vert Jean-Philippe; Yamanishi Yoshihiro
Publication date: 26/02/2007

We address the problem of filling missing entries in a kernel Gram matrix, given a related full Gram matrix. We attack this problem from the viewpoint of regression, assuming that the two kernel matrices can be considered as explanatory variables and response variables, respectively. We propose a variant of the regression model based on the underlying features in the reproducing kernel Hilbert space by modifying the idea of kernel canonical correlation analysis, and we estimate the missing entries by fitting this model to the existing samples. We obtain promising experimental results on gene network inference and protein 3D structure prediction from genomic datasets. We also discuss the relationship with the em-algorithm based on information geometry.

arXiv.org e-Print Archive

HAL-MINES ParisTech

Joint segmentation of many aCGH profiles using fast group LARS

Authors: Bleakley Kevin; Vert Jean-Philippe
Publication date: 05/10/2009

Array-based comparative genomic hybridization (aCGH) is a method used to search for genomic regions with copy number variations. For a given aCGH profile, one challenge is to accurately segment it into regions of constant copy number. Subjects sharing the same disease status, for example a type of cancer, often have aCGH profiles with similar copy number variations, due to duplications and deletions relevant to that particular disease. We introduce a constrained optimization algorithm that jointly segments aCGH profiles of many subjects. It simultaneously penalizes the amount of freedom the set of profiles has to jump from one level of constant copy number to another, at genomic locations known as breakpoints. We show that breakpoints shared by many different profiles tend to be found first by the algorithm, even in the presence of significant amounts of noise. The algorithm can be formulated as a group LARS problem. We propose an extremely fast way to find the solution path, i.e., a sequence of shared breakpoints in order of importance. At no extra cost, the algorithm smooths all of the aCGH profiles into piecewise-constant regions of equal copy number, giving low-dimensional versions of the original data. These can be shown for all profiles on a single graph, allowing for intuitive visual interpretation. Simulations and an implementation of the algorithm on bladder cancer aCGH profiles are provided.

arXiv.org e-Print Archive

HAL-MINES ParisTech

Graph kernels based on tree patterns for molecules

Authors: Mahé Pierre; Vert Jean-Philippe
Publication date: 15/09/2006

Motivated by chemical applications, we revisit and extend a family of positive definite kernels for graphs based on the detection of common subtrees, initially proposed by Ramon et al. (2003). We propose new kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs. This parameter allows smooth interpolation between classical graph kernels based on the count of common walks, on the one hand, and kernels that emphasize the detection of large common subtrees, on the other hand. We also propose two modular extensions to this formulation. The first extension increases the number of subtrees that define the feature space, and the second one removes noisy features from the graph representations. We experimentally validate these new kernels on binary classification tasks that consist of discriminating between toxic and non-toxic molecules with support vector machines.

arXiv.org e-Print Archive

HAL-MINES ParisTech

The group fused Lasso for multiple change-point detection

Authors: Bleakley Kevin; Vert Jean-Philippe
Publication date: 01/01/2011

We present the group fused Lasso for detection of multiple change-points shared by a set of co-occurring one-dimensional signals. Change-points are detected by approximating the original signals with a constraint on the multidimensional total variation, leading to piecewise-constant approximations. Fast algorithms are proposed to solve the resulting optimization problems, either exactly or approximately. Conditions are given for consistency of both algorithms as the number of signals increases, and empirical evidence is provided to support the results on simulated and array comparative genomic hybridization data.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL-MINES ParisTech
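
The total-variation constraint mentioned in the group fused Lasso abstract above can be written out as a short worked formulation. This is a sketch under assumed notation that does not appear in the abstract: $Y$ is the $n \times p$ matrix holding $p$ signals of length $n$, $U$ is its approximation, $U_{i,\bullet}$ is the $i$-th row of $U$, and $\mu$ is a user-chosen bound. The published method may use position-dependent weights or an equivalent penalized form.

```latex
% Assumed notation: Y (n x p) stacks p signals of length n as columns,
% U is the piecewise-constant approximation, mu >= 0 bounds the total variation.
\min_{U \in \mathbb{R}^{n \times p}} \; \| Y - U \|_F^2
\quad \text{subject to} \quad
\sum_{i=1}^{n-1} \left\| U_{i+1,\bullet} - U_{i,\bullet} \right\|_2 \le \mu
```

Under this reading, position $i$ is a change-point whenever $U_{i+1,\bullet} \neq U_{i,\bullet}$. Because the Euclidean norm is taken across all $p$ signals at once, a jump tends to be either absent in every signal or present in many of them, which is what makes the detected change-points shared.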

Kernel methods for in silico chemogenomics

Authors: Jacob Laurent; Vert Jean-Philippe
Publication date: 25/09/2007

Predicting interactions between small molecules and proteins is a crucial ingredient of the drug discovery process. In particular, accurate predictive models are increasingly used to preselect potential lead compounds from large molecule databases, or to screen for side-effects. While classical in silico approaches focus on predicting interactions with a given specific target, new chemogenomics approaches adopt cross-target views. Building on recent developments in the use of kernel methods in bio- and chemoinformatics, we present a systematic framework to screen the chemical space of small molecules for interaction with the biological space of proteins. We show that this framework allows information sharing across the targets, resulting in a dramatic improvement of ligand prediction accuracy for three important classes of drug targets: enzymes, GPCRs and ion channels.

arXiv.org e-Print Archive

HAL-MINES ParisTech

Can We Rebrand the Humanities?

Author: Vert Shauna
Publication venue: Canadian Historical Association / Société historique du Canada
Publication date: 12/12/2019

As someone who studied both marketing and history (and who finds her history degree a super valuable part of that mix), the question often crosses my mind: “How can I sell my history degree?”

Dépôt de documents et de données de Érudit

Reconstruction of biological networks by supervised machine learning approaches

Author: Vert Jean-Philippe
Publication date: 22/09/2008

We review a recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data. We present several strategies that have been proposed and that lead to different pattern recognition problems and algorithms. The strength of these approaches is illustrated on the reconstruction of metabolic, protein-protein and regulatory networks of model organisms. In all cases, state-of-the-art performance is reported.

arXiv.org e-Print Archive

HAL-Inserm

HAL-MINES ParisTech