13 research outputs found
Efficient stochastic sampling of first-passage times with applications to self-assembly simulations
Models of reaction chemistry based on the stochastic simulation algorithm
(SSA) have become a crucial tool for simulating complicated biological reaction
networks due to their ability to handle extremely complicated networks
and to represent noise in small-scale chemistry. These methods can, however,
become highly inefficient for stiff reaction systems, those in which different
reaction channels operate on widely varying time scales. In this paper, we
develop two methods for accelerating sampling in SSA models: an exact method
and a scheme allowing for sampling accuracy up to any arbitrary error bound.
Both methods depend on analysis of the eigenvalues of continuous time Markov
model graphs that define the behavior of the SSA. We demonstrate these methods
for the specific application of sampling breakage times for multiply-connected
bond networks, a class of stiff system important to models of self-assembly
processes. We show theoretically and empirically that our eigenvalue methods
provide substantially reduced sampling times for a wide range of network
breakage models. These techniques are also likely to have broad use in
accelerating SSA models so as to apply them to systems and parameter ranges
that are currently computationally intractable. Comment: 40 pages, 15 figures
Inferring the paths of somatic evolution in cancer
Motivation: Cancer cell genomes acquire several genetic alterations during somatic evolution from a normal cell type. The relative order in which these mutations accumulate and contribute to cell fitness is affected by epistatic interactions. Inferring their evolutionary history is challenging because of the large number of mutations acquired by cancer cells as well as the presence of unknown epistatic interactions. Results: We developed Bayesian Mutation Landscape (BML), a probabilistic approach for reconstructing ancestral genotypes from tumor samples for much larger sets of genes than previously feasible. BML infers the likely sequence of mutation accumulation for any set of genes that is recurrently mutated in tumor samples. When applied to tumor samples from colorectal, glioblastoma, lung and ovarian cancer patients, BML identifies the diverse evolutionary scenarios involved in tumor initiation and progression in greater detail, but broadly in agreement with prior results. Availability and implementation: Source code and all datasets are freely available at bml.molgen.mpg.de Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Efficient stochastic sampling of first-passage times with applications to self-assembly simulations.
Models of reaction chemistry based on the stochastic simulation algorithm (SSA) have become a crucial tool for simulating complicated biological reaction networks due to their ability to handle extremely complicated networks and to represent noise in small-scale chemistry. These methods can, however, become highly inefficient for stiff reaction systems, those in which different reaction channels operate on widely varying time scales. In this paper, we develop two methods for accelerating sampling in SSA models: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound. Both methods depend on the analysis of the eigenvalues of continuous time Markov models that define the behavior of the SSA. We show how each can be applied to accelerate sampling within known Markov models or to subgraphs discovered automatically during execution. We demonstrate these methods for two applications of sampling in stiff SSAs that are important for modeling self-assembly reactions: sampling breakage times for multiply connected bond networks and sampling assembly times for multisubunit nucleation reactions. We show theoretically and empirically that our eigenvalue methods provide substantially reduced sampling times for a large class of models used in simulating self-assembly. These techniques are also likely to have broader use in accelerating SSA models so as to apply them to systems and parameter ranges that are currently computationally intractable.
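As a rough illustration of why eigenvalues can help with stiff first-passage sampling, the sketch below uses a toy two-bond network. It is not taken from the paper and is not claimed to be the authors' algorithm. The model is an assumption made for this example: each intact bond breaks at rate b, a single broken bond reforms at rate f, and the network counts as broken once both bonds are open. For this three-state chain the breakage time has the same distribution as the sum of two independent exponentials whose rates are the negated eigenvalues of the transient generator, so it can be drawn in two samples instead of simulating every break and reform event. The function names are hypothetical.

```python
import math
import random


def ssa_breakage_time(b, f, rng):
    """Direct SSA on the toy model: simulate every event until both bonds are broken."""
    intact, t = 2, 0.0
    while intact > 0:
        if intact == 2:
            # Either of the two bonds may break.
            t += rng.expovariate(2 * b)
            intact = 1
        else:
            # One bond left: it breaks (rate b) or the other reforms (rate f).
            t += rng.expovariate(b + f)
            intact = 2 if rng.random() < f / (b + f) else 0
    return t


def breakage_rates(b, f):
    """Negated eigenvalues of the transient generator [[-2b, 2b], [f, -(b+f)]].

    They are the roots of r^2 - (3b + f) r + 2 b^2 = 0.
    """
    s = 3 * b + f
    r_fast = (s + math.sqrt(s * s - 8 * b * b)) / 2
    # Use the product of the roots (2 b^2) to avoid cancellation when f >> b.
    r_slow = 2 * b * b / r_fast
    return r_slow, r_fast


def eigen_breakage_time(b, f, rng):
    """Draw the breakage time directly as a sum of two exponentials."""
    r_slow, r_fast = breakage_rates(b, f)
    return rng.expovariate(r_slow) + rng.expovariate(r_fast)


if __name__ == "__main__":
    rng = random.Random(1)
    b, f = 1.0, 50.0  # f >> b makes the direct simulation stiff
    n = 2000
    mean_ssa = sum(ssa_breakage_time(b, f, rng) for _ in range(n)) / n
    mean_eig = sum(eigen_breakage_time(b, f, rng) for _ in range(n)) / n
    print("exact mean      :", (3 * b + f) / (2 * b * b))
    print("direct SSA mean :", mean_ssa)
    print("eigenvalue mean :", mean_eig)
```

In this toy setting the cost of the direct simulation grows with f / b, because the chain bounces between the two-bond and one-bond states many times before breaking, while the eigenvalue-based draw always costs two random numbers.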
Pathway Complexity of Model Virus Capsid Assembly Systems
As computational and mathematical studies become increasingly central to studies of complicated reaction systems, it will become ever more important to identify the assumptions our models must make and determine when those assumptions are valid. Here, we examine that question with respect to viral capsid assembly by studying the ‘pathway complexity’ of model capsid assembly systems, which we informally define as the number of reaction pathways and intermediates one must consider to accurately describe a given system. We use two model types for this study: ordinary differential equation models, which allow us to precisely and deterministically compare the accuracy of capsid models under different degrees of simplification, and stochastic discrete event simulations, which allow us to sample use of reaction intermediates across a wide parameter space allowing for an extremely large number of possible reaction pathways. The models provide complementary information in support of a common conclusion that the ability of simple pathway models to adequately explain capsid assembly kinetics varies considerably across the space of biologically meaningful assembly parameters. These studies provide grounds for caution regarding our ability to reliably represent real systems with simple models and to extrapolate results from one set of assembly conditions to another. In addition, the analysis tools developed for this study are likely to have broader use in the analysis and efficient simulation of large reaction systems.
AdaBoost with Feature Selection Using IoT to Bring the Paths for Somatic Mutations Evaluation in Cancer
Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny.
Accurate reconstruction of phylogenies remains a key challenge in evolutionary biology. Most biologically plausible formulations of the problem are formally NP-hard, with no known efficient solution. The standard in practice are fast heuristic methods that are empirically known to work very well in general, but can yield results arbitrarily far from optimal. Practical exact methods, which yield exponential worst-case running times but generally much better times in practice, provide an important alternative. We report progress in this direction by introducing a provably optimal method for the weighted multi-state maximum parsimony phylogeny problem. The method is based on generalizing the notion of the Buneman graph, a construction key to efficient exact methods for binary sequences, so as to apply to sequences with arbitrary finite numbers of states with arbitrary state transition weights. We implement an integer linear programming (ILP) method for the multi-state problem using this generalized Buneman graph and demonstrate that the resulting method is able to solve data sets that are intractable by prior exact methods in run times comparable with popular heuristics. We further show on a collection of less difficult problem instances that the ILP method leads to large reductions in average-case run times relative to leading heuristics on moderately hard problems. Our work provides the first method for provably optimal maximum parsimony phylogeny inference that is practical for multi-state data sets of more than a few characters.
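To make the objective in this abstract concrete, the sketch below scores a single multi-state character on one fixed tree under arbitrary state transition weights, using Sankoff's classic dynamic program. This is only the scoring step for a given tree. It is not the generalized Buneman graph or the ILP described in the abstract, both of which address the much harder search over trees. The tree encoding (nested tuples of leaf names), the function names, and the example weights are assumptions made for this illustration.

```python
import math


def sankoff_cost(tree, leaf_states, states, weight):
    """Minimum weighted parsimony cost of one character on a fixed rooted tree.

    tree        : a leaf name (str) or a tuple of subtrees
    leaf_states : dict mapping leaf name -> observed state
    states      : iterable of all allowed states
    weight      : function (parent_state, child_state) -> transition cost
    """
    def cost_vector(node):
        # cost_vector(node)[s] = cheapest labelling of the subtree if node has state s
        if isinstance(node, str):
            return {s: (0 if s == leaf_states[node] else math.inf) for s in states}
        child_vectors = [cost_vector(child) for child in node]
        return {
            s: sum(min(weight(s, t) + vec[t] for t in states) for vec in child_vectors)
            for s in states
        }

    return min(cost_vector(tree).values())


def unit_weight(a, b):
    """Every state change costs 1."""
    return 0 if a == b else 1


def ts_tv_weight(a, b):
    """Hypothetical weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    purines = {"A", "G"}
    return 1 if (a in purines) == (b in purines) else 2


if __name__ == "__main__":
    tree = (("a", "b"), ("c", "d"))
    leaves = {"a": "A", "b": "G", "c": "C", "d": "C"}
    print(sankoff_cost(tree, leaves, "ACGT", unit_weight))
    print(sankoff_cost(tree, leaves, "ACGT", ts_tv_weight))
```

A full maximum parsimony method would minimize this cost, summed over all characters, across every candidate tree topology, which is where the NP-hardness mentioned in the abstract comes from.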
