7,718 research outputs found

    Ubiquity of synonymity: almost all large binary trees are not uniquely identified by their spectra or their immanantal polynomials

    Full text link
    There are several common ways to encode a tree as a matrix, such as the adjacency matrix, the Laplacian matrix (that is, the infinitesimal generator of the natural random walk), and the matrix of pairwise distances between leaves. Such representations involve a specific labeling of the vertices or at least the leaves, and so it is natural to attempt to identify trees by some feature of the associated matrices that is invariant under relabeling. An obvious candidate is the spectrum of eigenvalues (or, equivalently, the characteristic polynomial). We show for any of these choices of matrix that the fraction of binary trees with a unique spectrum goes to zero as the number of leaves goes to infinity. We investigate the rate of convergence of the above fraction to zero using numerical methods. For the adjacency and Laplacian matrices, we show that that the {\em a priori} more informative immanantal polynomials have no greater power to distinguish between trees

    Survival analysis of DNA mutation motifs with penalized proportional hazards

    Full text link
    Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with "mutation motifs", which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: a large number of possible motifs makes this statistical problem high dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an 1\ell_1-penalized proportional hazards model to infer mutation motifs and their effects. In order to estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method performs better on simulated data compared to current methods and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel and formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates

    Predicting B Cell Receptor Substitution Profiles Using Public Repertoire Data

    Full text link
    B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same "clonal family") are released from the germinal center, their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called "substitution profiles", are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method "Substitution Profiles Using Related Families" (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on an external dataset. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) implementing these ideas and providing easy prediction using our pre-fit models.Comment: 23 page

    Evolution of crystalline electric field effects, superconductivity, and heavy fermion behavior in the specific heat of Pr(Os1x_{1-x}Rux_x)4_4Sb12_{12}

    Full text link
    Specific heat C(T)C(T) measurements were made on single crystals of the superconducting filled skutterudite series Pr(Os1x_{1-x}Rux_x)4_4Sb12_{12} down to 0.6 K. Crystalline electric field fits in the normal state produced parameters which were in agreement with previous measurements. Bulk superconductivity was observed for all values of the Ru concentration xx with transition temperatures consistent with previous experiments, confirming a minimum in TcT_{c} at x=0.6x=0.6. The C(T)C(T) data below TcT_{c} appear to be more consistent with power law behavior for x=0x=0 (PrOs4_4Sb12_{12}), and with exponential behavior for 0.05x0.20.05 \leq x \leq 0.2. An enhanced electronic specific heat coefficient γ\gamma was observed for x0.4x \leq 0.4, further supporting x0.6x \simeq 0.6 as a critical concentration where the physical properties abruptly change. Significant enhancement of ΔC/Tc\Delta C/T_{c} above the weak coupling value was only observed for x=0x=0 and x=0.05x=0.05.Comment: 16 pages, 5 figures, submitted to Physical Review B. v2: text added and figures modifie

    Quantifying evolutionary constraints on B cell affinity maturation

    Full text link
    The antibody repertoire of each individual is continuously updated by the evolutionary process of B cell receptor mutation and selection. It has recently become possible to gain detailed information concerning this process through high-throughput sequencing. Here, we develop modern statistical molecular evolution methods for the analysis of B cell sequence data, and then apply them to a very deep short-read data set of B cell receptors. We find that the substitution process is conserved across individuals but varies significantly across gene segments. We investigate selection on B cell receptors using a novel method that side-steps the difficulties encountered by previous work in differentiating between selection and motif-driven mutation; this is done through stochastic mapping and empirical Bayes estimators that compare the evolution of in-frame and out-of-frame rearrangements. We use this new method to derive a per-residue map of selection, which provides a more nuanced view of the constraints on framework and variable regions.Comment: Previously entitled "Substitution and site-specific selection driving B cell affinity maturation is consistent across individuals

    A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis.

    Get PDF
    The human body generates a diverse set of high affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis focus on separately modeling the above processes. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling
    corecore