    The site frequency spectrum of dispensable genes

    The differences between DNA sequences within a population are the basis for inferring the ancestral relationships among individuals. Within the classical infinitely many sites model, the mutation rate can be estimated from the site frequency spectrum, which comprises the numbers $C_1,\dots,C_{n-1}$, where $n$ is the sample size and $C_s$ is the number of site mutations (single nucleotide polymorphisms, SNPs) seen in $s$ genomes. Classical results can be used to compare the observed site frequency spectrum with its neutral expectation, $E[C_s]=\theta_2/s$, where $\theta_2$ is the scaled site mutation rate. In this paper, we relax the assumption of the infinitely many sites model that all individuals carry only homologous genetic material. In particular, it is now well known that bacterial genomes can gain and lose genes, so that every single genome is a mosaic of genes, and genes are present or absent in a random fashion, giving rise to the dispensable genome. While this presence and absence has been modeled under neutral evolution within the infinitely many genes model in previous papers, we link the presence and absence of genes with the numbers of site mutations seen within each gene. We derive a formula for the expectation of the joint gene and site frequency spectrum, where $G_{k,s}$ denotes the number of mutated sites occurring in exactly $s$ gene sequences while the corresponding gene is present in exactly $k$ individuals. We show that standard estimators of $\theta_2$ for dispensable genes are biased and that the site frequency spectrum for dispensable genes differs from the classical result. Comment: 24 pages, 8 figures
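
The classical quantities above can be computed directly from a SNP matrix. This is a minimal sketch, assuming the ancestral state of every site is known so that 1 marks the derived allele; the function names are illustrative and `theta` stands for $\theta_2$. Summing $E[C_s]=\theta_2/s$ over $s$ gives $E[\sum_s C_s]=\theta_2 a_n$ with $a_n=\sum_{i=1}^{n-1}1/i$, which yields Watterson's estimator, a standard estimator of the kind the abstract says is biased for dispensable genes. The last function only tallies the observed $G_{k,s}$; it does not compute the expectation derived in the paper.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return [C_1, ..., C_{n-1}] from a 0/1 matrix (rows = genomes, columns = sites).

    Assumes 1 marks the derived allele (ancestral state known). Sites whose derived
    allele is in 0 or n genomes are not segregating and are not counted.
    """
    n = len(haplotypes)
    counts = Counter(sum(column) for column in zip(*haplotypes))
    return [counts.get(s, 0) for s in range(1, n)]


def neutral_expectation(theta, n):
    """E[C_s] = theta / s for s = 1, ..., n-1 (classical infinitely many sites model)."""
    return [theta / s for s in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator S / a_n, with S = sum_s C_s and a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(sfs) + 1
    a_n = sum(1 / i for i in range(1, n))
    return sum(sfs) / a_n


def joint_spectrum(gene_alignments):
    """Tally observed G_{k,s}: sites seen in s sequences of a gene present in k genomes.

    gene_alignments: one 0/1 matrix per gene, with one row per genome carrying that gene.
    """
    tally = Counter()
    for alignment in gene_alignments:
        k = len(alignment)
        for column in zip(*alignment):
            s = sum(column)
            if s > 0:
                tally[(k, s)] += 1
    return tally
```

For instance, four genomes with derived-allele counts 1, 1, 3 and 1 across four sites give the spectrum `[3, 0, 1]` and a Watterson estimate of $4/a_4 = 24/11$.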

    The infinitely many genes model with horizontal gene transfer

    The genomes of bacterial species are much more flexible than those of eukaryotes. Moreover, the distributed genome hypothesis for bacteria states that the total number of genes present in a bacterial population is greater than the number of genes in any single individual's genome. The pangenome, i.e. the set of all genes of a bacterial species (or a sample), comprises the core genes, which are present in all living individuals, and the accessory genes, which are carried by only some individuals. In order to use accessory genes for adaptation to environmental forces, genes can be transferred horizontally between individuals. Here, we extend the infinitely many genes model of Baumdicker, Hess and Pfaffelhuber (2010) to include horizontal gene transfer. We take a genealogical view and give a construction, called the Ancestral Gene Transfer Graph, of the joint genealogy of all genes in the pangenome. As an application, we compute moments of several statistics (e.g. the number of differences between two individuals and the gene frequency spectrum) under the infinitely many genes model with horizontal gene transfer. Comment: 31 pages, 3 figures
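
The pangenome vocabulary above maps directly onto set operations. This is a minimal sketch, assuming each genome is given as a set of gene identifiers (the function names are illustrative, not from the paper): the pangenome is the union, the core is the intersection, the accessory genes are the remainder, and the number of differences between two individuals is taken here to be the size of the symmetric difference of their gene sets.

```python
from itertools import combinations


def pangenome_partition(genomes):
    """Split the pangenome of a sample into core and accessory genes.

    genomes: a non-empty list of sets of gene identifiers, one set per individual.
    """
    pangenome = set().union(*genomes)             # every gene seen in the sample
    core = set.intersection(*map(set, genomes))   # genes carried by every individual
    accessory = pangenome - core                  # genes carried by only some individuals
    return pangenome, core, accessory


def mean_pairwise_differences(genomes):
    """Average number of genes present in exactly one of two individuals, over all pairs."""
    pairs = list(combinations(genomes, 2))
    return sum(len(set(a) ^ set(b)) for a, b in pairs) / len(pairs)
```

For the three genomes `{a, b, c}`, `{a, b, d}` and `{a, c, d, e}`, the core is `{a}`, the accessory genes are `{b, c, d, e}`, and the mean pairwise difference is 8/3.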

    The diversity of a distributed genome in bacterial populations

    The distributed genome hypothesis states that the set of genes in a population of bacteria is distributed over all individuals that belong to the specific taxon. It implies that certain genes can be gained and lost from generation to generation. We use the random genealogy given by a Kingman coalescent in order to superimpose events of gene gain and loss along ancestral lines. Gene gains occur at a constant rate along ancestral lines, and we assume that gained genes have never been present in the population before. Gene losses occur at a rate proportional to the number of genes present along the ancestral line. In this infinitely many genes model, we derive moments for several statistics within a sample: the average number of genes per individual, the average number of genes differing between individuals, the number of incongruent pairs of genes, the total number of different genes in the sample, and the gene frequency spectrum. We demonstrate that the model fits gene frequency data from marine cyanobacteria reasonably well. Comment: Published at http://dx.doi.org/10.1214/09-AAP657 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
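
The model just described is straightforward to simulate, which offers a way to check moment formulas numerically. This is a minimal Monte Carlo sketch under stated assumptions: time is measured in coalescent units (each pair of lines merges at rate 1), each line gains brand-new genes at rate θ/2 and loses each present gene at rate ρ/2, with θ, ρ > 0. This parameterization, the names `theta` and `rho`, and all function names are assumptions, not taken from the paper. Under these rates the gene count along one line is an immigration-death process with stationary distribution Poisson(θ/ρ), so the root is started there and the expected number of genes per individual is θ/ρ. Horizontal gene transfer is not modelled.

```python
import itertools
import random
from collections import Counter


def kingman_tree(n, rng):
    """Kingman coalescent for leaves 0..n-1; internal nodes get ids n, n+1, ..."""
    active = list(range(n))
    height = {leaf: 0.0 for leaf in active}
    children = {}
    t = 0.0
    for node in range(n, 2 * n - 1):
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)  # each pair coalesces at rate 1
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        children[node] = (a, b)
        height[node] = t
        active.append(node)
    return children, height, active[0]


def poisson(lam, rng):
    """Poisson(lam) draw: number of rate-1 exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def evolve_branch(genes, length, theta, rho, rng, labels):
    """Gillespie simulation on one branch: gains at rate theta/2, each gene lost at rate rho/2."""
    genes = set(genes)
    t = 0.0
    while True:
        gain_rate, loss_rate = theta / 2, rho / 2 * len(genes)
        t += rng.expovariate(gain_rate + loss_rate)
        if t > length:
            return genes
        if rng.random() * (gain_rate + loss_rate) < gain_rate:
            genes.add(next(labels))  # a brand-new gene, never present before
        else:
            genes.remove(rng.choice(sorted(genes)))


def simulate_genomes(n, theta, rho, seed=None):
    """Gene content (sets of integer labels) of n sampled genomes; no HGT."""
    rng = random.Random(seed)
    children, height, root = kingman_tree(n, rng)
    labels = itertools.count()
    # Root starts in the stationary distribution: Poisson(theta / rho) genes.
    content = {root: {next(labels) for _ in range(poisson(theta / rho, rng))}}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            length = height[node] - height[child]
            content[child] = evolve_branch(content[node], length, theta, rho, rng, labels)
            stack.append(child)
    return [content[leaf] for leaf in range(n)]


def gene_frequency_spectrum(genomes):
    """[G_1, ..., G_n]: G_k = number of genes present in exactly k of the n genomes."""
    occupancy = Counter(gene for genome in genomes for gene in genome)
    freq = Counter(occupancy.values())
    return [freq.get(k, 0) for k in range(1, len(genomes) + 1)]
```

Averaging `gene_frequency_spectrum(simulate_genomes(n, theta, rho, seed=s))` over many seeds gives Monte Carlo estimates of the expected gene frequency spectrum that can be compared against closed-form moments.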

    A note on Willmore minimizing Klein bottles in Euclidean space

    We show that Lawson's bipolar surface $\tilde\tau_{3,1}$ is, after stereographic projection, the unique minimizer of the Willmore energy among immersed Klein bottles in its conformal class. We conjecture that it is in fact the unique minimizer among all immersed Klein bottles into $\mathbb{R}^n$, $n\geq 4$, a minimizer whose existence the authors and P. Breuning proved in a previous paper. Comment: 8 pages

    Existence of minimizing Willmore Klein bottles in Euclidean four-space

    Let $K$ be a Klein bottle. We show that the infimum of the Willmore energy among all immersed Klein bottles in Euclidean $n$-space, where $n\geq 4$, is attained by a smooth embedded Klein bottle. There are three distinct regular homotopy classes of immersed Klein bottles in Euclidean four-space, each containing an embedding. One is characterized by the property that it contains the minimizer just mentioned. For the other two regular homotopy classes, we show that the Willmore energy is bounded from below by $8\pi$, and we classify the minimizers of these two classes. In particular, we prove the existence of infinitely many distinct embedded Klein bottles in Euclidean four-space that have Euler normal number $-4$ or $+4$ and Willmore energy $8\pi$. The surfaces are distinct even when conformal transformations of the ambient space are allowed. As they are all minimizers in their homotopy class, they are Willmore surfaces. Comment: final version, to appear in Geometry & Topology
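
For reference, a standard normalization of the Willmore energy and the classical bounds it satisfies are collected below. This normalization is an assumption here; the papers above may use a different constant factor, in which case the constants scale accordingly. With it, the round sphere has energy $4\pi$, Li and Yau's inequality bounds the energy below by $4\pi$ times the multiplicity of any point, and $\mathcal{W}$ is invariant under Möbius transformations of $\mathbb{R}^n\cup\{\infty\}$ that do not send a point of the surface to infinity. In $\mathbb{R}^3$, where the Klein bottle admits no embedding, every immersed Klein bottle has a double point and therefore $\mathcal{W}\geq 8\pi$.

```latex
\mathcal{W}(f) = \frac{1}{4}\int_{\Sigma} |\vec{H}|^{2}\, d\mu_{f},
  \qquad f\colon \Sigma \to \mathbb{R}^{n} \text{ a closed immersed surface},
  \ \vec{H} \text{ the mean curvature vector};

\mathcal{W}(\text{round } S^{2}) = 4\pi,
  \qquad \mathcal{W}(f) \ge 4\pi \ \text{ for every closed immersed } f;

\mathcal{W}(f) \ge 4\pi k \ \text{ if } f \text{ has a point of multiplicity } k
  \quad \text{(Li and Yau)}.
```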

    panX: pan-genome analysis and exploration

    Horizontal transfer, gene loss, and duplication result in dynamic bacterial genomes shaped by a complex mixture of different modes of evolution. Closely related strains can differ in the presence or absence of many genes, and the total number of distinct genes found in a set of related isolates (the pan-genome) is often many times larger than the gene content of any individual isolate. We have developed a pipeline that efficiently identifies orthologous gene clusters in the pan-genome. This pipeline is coupled to a powerful yet easy-to-use web-based visualization for interactive exploration of the pan-genome. The visualization consists of connected components that allow rapid filtering and searching of genes and inspection of their evolutionary history. For each gene cluster, panX displays an alignment and a phylogenetic tree, maps mutations within that cluster to the branches of the tree, and infers gain and loss of genes on the core-genome phylogeny. panX is available at pangenome.de. Custom pan-genomes can be visualized either using a web server or by serving panX locally as a browser-based application.
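
The abstract does not describe how panX infers gene gain and loss, and the sketch below is not claimed to be its method. As a generic illustration of what inferring gain and loss on a phylogeny involves, it applies Fitch parsimony, a swapped-in technique, to the presence/absence pattern of a single gene cluster on a rooted binary tree. The tree encoding (a dict from each internal node to its two children), the function name, and the tie-break toward "absent" at the root are assumptions.

```python
def fitch_gain_loss(tree, presence, root):
    """Most parsimonious gain/loss events for one gene on a rooted binary tree.

    tree: dict internal node -> (left, right); presence: leaf -> 0 (absent) or 1 (present).
    Returns (state per node, list of (node, "gain" | "loss") on the branch above node).
    """
    cand = {}

    def up(node):  # bottom-up pass: Fitch candidate state sets
        if node not in tree:
            cand[node] = {presence[node]}
            return
        left, right = tree[node]
        up(left)
        up(right)
        both = cand[left] & cand[right]
        cand[node] = both if both else cand[left] | cand[right]

    state, events = {}, []

    def down(node, parent_state):  # top-down pass: keep the parent's state when allowed
        s = parent_state if parent_state in cand[node] else min(cand[node])
        state[node] = s
        if parent_state is not None and s != parent_state:
            events.append((node, "gain" if s == 1 else "loss"))
        for child in tree.get(node, ()):
            down(child, s)

    up(root)
    down(root, None)  # root ties resolve to 0, i.e. "absent"
    return state, events
```

On the tree `{4: (0, 1), 5: (2, 3), 6: (4, 5)}` with the gene present only in leaves 0 and 1, this reconstructs a single gain on the branch above node 4.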