
    IsoPlotter(+): A Tool for Studying the Compositional Architecture of Genomes.

    Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that significantly differs from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between humans and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter(+), to carry out all segmentation analyses, including graphical display, and built a repository of compositional domain maps for all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter(+) pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter(+) by applying it to human and insect genomes. The computational tools and data repository are available online.
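
    A minimal sketch of the segmentation idea, assuming a simple GC-contrast criterion (the published IsoPlotter algorithm instead uses an entropic, Jensen-Shannon-type divergence and a length-dependent halting criterion; the threshold and minimum length below are illustrative choices only):

```python
def gc_fraction(seq):
    """Fraction of G/C bases in a DNA string."""
    return sum(b in "GCgc" for b in seq) / len(seq)

def segment(seq, start=0, min_len=20, min_contrast=0.10):
    """Recursively split seq into compositional domains (start, end, gc)."""
    n = len(seq)
    if n < 2 * min_len:
        return [(start, start + n, gc_fraction(seq))]
    # Find the split point with the largest GC contrast between the two parts.
    best_i, best_d = None, 0.0
    for i in range(min_len, n - min_len + 1):
        d = abs(gc_fraction(seq[:i]) - gc_fraction(seq[i:]))
        if d > best_d:
            best_i, best_d = i, d
    if best_i is None or best_d < min_contrast:  # halt: segment is homogeneous
        return [(start, start + n, gc_fraction(seq))]
    return (segment(seq[:best_i], start, min_len, min_contrast)
            + segment(seq[best_i:], start + best_i, min_len, min_contrast))

# A toy sequence with one AT-rich and one GC-rich half splits into two domains.
domains = segment("AT" * 50 + "GC" * 50)
```

    On this toy input the recursion finds the single composition boundary at position 100 and returns one GC-poor and one GC-rich domain.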

    A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes

    For the past four decades the compositional organization of the mammalian genome has posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the “isochore theory,” which has long been rebutted. Recently, an alternative compositional domain model was proposed, depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If the clade is valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals; if invalid, it would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the “murid shift,” and in many ways resembles the genome of opossum. We find no support for the “isochore theory.” Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model, and seem to invalidate clade Euarchontoglires.
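
    Attribute (5), the fit of domain lengths to a power law, can be sketched with the standard continuous maximum-likelihood estimator for the exponent, alpha_hat = 1 + n / sum(ln(x_i / x_min)). This is a generic estimator, not necessarily the paper's fitting procedure, and the synthetic lengths and x_min below are illustrative only:

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform sampling from p(x) ~ x^-alpha for x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def fit_power_law(xs, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

rng = random.Random(42)
lengths = sample_power_law(alpha=2.5, x_min=1000.0, n=10000, rng=rng)
alpha_hat = fit_power_law(lengths, x_min=1000.0)  # close to the true 2.5
```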

    'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve

    Background: The Z-curve is a three-dimensional representation of DNA sequences proposed over a decade ago that has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a “genome order index” was proposed, defined as S = a² + c² + t² + g², where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested: each genome is represented by a point P whose distances from the four faces of a regular tetrahedron are given by the frequencies a, c, t, and g, and it was claimed that an inscribed sphere of radius r = 1/√3 contains almost all points corresponding to various genomes, implying that S < r². The distribution of the points P corresponding to different genomes was studied using the Z-curve. Results: In this work, we studied the basic properties of the Z-curve using the “genome order index” as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from overdimensionality, and the dimension standing for GC content alone suffices to represent any given genome. Conclusion: The “genome order index” S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index and directly derived from entropy, and is therefore redundant. Overall, the Z-curve and S are over-complicated measures of GC content and the Shannon H index, respectively. Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
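
    These quantities are straightforward to compute. A small sketch (with an arbitrary toy sequence, for illustration only) evaluates S, its complement the Gini-Simpson index, the Shannon entropy, and the reduction of S to a function of GC content alone under the second parity rule (a = t, c = g):

```python
from math import log2

def composition(seq):
    """Frequencies of A, C, G, T in a DNA string."""
    n = len(seq)
    return {b: seq.upper().count(b) / n for b in "ACGT"}

def genome_order_index(freqs):
    """S = a^2 + c^2 + t^2 + g^2; equals 1 minus the Gini-Simpson index."""
    return sum(f * f for f in freqs.values())

def shannon_entropy(freqs):
    """Shannon entropy H in bits."""
    return -sum(f * log2(f) for f in freqs.values() if f > 0)

def s_from_gc(p):
    """S under the second parity rule (a = t, c = g), with p the GC content:
    a = t = (1 - p)/2 and c = g = p/2 give S = (p^2 + (1 - p)^2) / 2."""
    return (p * p + (1 - p) * (1 - p)) / 2

freqs = composition("ACGT" * 25)   # uniform composition
S = genome_order_index(freqs)      # 0.25, the minimum possible value
D = 1 - S                          # Gini-Simpson index
H = shannon_entropy(freqs)         # 2 bits, the maximum for four symbols
```

    Under the parity rule, S < 1/3 exactly when the GC content lies between (1 - 1/√3)/2 ≈ 0.21 and (1 + 1/√3)/2 ≈ 0.79, which covers almost all genomes — consistent with the abstract's point that S is a disguised function of GC content.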

    Universality of Long-Range Correlations in Expansion-Randomization Systems

    We study the stochastic dynamics of sequences evolving by single-site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent χ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of χ, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example for the emergence of universality in molecular biology. Comment: 23 pages, 15 figures.
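
    A minimal caricature of such a system, assuming only two of the processes (single-site duplication and point mutation) acting on a binary sequence, shows the interplay of growth and randomization; the rates and sizes are arbitrary and this is not the paper's exact model:

```python
import random

def evolve(seq, steps, dup_rate=0.5, rng=None):
    """Grow and randomize a +/-1 sequence by duplications and mutations."""
    rng = rng or random.Random(0)
    seq = list(seq)
    for _ in range(steps):
        i = rng.randrange(len(seq))
        if rng.random() < dup_rate:
            seq.insert(i, seq[i])   # duplication of a single site (expansion)
        else:
            seq[i] = -seq[i]        # point mutation (randomization)
    return seq

def bias(seq):
    """Composition bias: mean of the +/-1 symbols."""
    return sum(seq) / len(seq)

initial = [1] * 20                  # fully biased initial composition
final = evolve(initial, 2000)       # the sequence grows while the bias decays
```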

    Mapping biodiversity value worldwide: combining higher-taxon richness from different groups

    Maps of large-scale biodiversity are urgently needed to guide conservation, and yet complete enumeration of organisms is impractical at present. One indirect approach is to measure richness at higher taxonomic ranks, such as families. The difficulty is how to combine information from different groups on numbers of higher taxa, when these taxa may in effect have been defined in different ways, particularly for more distantly related major groups. In this paper, the regional family richness of terrestrial and freshwater seed plants, amphibians, reptiles and mammals is mapped worldwide by combining: (i) absolute family richness; (ii) proportional family richness; and (iii) proportional family richness weighted for the total species richness in each major group. The assumptions of the three methods and their effects on the results are discussed, although for these data the broad pattern is surprisingly robust with respect to the method of combination. Scores from each of the methods of combining families are used to rank the top five richness hotspots and complementary areas, and hotspots of endemism are mapped by unweighted combination of range-size rarity scores.
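
    The three combination methods can be made concrete with hypothetical numbers (the groups, family counts, and world totals below are invented for illustration, not taken from the paper):

```python
# families[group][region]: number of families of each group recorded per region.
families = {
    "plants":     {"A": 120, "B": 60},
    "amphibians": {"A": 10,  "B": 20},
}
group_family_total  = {"plants": 300,    "amphibians": 40}    # families worldwide
group_species_total = {"plants": 250000, "amphibians": 6000}  # species worldwide

def absolute(region):
    """(i) Sum of absolute family richness across groups."""
    return sum(counts[region] for counts in families.values())

def proportional(region):
    """(ii) Sum of each group's family richness as a fraction of its world total."""
    return sum(families[g][region] / group_family_total[g] for g in families)

def weighted(region):
    """(iii) Proportional richness, weighted by each group's share of species."""
    total = sum(group_species_total.values())
    return sum((group_species_total[g] / total)
               * families[g][region] / group_family_total[g] for g in families)
```

    With these numbers, region A scores 130 families absolutely, 0.65 proportionally, and about 0.40 when species-weighted; the weighting shifts influence toward the species-rich group.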

    Supernovae in the Subaru Deep Field: the rate and delay-time distribution of Type Ia supernovae out to redshift 2

    The Type Ia supernova (SN Ia) rate, when compared to the cosmic star formation history (SFH), can be used to derive the delay-time distribution (DTD; the hypothetical SN Ia rate versus time following a brief burst of star formation) of SNe Ia, which can distinguish among progenitor models. We present the results of a supernova (SN) survey in the Subaru Deep Field (SDF). Over a period of 3 years, we have observed the SDF on four independent epochs with Suprime-Cam on the Subaru 8.2-m telescope, with two nights of exposure per epoch, in the R, i′, and z′ bands. We have discovered 150 SNe out to redshift z ≈ 2. Using 11 photometric bands from the observer-frame far-ultraviolet to the near-infrared, we derive photometric redshifts for the SN host galaxies (for 24 we also have spectroscopic redshifts). This information is combined with the SN photometry to determine the type and redshift distribution of the SN sample. Our final sample includes 28 SNe Ia in the range 1.0 < z < 1.5; because other SN types are too faint to detect at z > 1, most of the events found in this range are likely SNe Ia. Our SN Ia rate measurements are consistent with those derived from the Hubble Space Telescope (HST) Great Observatories Origins Deep Survey (GOODS) sample, but with a smaller overall uncertainty in the 1.5 < z < 2.0 range.
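
    The relation underlying the analysis is R_Ia(t) = ∫ SFR(t − τ) Ψ(τ) dτ, i.e. the observed SN Ia rate is the star formation history convolved with the DTD Ψ. A discrete sketch with made-up numbers (not the paper's SFH or DTD):

```python
def sn_ia_rate(sfh, dtd, dt):
    """Discrete convolution of a star formation history with a DTD.
    sfh[i]: SFR at time i*dt; dtd[j]: SNe per unit formed mass per unit
    time at delay j*dt. Returns the SN Ia rate at each time step."""
    return [sum(sfh[i - j] * dtd[j] for j in range(min(i + 1, len(dtd)))) * dt
            for i in range(len(sfh))]

sfh = [1.0, 1.0, 0.0, 0.0]      # a brief burst of star formation
dtd = [0.0, 0.5, 0.25, 0.125]   # a declining, roughly t^-1, delay-time dist.
rates = sn_ia_rate(sfh, dtd, dt=1.0)   # the rate rises after the burst, then decays
```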

    The delay-time distribution of type-Ia supernovae from Sloan II

    We derive the delay-time distribution (DTD) of Type Ia supernovae (SNe Ia) using a sample of 132 SNe Ia discovered by the Sloan Digital Sky Survey II (SDSS-II) among 66,000 galaxies with spectrum-based star-formation histories (SFHs). To recover the best-fit DTD, the SFH of every individual galaxy is compared, using Poisson statistics, to the number of SNe it hosted (zero or one), based on the method introduced in Maoz et al. (2011). This SN sample differs from the SDSS-II SN Ia sample analyzed by Brandt et al. (2010) using a related, but different, DTD recovery method. Furthermore, we use a simulation-based SN detection-efficiency function, and we apply a number of important corrections to the galaxy SFHs and SN Ia visibility times. The DTD that we find has 4-sigma detections in all three of its time bins: prompt (t < 420 Myr), intermediate (0.42 < t < 2.4 Gyr), and delayed (t > 2.4 Gyr), indicating a continuous DTD, and it is among the most accurate and precise of recent DTD reconstructions. The best-fit power-law form for the recovered DTD is t^(-1.12+/-0.08), consistent with the generic ~t^-1 predictions of SN Ia progenitor models based on the gravitational-wave-induced mergers of binary white dwarfs. The time-integrated number of SNe Ia per unit formed stellar mass is N_SN/M = 0.00130 +/- 0.00015 Msun^-1, or about 4% of the stars formed with initial masses in the 3-8 Msun range. This is lower than, but largely consistent with, several recent DTD estimates based on SN rates in galaxy clusters and in local-volume galaxies, and is higher than, but consistent with, the N_SN/M estimated by comparing volumetric SN Ia rates to the cosmic SFH. Comment: MNRAS, in press.
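
    The "about 4%" figure can be checked on the back of an envelope by dividing N_SN/M by the number of 3-8 Msun stars formed per unit stellar mass under an assumed initial mass function. The sketch below assumes a pure Salpeter IMF over 0.1-100 Msun, which yields roughly 6%; the paper's ~4% presumably rests on a different IMF choice, so the exact percentage here is indicative only:

```python
def salpeter_number(m1, m2, alpha=2.35):
    """Integral of m^-alpha dm from m1 to m2 (unnormalized star count)."""
    p = 1.0 - alpha
    return (m2**p - m1**p) / p

def salpeter_mass(m1, m2, alpha=2.35):
    """Integral of m * m^-alpha dm from m1 to m2 (unnormalized stellar mass)."""
    p = 2.0 - alpha
    return (m2**p - m1**p) / p

# 3-8 Msun stars formed per unit stellar mass, for a 0.1-100 Msun Salpeter IMF.
stars_3_8_per_msun = salpeter_number(3, 8) / salpeter_mass(0.1, 100)

n_sn_per_msun = 0.00130                         # measured N_SN/M from the text
fraction = n_sn_per_msun / stars_3_8_per_msun   # ~0.06 for this IMF choice
```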

    Lack of self-averaging in neutral evolution of proteins

    We simulate neutral evolution of proteins imposing conservation of the thermodynamic stability of the native state in the framework of an effective model of folding thermodynamics. This procedure generates evolutionary trajectories in sequence space which share two universal features for all of the examined proteins. First, the number of neutral mutations fluctuates broadly from one sequence to another, leading to a non-Poissonian substitution process. Second, the number of neutral mutations displays strong correlations along the trajectory, thus causing the breakdown of self-averaging of the resulting evolutionary substitution process. Comment: 4 pages, 2 figures.
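
    The non-Poissonian behavior can be illustrated with a toy unrelated to the folding model: when the acceptance rate of mutations varies between trajectories (mimicking correlated fluctuations in the number of neutral mutations), the substitution counts become overdispersed relative to a Poisson or binomial process. All numbers below are illustrative:

```python
import random

def substitution_counts(rates, steps, trials, rng):
    """Each trajectory draws one acceptance rate and counts accepted mutations."""
    counts = []
    for _ in range(trials):
        r = rng.choice(rates)                 # rate is fixed along a trajectory
        counts.append(sum(rng.random() < r for _ in range(steps)))
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio: ~(1 - p) for one binomial rate, >> 1 if mixed."""
    m = sum(counts) / len(counts)
    v = sum((c - m) ** 2 for c in counts) / len(counts)
    return v / m

rng = random.Random(1)
constant = substitution_counts([0.3], 100, 500, rng)    # Poisson-like counts
mixed = substitution_counts([0.1, 0.5], 100, 500, rng)  # overdispersed counts
```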

    Multi-Messenger Astronomy with Extremely Large Telescopes

    The field of time-domain astrophysics has entered the era of Multi-messenger Astronomy (MMA). One key science goal for the next decade (and beyond) will be to characterize gravitational wave (GW) and neutrino sources using the next generation of Extremely Large Telescopes (ELTs). These studies will have a broad impact across astrophysics, informing our knowledge of the production and enrichment history of the heaviest chemical elements, constraining the dense-matter equation of state, providing independent constraints on cosmology, increasing our understanding of particle acceleration in shocks and jets, and probing the lives of black holes in the universe. Future GW detectors will greatly improve their sensitivity during the coming decade, as will near-infrared telescopes capable of independently finding kilonovae from neutron star mergers. However, the electromagnetic counterparts to high-frequency (LIGO/Virgo band) GW sources will be distant and faint and thus demand ELT capabilities for characterization. ELTs will be important and necessary contributors to an advanced and complete multi-messenger network. Comment: White paper submitted to the Astro2020 Decadal Survey.