IsoPlotter(+): A Tool for Studying the Compositional Architecture of Genomes.
Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that significantly differs from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between human and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter(+), to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter(+) pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter(+) by applying it to human and insect genomes. The computational tools and data repository are available online.
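IsoPlotter itself uses a Jensen-Shannon divergence-based recursive segmentation with a dynamic halting criterion; as a rough illustration of the general idea only, the sketch below recursively splits a sequence where the GC content of the two sides differs most. The function names, the fixed threshold, and the minimum segment length are all invented for this example, not IsoPlotter's actual algorithm or API.

```python
# Toy recursive GC-content segmentation, loosely inspired by
# divergence-based segmenters such as IsoPlotter. The split criterion
# (maximal |GC_left - GC_right| with a fixed threshold) is an
# illustrative assumption, not the published algorithm.

def gc_fraction(seq):
    """Fraction of G/C nucleotides in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def segment(seq, min_len=20, threshold=0.15, offset=0):
    """Recursively split seq where left/right GC content differs most.

    Returns a list of (start, end, gc) domains in original coordinates.
    """
    n = len(seq)
    best_i, best_diff = None, 0.0
    for i in range(min_len, n - min_len + 1):
        diff = abs(gc_fraction(seq[:i]) - gc_fraction(seq[i:]))
        if diff > best_diff:
            best_i, best_diff = i, diff
    if best_i is None or best_diff < threshold:
        return [(offset, offset + n, gc_fraction(seq))]
    return (segment(seq[:best_i], min_len, threshold, offset)
            + segment(seq[best_i:], min_len, threshold, offset + best_i))

# An AT-rich half followed by a GC-rich half splits into two domains
domains = segment("AT" * 50 + "GC" * 50)
```

The toy splitter is quadratic per level and uses a fixed threshold; the appeal of IsoPlotter's approach is precisely that its halting criterion adapts to domain length and composition rather than relying on such a constant.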
A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes
For the past four decades, the compositional organization of the mammalian genome posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the “isochore theory,” which has long been rebutted. Recently, an alternative compositional domain model was proposed depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the “murid shift,” and in many ways resembles the genome of opossum. We find no support for the “isochore theory.” Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model and seeming to invalidate clade Euarchontoglires.
'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve
Background: The Z-curve is a three dimensional representation of DNA sequences proposed over a decade ago and
has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis.
Based on the Z-curve, a “genome order index” was proposed, defined as S = a² + c² + t² + g², where a, c, t,
and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for
almost all tested genomes, which was taken as support for the existence of a constraint on genome composition.
A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose
distance from the four faces of a regular tetrahedron was given by the frequencies a, c, t, and g. The authors claimed
that an inscribed sphere of radius r = 1/√3 contains almost all points corresponding to various genomes, implying
that S < r². The distribution of the points P obtained by S was studied using the Z-curve.
Results: In this work, we studied the basic properties of the Z-curve using the “genome order index” as a case
study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect,
(2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from
the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from overdimensionality, and the
dimension that stands for GC content alone suffices to represent any given genome.
Conclusion: The “genome order index” S does not represent a constraint on nucleotide composition. Moreover,
S can easily be computed from the Gini-Simpson index, can be derived directly from entropy, and is therefore redundant.
Overall, the Z-curve and S are overcomplicated measures of GC content and the Shannon H index, respectively.
Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich
(nominated by Itai Yanai)
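The redundancy argument is easy to reproduce numerically: since the Gini-Simpson diversity index is 1 − Σp², the genome order index S is simply its complement. The sketch below (function names and the toy sequence are illustrative choices, not from the paper) computes S from base frequencies and checks that identity.

```python
from collections import Counter

def base_frequencies(seq):
    """Relative frequencies of A, C, G, T in a sequence."""
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}

def genome_order_index(freqs):
    """S = a^2 + c^2 + t^2 + g^2 (sum of squared base frequencies)."""
    return sum(f * f for f in freqs.values())

def gini_simpson(freqs):
    """Gini-Simpson diversity: 1 - sum of squared frequencies."""
    return 1.0 - sum(f * f for f in freqs.values())

freqs = base_frequencies("ACGTACGTAAGGCCTT")  # toy sequence, equal frequencies
S = genome_order_index(freqs)

# S is exactly the complement of the Gini-Simpson index:
assert abs(S - (1.0 - gini_simpson(freqs))) < 1e-12
```

For equal base frequencies S attains its minimum of 1/4, so values below 1/3 for near-uniform genomes are unsurprising, which is the abstract's point that S does not reveal a special compositional constraint.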
Universality of Long-Range Correlations in Expansion-Randomization Systems
We study the stochastic dynamics of sequences evolving by single site
mutations, segmental duplications, deletions, and random insertions. These
processes are relevant for the evolution of genomic DNA. They define a
universality class of non-equilibrium 1D expansion-randomization systems with
generic stationary long-range correlations in a regime of growing sequence
length. We obtain explicitly the two-point correlation function of the sequence
composition and the distribution function of the composition bias in sequences
of finite length. The characteristic exponent of these quantities is
determined by the ratio of two effective rates, which are explicitly calculated
for several specific sequence evolution dynamics of the universality class.
Depending on the value of this exponent, we find two different scaling regimes, which
are distinguished by the detectability of the initial composition bias. All
analytic results are accurately verified by numerical simulations. We also
discuss the non-stationary build-up and decay of correlations, as well as more
complex evolutionary scenarios, where the rates of the processes vary in time.
Our findings provide a possible example for the emergence of universality in
molecular biology.
Comment: 23 pages, 15 figures
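A minimal simulation conveys the flavor of such expansion-randomization dynamics: a binary sequence grows by tandem segmental duplication and is scrambled by single-site mutation, after which the two-point correlation of its composition can be estimated directly. The event mix, rates, and segment lengths below are arbitrary illustrative choices, not the paper's parameterization.

```python
import random

random.seed(1)

# Binary sequence evolving by tandem segmental duplication (expansion)
# and single-site mutation (randomization); all rates are toy choices.
seq = [0, 1] * 10
for _ in range(4000):
    if random.random() < 0.5:
        # segmental duplication: copy a short segment in place (tandem)
        length = random.randint(1, 8)
        i = random.randrange(len(seq))
        seq[i:i] = seq[i:i + length]
    else:
        # single-site mutation: flip one random site
        j = random.randrange(len(seq))
        seq[j] ^= 1

def correlation(s, r):
    """Two-point correlation of composition at separation r."""
    mean = sum(s) / len(s)
    pairs = [(s[k] - mean) * (s[k + r] - mean) for k in range(len(s) - r)]
    return sum(pairs) / len(pairs)

corr = [correlation(seq, r) for r in (1, 2, 4, 8, 16)]
```

In this duplication-dominated regime the sequence grows steadily and short-range composition correlations survive the mutational scrambling; the paper's contribution is deriving analytically how the decay exponent of such correlations follows from the ratio of the effective rates.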
Mapping biodiversity value worldwide: combining higher-taxon richness from different groups
Maps of large-scale biodiversity are urgently needed to guide conservation, and yet complete enumeration of organisms is impractical at present. One indirect approach is to measure richness at higher taxonomic ranks, such as families. The difficulty is how to combine information from different groups on numbers of higher taxa, when these taxa may in effect have been defined in different ways, particularly for more distantly related major groups. In this paper, the regional family richness of terrestrial and freshwater seed plants, amphibians, reptiles and mammals is mapped worldwide by combining: (i) absolute family richness; (ii) proportional family richness; and (iii) proportional family richness weighted for the total species richness in each major group. The assumptions of the three methods and their effects on the results are discussed, although for these data the broad pattern is surprisingly robust with respect to the method of combination. Scores from each of the methods of combining families are used to rank the top five richness hotspots and complementary areas, and hotspots of endemism are mapped by unweighted combination of range-size rarity scores
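As a toy illustration of the three combination methods, the sketch below scores two hypothetical regions. All family counts and species totals are invented, and the proportional scaling used here (dividing by each group's maximum regional count) is one plausible reading of method (ii), not necessarily the paper's exact formula.

```python
# Toy illustration of combining higher-taxon (family) richness across
# major groups. Every number below is invented for the example.

regions = ["region_a", "region_b"]
groups = ["plants", "amphibians", "reptiles", "mammals"]

# families[group][region]: number of families of that group recorded there
families = {
    "plants":     {"region_a": 120, "region_b": 80},
    "amphibians": {"region_a": 10,  "region_b": 4},
    "reptiles":   {"region_a": 18,  "region_b": 9},
    "mammals":    {"region_a": 30,  "region_b": 22},
}
# global species richness of each major group (illustrative weights)
species_total = {"plants": 250000, "amphibians": 5000,
                 "reptiles": 8000, "mammals": 4500}

def absolute(region):
    """Method (i): simple sum of family counts across groups."""
    return sum(families[g][region] for g in groups)

def proportional(region):
    """Method (ii): each group scaled by its maximum regional count."""
    return sum(families[g][region] / max(families[g].values()) for g in groups)

def weighted(region):
    """Method (iii): proportional scores weighted by group species richness."""
    total = sum(species_total.values())
    return sum((families[g][region] / max(families[g].values()))
               * species_total[g] / total for g in groups)

scores = {method.__name__: {r: method(r) for r in regions}
          for method in (absolute, proportional, weighted)}
```

Note how the methods embody different assumptions: (i) lets family-rich groups such as plants dominate outright, (ii) gives each group equal voice, and (iii) re-weights that voice by how many species each family rank actually represents, which is the trade-off the paper examines.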
Supernovae in the Subaru Deep Field: the rate and delay-time distribution of Type Ia supernovae out to redshift 2
The Type Ia supernova (SN Ia) rate, when compared to the cosmic star formation history (SFH), can be used to derive the delay-time distribution (DTD; the hypothetical SN Ia rate versus time following a brief burst of star formation) of SNe Ia, which can distinguish among progenitor models. We present the results of a supernova (SN) survey in the Subaru Deep Field (SDF). Over a period of 3 years, we have observed the SDF on four independent epochs with Suprime-Cam on the Subaru 8.2-m telescope, with two nights of exposure per epoch, in the R, i′, and z′ bands. We have discovered 150 SNe out to redshift z ≈ 2. Using 11 photometric bands from the observer-frame far-ultraviolet to the near-infrared, we derive photometric redshifts for the SN host galaxies (for 24 we also have spectroscopic redshifts). This information is combined with the SN photometry to determine the type and redshift distribution of the SN sample. Our final sample includes 28 SNe Ia in the range 1.0 < z < 2.0; at z > 1, most of the events found in this range are likely SNe Ia. Our SN Ia rate measurements are consistent with those derived from the Hubble Space Telescope (HST) Great Observatories Origins Deep Survey (GOODS) sample, but the overall uncertainty of our measurement at 1.5 < z < 2.0 is larger.
The delay-time distribution of type-Ia supernovae from Sloan II
We derive the delay-time distribution (DTD) of type-Ia supernovae (SNe Ia)
using a sample of 132 SNe Ia, discovered by the Sloan Digital Sky Survey II
(SDSS2) among 66,000 galaxies with spectral-based star-formation histories
(SFHs). To recover the best-fit DTD, the SFH of every individual galaxy is
compared, using Poisson statistics, to the number of SNe that it hosted (zero
or one), based on the method introduced in Maoz et al. (2011). This SN sample
differs from the SDSS2 SN Ia sample analyzed by Brandt et al. (2010), using a
related, but different, DTD recovery method. Furthermore, we use a
simulation-based SN detection-efficiency function, and we apply a number of
important corrections to the galaxy SFHs and SN Ia visibility times. The DTD
that we find has 4-sigma detections in all three of its time bins: prompt (t <
420 Myr), intermediate (0.4 < t < 2.4 Gyr), and delayed (t > 2.4 Gyr),
indicating a continuous DTD, and it is among the most accurate and precise
of recent DTD reconstructions. The best-fit power-law form to the recovered
DTD is t^(-1.12+/-0.08), consistent with generic ~t^-1 predictions of SN Ia
progenitor models based on the gravitational-wave induced mergers of binary
white dwarfs. The time integrated number of SNe Ia per formed stellar mass is
N_SN/M = 0.00130 +/- 0.00015 Msun^-1, or about 4% of the stars formed with
initial masses in the 3-8 Msun range. This is lower than, but largely
consistent with, several recent DTD estimates based on SN rates in galaxy
clusters and in local-volume galaxies, and is higher than, but consistent with
N_SN/M estimated by comparing volumetric SN Ia rates to cosmic SFH.
Comment: MNRAS, in press
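The core of the recovery method, comparing each galaxy's SFH-predicted SN expectation to its observed count (zero or one) with Poisson statistics, can be sketched in a few lines. Here a single DTD amplitude is fit to invented toy data; the arrays `m` and `n` and the grid bounds are all hypothetical, and the real analysis fits several delay-time bins jointly.

```python
import math

# Toy galaxy sample: m[i] folds together the stellar mass formed in a
# delay-time bin and the SN Ia visibility time for galaxy i (invented
# numbers); n[i] is the number of SNe Ia that galaxy hosted (zero or one).
m = [2.0, 0.5, 1.2, 3.1, 0.8, 1.6, 2.4, 0.3]
n = [1, 0, 0, 1, 0, 0, 1, 0]

def log_likelihood(amp):
    """Poisson log-likelihood of the observed counts for amplitude amp.

    Expected counts are lambda_i = amp * m[i]; constant terms dropped.
    """
    return sum(ni * math.log(amp * mi) - amp * mi for ni, mi in zip(n, m))

# Grid search for the maximum-likelihood DTD amplitude
grid = [0.01 * k for k in range(1, 200)]
best = max(grid, key=log_likelihood)

# For a single bin the MLE has the closed form sum(n) / sum(m),
# so the grid search should land on the nearest grid point.
analytic = sum(n) / sum(m)
```

In the multi-bin case the same likelihood is maximized jointly over one amplitude per delay-time bin, with each galaxy's expectation a sum of its per-bin formed mass times the corresponding amplitude.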
Lack of self-averaging in neutral evolution of proteins
We simulate neutral evolution of proteins imposing conservation of the
thermodynamic stability of the native state in the framework of an effective
model of folding thermodynamics. This procedure generates evolutionary
trajectories in sequence space which share two universal features for all of
the examined proteins. First, the number of neutral mutations fluctuates
broadly from one sequence to another, leading to a non-Poissonian substitution
process. Second, the number of neutral mutations displays strong correlations
along the trajectory, thus causing the breakdown of self-averaging of the
resulting evolutionary substitution process.
Comment: 4 pages, 2 figures
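The breakdown of the Poissonian picture can be illustrated with a toy contrast: when the acceptance (neutral) fraction is constant, substitution counts in fixed windows stay close to the Poisson benchmark, but when it wanders slowly along the trajectory, the counts become overdispersed (variance exceeding the mean). All rates, window sizes, and the random-walk model below are invented for illustration, not the paper's folding-thermodynamics model.

```python
import random
import statistics

random.seed(7)

def substitutions(windows, trials_per_window, accept_prob):
    """Count accepted mutation attempts in consecutive windows."""
    return [sum(random.random() < accept_prob(w)
                for _ in range(trials_per_window))
            for w in range(windows)]

# Constant neutral fraction: counts are near-Poissonian.
const_counts = substitutions(400, 50, lambda w: 0.2)

# Slowly wandering neutral fraction: a bounded random walk (toy choice).
walk, p = [], 0.25
for _ in range(400):
    p = min(0.5, max(0.05, p + random.gauss(0.0, 0.03)))
    walk.append(p)
corr_counts = substitutions(400, 50, lambda w: walk[w])

def dispersion(counts):
    """Index of dispersion: variance / mean (1 for a Poisson process)."""
    return statistics.pvariance(counts) / statistics.mean(counts)
```

With these toy settings the wandering-neutrality counts come out strongly overdispersed relative to the constant-neutrality baseline, mirroring the correlated, non-Poissonian substitution process the abstract describes.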
Multi-Messenger Astronomy with Extremely Large Telescopes
The field of time-domain astrophysics has entered the era of Multi-messenger
Astronomy (MMA). One key science goal for the next decade (and beyond) will be
to characterize gravitational wave (GW) and neutrino sources using the next
generation of Extremely Large Telescopes (ELTs). These studies will have a
broad impact across astrophysics, informing our knowledge of the production and
enrichment history of the heaviest chemical elements, constraining the dense
matter equation of state, providing independent constraints on cosmology,
increasing our understanding of particle acceleration in shocks and jets, and
probing the lives of black holes in the universe. Future GW detectors will
greatly improve their sensitivity during the coming decade, as will
near-infrared telescopes capable of independently finding kilonovae from
neutron star mergers. However, the electromagnetic counterparts to
high-frequency (LIGO/Virgo band) GW sources will be distant and faint and thus
demand ELT capabilities for characterization. ELTs will be important and
necessary contributors to an advanced and complete multi-messenger network.
Comment: White paper submitted to the Astro2020 Decadal Survey
