
    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose a benchmark according to the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Revie
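
To make the idea of scoring an aligner against a benchmark concrete, here is a minimal sketch of a sum-of-pairs style score. The abstract does not describe this metric; it is shown only as one common way to compare a test alignment with a reference alignment of the same sequences. The function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical, and `-` is assumed to be the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    Each residue is identified by (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, char in enumerate(column):
            if char != "-":  # assumption: '-' marks a gap
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in the same column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment recovers."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)
```

A score of 1.0 means the test alignment reproduces every aligned residue pair of the reference. Such a score is only as trustworthy as the reference itself, which is the concern the abstract raises.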

    Unusual quasars from the Sloan Digital Sky Survey selected by means of Kohonen self-organising maps

    We exploit the spectral archive of the Sloan Digital Sky Survey (SDSS) Data Release 7 to select unusual quasar spectra. The selection method is based on a combination of the power of self-organising maps and the visual inspection of a huge number of spectra. Self-organising maps were applied to nearly 10^5 spectra classified as quasars by the SDSS pipeline. Particular attention was paid to minimise possible contamination by rare peculiar stellar spectral types. We present a catalogue of 1005 quasars with unusual spectra. This large sample provides a useful resource both for studying the properties of, and relations between, different types of unusual quasars and for selecting particularly interesting objects. The spectra are grouped into six types. All these types turn out to be on average more luminous than comparison samples of normal quasars after a statistical correction is made for intrinsic reddening. Both the unusual broad absorption line (BAL) quasars and the strong iron emitters have significantly lower radio luminosities than normal quasars. We also confirm that strong BALs avoid the most radio-luminous quasars. Finally, we create a sample of quasars similar to the two "mysterious" objects discovered by Hall et al. (2002) and briefly discuss the quasar properties and possible explanations of their highly peculiar spectra. (Abstract modified to match the arXiv format) Comment: Added reference to section 6; a few typos corrected; corrections according to the version published in Astronomy and Astrophysics

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved

    Dynamics of Glycoprotein Charge in the Evolutionary History of Human Influenza

    Influenza viruses show a significant capacity to evade host immunity; this is manifest both as large occasional jumps in the antigenic phenotype of viral surface molecules and in gradual antigenic changes leading to annual influenza epidemics in humans. Recent mouse studies show that avidity for host cells can play an important role in polyclonal antibody escape, and further that electrostatic charge of the hemagglutinin glycoprotein can contribute to such avidity. We test the role of glycoprotein charge on sequence data from the three major subtypes of influenza A in humans, using a simple method of calculating net glycoprotein charge. Of all subtypes, H3N2 in humans shows a striking pattern of increasing positive charge since its introduction in 1968. Notably, this trend applies to both hemagglutinin and neuraminidase glycoproteins. In the late 1980s hemagglutinin charge reached a plateau, while neuraminidase charge started to decline. We identify key groups of amino acid sites involved in this charge trend. To our knowledge these are the first indications that, for human H3N2, net glycoprotein charge covaries strongly with antigenic drift on a global scale. Further work is needed to elucidate how such charge interacts with other immune escape mechanisms, such as glycosylation, and we discuss important questions arising for future study
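
The abstract does not spell out its "simple method" for net glycoprotein charge. As an illustration only, the sketch below uses one common convention: count lysine (K) and arginine (R) as +1 and aspartate (D) and glutamate (E) as -1, ignoring histidine and the chain termini. The convention and the name `net_charge` are assumptions, not the authors' stated procedure.

```python
# Assumed convention (not taken from the abstract): K and R carry +1,
# D and E carry -1, and all other residues are treated as neutral.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(sequence: str) -> int:
    """Return the net charge of a protein sequence in one-letter code."""
    seq = sequence.upper()
    positives = sum(1 for aa in seq if aa in POSITIVE)
    negatives = sum(1 for aa in seq if aa in NEGATIVE)
    return positives - negatives
```

Applied to dated hemagglutinin or neuraminidase sequences, a function like this would yield one charge value per isolate, which could then be plotted against isolation year to look for a trend of the kind the abstract reports.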

    Molecular characterization of partial-open reading frames 1a and 2 of the human astroviruses in South Korea

    Human astroviruses (HAstVs) are among the major causes of gastroenteritis in South Korea. In this study, partial regions of the open reading frame (ORF) 1a and ORF2 genes of HAstVs from gastroenteritis patients in nine hospitals were sequenced, and the viruses were characterized at the molecular level. From 120 stool specimens, 89 partial nucleotide sequences of ORF1a and 88 partial nucleotide sequences of ORF2 were amplified. Phylogenetic analysis showed that most of the ORF1a and ORF2 nucleotide sequences grouped with HAstV type 1 but were genetically distant from reference sequences such as the HAstV-1 prototype, the Dresden strain, and the Oxford strain. In the phylogenetic analysis, some nucleotide sequences, including SE0506041, SE0506043, and SE0506058, showed discrepancies in genotype, but there was no evidence of recombination among the HAstV types. In conclusion, this study showed that the dominant HAstV isolated from the Seoul metropolitan area in 2004-2005 was HAstV type 1, and that Korean HAstV-1 was genetically distant from the HAstV reference sequences. The large set of ORF1a and ORF2 nucleotide sequences will be useful for studies on the control and prevention of HAstV gastroenteritis in South Korea

    Research for All: Building a Diverse Researcher Community for the All of Us Research Program

    OBJECTIVES: The NIH All of Us Research Program (All of Us) is engaging a diverse community of more than 10 000 registered researchers using a robust engagement ecosystem model. We describe strategies used to build an ecosystem that attracts and supports a diverse and inclusive researcher community to use the All of Us dataset and provide metrics on All of Us researcher usage growth. MATERIALS AND METHODS: Researcher audiences and diversity categories were defined to guide a strategy. A researcher engagement strategy was codeveloped with program partners to support a researcher engagement ecosystem. An adapted ecological model guided the ecosystem to address multiple levels of influence to support All of Us data use. Statistics from the All of Us Researcher Workbench demographic survey describe trends in researchers' and institutional use of the Workbench and publication numbers. RESULTS: From 2022 to 2024, some 13 partner organizations and their subawardees conducted outreach, built capacity, or supported researchers and institutions in using the data. Trends indicate that Workbench registrations and use have increased over time, including among researchers underrepresented in the biomedical workforce. Data Use and Registration Agreements from minority-serving institutions also increased. DISCUSSION: All of Us built a diverse, inclusive, and growing research community via intentional engagement with researchers and via partnerships to address systemic data access issues. Future programs will provide additional support to researchers and institutions to ameliorate All of Us data use challenges. CONCLUSION: The approach described helps address structural inequities in the biomedical research field to advance health equity

    Identifying Changes in Selective Constraints: Host Shifts in Influenza

    The natural reservoir of Influenza A is waterfowl. Normally, waterfowl viruses are not adapted to infect and spread in the human population. Sometimes, through reassortment or through whole host shift events, genetic material from waterfowl viruses is introduced into the human population causing worldwide pandemics. Identifying which mutations allow viruses from avian origin to spread successfully in the human population is of great importance in predicting and controlling influenza pandemics. Here we describe a novel approach to identify such mutations. We use a sitewise non-homogeneous phylogenetic model that explicitly takes into account differences in the equilibrium frequencies of amino acids in different hosts and locations. We identify 172 amino acid sites with strong support and 518 sites with moderate support of different selection constraints in human and avian viruses. The sites that we identify provide an invaluable resource to experimental virologists studying adaptation of avian flu viruses to the human host. Identification of the sequence changes necessary for host shifts would help us predict the pandemic potential of various strains. The method is of broad applicability to investigating changes in selective constraints when the timing of the changes is known

    A Full Year's Chandra Exposure on SDSS Quasars from the Chandra Multiwavelength Project

    We study the spectral energy distributions and evolution of a large sample of optically selected quasars from the Sloan Digital Sky Survey (SDSS) that were observed in 323 Chandra images analyzed by the Chandra Multiwavelength Project (ChaMP). Our highest-confidence matched sample includes 1135 X-ray detected quasars in the redshift range 0.2<z<5.4, representing some 36 Msec of effective exposure. Spectroscopic redshifts are available for about 1/3 of the detected sample; elsewhere, redshifts are estimated photometrically. With 56 z>3 QSOs detected, we find no evidence for evolution out to z~5 for either the X-ray photon index Gamma or for the ratio of optical/UV to X-ray flux alpha_ox. About 10% of detected QSOs are obscured (Nh>1E22), but the fraction might reach ~1/3 if most non-detections are absorbed. We confirm a significant correlation between alpha_ox and optical luminosity, but it flattens or disappears for fainter AGN alone. Gamma hardens significantly both towards higher X-ray luminosity, and for relatively X-ray loud quasars. These trends may represent a relative increase in non-thermal X-ray emission, and our findings thereby strengthen analogies between Galactic black hole binaries and AGN. Comment: 28 pages, 21 figures. Accepted (26 Aug 2008) for publication in ApJS. Electronic datafiles (for tables 2 and 3) and high resolution figures available at http://hea-www.harvard.edu/CHAMP

    Prevalence of Epistasis in the Evolution of Influenza A Surface Proteins

    The surface proteins of human influenza A viruses experience positive selection to escape both human immunity and, more recently, antiviral drug treatments. In bacteria and viruses, immune-escape and drug-resistant phenotypes often appear through a combination of several mutations that have epistatic effects on pathogen fitness. However, the extent and structure of epistasis in influenza viral proteins have not been systematically investigated. Here, we develop a novel statistical method to detect positive epistasis between pairs of sites in a protein, based on the observed temporal patterns of sequence evolution. The method rests on the simple idea that a substitution at one site should rapidly follow a substitution at another site if the sites are positively epistatic. We apply this method to the surface proteins hemagglutinin and neuraminidase of influenza A virus subtypes H3N2 and H1N1. Compared to a non-epistatic null distribution, we detect substantial amounts of epistasis and determine the identities of putatively epistatic pairs of sites. In particular, using sequence data alone, our method identifies epistatic interactions between specific sites in neuraminidase that have recently been demonstrated, in vitro, to confer resistance to the drug oseltamivir; these epistatic interactions are responsible for widespread drug resistance among H1N1 viruses circulating today. This experimental validation demonstrates the predictive power of our method to identify epistatic sites of importance for viral adaptation and public health. We conclude that epistasis plays a large role in shaping the molecular evolution of influenza viruses. In particular, sites with dN/dS < 1, which would normally not be identified as positively selected, can facilitate viral adaptation through epistatic interactions with their partner sites. The knowledge of specific interactions among sites in influenza proteins may help us to predict the course of antigenic evolution and, consequently, to select more appropriate vaccines and drugs
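
The core intuition of the epistasis abstract above, that a substitution at one site should be rapidly followed by a substitution at its partner site, can be illustrated with a toy calculation. This is not the authors' statistic, which the abstract does not specify; it is a simplified sketch in which substitution times per site are assumed to be already known, and `mean_follow_delay` is a hypothetical name.

```python
from bisect import bisect_right


def mean_follow_delay(leader_times, follower_times):
    """Mean wait from each leader substitution to the next follower one.

    Both arguments are lists of substitution times (for example, years).
    Leader events with no later follower event are skipped. Returns None
    when no leader event is followed by a follower event.
    """
    follower_sorted = sorted(follower_times)
    delays = []
    for t in leader_times:
        # Index of the first follower event strictly after time t.
        idx = bisect_right(follower_sorted, t)
        if idx < len(follower_sorted):
            delays.append(follower_sorted[idx] - t)
    return sum(delays) / len(delays) if delays else None
```

A short mean delay for a site pair is only suggestive. As the abstract notes, any such value has to be compared with a non-epistatic null distribution before a pair is called epistatic.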