    Protocol Dependence of Sequencing-Based Gene Expression Measurements

    RNA-Seq provides unparalleled levels of information about the transcriptome, including precise expression levels over a wide dynamic range. It is essential to understand how technical variation impacts the quality and interpretability of results, how potential errors could be introduced by the protocol, how the source of RNA affects transcript detection, and how all of these variations can impact the conclusions drawn. Multiple human RNA samples were used to assess RNA fragmentation, RNA fractionation, cDNA synthesis, and single versus multiple tag counting. Although protocols employing polyA RNA selection generate the highest number of non-ribosomal reads and the most precise measurements for coding transcripts, such protocols were found to detect only a fraction of the non-ribosomal RNA in human cells. PolyA selection excludes thousands of annotated and even more unannotated transcripts, resulting in an incomplete view of the transcriptome. Ribosomal-depleted RNA provides a more cost-effective method for generating complete transcriptome coverage. Single tag counting offered advantages over multi-read protocols for assessing gene expression and for detecting short RNAs, although short-RNA detection was also hampered by RNA fragmentation. Thus, this work will help researchers choose from among a range of options when analyzing gene expression, each with its own advantages and disadvantages.
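    As a rough illustration of the kind of protocol comparison described above, the sketch below computes the non-ribosomal read fraction and a simple precision proxy (coefficient of variation across replicates) for two hypothetical library-preparation protocols. The protocol names, read counts, and the choice of metric are illustrative assumptions, not data or methods from the study.

```python
# Illustrative comparison of library-preparation protocols (hypothetical numbers).
from statistics import mean, stdev

# Per-replicate read counts by annotation class for two hypothetical protocols.
libraries = {
    "polyA_selection": [
        {"ribosomal": 1.0e5, "coding": 9.2e6, "other_nonribosomal": 3.0e5},
        {"ribosomal": 1.2e5, "coding": 9.0e6, "other_nonribosomal": 2.8e5},
    ],
    "ribo_depletion": [
        {"ribosomal": 8.0e5, "coding": 6.5e6, "other_nonribosomal": 2.3e6},
        {"ribosomal": 9.5e5, "coding": 6.1e6, "other_nonribosomal": 2.6e6},
    ],
}

for protocol, reps in libraries.items():
    nonribo_fracs = []
    coding_fracs = []
    for rep in reps:
        total = sum(rep.values())
        nonribo_fracs.append((rep["coding"] + rep["other_nonribosomal"]) / total)
        coding_fracs.append(rep["coding"] / total)
    # Coefficient of variation of the coding fraction as a crude precision proxy.
    cv = stdev(coding_fracs) / mean(coding_fracs)
    print(f"{protocol}: non-ribosomal fraction = {mean(nonribo_fracs):.3f}, "
          f"coding-fraction CV = {cv:.3f}")
```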

    Biological Process Linkage Networks

    BACKGROUND. The traditional approach to studying complex biological networks is based on the identification of interactions between internal components of signaling or metabolic pathways. By comparison, little is known about interactions between higher-order biological systems, such as biological pathways and processes. We propose a methodology for gleaning patterns of interactions between biological processes by analyzing protein-protein interactions, transcriptional co-expression, and genetic interactions. At the heart of the methodology are the concept of Linked Processes and the resultant network of biological processes, the Process Linkage Network (PLN). RESULTS. We construct, catalogue, and analyze different types of PLNs derived from different data sources and different species. When applied to the Gene Ontology, many of the resulting links connect processes that are distant from each other in the hierarchy, even though the connection makes eminent sense biologically. Some others, however, carry an element of surprise and may reflect mechanisms that are unique to the organism under investigation. In this respect our method complements the link structure between processes inherent in the Gene Ontology, which by its very nature is species-independent. As a practical application of process linkage, we demonstrate that it can be used effectively in protein function prediction, increasing both the coverage and the accuracy of predictions when carefully integrated into prediction methods. CONCLUSIONS. Our approach constitutes a promising new direction towards understanding the higher levels of organization of the cell as a system, which should help current efforts to re-engineer ontologies and improve our ability to predict which proteins are involved in specific biological processes. Funding: Lynn and William Frankel Center for Computer Science; the Paul Ivanier Center for Robotics Research and Production; National Science Foundation (ITR-048715); National Human Genome Research Institute (1R33HG002850-01A1, R01 HG003367-01A1); National Institutes of Health (U54 LM008748).
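    The sketch below illustrates, under simplifying assumptions, one way a process-linkage score of the kind described above could be computed: count protein-protein interactions that bridge a pair of biological processes and ask whether that count is larger than expected for randomly placed interactions (a hypergeometric test). The toy annotations, interaction list, and 0.05 threshold are assumptions for illustration; the actual methodology also draws on co-expression and genetic-interaction data.

```python
# Toy construction of a Process Linkage Network (PLN) from a PPI network.
from itertools import combinations
from scipy.stats import hypergeom

# Hypothetical GO biological-process annotations (process -> proteins).
annotations = {
    "DNA repair":       {"P1", "P2", "P3"},
    "cell cycle":       {"P3", "P4", "P5", "P6"},
    "lipid metabolism": {"P7", "P8"},
}
proteins = set().union(*annotations.values())

# Hypothetical protein-protein interactions (undirected pairs).
ppi = {frozenset(p) for p in [("P1", "P4"), ("P2", "P5"), ("P3", "P6"),
                              ("P2", "P3"), ("P1", "P5"), ("P1", "P6"),
                              ("P7", "P8")]}

all_pairs = {frozenset(p) for p in combinations(proteins, 2)}

def cross_pairs(a, b):
    """Protein pairs with one member annotated to process a and the other to b."""
    return {frozenset((x, y)) for x in annotations[a] for y in annotations[b]
            if x != y}

links = []
for a, b in combinations(annotations, 2):
    candidate = cross_pairs(a, b)
    observed = len(candidate & ppi)
    # Hypergeometric tail: chance of seeing >= observed cross-process
    # interactions if interacting pairs were drawn at random from all pairs.
    p = hypergeom.sf(observed - 1, len(all_pairs), len(candidate), len(ppi))
    if observed and p < 0.05:          # illustrative significance threshold
        links.append((a, b, observed, p))

for a, b, k, p in links:
    print(f"linked processes: {a} -- {b}  (interactions={k}, p={p:.3g})")
```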

    The bioWidget consortium


    Not All Experimental Questions Are Created Equal: Accelerating Biological Data to Knowledge Transformation (BD2K) via Science Informatics, Active Learning and Artificial Intelligence

    Pablo Picasso, when first told about computers, famously quipped, “Computers are useless. They can only give you answers.” Indeed, the majority of effort in the first half-century of computational research has focused on methods for producing answers. Incredible progress has been achieved in computational modeling, simulation, and optimization, across domains as diverse as astrophysics, climate studies, biomedicine, architecture, and chess. However, the use of computers to pose new questions, or to prioritize existing ones, has thus far been quite limited. Picasso’s comment highlights the point that good questions can sometimes be more elusive than good answers. The history of science offers numerous examples of the impact of good questions. Paul Erdős, the wandering monk of mathematical graph theory, offered small prizes for anyone who could prove conjectures he identified as important (1). The prizes varied in cash amount based on the perceived complexity of the problem posed by Erdős. Posing technical questions and allocating resources to answer them has taken on a new guise in the Internet age. The X-Prize foundation (http://www.xprize.org/) offers multi-million-dollar bounties for grand technological goals, including goals for sequencing genomes or space exploration. Several companies provide portals where customers can place cash bounties on educational, scientific, or technological challenges, while potential problem solvers compete to produce the best solutions to these problems. Amazon’s Mechanical Turk site (https://www.mturk.com/mturk/welcome) links people requesting performance of intellectual tasks to people willing to work on them for a fee. Such crowd-sourcing systems create markets of questions and answers, and can help allocate resources and capabilities efficiently. This paradigm suggests a number of interesting questions for scientific research. In a resource-limited environment, can funds and research capacity be allocated more efficiently? Can knowledge demand provide an alternative or complementary mechanism to traditional investigator-initiated research grants? The fathers of Artificial Intelligence (AI), Herbert Simon in particular, envisioned the application of AI to scientific discovery in different forms and styles (focusing on physics). We follow up on these early dreams and describe a novel approach aimed at remodeling the biomedical research infrastructure and catalyzing gene function determination. We aim to start a bold discussion of new ideas for increasing the efficiency with which research capacity is allocated, improving reproducibility and provenance tracking, removing redundancy, and catalyzing knowledge gain with each experiment. In particular, we describe a tractable computational framework and infrastructure that can help researchers assess the potential information gain of millions of experiments before conducting them. The utility of an experiment is modeled here as the predictive knowledge (formalized as information) to be gained by performing it. The experimentalist would then be empowered to select experiments that maximize information gain, recognizing that there are frequently other considerations, such as a specific technological or medical utility, that might override the priority of maximizing information gain. The conceptual approach we develop is general, and here we apply it to the study of gene function.
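    To make the experiment-prioritization idea concrete, the sketch below ranks a handful of hypothetical experiments on an uncharacterized gene by expected information gain, i.e., the expected reduction in Shannon entropy of a prior belief over candidate functions. The hypothesis set, outcome likelihoods, and experiment names are illustrative assumptions, not the framework's actual models.

```python
# Toy ranking of candidate experiments by expected information gain.
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Prior belief over hypothetical functions of an uncharacterized gene.
prior = {"kinase": 0.5, "transcription factor": 0.3, "transporter": 0.2}

# For each candidate experiment: P(positive outcome | hypothesis).
# Names and probabilities are illustrative assumptions only.
experiments = {
    "in vitro phosphorylation assay": {"kinase": 0.9, "transcription factor": 0.1, "transporter": 0.1},
    "DNA-binding (ChIP) assay":       {"kinase": 0.1, "transcription factor": 0.8, "transporter": 0.1},
    "membrane localization assay":    {"kinase": 0.2, "transcription factor": 0.1, "transporter": 0.7},
}

def expected_information_gain(likelihoods):
    """H(prior) minus the expected posterior entropy over the two outcomes."""
    gain = entropy(prior)
    for outcome_prob in (lambda h: likelihoods[h], lambda h: 1 - likelihoods[h]):
        # P(outcome) and the posterior over hypotheses given that outcome.
        p_outcome = sum(prior[h] * outcome_prob(h) for h in prior)
        if p_outcome == 0:
            continue
        posterior = {h: prior[h] * outcome_prob(h) / p_outcome for h in prior}
        gain -= p_outcome * entropy(posterior)
    return gain

ranked = sorted(experiments, key=lambda e: expected_information_gain(experiments[e]),
                reverse=True)
for name in ranked:
    print(f"{name}: expected gain = {expected_information_gain(experiments[name]):.3f} bits")
```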

    Follow-Up SARS-CoV-2 PCR Testing Outcomes From a Large Reference Lab in the US

    By analyzing sequential COVID-19 test results of patients across the United States, we herein attempt to quantify some of the observations we have made around long-term infection (and false-positive rates), as well as provide observations on the uncertainty of sampling variability and other dynamics of COVID-19 infection in the United States. This was a retrospective cohort study of a registry of RT-PCR testing results for all patients tested at any of the reference labs operated by Labcorp®, including positive, negative, and inconclusive results, from March 1, 2020 to January 28, 2021, covering patients from all 50 states and outlying US territories. The study included 22 million patients with RT-PCR qualitative test results for SARS-CoV-2, of whom 3.9 million had more than one test at Labcorp. We observed a minuscule <0.1% basal positive rate for follow-up tests >115 days after an initial test, which could reflect false positives, long-haulers, and/or reinfection but is indistinguishable in the data. Among patients who had a second test after a first RT-PCR, 30% across the cohort tested negative on the second test. For patients who tested positive first and subsequently negative within 96 h (40% of positive test results), 18% of tests subsequently tested positive within another 96-h span. For those who first tested negative and then positive within 96 h (2.3% of negative tests), 56% tested negative after a third and subsequent 96-h period. The sudden changes in RT-PCR test results for SARS-CoV-2 in this large cohort study suggest that negative test results during active infection or exposure can change rapidly within just days or hours. We also demonstrate that there does not appear to be a basal false-positive rate among patients who test positive >115 days after their first positive RT-PCR test, while we fail to observe any evidence of widespread reinfection.
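    A minimal sketch of the kind of repeat-testing tabulation summarized above is shown below: sort each patient's results by collection time and measure how often a result flips on a follow-up test taken within 96 hours. The DataFrame columns and example records are assumed for illustration and do not reflect the Labcorp registry schema or the reported rates.

```python
# Sketch of tabulating short-interval result changes in sequential RT-PCR tests.
import pandas as pd

# Hypothetical test registry: one row per result (column names are assumptions).
tests = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2, 3, 3],
        "collected": pd.to_datetime(
            ["2020-06-01", "2020-06-03", "2020-06-20",
             "2020-07-10", "2020-07-12",
             "2020-08-01", "2020-08-02"]),
        "result": ["positive", "negative", "negative",
                   "negative", "positive",
                   "positive", "positive"],
    }
)

tests = tests.sort_values(["patient_id", "collected"])
tests["prev_result"] = tests.groupby("patient_id")["result"].shift()
tests["hours_since_prev"] = (
    tests["collected"] - tests.groupby("patient_id")["collected"].shift()
).dt.total_seconds() / 3600

# Keep only follow-up tests taken within 96 hours of the previous test.
followups = tests.dropna(subset=["prev_result"])
followups = followups[followups["hours_since_prev"] <= 96]

# Fraction of short-interval follow-ups whose result differs from the prior result.
flip_rates = (
    followups.assign(flipped=followups["result"] != followups["prev_result"])
    .groupby("prev_result")["flipped"]
    .mean()
)
print(flip_rates)
```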