51 research outputs found
Basecalling for Traces Derived for Multiple Templates
Three methods for analyzing sequencing traces derived from sequencing reactions containing two DNA templates are presented. All rely on alignment to a segment of assembled genomic sequence containing the original template sequence. Spliced alignment algorithms are used so that traces derived from processed mRNA can be analyzed. The main application of these techniques is the elucidation of alternately spliced transcripts. Experimental verification of one of the techniques is presented, including testing on a set of 48 alternately spliced targets from the human genome and 47 negative controls.
Gene prediction and verification in a compact genome with numerous small introns
The genomes of clusters of related eukaryotes are now being sequenced at an increasing rate, creating a need for accurate, low-cost annotation of exon–intron structures. In this paper, we demonstrate that reverse transcription-polymerase chain reaction (RT–PCR) and direct sequencing based on predicted gene structures satisfy this need, at least for single-celled eukaryotes. The TWINSCAN gene prediction algorithm was adapted for the fungal pathogen Cryptococcus neoformans by using a precise model of intron lengths in combination with ungapped alignments between the genome sequences of the two closely related Cryptococcus varieties. This approach resulted in ∼60% of known genes being predicted exactly right at every coding base and splice site. When previously unannotated TWINSCAN predictions were tested by RT–PCR and direct sequencing, 75% of targets spanning two predicted introns were amplified and produced high-quality sequence. When targets spanning the complete predicted open reading frame were tested, 72% of them amplified and produced high-quality sequence. We conclude that sequencing a small number of expressed sequence tags (ESTs) to provide training data, running TWINSCAN on an entire genome, and then performing RT–PCR and direct sequencing on all of its predictions would be a cost-effective method for obtaining an experimentally verified genome annotation.
Foregut microbiome in development of esophageal adenocarcinoma
Esophageal adenocarcinoma (EA), the type of cancer linked to heartburn due to gastroesophageal reflux disease (GERD), has increased sixfold in the past 30 years. This increase cannot currently be explained by the usual environmental or host genetic factors. EA is the end result of a sequence of GERD-related diseases, preceded by reflux esophagitis (RE) and Barrett’s esophagus (BE). Preliminary studies by Pei and colleagues at NYU on elderly male veterans identified two types of microbiotas in the esophagus. Patients who carry the type II microbiota are more than 15-fold more likely to have esophagitis and BE than those harboring the type I microbiota. In a small-scale study, we also found that 3 of 3 cases of EA harbored the type II biota. These findings have opened a new approach to understanding the recent surge in the incidence of EA.

Our long-term goal is to identify the cause of the GERD sequence. The hypothesis to be tested is that changes in the foregut microbiome are associated with EA and its precursors, RE and BE, in the GERD sequence. We will conduct a case-control study to demonstrate the microbiome-disease association at every stage of the GERD sequence, and to analyze the trend in microbiome changes along disease progression toward EA, through two specific aims. Aim 1 is to conduct a comprehensive population survey of the foregut microbiome and demonstrate its association with the GERD sequence. In addition, the spatial relationship between the esophageal microbiota and the upstream (mouth) and downstream (stomach) foregut microbiotas, as well as the temporal stability of the microbiome-disease association, will be examined. Aim 2 is to define the distal esophageal metagenome and demonstrate its association with the GERD sequence. Detailed analyses will include pathway-disease and gene-disease associations. Archaea, fungi and viruses, if identified, will also be correlated with the diseases. A significant association between the foregut microbiome and the GERD sequence, if demonstrated, will be the first step toward eventually testing whether an abnormal microbiome is required for the development of the sequence of phenotypic changes toward EA. If EA and its precursors represent a microecological disease, treating the cause of GERD might become possible, for example, by normalizing the microbiota through use of antibiotics, probiotics, or prebiotics. Causative therapy of GERD could prevent its progression and reverse the current trend of increasing incidence of EA.
UC-99 Interactive Training Games - Robins Air Force Base
Our project involves converting three PowerPoint training presentations on STINFO, the No FEAR Act, and Records Management into engaging web-based games. Commissioned by Robins Air Force Base, our team uses Unity WebGL for game development and React/Firebase for website hosting. The goal is to provide Air Force personnel with interactive training modules accessible from their desks, enhancing learning retention and engagement. By gamifying the content, we aim to make learning enjoyable while ensuring that critical information is retained. This interdisciplinary project merges game development and web technologies to modernize training methods and improve educational outcomes for military personnel.
Constrained Optimization for Validation-Guided Conditional Random Field Learning
Conditional random fields (CRFs) are a class of undirected graphical models that have been widely used for classifying and labeling sequence data. The training of CRFs is typically formulated as an unconstrained optimization problem that maximizes the conditional likelihood. However, maximum likelihood training is prone to overfitting. To address this issue, we propose a novel constrained nonlinear optimization formulation in which the prediction accuracy on cross-validation sets is included as a set of constraints. Instead of requiring multiple passes of training, the constrained formulation allows cross-validation to be handled in one pass of constrained optimization. The new formulation is discontinuous, and classical Lagrangian-based constraint handling methods are not applicable. A new constrained optimization algorithm based on the recently proposed extended saddle point theory is developed to learn the constrained CRF model. Experimental results on gene and stock-price prediction tasks show that the constrained formulation is able to significantly improve the generalization ability of CRF training.
Gradient-Based Feature Selection for Conditional Random Fields and its Applications in Computational Genetics
Comparative metatranscriptomics reveals extracellular electron transfer pathways conferring microbial adaptivity to surface redox potential changes
Microbial metabolic networks in a complex electrogenic biofilm recovered from a stimulus-induced metatranscriptomics approach
