Predicting the binding preference of transcription factors to individual DNA k-mers
Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA–protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members.
Results: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest-neighbour prediction based on functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF–DNA recognition, and suggest a rational approach for future analyses of TF families.
Supplementary information: Supplementary data are available at Bioinformatics online.
Funding: Canadian Institutes of Health Research; Ontario Research Fund; National Institutes of Health (U.S.); National Human Genome Research Institute (U.S.)
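The nearest-neighbour idea in the Results paragraph can be illustrated with a short sketch. It assumes the homeodomain sequences are already aligned, that "functionally important residues" are supplied as a list of alignment columns, and that distance is the number of mismatches at those columns; all three are illustrative assumptions, and the sequences, columns and scores in the toy data are hypothetical. The predicted 8-mer profile is simply the profile of the closest training homeodomain.

```python
from typing import Dict, List, Sequence


def hamming_at(a: str, b: str, positions: Sequence[int]) -> int:
    """Count mismatches between two aligned protein sequences at the given positions."""
    return sum(a[i] != b[i] for i in positions)


def predict_profile_nn(query: str,
                       training: Dict[str, List[float]],
                       positions: Sequence[int]) -> List[float]:
    """Predict a k-mer score profile for `query` by nearest neighbour.

    `training` maps an aligned protein sequence to its vector of k-mer scores
    (e.g. one score per 8-mer). The profile of the training sequence with the
    fewest mismatches at `positions` is returned; ties go to the sequence
    inserted first.
    """
    nearest = min(training, key=lambda seq: hamming_at(query, seq, positions))
    return training[nearest]


# Toy data: hypothetical aligned sequences with three-k-mer score profiles.
training = {
    "AKRQL": [0.1, 0.9, 0.3],
    "AKNQV": [0.8, 0.2, 0.4],
}
important = [2, 4]  # hypothetical "functionally important" alignment columns
print(predict_profile_nn("GKNTV", training, important))  # [0.8, 0.2, 0.4]
```

Restricting the distance to a few columns is what distinguishes this from nearest neighbour over the full sequence; swapping in a substitution-matrix distance would be a natural variation.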
Merging Accountancy and Computer Information Systems Programs at Arizona State University: A Snapshot of Current Progress and Continuing Challenges
This article provides a brief report on progress and continuing challenges facing the recently merged accountancy and computer information systems programs at Arizona State University. It provides a case study of programmatic evolution and curricular redesign in information systems. Distinctions between computer information systems programs and programs in other functional areas of business are becoming blurred. Students are increasingly choosing dual-degree programs that combine preparation in computer information systems with preparation in more traditional functional areas of business. Additionally, increasing numbers of recruiters are hiring students from both traditional functional areas and computer information systems programs. This report describes a curricular strategy for the merged programs that integrates interleaved program delivery, heterogeneous cohorts, and an intertwined prerequisite structure.
Inferring condition-specific transcription factor function from DNA binding and gene expression data
Numerous genomic and proteomic datasets are permitting the elucidation of transcriptional regulatory networks in the yeast Saccharomyces cerevisiae. However, predicting the condition dependence of regulatory network interactions has been challenging, because most protein–DNA interactions identified in vivo are from assays performed in one or a few cellular states. Here, we present a novel method to predict the condition-specific functions of S. cerevisiae transcription factors (TFs) by integrating 1327 microarray gene expression data sets and either comprehensive TF binding site data from protein binding microarrays (PBMs) or in silico motif data. Importantly, our method does not impose arbitrary thresholds for calling target regions ‘bound’ or genes ‘differentially expressed’, but rather allows all the information derived from a TF binding or gene expression experiment to be considered. We show that this method can identify environmental, physical, and genetic interactions, as well as distinct sets of genes that might be activated or repressed by a single TF under particular conditions. This approach can be used to suggest conditions for directed in vivo experimentation and to predict TF function.
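A threshold-free comparison of binding and expression can be illustrated with a rank correlation computed across all genes, so that no gene is ever called "bound" or "differentially expressed". This is a minimal sketch of that general idea only, not the published method: it assumes one binding score per gene (e.g. a PBM-derived score for the gene's promoter) and one expression-change vector per condition, and the gene values and condition names in the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rank_conditions(binding: Sequence[float],
                    expression: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Order conditions by the rank correlation between TF binding and expression change.

    `binding` holds one binding score per gene; each `expression[cond]` holds
    that condition's expression change for the same genes in the same order.
    No threshold is applied to either quantity.
    """
    scores = [(cond, spearman(binding, expr)) for cond, expr in expression.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data for three genes (hypothetical values).
binding = [0.9, 0.1, 0.5]
expression = {"heat": [2.0, -1.0, 0.3], "cold": [-2.0, 1.0, 0.0]}
for cond, rho in rank_conditions(binding, expression):
    print(cond, round(rho, 3))
```

A strongly positive correlation in a condition hints at activation there, and a strongly negative one at repression; with real data one would also need a null model (e.g. permuting genes) before reading anything into the scores.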
Contribution of Distinct Homeodomain DNA Binding Specificities to Drosophila Embryonic Mesodermal Cell-Specific Gene Expression Programs
Homeodomain (HD) proteins are a large family of evolutionarily conserved transcription factors (TFs) with diverse developmental functions, often acting within the same cell types, yet many members of this family paradoxically recognize similar DNA sequences. Because multiple family members can potentially recognize the same DNA sequences in cis-regulatory elements, it is difficult to ascertain the role of an individual HD or HD subclass in mediating a particular developmental function. To investigate this problem, we focused on the Drosophila embryonic mesoderm, where HD TFs are required not only to establish segmental identities (e.g., the Hox TFs) but also for tissue and cell fate specification and differentiation (e.g., the NK-2 HDs, Six HDs and identity HDs (I-HDs)). Here we used the complete spectrum of DNA binding specificities determined by protein binding microarrays (PBMs) for a diverse collection of HDs to modify the nucleotide sequences of numerous mesodermal enhancers so that they would be recognized by either no HD subclass or only a single subclass, and then assayed the consequences of these changes on enhancer function in transgenic reporter assays. These studies show that individual mesodermal enhancers receive separate transcriptional input from both the I-HD and Hox subclasses of HDs. In addition, we demonstrate that enhancers regulating upstream components of the mesodermal regulatory network are targeted by the Six class of HDs. Finally, we establish that NK-2 HD binding sequences are necessary to activate gene expression in multiple mesodermal tissues, supporting a potential role for the NK-2 HD TF Tinman (Tin) as a pioneer factor that cooperates with other factors to regulate cell-specific gene expression programs.
Collectively, these results underscore the critical role played by HDs of multiple subclasses in inducing the unique genetic programs of individual mesodermal cells, and in coordinating the gene regulatory networks directing mesoderm development.
Funding: National Institutes of Health (U.S.) (Grant R01 HG005287)
Incorporating genetic data improves target trial emulations and informs the use of polygenic scores in randomized controlled trial design
Randomized controlled trials (RCTs) remain the gold standard for evaluating medical interventions, yet ethical, practical and financial constraints often necessitate reliance on observational data and trial emulations. This study explores how integrating genetic data can enhance both emulated and traditional trial designs. Using FinnGen (n = 425,483), we emulated four major cardiometabolic RCTs and showed how reduced differences in polygenic scores (PGS) between trial arms track improvement in study design. Simulation studies reveal that PGS alone cannot fully adjust for unmeasured confounding; instead, Mendelian randomization analyses can be used to detect likely confounders. Finally, trial emulations provide a platform to assess and refine PGS implementation for genetic enrichment strategies. By comparing associations of PGS with trial outcomes in the general population and in emulated trial cohorts, we highlight the need to validate prognostic enrichment approaches in trial-relevant populations. These results underscore the growing potential of incorporating genetic information to optimize clinical trial design.
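The abstract uses between-arm differences in polygenic scores as a signal of how well an emulated trial is balanced. One common way to express such a difference on a scale-free basis is the standardized mean difference (SMD). The abstract does not say which metric was used, so the following is a minimal sketch assuming SMD with a pooled standard deviation, applied to hypothetical PGS values.

```python
import math
from statistics import mean, variance
from typing import Sequence


def standardized_mean_difference(arm_a: Sequence[float], arm_b: Sequence[float]) -> float:
    """Standardized mean difference of a score (e.g. a PGS) between two trial arms.

    Divides the difference in means by the pooled SD sqrt((s_a^2 + s_b^2) / 2),
    using sample variances. Each arm needs at least two observations.
    """
    pooled_sd = math.sqrt((variance(arm_a) + variance(arm_b)) / 2)
    return (mean(arm_a) - mean(arm_b)) / pooled_sd


# Hypothetical PGS values (in SD units) for two emulated-trial arms.
treated = [0.5, 1.0, 1.5]
control = [0.0, 0.5, 1.0]
print(standardized_mean_difference(treated, control))  # 1.0
```

Under this reading, a design change that moves the PGS SMD between arms toward zero is one that brings the emulation closer to the balance a randomized trial would provide on that score.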
Mapping Copy Number Variation by Population Scale Genome Sequencing
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
Organismic and Evolutionary Biology
Expression-Guided In Silico Evaluation of Candidate Cis Regulatory Codes for Drosophila Muscle Founder Cells
While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed “CodeFinder”) that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, namely those whose expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.
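The enrichment step described above, scoring combinatorial subsets of the TRM's motifs within predicted CRMs, can be illustrated as follows. The abstract does not specify CodeFinder's statistic, so this sketch swaps in a standard hypergeometric (one-sided Fisher) test and assumes CRM prediction has already produced, for every background gene, the set of motifs found in its CRM. The gene names and motif labels "A", "B", "C" are hypothetical placeholders.

```python
from itertools import combinations
from math import comb
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n draws from N genes, K of them 'hits'."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def score_motif_subsets(crm_motifs: Dict[str, Set[str]],
                        gene_set: Set[str],
                        trm: Iterable[str],
                        min_size: int = 2) -> List[Tuple[FrozenSet[str], int, float]]:
    """Score each motif subset of a hypothesized TRM for enrichment in a gene set.

    crm_motifs: background gene -> motifs found in its predicted CRM
                (assumed precomputed by some CRM predictor).
    gene_set:   the co-expressed genes being tested.
    Returns (subset, genes in set containing all its motifs, p-value),
    sorted from most to least significant.
    """
    factors = sorted(set(trm))
    N = len(crm_motifs)
    members = [g for g in crm_motifs if g in gene_set]
    n = len(members)
    results = []
    for size in range(min_size, len(factors) + 1):
        for subset in combinations(factors, size):
            needed = set(subset)
            # A gene counts as a hit only if its CRM contains every motif in the subset.
            K = sum(needed <= motifs for motifs in crm_motifs.values())
            k = sum(needed <= crm_motifs[g] for g in members)
            results.append((frozenset(subset), k, hypergeom_upper_tail(k, n, K, N)))
    return sorted(results, key=lambda r: r[2])


# Toy background of six genes with hypothetical CRM motif content.
crm_motifs = {
    "g1": {"A", "B"}, "g2": {"A", "B", "C"}, "g3": {"A"},
    "g4": {"C"}, "g5": set(), "g6": {"B"},
}
for subset, k, p in score_motif_subsets(crm_motifs, {"g1", "g2"}, ["A", "B", "C"]):
    print(sorted(subset), k, round(p, 3))
```

Ranking subsets rather than testing only the full TRM mirrors the abstract's finding that a modified (partial) form of the code can fit a subset of genes better than the original model.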
Genetics of myocardial interstitial fibrosis in the human heart and association with disease
Myocardial interstitial fibrosis is associated with cardiovascular disease and adverse prognosis. Here, to investigate the biological pathways that underlie fibrosis in the human heart, we developed a machine learning model to measure native myocardial T1 time, a marker of myocardial fibrosis, in 41,505 UK Biobank participants who underwent cardiac magnetic resonance imaging. Greater T1 time was associated with diabetes mellitus, renal disease, aortic stenosis, cardiomyopathy, heart failure, atrial fibrillation, conduction disease and rheumatoid arthritis. Genome-wide association analysis identified 11 independent loci associated with T1 time. The identified loci implicated genes involved in glucose transport (SLC2A12), iron homeostasis (HFE, TMPRSS6), tissue repair (ADAMTSL1, VEGFC), oxidative stress (SOD2), cardiac hypertrophy (MYH7B) and calcium signaling (CAMK2D). Using a transforming growth factor β1-mediated cardiac fibroblast activation assay, we found that 9 of the 11 loci consisted of genes that exhibited temporal changes in expression or open chromatin conformation supporting their biological relevance to myofibroblast cell state acquisition. By harnessing machine learning to perform large-scale quantification of myocardial interstitial fibrosis using cardiac imaging, we validate associations between cardiac fibrosis and disease, and identify new biologically relevant pathways underlying fibrosis.
