The first case of vesico-vaginal fistula in a patient with primary lymphoma of the bladder – a case report
Repetitive Elements May Comprise Over Two-Thirds of the Human Genome
Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has previously been identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo "clouds"). We show here that P-clouds predicts >840 Mbp of additional repetitive sequence in the human genome, suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect differently sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed "element-specific" P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and used them to identify ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences agree well with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome-annotation approaches and suggest that the human genome contains substantially more repetitive sequence than previously believed.
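The cloud-building step can be pictured with a short sketch. The following is a minimal, self-contained Python illustration of the general "oligo cloud" idea, not the authors' P-clouds implementation: the toy sequence, k-mer size, and abundance cutoff are all hypothetical. An abundant k-mer seeds a cluster, which is then expanded to sequence-space neighbors that are themselves abundant.

```python
# Minimal sketch of the "oligo cloud" idea (toy sequence; hypothetical
# k-mer size and abundance cutoff) -- not the authors' P-clouds code.
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def neighbors(kmer, alphabet="ACGT"):
    """Yield all k-mers within Hamming distance 1 of `kmer`."""
    for i, base in enumerate(kmer):
        for sub in alphabet:
            if sub != base:
                yield kmer[:i] + sub + kmer[i + 1:]

def build_cloud(seed, counts, min_count=2):
    """Grow a cloud outward from `seed`, admitting any neighboring
    k-mer that occurs at least `min_count` times (illustrative cutoff)."""
    cloud, frontier = {seed}, [seed]
    while frontier:
        for nb in neighbors(frontier.pop()):
            if nb not in cloud and counts.get(nb, 0) >= min_count:
                cloud.add(nb)
                frontier.append(nb)
    return cloud

genome = "ACGTACGTTCGAACGTACGTACGT" * 50  # toy repetitive sequence
counts = kmer_counts(genome, k=8)
seed = max(counts, key=counts.get)        # most abundant k-mer seeds the cloud
print(f"{len(build_cloud(seed, counts))} k-mers in the cloud around {seed}")
```

Because membership depends only on oligonucleotide abundance rather than on alignment to a consensus, such a scheme can flag short, degraded repeat fragments that a consensus-alignment tool may miss.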
Random Amino Acid Mutations and Protein Misfolding Lead to Shannon Limit in Sequence-Structure Communication
The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy-channel theorem is applied to a communication channel between amino acid sequences and their structures, established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close-to-native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) drive a decrease in communication capacity toward a Shannon limit of 0.010 bits per amino acid symbol, at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus, an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
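As a rough companion to this abstract, the sketch below computes the capacity of a simple q-ary symmetric channel over the 20 amino acids as the substitution rate p grows. This toy channel is an assumption chosen for illustration, not the paper's sequence-structure channel; it shows only the qualitative point that random substitution errors erode channel capacity.

```python
# Illustrative toy model (NOT the paper's channel): a symmetric channel
# over the 20 amino acids, where a symbol arrives intact with
# probability 1 - p and is otherwise replaced uniformly by one of the
# 19 alternatives. Capacity falls as the substitution rate p grows.
import math

def capacity_symmetric(p, q=20):
    """Capacity C = log2(q) - H(p) - p*log2(q - 1) bits per symbol."""
    if p in (0.0, 1.0):
        h = 0.0  # binary entropy H(p) vanishes at the endpoints
    else:
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return math.log2(q) - h - p * math.log2(q - 1)

for p in (0.0, 0.05, 0.2, 0.5, 0.8):
    print(f"p = {p:4.2f}  ->  C = {capacity_symmetric(p):6.3f} bits/symbol")
```

At p = 0 the channel carries log2(20) ≈ 4.32 bits per residue; as p rises, the capacity shrinks, mirroring the paper's finding that accumulating random errors push the sequence-structure channel toward a limiting capacity at which communication fails.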
Sequencing technologies and genome sequencing
High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in human and animal genomics research; they can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With high-throughput sequencing machines and modern bioinformatics tools developing at an unprecedented pace, the target of sequencing an individual genome at a cost of $1,000 seems realistically feasible in the near future. In the relatively short time since 2005, HT-NGS technologies have been revolutionizing human and animal genome research through chromatin immunoprecipitation coupled to DNA microarrays (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole-genome genotyping, genome-wide structural variation, de novo genome assembly and re-assembly, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, mitochondrial genome sequencing, and personal genomics. In this review, we address the important features of HT-NGS: first-generation DNA sequencers, the birth of HT-NGS, second-generation HT-NGS platforms, third-generation HT-NGS platforms (including the single-molecule Heliscope™, SMRT™ and RNAP sequencers, and nanopore sequencing), the Archon Genomics X PRIZE foundation, a comparison of second- and third-generation HT-NGS platforms, and the applications, advances, and future perspectives of sequencing technologies in human and animal genome research.
Modelling land cover change in the Brazilian Amazon: temporal changes in drivers and calibration issues
Multiple-scale prediction of forest loss risk across Borneo
Context: The forests of Borneo have among the highest biodiversity and also the highest forest-loss rates on the planet.

Objectives: Our objectives were to: (1) compare multiple modelling approaches; (2) evaluate the utility of landscape composition and configuration as predictors; (3) assess the influence of the ratio of forest-loss to persistence points in the training sample; (4) identify the multiple-scale drivers of recent forest loss; and (5) predict future forest-loss risk across Borneo.

Methods: We compared random forest machine learning and logistic regression in a multi-scale approach to model forest-loss risk between 2000 and 2010 as a function of topographical variables and landscape structure, and applied the highest-performing model to predict the spatial pattern of forest-loss risk between 2010 and 2020. We used a naïve model as a null comparison and the total operating characteristic AUC to assess model performance.

Results: Our analysis produced five main results. We found that: (1) random forest consistently outperformed logistic regression and the naïve model; (2) including landscape-structure variables substantially improved predictions; (3) a ratio of occurrence to non-occurrence points in the training dataset that does not match the actual ratio in the landscape biases the predictions of both random forest and logistic regression; (4) forest-loss risk differed between the three nations that comprise Borneo, with patterns in Kalimantan strongly related to distance from the edge of the previous frontier of forest loss, while Malaysian Borneo showed a more diffuse pattern related to the structure of the landscape; and (5) we predicted continuing very high rates of forest loss in the 2010–2020 period and produced maps of the expected risk of forest loss across the full extent of Borneo.

Conclusions: These results confirm that multiple-scale modelling using landscape metrics as predictors in a random forest framework is a powerful approach to landscape-change modelling. There is immense, imminent risk to Borneo's forests, with clear spatial patterns of risk related to topography and landscape structure that differ between the three nations that comprise Borneo.
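The core model comparison can be sketched in a few lines. The Python example below uses synthetic data, not the Borneo dataset: the feature matrix stands in for topographic and landscape-structure predictors, scikit-learn's standard ROC AUC stands in for the total operating characteristic AUC used in the paper, and the 20% positive-class weight is an arbitrary illustrative choice.

```python
# Hedged sketch of the paper's model comparison on synthetic data
# (not the Borneo dataset): fit random forest and logistic regression
# to a binary loss/persistence outcome and compare ROC AUC. The
# training class ratio is kept equal to the landscape-wide ratio via
# stratified splitting, echoing the authors' point that a mismatched
# sampling ratio biases both models' predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic "pixels": features stand in for topography and landscape metrics.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)  # ~20% loss
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [("random forest", RandomForestClassifier(random_state=0)),
                    ("logistic regression", LogisticRegression(max_iter=1000))]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```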
Modeling urban growth with GIS based cellular automata and least squares SVM rules: a case study in Qingpu–Songjiang area of Shanghai, China
Integrating biodiversity, remote sensing, and auxiliary information for the study of ecosystem functioning and conservation at large spatial scales
Assessing patterns and processes of plant functional, taxonomic, genetic, and structural biodiversity at large scales is essential across many disciplines, including ecosystem management, agriculture, ecosystem risk and service assessment, conservation science, and forestry. The in situ data, housed in databases, that are necessary to perform such assessments over large parts of the world are growing steadily. Integrating these in situ data with remote sensing (RS) products helps not only to improve data completeness and quality but also to account for the limitations and uncertainties associated with each data product. Here, we outline how auxiliary environmental and socioeconomic data might be integrated with biodiversity and RS data to expand our knowledge of ecosystem functioning and inform the conservation of biodiversity. We discuss the concepts, data, and methods necessary to assess plant species and ecosystem properties across scales of space and time, and provide a critical discussion of outstanding issues.
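In practice, the in situ/RS integration described here often reduces to sampling a gridded product at field-plot coordinates. The NumPy sketch below is purely illustrative: the NDVI grid, plot records, and richness values are all made up, and nearest-cell lookup stands in for the more careful resampling and uncertainty handling a real analysis would need.

```python
# Illustrative sketch (all data hypothetical) of joining in situ
# biodiversity plots with a gridded remote sensing product by sampling
# the grid at each plot location. Pure NumPy; no external data needed.
import numpy as np

# Toy RS product: NDVI on a 1-degree grid covering lon 0-10, lat 0-10.
lons = np.arange(0, 10, 1.0)
lats = np.arange(0, 10, 1.0)
ndvi = np.random.default_rng(0).uniform(0.1, 0.9, (lats.size, lons.size))

# In situ plots: (lon, lat, species_richness) records from a field survey.
plots = np.array([(2.3, 4.7, 35), (7.9, 1.2, 12), (5.5, 8.8, 48)])

# Nearest-cell sampling of the grid at each plot location.
col = np.searchsorted(lons, plots[:, 0]) - 1
row = np.searchsorted(lats, plots[:, 1]) - 1
for (lon, lat, rich), value in zip(plots, ndvi[row, col]):
    print(f"plot at ({lon}, {lat}): richness={rich:.0f}, NDVI={value:.2f}")
```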
