52 research outputs found

    DDRprot: a database of DNA damage response-related proteins

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome’s integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each DDR protein, we present detailed information about its function. Where DDR proteins are involved in post-translational modifications (PTMs) of each other, we depict the position of the modified residue(s) in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from which it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathway, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. E.A.-L. was supported by the European Commission grant [FP7-REGPOT-2012-2013-1]; A.A. was partially supported by the Spanish Ministry of Science and Innovation grant [PS09/02111]. Peer reviewed.

    LocTree3 prediction of localization

    The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, in six for bacteria and in three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than any other state-of-the-art method. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18 = 80 ± 3% for eukaryotes and a six-state accuracy Q6 = 89 ± 4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. The response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome, not counting the generation of the alignments. For over 1000 entirely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3.

    Climate reference dataset 1961-2015: analysis and assessment of the measured meteorological data basis in the Free State of Saxony and generation of a climate reference dataset

    To maintain quality-assured climate and climate-impact monitoring in Saxony, the available climatological data basis was analysed and assessed. The resulting “Klima-Referenzdatensatz Sachsen” (climate reference dataset for Saxony) is the basis for regional climate and climate-impact analysis and for generating new climate projections for Saxony. It consists of station-based time series of daily and monthly values for the most important climate elements and derived climate variables for the period 1961 to 2015. The “Klima-Referenzdatensatz Sachsen” is freely accessible via ReKIS (www.rekis.org). Editorial deadline: 04.07.201

    ECOSTRESS: NASA's next generation mission to measure evapotranspiration from the International Space Station

    The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) was launched to the International Space Station on June 29, 2018. The primary science focus of ECOSTRESS is evapotranspiration (ET), which is produced as level‐3 (L3) latent heat flux (LE) data products. These data are generated from the level‐2 land surface temperature and emissivity product (L2_LSTE), in conjunction with ancillary surface and atmospheric data. Here, we provide the first validation (Stage 1, preliminary) of the global ECOSTRESS clear‐sky ET product (L3_ET_PT‐JPL, version 6.0) against LE measurements at 82 eddy covariance sites around the world. Overall, the ECOSTRESS ET product performs well against the site measurements (clear‐sky instantaneous/time of overpass: r² = 0.88; overall bias = 8%; normalized RMSE = 6%). ET uncertainty was generally consistent across climate zones, biome types, and times of day (ECOSTRESS samples the diurnal cycle), though temperate sites are over‐represented. The 70 m high spatial resolution of ECOSTRESS improved correlations by 85%, and RMSE by 62%, relative to 1 km pixels. This paper serves as a reference for the ECOSTRESS L3 ET accuracy and Stage 1 validation status for subsequent science using these data.
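The abstract reports three agreement statistics (r², overall bias, normalized RMSE) but does not define them. As a minimal sketch, assuming r² is the squared Pearson correlation, bias is the mean difference relative to the observed mean, and RMSE is normalized by the observed range, they could be computed as below. The function name, the normalization choice, and the sample values are hypothetical and not taken from the paper.

```python
import math


def validation_metrics(predicted, observed):
    """Agreement statistics between predicted and observed fluxes.

    Assumed definitions (the abstract does not state them):
      r2            squared Pearson correlation
      relative_bias (mean predicted - mean observed) / mean observed
      nrmse         RMSE divided by the observed range
    """
    n = len(observed)
    mean_pred = sum(predicted) / n
    mean_obs = sum(observed) / n

    # Sums of squares and cross-products around the means.
    cov = sum((p - mean_pred) * (o - mean_obs) for p, o in zip(predicted, observed))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

    return {
        "r2": cov ** 2 / (var_pred * var_obs),
        "relative_bias": (mean_pred - mean_obs) / mean_obs,
        "nrmse": rmse / (max(observed) - min(observed)),
    }


# Made-up latent heat flux values in W/m^2, for illustration only.
tower_le = [100.0, 200.0, 300.0, 400.0]
satellite_le = [110.0, 220.0, 330.0, 440.0]
print(validation_metrics(satellite_le, tower_le))
```

Other normalizations of RMSE (for example by the observed mean) are also common, so published figures are only comparable when the definition is known.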

    The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data

    The FLUXNET2015 dataset provides ecosystem-scale data on CO2, water, and energy exchange between the biosphere and the atmosphere, and other meteorological and biological measurements, from 212 sites around the globe (over 1500 site-years, up to and including year 2014). These sites, independently managed and operated, voluntarily contributed their data to create global datasets. Data were quality controlled and processed using uniform methods, to improve consistency and intercomparability across sites. The dataset is already being used in a number of applications, including ecophysiology studies, remote sensing studies, and development of ecosystem and Earth system models. FLUXNET2015 includes derived-data products, such as gap-filled time series, ecosystem respiration and photosynthetic uptake estimates, estimation of uncertainties, and metadata about the measurements, presented for the first time in this paper. In addition, 206 of these sites are for the first time distributed under a Creative Commons (CC-BY 4.0) license. This paper details this enhanced dataset and the processing methods, now made available as open-source codes, making the dataset more accessible, transparent, and reproducible. Peer reviewed.

    Author Correction: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data


    TMbed – Transmembrane proteins predicted through Language Model embeddings

    Background: Despite the immense importance of transmembrane proteins (TMP) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4-5 times underrepresented compared to non-TMPs. Today’s top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions. Results: Here, we present TMbed, a novel method that takes embeddings from protein Language Models (pLMs, here ProtT5) as input to predict for each residue one of four classes: transmembrane helix (TMH), transmembrane strand (TMB), signal peptide, or other. TMbed completes predictions for entire proteomes within hours on a single consumer-grade desktop machine at performance levels similar to or better than methods that use evolutionary information from multiple sequence alignments (MSAs) of protein families. On the per-protein level, TMbed correctly identified 94±8% of the beta barrel TMPs (53 of 57) and 98±1% of the alpha helical TMPs (557 of 571) in a non-redundant data set, at false positive rates well below 1% (erred on 30 of 5654 non-membrane proteins). On the per-segment level, TMbed correctly placed, on average, 9 of 10 transmembrane segments within five residues of the experimental observation. Our method can handle sequences of up to 4200 residues on standard graphics cards used in desktop PCs (e.g., NVIDIA GeForce RTX 3060). Conclusions: Based on embeddings from pLMs and two novel filters (Gaussian and Viterbi), TMbed predicts alpha helical and beta barrel TMPs at least as accurately as any other method but at lower false positive rates. Given the few false positives and its outstanding speed, TMbed might be ideal to sieve through millions of 3D structures soon to be predicted, e.g., by AlphaFold2. Availability: Our code, method, and data sets are freely available in the GitHub repository, https://github.com/BernhoferM/TMbed.

    Additional file 1 of TMbed: transmembrane proteins predicted through language model embeddings

    Additional file 1. Supporting Online Material (SOM) containing additional figures, tables and notes.

    Correcting mistakes in predicting distributions

    Motivation: Many applications monitor predictions of a whole range of features for biological datasets, e.g. the fraction of secreted human proteins in the human proteome. Results and error estimates are typically derived from publications. Results: Here, we present a simple, alternative approximation that uses performance estimates of methods to error-correct the predicted distributions. This approximation uses the confusion matrix (TP true positives, TN true negatives, FP false positives and FN false negatives) describing the performance of the prediction tool for correction. As proof-of-principle, the correction was applied to a two-class (membrane/not) and to a seven-class (localization) prediction. Availability and implementation: Datasets and a simple JavaScript tool are freely available to all users at http://www.rostlab.org/services/distributions. Supplementary information: Supplementary data are available at Bioinformatics online.
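The abstract does not spell out the approximation itself. As a minimal sketch of the general idea for the two-class case, the code below applies a standard textbook correction of an observed positive fraction, using the sensitivity and false positive rate derived from a confusion matrix. This is not necessarily the authors' exact formula, and the function name and example counts are hypothetical.

```python
def corrected_fraction(observed_fraction, tp, tn, fp, fn):
    """Estimate the true positive-class fraction from the predicted one.

    Assumes the observed fraction arises as
        observed = sensitivity * true + false_positive_rate * (1 - true)
    and solves that relation for `true`. This is a generic correction,
    not necessarily the approximation used in the paper.
    """
    sensitivity = tp / (tp + fn)          # P(predicted positive | truly positive)
    false_positive_rate = fp / (fp + tn)  # P(predicted positive | truly negative)

    denominator = sensitivity - false_positive_rate
    if denominator <= 0:
        raise ValueError("predictor is no better than random; correction undefined")

    estimate = (observed_fraction - false_positive_rate) / denominator
    # Sampling noise can push the estimate outside [0, 1], so clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical benchmark counts: sensitivity 0.90, false positive rate 0.05.
# A predictor that flags 22% of a proteome as membrane proteins then
# corresponds to a corrected fraction of approximately 0.2.
print(round(corrected_fraction(0.22, tp=90, tn=950, fp=50, fn=10), 3))
```

For more than two classes, the same idea generalizes to solving a linear system built from the row-normalized confusion matrix, which this sketch does not cover.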