
    Scalable Similarity Search for Molecular Descriptors

    Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While efficient indexing structures exist for searching databases of binary vectors, solutions for more general integer vectors are in their infancy. In this paper we present a time- and space-efficient index for the problem, which we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases and uses ideas from succinct data structures. Experiments on a large database of over 40 million compounds show that SITAd significantly outperforms alternative approaches in practice. Comment: To appear in the Proceedings of SISAP'1
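    For orientation, the kind of query SITAd accelerates can be written as a plain linear scan. This is a minimal sketch, assuming the generalized Jaccard (sum-of-minima over sum-of-maxima) similarity for integer descriptor vectors of equal length; it is not the SITAd index itself, and the names `generalized_jaccard` and `linear_scan` are hypothetical. The size filter is the integer-vector analogue of the length bound used for binary fingerprints.

```python
from typing import List, Sequence


def generalized_jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0  # two all-zero vectors treated as identical


def linear_scan(query: Sequence[int], database: List[Sequence[int]], eps: float) -> List[int]:
    """Return indices of database vectors whose similarity to query is >= eps (0 < eps <= 1).

    Size filter: sum(min) <= min(|x|, |y|) and sum(max) >= max(|x|, |y|), so
    J(x, y) >= eps implies eps * |x| <= |y| <= |x| / eps, where |v| = sum(v).
    """
    q_size = sum(query)
    hits = []
    for idx, vec in enumerate(database):
        size = sum(vec)
        if not (eps * q_size <= size <= q_size / eps):
            continue  # cannot reach the threshold, skip the full comparison
        if generalized_jaccard(query, vec) >= eps:
            hits.append(idx)
    return hits


# Usage: three toy 3-dimensional descriptors, threshold 0.5
print(linear_scan([1, 2, 0], [[1, 2, 0], [2, 2, 1], [0, 0, 5]], 0.5))  # [0, 1]
```

    An index such as SITAd exists precisely to avoid touching every vector, as this scan does; the filter only cheapens the per-vector test.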

    Prediction of Hydrate and Solvate Formation Using Statistical Models

    Novel, knowledge-based models for predicting hydrate and solvate formation are introduced that require only the molecular formula as input. A data set of more than 19 000 organic, nonionic, and nonpolymeric molecules was extracted from the Cambridge Structural Database. Molecules that formed solvates were compared with those that did not, using molecular descriptors and statistical methods, which allowed the identification of chemical properties that contribute to solvate formation. The study covered five types of solvates: ethanol, methanol, dichloromethane, chloroform, and water solvates. The identified properties were all related to the size and branching of the molecules and to their hydrogen-bonding ability. The corresponding molecular descriptors were used to fit logistic regression models that predict the probability that a given molecule forms a solvate. The established models correctly predicted the behavior of ∼80% of the data using only two descriptors.
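    A two-descriptor logistic model turns two descriptor values into a solvate-formation probability. The abstract names neither the descriptors nor the fitted coefficients, so `COEFFS`, `solvate_probability` and `accuracy` below are hypothetical placeholders; the sketch only shows the model form and how an accuracy figure such as ∼80% would be computed against observed labels.

```python
import math
from typing import Sequence, Tuple

# Hypothetical placeholder coefficients (intercept, weight for d1, weight for d2).
# These are NOT the fitted values from the study.
COEFFS: Tuple[float, float, float] = (-1.0, 0.8, -0.5)


def solvate_probability(d1: float, d2: float, coeffs=COEFFS) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*d1 + b2*d2)))."""
    b0, b1, b2 = coeffs
    z = b0 + b1 * d1 + b2 * d2
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(samples: Sequence[Tuple[float, float]], labels: Sequence[int],
             coeffs=COEFFS, threshold: float = 0.5) -> float:
    """Fraction of molecules whose predicted class (p >= threshold) matches the label."""
    correct = 0
    for (d1, d2), y in zip(samples, labels):
        predicted = solvate_probability(d1, d2, coeffs) >= threshold
        correct += int(predicted == bool(y))
    return correct / len(labels)


# Usage with two toy molecules: label 1 = forms a solvate, 0 = does not
print(accuracy([(0.0, 0.0), (5.0, 0.0)], [0, 1]))  # 1.0
```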

    GTI-space: the space of generalized topological indices

    A new extension of the generalized topological indices (GTI) approach is carried out to represent 'simple' and 'composite' topological indices (TIs) in a unified way. This approach defines a GTI-space of which both simple and composite TIs are particular subspaces. Accordingly, simple TIs such as the Wiener, Balaban, Zagreb, Harary and Randić connectivity indices are expressed by means of the same GTI representation introduced for composite TIs such as the hyper-Wiener index, molecular topological index (MTI), Gutman index and reverse MTI. Using the GTI-space approach, we easily identify mathematical relations between some composite and simple indices, such as the relationship between the hyper-Wiener and Wiener indices and the relation between MTI and the first Zagreb index. The relation of the GTI-space to the sub-structural cluster expansion of property/activity is also analysed, and some routes for applying this approach to QSPR/QSAR are given.
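    To make the hyper-Wiener/Wiener relation concrete, here are three of the named indices computed directly from their textbook definitions on a hydrogen-suppressed molecular graph. This is not the GTI formalism; it is a sketch assuming the Klein form of the hyper-Wiener index, WW = ½ Σ(d_ij + d_ij²), which gives WW = (W + Σ d_ij²)/2 over unordered vertex pairs. The function names and the adjacency-list representation are illustrative choices.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # hydrogen-suppressed molecular graph as adjacency lists


def distances_from(graph: Graph, source: int) -> Dict[int, int]:
    """Topological (bond-count) distances from source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_and_hyper_wiener(graph: Graph) -> Tuple[int, float]:
    """Return (W, WW) over unordered vertex pairs of a connected graph.

    W = sum d_ij and WW = 1/2 * sum (d_ij + d_ij**2) = (W + sum d_ij**2) / 2.
    """
    w = w2 = 0
    for i in graph:
        for j, d in distances_from(graph, i).items():
            if i < j:  # count each unordered pair once
                w += d
                w2 += d * d
    return w, (w + w2) / 2


def first_zagreb(graph: Graph) -> int:
    """M1 = sum of squared vertex degrees."""
    return sum(len(nbrs) ** 2 for nbrs in graph.values())


# Usage: n-butane carbon skeleton, path C0-C1-C2-C3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_and_hyper_wiener(butane))  # (10, 15.0)
print(first_zagreb(butane))             # 10
```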

    Peripheral T-cell lymphoma unspecified (PTCL-U): a new prognostic model from a retrospective multicentric clinical study

    To assess the prognosis of peripheral T-cell lymphoma unspecified, we retrospectively analyzed 385 cases fulfilling the criteria defined by the World Health Organization classification. Factors associated with worse overall survival (OS) in univariate analysis were age older than 60 years (P=.0002), 2 or more extranodal sites (P=.0002), lactic dehydrogenase (LDH) value at normal levels or above (P<.0001), performance status (PS) of 2 or more (P≤.0001), stage III or higher (P=.0001), and bone marrow involvement (P=.0001). Multivariate analysis showed that age (relative risk, 1.732; 95% CI, 1.300-2.309; P<.0001), PS (relative risk, 1.719; 95% CI, 1.269-2.327; P<.0001), LDH level (relative risk, 1.905; 95% CI, 1.415-2.564; P<.0001), and bone marrow involvement (relative risk, 1.454; 95% CI, 1.045-2.023; P=.026) were independent predictors of survival. Using these 4 variables, we constructed a new prognostic model that singled out 4 risk groups: group 1, no adverse factors, with 5-year and 10-year OS of 62.3% and 54.9%, respectively; group 2, 1 factor, with 5-year and 10-year OS of 52.9% and 38.8%, respectively; group 3, 2 factors, with 5-year and 10-year OS of 32.9% and 18.0%, respectively; group 4, 3 or 4 factors, with 5-year and 10-year OS of 18.3% and 12.6%, respectively (P≤.0001; log-rank, 66.79).
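    The grouping rule is a simple count of adverse factors, with 3 and 4 factors pooled into one group. The sketch below encodes that rule and the 5-/10-year OS figures reported in the abstract. The field names are hypothetical, and each flag is assumed to be set according to the study's own cut-offs (in particular, the LDH criterion is left as a boolean rather than re-derived here).

```python
from dataclasses import dataclass

# 5-year and 10-year OS (%) per risk group, as reported in the abstract.
OS_BY_GROUP = {1: (62.3, 54.9), 2: (52.9, 38.8), 3: (32.9, 18.0), 4: (18.3, 12.6)}


@dataclass
class Patient:
    # Hypothetical field names; each flag is True when the adverse factor is present.
    age_over_60: bool
    ps_at_least_2: bool
    ldh_adverse: bool  # LDH criterion as defined by the study
    bone_marrow_involved: bool


def risk_group(p: Patient) -> int:
    """Map the number of adverse factors (0-4) to groups 1-4; 3 and 4 factors share group 4."""
    n = sum([p.age_over_60, p.ps_at_least_2, p.ldh_adverse, p.bone_marrow_involved])
    return min(n, 3) + 1


# Usage: a patient over 60 with the adverse LDH criterion (2 factors)
g = risk_group(Patient(True, False, True, False))
print(g, OS_BY_GROUP[g])  # 3 (32.9, 18.0)
```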

    Pneumocystis carinii pneumonia in patients with malignant haematological diseases: 10 years' experience of infection in GIMEMA centres.

    A retrospective survey was conducted over a 10-year period (1990-99) among 52 haematology divisions to evaluate the clinical and laboratory characteristics and outcome of patients with proven Pneumocystis carinii pneumonia (PCP) complicating haematological diseases. The study included 55 patients who developed PCP (18 with non-Hodgkin's lymphoma, 10 with acute lymphoblastic leukaemia, eight with acute myeloid leukaemia, five with chronic myeloid leukaemia, four with chronic lymphocytic leukaemia, four with multiple myeloma, three with myelodysplastic syndrome, two with myelofibrosis and one with thalassemia). Among these, 18 (33%) had undergone stem cell transplantation; only two had received oral prophylaxis with trimethoprim/sulphamethoxazole. Twelve patients (22%) developed PCP despite protective isolation in a laminar airflow room. The most frequent symptoms were fever (86%), dyspnoea (78%), non-productive cough (71%), thoracic pain (14%) and chills (5%); severe hypoxaemia was present in 39 patients (71%). Chest radiography or computerized tomography showed interstitial infiltrates in 34 patients (62%), alveolar infiltrates in 12 patients (22%), and alveolar-interstitial infiltrates in nine patients (16%). Bronchoalveolar lavage was diagnostic in 47/48 patients, induced sputum in 9/18 patients and lung biopsy in 3/8 patients. In two patients the diagnosis was made at autopsy. All patients except one started specific treatment (52 patients trimethoprim/sulphamethoxazole, one pentamidine and one dapsone). Sixteen patients (29%) died of PCP within 30 d of diagnosis. Multivariate analysis showed that prolonged steroid treatment (P < 0.006) and a radiological picture of diffuse lung involvement (P < 0.003) were negative prognostic factors.

    A new dataset of global irrigation areas from 2001 to 2015

    About 40% of global crop production takes place on irrigated land, which accounts for approximately 20% of global farmland. The great majority of freshwater consumption by human societies is associated with irrigation, which substantially modifies the global water cycle by enhancing evapotranspiration and reducing surface and groundwater runoff. In many regions of the world, irrigation contributes to streamflow and groundwater depletion, soil salinization, cooler microclimate conditions, and altered land-atmosphere interactions. Despite the important role played by irrigation in food security, the water cycle, soil productivity, and near-surface atmospheric conditions, its global extent remains poorly quantified. To date, global maps of irrigated land are often based on estimates from circa 2000. Here we apply machine learning algorithms to satellite remote sensing and monthly climate data to map the spatial extent of irrigated areas between 2001 and 2015. We provide global annual maps of irrigated land at ≈9 km resolution for 2001-2015 and make this dataset available online.

    Niche differentiation mechanisms among canopy frugivores and zoochoric trees in the northeastern extreme of the Amazon

    Frugivores and zoochoric trees represent an important proportion of tropical rainforest biodiversity. Because niche differences favor species coexistence, we aimed to evaluate morphological and temporal niche segregation mechanisms among zoochoric trees and canopy frugivores in a tropical rainforest in the northeastern extreme of the Brazilian Amazon. We tested the effects of fruit morphology, tree size, frugivore body size and time of day on fruit consumption. We recorded the frugivore species that fed on 72 trees (44 species, 22 genera) and whether these frugivores swallowed the seeds. Each tree was monitored only once, from 07:00 to 17:00 h, between January and September 2017. We observed fruit consumption in 20 of the 72 trees. Seventy-three frugivore individuals from 22 species visited the trees. Heavier fruits were consumed by larger frugivores, while seed size was inversely correlated with frugivore size. Narrower fruits and fruits with smaller seeds had a greater probability of having their seeds ingested, and larger frugivores were more prone to ingest seeds. Trees bearing fruits with smaller seeds were visited by a greater number of frugivores. Taxonomic groups differed in their time of arrival at fruiting trees. None of the evaluated variables (fruit weight and size, and seed size) affected the richness of frugivores that visited the trees. We conclude that, in the studied forest, fruit morphology (weight, size and seed size) is a niche segregation mechanism among zoochoric trees, while body size and time of day are niche segregation mechanisms among frugivores.

    The use of 2D fingerprint methods to support the assessment of structural similarity in orphan drug legislation.

    In the European Union, a medicine is authorised for a rare disease only if it is judged to be dissimilar to orphan drugs already authorised for that disease. This paper describes the use of 2D fingerprints to assess the relationship between computed levels of structural similarity for pairs of molecules and expert judgments of the similarity of those pairs. The resulting relationship can provide input to the assessment of new active compounds for which orphan drug authorisation is being sought.
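    The abstract does not name the fingerprint type or the similarity coefficient. A common choice for 2D fingerprints is the Tanimoto coefficient, sketched here as an assumption: fingerprints are stored as Python integers used as bitsets (a hypothetical stand-in for fingerprints produced by a cheminformatics toolkit), and `rank_pairs` is an illustrative helper for ordering molecule pairs before comparing them with expert judgments.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bitset fingerprints."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(fp_a & fp_b) / union


def rank_pairs(pairs: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Sort (pair_name, fp_a, fp_b) tuples by decreasing computed similarity."""
    return sorted(pairs, key=lambda t: tanimoto(t[1], t[2]), reverse=True)


# Usage: two toy 4-bit fingerprints sharing 2 of 3 set bits
print(tanimoto(0b1011, 0b0011))  # 0.6666666666666666
```

    Real 2D fingerprints are typically hundreds to thousands of bits long; the arithmetic is the same.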

    Modeling complex metabolic reactions, ecological systems, and financial and legal networks with MIANN models based on Markov-Wiener node descriptors

    The use of numerical parameters in complex network analysis is expanding to new fields of application. At the molecular level, such parameters can describe the molecular structure of chemical entities, protein interactions, or metabolic networks. However, their applications are not restricted to the world of molecules and can be extended to the study of macroscopic nonliving systems, organisms, or even legal or social networks. On the other hand, the development of Artificial Intelligence has led to computational algorithms whose design is based on the structure and functioning of networks of biological neurons. These algorithms, called Artificial Neural Networks (ANNs), can be useful for studying complex networks, since the numerical parameters that encode information about the network (for example, centralities/node descriptors) can be used as ANN inputs. The Wiener index (W) is a graph invariant widely used in chemoinformatics to quantify the molecular structure of drugs and to study complex networks. In this work, we explore for the first time the possibility of using Markov chains to calculate analogues of node distance numbers/W that describe complex networks from the point of view of their nodes. These parameters are called Markov-Wiener node descriptors of order k (Wk). Note that these descriptors are not related to Markov-Wiener stochastic processes. Here, we calculated the Wk(i) values for a very large number of nodes (>100,000) in more than 100 different complex networks using the software MI-NODES. These networks were grouped by field of application. Molecular networks include the Metabolic Reaction Networks (MRNs) of 40 different organisms. In addition, we analyzed other biological, legal and social networks. These include the Interaction Web Database Biological Networks (IWDBNs), with 75 food webs or ecological systems, and the Spanish Financial Law Network (SFLN). The calculated Wk(i) values were used as inputs for different ANNs in order to discriminate correct node connectivity patterns from incorrect random patterns. The MIANN models obtained show good Sensitivity/Specificity values (%): MRNs (78/78), IWDBNs (90/88), and SFLN (86/84). These preliminary results are very promising for a first exploratory study and suggest that these models could be extended to the high-throughput re-evaluation of connectivity in known complex networks (collation).
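    The abstract does not give the formula for Wk(i), so the sketch below stops at its two stated ingredients: node distance numbers d(i) = Σ_j d_ij, whose half-sum over all nodes is W, and k-step transition probabilities of a Markov chain on the network, here assumed to be the simple random walk p_ij = 1/deg(i). How MI-NODES combines them into Wk(i) is not reproduced, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected network as adjacency lists


def node_distance_numbers(graph: Graph) -> Dict[int, int]:
    """d(i) = sum of shortest-path distances from node i to every reachable node."""
    result = {}
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[s] = sum(dist.values())
    return result


def transition_matrix(graph: Graph, nodes: List[int]) -> List[List[float]]:
    """Simple random walk (an assumption): p_ij = 1/deg(i) if j is a neighbour of i, else 0."""
    index = {n: k for k, n in enumerate(nodes)}
    p = [[0.0] * len(nodes) for _ in nodes]
    for n in nodes:
        for m in graph[n]:
            p[index[n]][index[m]] = 1.0 / len(graph[n])
    return p


def mat_power(p: List[List[float]], k: int) -> List[List[float]]:
    """k-step transition probabilities P^k by repeated multiplication (k >= 1)."""
    size = len(p)
    result = p
    for _ in range(k - 1):
        result = [[sum(result[i][t] * p[t][j] for t in range(size)) for j in range(size)]
                  for i in range(size)]
    return result


# Usage: path network 0-1-2
g = {0: [1], 1: [0, 2], 2: [1]}
d = node_distance_numbers(g)
print(d, sum(d.values()) // 2)  # {0: 3, 1: 2, 2: 3} 4
```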

    Understanding the Influence of Diverse Non‐Volatile Media on Rheological Properties of Thermophilic Biological Sludge and Evaluation of Its Thixotropic Behaviour

    Get PDF
    In this study, the rheological properties of thermophilic biological sludge (TBS) were investigated, evaluating the influence of non‐volatile solids (NVS). Calcium carbonate, sand, and sodium bentonite were separately added to the sludge to evaluate the effect of NVS type and concentration. Results show that the TBS consistency coefficient increased significantly with increasing sodium bentonite concentration. By contrast, calcium carbonate and sand had relatively little influence on the rheological properties of TBS. The thixotropic behaviour of TBS was also investigated and was more pronounced at the higher shear rate (1000 s−1). A double-exponential model best represented thixotropic behaviour at low (100 s−1) and high (1000 s−1) shear rates, while a single-exponential model was the best option at the medium shear rate (400 s−1).
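    The abstract does not state the model equations. A common form for stress decay at constant shear rate is τ(t) = τ_e + Σ_k A_k·exp(−r_k·t) with one or two terms, and this sketch assumes that form. For fixed rate constants the model is linear in τ_e and A_k, so amplitudes come from linear least squares and the rates from a grid search; this fitting strategy, the rate grid and the synthetic data are illustrative choices, not the authors' procedure.

```python
import numpy as np


def fit_exponential_decay(t, tau, rates, n_terms):
    """Fit tau(t) = tau_e + sum_k A_k * exp(-r_k * t) with n_terms = 1 or 2.

    Rates are searched over the grid `rates`; for each candidate, (tau_e, A_k)
    are solved by linear least squares. Returns (sse, rate_tuple, coefficients).
    """
    t = np.asarray(t, dtype=float)
    tau = np.asarray(tau, dtype=float)
    if n_terms == 1:
        candidates = [(r,) for r in rates]
    else:
        candidates = [(r1, r2) for i, r1 in enumerate(rates) for r2 in rates[i + 1:]]
    best = None
    for rate_tuple in candidates:
        design = np.column_stack([np.ones_like(t)] + [np.exp(-r * t) for r in rate_tuple])
        coef, *_ = np.linalg.lstsq(design, tau, rcond=None)
        sse = float(np.sum((design @ coef - tau) ** 2))
        if best is None or sse < best[0]:
            best = (sse, rate_tuple, coef)
    return best


# Usage on synthetic, noise-free stress data (illustrative values only)
rates = np.logspace(-3, 0, 31)      # candidate rate constants, 1/s
t = np.linspace(0.0, 600.0, 121)    # time at constant shear rate, s
tau = 20.0 + 5.0 * np.exp(-rates[10] * t) + 8.0 * np.exp(-rates[25] * t)
single = fit_exponential_decay(t, tau, rates, 1)
double = fit_exponential_decay(t, tau, rates, 2)
print(single[0] > double[0])  # True
```

    Because the double-exponential model has more free parameters, its raw residual is usually lower; a fair "best model" choice between the two would compare a penalized criterion such as adjusted R² or AIC.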