293 research outputs found

    Cost-sensitive learning for rare subtype classification of lung cancer

    Machine learning (ML) algorithms assume or promote that the training set is balanced among classes. For imbalanced datasets, even when the overall accuracy is high, classical machine learning algorithms are biased toward the majority class and fit the minority class poorly [1,2], which hinders their use for classifying rare events. Proposed strategies to overcome this problem include altering the training data directly to reduce the difference between classes, or changing the learning procedure so that the algorithm also takes the minority class into account [2]. Usually, the imbalance problem is handled by oversampling the minority class, undersampling the majority class, and/or generating synthetic samples from the original training data. Gene expression data are highly valuable and widely used for cancer classification by ML. However, they are high-dimensional and severely imbalanced, which makes gene expression classification a cost-sensitive problem [1]. Cost-sensitive learning (CSL) assigns unequal costs to classes when making predictions and is required when correctly predicting the minority class is more “interesting” than predicting the other class(es). Instead of maximizing overall accuracy under the assumption of equal costs, the goal is to minimize the total cost (the penalty of misclassification), since misclassifying different classes carries different penalties. In this work, subtypes of lung cancer (AD, SC, LaC and SCLC) are classified using different CSL models, either classical learners (e.g., support vector machines, naïve Bayes, random forest) or ensemble learners, on imbalanced RNA-seq data from TCGA and microarray data from NCBI-GEO. The best-performing model is evaluated with appropriate performance metrics (G-mean, accuracy, F-score, etc.), and the most important feature(s) will be extracted from this model using variable importance values.
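
To make the cost-minimizing goal and the G-mean metric concrete, here is a minimal Python sketch. The cost values, the class probabilities, and the choice of SCLC as the costly-to-miss class are hypothetical; the abstract does not state which costs or decision rule the study uses. Choosing the class with the minimum expected cost is one standard way to make a probabilistic classifier cost-sensitive, shown here as an illustration rather than as the authors' method.

```python
import math

# Labels taken from the abstract; treating SCLC as the rare, costly-to-miss
# class is an assumption made only for this example.
CLASSES = ["AD", "SC", "LaC", "SCLC"]

# Hypothetical cost matrix indexed as cost[(predicted, true)]: correct
# predictions cost 0, ordinary errors cost 1, missing a true SCLC case costs 10.
cost = {
    (pred, true): 0.0 if pred == true else (10.0 if true == "SCLC" else 1.0)
    for pred in CLASSES
    for true in CLASSES
}


def min_expected_cost(probs, cost):
    """Return the class whose prediction has the lowest expected cost.

    probs maps each class to an estimated P(true class | sample).
    """
    def expected_cost(pred):
        return sum(p * cost[(pred, true)] for true, p in probs.items())

    return min(probs, key=expected_cost)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls; it is 0 if any class is never recovered."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical classifier output for a single sample.
probs = {"AD": 0.5, "SC": 0.2, "LaC": 0.1, "SCLC": 0.2}
print(max(probs, key=probs.get))        # plain argmax: AD
print(min_expected_cost(probs, cost))   # cost-sensitive choice: SCLC

# A predictor that always outputs the majority class: 75% accuracy, G-mean 0.
print(g_mean(["AD", "AD", "AD", "SCLC"], ["AD"] * 4))  # 0.0
```

The last line illustrates why accuracy alone is misleading on imbalanced data: the majority-only predictor scores 75% accuracy while its G-mean collapses to 0.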

    Evaluating the integration of proteomic data for the prediction of intracellular fluxes after knockout experiments

    So far, few large-scale kinetic models of metabolic networks have been successfully constructed. The main reasons are not only the associated mathematical complexity but also the large number of unknown kinetic parameters required in the rate equations to define the system. In contrast, the constraint-based modelling approach bypasses these difficulties by using essentially only stoichiometric information, together with certain physicochemical constraints, to delimit the solution space without large fitted parameter sets. Although constraint-based models are highly relevant for predicting feasible steady-state fluxes under a diverse range of genetic and environmental conditions, the steady-state assumption may oversimplify cellular behaviour, and such models cannot predict time-course profiles. Combining the two approaches therefore appears to be a reasonable way to model large-scale metabolic networks. On the other hand, some of the experimental data required for model construction are often scarce, so enzyme concentrations are usually assumed to be constant. In this work, we used a central carbon metabolic network of E. coli to investigate whether including high-throughput enzyme concentration data in a kinetic model improves predictions of metabolic flux distributions in response to single-knockout perturbations. For this purpose, an E. coli model based on flux balance analysis (FBA) results and approximate lin-log kinetics was constructed. The intracellular flux distributions obtained with this model were compared with published in vivo measurements.
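
Lin-log kinetics make the role of enzyme levels explicit: each rate is scaled by the enzyme concentration relative to a reference steady state, so measured enzyme data can replace the constant-enzyme assumption. The Python sketch below implements the standard lin-log rate expression for a single reaction. The reference flux, elasticities, and concentrations are hypothetical, taking J0 from an FBA solution follows the abstract only loosely, and representing a knockout as e = 0 is a simplifying assumption rather than the procedure described in the work.

```python
import math


def linlog_rate(j0, e, e0, elasticities, x, x0):
    """Lin-log rate law for one reaction.

    v = J0 * (e / e0) * (1 + sum_j eps_j * ln(x_j / x0_j))

    j0, e0 and x0 describe a reference steady state; e is the enzyme level and
    x the metabolite concentrations in the state being predicted.
    """
    log_sum = sum(eps * math.log(xj / x0j)
                  for eps, xj, x0j in zip(elasticities, x, x0))
    return j0 * (e / e0) * (1.0 + log_sum)


# Hypothetical reference state: J0 (e.g. from an FBA solution) and two effectors,
# a substrate with elasticity 0.5 and an inhibitor with elasticity -0.3.
J0, E0 = 1.5, 1.0
EPS = [0.5, -0.3]
X0 = [2.0, 1.0]

print(linlog_rate(J0, E0, E0, EPS, X0, X0))        # reference state: 1.5
print(linlog_rate(J0, 2 * E0, E0, EPS, X0, X0))    # enzyme level doubled: 3.0
print(linlog_rate(J0, 0.0, E0, EPS, X0, X0))       # enzyme absent (knockout): 0.0
```

Because the enzyme ratio multiplies the whole expression, a measured change in enzyme concentration shifts the predicted flux directly, which is the effect the study set out to test with proteomic data.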

    Optimality principles in the regulation of metabolic networks

    One of the challenging tasks in systems biology is to understand how molecular networks give rise to emergent functionality and whether universal design principles apply to molecular networks. To achieve this, the biophysical, evolutionary and physiological constraints that act on those networks need to be identified, in addition to characterising the molecular components and interactions. Then, the cellular “task” of the network, its function, should be identified. A network contributes to organismal fitness through its function. The premise is that the same functions are often implemented in different organisms by the same type of network; hence the concept of design principles. In biology, because of strong selective pressure, network functions can often be understood as the outcome of fitness optimisation. Using the hypothesis of fitness optimisation to understand the design of a network has proven to be a powerful strategy. Here, we outline the use of several optimisation principles applied to biological networks, with an emphasis on metabolic regulatory networks. We discuss the different objective functions and constraints that are considered and the kind of understanding that they provide.
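
As one widely used example of an optimality principle in metabolic modelling, flux balance analysis assumes that the network maximises a chosen objective (often biomass production) subject to steady-state and capacity constraints. The standard formulation is shown below as an illustration of an objective function paired with constraints; it is not necessarily one of the formulations discussed in this review. Here $\mathbf{S}$ is the stoichiometric matrix, $\mathbf{v}$ the flux vector, $\mathbf{c}$ a vector of objective weights, and $\mathbf{v}_{\mathrm{lb}}$, $\mathbf{v}_{\mathrm{ub}}$ lower and upper flux bounds.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{s.t.} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}_{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}_{\mathrm{ub}}
\end{aligned}
```

Changing $\mathbf{c}$ or adding constraints (for example, a cap on total enzyme usage) is how different optimality hypotheses are encoded and compared.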

    Influence of washing and quenching in profiling the metabolome of adherent mammalian cells: A case study with the metastatic breast cancer cell line MDA-MB-231

    Metabolome characterisation is a powerful tool in oncology. To obtain a valid description of the intracellular metabolome, two preparatory steps are crucial, namely washing and quenching. Washing must effectively remove the extracellular media components, and quenching should stop the metabolic activities within the cell without altering its membrane integrity. Therefore, it is important to evaluate the efficiency of the washing and quenching solvents. In this study, we employed two previously optimised protocols for simultaneous quenching and extraction, and investigated the effects of a number of washing steps/solvents and quenching solvent additives on metabolite leakage from the adherent metastatic breast cancer cell line MDA-MB-231. We explored five washing protocols and five quenching protocols (including a control for each) and assessed their effectiveness by detecting ATP in the medium and by examining changes in cell morphology with scanning electron microscopy (SEM). Furthermore, we studied the overall recovery of eleven different metabolite classes using GC-MS and compared the results with those obtained from the ATP assay and SEM analysis. Our data demonstrate that a single washing step with PBS followed by quenching with 60% methanol supplemented with 70 mM HEPES (−50 °C) results in minimum leakage of intracellular metabolites. Little or no interference from PBS (used in washing) or methanol/HEPES (used in quenching) with the subsequent GC-MS analysis was observed. Together, these findings provide the first systematic study of the washing and quenching steps of the metabolomics workflow for adherent mammalian cells, which we believe will improve the reliability of metabolomics technology for studying adherent mammalian cell metabolism.
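
For anyone preparing the best-performing quenching solution, the arithmetic can be sketched in Python as below. Several assumptions are not stated in the abstract: that "60% methanol" means volume/volume, that 70 mM HEPES refers to the final solution volume, and that HEPES is used as the free acid (molar mass 238.30 g/mol). The helper name quench_recipe is hypothetical.

```python
# Assumptions (not stated in the abstract): "60% methanol" is v/v, the 70 mM
# HEPES refers to the final volume, and HEPES is the free acid (238.30 g/mol).
# The sodium salt (about 260.3 g/mol) would need a correspondingly larger mass.
HEPES_MOLAR_MASS_G_PER_MOL = 238.30


def quench_recipe(total_volume_ml, methanol_fraction=0.60, hepes_mm=70.0):
    """Return (methanol volume in mL, HEPES mass in g) for the quenching solution.

    Methanol-water mixtures contract on mixing, so in practice the solution is
    brought to the final volume with water after the components are combined.
    """
    methanol_ml = methanol_fraction * total_volume_ml
    hepes_mol = (hepes_mm / 1000.0) * (total_volume_ml / 1000.0)
    return methanol_ml, hepes_mol * HEPES_MOLAR_MASS_G_PER_MOL


methanol_ml, hepes_g = quench_recipe(100.0)
print(f"{methanol_ml:.1f} mL methanol, {hepes_g:.3f} g HEPES")  # 60.0 mL methanol, 1.668 g HEPES
```

The solution would then be cooled to the −50 °C stated above before use.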

    Current state and challenges for dynamic metabolic modeling

    While the stoichiometry of metabolism is probably the best-studied cellular level, the dynamics of metabolism still cannot be well described, predicted and, thus, engineered. Unknowns in metabolic flux behavior arise from kinetic interactions, especially allosteric control mechanisms. While the stoichiometry of enzymes is preserved in vitro, their activity and kinetic behavior differ from the in vivo situation. In addition, exhaustively testing the interaction of each enzyme with each intracellular metabolite in vitro is infeasible. As a consequence, the whole interacting metabolome has to be studied in vivo to identify the relevant enzyme properties. In this review we discuss current approaches for in vivo perturbation experiments, that is, stimulus-response experiments using different setups and quantitative analytical approaches, including dynamic carbon tracing. Besides reliable and informative data, advanced modeling approaches and computational tools are required to identify kinetic mechanisms and their parameters. The authors EV, AT, KN, IR, MO, DM and AW are part of the ERA-IB funded consortium DYNAMICS (ERA-IB-14-081, NWO 053.80.724).
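
Dynamic carbon tracing rests on the idea that, after a switch to labeled substrate, the speed at which label enters a metabolite pool reflects the ratio of flux to pool size. The Python sketch below shows this for the simplest possible case: a single well-mixed pool at metabolic steady state, fed with fully labeled substrate from t = 0, so that the labeled fraction obeys dL/dt = (v/X)(1 − L). The numbers are hypothetical, and real tracer analyses follow isotopomers through many coupled pools; the sketch only illustrates the principle.

```python
import math


def labeled_fraction_exact(t, flux, pool_size):
    """Closed-form labeled fraction L(t) = 1 - exp(-flux * t / pool_size)."""
    return 1.0 - math.exp(-flux * t / pool_size)


def labeled_fraction_euler(t_end, flux, pool_size, dt=1e-3):
    """Integrate dL/dt = (flux / pool_size) * (1 - L) from L(0) = 0 with explicit Euler."""
    label = 0.0
    for _ in range(int(round(t_end / dt))):
        label += dt * (flux / pool_size) * (1.0 - label)
    return label


# Hypothetical values: a flux of 2.0 umol/gDW/s through a pool of 1.0 umol/gDW,
# i.e. a turnover time of pool_size / flux = 0.5 s.
flux, pool = 2.0, 1.0
for t in (0.25, 0.5, 1.0, 2.0):
    print(t, round(labeled_fraction_exact(t, flux, pool), 3),
          round(labeled_fraction_euler(t, flux, pool), 3))
```

Fitting the measured labeling time course to such a curve yields the turnover time; combined with a measured pool size, this gives an in vivo flux estimate, which is why quantitative metabolite measurements and tracing are used together.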

    Plant Growth Promoting Rhizobacteria's (PGPRs) Enzyme Dynamics in Soil Remediation

    Soil is the basis of agriculture and consists of organic matter, minerals, water, and several gases. All plants require soil both as an anchor and as a source of water and nutrients. Unfortunately, human lifestyles, industrial progress, and agricultural chemicals contaminate soil and cause soil pollution. Pollutants, such as petroleum hydrocarbons, pesticides, heavy metals, and solvents, may be natural or human-made in origin. Since soil quality affects the growth and product yield of plants, soil pollution is a crucial problem that needs to be addressed urgently. Plant growth promoting rhizobacteria (PGPR) are microorganisms living in soil, on plant roots, or inside the plant. PGPRs synthesize chemicals that stimulate plant growth and promote nutrient uptake, and they help degrade soil pollutants and fend off pathogens. While some pollutants can be degraded by enzymes produced by bacteria and fungi, heavy metals require alternative remediation methods. In this chapter, three enzymes produced by PGPRs are reviewed briefly. 1-Aminocyclopropane-1-carboxylate (ACC) deaminase is responsible for lowering plant ethylene levels under stress conditions, whereas nitrogenase is responsible for the reduction of N2 to NH3. Moreover, phytase enables the degradation of phytate, which is a main storage form of phosphate in plants.
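
For reference, the overall reactions catalysed by the three enzymes can be summarised as follows. These are textbook summary stoichiometries added for illustration, not equations taken from the chapter; the nitrogenase equation gives the idealised ATP requirement, and phytase acts stepwise, so $n$ phosphate groups are released over successive hydrolysis steps. Because ACC is the immediate precursor of ethylene in plants, its cleavage by ACC deaminase accounts for the lower ethylene levels.

```latex
\begin{aligned}
&\text{ACC deaminase:} && \mathrm{ACC} + \mathrm{H_2O} \rightarrow \alpha\text{-ketobutyrate} + \mathrm{NH_3} \\
&\text{Nitrogenase:} && \mathrm{N_2} + 8\,\mathrm{H^+} + 8\,e^- + 16\,\mathrm{ATP} \rightarrow 2\,\mathrm{NH_3} + \mathrm{H_2} + 16\,\mathrm{ADP} + 16\,\mathrm{P_i} \\
&\text{Phytase:} && \text{phytate} + n\,\mathrm{H_2O} \rightarrow \text{lower inositol phosphates} + n\,\mathrm{P_i}
\end{aligned}
```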

    Making Soil More Accessible to Plants: The Case of Plant Growth Promoting Rhizobacteria

    Plant Growth Promoting Rhizobacteria (PGPR) are beneficial soil bacteria that can live either symbiotically with plants in the rhizosphere or as endophytes living on or inside the host plants. There are two main mechanisms by which PGPR contribute to plant growth. Direct mechanisms include phytohormone production (e.g., auxins such as IAA, cytokinins and gibberellins), biological nitrogen fixation, solubilization of inorganic phosphates, mineralization of organic phosphate, and production of organic matter such as amino acids. Through indirect mechanisms, PGPR help plants combat pathogenic microorganisms by stimulating the plants' disease-resistance mechanisms, promote favorable symbioses, and decontaminate the soil of xenobiotics. PGPR can also help plants cope with abiotic stress by lowering ethylene levels, or defend against pathogenic microorganisms by secreting antibacterial/antifungal substances. The exact mechanisms by which PGPR stimulate plant growth or product formation are still under investigation, yet in agriculture PGPR are used as environmentally friendly biofertilizers, biocontrol agents or biostimulants. These beneficial bacteria are usually applied to plants in powder or liquid form, or seeds are coated with the inoculants before sowing. Plants are subject to many different environmental factors. Abiotic factors such as drought and water stress are among the main factors limiting plant growth. Applying PGPR in agriculture offers an alternative way to reduce losses caused by environmental stresses, since breeding plants with stress-resistance traits is a long and difficult process because such traits are controlled by multiple genes. The production of phytohormones and enzymes (e.g., ACC deaminase) by PGPR can decrease plant stress levels while enhancing root structure.