    Scatteract: Automated extraction of data from scatter plots

    Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set. Comment: Submitted to ECML PKDD 2017 proceedings, 16 pages.
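
The pixel-to-coordinate step described in this abstract can be illustrated with a minimal sketch. It assumes the tick labels on one axis have already been located and read by OCR, and it uses a Theil-Sen estimator (median of pairwise slopes) as one example of robust regression; the abstract does not say which robust estimator Scatteract uses. All function names and numbers below are hypothetical.

```python
from itertools import combinations
from statistics import median


def fit_axis_mapping(pixels, values):
    """Theil-Sen fit of value = slope * pixel + intercept for one linear axis."""
    # Slope of every pair of tick labels; the median ignores a few bad OCR reads.
    slopes = [
        (v2 - v1) / (p2 - p1)
        for (p1, v1), (p2, v2) in combinations(zip(pixels, values), 2)
        if p2 != p1
    ]
    if not slopes:
        raise ValueError("need at least two tick labels at distinct pixel positions")
    slope = median(slopes)
    # Median residual gives an intercept that is also robust to outliers.
    intercept = median(v - slope * p for p, v in zip(pixels, values))
    return slope, intercept


def pixel_to_value(pixel, slope, intercept):
    """Map a pixel position to chart coordinates using the fitted axis."""
    return slope * pixel + intercept


# Hypothetical OCR output for an x axis: pixel positions and the values read.
# The fourth label is a deliberate misread (30 read as 80).
tick_pixels = [100, 200, 300, 400, 500]
tick_values = [0.0, 10.0, 20.0, 80.0, 40.0]

slope, intercept = fit_axis_mapping(tick_pixels, tick_values)
x_value = pixel_to_value(250, slope, intercept)
```

The same fit would be run separately for the y axis, where the slope is typically negative because image rows are counted from the top.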

    A realistic assessment of methods for extracting gene/protein interactions from free text

    Background: The automated extraction of gene and/or protein interactions from the literature is one of the most important targets of biomedical text mining research. In this paper we present a realistic evaluation of gene/protein interaction mining relevant to potential non-specialist users. Hence we have specifically avoided methods that are complex to install or require reimplementation, and we coupled our chosen extraction methods with a state-of-the-art biomedical named entity tagger. Results: Our results show: that performance across different evaluation corpora is extremely variable; that the use of tagged (as opposed to gold standard) gene and protein names has a significant impact on performance, with a drop in F-score of over 20 percentage points being commonplace; and that a simple keyword-based benchmark algorithm when coupled with a named entity tagger outperforms two of the tools most widely used to extract gene/protein interactions. Conclusion: In terms of availability, ease of use and performance, the potential non-specialist user community interested in automatically extracting gene and/or protein interactions from free text is poorly served by current tools and systems. The public release of extraction tools that are easy to install and use, and that achieve state-of-the-art levels of performance, should be treated as a high priority by the biomedical text mining community.

    A Bayesian method for evaluating and discovering disease loci associations

    Background: A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need. Methodology/Findings: We introduce the Bayesian network posterior probability (BNPP) method which addresses the difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can not only be used to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are reported. Conclusions/Significance: We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations. © 2011 Jiang et al.
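
The statement that a hypothesis's posterior probability is computed from the likelihoods of all competing hypotheses can be sketched with Bayes' theorem over a finite hypothesis set. This is only an illustration of that normalisation step, not the BNPP scoring criterion itself; the hypothesis names, log-likelihoods and uniform priors are hypothetical.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Bayes' theorem over a finite set of competing hypotheses.

    log_likelihoods: dict mapping hypothesis name -> log P(data | hypothesis)
    priors:          dict mapping hypothesis name -> P(hypothesis)
    """
    log_joint = {h: log_likelihoods[h] + math.log(priors[h]) for h in log_likelihoods}
    # Log-sum-exp keeps the normalising constant stable for very small likelihoods.
    peak = max(log_joint.values())
    log_evidence = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {h: math.exp(v - log_evidence) for h, v in log_joint.items()}


# Hypothetical model scores for three competing disease-locus hypotheses.
example_log_likelihoods = {"no_association": -1052.3, "snp_a": -1047.9, "snp_a_and_b": -1046.1}
example_priors = {h: 1.0 / 3.0 for h in example_log_likelihoods}

example_posteriors = posterior_probabilities(example_log_likelihoods, example_priors)
```

Working in log space matters here because likelihoods of large genotype data sets are far too small to represent directly as floating-point numbers.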

    Edible crabs “Go West”: migrations and incubation cycle of Cancer pagurus revealed by electronic tags

    Crustaceans are key components of marine ecosystems which, like other exploited marine taxa, show seasonal patterns of distribution and activity, with consequences for their availability to capture by targeted fisheries. Despite concerns over the sustainability of crab fisheries worldwide, crabs’ behaviour over their annual cycles, and the timings and durations of reproduction, remain poorly understood because of difficulties in observing them. From the release of 128 mature female edible crabs tagged with electronic data storage tags (DSTs), we demonstrate predominantly westward migration in the English Channel. Eastern Channel crabs migrated further than western Channel crabs, while crabs released outside the Channel showed little or no migration. Individual migrations were punctuated by a 7-month hiatus, when crabs remained stationary, coincident with the main period of crab spawning and egg incubation. Incubation commenced earlier in the west, from late October onwards, and brooding locations, determined using tidal geolocation, occurred throughout the species range. With an overall return rate of 34%, our results demonstrate that previous reluctance to tag crabs with relatively high-cost DSTs for fear of loss following moulting is unfounded, and that DSTs can generate precise information with regard to life-history metrics that would be unachievable using other conventional means.

    The price of tumor control: an analysis of rare side effects of anti-CTLA-4 therapy in metastatic melanoma from the ipilimumab network

    Background: Ipilimumab, a cytotoxic T-lymphocyte antigen-4 (CTLA-4) blocking antibody, has been approved for the treatment of metastatic melanoma and induces adverse events (AE) in up to 64% of patients. Treatment algorithms for the management of common ipilimumab-induced AEs have led to a reduction of morbidity, e.g. due to bowel perforations. However, the spectrum of less common AEs is expanding as ipilimumab is increasingly applied. Stringent recognition and management of AEs will reduce drug-induced morbidity and costs, and thus, positively impact the cost-benefit ratio of the drug. To facilitate timely identification and adequate management, data on rare AEs were analyzed at 19 skin cancer centers. Methods and Findings: Patient files (n = 752) were screened for rare ipilimumab-associated AEs. A total of 120 AEs, some of which were life-threatening or even fatal, were reported and summarized by organ system, describing the most instructive cases in detail. Previously unreported AEs like drug rash with eosinophilia and systemic symptoms (DRESS), granulomatous inflammation of the central nervous system, and aseptic meningitis, were documented. Obstacles included patients' delay in reporting symptoms and the differentiation of steroid-induced from ipilimumab-induced AEs under steroid treatment. Importantly, response rate was high in this patient population with tumor regression in 30.9% and a tumor control rate of 61.8% in stage IV melanoma patients despite the fact that some patients received only two of four recommended ipilimumab infusions. This suggests that ipilimumab-induced antitumor responses can have an early onset and that severe autoimmune reactions may reflect overtreatment. Conclusion: The wide spectrum of ipilimumab-induced AEs demands doctor and patient awareness to reduce morbidity and treatment costs, and true ipilimumab success is dictated by both objective tumor responses and controlling severe side effects.

    Long term time variability of cosmic rays and possible relevance to the development of life on Earth

    An analysis is made of the manner in which the cosmic ray intensity at Earth has varied over its existence and its possible relevance to both the origin and the evolution of life. Much of the analysis relates to the 'high energy' cosmic rays ($E > 10^{14}$ eV $= 0.1$ PeV) and their variability due to the changing proximity of the solar system to supernova remnants which are generally believed to be responsible for most cosmic rays up to PeV energies. It is pointed out that, on a statistical basis, there will have been considerable variations in the likely 100 My between the Earth's biosphere reaching reasonable stability and the onset of very elementary life. Interestingly, there is the increasingly strong possibility that PeV cosmic rays are responsible for the initiation of terrestrial lightning strokes and the possibility arises of considerable increases in the frequency of lightning and thereby the formation of some of the complex molecules which are the 'building blocks of life'. Attention is also given to the well known generation of the oxides of nitrogen by lightning strokes which are poisonous to animal life but helpful to plant growth; here, too, the violent swings of cosmic ray intensities may have had relevance to evolutionary changes. A particular variant of the cosmic ray acceleration model, put forward by us, predicts an increase in lightning rate in the past and this has been sought in Korean historical records. Finally, the time dependence of the overall cosmic ray intensity, which manifests itself mainly at sub-10 GeV energies, has been examined. The relevance of cosmic rays to the 'global electrical circuit' points to the importance of this concept. Comment: 18 pages, 5 figures, accepted by 'Surveys in Geophysics'.

    Stem cell differentiation increases membrane-actin adhesion regulating cell blebability, migration and mechanics

    This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. K. S. is funded by an EPSRC PhD studentship. S.T. is funded by an EU Marie Curie Intra European Fellowship (GENOMICDIFF).

    A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems

    In this paper we present a methodological framework that meets novel requirements emerging from upcoming types of accelerated and highly configurable neuromorphic hardware systems. We describe in detail a device with 45 million programmable and dynamic synapses that is currently under development, and we sketch the conceptual challenges that arise from taking this platform into operation. More specifically, we aim at the establishment of this neuromorphic system as a flexible and neuroscientifically valuable modeling tool that can be used by non-hardware-experts. We consider various functional aspects to be crucial for this purpose, and we introduce a consistent workflow with detailed descriptions of all involved modules that implement the suggested steps: the integration of the hardware interface into the simulator-independent model description language PyNN; a fully automated translation between the PyNN domain and appropriate hardware configurations; an executable specification of the future neuromorphic system that can be seamlessly integrated into this biology-to-hardware mapping process as a test bench for all software layers and possible hardware design modifications; an evaluation scheme that deploys models from a dedicated benchmark library, compares the results generated by virtual or prototype hardware devices with reference software simulations and analyzes the differences. The integration of these components into one hardware-software workflow provides an ecosystem for ongoing preparative studies that support the hardware design process and represents the basis for the maturity of the model-to-hardware mapping software. The functionality and flexibility of the latter are demonstrated with a variety of experimental results.

    A novel approach to simulate gene-environment interactions in complex diseases

    Background: Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite a large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in epidemiological literature. One reason can be the incomplete knowledge of the power of statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this aim, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones. Results: We present a mathematical approach that models gene-environment interactions. By this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics by using standard epidemiological measures and to implement constraints to make the simulator behaviour biologically meaningful. Conclusions: By the multi-logistic model implemented in GENS it is possible to simulate case-control samples of complex disease where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of the statistical power when designing a study.
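
The idea of simulating a case-control sample in which one gene and one environmental factor interact can be sketched as follows. This is a plain logistic risk model with a Monte Carlo draw per individual, written as an assumption-laden illustration; it is not the GENS parameterisation, and every coefficient, frequency and function name is hypothetical.

```python
import math
import random


def disease_risk(genotype, exposure, b0=-3.0, b_gene=0.4, b_env=0.3, b_inter=0.5):
    """Logistic disease risk with a gene-environment interaction term.

    genotype: number of risk alleles (0, 1 or 2); exposure: 0 or 1.
    """
    z = b0 + b_gene * genotype + b_env * exposure + b_inter * genotype * exposure
    return 1.0 / (1.0 + math.exp(-z))


def simulate_case_control(n_cases, n_controls, allele_freq=0.3, exposure_rate=0.4, seed=0):
    """Draw individuals from a simulated population until both groups are full."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Two independent allele draws, i.e. Hardy-Weinberg proportions are assumed.
        genotype = sum(rng.random() < allele_freq for _ in range(2))
        exposure = int(rng.random() < exposure_rate)
        diseased = rng.random() < disease_risk(genotype, exposure)
        if diseased and len(cases) < n_cases:
            cases.append((genotype, exposure))
        elif not diseased and len(controls) < n_controls:
            controls.append((genotype, exposure))
    return cases, controls


cases, controls = simulate_case_control(n_cases=200, n_controls=200, seed=42)
```

Because the underlying risk function is known exactly, a data set generated this way can serve as ground truth when checking whether a statistical method recovers the interaction.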

    All clinically-relevant blood components transmit prion disease following a single blood transfusion: a sheep model of vCJD

    Variant CJD (vCJD) is an incurable, infectious human disease, likely arising from the consumption of BSE-contaminated meat products. Whilst the epidemic appears to be waning, there is much concern that vCJD infection may be perpetuated in humans by the transfusion of contaminated blood products. Since 2004, several cases of transfusion-associated vCJD transmission have been reported and linked to blood collected from pre-clinically affected donors. Using an animal model in which the disease manifested resembles that of humans affected with vCJD, we examined which blood components used in human medicine are likely to pose the greatest risk of transmitting vCJD via transfusion. We collected two full units of blood from BSE-infected donor animals during the pre-clinical phase of infection. Using methods employed by transfusion services we prepared red cell concentrates, plasma and platelet units (including leucoreduced equivalents). Following transfusion, we showed that all components contain sufficient levels of infectivity to cause disease following only a single transfusion and also that leucoreduction did not prevent disease transmission. These data suggest that all blood components are vectors for prion disease transmission, and highlight the importance of multiple control measures to minimise the risk of human to human transmission of vCJD by blood transfusion.