
    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing proportions of values that are Missing Completely at Random (MCAR). First, we use a number of single and multiple imputation methods to recover the missing values, and then we ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the other methods, particularly as the amount of missing data increases.
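The abstract does not specify which imputation methods or base classifiers were used. As a minimal sketch under stated assumptions, the Python below combines the two ingredients it names: cell values are masked Missing Completely at Random, several stochastic imputations are drawn (here a simple hot-deck draw from the observed values in each column, an assumed stand-in for a proper multiple-imputation model), and one toy nearest-centroid classifier is trained per imputed copy on a bootstrap sample, with predictions combined by majority vote as in bagging. All function names are hypothetical, and stacking is not shown.

```python
import random
from collections import Counter
from statistics import mean


def mask_mcar(rows, rate, rng):
    """Blank out each cell independently with probability `rate` (Missing Completely at Random)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_once(rows, rng):
    """One stochastic imputation: fill each gap with a random observed value from the same column."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [
        [v if v is not None else (rng.choice(observed[j]) if observed[j] else 0.0)
         for j, v in enumerate(row)]
        for row in rows
    ]


def fit_centroids(X, y):
    """Toy base classifier: the mean feature vector of each class (nearest centroid)."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def mie_predict(X_missing, y, x_new, m=15, rng=None):
    """Train one member per imputed copy (on a bootstrap sample of it) and majority-vote."""
    rng = rng or random.Random()
    votes = []
    for _ in range(m):
        imputed = impute_once(X_missing, rng)
        idx = [rng.randrange(len(imputed)) for _ in imputed]  # bootstrap resample, as in bagging
        model = fit_centroids([imputed[i] for i in idx], [y[i] for i in idx])
        votes.append(predict_centroid(model, x_new))
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)] + \
        [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(20)]
    y = [0] * 20 + [1] * 20
    X_missing = mask_mcar(X, rate=0.3, rng=rng)
    print(mie_predict(X_missing, y, [0.2, -0.1], rng=rng))
    print(mie_predict(X_missing, y, [5.1, 4.8], rng=rng))
```

Because each ensemble member sees a different plausible completion of the data, the vote averages over imputation uncertainty instead of committing to a single filled-in dataset.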

    Monte Carlo Simulations of Star Clusters - VII. The globular cluster 47 Tuc

    We describe Monte Carlo models for the dynamical evolution of the massive globular cluster 47 Tuc (NGC 104). The code includes treatments of two-body relaxation, most kinds of three- and four-body interactions involving primordial binaries and those formed dynamically, the Galactic tide, and the internal evolution of both single and binary stars. We arrive at a set of initial parameters for the cluster which, after 12 Gyr of evolution, gives a model with a fairly satisfactory match to the surface brightness and density profiles, the velocity dispersion profile, the luminosity function in two fields, and the accelerations of pulsars. Our models appear to require a relatively steep initial mass function for stars above roughly the turnoff, with an index of about 2.8 (where the Salpeter mass function has an index of 2.35), and a relatively flat initial mass function (index about 0.4) for the lower main sequence. According to the model, the current mass is about 0.9 million solar masses, of which about 34% consists of remnants. We find that primordial binaries are gradually taking over from mass loss by stellar evolution as the main dynamical driver of the core. Despite the high concentration of the cluster, core collapse will take at least another 20 Gyr.
    Comment: 16 pages, 16 figures, revised version submitted to MNRAS

    Increased Mortality Exposure within the Family Rather than Individual Mortality Experiences Triggers Faster Life-History Strategies in Historic Human Populations

    Life History Theory predicts that extrinsic mortality risk is one of the most important factors shaping (human) life histories. Evidence from contemporary populations suggests that individuals confronted with high-mortality environments show traits characteristic of fast life-history strategies: they marry and reproduce earlier, have shorter birth intervals, and invest less in their offspring. However, little is known about the impact of mortality experiences on the speed of life histories in historical human populations, which generally faced higher mortality risk, and on male life histories in particular. Furthermore, it remains unknown whether individual-level mortality experiences within the family have a greater effect on life-history decisions, or whether family membership explains life-history variation. In a comparative approach using event history analyses, we study the impact of family-level versus individual-level effects of mortality exposure on two central life-history parameters, age at first marriage and age at first birth, in three historical human populations (Germany, Finland, Canada). Mortality experience is measured as exposure to sibling deaths within the natal family up to age 15. Results show that the speed of life histories is not adjusted to individual-level mortality experiences but is driven by family-level effects. The general finding of earlier marriage and reproduction after exposure to higher mortality in the family holds for both females and males. This study provides evidence for the importance of the family environment for reproductive timing, while individual-level mortality experiences seem to play only a minor role in human reproductive life-history decisions.
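The specific event history models are not described in the abstract. As a hedged illustration of the basic idea behind such analyses, the Kaplan-Meier estimator below computes the proportion of individuals still unmarried at each age, treating those never observed to marry (for example, because they died or left the records) as censored. Comparing such curves between groups, such as individuals from families with and without sibling deaths, is one simple descriptive counterpart to the study's analysis; the data layout and function names are assumptions.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t), the probability of not yet having experienced the event by t.

    durations: age at first marriage, or age at censoring if no marriage was observed
    events:    1 if the marriage was observed, 0 if the observation was censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = removed = 0
        # Group ties: everyone with duration t is at risk at t and leaves the risk set afterwards.
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_time(curve):
    """First time at which the estimated S(t) drops to 0.5 or below, or None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For ages [20, 22, 22, 25, 30] with marriage observed for the first, second and fourth individuals, the estimate falls to 0.8, 0.6 and 0.3 at ages 20, 22 and 25, so the estimated median age at first marriage is 25.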

    Porewater methane transport within the gas vesicles of diurnally migrating Chaoborus spp.: An energetic advantage

    We show that diurnally migrating Chaoborus sp. (phantom midge larvae), which can be highly abundant in eutrophic lakes with anoxic bottom waters, utilise sediment methane to inflate their tracheal sacs, which provide positive buoyancy to aid vertical migration. This process also effectively transports sediment methane to the upper water column, bypassing oxidation, and thereby adds to the total methane efflux to the atmosphere.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, and shared synteny with closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and a scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of population stratification into three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish genome assembly, and it will serve as a new standard for fish genomics.
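Contig N50 and scaffold N50 are standard assembly length statistics: the N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch (the function name is hypothetical):

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # compare with the doubled sum instead of dividing total by 2
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 8]))  # 6: 8 + 6 = 14 bp is half of the 28 bp total
```

The same function gives the contig N50 when passed contig lengths and the scaffold N50 when passed scaffold lengths.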

    Mining Medical Data: A Case Study of Endometriosis

    Since 1991, ultrasound-guided aspiration of ovarian endometrioma has been tried as an alternative therapeutic modality for patients who wish to avoid surgery or for whom a surgical approach is contraindicated. Cyst puncture can reduce tumor volume, disrupt the cyst wall, relieve adhesions, and improve the chance of recovery. However, aspiration alone, without other treatments, results in high recurrence rates (28.5% to 100%). To reduce recurrence after aspiration, ultrasound-guided aspiration has been combined with instillation of tetracycline, methotrexate, or recombinant interleukin-2 and shown to be effective, with recurrence rates of 46.9%, 18.1%, and 40%, respectively. Noma et al. (2001) reported that ethanol instillation for more than 10 min, particularly in cases with a single endometrial cyst, was the most effective in terms of recurrence (14.9%). Our goal is to analyze patients with recurrent pelvic cysts who underwent surgical intervention. The research data comprise clinical diagnoses, symptoms, and the classification of medical interventions, and the number of cysts is defined as the prediction target. A decision tree, a data-mining method, is used to identify meaningful features and the relationships between them. The experimental results can help clinicians make better diagnoses and provide a treatment reference for future patients.
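The abstract does not say which decision-tree algorithm was used (for example CART or C4.5). As a hedged sketch of how a tree chooses a split, the Python below scores every equality test on categorical features by the weighted Gini impurity of the two resulting groups, in the spirit of CART, and returns the best one. The feature names and labels in the usage example are hypothetical toy values, not the study's variables or data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the test `row[feature] == value` with the lowest weighted Gini impurity.

    rows:   list of dicts mapping feature name -> categorical value
    labels: target class for each row (e.g. a cyst-count category)
    Returns (feature, value, weighted_impurity), or None if no test separates the rows.
    """
    n = len(rows)
    best = None
    for feature in rows[0]:
        # Sort candidate values so that ties are broken deterministically.
        for value in sorted({row[feature] for row in rows}, key=str):
            left = [lab for row, lab in zip(rows, labels) if row[feature] == value]
            right = [lab for row, lab in zip(rows, labels) if row[feature] != value]
            if not left or not right:
                continue  # this test does not split the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best


if __name__ == "__main__":
    # Hypothetical toy records; not the study's data.
    patients = [
        {"pain": "yes", "prior_surgery": "yes"},
        {"pain": "no", "prior_surgery": "yes"},
        {"pain": "yes", "prior_surgery": "no"},
        {"pain": "no", "prior_surgery": "no"},
    ]
    cysts = ["multiple", "multiple", "single", "single"]
    print(best_split(patients, cysts))  # ('prior_surgery', 'no', 0.0)
```

A full tree would apply the same search recursively to each resulting group until the groups are pure or a stopping rule is met.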

    NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data

    Background: Biomedical applications of high-throughput sequencing methods generate vast amounts of data in which numerous chromatin features are mapped along the genome. The results are frequently analysed by creating binary data sets that link the presence or absence of a given feature to specific genomic loci. However, the nucleosome occupancy or chromatin accessibility landscape is essentially continuous. It is currently a challenge in the field to cope with continuous distributions of deep sequencing chromatin readouts and to integrate the different types of discrete chromatin features to reveal linkages between them. Results: Here we introduce the NucTools suite of Perl scripts, as well as MATLAB- and R-based visualization programs, for nucleosome-centred downstream analysis of deep sequencing data. NucTools accounts for the continuous distribution of nucleosome occupancy. It allows the calculation of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of changes in integral chromatin properties such as the nucleosome repeat length. Furthermore, NucTools facilitates the annotation of nucleosome occupancy with other chromatin features, such as binding of transcription factors or architectural proteins, and epigenetic marks such as histone modifications or DNA methylation. The applications of NucTools are demonstrated by comparing several nucleosome occupancy datasets for mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). Conclusions: Typical workflows of data processing and integrative analysis with NucTools reveal information on the interplay of nucleosome positioning with other features, such as binding of the transcription factor CTCF, regions with stable and unstable nucleosomes, and domains of large organized chromatin K9me2 modifications (LOCKs). As potential limitations, we discuss how the inter-replicate variability of MNase-seq experiments can be addressed.
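NucTools' own normalization and averaging procedures are not described in the abstract. As a minimal sketch under that caveat, the Python below scales each replicate's per-base-pair occupancy profile to a mean of 1 (an assumed normalization that removes differences in sequencing depth) and then reports the per-position mean and standard deviation across replicates; the standard deviation is one simple way to expose inter-replicate variability. Function names are hypothetical and are not part of NucTools.

```python
from statistics import mean, pstdev


def normalize_profile(profile):
    """Scale a per-base-pair occupancy profile so that its mean is 1 (assumed depth normalization)."""
    avg = mean(profile)
    if avg == 0:
        raise ValueError("profile has zero total occupancy")
    return [value / avg for value in profile]


def average_replicates(replicates):
    """Per-position mean and population standard deviation across normalized replicate profiles."""
    if not replicates:
        raise ValueError("need at least one replicate")
    length = len(replicates[0])
    if any(len(profile) != length for profile in replicates):
        raise ValueError("replicate profiles must cover the same region")
    normalized = [normalize_profile(profile) for profile in replicates]
    columns = list(zip(*normalized))
    return [mean(col) for col in columns], [pstdev(col) for col in columns]


if __name__ == "__main__":
    # The second replicate has half the depth of the first but the same shape.
    print(average_replicates([[2, 4, 6], [1, 2, 3]]))  # ([0.5, 1.0, 1.5], [0.0, 0.0, 0.0])
```

Positions where the standard deviation is large relative to the mean are those where a single averaged profile hides disagreement between replicates.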

    Functional similarities between pigeon ‘milk’ and mammalian milk: induction of immune gene expression and modification of the microbiota

    Pigeon ‘milk’ and mammalian milk have functional similarities in terms of nutritional benefit and delivery of immunoglobulins to the young. Mammalian milk has been clearly shown to aid in the development of the immune system and microbiota of the young, but similar effects have not yet been attributed to pigeon ‘milk’. Therefore, using a chicken model, we investigated the effect of pigeon ‘milk’ on immune gene expression in the Gut Associated Lymphoid Tissue (GALT) and on the composition of the caecal microbiota. Chickens fed pigeon ‘milk’ had a faster rate of growth and a better feed conversion ratio than control chickens. There was significantly enhanced expression of immune-related gene pathways and interferon-stimulated genes in the GALT of pigeon ‘milk’-fed chickens. These pathways include the innate immune response, regulation of cytokine production, and regulation of B cell activation and proliferation. The caecal microbiota of pigeon ‘milk’-fed chickens was significantly more diverse than that of control chickens, and it appears to be affected by prebiotics in pigeon ‘milk’, as well as being directly seeded by bacteria present in pigeon ‘milk’. Our results demonstrate that pigeon ‘milk’ has further modes of action that make it functionally similar to mammalian milk. We hypothesise that pigeon ‘lactation’ and mammalian lactation evolved independently but resulted in functionally similar products.