
    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
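The sensitivity to initial centers described in this abstract can be seen on a toy example. The following is a minimal sketch, assuming plain Lloyd iterations on one-dimensional data; the data set, the function name `lloyd`, and both sets of starting centers are illustrative assumptions and are not taken from the chapter, and neither starting rule is one of the six methods it evaluates.

```python
def lloyd(points, centers, max_iter=100):
    """Run Lloyd's k-means iterations on 1-D data from the given starting centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point
        # (ties go to the lowest index, so the run is deterministic).
        labels = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members;
        # an empty cluster simply keeps its previous center.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    # SSE measured against the nearest final center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Three well-separated groups of three points each (illustrative data).
data = [0, 1, 2, 10, 11, 12, 20, 21, 22]

# One start per true group versus all starts crowded into the first group.
centers_good, sse_good = lloyd(data, [1, 11, 21])
centers_bad, sse_bad = lloyd(data, [0, 1, 2])

print("spread-out start:", centers_good, sse_good)
print("crowded start:   ", centers_bad, sse_bad)
```

Both runs converge, but the crowded start settles in a local optimum that merges two of the three groups, which is the kind of unpredictability that motivates multiple restarts or more careful initialization.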

    Genetic identification of cytomegaloviruses in a rural population of Côte d'Ivoire.

    BACKGROUND: Cytomegaloviruses (CMVs) are herpesviruses that infect many mammalian species, including humans. Infection generally passes undetected, but the virus can cause serious disease in individuals with impaired immune function. Human CMV (HCMV) is circulating with high seroprevalence (60-100 %) on all continents. However, little information is available on HCMV genoprevalence and genetic diversity in sub-Saharan Africa, especially in rural areas of West Africa that are at high risk of human-to-human HCMV transmission. In addition, there is a potential for zoonotic spillover of pathogens through bushmeat hunting and handling in these areas, as shown for various retroviruses. Although HCMV and nonhuman CMVs are regarded as species-specific, potential human infection with CMVs of non-human primate (NHP) origin, shown to circulate in the local NHP population, has not been studied. FINDINGS: Analysis of 657 human oral swabs and fecal samples collected from 518 individuals living in 8 villages of Côte d'Ivoire with generic PCR for identification of human and NHP CMVs revealed shedding of HCMV in 2.5 % of the individuals. Determination of glycoprotein B sequences showed identity with strains Towne, AD169 and Toledo, respectively. NHP CMV sequences were not detected. CONCLUSIONS: HCMV is actively circulating in a proportion of the rural Côte d'Ivoire human population, with circulating strains being closely related to those previously identified in non-African countries. The lack of NHP CMVs in human populations in an environment conducive to cross-species infection supports zoonotic transmission of CMVs to humans being at most a rare event.

    United States Acculturation and Cancer Patients' End-of-Life Care

    Background: Culture shapes how people understand illness and death, but few studies examine whether acculturation influences patients' end-of-life treatment preferences and medical care. Methods and Findings: In this multi-site, prospective, longitudinal cohort study of terminally-ill cancer patients and their caregivers (n = 171 dyads), trained interviewers administered the United States Acculturation Scale (USAS). The USAS is a 19-item scale developed to assess the degree of "Americanization" in first generation or non-US born caregivers of terminally-ill cancer patients. We evaluated the internal consistency, concurrent, criterion, and content validity of the USAS. We also examined whether caregivers' USAS scores predicted patients' communication, treatment preferences, and end-of-life medical care in multivariable models that corrected for significant confounding influences (e.g. education, country of origin, English proficiency). The USAS measure was internally consistent (Cronbach α = 0.98) and significantly associated with US birthplace (r = 0.66, P<0.0001). USAS scores were predictive of patients' preferences for prognostic information (AOR = 1.31, 95% CI:1.00-1.72), but not comfort asking physicians questions about care (AOR = 1.23, 95% CI:0.87-1.73). They predicted patients' preferences for feeding tubes (AOR = 0.68, 95% CI:0.49-0.99) and wish to avoid dying in an intensive care unit (AOR = 1.36, 95% CI:1.05-1.76). Scores indicating greater acculturation were also associated with increased odds of patient participation in clinical trials (AOR = 2.20, 95% CI:1.28-3.78), compared with lower USAS scores, and greater odds of patients receiving chemotherapy (AOR = 1.59, 95% CI:1.20-2.12). Conclusion: The USAS is a reliable and valid measure of "Americanization" associated with advanced cancer patients' end-of-life preferences and care. USAS scores indicating greater caregiver acculturation were associated with increased odds of patient participation in cancer treatment (chemotherapy, clinical trials) compared with lower scores. Future studies should examine the effects of acculturation on end-of-life care to identify patient and provider factors that explain these effects and targets for future interventions to improve care (e.g., by designing more culturally-competent health education materials). © 2013 Wright et al.

    Identification and characterization of antibacterial compound(s) of cockroaches (Periplaneta americana)

    Infectious diseases remain a significant threat to human health, contributing to more than 17 million deaths annually. With the worsening trends of drug resistance, there is a need for newer and more powerful antimicrobial agents. We hypothesized that animals living in polluted environments are a potential source of antimicrobials. Under polluted milieus, organisms such as cockroaches encounter different types of microbes, including superbugs. Such creatures survive the onslaught of superbugs and are able to ward off disease by producing antimicrobial substances. Here, we characterized antibacterial properties in extracts of various body organs of cockroaches (Periplaneta americana) and showed potent antibacterial activity in crude brain extract against methicillin-resistant Staphylococcus aureus and neuropathogenic E. coli K1. Size-exclusion spin columns revealed that the active compound(s) are less than 10 kDa in molecular mass. Using cytotoxicity assays, it was observed that pre-treatment of bacteria with lysates inhibited bacteria-mediated host cell cytotoxicity. Tissue lysates were analyzed using spectra obtained with LC-MS on an Agilent 1290 Infinity liquid chromatograph coupled with an Agilent 6460 triple quadrupole mass spectrometer. Among hundreds of compounds, only a few homologous compounds were identified; these contained isoquinoline groups, chromene derivatives, thiazine groups, imidazoles, pyrrole-containing analogs, sulfonamides, furanones, and flavanones, which are known to possess broad-spectrum antimicrobial, anti-inflammatory, anti-tumour, and analgesic properties. Further identification, characterization and functional studies using individual compounds could lead to a breakthrough in developing novel therapeutics against various pathogens, including superbugs.

    Competing risk and heterogeneity of treatment effect in clinical trials

    It has been demonstrated that patients enrolled in clinical trials frequently have a large degree of variation in their baseline risk for the outcome of interest. Thus, some have suggested that clinical trial results should routinely be stratified by outcome risk using risk models, since the summary results may otherwise be misleading. However, variation in competing risk is another dimension of risk heterogeneity that may also underlie treatment effect heterogeneity. Understanding the effects of competing risk heterogeneity may be especially important for pragmatic comparative effectiveness trials, which seek to include traditionally excluded patients, such as the elderly or complex patients with multiple comorbidities. Indeed, the observed effect of an intervention is dependent on the ratio of outcome risk to competing risk, and these risks – which may or may not be correlated – may vary considerably in patients enrolled in a trial. Further, the effects of competing risk on treatment effect heterogeneity can be amplified by even a small degree of treatment-related harm. Stratification of trial results along both the competing and the outcome risk dimensions may be necessary if pragmatic comparative effectiveness trials are to provide the clinically useful information their advocates intend.
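The dependence of the observed treatment effect on competing risk can be illustrated numerically. The following is a minimal sketch, assuming constant (exponential) cause-specific hazards, under which the cumulative incidence of the outcome by time t is λ_o / (λ_o + λ_c) · (1 − exp(−(λ_o + λ_c) t)). The model, the function names, and all hazard values are illustrative assumptions and do not come from the paper.

```python
import math


def cumulative_incidence(outcome_hazard, competing_hazard, t):
    """Probability of the outcome by time t under constant cause-specific hazards."""
    total = outcome_hazard + competing_hazard
    if total == 0:
        return 0.0
    return outcome_hazard / total * (1.0 - math.exp(-total * t))


def absolute_risk_reduction(outcome_hazard, competing_hazard, hazard_ratio, t):
    """Control-arm minus treated-arm cumulative incidence of the outcome.

    The treatment is assumed to scale only the outcome hazard.
    """
    control = cumulative_incidence(outcome_hazard, competing_hazard, t)
    treated = cumulative_incidence(outcome_hazard * hazard_ratio, competing_hazard, t)
    return control - treated


# Hypothetical values: same outcome hazard and same relative treatment effect,
# but two patient strata that differ in competing risk.
OUTCOME_HAZARD = 0.05   # per year
HAZARD_RATIO = 0.75     # treatment effect on the outcome hazard
YEARS = 5

arr_low = absolute_risk_reduction(OUTCOME_HAZARD, 0.01, HAZARD_RATIO, YEARS)
arr_high = absolute_risk_reduction(OUTCOME_HAZARD, 0.20, HAZARD_RATIO, YEARS)

print(f"ARR with low competing risk:  {arr_low:.3f}")
print(f"ARR with high competing risk: {arr_high:.3f}")
```

Under these assumptions the same relative effect yields a smaller absolute benefit in the stratum with higher competing risk, because more patients experience the competing event before they can benefit.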

    Genome-Wide Analysis of Neuroblastomas using High-Density Single Nucleotide Polymorphism Arrays

    BACKGROUND: Neuroblastomas are characterized by chromosomal alterations with biological and clinical significance. We analyzed paired blood and primary tumor samples from 22 children with high-risk neuroblastoma for loss of heterozygosity (LOH) and DNA copy number change using the Affymetrix 10K single nucleotide polymorphism (SNP) array. FINDINGS: Multiple areas of LOH and copy number gain were seen. The most commonly observed area of LOH was on chromosome arm 11q (15/22 samples; 68%). Chromosome 11q LOH was highly associated with occurrence of chromosome 3p LOH: 9 of the 15 samples with 11q LOH had concomitant 3p LOH (P = 0.016). Chromosome 1p LOH was seen in one-third of cases. LOH events on chromosomes 11q and 1p were generally accompanied by copy number loss, indicating hemizygous deletion within these regions. The one exception was on chromosome 11p, where LOH in all four cases was accompanied by normal copy number or diploidy, implying uniparental disomy. Gain of copy number was most frequently observed on chromosome arm 17q (21/22 samples; 95%) and was associated with allelic imbalance in six samples. Amplification of MYCN was also noted, as was amplification of a second gene, ALK, in a single case. CONCLUSIONS: This analysis demonstrates the power of SNP arrays for high-resolution determination of LOH and DNA copy number change in neuroblastoma, a tumor in which specific allelic changes drive clinical outcome and selection of therapy.

    A Biomedically Enriched Collection of 7000 Human ORF Clones

    We report the production and availability of over 7000 fully sequence-verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native protein or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes in ~15 million publications recorded in PubMed at the time of analysis. This analysis revealed that, relative to the genome and the MGC collection, this collection is enriched for genes with published associations with a wide range of diseases and biomedical terms, without displaying a particular bias towards any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance.