447 research outputs found

    SMART: Unique splitting-while-merging framework for gene clustering

    Copyright @ 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. Clustering performance degrades significantly unless parameters are properly set, yet it is difficult to set these parameters a priori. To address this issue, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge similar clusters, our framework splits and merges clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Within the proposed SMART framework, many other algorithms can also be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with.
Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. National Institute for Health Researc

    Statement of accounting principles

    American Institute of Accountants

    Roto-Translation Covariant Convolutional Networks for Medical Image Analysis

    We propose a framework for rotation and translation covariant deep learning using SE(2) group convolutions. The group product of the special Euclidean motion group SE(2) describes how a concatenation of two roto-translations results in a net roto-translation. We encode this geometric structure into convolutional neural networks (CNNs) via SE(2) group convolutional layers, which fit into the standard 2D CNN framework and which handle rotated input samples generically without the need for data augmentation. We introduce three layers: a lifting layer which lifts a 2D (vector valued) image to an SE(2)-image, i.e., 3D (vector valued) data whose domain is SE(2); a group convolution layer from and to an SE(2)-image; and a projection layer from an SE(2)-image to a 2D image. The lifting and group convolution layers are SE(2) covariant (the output roto-translates with the input). The final projection layer, a maximum intensity projection over rotations, makes the full CNN rotation invariant. We show with three different problems in histopathology, retinal imaging, and electron microscopy that with the proposed group CNNs, state-of-the-art performance can be achieved, without the need for data augmentation by rotation and with increased performance compared to standard CNNs that do rely on augmentation. Comment: 8 pages, 2 figures, 1 table, accepted at MICCAI 201
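The group product mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a roto-translation is stored as a tuple `(x, y, theta)` with `theta` in radians; the function names `compose` and `inverse` are hypothetical and are not taken from the paper.

```python
import math

def compose(g1, g2):
    """Net roto-translation of applying g2 first and then g1 (assumed convention)."""
    x1, y1, t1 = g1
    x2, y2, t2 = g2
    c, s = math.cos(t1), math.sin(t1)
    # Rotate the second translation by the first rotation, then add the first translation.
    x = x1 + c * x2 - s * y2
    y = y1 + s * x2 + c * y2
    # Rotation angles add, wrapped to [0, 2*pi).
    theta = (t1 + t2) % (2 * math.pi)
    return (x, y, theta)

def inverse(g):
    """Inverse element: undo the rotation, then undo the rotated translation."""
    x, y, t = g
    c, s = math.cos(t), math.sin(t)
    # Translation part is -R(-theta) applied to (x, y).
    return (-(c * x + s * y), s * x - c * y, (-t) % (2 * math.pi))

# A quarter turn followed by a unit shift along the rotated x-axis.
g = compose((1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0))
```

Composing an element with its inverse should give the identity (zero translation, zero rotation) up to floating-point error, which is a quick sanity check on the convention chosen here.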

    An adaptive version of k-medoids to deal with the uncertainty in clustering heterogeneous data using an intermediary fusion approach

    This paper introduces Hk-medoids, a modified version of the standard k-medoids algorithm. The modification extends the algorithm to the problem of clustering complex heterogeneous objects that are described by a diversity of data types, e.g. text, images, structured data and time series. We first propose an intermediary fusion approach, SMF, to calculate fused similarities between objects, taking into account the similarities between the component elements of the objects using appropriate similarity measures. The fused approach entails uncertainty for incomplete objects or for objects which have diverging distances according to the different components. Our implementation of Hk-medoids proposed here works with the fused distances and deals with the uncertainty in the fusion process. We experimentally evaluate the potential of our proposed algorithm using five datasets with different combinations of data types that define the objects. Our results show the feasibility of our algorithm, and they also show a performance enhancement compared with applying the original SMF approach in combination with a standard k-medoids that does not take uncertainty into account. In addition, from a theoretical point of view, our proposed algorithm has lower computational complexity than the popular PAM implementation.
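For orientation, the baseline that the abstract above modifies can be sketched as follows. This is a plain alternating k-medoids on a precomputed distance matrix, not Hk-medoids and not PAM; it ignores uncertainty entirely. The function name `k_medoids`, the deterministic start from the first `k` objects, and the toy data are all assumptions made for illustration.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix (list of lists)."""
    n = len(dist)

    def assign(medoids):
        # Each object joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # assumed start: the first k objects
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimises total distance to its cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Toy example: six points on a line, distances are absolute differences.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
```

Because the routine only needs a distance matrix, a fused distance of the kind the abstract describes could in principle be passed in place of `dist`.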

    Biclustering models for two-mode ordinal data

    The work in this paper introduces finite mixture models that can be used to simultaneously cluster the rows and columns of two-mode ordinal categorical response data, such as those resulting from Likert scale responses. We use the popular proportional odds parameterisation and propose models which provide insights into major patterns in the data. Model-fitting is performed using the EM algorithm and a fuzzy allocation of rows and columns to corresponding clusters is obtained. The clustering ability of the models is evaluated in a simulation study and demonstrated using two real data sets.
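As a rough illustration of the proportional odds parameterisation named in the abstract above, the sketch below turns cumulative logits into category probabilities for one row cluster and one column cluster. The additive form (cutpoint minus row effect minus column effect), the sign convention, and the names `category_probs`, `row_effect` and `col_effect` are assumptions, not details confirmed by the abstract.

```python
import math

def category_probs(cutpoints, row_effect, col_effect):
    """Probabilities of each ordinal category under an assumed proportional odds form."""
    def expit(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities P(Y <= k) for the first q-1 categories; the last is 1.
    cum = [expit(mu - row_effect - col_effect) for mu in cutpoints] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Four response categories need three increasing cutpoints.
baseline = category_probs([-1.0, 0.0, 1.0], row_effect=0.0, col_effect=0.0)
shifted = category_probs([-1.0, 0.0, 1.0], row_effect=1.0, col_effect=0.0)
```

Under this sign convention a larger cluster effect moves probability mass toward the higher response categories, while the cutpoints are shared by all clusters.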

    Whole-genome sequencing identifies genetic alterations in pediatric low-grade gliomas

    The most common pediatric brain tumors are low-grade gliomas (LGGs). We used whole-genome sequencing to identify multiple new genetic alterations involving BRAF, RAF1, FGFR1, MYB, MYBL1 and genes with histone-related functions, including H3F3A and ATRX, in 39 LGGs and low-grade glioneuronal tumors (LGGNTs). Only a single non-silent somatic alteration was detected in 24 of 39 (62%) tumors. Intragenic duplications of the portion of FGFR1 encoding the tyrosine kinase domain (TKD) and rearrangements of MYB were recurrent and mutually exclusive in 53% of grade II diffuse LGGs. Transplantation of Trp53-null neonatal astrocytes expressing FGFR1 with the duplication involving the TKD into the brains of nude mice generated high-grade astrocytomas with short latency and 100% penetrance. FGFR1 with the duplication induced FGFR1 autophosphorylation and upregulation of the MAPK/ERK and PI3K pathways, which could be blocked by specific inhibitors. Focusing on the therapeutically challenging diffuse LGGs, our study of 151 tumors has discovered genetic alterations and potential therapeutic targets across the entire range of pediatric LGGs and LGGNTs. Jinghui Zhang, Gang Wu, Claudia P Miller, Ruth G Tatevossian, James D Dalton, Bo Tang, Wilda Orisme, Chandanamali Punchihewa, Matthew Parker, Ibrahim Qaddoumi, Fredrick A Boop, Charles Lu, Cyriac Kandoth, Li Ding, Ryan Lee, Robert Huether, Xiang Chen, Erin Hedlund, Panduka Nagahawatte, Michael Rusch, Kristy Boggs, Jinjun Cheng, Jared Becksfort, Jing Ma, Guangchun Song, Yongjin Li, Lei Wei, Jianmin Wang, Sheila Shurtleff, John Easton, David Zhao, Robert S Fulton, Lucinda L Fulton, David J Dooling, Bhavin Vadodaria, Heather L Mulder, Chunlao Tang, Kerri Ochoa, Charles G Mullighan, Amar Gajjar, Richard Kriwacki, Denise Sheer, Richard J Gilbertson, Elaine R Mardis, Richard K Wilson, James R Downing, Suzanne J Baker and David W Elliso

    Exploring the longitudinal dynamics of herd BVD antibody test results using model-based clustering

    Determining the Bovine Viral Diarrhoea (BVD) infection status of cattle herds is a challenge for control and eradication schemes. Given the changing dynamics of BVD virus (BVDV) antibody responses in cattle, classifying herds based on longitudinal changes in the results of BVDV antibody tests could offer a novel, complementary approach to categorising herds that is less likely than the present system to result in a herd’s status changing from year to year, as it is more likely to capture the true exposure dynamics of the farms. This paper describes the dynamics of BVDV antibody test values (measured as percentage positivity (PP)) obtained from 15,500 bovines between 2007 and 2010 from thirty-nine cattle herds located in Scotland and Northern England. It explores approaches to classifying herds based on the trend, magnitude and shape of their antibody PP trajectories and investigates the epidemiological similarities between farms within the same cluster. Gaussian mixture models were used for the magnitude and shape clustering. Epidemiologically meaningful clusters were obtained. Farm cluster membership depends on the clustering approach used. Moderate concordance was found between the shape and magnitude clusters. These methods hold potential for application to enhance control efforts for BVD and other infectious livestock diseases.