447 research outputs found
SMART: Unique splitting-while-merging framework for gene clustering
Copyright © 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms against which it was compared.
Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) being extendible to many different applications, and (3) offering superior performance compared with counterpart algorithms.
National Institute for Health Researc
Genome-wide association study of primary open-angle glaucoma in continental and admixed African populations.
Primary open-angle glaucoma (POAG) is a complex disease with a major genetic contribution. Its prevalence varies greatly among ethnic groups, and it is up to five times more frequent in black African populations than in Europeans. So far, worldwide efforts to elucidate the genetic complexity of POAG in African populations have been limited. We conducted a genome-wide association study in 1113 POAG cases and 1826 controls from Tanzanian, South African and African American study samples. Apart from confirming evidence of association at TXNRD2 (rs16984299; OR[T] 1.20; P = 0.003), we found that a genetic risk score combining the effects of the 15 previously reported POAG loci was significantly associated with POAG in our samples (OR 1.56; 95% CI 1.26-1.93; P = 4.79 × 10⁻⁵). By genome-wide association testing we identified a novel candidate locus, rs141186647, harboring EXOC4 (OR[A] 0.48; P = 3.75 × 10⁻⁸), a gene transcribing a component of the exocyst complex involved in vesicle transport. The low frequency and high degree of genetic heterogeneity at this region hampered validation of this finding in predominantly West-African replication sets. Our results suggest that established genetic risk factors play a role in African POAG; however, they do not explain the higher disease load. The high heterogeneity within Africans remains a challenge to identifying the genetic commonalities for POAG in this ethnicity, and demands studies of extremely large size.
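As a rough consistency check on the genetic risk score result quoted above (OR 1.56; 95% CI 1.26-1.93), the sketch below recovers an approximate standard error, z statistic and two-sided p-value from the odds ratio and its interval. It assumes the interval is a Wald interval, symmetric on the log-odds scale, which the abstract does not state; the function name is hypothetical, and because the published figures are rounded the recovered p-value can only be expected to match the reported one in order of magnitude.

```python
import math

def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided p from an odds ratio and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE (an assumption, not
    something the abstract confirms).
    """
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

se, z, p = z_and_p_from_or_ci(1.56, 1.26, 1.93)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The recovered p-value lands in the same 10⁻⁵ range as the reported one; the residual difference is what rounding of the OR and interval bounds would be expected to produce.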
Sharp bounds and normalization of Wiener-type indices
10.1371/journal.pone.0078448; PLoS ONE 8(11)
Roto-Translation Covariant Convolutional Networks for Medical Image Analysis
We propose a framework for rotation and translation covariant deep learning
using group convolutions. The group product of the special Euclidean
motion group SE(2) describes how a concatenation of two roto-translations
results in a net roto-translation. We encode this geometric structure into
convolutional neural networks (CNNs) via group convolutional layers,
which fit into the standard 2D CNN framework, and which make it possible to
deal generically with rotated input samples without the need for data augmentation.
We introduce three layers: a lifting layer which lifts a 2D (vector-valued)
image to an SE(2)-image, i.e., 3D (vector-valued) data whose domain is
SE(2); a group convolution layer from and to an SE(2)-image; and a
projection layer from an SE(2)-image to a 2D image. The lifting and group
convolution layers are covariant (the output roto-translates with the
input). The final projection layer, a maximum intensity projection over
rotations, makes the full CNN rotation invariant.
We show with three different problems in histopathology, retinal imaging, and
electron microscopy that with the proposed group CNNs, state-of-the-art
performance can be achieved, without the need for data augmentation by rotation
and with increased performance compared to standard CNNs that do rely on
augmentation.
Comment: 8 pages, 2 figures, 1 table, accepted at MICCAI 201
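The group product mentioned in the abstract above can be made concrete with a small sketch. The special Euclidean motion group of the plane is commonly written SE(2); an element is a translation plus a rotation angle, and composing two elements rotates the second translation by the first angle before adding it. The tuple representation (x, y, theta) and the function names are illustrative assumptions, not the authors' code.

```python
import math

def compose(g1, g2):
    """SE(2) group product g1 * g2 (apply g2 first, then g1).

    Each element is (x, y, theta): a translation and a rotation angle in radians.
    (x1, R1) * (x2, R2) = (x1 + R1 x2, R1 R2).
    """
    x1, y1, t1 = g1
    x2, y2, t2 = g2
    c, s = math.cos(t1), math.sin(t1)
    return (x1 + c * x2 - s * y2,
            y1 + s * x2 + c * y2,
            (t1 + t2) % (2 * math.pi))

def inverse(g):
    """Inverse element: (-R(-theta) x, -theta)."""
    x, y, t = g
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y),
            -(-s * x + c * y),
            (-t) % (2 * math.pi))

# Rotate by 90 degrees and shift by (1, 0), after first shifting by (1, 0).
net = compose((1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0))
print(net)  # approximately (1, 1, pi/2)
```

Because the rotation of the first element acts on the translation of the second, the product is not commutative, which is the geometric structure a group convolution layer has to respect.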
An adaptive version of k-medoids to deal with the uncertainty in clustering heterogeneous data using an intermediary fusion approach
This paper introduces Hk-medoids, a modified version of the standard k-medoids algorithm. The modification extends the algorithm to the problem of clustering complex heterogeneous objects that are described by a diversity of data types, e.g. text, images, structured data and time series. We first propose an intermediary fusion approach, SMF, to calculate fused similarities between objects, taking into account the similarities between the component elements of the objects using appropriate similarity measures. The fused approach entails uncertainty for incomplete objects or for objects which have diverging distances according to the different components. Our implementation of Hk-medoids proposed here works with the fused distances and deals with the uncertainty in the fusion process. We experimentally evaluate the potential of our proposed algorithm using five datasets with different combinations of data types that define the objects. Our results show the feasibility of our algorithm, and they also show a performance enhancement compared with the application of the original SMF approach in combination with a standard k-medoids that does not take uncertainty into account. In addition, from a theoretical point of view, our proposed algorithm has lower computational complexity than the popular PAM implementation.
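For orientation, here is a minimal sketch of the standard alternating k-medoids baseline that the abstract above compares against, run on a precomputed dissimilarity matrix such as a fused one. It is not Hk-medoids: the SMF fusion and the uncertainty handling are not reproduced, and the function name and the naive initialisation are assumptions made for the example.

```python
def k_medoids(dist, k, max_iter=100):
    """Plain alternating k-medoids on a precomputed n x n dissimilarity matrix."""
    n = len(dist)

    def assign(medoids):
        # Assignment step: each object joins its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # naive deterministic initialisation (assumption)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # Update step: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Toy example: six points on a line, dissimilarity = absolute difference.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Working directly on a dissimilarity matrix is what makes the k-medoids family suitable for heterogeneous objects, since no vector-space mean is ever needed.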
Biclustering models for two-mode ordinal data
The work in this paper introduces finite mixture models that can be used to
simultaneously cluster the rows and columns of two-mode ordinal categorical response data,
such as those resulting from Likert scale responses. We use the popular proportional
odds parameterisation and propose models which provide insights into major patterns
in the data. Model-fitting is performed using the EM algorithm and a fuzzy allocation
of rows and columns to corresponding clusters is obtained. The clustering ability of the
models is evaluated in a simulation study and demonstrated using two real data sets.
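The proportional odds parameterisation named in the abstract above can be illustrated with a short sketch that turns cutpoints and a linear predictor into ordinal category probabilities. The sign convention logit P(Y ≤ k) = mu_k − eta, the function name, and the example cutpoints are assumptions for illustration; in a biclustering model eta would collect the row-cluster and column-cluster effects.

```python
import math

def proportional_odds_probs(cutpoints, eta):
    """Category probabilities for an ordinal response with len(cutpoints) + 1 levels.

    Assumes logit P(Y <= k) = cutpoints[k] - eta with strictly increasing cutpoints.
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities P(Y <= k); the last level always has cumulative 1.
    cum = [logistic(mu - eta) for mu in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give P(Y = k).
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Five-level Likert-style response with hypothetical cutpoints.
cuts = [-2.0, -0.5, 0.5, 2.0]
low = proportional_odds_probs(cuts, eta=0.0)
high = proportional_odds_probs(cuts, eta=1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Under this convention a larger eta shifts probability mass towards the higher categories while the cutpoints stay shared across clusters, which is what keeps the model parsimonious.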
Whole-genome sequencing identifies genetic alterations in pediatric low-grade gliomas
The most common pediatric brain tumors are low-grade gliomas (LGGs). We used whole-genome sequencing to identify multiple new genetic alterations involving BRAF, RAF1, FGFR1, MYB, MYBL1 and genes with histone-related functions, including H3F3A and ATRX, in 39 LGGs and low-grade glioneuronal tumors (LGGNTs). Only a single non-silent somatic alteration was detected in 24 of 39 (62%) tumors. Intragenic duplications of the portion of FGFR1 encoding the tyrosine kinase domain (TKD) and rearrangements of MYB were recurrent and mutually exclusive in 53% of grade II diffuse LGGs. Transplantation of Trp53-null neonatal astrocytes expressing FGFR1 with the duplication involving the TKD into the brains of nude mice generated high-grade astrocytomas with short latency and 100% penetrance. FGFR1 with the duplication induced FGFR1 autophosphorylation and upregulation of the MAPK/ERK and PI3K pathways, which could be blocked by specific inhibitors. Focusing on the therapeutically challenging diffuse LGGs, our study of 151 tumors has discovered genetic alterations and potential therapeutic targets across the entire range of pediatric LGGs and LGGNTs.
Jinghui Zhang, Gang Wu, Claudia P Miller, Ruth G Tatevossian, James D Dalton, Bo Tang, Wilda Orisme, Chandanamali Punchihewa, Matthew Parker, Ibrahim Qaddoumi, Fredrick A Boop, Charles Lu, Cyriac Kandoth, Li Ding, Ryan Lee, Robert Huether, Xiang Chen, Erin Hedlund, Panduka Nagahawatte, Michael Rusch, Kristy Boggs, Jinjun Cheng, Jared Becksfort, Jing Ma, Guangchun Song, Yongjin Li, Lei Wei, Jianmin Wang, Sheila Shurtleff, John Easton, David Zhao, Robert S Fulton, Lucinda L Fulton, David J Dooling, Bhavin Vadodaria, Heather L Mulder, Chunlao Tang, Kerri Ochoa, Charles G Mullighan, Amar Gajjar, Richard Kriwacki, Denise Sheer, Richard J Gilbertson, Elaine R Mardis, Richard K Wilson, James R Downing, Suzanne J Baker and David W Elliso
Exploring the longitudinal dynamics of herd BVD antibody test results using model-based clustering
Determining the Bovine Viral Diarrhoea (BVD) infection status of cattle herds is a challenge for control and eradication schemes. Given the changing dynamics of BVD virus (BVDV) antibody responses in cattle, classifying herds based on longitudinal changes in the results of BVDV antibody tests could offer a novel, complementary approach to categorising herds. Such an approach is less likely than the present system to result in a herd’s status changing from year to year, as it is more likely to capture the true exposure dynamics of the farms. This paper describes the dynamics of BVDV antibody test values (measured as percentage positivity (PP)) obtained from 15,500 bovines between 2007 and 2010 from thirty-nine cattle herds located in Scotland and Northern England. It explores approaches to classifying herds based on the trend, magnitude and shape of their antibody PP trajectories and investigates the epidemiological similarities between farms within the same cluster. Gaussian mixture models were used for the magnitude and shape clustering. Epidemiologically meaningful clusters were obtained. Farm cluster membership depended on the clustering approach used. Moderate concordance was found between the shape and magnitude clusters. These methods hold potential for application to enhance control efforts for BVD and other infectious livestock diseases.
