2,460 research outputs found

Parallel classification and feature selection in microarray data using SPRINT

Author: Akl SG
Breiman L
Ihaka R
Kotsiantis SB
Liaw A
Shafer JC
Smith CL
Topiẃc G
Publication venue: 'Wiley'
Publication date: 13/09/2012

The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time-consuming or even impossible with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface (SPRINT) is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier and feature selection through identification of differentially expressed genes using the rank product method.
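
As a rough illustration of the rank product idea named in this abstract, the sketch below ranks genes within each replicate and takes the geometric mean of each gene's ranks. It is serial Python, not SPRINT's parallel R implementation; the function name `rank_product`, the input layout (one list of fold changes per replicate), and the absence of tie handling are all assumptions made for the example.

```python
import math

def rank_product(fold_changes):
    """Geometric mean of per-replicate ranks for each gene.

    fold_changes: list of replicates, each a list with one fold change
    per gene (hypothetical layout). Rank 1 is the largest fold change.
    Ties are not handled specially in this sketch.
    """
    n_reps = len(fold_changes)
    n_genes = len(fold_changes[0])
    log_rank_sum = [0.0] * n_genes
    for replicate in fold_changes:
        # Gene indices ordered from most to least up-regulated.
        order = sorted(range(n_genes), key=lambda g: replicate[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_rank_sum[gene] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return [math.exp(s / n_reps) for s in log_rank_sum]

scores = rank_product([[3.0, 1.0, 2.0], [2.5, 0.5, 3.0]])
```

Genes with small scores rank consistently near the top across replicates. Assessing significance, typically done by permutation, is omitted here.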

Durham Research Online

Crossref

Online Research @ Cardiff

PubMed Central

Edinburgh Research Explorer

Conditional Sampling for Max-Stable Processes with a Mixed Moving Maxima Representation

Author: BM Brown
C Dombry
C Lantuéjoul
D Cooley
DJ Daley
KS Weintraub
LCG Rogers
M Oesting
M Schlather
Marco Oesting
Martin Schlather
R Ihaka
RA Davis
S Engelke
SA Stoev
T Gneiting
Y Wang
Z Kabluchko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

This paper deals with the question of conditional sampling and prediction for the class of stationary max-stable processes which allow for a mixed moving maxima representation. We develop an exact procedure for conditional sampling using the Poisson point process structure of such processes. For explicit calculations we restrict ourselves to the one-dimensional case and use a finite number of shape functions satisfying some regularity conditions. For more general shape functions approximation techniques are presented. Our algorithm is applied to the Smith process and the Brown-Resnick process. Finally, we compare our computational results to other approaches. Here, the algorithm for Gaussian processes with transformed marginals turns out to be surprisingly competitive. Comment: 35 pages; version accepted for publication in Extremes. The final publication is available at http://link.springer.co

arXiv.org e-Print Archive

Crossref

MAnnheim DOCument Server (Univ. Mannheim)

A general class of zero-or-one inflated beta regression models

Author: Akaike
Atkinson
Cook
Cox
Dunn
Espinheira
Fahrmeir
Ferrari
Hoff
Ihaka
Johnson
Kieschnick
Korhonen
McCullagh
McFadden
Moolgavkar
Ospina
Pace
Paolino
Press
Ramalho
Ramsey
Rao
Raydonal Ospina
Rigby
Schwarz
Silvia L.P. Ferrari
Simas
Smithson
Stasinopoulos
Venables
Wei
Yoo
Publication venue: 'Elsevier BV'
Publication date: 02/11/2011

This paper proposes a general class of regression models for continuous proportions when the data contain zeros or ones. The proposed class of models assumes that the response variable has a mixed continuous-discrete distribution with probability mass at zero or one. The beta distribution is used to describe the continuous component of the model, since its density has a wide range of different shapes depending on the values of the two parameters that index the distribution. We use a suitable parameterization of the beta law in terms of its mean and a precision parameter. The parameters of the mixture distribution are modeled as functions of regression parameters. We provide inference, diagnostic, and model selection tools for this class of models. A practical application that employs real data is presented. Comment: 21 pages, 3 figures, 5 tables. Computational Statistics and Data Analysis, 17 October 2011, ISSN 0167-9473 (http://www.sciencedirect.com/science/article/pii/S0167947311003628)
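
The mixed continuous-discrete density described in this abstract can be sketched in a few lines of Python. The abstract does not spell out the parameterization, so the sketch assumes the common mean-precision form in which a beta law with mean `mu` and precision `phi` has shape parameters `mu * phi` and `(1 - mu) * phi`. The names `beta_shapes`, `beta_pdf`, and `inflated_beta_density`, and the symbol `alpha` for the point mass, are hypothetical. The regression structure on the parameters is not shown.

```python
import math

def beta_shapes(mu, phi):
    """Map mean mu in (0, 1) and precision phi > 0 to beta shape parameters.

    Assumed parameterization: a = mu * phi, b = (1 - mu) * phi.
    """
    return mu * phi, (1.0 - mu) * phi

def beta_pdf(y, mu, phi):
    """Beta density at y in (0, 1) under the mean-precision parameterization."""
    a, b = beta_shapes(mu, phi)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def inflated_beta_density(y, alpha, mu, phi, c=0):
    """Point mass alpha at c (0 or 1); (1 - alpha) times a beta density on (0, 1)."""
    if y == c:
        return alpha
    if 0.0 < y < 1.0:
        return (1.0 - alpha) * beta_pdf(y, mu, phi)
    return 0.0

# With mu = 0.5 and phi = 2 the beta component is uniform on (0, 1).
value = inflated_beta_density(0.3, alpha=0.2, mu=0.5, phi=2.0)
```

Larger values of `phi` concentrate the continuous component around `mu`, which is why it is read as a precision parameter.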

arXiv.org e-Print Archive

CiteSeerX

Crossref

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Repositório da Produção USP (Univ. de São Paulo)

The Robustness of Pathway Analysis in Identifying Potential Drug Targets in Non-Small Cell Lung Carcinoma

Author: Aguilar-Ruiz
Croft
Garnier
González-Barón
Ihaka
Knudsen
Perez-Moreno
Polakis
Smyth
Stekel
Wang
Weinberg
Yu
Publication venue: 'MDPI AG'
Publication date: 01/01/2014

The identification of genes responsible for causing cancers from gene expression data has had varied success. Often the genes identified depend on the methods used for detecting expression patterns, or on the ways that the data were normalized and filtered. The use of gene set enrichment analysis is one way to introduce biological information in order to improve the detection of differentially expressed genes and pathways. In this paper we show that the use of network models, while still subject to the problems of normalization, is a more robust method for detecting pathways that are differentially overrepresented in lung cancer data. Such differences may provide opportunities for novel therapeutics. In addition, we present evidence that non-small cell lung carcinoma is not a series of homogeneous diseases; rather, there is heterogeneity within the genotype which defies phenotype classification. This diversity helps to explain the lack of progress in developing therapies against non-small cell carcinoma and suggests that drug development may consider multiple pathways as treatment targets.

Multidisciplinary Digital Publishing Institute

Crossref

Directory of Open Access Journals

PubMed Central

WestminsterResearch

Comparison of the CPU and memory performance of StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA)

Author: Allwein
Bay
Breiman
Brun
Dasarathy
Duda
Fisher
Freund
Friedman
G. Palombo
Hastie
Haykin
Ihaka
Lam
McLachlan
Meyer
Narsky
Quinlan
Vapnik
Webb
Publication venue: 'Elsevier BV'
Publication date: 28/03/2011

High Energy Physics data sets are often characterized by a huge number of events. Therefore, it is extremely important to use statistical packages able to efficiently analyze these unprecedented amounts of data. We compare the performance of the statistical packages StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA). We focus on how CPU time and memory usage of the learning process scale versus data set size. As classifiers, we consider Random Forests, Boosted Decision Trees and Neural Networks. For our tests, we employ a data set widely used in the machine learning community, the "Threenorm" data set, as well as data tailored for testing various edge cases. For each data set, we steadily increase its size and check the CPU time and memory needed to build the classifiers implemented in SPR and TMVA. We show that SPR is often significantly faster and consumes significantly less memory. For example, the SPR implementation of Random Forest is an order of magnitude faster and consumes an order of magnitude less memory than TMVA on Threenorm data.

arXiv.org e-Print Archive

Crossref

Caltech Authors

Extreme Value Statistics of the Total Energy in an Intermediate Complexity Model of the Mid-latitude Atmospheric Jet. Part I: Stationary case

Author: Allen
Antonio Speranza
Castillo
Cohen
Coles
Eckmann
Embrechts
Felici
Fisher
Galambos
Gallavotti
Gnedenko
Houghton
Ihaka
Jenkinson
Karl
Katz
Kharin
Klein Tank
Kunkel
Lavagnini
Leadbetter
Lionello
Lorenz
Lucarini
Malguzzi
Mara Felici
Morrison
Nordhaus
Pedlosky
Perrin
Phillips
Renato Vitolo
Rootzén
Speranza
Valerio Lucarini
Vannitsem
Watson
Zhang
Zwiers
Publication venue: 'American Meteorological Society'
Publication date: 11/01/2006

A baroclinic model for the atmospheric jet at middle-latitudes is used as a stochastic generator of time series of the total energy of the system. Statistical inference of extreme values is applied to yearly maxima sequences of the time series, in the rigorous setting provided by extreme value theory. In particular, the Generalized Extreme Value (GEV) family of distributions is used here. Several physically realistic values of the parameter T_E, descriptive of the forced equator-to-pole temperature gradient and responsible for setting the average baroclinicity in the atmospheric model, are examined. The location and scale GEV parameters are found to have a piecewise smooth, monotonically increasing dependence on T_E. This is in agreement with the similar dependence on T_E observed in the same system when other dynamically and physically relevant observables are considered. The GEV shape parameter also increases with T_E but is always negative, as a priori required by the boundedness of the total energy of the system. The sensitivity of the statistical inference process is studied with respect to the selection procedure of the maxima: the roles of both the length of maxima sequences and of the length of data blocks over which the maxima are computed are critically analyzed. Issues related to model sensitivity are also explored by varying the resolution of the system.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Camerino

CERN Document Server

Improved testing inference in mixed linear models

Author: Barndorff-Nielsen
Bartlett
Brazzale
Brown
Cox
Crepeau
Cribari-Neto
Cysneiros
DiCiccio
Doornik
Ferrari
Francisco Cribari-Neto
Ihaka
Lawley
Littel
Pinheiro
Sartori
Severini
Silvia L.P. Ferrari
Tatiane F.N. Melo
Verbeke
Zucker
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

Mixed linear models are commonly used in repeated measures studies. They account for the dependence amongst observations obtained from the same experimental unit. Oftentimes, the number of observations is small, and it is thus important to use inference strategies that incorporate small sample corrections. In this paper, we develop modified versions of the likelihood ratio test for fixed effects inference in mixed linear models. In particular, we derive a Bartlett correction to such a test and also to a test obtained from a modified profile likelihood function. Our results generalize those in Zucker et al. (Journal of the Royal Statistical Society B, 2000, 62, 827-838) by allowing the parameter of interest to be vector-valued. Additionally, our Bartlett corrections allow for random effects nonlinear covariance matrix structure. We report numerical evidence which shows that the proposed tests display superior finite sample behavior relative to the standard likelihood ratio test. An application is also presented and discussed. Comment: 17 pages, 1 figure

arXiv.org e-Print Archive

Crossref

Repositório da Produção USP (Univ. de São Paulo)

Introduced birds in urban remnant vegetation : does remnant size really matter?

Author: Anderson D. R.
Braysher M.
Burnham K. P.
Catterall C. P.
Chace J. F.
Clarke K. R.
Fox M. D.
Green R. J.
Hart Q.
Ihaka R.
Johnston M. J.
Long J. L.
Martin W. K.
Morgan G.
Pell A. S.
Van Vuren D.
Wood K. A.
Publication venue: 'Wiley'
Publication date: 01/01/2006

Introduced birds are a pervasive and dominant element of urban ecosystems. We examined the richness and relative abundance of introduced bird species in small (1–5 ha), medium (6–15 ha) and large (>15 ha) remnants of native vegetation within an urban matrix. Transects were surveyed during breeding and non-breeding seasons. There was a significant relationship between introduced species richness and remnant size, with larger remnants supporting more introduced species. There was no significant difference in relative abundance of introduced species in remnants of different sizes. Introduced species, as a proportion of the relative abundance of the total avifauna (native and introduced species), did not vary significantly between remnants of differing sizes. There were significant differences in the composition of introduced bird species between the different remnant sizes, with large remnants supporting significantly different assemblages than medium and small remnants. Other variables also have substantial effects on the abundance of introduced bird species. The lack of significant differences in abundance between remnant sizes suggests they were all equally susceptible to invasion. No patches in the urban matrix are likely to be unaffected by introduced species. The effective long-term control of introduced bird species is difficult, and resources may be better spent managing habitat in a way which renders it less suitable for introduced species (e.g. reducing areas of disturbed ground and weed-dominated areas).

DRO Deakin Research Online

Crossref

Federation ResearchOnline

Superclusters of galaxies from the 2dF redshift survey. II. Comparison with simulations

Author: A. Knebe
Abell
Bahcall
Balogh
Basilakos
Bond
Colless
Croton
D. Tucker
Doroshkevich
E. Saar
E. Tago
Einasto
Erdogdu
Fleenor
G. Hütsi
Gao
Goto
Gott
Gregory
I. Suhhonenko
Ihaka
J. Einasto
J. Jaaniste
Jaaniste
Jõeveer
Kalinkov
Kasun
Klypin
Kolokotronis
L. J. Liivamägi
Lahav
M. Einasto
M. Jõeveer
Nichol
Oort
P. Heinämäki
Porter
Praton
Proust
Ragone
Sahni
Springel
Tago
V. Müller
Zeldovich
Zucca
Publication venue: 'EDP Sciences'
Publication date: 26/04/2006

We investigate properties of superclusters of galaxies found on the basis of the 2dF Galaxy Redshift Survey, and compare them with properties of superclusters from the Millennium Simulation. We study the dependence of various characteristics of superclusters on their distance from the observer, on their total luminosity, and on their multiplicity. The multiplicity is defined by the number of Density Field (DF) clusters in superclusters. Using the multiplicity we divide superclusters into four richness classes: poor, medium, rich and extremely rich. We show that superclusters are asymmetrical and have multi-branching filamentary structure, with the degree of asymmetry and filamentarity being higher for the more luminous and richer superclusters. The comparison of real superclusters with Millennium superclusters shows that most properties of simulated superclusters agree very well with real data, the main differences being in the luminosity and multiplicity distributions. Comment: 15 pages, 13 figures, submitted to Astronomy and Astrophysics

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

The Dual Origin of the Terrestrial Atmosphere

Author: Abe
Anders
Ballentine
Balsiger
Bar-Nun
Benz
Bockelée-Morvan
Burnard
Busemann
Caffee
Chen
Chyba
Dauphas
Delsemme
Eberhardt
Holland
Honda
Hunten
Ihaka
Iro
Jagoutz
Kerridge
Kimura
Kleine
Kunz
Laufer
Marty
Maréchal
Mathew
Matsui
Matsumoto
Mazor
Meier
Meisel
Morbidelli
Moreira
Nicolas Dauphas
Notesco
Owen
Ozima
Pepin
Podosek
Porcelli
Press
Sagan
Sarda
Sasaki
Schoenberg
Shoemaker
Swindle
Tolstikhin
Trieloff
Walker
Wieler
Yamamoto
Yin
Yokochi
Zahnle
Zhang
Publication venue: 'Elsevier BV'
Publication date: 28/06/2003

The origin of the terrestrial atmosphere is one of the most puzzling enigmas in the planetary sciences. It is suggested here that two sources contributed to its formation, fractionated nebular gases and accreted cometary volatiles. During terrestrial growth, a transient gas envelope was fractionated from nebular composition. This transient atmosphere was mixed with cometary material. The fractionation stage resulted in a high Xe/Kr ratio, with xenon being more isotopically fractionated than krypton. Comets delivered volatiles having low Xe/Kr ratios and solar isotopic compositions. The resulting atmosphere had a near-solar Xe/Kr ratio, almost unfractionated krypton delivered by comets, and fractionated xenon inherited from the fractionation episode. The dual origin therefore provides an elegant solution to the long-standing "missing xenon" paradox. It is demonstrated that such a model could explain the isotopic and elemental abundances of Ne, Ar, Kr, and Xe in the terrestrial atmosphere. Comment: Icarus, in press, 31 pages, 6 tables, and 6 figures

arXiv.org e-Print Archive

Crossref