60 research outputs found

An efficient density-based clustering algorithm using reverse nearest neighbour

Authors: A Gionis, AK Jain, B Mirkin, C Cassisi, C Hennig, CJ Veenman, CT Zahn, Douglas H. Fisher, F Korn, F Limin, H Chang, J Hou, JC Dunn, L Hubert, M Halkidi, Mihael Ankerst, Ming Tan, NS Altman, O Arbelaitz, PJ Rousseeuw, RA Fisher, T Caliński, U Maulik
Publication date: 19/11/2018

Density-based clustering is the task of discovering high-density regions of entities (clusters) that are separated from each other by contiguous regions of low density. DBSCAN is, arguably, the most popular density-based clustering algorithm. However, its cluster recovery capabilities depend on the combination of its two parameters. In this paper we present a new density-based clustering algorithm which uses reverse nearest neighbour (RNN) queries and has a single parameter. We also show that it is possible to estimate a good value for this parameter using a clustering validity index. The RNN queries enable our algorithm to estimate densities taking more than a single entity into account, and to recover clusters that are not well-separated or have different densities. Our experiments on synthetic and real-world data sets show our proposed algorithm outperforms DBSCAN and its recent variant ISDBSCAN. Comment: Accepted in: Computing Conference 2019 in London, UK. http://saiconference.com/Computin

University of Essex Research Repository

arXiv.org e-Print Archive

Crossref
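
The reverse nearest neighbour idea mentioned in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm: it only counts, for each point, how many other points include it among their k nearest neighbours, which is one simple way such a count could act as a density proxy. The function names, the brute-force search, the tie-breaking by index, and the toy data are all assumptions made for illustration.

```python
from math import dist

def knn_indices(points, k):
    """For each point, return the indices of its k nearest other points (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance; break ties by index so the result is reproducible.
        others.sort(key=lambda j: (dist(p, points[j]), j))
        result.append(others[:k])
    return result

def reverse_nn_counts(points, k):
    """Count how many points list each point among their k nearest neighbours."""
    counts = [0] * len(points)
    for neighbours in knn_indices(points, k):
        for j in neighbours:
            counts[j] += 1
    return counts

# Four tightly packed points and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
counts = reverse_nn_counts(points, k=2)
print(counts)
```

In this toy data the isolated point is nobody's near neighbour, so its count is zero, while every point in the tight group is named by at least two others. A real implementation would likely use a spatial index rather than the quadratic search shown here.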

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

arXiv.org e-Print Archive

Crossref
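
To make "linear, deterministic, and order-invariant" concrete, here is a minimal sketch of a maximin-style initialization. It is chosen only for illustration and is not claimed to be one of the six methods evaluated in the chapter. The assumptions are: the first center is the data point closest to the mean, each later center is the point farthest from its nearest chosen center, and ties are broken by coordinates rather than by input position. The function name and the toy data are hypothetical.

```python
from math import dist, fsum

def maximin_init(points, k):
    """Choose k initial centers without randomness; cost is O(n * k) distance evaluations."""
    n = len(points)
    dim = len(points[0])
    # fsum is exactly rounded, so the mean does not depend on the input order.
    mean = tuple(fsum(p[d] for p in points) / n for d in range(dim))
    # First center: the data point closest to the mean (ties broken by coordinates).
    first = min(points, key=lambda p: (dist(p, mean), p))
    centers = [first]
    # nearest[i] holds the distance from points[i] to its closest chosen center.
    nearest = [dist(p, first) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        idx = max(range(n), key=lambda i: (nearest[i], points[i]))
        centers.append(points[idx])
        nearest = [min(nearest[i], dist(points[i], points[idx])) for i in range(n)]
    return centers

# Two well-separated groups of three points each (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = maximin_init(points, 2)
shuffled_centers = maximin_init(list(reversed(points)), 2)
print(centers)
```

Because every choice depends only on the set of points and not on their positions in the list, reordering the input yields the same centers, and no repeated runs are needed.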

Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters

Authors: AK Jain, B Efron, B Hausdorf, C Fraley, C Hennig, C Keribin, Catherine Sugar, Chien-Ju Lin, Christian Hennig, F Drasgow, G Milligan, H Xiong, HH Bock, L Kaufman, O Arbelaitz, R Tibshirani, T Calinski, TF Cox
Publication venue: Springer Science and Business Media LLC

Crossref

Looking for Conflict: Gaze Dynamics in a Dyadic Mixed-Motive Game

Authors: A Abele, A Vinciarelli, Ana Paiva, BG Tabachnick, C Castelfranchi, C Yu, CD Frith, D Heylen, EVd Vliert, G Doherty-Sneddon, HH Kelley, I Poggi, J Nadler, JA Hartigan, JD Boucher, Joana Campos, K Horney, K Sigmund, L Kriesberg, M Argyle, M Foddy, M Tomasello, MG Glaholt, N Bolshakova, NJ Emery, O Arbelaitz, Patrícia Alves-Oliveira, R Bakeman, TS Jones
Publication venue: Springer Science and Business Media LLC

Crossref

Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

Authors: A Bifet, A Gruzd, A Guerbas, A Martínez-Ruiz, A Noruzi, A Rettinger, A Schubert, A Zuccala, AB Barragáns-Martínez, ARH Fischer, B Mobasher, B Yang, BN Miller, C Romero, C Wang, C Woo-Young, C-L Hsu, CJ Williams, D Ai, D Minguillo, D Pierrakos, D Stuart, D Wilkinson, David Gunnarsson Lorentzen, E Angus, E Kontopoulos, E Orduña-Malea, E Otte, E Romero-Frías, F Aminpour, F Barjak, F Didegah, FM Facca, G Lappas, G Paliouras, G Qiu, G Somprasertsri, GD Kumar, H Kretschmer, H Small, H-F Li, H-W Park, I Aguillo, I-C Yeh, IF Aguillo, J Bar-Ilan, J Borges, J Canny, J Fernández, J Srivastava, J-C Ou, JA Kirby, JA Pratt, JD Velásquez, JL Ortega, JM Kleinberg, JW Palmer, K Holmberg, K Jonkers, K Poongothai, K-Y Wang, KA-I Nekaris, L Björneborn, L Vaughan, L Zoonen Van, L-W Ku, M Asadi, M Biehl, M Chau, M Cheong, M Deshpande, M Efron, M Eirinaki, M Erfanmanesh, M Shekofteh, M Thelwall, M-L Shyu, MA Bayir, MA Islam, MR Martínez-Torres, O Arbelaitz, O Etzioni, O Nasraoui, P Ingwersen, P Wang, P-H Chou, PB Lang, Q He, Q Zhang, R Ball, R Das, R Duane Ireland, R Kosala, R Malinský, RL Glass, S Alsaleh, S Brin, S Kundu, S Milgram, S-H Lin, SA Hale, SE Cho, T Becher, T Hofmann, T Holloway, T Leeuwen Van, T Takahashi, TC Almind, TJ Ruller, V Panchal, V Popova, VD Blondel, WE Nwagwu, X Polanco, Y Lai, Y Nam, Y Zhang, Yuan Shunbo, Z Huang, Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on the development of methods and algorithms. Differences in the type of data can also be seen, with webometrics more focused on analyses of the structure of the web and web mining more focused on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

Crossref

University of Borås

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swepub

Identification of anti-tumour biologics using primary tumour models, 3-D phenotypic screening and image-based multi-parametric profiling

Authors: AC Siva, Alan M. Sandercock, AR Webb, B Casar, B Neumann, Bram Herpers, C Lloyd, Carl Hay, D Siolas, DC Swinney, DS Spassov, DT Dudley, E Fennema, E Lopez-Crapez, G Kollmorgen, G Kurosawa, G Sawada, H Kawasaki, HJ Buhring, J Bierwolf, JC Bezdek, JG Moffat, JI Ikeda, Jim Freeth, JJ Park, Jo Soden, K Fukuchi, K Mark von der, K Yan, Kris F. Sachsenmeier, Kuan Yan, L Breiman, L Turner, Leo S. Price, LH Loo, Lutz Jermutus, Matt Flynn, N Veitonmaki, Nick Holoweckyj, NT Elliott, O Arbelaitz, P Loukopoulos, Qihui Huang, R Genuer, Ralph Minter, Robert Hollingsworth, RZ Lin, S Miura, S Rust, Sandrine Guillard, SE Perry, Steven Rust, T Uekita, TJ Vaughan, VC Daniel, W Yu, X Zhao, Y Feng, Y He, YS DeRose, Z Di
Publication venue: Springer Science and Business Media LLC

Crossref

Low cost parallel solutions for the VRPTW optimization problem

Authors: C. Rodriguez, I. Zamakola, O. Arbelaitz
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Intelligent Routing System for a Personalised Electronic Tourist

Authors: Arbelaitz O, Garcia Ander, Linaza M, Vansteenwegen Pieter
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2009

When tourists are at a destination, they typically search for information at Local Tourist Organizations. There, the staff categorize each tourist's profile and restrictions. Combining this information with their up-to-date knowledge about the local attractions, weather and public transportation, they suggest a personalised route for the tourist's agenda. This paper presents an intelligent routing system for a Personalised Electronic Tourist Guide to fulfil the same task. This system improves the automatic route creation functionality of existing PETs to better meet the needs of tourists in several aspects: i) it includes public transportation, ii) it takes varying travelling times into account, adapting to real circumstances such as rush hours, iii) it calculates routes in real time to react to unexpected events, iv) it applies latest-generation heuristics from Operations Research to create routes efficiently, even in destinations with a large number of points of interest and a dense public transportation network. status: publishe

Lirias

Credible Information Foraging on Social Media

Authors: J Liu, O Arbelaitz, P Pirolli, Y Drias
Publication venue: Springer Science and Business Media LLC

Crossref

Analysing Effects of Customer Clustering for Customer’s Account Balance Forecasting

Authors: F Petitjean, O Arbelaitz, PJ Rousseeuw, PK Juan, Y Hua
Publication venue: Springer Science and Business Media LLC

Crossref