
    Big Data and Causality

    Causality analysis remains one of the fundamental research questions and the ultimate objective of a tremendous number of scientific studies. With the rapid progress of science and technology, the age of big data has significantly influenced causality analysis across disciplines, especially over the last decade, because the complexity and difficulty of identifying causality in big data have increased dramatically. Data mining, the process of uncovering hidden information from big data, is now an important tool for causality analysis and has been extensively exploited by scholars around the world. The primary aim of this paper is to provide a concise review of causality analysis in big data. To this end, the paper reviews recent significant applications of data mining techniques in causality analysis, covering a substantial body of research to date, presented in chronological order with an overview table of data mining applications in the causality analysis domain as a reference directory.

    EBOC: Ensemble-Based Ordinal Classification in Transportation

    Learning the latent patterns of historical data in an efficient way to model the behaviour of a system is essential for making the right decisions. For this purpose, machine learning has already shown promising results in transportation, as well as in areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in a dataset are unordered, so they ignore the inherent order between class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach that applies bagging and boosting (the AdaBoost algorithm) to the ordinal classification problem in the transportation sector. This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions. © 2019 Pelin Yildirim et al.
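
    The abstract does not spell out how EBOC couples the ensemble with the ordinal structure, so the following is only a minimal sketch of one common way to pair boosting with ordered targets: a Frank-Hall style decomposition into threshold classifiers, each trained with scikit-learn's AdaBoost. The label encoding and parameters are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    class OrdinalBoostedClassifier:
        """Frank-Hall style decomposition: one boosted binary model per
        threshold, estimating P(y > k) for each ordered class boundary."""

        def fit(self, X, y):
            y = np.asarray(y)                    # integer labels 0 < 1 < ... < K-1, all present
            self.n_classes_ = int(y.max()) + 1
            self.models_ = []
            for k in range(self.n_classes_ - 1):
                clf = AdaBoostClassifier(n_estimators=50, random_state=0)
                clf.fit(X, (y > k).astype(int))  # binary target: "above threshold k"
                self.models_.append(clf)
            return self

        def predict(self, X):
            # Recover class probabilities from the threshold probabilities:
            # P(y=0) = 1 - P(y>0), P(y=k) = P(y>k-1) - P(y>k), P(y=K-1) = P(y>K-2).
            gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])
            probs = np.empty((len(X), self.n_classes_))
            probs[:, 0] = 1.0 - gt[:, 0]
            for k in range(1, self.n_classes_ - 1):
                probs[:, k] = gt[:, k - 1] - gt[:, k]
            probs[:, -1] = gt[:, -1]
            return probs.argmax(axis=1)

    With labels encoded in their natural order (e.g. low = 0, medium = 1, high = 2), the class behaves like any scikit-learn style estimator; a bagged variant of the same sketch is obtained by swapping AdaBoostClassifier for BaggingClassifier.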

    K-Linkage: A New Agglomerative Approach for Hierarchical Clustering

    In agglomerative hierarchical clustering, the traditional approaches to computing cluster distances are single, complete, average, and centroid linkage. However, the single-link and complete-link approaches cannot always reflect the true underlying relationship between clusters, because they consider only a single pair of points from the two clusters. This can promote the formation of spurious clusters. To overcome the problem, this paper proposes a novel approach, named k-Linkage, which calculates the distance by considering k observations from each of the two clusters. The article also introduces two novel concepts: k-min linkage (the average of the k closest pairs) and k-max linkage (the average of the k farthest pairs). In the experimental studies, the improved hierarchical clustering algorithm based on k-Linkage was run on five well-known benchmark datasets with varying k values to demonstrate its efficiency. The results show that the proposed k-Linkage method often produces clusters with better accuracy than single, complete, average, and centroid linkage.
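
    A minimal sketch of the k-min / k-max linkage idea as described in the abstract: the distance between two clusters is the average of the k closest (or k farthest) inter-cluster point pairs. The function name and defaults below are ours, not necessarily the paper's.

    import numpy as np
    from scipy.spatial.distance import cdist

    def k_linkage(cluster_a, cluster_b, k=3, mode="min"):
        """Average distance over the k closest ('min') or farthest ('max')
        pairs of points drawn one from each cluster."""
        pairwise = cdist(cluster_a, cluster_b).ravel()   # all inter-cluster distances
        k = min(k, pairwise.size)                        # guard against small clusters
        pairwise.sort()
        selected = pairwise[:k] if mode == "min" else pairwise[-k:]
        return selected.mean()

    # With k=1, "min" reduces to single linkage and "max" to complete linkage.
    a = np.array([[0.0, 0.0], [1.0, 0.0]])
    b = np.array([[3.0, 0.0], [4.0, 0.0]])
    print(k_linkage(a, b, k=2, mode="min"))   # average of the two closest pairs: (2 + 3) / 2 = 2.5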

    Naive Bayes Classifier for Continuous Variables using Novel Method (NBC4D) and Distributions

    When using the Naive Bayes classification technique in data mining, it is necessary to decide how to handle continuous attributes. Most previous work has addressed the problem with discretization, the normal (Gaussian) method, or the kernel method. This study proposes using different continuous probability distributions for Naive Bayes classification and explores the probability density functions of various distributions. The experimental results show that the proposed probability distributions also classify continuous data with potentially high accuracy. In addition, this paper introduces a novel method, named NBC4D, which offers a new approach to classification by applying different distribution types to different attributes. The obtained classification accuracy rates show that the proposed method (using more than one distribution type) succeeds on real-world datasets when compared with using only a single well-known distribution type.
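
    The following is a minimal sketch of the general idea of fitting a separate continuous distribution per attribute inside Naive Bayes; the candidate families (normal, exponential, gamma) and the log-likelihood selection rule are illustrative assumptions rather than the exact NBC4D procedure.

    import numpy as np
    from scipy import stats

    CANDIDATES = [stats.norm, stats.expon, stats.gamma]   # assumed candidate families

    def _best_fit(values):
        """Pick the candidate family with the highest training log-likelihood."""
        best, best_ll = None, -np.inf
        for dist in CANDIDATES:
            try:
                params = dist.fit(values)
                ll = dist.logpdf(values, *params).sum()
            except Exception:                  # skip families that cannot be fitted
                continue
            if ll > best_ll:
                best, best_ll = (dist, params), ll
        return best

    class PerAttributeDistributionNB:
        """Naive Bayes for continuous attributes where each (class, attribute)
        pair gets its own best-fitting distribution family."""

        def fit(self, X, y):
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            self.classes_ = np.unique(y)
            self.log_priors_ = {c: np.log(np.mean(y == c)) for c in self.classes_}
            self.dists_ = {(c, j): _best_fit(X[y == c, j])
                           for c in self.classes_ for j in range(X.shape[1])}
            return self

        def predict(self, X):
            X = np.asarray(X, dtype=float)
            scores = np.empty((X.shape[0], len(self.classes_)))
            for i, c in enumerate(self.classes_):
                score = np.full(X.shape[0], self.log_priors_[c])
                for j in range(X.shape[1]):
                    dist, params = self.dists_[(c, j)]
                    score += dist.logpdf(X[:, j], *params)   # naive independence assumption
                scores[:, i] = score
            return self.classes_[scores.argmax(axis=1)]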

    Comparative Analysis of Ensemble Learning Methods for Signal Classification

    In recent years, machine learning algorithms have come into wide use in signal classification, as in many other areas. Ensemble learning has become one of the most popular machine learning approaches because of the high classification performance it provides. This study presents the application of four fundamental ensemble learning methods (Bagging, Boosting, Stacking, and Voting), combined with five classification algorithms (Neural Network, Support Vector Machines, k-Nearest Neighbor, Naive Bayes, and C4.5) tuned to their optimal parameter values, on signal datasets. In the experimental studies, the ensemble learning methods were applied to 14 signal datasets and the results were compared in terms of classification accuracy. According to the results, the best classification performance was obtained with the Random Forest algorithm, a Bagging-based method.
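
    A minimal sketch of such a comparison set-up with scikit-learn, using a synthetic stand-in for a signal dataset; the paper's actual datasets, base learners, and tuned parameter values are not reproduced here.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                  RandomForestClassifier, StackingClassifier,
                                  VotingClassifier)
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic placeholder for a signal dataset (features x samples are arbitrary here).
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    base = [("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
            ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
            ("nb", GaussianNB())]

    ensembles = {
        "Bagging (trees)": BaggingClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(random_state=0),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "Voting": VotingClassifier(estimators=base, voting="soft"),
        "Stacking": StackingClassifier(estimators=base),
    }

    # Compare the strategies by 5-fold cross-validated accuracy.
    for name, model in ensembles.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name:16s} mean accuracy = {acc:.3f}")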

    Statistical Data Generation Using Sample Data


    Integrating Cluster Analysis to the ARIMA Model for Forecasting Geosensor Data

    Clustering geosensor data is a problem that has recently attracted a large amount of research. In this paper, we focus on clustering geophysical time series measured by a geosensor network. Clusters are built by accounting for both the spatial and the temporal information in the data, and are used to produce globally meaningful information from the time series obtained by individual sensors. The cluster information is integrated into the ARIMA model in order to yield accurate forecasting results. Experiments investigate the trade-off between the accuracy and the efficiency of the proposed algorithm.
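
    A minimal sketch of one way to combine clustering with ARIMA forecasting (scikit-learn KMeans plus statsmodels ARIMA); the synthetic sensor matrix, the cluster count, and the per-cluster-mean modelling choice are assumptions, not the paper's exact integration scheme.

    import numpy as np
    from sklearn.cluster import KMeans
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    # Hypothetical geosensor matrix: 12 sensors x 200 time steps.
    t = np.arange(200)
    series = np.array([np.sin(t / 10 + rng.uniform(0, 3)) + rng.normal(0, 0.2, t.size)
                       for _ in range(12)])

    # Step 1: group sensors by the similarity of their temporal profiles.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(series)

    # Step 2: fit one ARIMA per cluster on the cluster-average series
    # and forecast the next 10 steps.
    for c in range(3):
        cluster_mean = series[labels == c].mean(axis=0)
        model = ARIMA(cluster_mean, order=(2, 0, 1)).fit()
        forecast = model.forecast(steps=10)
        print(f"cluster {c}: first forecast step = {forecast[0]:.3f}")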