Sequential Symbolic Regression with Genetic Programming
This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. The SSR method is inspired by the sequential covering strategy from machine learning but, instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after generating a suboptimal function f via symbolic regression, the output errors can be approximated by another function in a subsequent iteration. The method was tested on eight polynomial functions and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). Results showed that SSR significantly outperforms SGP and presents no statistical difference from GP. More importantly, they show the potential of the proposed strategy: an effective way of applying geometric semantic operators to combine different (partial) solutions, avoiding the exponential growth problem arising from the use of these operators.
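A minimal sketch of the residual-chaining idea behind SSR, with the GP engine replaced by a hypothetical linear least-squares stand-in (`fit_gp`) so the example runs; the authors' method combines partial solutions through geometric semantic operators, which is simplified here to additive residual fitting.

```python
import numpy as np

def fit_gp(X, y):
    """Hypothetical symbolic-regression call; a linear least-squares
    stand-in is used here so the sketch is runnable."""
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ w

def ssr(X, y, n_iters=3):
    """Sequentially fit each new model to the previous model's output
    errors and sum the partial solutions."""
    models, target = [], y.astype(float)
    for _ in range(n_iters):
        f = fit_gp(X, target)
        models.append(f)
        target = target - f(X)        # next, simpler problem: the residual
    return lambda Xq: sum(f(Xq) for f in models)

X = np.random.rand(100, 2)
y = X[:, 0] ** 2 + X[:, 1]
model = ssr(X, y)
print(np.mean((model(X) - y) ** 2))   # remaining squared error
```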
Randomized Reference Classifier with Gaussian Distribution and Soft Confusion Matrix Applied to the Improving Weak Classifiers
In this paper, the issue of building the RRC model using probability distributions other than the beta distribution is addressed. More precisely, we propose to build the RRC model using the truncated normal distribution. Heuristic procedures for the expected value and the variance of the truncated normal distribution are also proposed. The proposed approach is tested using an SCM-based model to assess the consequences of applying the truncated normal distribution in the RRC model. The experimental evaluation is performed using four different base classifiers and seven quality measures. The results showed that the proposed approach is comparable to the RRC model built using the beta distribution. What is more, for some base classifiers, the truncated-normal-based SCM algorithm turned out to be better at discovering objects coming from minority classes.
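As an illustration of the distribution swap the paper describes, the sketch below computes the expected value and variance of a truncated normal distribution with scipy; the [0, 1] support and the location/scale values are assumptions for the example, not the paper's heuristic procedures.

```python
from scipy.stats import truncnorm

loc, scale = 0.6, 0.2    # hypothetical support-level mean and spread
lo, hi = 0.0, 1.0        # class supports assumed to live in [0, 1]

# scipy parametrises the truncation bounds in standardised units
a, b = (lo - loc) / scale, (hi - loc) / scale

mean, var = truncnorm.stats(a, b, loc=loc, scale=scale, moments='mv')
print(mean, var)
```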
Classification of time series by shapelet transformation
Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A \emph{shapelet} is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best shapelets, which are used to produce a transformed dataset, where each feature represents the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 that we provide ourselves.
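A minimal sketch of the transform step, assuming the shapelets have already been discovered: each feature is the minimum Euclidean distance between a series and one shapelet, so the resulting matrix can feed any standard classifier. (The paper's search, quality measures and clustering are omitted, as is the z-normalisation usually applied to subsequences.)

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Best-match (phase-independent) distance of a shapelet to a series."""
    m = len(shapelet)
    return min(np.linalg.norm(series[i:i + m] - shapelet)
               for i in range(len(series) - m + 1))

def shapelet_transform(X, shapelets):
    """Map each series to a vector of shapelet distances, usable as
    input features for any classifier."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets]
                     for s in X])

X = np.random.rand(10, 50)                 # 10 series of length 50
shapelets = [X[0, 5:15], X[1, 20:30]]      # stand-ins for discovered shapelets
features = shapelet_transform(X, shapelets)   # shape (10, 2)
```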
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.
Combination of linear classifiers using score function -- analysis of possible combination strategies
In this work, we addressed the issue of combining linear classifiers using their score functions, whose values depend on the distance from the decision boundary. Two score functions were tested and four different combination strategies were investigated. During the experimental study, the proposed approach was applied to a heterogeneous ensemble and compared to two reference methods: majority voting and model averaging. The comparison was made in terms of seven different quality criteria. The results show that the strategies based on the simple average and the trimmed average are the best of the geometric combination strategies.
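A minimal sketch of the idea, under the assumption that each score is the signed distance of an instance to a classifier's decision boundary; the simple and trimmed averages named above are shown, while the other strategies studied in the paper are omitted.

```python
import numpy as np
from scipy.stats import trim_mean

def decision_scores(classifiers, x):
    """Signed distance of x to each hyperplane w.x + b = 0."""
    return np.array([(w @ x + b) / np.linalg.norm(w)
                     for w, b in classifiers])

def combine(scores, strategy="mean"):
    """Turn the ensemble's scores into a single class decision."""
    if strategy == "mean":
        return np.sign(scores.mean())
    if strategy == "trimmed":                    # drop extreme scores
        return np.sign(trim_mean(scores, 0.2))
    raise ValueError(strategy)

# toy heterogeneous ensemble: (weights, bias) per linear classifier
ensemble = [(np.array([1.0, -0.5]), 0.1),
            (np.array([0.8, 0.3]), -0.2),
            (np.array([-0.1, 1.2]), 0.0)]
x = np.array([0.4, 0.7])
print(combine(decision_scores(ensemble, x), "trimmed"))
```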
Iso-osmotic regulation of nitrate accumulation in lettuce (Lactuca sativa L.)
Concerns about possible health hazards arising from human consumption of lettuce and other edible vegetable crops with high concentrations of nitrate have generated demands for a greater understanding of the processes involved in its uptake and accumulation, in order to devise more sustainable strategies for its control. This paper evaluates a proposed iso-osmotic mechanism for the regulation of nitrate accumulation in lettuce (Lactuca sativa L.) heads. This mechanism assumes that changes in the concentrations of nitrate and all other endogenous osmotica (including anions, cations and neutral solutes) are continually adjusted in tandem to minimise differences in the osmotic potential of the shoot sap during growth, with these changes occurring independently of any variations in external water potential. The hypothesis was tested using data from six new experiments, each with a single unique treatment comprising a separate combination of light intensity, N source (nitrate with or without ammonium) and nitrate concentration, carried out hydroponically in a glasshouse using a butterhead lettuce variety. Repeat measurements of plant weights and estimates of all of the main soluble constituents (nitrate, potassium, calcium, magnesium, organic anions, chloride, phosphate, sulphate and soluble carbohydrates) in the shoot sap were made at intervals from about 2 weeks after transplanting until commercial maturity, and the data were used to calculate changes in average osmotic potential in the shoot. Results showed that nitrate concentrations in the sap increased when average light levels were reduced by between 30 and 49% and (to a lesser extent) when nitrate was supplied at a supra-optimal concentration, and declined with partial replacement of nitrate by ammonium in the external nutrient supply. The associated changes in the proportions of other endogenous osmotica, in combination with the adjustment of shoot water content, maintained the total solute concentrations in shoot sap approximately constant and minimised differences in osmotic potential between treatments at each sampling date. There was, however, a gradual increase in osmotic potential (i.e. a decline in total solute concentration) over time, largely caused by increases in shoot water content associated with the physiological and morphological development of the plants. Regression analysis using normalised data (to correct for these time trends) showed that the results were consistent with a 1:1 exchange between the concentrations of nitrate and the sum of all other endogenous osmotica throughout growth, providing evidence that an iso-osmotic mechanism (incorporating both concentration and volume regulation) was involved in controlling nitrate concentrations in the shoot.
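An illustrative check of the reported 1:1 exchange could look as follows, regressing normalised nitrate concentration against the normalised sum of all other osmotica and inspecting whether the slope is close to -1; the data below are synthetic, not the paper's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
other_osmotica = rng.uniform(0.6, 0.9, 40)            # normalised totals
nitrate = 1.0 - other_osmotica + rng.normal(0, 0.02, 40)

# least-squares fit: a slope near -1 is consistent with a 1:1 exchange
slope, intercept = np.polyfit(other_osmotica, nitrate, 1)
print(f"slope = {slope:.2f}")
```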
Improving the k-Nearest Neighbour Rule by an Evolutionary Voting Approach
This work presents an evolutionary approach to modifying the voting system of the k-Nearest Neighbours (kNN) classifier. The main novelty of this article lies in the optimization of the voting process regardless of the distance of each neighbour. The real-valued vector calculated through the evolutionary process can be seen as the relative contribution of each neighbour to selecting the label of an unclassified example. We have tested our approach on 30 datasets from the UCI repository, and the results have been compared with those obtained from 6 other variants of the kNN predictor, showing a statistically supported improvement.
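A minimal sketch of distance-independent, rank-weighted kNN voting, with a fixed example weight vector standing in for the evolved one (the evolutionary optimisation loop itself is omitted).

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, weights):
    """Each of the k nearest neighbours contributes its per-rank weight,
    regardless of its distance, to its class label's vote."""
    k = len(weights)
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    votes = defaultdict(float)
    for rank, idx in enumerate(order):
        votes[y_train[idx]] += weights[rank]
    return max(votes, key=votes.get)

X_train = np.random.rand(50, 4)
y_train = np.random.randint(0, 3, 50)
weights = np.array([0.9, 0.7, 0.4, 0.3, 0.1])   # stand-in for evolved vector
print(weighted_knn_predict(X_train, y_train, np.random.rand(4), weights))
```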
Learning Interpretable Rules for Multi-label Classification
Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights into the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area.
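A minimal sketch of how a rule-based multi-label predictor could be applied, with hand-written rules standing in for a learned theory: an instance's label set is the union of the head labels of all rules that cover it. (This illustrates the general idea, not the authors' algorithm.)

```python
def predict_labels(x, rules):
    """rules: list of (condition, labels) pairs, where condition is a
    predicate over the instance; covering rules contribute their labels."""
    labels = set()
    for condition, rule_labels in rules:
        if condition(x):
            labels |= set(rule_labels)
    return labels

# hypothetical, hand-written rules for illustration
rules = [
    (lambda x: x["temp"] > 25 and x["humid"] < 0.4, {"sunny", "dry"}),
    (lambda x: x["humid"] >= 0.8,                   {"rain"}),
]
print(predict_labels({"temp": 30, "humid": 0.3}, rules))   # {'sunny', 'dry'}
```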
A visual analytics framework for spatio-temporal analysis and modelling
To support the analysis and modelling of large amounts of spatio-temporal data in the form of spatially referenced time series (TS) of numeric values, we combine interactive visual techniques with computational methods from machine learning and statistics. Clustering methods and interactive techniques are used to group TS by similarity. Statistical methods for TS modelling are then applied to representative TS derived from the groups of similar TS. The framework includes interactive visual interfaces to a library of modelling methods, supporting the selection of a suitable method, adjustment of model parameters, and evaluation of the models obtained. The models can be externally stored, communicated, and used for prediction and in further computational analyses. From the visual analytics perspective, the framework suggests a way to externalize spatio-temporal patterns emerging in the mind of the analyst as a result of interactive visual analysis: the patterns are represented in the form of computer-processable and reusable models. From the statistical analysis perspective, the framework demonstrates how TS analysis and modelling can be supported by interactive visual interfaces, particularly in the case of numerous TS that are hard to analyse individually. From the application perspective, the framework suggests a way to analyse large numbers of spatial TS with the use of well-established statistical methods for TS analysis.
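A minimal sketch of the pipeline's computational core, grouping series with k-means and fitting a model to each group's representative; the linear trend fit is a stand-in for the framework's library of TS modelling methods.

```python
import numpy as np
from sklearn.cluster import KMeans

series = np.random.rand(100, 24)          # 100 spatially referenced TS
clusters = KMeans(n_clusters=4, n_init=10).fit(series)

t = np.arange(series.shape[1])
models = {}
for c in range(4):
    representative = clusters.cluster_centers_[c]   # the group's representative TS
    models[c] = np.polyfit(t, representative, 1)    # (slope, intercept) per group

print(models[0])   # this model is reused for every series in cluster 0
```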
Impact of UV radiation on the physical properties of polypropylene floating row covers
In intensive horticulture, various forms of protected cultivation are used for growing seedlings and cultivating vegetables in all seasons. The simplest and cheapest form of protection is agrotextile, which can be laid directly over vegetable crops (row cover). Agrotextiles are nonwovens manufactured from textile fibres, usually of chemical origin. Textiles used as agrotextiles require suitable tensile strength and good permeability characteristics, with no significant deterioration under the influence of weather changes and UV radiation. The properties of agrotextiles depend on the fibres they are made of and on the type and conditions of production. The purpose of this study was to analyse the influence of simulated sunlight radiation (xenon lamp) on the physical properties of polypropylene (PP) nonwoven material used for the production of agrotextiles. The research showed that the properties of row covers change when irradiated with UV light. Tensile, tearing and bursting properties worsen after irradiation, while air permeability and water vapour permeability increase slightly. These changes are a consequence of changes in the fibres' molecular and supermolecular structure, which manifest in altered fibre and, consequently, nonwoven properties.
Key words: agrotextile, polypropylene, nonwovens, UV radiation, properties
