
9 research outputs found

PROBLEMS IN STATISTICAL GENETICS: CLASSIFICATION AND TESTING FOR NETWORK CHANGES

Author: ADOLPHUS WAGALA
Publication date: 07/03/2018

This thesis addresses the problems of classifying microarray data and of statistically integrating molecular data to test for network changes. For the classification problem, we consider both unpreprocessed and preprocessed microarray data sets. We implement an extension of the partial least squares generalized linear regression (PLSGLR) of Bastien et al. (2005), achieved by combining it with logistic regression to obtain the partial least squares generalized linear regression-logistic regression model (PLSGLR-log), and with linear discriminant analysis to obtain the partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). These two classification methodologies are then compared with the classical methodologies, namely k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and the support vector machine (SVM). Furthermore, we implement a recent algorithm by Dalmau et al. (2015) known as the kernel multilogit algorithm (KMA). The results indicate that for the noisy unpreprocessed data, the KMA emerged as the clear “winner” based on its low misclassification error rates. For the preprocessed normalized data, there was no clear “winner”, since no single method performed markedly better than the rest. The KNN emerged as a clear “loser”, since it consistently had a relatively higher misclassification rate on both the unpreprocessed and preprocessed data sets. The statistical integration of molecular data to test for network changes considers an experiment involving two main groups, namely healthy (H) and acute rheumatic fever (ARF) subjects. For each group, each specimen is divided into two portions, one stimulated with group A streptococcus (GAS) and the other left unstimulated, giving four subgroups: Healthy GAS stimulated, Healthy unstimulated, ARF GAS stimulated, and ARF unstimulated. As a result, we have dependence within the groups and independence between the groups. For all the groups, p genes are measured for expression. We identify a prior network from the curated literature and online sources. The genes considered in the experiment are then matched with those in the prior network, so that the prior network is reduced to only the genes found in the experimental data. We then construct two networks, one for the healthy and th

Repositorio Institucional de CIMAT

Anomalies Detection Using the Benford's Law: Application to the Kenyan Presidential Elections of 2017

Author: Wagala Adolphus
Publication venue: Mathematical Theory and Modeling
Publication date: 03/10/2019

In modern times, the populace in most African countries is left wondering whether the declared election winner actually got the most votes. The validity of the declared election results in most cases remains questionable. To assess the validity of the declared results, an empirical statistical methodology can be used to give some hint or evidence of anomalies in the declared election count data. This paper therefore considers a statistical method based on the pattern of digits in vote counts, known as the 2-digit Benford's Law (2BL), that is useful for detecting fraud or other anomalies. The 2BL methodology and other extensions are applied to detect possible anomalies and fraud in the 2017 Kenyan presidential election results data. The analysis shows that the data for the top two presidential candidates, Uhuru Kenyatta and Raila Odinga, do not follow the 2BL distribution. The digits are significantly different at the 5% significance level when tested using the chi-square and the Euclidean tests. The mean absolute deviation (M.A.D) also confirms the non-conformity of the data to the 2BL distribution. Further tests, namely the second order test, the summation test and the duplication test, are utilized in order to detect any possible anomalies and fraud that could be present. All three additional tests confirm the presence of fraud and anomalies in the data. These are red flags on the credibility of the presidential election results data published by the Independent Electoral and Boundaries Commission (IEBC).
Keywords: Anomalies, Benford's Law, Kenyan Presidential Elections 201
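The chi-square comparison described in this abstract can be illustrated with a minimal sketch. It assumes that 2BL refers to the second-digit Benford distribution, which the abstract does not spell out; the function names and the toy counts are hypothetical and are not taken from the paper or from the IEBC data.

```python
import math

def second_digit_probs():
    """Expected frequency of each second digit 0..9 under Benford's law.

    P(d) = sum over first digits k = 1..9 of log10(1 + 1 / (10k + d)).
    """
    return [
        sum(math.log10(1 + 1 / (10 * first + d)) for first in range(1, 10))
        for d in range(10)
    ]

def second_digit(count):
    """Second significant digit of an integer count, or None if it has one digit."""
    digits = str(abs(int(count)))
    return int(digits[1]) if len(digits) >= 2 else None

def chi_square_2bl(counts):
    """Pearson chi-square statistic of observed second digits against 2BL."""
    observed = [0] * 10
    for c in counts:
        d = second_digit(c)
        if d is not None:          # single-digit counts carry no second digit
            observed[d] += 1
    n = sum(observed)
    if n == 0:
        raise ValueError("no counts with at least two digits")
    expected = [n * p for p in second_digit_probs()]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical vote counts, for illustration only.
toy_counts = [312, 1045, 287, 96, 530, 418, 1702, 64, 255, 809, 133, 947]
statistic = chi_square_2bl(toy_counts)
CRITICAL_5PCT_DF9 = 16.919         # chi-square critical value, 9 degrees of freedom
print(statistic, statistic > CRITICAL_5PCT_DF9)
```

A statistic above the critical value would indicate departure from the 2BL distribution at the 5% level; a sample this small is only useful for showing the mechanics.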

International Institute for Science, Technology and Education (IISTE): E-Journals

Non Linear Time Series Modelling Of the Diesel Prices in Kenya

Author: Adolphus Wagala
Destaings Nyongesa Nyongesa
Publication venue: Human Resources Management Academic Research Society (HRMARS)
Publication date: 01/11/2016

Crossref

Singular Spectrum Analysis: An Application to Kenya’s Industrial Inputs Price Index

Author: Adolphus Wagala
Dennis K. Muriithi
Kimutai K. Emmanuel
Publication venue: European Open Science Publishing
Publication date: 07/01/2022

Time series modeling and forecasting techniques serve as gauging tools to understand the time-related properties of a given time series and its future course. Most financial and economic time series data do not meet the restrictive assumptions of normality, linearity, and stationarity of the observed data, limiting the application of classical models without data transformation. As a non-parametric method, Singular Spectrum Analysis (SSA) is data-adaptive and hence does not necessarily rely on these restrictive assumptions as classical methods do. The current study employed a longitudinal research design to evaluate how SSA fits Kenya’s monthly industrial inputs price index from January 1992 to April 2022. Since 2018, reducing the costs of industrial inputs has been one of Kenya’s manufacturing agendas to level the playing field and foster Kenya’s manufacturing sector. Kenya’s Manufacturing Value Added was expected to reach 22% by 2022. The study results showed that the SSA (L = 12, r = 7) (MAPE = 0.707%) provides more reliable forecasts. The 24-period forecasts showed that the industrial inputs price index remains well above its level in 2017, before the post-industrial agenda targeting a reduction in the cost of industrial inputs. Thus, the industrial input prices should be reduced to a sustainable level.
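The decomposition step behind SSA can be sketched as follows. This is a minimal illustration of basic SSA reconstruction only (embedding, singular value decomposition, diagonal averaging), assuming NumPy is available; it does not reproduce the study's forecasting procedure, and the function name and toy series are hypothetical. The window length 12 and rank 7 mirror the L and r values quoted in the abstract.

```python
import numpy as np

def ssa_reconstruct(series, window, rank):
    """Reconstruct a series from its leading `rank` SSA components."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Embedding: each column of the trajectory matrix is a lagged window.
    trajectory = np.column_stack([x[i:i + window] for i in range(k)])
    # Decomposition: SVD of the trajectory matrix.
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    # Grouping: keep only the leading `rank` components.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Diagonal averaging: average the anti-diagonals back into a series.
    recon = np.zeros(n)
    hits = np.zeros(n)
    for i in range(window):
        for j in range(k):
            recon[i + j] += approx[i, j]
            hits[i + j] += 1
    return recon / hits

# Hypothetical monthly series: linear trend plus a 12-month cycle.
t = np.arange(120)
toy_index = 100 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12)
smoothed = ssa_reconstruct(toy_index, window=12, rank=7)
```

Keeping every component (rank equal to the window length) returns the original series, which is a convenient sanity check on the diagonal averaging.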

Crossref

Forecasting Commodity Price Index of Food and Beverages in Kenya Using Seasonal Autoregressive Integrated Moving Average (SARIMA) Models

Author: Adolphus Wagala
Dennis K. Muriithi
Teddy Mutugi Wanjuki
Publication venue: European Open Science Publishing
Publication date: 21/12/2021

Price stability is the primary monetary policy objective in any economy since it protects the interests of both consumers and producers. As a result, forecasting is a common practice and a vital aspect of monetary policymaking. Future predictions guide the monetary and fiscal policy tools that can be used to stabilize commodity prices. Developing an accurate and precise forecasting model is therefore critical. The current study fitted and forecasted the food and beverages price index (FBPI) in Kenya using seasonal autoregressive integrated moving average (SARIMA) models. Unlike other ARIMA models like the autoregressive (AR), Moving Average (MA), and non-seasonal ARMA models, the SARIMA model accounts for the seasonal component in a given time series for better forecasts. The study relied on secondary data obtained from the KNBS website on the monthly food and beverage price index in Kenya from January 1991 to February 2020. The R statistical software was used to analyze the data. The parameter estimation was done using the Maximum Likelihood Estimation method. Competing SARIMA models were compared using the Mean Absolute Error (MAE), Mean Absolute Scaled Error (MASE), and Mean Absolute Percentage Error (MAPE). A first-order differenced SARIMA (1,1,1) (0,1,1)12 minimized these model evaluation criteria (AIC = 1818.15, BIC = 1833.40). The forecast evaluation statistics were MAE = 2.00%, MAPE = 1.62%, and MASE = 0.87%. The 24-step-ahead forecasts showed that the FBPI is unstable with an overall increasing trend. Therefore, the monetary policy committee ought to control inflation through monetary or fiscal policy, strengthening food security and trade liberalization.
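The three accuracy measures named in this abstract can be written out as a minimal sketch. The definitions below are the standard textbook ones, with MASE scaled by the in-sample error of a naive forecast; whether the study used exactly this scaling is an assumption, and the function names and toy numbers are hypothetical.

```python
def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train, season=1):
    """Mean Absolute Scaled Error.

    Scales the forecast MAE by the in-sample MAE of a naive forecast that
    repeats the value observed `season` steps earlier (season=12 would give
    a seasonal naive benchmark for monthly data).
    """
    naive_errors = [abs(train[i] - train[i - season]) for i in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale

# Hypothetical index values, for illustration only.
train = [90.0, 95.0, 100.0]
actual = [100.0, 110.0, 120.0]
forecast = [102.0, 108.0, 123.0]
print(mae(actual, forecast), mape(actual, forecast), mase(actual, forecast, train))
```

A MASE below 1 means the model's forecasts have a smaller average error than the naive benchmark had on the training data.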

Crossref

Evaluating the Predictive Ability of Seasonal Autoregressive Integrated Moving Average (SARIMA) Models using Food and Beverages Price Index in Kenya

Author: Adolphus Wagala
Dennis K. Muriithi
Teddy M. Wanjuki
Publication venue: European Open Science Publishing
Publication date: 08/04/2022

Price instability has been a major concern in most economies. Kenya's commodity markets have been characterized by high price volatility, which affects investment and consumer behaviour because of uncertainty about future prices. Precise forecasting models can therefore help consumers plan their expenditure and help government policymakers formulate price control measures. Due to the seasonality of Kenya's food and beverage price indices, the current study postulates that the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is the best-fitting model for the data. The study used secondary data on Kenya's monthly food and beverage price index from January 1991 to February 2020 to examine the predictive ability of the possible SARIMA models based on the minimisation of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). A first-order differenced SARIMA (1,1,1) (0,1,1)12 minimized these model evaluation criteria (AIC = 1818.15, BIC = 1833.40). The cross-validation results for 6, 12, 18, 24, 30, and 36 step-ahead forecasts demonstrated that SARIMA models are unstable for forecasting over long horizons, with prediction errors tending to increase as the forecast period lengthens. It is anticipated that the findings of the current study will provide valuable information to policymakers and stakeholders to understand future trends in commodity price

Crossref

PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification Problem

Author: Adolphus Wagala
Graciela González-Farías
Oscar Dalmau
Rogelio Ramos
Publication venue: Universidad Nacional de Colombia
Publication date: 01/07/2020

Crossref

Regresión lineal generalizada por MCP y algoritmo kernel multilogit para la clasificación de datos de microarreglos

Author: Dalmau Oscar
González-Farías Graciela
Ramos Rogelio
Wagala Adolphus
Publication venue: Universidad Nacional de Colombia - Sede Bogotá - Facultad de Ciencias - Departamento de Estadística
Publication date: 01/07/2020

This study involves the implementation of extensions of the partial least squares generalized linear regression (PLSGLR) by combining it with logistic regression and linear discriminant analysis, to get a partial least squares generalized linear regression-logistic regression model (PLSGLR-log) and a partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). A comparative study of the obtained classifiers with classical methodologies like k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM) is then carried out. Furthermore, a new methodology known as the kernel multilogit algorithm (KMA) is also implemented and its performance compared with those of the other classifiers. The KMA emerged as the best classifier, with the lowest classification error rates compared to the others when applied to both types of data considered: the unpreprocessed and the preprocessed.
This study combines the partial least squares generalized linear regression model (RLGMCP) with logistic regression and linear discriminant analysis to obtain the partial least squares generalized logistic regression model (RLGMCP) and the partial least squares generalized logistic-discriminant regression model (RLGDMCP). A comparative study is carried out with classical classifiers such as k-nearest neighbours (KVC), linear discriminant analysis (ADL), partial least squares discriminant analysis (ADMCP), partial least squares regression (RMCP), and support vector machines (MSV). In addition, a new methodology known as the kernel multilogit algorithm (AKM) is implemented. Its performance is compared with that of the other classifiers. According to the classification error rates obtained from the different types of data, the KMA gives the best result

Portal de Revistas UNAL (Univ. Nacional de Colombia)