9 research outputs found

    PROBLEMS IN STATISTICAL GENETICS: CLASSIFICATION AND TESTING FOR NETWORK CHANGES

    This thesis addresses the classification of microarray data and the statistical integration of molecular data to test for network changes. For the classification problem, we consider both unpreprocessed and preprocessed microarray data sets. We implement two extensions of partial least squares generalized linear regression (PLSGLR; Bastien et al., 2005): combining it with logistic regression gives the partial least squares generalized linear regression-logistic regression model (PLSGLR-log), and combining it with linear discriminant analysis gives the partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). These two classification methodologies are then compared with the classical methodologies, namely k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and the support vector machine (SVM). Furthermore, we implement a recent algorithm by Dalmau et al. (2015) known as the kernel multilogit algorithm (KMA). The results indicate that for the noisy unpreprocessed data, the KMA emerged as the clear “winner” based on its low misclassification error rates. For the preprocessed normalized data, there was no clear “winner”, since no single method performed markedly better than the rest. The KNN emerged as a clear “loser”, since it consistently had a relatively higher misclassification rate on both the unpreprocessed and preprocessed data sets. The statistical integration of molecular data to test for network changes considers an experiment involving two main groups, namely healthy (H) and acute rheumatic fever (ARF) subjects. For each group, each specimen is divided into two portions, one stimulated with group A streptococcus (GAS) and the other left unstimulated, giving four subgroups: healthy GAS-stimulated, healthy unstimulated, ARF GAS-stimulated and ARF unstimulated. As a result, we have dependence within the groups and independence between the groups. For all the groups, p genes are measured for expression. We identify a prior network from the curated literature and online sources. The genes considered in the experiment are then matched with those in the prior network, so that the prior network is reduced to only the genes found in the experimental data. We then construct two networks, one for the healthy and th

    Anomalies Detection Using the Benford's Law: Application to the Kenyan Presidential Elections of 2017

    In modern times, the populace in most African countries is left wondering whether the declared election winner actually got the most votes. The validity of the declared election results in most cases remains questionable. To assess the validity of the declared results, an empirical statistical methodology can be used to give some hint or evidence of anomalies in the declared election count data. This paper therefore considers a statistical method based on the pattern of digits in vote counts, known as the 2-digit Benford's Law (2BL), that is useful for detecting fraud or other anomalies. The 2BL methodology and other extensions are applied to detect possible anomalies and fraud in the 2017 Kenyan presidential election results data. The analysis shows that the data for the top two presidential candidates, Uhuru Kenyatta and Raila Odinga, do not follow the 2BL distribution. The digits are significantly different at the 5% significance level when tested using the chi-square and the Euclidean distance tests. The mean absolute deviation (MAD) also confirms the non-conformity of the data to the 2BL distribution. Further tests, namely the second-order test, the summation test and the duplication test, are used to detect any possible anomalies and fraud that could be present. All three additional tests confirm the presence of fraud and anomalies in the data. These are red flags on the credibility of the presidential election results data published by the Independent Electoral and Boundaries Commission (IEBC). Keywords: Anomalies, Benford's Law, Kenyan Presidential Elections 201
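
The chi-square check described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes that "2BL" refers to the second-digit Benford's Law as commonly used in election forensics, and the function names and the sample vote counts are hypothetical.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 9 degrees of freedom
# (10 possible second digits minus 1).
CHI2_CRIT_9DF_5PCT = 16.919


def second_digit_probs():
    """Benford probabilities for the second significant digit (0..9)."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def second_digit(count):
    """Second digit of a vote count, or None for single-digit counts."""
    digits = str(abs(int(count)))
    return int(digits[1]) if len(digits) >= 2 else None


def chi_square_2bl(counts):
    """Pearson chi-square statistic of observed second digits against 2BL."""
    digits = [d for d in map(second_digit, counts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one count with two or more digits")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in enumerate(second_digit_probs())
    )


# Hypothetical polling-station counts, for illustration only.
sample_counts = [312, 457, 1289, 76, 903, 2450, 118, 664, 531, 87]
statistic = chi_square_2bl(sample_counts)
rejects_2bl = statistic > CHI2_CRIT_9DF_5PCT
```

With real data the test needs far more than ten counts for the chi-square approximation to be reasonable; the tiny sample here only shows the mechanics.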

    Non Linear Time Series Modelling Of the Diesel Prices in Kenya

    Singular Spectrum Analysis: An Application to Kenya’s Industrial Inputs Price Index

    Time series modeling and forecasting techniques serve as gauging tools to understand the time-related properties of a given time series and its future course. Most financial and economic time series do not meet the restrictive assumptions of normality, linearity, and stationarity, which limits the application of classical models without data transformation. As a non-parametric method, Singular Spectrum Analysis (SSA) is data-adaptive and hence does not rely on these restrictive assumptions as classical methods do. The current study employed a longitudinal research design to evaluate how well SSA fits Kenya’s monthly industrial inputs price index from January 1992 to April 2022. Since 2018, reducing the costs of industrial inputs has been one of Kenya’s manufacturing agendas, intended to level the playing field and foster Kenya’s manufacturing sector. Kenya’s Manufacturing Value Added was expected to reach 22% by 2022. The study results showed that SSA (L = 12, r = 7) (MAPE = 0.707%) provides more reliable forecasts. The 24-period forecasts showed that the industrial inputs price index remains well above its 2017 level, that is, the level before the agenda targeting a reduction in the cost of industrial inputs. Thus, industrial input prices should be reduced to a sustainable level.

    Forecasting Commodity Price Index of Food and Beverages in Kenya Using Seasonal Autoregressive Integrated Moving Average (SARIMA) Models

    Price stability is the primary monetary policy objective in any economy since it protects the interests of both consumers and producers. As a result, forecasting is a common practice and a vital aspect of monetary policymaking. Future predictions guide the monetary and fiscal policy tools that can be used to stabilize commodity prices, so developing an accurate and precise forecasting model is critical. The current study fitted and forecasted the food and beverages price index (FBPI) in Kenya using seasonal autoregressive integrated moving average (SARIMA) models. Unlike other ARIMA-type models such as the autoregressive (AR), moving average (MA), and non-seasonal ARMA models, the SARIMA model accounts for the seasonal component in a time series and so gives better forecasts. The study relied on secondary data obtained from the KNBS website on the monthly food and beverage price index in Kenya from January 1991 to February 2020. R statistical software was used to analyze the data. Parameters were estimated by maximum likelihood. Competing SARIMA models were compared using the Mean Absolute Error (MAE), Mean Absolute Scaled Error (MASE), and Mean Absolute Percentage Error (MAPE). A first-order differenced SARIMA (1,1,1)(0,1,1)12 minimized these model evaluation criteria (AIC = 1818.15, BIC = 1833.40). The forecast evaluation statistics were MAE = 2.00%, MAPE = 1.62% and MASE = 0.87%. The 24-step-ahead forecasts showed that the FBPI is unstable with an overall increasing trend. Therefore, the monetary policy committee ought to control inflation through monetary or fiscal policy, strengthening food security and trade liberalization.
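
The three forecast accuracy measures named in this abstract can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not the study's R code; the function names and the numbers in the example are hypothetical, and MASE is scaled here by the in-sample error of a (seasonal) naive forecast.

```python
def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train, season=1):
    """Mean Absolute Scaled Error.

    The forecast MAE is divided by the in-sample MAE of a naive forecast
    that repeats the value observed `season` periods earlier
    (season=12 would be the seasonal naive benchmark for monthly data).
    """
    naive_errors = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


# Hypothetical index values, for illustration only.
train = [90, 95, 100, 105]
actual = [100, 110, 120]
forecast = [98, 113, 119]

print(mae(actual, forecast))          # 2.0
print(mase(actual, forecast, train))  # 0.4
```

A MASE below 1 means the forecasts beat the naive benchmark on average, which is why it is often reported alongside scale-dependent measures such as MAE.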

    Evaluating the Predictive Ability of Seasonal Autoregressive Integrated Moving Average (SARIMA) Models using Food and Beverages Price Index in Kenya

    Price instability has been a major concern in most economies. Kenya's commodity markets have been characterized by high price volatility, which affects investment and consumer behaviour because of uncertainty about future prices. Precise forecasting models can therefore help consumers plan their expenditure and help government policymakers formulate price control measures. Given the seasonality of Kenya's food and beverage price indices, the current study postulates that the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is the best-fitting model for the data. The study used secondary data on Kenya's monthly food and beverage price index from January 1991 to February 2020 to examine the predictive ability of candidate SARIMA models based on minimisation of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). A first-order differenced SARIMA (1,1,1)(0,1,1)12 minimized these model evaluation criteria (AIC = 1818.15, BIC = 1833.40). The cross-validation results for 6, 12, 18, 24, 30, and 36 step-ahead forecasts demonstrated that SARIMA models are unstable for forecasting over long periods, with prediction errors tending to increase as the forecast horizon lengthens. It is anticipated that the findings of the current study will give policymakers and stakeholders valuable information for understanding future trends in commodity price
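
The multi-horizon cross-validation described in this abstract can be illustrated with a rolling-origin evaluation. The sketch below is an assumption-laden stand-in: it uses a seasonal naive forecaster instead of a fitted SARIMA model so that it runs with the standard library alone, and the function names, the `min_train` parameter and the synthetic series are all hypothetical.

```python
def seasonal_naive(train, h, season=12):
    """Forecast h steps ahead by repeating the last observed value
    from the same position in the seasonal cycle."""
    k = (h - 1) // season + 1  # number of whole seasons to step back
    return train[len(train) + h - 1 - season * k]


def rolling_origin_mae(series, horizons, season=12, min_train=24):
    """Mean absolute error per forecast horizon, averaged over all
    forecast origins from min_train to the last feasible one."""
    max_h = max(horizons)
    if min_train < season or min_train + max_h > len(series):
        raise ValueError("series too short for the requested horizons")
    errors = {h: [] for h in horizons}
    for origin in range(min_train, len(series) - max_h + 1):
        train = series[:origin]  # only data before the origin is used
        for h in horizons:
            actual = series[origin + h - 1]
            errors[h].append(abs(actual - seasonal_naive(train, h, season)))
    return {h: sum(e) / len(e) for h, e in errors.items()}


# Synthetic, steadily trending monthly series, for illustration only.
series = list(range(120))
mae_by_horizon = rolling_origin_mae(series, [6, 12, 18, 24, 30, 36])
```

On this trending toy series the error grows with the horizon, which mirrors the qualitative pattern the abstract reports; swapping in a real SARIMA fit at each origin would follow the same loop structure.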

    PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification Problem

    This study implements extensions of partial least squares generalized linear regression (PLSGLR) by combining it with logistic regression and linear discriminant analysis, to get a partial least squares generalized linear regression-logistic regression model (PLSGLR-log) and a partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). A comparative study of the resulting classifiers against classical methodologies such as k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM) is then carried out. Furthermore, a new methodology known as the kernel multilogit algorithm (KMA) is also implemented and its performance compared with that of the other classifiers. The KMA emerged as the best classifier, with the lowest classification error rates of all the methods on both types of data considered: unpreprocessed and preprocessed.

    Regresión lineal generalizada por MCP y algoritmo kernel multilogit para la clasificación de datos de microarreglos

    This study implements extensions of partial least squares generalized linear regression (PLSGLR) by combining it with logistic regression and linear discriminant analysis, to get a partial least squares generalized linear regression-logistic regression model (PLSGLR-log) and a partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). A comparative study of the resulting classifiers against classical methodologies such as k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM) is then carried out. Furthermore, a new methodology known as the kernel multilogit algorithm (KMA) is also implemented and its performance compared with that of the other classifiers. The KMA emerged as the best classifier, with the lowest classification error rates of all the methods on both types of data considered: unpreprocessed and preprocessed.