1,119 research outputs found

    A case study on cumulative logit models with low frequency and mixed effects

    Master of Science. Department of Statistics. Perla E. Reyes Cuellar. Data with ordinal responses may be encountered in many research fields, such as the social, medical, agricultural or financial sciences. In this paper, we present a case study on cumulative logit models with low frequency and mixed effects and discuss some strengths and limitations of the current methodology. Two plant pathologists requested our statistical advice to fit a cumulative logit mixed model to assess the effect of six commercial products on the control of a seed and seedling disease in soybeans in vitro. When they attempted to estimate the model parameters using a generalized linear mixed model approach with PROC GLIMMIX, the model failed to converge. Three alternative approaches to solve the problem were examined: 1) stratifying the data to search for the random effect; 2) assuming the random effect would be small and reducing the model to a fixed model; and 3) combining the original categories of the response variable into a smaller number of categories. In addition, we conducted a power analysis to evaluate the sample size required to detect treatment differences. The results of all the proposed solutions were similar. Collapsing categories for a cumulative/proportional odds model has little effect on estimation. The sample size used in the case study is enough to detect a large shift of frequencies between categories, but not moderate changes. Moreover, we do not have enough information to estimate a random effect. Even if one is present, the results for the fixed factors (pathogen, evaluation day, and treatment) are the same as those obtained by the fixed-model alternatives. All six products had a significant effect in slowing the effect of the pathogen, but the effects vary between pathogen species and assessment timing or date.
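
    As a rough illustration of the cumulative logit (proportional odds) model discussed in this abstract, the sketch below computes category probabilities from a set of cutpoints and a linear predictor, using the parameterization logit P(Y <= j) = theta_j - eta. The cutpoints, the treatment shift and the function names are hypothetical and are not taken from the case study; the sign convention is one common choice and may differ from the one used by PROC GLIMMIX.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, eta):
    """Category probabilities under logit P(Y <= j) = theta_j - eta.

    cutpoints: increasing thresholds theta_1 .. theta_{J-1} (hypothetical values)
    eta: linear predictor x'beta for one observation
    """
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical four-category disease score with an assumed treatment shift.
cutpoints = [-1.0, 0.5, 2.0]
control = category_probs(cutpoints, eta=0.0)
treated = category_probs(cutpoints, eta=1.2)

# Collapsing the first two and the last two categories keeps only the
# middle cutpoint; the same eta still describes the collapsed response.
collapsed = category_probs([0.5], eta=1.2)
```

    In this sketch, merging adjacent categories simply drops cutpoints while the shift `eta` is unchanged, which is one way to see why collapsing categories can leave the treatment effect largely intact under proportional odds.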

    Breast cancer diagnosis using a hybrid genetic algorithm for feature selection based on mutual information

    Feature selection is the process of selecting a subset of relevant features (i.e. predictors) for use in the construction of predictive models. This paper proposes a hybrid feature selection approach to breast cancer diagnosis which combines a Genetic Algorithm (GA) with Mutual Information (MI) for selecting the best combination of cancer predictors, with maximal discriminative capability. The selected features are then input into a classifier to predict whether a patient has breast cancer. Using a publicly available breast cancer dataset, experiments were performed to evaluate the performance of the Genetic Algorithm based on the Mutual Information approach with two different machine learning classifiers, namely k-Nearest Neighbor (KNN) and Support Vector Machine (SVM), tuned using different distance measures and kernel functions, respectively. The results revealed that the proposed hybrid approach is highly accurate for predicting breast cancer, and it is very promising for predicting other cancers using clinical data.
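
    The general idea of pairing a genetic algorithm with mutual information can be sketched as follows. This is only a minimal illustration under stated assumptions: the fitness function (total MI of the selected features minus a size penalty), the GA settings and the toy data are all hypothetical, and the abstract does not say how the paper actually combines the two.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def fitness(mask, mi_scores, penalty=0.1):
    """Hypothetical fitness: total MI of selected features minus a size penalty."""
    return sum(s for s, bit in zip(mi_scores, mask) if bit) - penalty * sum(mask)


def ga_select(mi_scores, pop_size=12, generations=30, mutation_rate=0.1, seed=0):
    """Tiny GA over 0/1 feature masks: elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    k = len(mi_scores)
    pop = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, mi_scores), reverse=True)
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # keep the current best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(m, mi_scores))


# Toy data (made up): feature 0 copies the label, feature 1 is independent of it,
# feature 2 is a noisy copy.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
]
mi_scores = [mutual_information(f, labels) for f in features]
best_mask = ga_select(mi_scores)
```

    With a purely additive fitness like this one a GA is more than is needed; a search of this kind is usually motivated when the fitness depends on feature interactions, for example through classifier accuracy or a redundancy term.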

    CEIoT: A Framework for Interlinking Smart Things in the Internet of Things

    In the emerging Internet of Things (IoT) environment, things are interconnected but not interlinked. Interlinking relevant things offers great opportunities to discover implicit relationships and enable potential interactions among things. To achieve this goal, implicit correlations between things need to be discovered. However, little work has been done in this important direction, and the lack of correlation discovery has inevitably limited the power of interlinking things in IoT. With the rapidly growing number of things that are connected to the Internet, there is an increasing need for correlation formation and discovery to support interlinking relevant things effectively. In this paper, we propose a novel approach based on a Multi-Agent Systems (MAS) architecture to extract correlations between smart things. Our MAS system is able to identify correlations on demand due to the autonomous behaviors of object agents. Specifically, we introduce a novel open-source framework, namely CEIoT, to extract correlations in the context of IoT. Based on the attributes of things in our IoT dataset, we identify three types of correlations in our system and propose a new approach to extract and represent the correlations between things. We implement our architecture using the Java Agent Development Framework (JADE) and conduct experimental studies on both synthetic and real-world datasets. The results demonstrate that our approach can extract the correlations at a much higher speed than the naive pairwise computation method.

    An Analysis of the Environmental Account of Obesity as a Form of Financial Neo-Imperialism

    Socioeconomic developments of the past half-century have created mass social concern over an “obesity epidemic.” This concern is given a sense of legitimacy by studies that warn of a totally obese America in only a few decades, and of children being more likely than ever to be diagnosed with Type 2 Diabetes. The “environmentalist account” of obesity asserts that factors like genetics, food availability, and social or political climate are the major determinants of an “obesogenic environment” – that is, an environment of obese adults likely to raise obese children. Some feminist thinkers argue that the environmentalist account is necessary for fighting racism and sexism, as poor POC women are most likely to be the victims from this perspective. Dr. Anna Kirkland’s essay, “The Environmental Account of Obesity: A Case for Feminist Skepticism,” rejects the environmental account as another way to intrude on the lives of these women and moralize their choices. This essay will examine the economic impact of implementing the environmentalist account as a neo-imperialist ideology.

    Evolutionary and deep mining models for effective biomarker discovery

    With the advent of high-throughput biology, large amounts of molecular data are available for purposeful analysis and evaluation. Extracting relevant knowledge from high-throughput biomedical datasets has become a common goal of current approaches to personalised cancer medicine and to understanding cancer genotype and phenotype. However, the datasets are characterised by high dimensionality and relatively small sample sizes with small signal-to-noise ratios. Extracting and interpreting relevant knowledge from such complex datasets therefore remains a significant challenge for the fields of machine learning and data mining. This is evidenced by the limited success these methods have had in detecting robust and reliable biomarkers for cancers and other complicated diseases. This could also explain the lack of generic biomarkers among the identified published genes for identical diseases or clinical conditions. This thesis proposes and evaluates the efficacy of two novel feature mining models established on the basis of the evolutionary computation and deep learning paradigms to position and solve biomarker discovery as an optimisation problem. Deep learning methods lack the transparency and interpretability found in the evolutionary paradigm. To overcome the inherent issue of poor explanatory power associated with deep learning, this research also introduces a novel deep mining model that helps to deconstruct the internal state of such deep learning models to reveal key determinants underlying their latent representations to aid feature selection. As a result, salient biomarkers for breast cancer and the positivity of the Estrogen and Progesterone receptors are discovered robustly and validated reliably across a wide range of independently generated breast cancer data samples.

    Analysis of Qualitative Behavior of Fifth Order Difference Equations

    The main aim of this paper is to investigate the stability, global attractivity and periodic nature of the solutions of the difference equations $x_{n+1} = a x_{n-1} \pm \frac{b x_{n-1} x_{n-2}}{c x_{n-2} \pm d x_{n-4}}$, $n = 0, 1, 2, \ldots$, where the initial conditions $x_{-4}$, $x_{-3}$, $x_{-2}$, $x_{-1}$ and $x_0$ are arbitrary positive real numbers and $a$, $b$, $c$, $d$ are constants.
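
    For readers who want to explore the behavior of such a recursion numerically, the sketch below iterates the equation from five initial values. The function name, the sign arguments and the parameter values are illustrative assumptions, not taken from the paper.

```python
def iterate(a, b, c, d, initial, steps, outer_sign=1, inner_sign=1):
    """Iterate x_{n+1} = a*x_{n-1} +/- (b*x_{n-1}*x_{n-2}) / (c*x_{n-2} +/- d*x_{n-4}).

    initial: [x_{-4}, x_{-3}, x_{-2}, x_{-1}, x_0]
    outer_sign, inner_sign: +1 or -1, selecting one of the four sign variants
    """
    if len(initial) != 5:
        raise ValueError("five initial conditions are required")
    xs = list(initial)
    for _ in range(steps):
        x_n1, x_n2, x_n4 = xs[-2], xs[-3], xs[-5]  # x_{n-1}, x_{n-2}, x_{n-4}
        denom = c * x_n2 + inner_sign * d * x_n4
        if denom == 0:
            raise ZeroDivisionError("denominator vanished; the solution is undefined")
        xs.append(a * x_n1 + outer_sign * (b * x_n1 * x_n2) / denom)
    return xs


# Hypothetical parameters. In the (+, +) variant a constant sequence satisfies
# x = a*x + b*x/(c + d), so with a + b/(c + d) = 1 every positive constant is
# a fixed point.
constant = iterate(0.5, 1.0, 1.0, 1.0, [2.0] * 5, steps=10)
trajectory = iterate(0.3, 1.0, 2.0, 1.0, [1.0, 2.0, 0.5, 1.5, 1.0], steps=50)
```

    Plotting `trajectory` for different parameter choices gives a quick, informal view of convergence or periodicity before attempting any proof.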

    A Comparative Study on Statistical and Machine Learning Forecasting Methods for an FMCG Company

    Demand forecasting has been an area of study among scholars and businessmen ever since the start of the industrial revolution and has gained further focus in recent years with the advancements in AI. Accurate forecasts are no longer a luxury, but a necessity for effective decisions in production planning and marketing. Many aspects of the business depend on demand, and this is particularly true for the Fast-Moving Consumer Goods (FMCG) industry, where high volume and demand volatility pose a challenge for planners trying to generate accurate forecasts as consumer demand complexity rises. Inaccurate demand forecasts lead to multiple issues, such as high holding costs on excess inventory, shortages of certain SKUs in the market leading to lost sales, and a significant impact on both the top line and the bottom line of the business. Researchers have attempted to compare the performance of statistical time series models with machine learning methods to evaluate their robustness, computational time and power. In this paper, a comparative study was conducted using statistical and machine learning techniques to generate an accurate forecast using shipment data of an FMCG company. The naïve method was used as a benchmark to evaluate the performance of the other forecasting techniques, and was compared to exponential smoothing, ARIMA, KNN, Facebook Prophet and LSTM using the past three years of shipments. The methodology followed CRISP-DM, from data exploration, pre-processing and transformation through to applying different forecasting algorithms and evaluation. Moreover, secondary goals of this paper include understanding associations between SKUs through market basket analysis, and clustering using KNN based on brand, customer, order quantity and value to propose a product segmentation strategy. The results of both clustering and forecasting models are then evaluated to choose the optimal forecasting technique, and a visual representation of the forecast and of the exploratory analysis conducted is displayed using R.
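
    To make the benchmarking idea concrete, the sketch below compares a naïve forecast with simple exponential smoothing on a hold-out window using mean absolute error. The shipment figures, the smoothing constant and the function names are invented for illustration; the paper's data, error metric and other models (ARIMA, KNN, Prophet, LSTM) are not reproduced here.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon


def ses_forecast(history, horizon, alpha=0.3):
    """Simple exponential smoothing with a flat forecast at the final level."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly shipments; the last three months are held out.
shipments = [120, 135, 128, 150, 142, 160, 155, 170, 165, 180, 175, 190]
train, test = shipments[:-3], shipments[-3:]

scores = {
    "naive": mae(test, naive_forecast(train, len(test))),
    "ses": mae(test, ses_forecast(train, len(test))),
}
```

    A candidate model is only worth its extra complexity if its hold-out error is clearly below the naïve entry in `scores`.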