Differential Evolution Algorithm in the Construction of Interpretable Classification Models
This chapter describes a differential evolution (DE)-based approach to inducing oblique decision trees (DTs). Oblique DTs use a linear combination of attributes to build hyperplanes that divide the instance space, and they are more compact and accurate than traditional univariate DTs. Finding an optimal hyperplane is a computationally hard task, and DE is an efficient evolutionary algorithm (EA) designed for optimization problems with real-valued parameters, so this metaheuristic (MH) is used to conduct an intelligent search for a near-optimal solution. Two methods are described: one uses a recursive partitioning strategy to find the most suitable oblique hyperplane for each internal node of a decision tree, and the other conducts a global search for a near-optimal oblique decision tree. A statistical analysis of the experimental results suggests that, as decision tree induction procedures, these methods perform better than other supervised learning approaches.
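The node-level search can be illustrated with a minimal, stdlib-only sketch. It is not the chapter's implementation: it assumes the classic DE/rand/1/bin scheme, encodes a hyperplane as one weight per attribute plus a bias, and scores a split by the errors left when each side predicts its majority class. The names `split_errors` and `de_oblique_split`, the parameter values, and the toy data are all hypothetical.

```python
import random


def split_errors(params, X, y):
    """Errors left by an oblique split: a point goes right if w.x + b > 0,
    and each side predicts its own majority class."""
    *w, b = params
    sides = {True: [], False: []}
    for xi, yi in zip(X, y):
        right = sum(wj * xj for wj, xj in zip(w, xi)) + b > 0
        sides[right].append(yi)
    errors = 0
    for labels in sides.values():
        if labels:
            majority = max(set(labels), key=labels.count)
            errors += sum(1 for label in labels if label != majority)
    return errors


def de_oblique_split(X, y, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    """DE/rand/1/bin search for one oblique hyperplane (hypothetical settings)."""
    rng = random.Random(seed)
    dim = len(X[0]) + 1  # one weight per attribute plus the bias term
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fit = [split_errors(p, X, y) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct vectors, none equal to the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            # Binomial crossover between the target and the mutant vector.
            trial = [
                pop[a][k] + F * (pop[b][k] - pop[c][k])
                if rng.random() < CR or k == j_rand
                else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection: keep the trial if it is no worse.
            f_trial = split_errors(trial, X, y)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Toy data (not from the chapter): classes separated by x0 + x1 = 1, which
# no single-attribute threshold can split without error.
X = [[0.1, 0.1], [0.6, 0.2], [0.2, 0.6], [0.4, 0.3], [0.0, 0.7],
     [0.9, 0.9], [0.8, 0.5], [0.5, 0.8], [1.0, 0.3], [0.3, 1.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
params, errors = de_oblique_split(X, y)
```

In the recursive-partitioning method, a search like this would run once per internal node on the instances reaching that node; the global-search method instead optimizes all the nodes of a tree together.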
Data Visualizations for the Analysis of Coexisting Bacteria in Clustering Models of Patients with Bacterial Vaginosis
Data visualization and machine learning are two areas of computer science that converge in the analysis and understanding of datasets. This convergence allows a complete, multidimensional understanding of a phenomenon, so that complex questions can be addressed from different perspectives. The objective of this article is to describe and demonstrate the use of visualizations on clustering models to derive bacterial coexistence contexts in underlying groups of patients with a positive Bacterial Vaginosis (BV) diagnosis. To this end, we used a previously developed agglomerative hierarchical clustering model built on a BV dataset. Our evidence shows that data visualizations are effective for identifying relevant patterns of bacterial coexistence in groups of BV-positive patients. They reveal significant relationships among different bacteria in specific groups, in which several types of Lactobacillus, such as Jensenii, Crispatus, LGasseri and Liner, can be distinguished, as well as anaerobic bacteria such as Atopobium, Gardnerella, Megasphaera and Mycoplasma. These findings highlight the usefulness of data visualizations for understanding and identifying bacterial coexistence contexts in this medical condition.
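The abstract does not say how coexistence was quantified. One simple possibility, assumed here, is to count how often each pair of bacteria is detected together within each cluster. The per-cluster counts could then feed a heatmap or network plot. The function name and the toy profiles below are hypothetical, not data from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_by_cluster(profiles, labels):
    """Count, per cluster, how many patients carry each pair of bacteria.

    profiles: one set of detected bacteria per patient.
    labels:   the cluster assigned to each patient (e.g. from cutting an
              agglomerative hierarchical clustering dendrogram).
    """
    counts = defaultdict(Counter)
    for present, cluster in zip(profiles, labels):
        # Sort so that each unordered pair always maps to the same key.
        for pair in combinations(sorted(present), 2):
            counts[cluster][pair] += 1
    return counts


# Hypothetical profiles, for illustration only.
profiles = [
    {"Gardnerella", "Atopobium"},
    {"Gardnerella", "Atopobium", "Megasphaera"},
    {"Crispatus", "Jensenii"},
]
labels = [0, 0, 1]
counts = cooccurrence_by_cluster(profiles, labels)
```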
A Predictive Model for Guillain-Barré Syndrome Based on Single Learning Algorithms
Background. Guillain-Barré Syndrome (GBS) is a potentially fatal autoimmune neurological disorder. Its severity varies among its four main subtypes, namely Acute Inflammatory Demyelinating Polyneuropathy (AIDP), Acute Motor Axonal Neuropathy (AMAN), Acute Motor Sensory Axonal Neuropathy (AMSAN), and Miller-Fisher Syndrome (MF). Proper subtype identification may help patients receive adequate treatment promptly. Method. We perform experiments with 15 single classifiers in two scenarios: four-subtype classification and One versus All (OvA) classification. We used a dataset with the 16 relevant features identified in a previous phase. Performance is evaluated with 10-fold cross-validation (10-FCV) using typical classification performance measures. A statistical test is conducted to identify the top five classifiers in each case. Results. In four-subtype classification, half of the classifiers investigated obtained an average accuracy above 0.90. In OvA classification, the two subtypes with the largest number of instances yielded the best classification results. Conclusions. This study represents a comprehensive effort to create a predictive model for Guillain-Barré Syndrome subtypes. The analysis also provides insight into the best single classifiers for each classification case.
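The evaluation protocol (10-FCV, plus relabelling the target for OvA) can be sketched as follows. This is a minimal sketch that assumes plain, unstratified folds. The abstract does not say whether folds were stratified, and the helper names are hypothetical. `n` must be at least `k` so that no fold is empty.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def one_vs_all(labels, positive):
    """Relabel a multiclass target as one subtype (1) versus the rest (0)."""
    return [1 if label == positive else 0 for label in labels]


def cv_accuracy(fit, X, y, k=10, seed=0):
    """Mean accuracy over the k folds; fit(X, y) must return predict(x)."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

For the OvA scenario, `cv_accuracy` would be called once per subtype with `one_vs_all(y, subtype)` as the target.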
Bacterial Foraging Algorithm with Mutation for Solving Constrained Problems
A simplified version of a Swarm Intelligence algorithm, the bacterial foraging optimization algorithm with mutation and dynamic step size (BFOAM-DS), is proposed. The bacterial foraging algorithm can explore and exploit the search space through its chemotactic operator; however, it is prone to premature convergence. This proposal adds a mutation operator to the swim, similar to those used in evolutionary algorithms, combined with a dynamic step-size operator, to improve performance and achieve a better balance between exploration and exploitation of the search space. BFOAM-DS was tested on three well-known engineering design optimization problems. The results were analyzed with basic statistics and common performance measures for nature-inspired constrained optimization to evaluate the behavior of the swim-with-mutation operator and the dynamic step-size operator. Compared against a previous version of the algorithm, BFOAM-DS obtained better solutions, and its solutions were similar to the best known in the specialized literature.
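The two operators can be illustrated with a deliberately simplified sketch. It is not the published BFOAM-DS. Several choices here are assumptions made for illustration: the Gaussian form of the swim mutation, the linear step-size schedule, the feasibility-rule comparison for constraints, and every name and parameter value. The reproduction and elimination-dispersal steps of the full bacterial foraging algorithm are omitted.

```python
import math
import random


def bfoa_mutation_ds(f, g, lower, upper, n_bact=10, iters=200, n_swim=4,
                     c_start=0.1, c_end=0.001, p_mut=0.2, seed=1):
    """Simplified chemotaxis loop with a mutated swim and a shrinking step size.

    f(x): objective to minimize; g(x): list of constraints written as g_i(x) <= 0.
    """
    rng = random.Random(seed)
    dim = len(lower)
    span = [hi - lo for lo, hi in zip(lower, upper)]

    def rank(x):
        # Feasibility rules as a sort key: smaller total violation first,
        # then smaller objective value.
        return (sum(max(0.0, gi) for gi in g(x)), f(x))

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
           for _ in range(n_bact)]
    best = min(pop, key=rank)
    for t in range(iters):
        # Dynamic step size (assumed linear): from c_start down to c_end,
        # expressed as a fraction of each variable's range.
        c = c_start - (c_start - c_end) * t / max(1, iters - 1)
        for i in range(n_bact):
            x = pop[i]
            # Tumble: choose a random unit direction.
            d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(sum(v * v for v in d)) or 1.0
            d = [v / norm for v in d]
            # Swim along d while each move is not worse than the current position.
            for _ in range(n_swim):
                cand = [v + c * s * dv for v, s, dv in zip(x, span, d)]
                # Swim mutation (assumed form): Gaussian noise on some coordinates.
                cand = [v + rng.gauss(0.0, c * s) if rng.random() < p_mut else v
                        for v, s in zip(cand, span)]
                cand = clip(cand)
                if rank(cand) <= rank(x):
                    x = cand
                else:
                    break
            pop[i] = x
            if rank(x) < rank(best):
                best = x
    return best, rank(best)


# Toy constrained problem (not one of the paper's engineering problems):
# minimize (x0 - 1)^2 + (x1 - 2)^2 subject to x0 + x1 <= 2; optimum f = 0.5.
best, (violation, value) = bfoa_mutation_ds(
    lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2,
    lambda x: [x[0] + x[1] - 2],
    lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

A large step early on favors exploration, and the shrinking step later favors exploitation near the best positions found. This is the balance the dynamic step size is meant to provide.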
A Proposal for Serving Deaf-Mute Students at UJAT
Paper from the XI Congreso Internacional Retos y Expectativas de la Universidad (XI International Congress on the Challenges and Expectations of the University), 2011.
An Iterative Feature Perturbation Method for Gene Selection from Microarray Data
Gene expression microarray datasets often consist of a limited number of samples relative to a large number of expression measurements, usually on the order of thousands of genes. These characteristics pose a challenge to any classification model as they might negatively impact its prediction accuracy. Therefore, dimensionality reduction is a core process prior to any classification task.
This dissertation introduces the iterative feature perturbation (IFP) method, an embedded gene selector that iteratively discards non-relevant features. IFP deems a feature relevant if perturbing it with noise changes the predictive accuracy of the classification model; perturbing a non-relevant feature leaves the accuracy unchanged.
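The perturbation idea can be sketched as follows. This is a stdlib-only illustration, not the dissertation's implementation. A nearest-centroid classifier stands in for the linear SVM, and accuracy is measured on the same data rather than on held-out samples. The noise level, number of repeats, fraction of features dropped per iteration, function names, and toy data are all assumptions.

```python
import random


def fit_centroids(X, y):
    """Per-class mean vectors (a nearest-centroid classifier)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def accuracy(centroids, X, y, feats):
    """Accuracy of nearest-centroid prediction using only the features in feats."""
    def predict(x):
        return min(centroids, key=lambda c: sum((x[j] - centroids[c][j]) ** 2
                                               for j in feats))
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def ifp(X, y, n_keep, drop_frac=0.5, noise=3.0, repeats=5, seed=0):
    """Iteratively drop the features whose noise perturbation changes accuracy least."""
    rng = random.Random(seed)
    feats = list(range(len(X[0])))
    # Class means of a feature do not depend on other features, so centroids
    # are fitted once; prediction uses only the surviving features.
    cents = fit_centroids(X, y)
    while len(feats) > n_keep:
        base = accuracy(cents, X, y, feats)
        change = {}
        for j in feats:
            total = 0.0
            for _ in range(repeats):
                Xp = [list(x) for x in X]
                for row in Xp:
                    row[j] += rng.gauss(0.0, noise)  # perturb feature j only
                total += abs(base - accuracy(cents, Xp, y, feats))
            change[j] = total / repeats
        n_drop = max(1, min(int(len(feats) * drop_frac), len(feats) - n_keep))
        # Stable sort: features with equal change keep their original order.
        ranked = sorted(feats, key=lambda j: change[j])
        feats = sorted(ranked[n_drop:])
    return feats


# Toy data: features 0-3 have equal class means (irrelevant), 4-5 separate classes.
X = [[1.0, 4.0, 0.0, 7.0, 0.0, 0.0], [2.0, 0.0, 1.0, 5.0, 0.2, -0.1],
     [3.0, 2.0, 2.0, 6.0, -0.1, 0.3], [3.0, 2.0, 1.0, 6.0, 1.0, 1.0],
     [2.0, 4.0, 2.0, 7.0, 0.8, 1.1], [1.0, 0.0, 0.0, 5.0, 1.2, 0.9]]
y = [0, 0, 0, 1, 1, 1]
```

Here `ifp(X, y, n_keep=2)` should keep only features 4 and 5. Perturbing a feature whose class means coincide shifts every class distance by the same amount, so it never changes a prediction.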
We apply IFP to four cancer microarray datasets: colon cancer (cancer vs. normal), leukemia (subtype classification), Moffitt colon cancer (prognosis prediction) and lung cancer (prognosis prediction). We compare IFP with SVM-RFE and the t-test, using a linear support vector machine as the classifier in all cases. We do so both with the full original feature set of each dataset and with a preselected set of 200 features chosen by p-value. With the full feature set, IFP achieves accuracy comparable to SVM-RFE (and higher at some points) on three of the four datasets. The simple t-test feature ranking typically yields the most accurate classifiers across the four datasets. With 200 features chosen by the t-test, accuracy improves by up to 3% for both IFP and SVM-RFE across the four datasets. We corroborate these results with an AUC analysis and a statistical analysis using the Friedman/Holm test.
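The t-test preselection can be sketched as a per-feature ranking. This sketch assumes a pooled-variance Student's t-test, although the abstract does not name the variant. With pooled variance every feature has the same degrees of freedom (n1 + n2 - 2), so ranking by |t| gives the same order as ranking by two-sided p-value. The function names and toy data are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def t_test_rank(X, y, k):
    """Indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, label in zip(X, y) if label == 0]
        b = [x[j] for x, label in zip(X, y) if label == 1]
        scores.append((abs(pooled_t(a, b)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


# Toy data: feature 0 separates the classes strongly, feature 1 weakly, feature 2 not at all.
X = [[0.0, 1.0, 5.0], [0.1, 2.0, 6.0], [0.2, 3.0, 7.0],
     [1.0, 2.0, 7.0], [1.1, 3.0, 5.0], [1.2, 4.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]
```

With `k=200` on a real dataset, `t_test_rank` would produce the kind of preselected feature set described above.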
As with the t-test, we also used information gain and ReliefF as filters and compared all three. The AUC analysis shows that IFP and SVM-RFE obtain the highest AUC values when applied to the t-test-filtered datasets, a result further corroborated by statistical analysis.
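Information gain, one of the filters mentioned, scores a feature by how much knowing its value reduces the entropy of the class labels. A minimal sketch follows. It assumes the expression values have already been discretized; how the dissertation handled continuous values is not stated. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) minus the weighted entropy of the class within each feature value."""
    n = len(labels)
    groups = {}
    for v, label in zip(feature_values, labels):
        groups.setdefault(v, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that perfectly predicts a balanced binary class scores 1 bit; a feature independent of the class scores 0.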
The percentage of overlap between the gene sets selected by any two methods across the four datasets indicates that different sets of genes can and do result in similar accuracies.
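The abstract does not define "percentage of overlap". One common reading, assumed here, is the share of the smaller selection that also appears in the other; for equal-sized selections the result is the same whichever set is used as the reference. The function names and gene lists below are hypothetical.

```python
from itertools import combinations


def overlap_percentage(a, b):
    """Percentage of the smaller gene set that also appears in the other set."""
    a, b = set(a), set(b)
    return 100.0 * len(a & b) / min(len(a), len(b))


def pairwise_overlaps(selections):
    """Overlap percentage for every pair of methods in {method: genes}."""
    return {(m1, m2): overlap_percentage(selections[m1], selections[m2])
            for m1, m2 in combinations(sorted(selections), 2)}


# Hypothetical gene lists, for illustration only.
selections = {
    "IFP": ["g1", "g2", "g3", "g4"],
    "SVM-RFE": ["g3", "g4", "g5", "g6"],
    "t-test": ["g1", "g2", "g3", "g7"],
}
overlaps = pairwise_overlaps(selections)
```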
We also built ensembles of classifiers using bagging with IFP, SVM-RFE and the t-test, and showed that they perform at least as well as the non-bagged cases, and better in some.
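Bagging can be sketched independently of the gene selector: each base model is trained on a bootstrap sample, and the ensemble predicts by majority vote. In this sketch, assumed for illustration, a 1-nearest-neighbour learner stands in for the linear SVM trained on IFP-, SVM-RFE- or t-test-selected genes. The names and toy data are hypothetical.

```python
import random
from collections import Counter


def bagging(fit, X, y, n_estimators=25, seed=0):
    """Train n_estimators models on bootstrap samples; fit(X, y) returns predict(x)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        # Majority vote over the base models.
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


def fit_1nn(X, y):
    """1-nearest-neighbour base learner (stand-in for a linear SVM)."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: sum((a - b) ** 2
                                                 for a, b in zip(p[0], x)))[1]


# Toy, well-separated data, for illustration only.
X = [[0.0, 0.0], [0.1, 0.1], [0.0, 0.2], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y = [0, 0, 0, 1, 1, 1]
predict = bagging(fit_1nn, X, y)
```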
A Global Search Approach for Inducing Oblique Decision Trees Using Differential Evolution
Computerized System to Support the Follow-Up of Inspections Abroad
The Comisión Federal de Electricidad (Federal Electricity Commission) requires that every order placed abroad exceeding a set amount be inspected.
To obtain the full article, contact the Editor of the journal Ecosistemas y Recursos Agropecuarios at the following email address: [email protected]. It will be sent free of charge.
A Bayesian Network Model for Qualitative Data on Bacterial Vaginosis in Pregnant Women
Bacterial vaginosis (BV) is a risk factor for preterm birth. The World Health Organization (WHO) notes that BV is a common condition that, if left untreated, can cause pregnancy complications such as spontaneous abortion and preterm birth, as well as an increased risk of other sexually transmitted infections. Artificial intelligence techniques such as causal Bayesian networks (CBNs) have been used in various studies for disease diagnosis. This study shows how a CBN can identify which bacteria coexist and how they relate to a positive BV diagnosis (DxBV+). The data come from real data collected by research staff of the Multidisciplinary Academic Division of Comalcalco (DAMC), forming a dataset of positive, negative and indeterminate BV cases in pregnant women. The result is a causal graphical model, presented as a CBN, together with its conditional probability table, which gives the percentage relationship of each factor with DxBV+.
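Once the network structure is fixed, each entry of a conditional probability table can be estimated from counts. The sketch below shows maximum-likelihood estimation of P(Dx = BV+ | parents) from records. The parent variables, field names and toy records are hypothetical. Learning the causal structure itself is not shown.

```python
from collections import Counter


def conditional_probability_table(records, parents, child, child_value):
    """Maximum-likelihood P(child = child_value | parents) for each observed
    combination of parent values; records are dicts of variable -> value."""
    totals, hits = Counter(), Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        totals[key] += 1
        if r[child] == child_value:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical records (1 = bacterium detected), for illustration only.
records = [
    {"Gardnerella": 1, "Atopobium": 1, "Dx": "BV+"},
    {"Gardnerella": 1, "Atopobium": 1, "Dx": "BV+"},
    {"Gardnerella": 1, "Atopobium": 0, "Dx": "BV-"},
    {"Gardnerella": 1, "Atopobium": 0, "Dx": "BV+"},
    {"Gardnerella": 0, "Atopobium": 0, "Dx": "BV-"},
    {"Gardnerella": 0, "Atopobium": 0, "Dx": "indeterminate"},
]
cpt = conditional_probability_table(records, ["Gardnerella", "Atopobium"], "Dx", "BV+")
```

Multiplying each entry by 100 expresses it as the kind of percentage the abstract describes.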
