260 research outputs found
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well-known strategies for dealing with binary
imbalanced data on 82 different real-life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with RBF kernel,
random forests, and gradient boosting machines, and we measured the quality of
the resulting classifier using 6 different metrics (area under the curve (AUC),
accuracy, F-measure, G-mean, Matthews correlation coefficient (MCC), and balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier. For AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, underbagging
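To make the metric dependence concrete, here is a minimal sketch of five of the six metrics named above, computed from binary confusion-matrix counts. The function name, the zero-division conventions, and the example counts are illustrative assumptions, not taken from the paper; AUC is left out because it needs classifier scores rather than counts.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Threshold-based metrics from binary confusion-matrix counts.

    Assumes the positive class is the minority class. Undefined ratios
    (zero denominators) are reported as 0.0, which is a convention chosen
    for this sketch.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    pr_sum = precision + sensitivity
    f_measure = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "g_mean": g_mean,
        "mcc": mcc,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical data set with a 1% imbalance rate (10 positives in 1000)
# and a classifier that predicts the majority class for every example.
all_negative = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)
```

In this made-up case accuracy is high while G-mean, F-measure and MCC are all zero and balanced accuracy is 0.5, which illustrates why a strategy can look good under one metric and poor under another.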
Empirical evaluation of resampling procedures for optimising SVM hyperparameters
Tuning the regularisation and kernel hyperparameters is a vital step in optimising the generalisation performance of kernel methods, such as the support vector machine (SVM). This is most often performed by minimising a resampling/cross-validation based model selection criterion; however, there seems to be little practical guidance on the most suitable form of resampling. This paper presents the results of an extensive empirical evaluation of resampling procedures for SVM hyperparameter selection, designed to address this gap in the machine learning literature. We tested 15 different resampling procedures on 121 binary classification data sets in order to select the best SVM hyperparameters. We used three very different statistical procedures to analyse the results: the standard multi-classifier/multi-data set procedure proposed by Demšar, the confidence intervals on the excess loss of each procedure in relation to 5-fold cross validation, and the Bayes factor analysis proposed by Barber. We conclude that a 2-fold procedure is appropriate to select the hyperparameters of an SVM for data sets with 1000 or more datapoints, while a 3-fold procedure is appropriate for smaller data sets
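The conclusion above can be sketched as a small rule plus a fold splitter. This is only an illustration under stated assumptions: the "2-fold" and "3-fold" procedures are modelled here as plain shuffled k-fold cross-validation, and the names `choose_n_folds` and `k_fold_indices` are hypothetical helpers, not code from the paper.

```python
import random

def choose_n_folds(n_samples, threshold=1000):
    """Rule of thumb from the abstract: 2 folds for data sets with 1000 or
    more datapoints, 3 folds for smaller ones."""
    return 2 if n_samples >= threshold else 3

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (training, validation) index lists for shuffled k-fold CV.

    Each index lands in exactly one validation fold; the seed only fixes
    the shuffle so the split is reproducible.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slices give folds whose sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# Usage: pick the fold count from the data set size, then iterate over splits.
n = 500
splits = list(k_fold_indices(n, choose_n_folds(n)))
```

Each candidate hyperparameter setting would then be scored by training on `training` and evaluating on `validation` for every split, keeping the setting with the best average score.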
How experts recognize schizophrenia: the role of the disorganization symptom
OBJECTIVE: Research on clinical reasoning has been useful in developing expert systems. These tools are based on Artificial Intelligence techniques which assist the physician in the diagnosis of complex diseases. The development of these systems is based on a cognitive model extracted through the identification of the clinical reasoning patterns applied by experts within the clinical decision-making context. This study describes the method of knowledge acquisition for the identification of the triggering symptoms used in the reasoning of three experts for the diagnosis of schizophrenia. METHOD: Three experts on schizophrenia, from two University centers in São Paulo, were interviewed and asked to identify and to represent the triggering symptoms for the diagnosis of schizophrenia according to the graph methodology. RESULTS: Graph methodology showed a remarkable disagreement on how the three experts established their diagnosis of schizophrenia. They differed in their choice of triggering symptoms for the diagnosis of schizophrenia: disorganization, blunted affect and thought disturbances. CONCLUSIONS: The results indicate substantial differences between the experts as to their diagnostic reasoning patterns, probably under the influence of different theoretical tendencies. These differences are an impediment to the construction of a single model. The disorganization symptom was considered the most appropriate to represent the heterogeneity of schizophrenia and to be modeled in the further development of an expert system for the diagnosis of schizophrenia.
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP: 98/11120-5); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Affiliations: Universidade Federal de São Paulo (UNIFESP), Psychiatry Department; Universidade Federal de São Paulo (UNIFESP), Department of Medical Informatics; Universidade Estadual de Campinas, Computing Institute.
SciEL
Decision support system for the diagnosis of schizophrenia disorders
Clinical decision support systems are useful tools for assisting physicians in diagnosing complex illnesses. Schizophrenia is a complex, heterogeneous and incapacitating mental disorder that should be detected as early as possible to avoid a more serious outcome. These artificial intelligence systems might be useful in the early detection of schizophrenia disorder. The objective of the present study was to describe the development of such a clinical decision support system for the diagnosis of schizophrenia spectrum disorders (SADDESQ). The development of this system is described in four stages: knowledge acquisition, knowledge organization, the development of a computer-assisted model, and the evaluation of the system's performance. The knowledge was extracted from an expert through open interviews. These interviews aimed to explore the expert's diagnostic decision-making process for the diagnosis of schizophrenia. A graph methodology was employed to identify the elements involved in the reasoning process. Knowledge was first organized and modeled by means of algorithms and then transferred to a computational model created by the covering approach. The performance assessment involved the comparison of the diagnoses of 38 clinical vignettes between an expert and the SADDESQ. The results showed a relatively low rate of misclassification (18-34%) and a good performance by SADDESQ in the diagnosis of schizophrenia, with an accuracy of 66-82%. The accuracy was higher when schizophreniform disorder was considered as the presence of schizophrenia disorder. Although these results are preliminary, the SADDESQ has exhibited a satisfactory performance, which needs to be further evaluated within a clinical setting.
Affiliations: Universidade Federal de São Paulo (UNIFESP), Escola Paulista de Medicina, Departamento de Psiquiatria; Universidade Federal de São Paulo (UNIFESP), Escola Paulista de Medicina, Departamento de Informática Médica.
SciEL
Peer-selected "best papers" - are they really that "good"?
Background: Peer evaluation is the cornerstone of science evaluation. In this paper, we analyze whether or not a form of peer evaluation, the pre-publication selection of the best papers in Computer Science (CS) conferences, is better than random, when considering future citations received by the papers. Methods: Considering 12 conferences (for several years), we collected the citation counts from Scopus for both the best papers and the non-best papers. For a different set of 17 conferences, we collected the data from Google Scholar. For each data set, we computed the proportion of cases in which the best paper has more citations. We also compared this proportion for years before and after 2010 to evaluate whether there is a propaganda effect. Finally, we counted the proportion of best papers that are in the top 10% and 20% most cited for each conference instance. Results: The probability that a best paper will receive more citations than a non-best paper is 0.72 (95% CI = 0.66, 0.77) for the Scopus data, and 0.78 (95% CI = 0.74, 0.81) for the Scholar data. There are no significant changes in the probabilities for different years. Also, 51% of the best papers are among the top 10% most cited papers in each conference/year, and 64% of them are among the top 20% most cited. Discussion: There is strong evidence that the selection of best papers in Computer Science conferences is better than a random selection, and that a significant number of the best papers are among the top cited papers in the conference.
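The two proportions described in the Methods can be sketched as follows for a single conference instance. This is a hedged illustration only: the function names, the choice to count ties as half a win, the top-fraction cutoff rule, and the citation counts are all assumptions made for the example, not details from the paper.

```python
def prob_best_more_cited(best_citations, other_citations):
    """Proportion of (best, non-best) pairs where the best paper has more
    citations. Ties count as half a win (an assumption of this sketch)."""
    wins = 0.0
    pairs = 0
    for b in best_citations:
        for o in other_citations:
            pairs += 1
            if b > o:
                wins += 1.0
            elif b == o:
                wins += 0.5
    return wins / pairs

def fraction_in_top(best_citations, all_citations, top_fraction):
    """Fraction of best papers whose citation count reaches the cutoff of
    the top `top_fraction` most cited papers in the conference instance."""
    ranked = sorted(all_citations, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))  # size of the top group
    cutoff = ranked[k - 1]
    return sum(1 for b in best_citations if b >= cutoff) / len(best_citations)

# Hypothetical citation counts for one conference/year.
best = [40]
others = [5, 12, 40, 60]
p = prob_best_more_cited(best, others)  # 2 wins + 1 tie out of 4 pairs = 0.625
top40 = fraction_in_top(best, best + others, 0.4)
```

A value of `p` near 0.5 would mean the selection is no better than random for citations; the abstract reports 0.72 and 0.78 on its real data sets.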
- …
