67 research outputs found

    Comparative Performance Analysis of State-of-the-Art Classification Algorithms Applied to Lung Tissue Categorization

    In this paper, we compare five common classifier families in their ability to categorize six lung tissue patterns (healthy tissue and five pathologic patterns) in high-resolution computed tomography (HRCT) images of patients affected by interstitial lung diseases (ILD). The evaluated classifiers are naive Bayes, k-nearest neighbors, J48 decision trees, multilayer perceptrons, and support vector machines (SVM). The dataset contains 843 regions of interest (ROI) covering healthy tissue and five pathologic lung tissue patterns, identified by two radiologists at the University Hospitals of Geneva. The correlation of the feature space, composed of 39 texture attributes, is studied. A grid search for optimal parameters is carried out for each classifier family. Two complementary metrics, based on McNemar's statistical tests and on global accuracy, are used to characterize classification performance. SVM reached the best values for each metric and achieved a mean correct prediction rate of 88.3%, with high class-specific precision, on testing sets of 423 ROI.
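
    McNemar's test compares two classifiers on the same test instances using only the cases where exactly one of them is correct. The sketch below assumes the common chi-square form with a continuity correction; the abstract does not say which variant was used, and the helper name `mcnemar_test` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(y_true: Sequence[int], pred_a: Sequence[int],
                 pred_b: Sequence[int]) -> Tuple[float, float]:
    """Continuity-corrected McNemar test for two classifiers (hypothetical helper).

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must have the same length")
    # b: A correct and B wrong; c: A wrong and B correct (the discordant pairs)
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers are never discordant
    # Continuity correction, clipped at zero so that b == c gives a statistic of 0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy example: A is right on all 20 instances, B is wrong on half of them
y = [1] * 20
stat, p = mcnemar_test(y, [1] * 20, [1] * 10 + [0] * 10)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

    Only the discordant counts enter the statistic, which is why the test suits paired comparisons of classifiers evaluated on the same ROIs.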

    Recommendation system using autoencoders

    The daily explosion of high volumes of data has led to the emergence of the Big Data paradigm. The ever-increasing amount of information available on the Internet makes it increasingly difficult for individuals to find what they need quickly and easily. Recommendation systems have emerged as a solution to this problem. Collaborative filtering is widely used in such systems, but high dimensionality and data sparsity remain major problems. As deep learning has gained importance, several works have sought to improve this type of filtering. In this article, we propose a product recommendation system that employs an autoencoder-based collaborative filtering method. The model is compared with Singular Value Decomposition (SVD), and the comparison is presented in the results section. Our experiments show very low Root Mean Squared Error (RMSE) values, indicating that the recommendations presented to users are in line with their interests and are not affected by data sparsity, even though the datasets are very sparse (sparsity of 0.996). The results are promising, with an RMSE of 0.029 on the first dataset and 0.010 on the second. This research has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/202
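
    The abstract reports a dataset sparsity and RMSE values without defining them. A minimal sketch, assuming sparsity is the fraction of unobserved user-item pairs and RMSE is computed only over observed ratings (the usual convention in collaborative filtering); the autoencoder itself is not shown, and the helper names and toy data are hypothetical:

```python
import math
from typing import Dict, Tuple

# (user_id, item_id) -> rating; pairs that are absent are unobserved
Ratings = Dict[Tuple[int, int], float]


def sparsity(ratings: Ratings, n_users: int, n_items: int) -> float:
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - len(ratings) / (n_users * n_items)


def rmse(actual: Ratings, predicted: Ratings) -> float:
    """Root mean squared error over the observed (actual) ratings only."""
    if not actual:
        raise ValueError("no observed ratings to evaluate")
    squared_error = sum((r - predicted[key]) ** 2 for key, r in actual.items())
    return math.sqrt(squared_error / len(actual))


# Hypothetical data: 4 observed ratings in a 10 x 100 user-item matrix
observed = {(0, 0): 4.0, (0, 1): 2.0, (3, 7): 5.0, (9, 99): 1.0}
print(sparsity(observed, n_users=10, n_items=100))
predictions = {key: r + 0.1 for key, r in observed.items()}  # toy predictions, off by 0.1
print(rmse(observed, predictions))
```

    Restricting the error to observed entries matters: on a matrix this sparse, treating missing ratings as zeros would dominate the metric.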

    Modeling time‐to‐event (survival) data using classification tree analysis

    Rationale, aims, and objectives: Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions these models require are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. CTA is a decision-tree-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy; it derives exact P values via permutation tests and evaluates model cross-generalizability. Method: Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. Results: The Cox regression model best predicts the average incidence of the outcome over time, whereas CTA survival models best predict either relatively high or relatively low incidence of the outcome over time. Conclusions: CTA survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. Peer reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/141923/1/jep12779.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/141923/2/jep12779_am.pd
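
    An integrated Brier score for censored data has to reweight subjects by their probability of remaining uncensored. A minimal sketch, assuming the common inverse-probability-of-censoring-weighted (IPCW) form, with the censoring distribution estimated by Kaplan-Meier and the integral approximated by the trapezoidal rule on a time grid; the paper's exact estimator is not stated, and all names and data below are hypothetical:

```python
from typing import Callable, Sequence

SurvivalCurve = Callable[[float], float]  # t -> predicted P(T > t) for one subject


def censoring_km(times: Sequence[float], events: Sequence[int]):
    """Kaplan-Meier estimate G of the censoring survival function.

    events[i] == 1 means the event was observed; 0 means censored.
    G(t) = P(C > t); G(t, left=True) = P(C >= t), i.e. G just before t.
    """
    steps = []
    g = 1.0
    for s in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, e in zip(times, events) if t == s and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((s, g))

    def G(t: float, left: bool = False) -> float:
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left):
                value = v
            else:
                break
        return value

    return G


def brier_score(t, times, events, curves, G) -> float:
    """IPCW Brier score at time t."""
    total = 0.0
    for ti, ei, S in zip(times, events, curves):
        if ti <= t and ei == 1:  # event observed by t: ideal prediction S(t) = 0
            total += S(t) ** 2 / G(ti, left=True)
        elif ti > t:  # still event-free at t: ideal prediction S(t) = 1
            total += (1.0 - S(t)) ** 2 / G(t)
        # subjects censored at or before t contribute 0; the weights compensate
    return total / len(times)


def integrated_brier_score(grid, times, events, curves) -> float:
    """Trapezoidal average of the Brier score over an increasing time grid."""
    G = censoring_km(times, events)
    scores = [brier_score(t, times, events, curves, G) for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


# Hypothetical data: observed times and event indicators (1 = event, 0 = censored)
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 0, 1, 1, 0]
naive = [lambda t: 0.5] * len(times)  # constant "naive" survival prediction
print(integrated_brier_score([1.0, 2.0, 4.0, 6.0], times, events, naive))
```

    Lower is better; a model that predicts every subject's survival status perfectly scores 0.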

    Risk of Rupture of Small Anterior Communicating Artery Aneurysms Is Similar to Posterior Circulation Aneurysms

    Background and Purpose: According to the International Study of Unruptured Intracranial Aneurysms (ISUIA), anterior circulation (AC) aneurysms <7 mm in diameter have a minimal risk of rupture. It is common experience, however, that anterior communicating artery (AcoA) aneurysms are frequent and mostly rupture at <7 mm. The aim of the study was to assess whether AcoA aneurysms behave differently from other AC aneurysms. Methods: Information about 932 patients newly diagnosed with intracranial aneurysms between November 1, 2006, and March 31, 2012, including aneurysm status at diagnosis, location, size, and risk factors, was collected during the multicenter @neurIST project. For each location, and for each location and size subgroup, the odds ratio (OR) of an aneurysm being ruptured at diagnosis was calculated. Results: The OR of an aneurysm being discovered ruptured was significantly higher for AcoA (OR, 3.5 [95% confidence interval, 2.6-4.5]) and posterior circulation (OR, 2.6 [95% confidence interval, 2.1-3.3]) aneurysms than for AC aneurysms excluding AcoA (OR, 0.5 [95% confidence interval, 0.4-0.6]). Although ISUIA has suggested 7 mm as a threshold for aggressive treatment, AcoA aneurysms <7 mm were more frequently found ruptured (OR, 2.0 [95% confidence interval, 1.3-3.0]) than AC aneurysms of 7… Conclusions: We found that AC aneurysms are not a homogeneous group. Aneurysms between 4 and 7 mm located in the AcoA or distal anterior cerebral artery present rupture odds similar to those of posterior circulation aneurysms. Intervention should be recommended for this high-risk lesion group.
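
    The results above are odds ratios with 95% confidence intervals. As a sketch, assuming a 2 x 2 table comparing one location against all other aneurysms and the usual Woolf (log-scale Wald) interval (the abstract states neither the comparator nor the interval method), such an estimate can be computed from counts; the helper name and counts are hypothetical:

```python
import math
from statistics import NormalDist
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  level: float = 0.95) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    a: ruptured at the location of interest   b: unruptured at that location
    c: ruptured elsewhere                      d: unruptured elsewhere
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi


# Hypothetical counts, for illustration only
print(odds_ratio_ci(a=60, b=40, c=300, d=500))
```

    An interval lying entirely above 1, as for AcoA and posterior circulation aneurysms, indicates higher odds of being found ruptured than the comparison group.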

    Clinical data mining with Kernel-based algorithms

    This thesis addresses the development of a decision support system and focuses on the creation of a "model": optimal model selection, selection of the most significant variables, model interpretability, and historical validation of the model. These questions are addressed with single- and multiple-kernel support vector machines. Two clinical applications were chosen: the prediction of nosocomial infection cases and the classification of lung tissue affected by interstitial diseases. This thesis contributes to four main problems: 1) methods for analyzing imbalanced data, on which data mining methods can achieve a low error rate without being sensitive to under-represented cases; 2) the portability of predictive models, assessed by evaluating them over time; 3) the analysis of the trade-off between model interpretability and model complexity and/or stability; 4) the analysis of how the obtained results are exploited.
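
    The first problem listed, a low error rate that hides poor sensitivity to under-represented cases, can be illustrated with a toy example. A minimal sketch, assuming a binary task with a 5% minority class and a hypothetical classifier that always predicts the majority class; balanced accuracy (the mean of per-class recalls) is one common metric that exposes this, not necessarily the method used in the thesis:

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions over all cases."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Recall (sensitivity) for each class present in y_true."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical data: 95 negative cases (0) and 5 positive cases (1), e.g. infections
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # majority-class predictor
print(accuracy(y_true, y_pred), per_class_recall(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

    The classifier is 95% accurate yet detects none of the positive cases, which balanced accuracy reveals as chance-level performance.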

    Fusing Visual and Clinical Information for Lung Tissue Classification in HRCT Data

    In this paper, we investigate the influence of the clinical context of high-resolution computed tomography (HRCT) images of the chest on tissue classification. 2D regions of interest (ROI) in HRCT axial slices from patients affected by an interstitial lung disease (ILD) are automatically classified into five classes of lung tissue. The relevance of the clinical parameters is studied before fusing them with visual attributes. Two multimedia fusion techniques are compared: early versus late fusion. Early fusion concatenates the features into a single vector, yielding a true multimedia feature space. Late fusion, which combines the probability outputs of two support vector machines (SVM), achieved a maximum of 84% correct predictions on testing instances across the five classes of lung tissue. This represents a significant improvement of 10% over purely visual classification. Moreover, the late fusion scheme proved highly robust to the number of clinical parameters used, which suggests that it is suitable for mining clinical attributes with missing values in clinical routine.
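
    The two fusion schemes differ in where the modalities are combined: before classification (early) or after (late). A minimal sketch, assuming late fusion by a weighted average of the two SVMs' class-probability vectors, which is one common combination rule; the abstract does not specify the rule or the weights, the SVMs themselves are not shown, and the function names and values are hypothetical:

```python
from typing import List, Sequence


def early_fusion(visual: Sequence[float], clinical: Sequence[float]) -> List[float]:
    """Concatenate both feature sets into one vector for a single classifier."""
    return list(visual) + list(clinical)


def late_fusion(p_visual: Sequence[float], p_clinical: Sequence[float],
                w_visual: float = 0.5) -> int:
    """Average per-class probabilities from two classifiers; return the winning class index."""
    if len(p_visual) != len(p_clinical):
        raise ValueError("probability vectors must cover the same classes")
    fused = [w_visual * pv + (1.0 - w_visual) * pc
             for pv, pc in zip(p_visual, p_clinical)]
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical feature vectors and classifier outputs over three tissue classes
print(early_fusion([0.12, 0.80], [1.0, 0.0, 63.0]))
print(late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1]))
```

    With equal weights, the clinical classifier's confidence in class 1 outweighs the visual classifier's preference for class 0; setting `w_visual=1.0` reduces the scheme to purely visual classification.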